WO2023184088A1 - Image processing method, apparatus, device, system, and storage medium - Google Patents

Image processing method, apparatus, device, system, and storage medium

Info

Publication number
WO2023184088A1
WO2023184088A1 (PCT/CN2022/083382)
Authority
WO
WIPO (PCT)
Prior art keywords
image block
feature information
reconstructed image
feature
information
Prior art date
Application number
PCT/CN2022/083382
Other languages
English (en)
French (fr)
Inventor
元辉
刘瑶
初彦翰
李明
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2022/083382
Publication of WO2023184088A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124: Quantisation

Definitions

  • the present application relates to the field of video coding and decoding technology, and in particular, to an image processing method, device, equipment, system, and storage medium.
  • Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, or video players.
  • Video data involves a large amount of data, so video devices implement video compression technology to transmit or store video data more efficiently.
  • Video compression causes video distortion; to reduce the distortion, the reconstructed image needs to be processed. However, current image processing methods do not achieve satisfactory results.
  • Embodiments of the present application provide an image processing method, apparatus, device, system, and storage medium. By inputting reconstructed image blocks and quantization parameters into an enhancement model for image enhancement, enhanced image blocks are obtained, which improves the quality-enhancement effect on the reconstructed image blocks.
  • In a first aspect, embodiments of the present application provide an image processing method, including: performing quality enhancement on a reconstructed image block to obtain an enhanced image block.
  • In a second aspect, embodiments of the present application provide an image processing method, including: performing quality enhancement on a reconstructed image block to obtain an enhanced image block.
  • In a third aspect, the present application provides an image processing apparatus for performing the method in the above first aspect or its respective implementations; specifically, the encoder includes functional units for performing the method in the above first aspect or its respective implementations.
  • In a fourth aspect, the present application provides an image processing apparatus for performing the method in the above second aspect or its respective implementations; specifically, the decoder includes functional units for performing the method in the above second aspect or its respective implementations.
  • A fifth aspect provides a video encoder including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above first aspect or its respective implementations.
  • A sixth aspect provides a video decoder including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above second aspect or its respective implementations.
  • A seventh aspect provides a video encoding and decoding system including a video encoder and a video decoder. The video encoder is used to perform the method in the above first aspect or its respective implementations, and the video decoder is used to perform the method in the above second aspect or its respective implementations.
  • An eighth aspect provides a chip for implementing the method in any one of the above first to second aspects or their respective implementations. The chip includes a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes the method in any one of the above first to second aspects or their respective implementations.
  • A ninth aspect provides a computer-readable storage medium for storing a computer program that causes a computer to execute the method in any one of the above first to second aspects or their respective implementations.
  • A tenth aspect provides a computer program product including computer program instructions that cause a computer to execute the method in any one of the above first to second aspects or their respective implementations.
  • An eleventh aspect provides a computer program that, when run on a computer, causes the computer to execute the method in any one of the above first to second aspects or their respective implementations.
  • A twelfth aspect provides a code stream generated by the method in the above first aspect or any of its implementations.
  • Through the above technical solutions, the reconstructed image block is enhanced based on the quantization parameter to obtain the enhanced image block. Since the quantization parameters corresponding to different image blocks may differ, performing quality enhancement on the reconstructed image block of the current image block based on its quantization parameter improves the enhancement accuracy and effect. In addition, this application performs image quality enhancement in units of image blocks, so that when an enhanced image block is used as a reference block for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application.
  • Figure 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application.
  • Figure 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • Figure 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of the image processing involved in this application.
  • Figure 6 is a schematic diagram of the enhancement model involved in this application.
  • Figure 7 is another schematic diagram of the enhancement model involved in this application.
  • Figure 8 is a schematic structural diagram of the first feature extraction layer involved in this application.
  • Figure 9 is another schematic structural diagram of the first feature extraction layer involved in this application.
  • Figure 10 is another schematic structural diagram of the first feature extraction layer involved in this application.
  • Figure 11 is another schematic structural diagram of the first feature extraction layer involved in this application.
  • Figure 12 is a schematic structural diagram of the weighting processing layer involved in this application.
  • Figure 13 is a schematic structural diagram of the enhancement model involved in this application.
  • Figure 14 is another schematic structural diagram of the enhancement model involved in this application.
  • Figure 15 is another schematic structural diagram of the enhancement model involved in this application.
  • Figure 16 is another schematic structural diagram of the enhancement model involved in this application.
  • Figure 17 is another schematic structural diagram of the enhancement model involved in this application.
  • Figure 18 is another schematic structural diagram of the enhancement model involved in this application.
  • Figure 19 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 20 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 21 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 22 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • Figure 23 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • Figure 24 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • Figure 25 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
  • This application can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated circuit video encoding and decoding, real-time video encoding and decoding, etc.
  • For example, the solution of this application can be combined with audio and video coding standards (AVS), such as the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
  • AVC: advanced video coding
  • HEVC: high efficiency video coding
  • VVC: versatile video coding
  • The solution of this application can also operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • SVC: scalable video coding
  • MVC: multi-view video coding
  • For ease of understanding, the video encoding and decoding system involved in the embodiments of the present application is first introduced with reference to Figure 1.
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video encoding and decoding system in the embodiment of the present application includes but is not limited to what is shown in Figure 1 .
  • the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode the video data (which can be understood as compression) to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • The encoding device 110 in the embodiment of the present application can be understood as a device with a video encoding function, and the decoding device 120 as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 cover a wide range of devices, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (eg, code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time.
  • encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120.
  • the communication media includes wireless communication media, such as radio frequency spectrum.
  • the communication media may also include wired communication media, such as one or more physical transmission lines.
  • channel 130 includes a storage medium that can store video data encoded by encoding device 110 .
  • Storage media include a variety of local access data storage media, such as optical disks, DVDs, flash memories, etc.
  • the decoding device 120 may obtain the encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • The storage server may store the encoded video data and transmit it to the decoding device 120; examples include a web server (e.g., for a website), a File Transfer Protocol (FTP) server, and the like.
  • FTP File Transfer Protocol
  • the encoding device 110 includes a video encoder 112 and an output interface 113.
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • The encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • Video source 111 may include at least one of: a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, or a computer graphics system for generating video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more images (pictures) or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoded information may include encoded image data and associated data.
  • the associated data may include sequence parameter set (SPS), picture parameter set (PPS) and other syntax structures.
  • SPS sequence parameter set
  • PPS picture parameter set
  • An SPS can contain parameters that apply to one or more sequences.
  • a PPS can contain parameters that apply to one or more images.
  • a syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.
  • the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.
  • decoding device 120 includes input interface 121 and video decoder 122.
  • the decoding device 120 may also include a display device 123.
  • the input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • Display device 123 may be integrated with decoding device 120 or external to decoding device 120 .
  • Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • LCD: liquid crystal display
  • OLED: organic light-emitting diode
  • Figure 1 is only an example, and the technical solution of the embodiment of the present application is not limited to Figure 1.
  • The technology of the present application can also be applied to video encoding alone or video decoding alone.
  • Figure 2 is a schematic block diagram of a video encoder related to an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression or lossless compression of images.
  • The lossless compression can be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format.
  • The YUV ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y represents luminance (luma), Cb (U) represents blue chroma, and Cr (V) represents red chroma; U and V together represent chroma, which describes color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr).
  • 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr).
  • 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr); the per-plane sample counts are illustrated in the sketch below.
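  • To make the subsampling ratios concrete, here is a minimal helper (an illustration, not from the patent) that computes per-plane sample counts for the three formats above:

```python
def plane_sizes(width: int, height: int, fmt: str) -> dict:
    """Sample counts per plane for the common YUV chroma-subsampling formats."""
    if fmt == "4:2:0":    # chroma halved horizontally and vertically
        cw, ch = width // 2, height // 2
    elif fmt == "4:2:2":  # chroma halved horizontally only
        cw, ch = width // 2, height
    elif fmt == "4:4:4":  # no chroma subsampling
        cw, ch = width, height
    else:
        raise ValueError(f"unknown format: {fmt}")
    return {"Y": width * height, "Cb": cw * ch, "Cr": cw * ch}

# e.g. a 64x64 area: 4:2:0 -> {'Y': 4096, 'Cb': 1024, 'Cr': 1024}
```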
  • the video encoder 200 reads video data, and for each frame of image in the video data, divides one frame of image into several coding tree units (coding tree units, CTU).
  • A CTU may also be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB).
  • LCU: largest coding unit
  • CTB: coding tree block
  • Each CTU can be associated with an equal-sized block of pixels within the image.
  • Each pixel can correspond to one luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples.
  • A CTU size is, for example, 128×128, 64×64, or 32×32.
  • a CTU can be further divided into several coding units (Coding Units, CUs) for encoding.
  • CUs can be rectangular blocks or square blocks.
  • A CU can be further divided into prediction units (PUs) and transform units (TUs), enabling coding, prediction, and transformation to be separated and making processing more flexible.
  • the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
  • Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, the video encoder and video decoder can support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. The video encoder and video decoder can also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction; the resulting partition geometry is sketched below.
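  • As an illustration of that partition geometry (a sketch based on the HEVC-style sizes listed above, not code from the patent), the candidate PU shapes for a 2N×2N CU can be enumerated as:

```python
def pu_partitions(n: int, inter: bool) -> dict:
    """Candidate PU shapes (width, height) for a 2Nx2N CU, per the sizes above."""
    size = 2 * n
    parts = {"2Nx2N": [(size, size)], "NxN": [(n, n)] * 4}
    if inter:
        parts.update({
            "2NxN": [(size, n)] * 2,          # two horizontal halves
            "Nx2N": [(n, size)] * 2,          # two vertical halves
            # asymmetric partitions: a quarter/three-quarter split
            "2NxnU": [(size, n // 2), (size, size - n // 2)],
            "2NxnD": [(size, size - n // 2), (size, n // 2)],
            "nLx2N": [(n // 2, size), (size - n // 2, size)],
            "nRx2N": [(size - n // 2, size), (n // 2, size)],
        })
    return parts

# e.g. pu_partitions(16, inter=True)["2NxnU"] -> [(32, 8), (32, 24)]
```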
  • The video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc.
  • the prediction block may also be called a predicted image block or an image prediction block
  • The reconstructed image block may also be called a reconstruction block or a reconstructed block.
  • prediction unit 210 includes inter prediction unit 211 and intra estimation unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • Inter-frame prediction can include motion estimation (motion estimation) and motion compensation (motion compensation). It can refer to image information of different frames.
  • Inter-frame prediction uses motion information to find a reference block from a reference frame, and a prediction block is generated based on the reference block to eliminate temporal redundancy. The frames used in inter-frame prediction can be P frames and/or B frames; P frames are forward-predicted frames, and B frames are bidirectionally predicted frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • The motion vector can have whole-pixel or sub-pixel precision.
  • The whole-pixel or sub-pixel block found in the reference frame according to the motion vector is called the reference block.
  • Some technologies use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block. Processing a reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
  • the intra-frame estimation unit 212 only refers to the information of the same frame image and predicts the pixel information in the current coded image block to eliminate spatial redundancy.
  • the frames used in intra prediction may be I frames.
  • Intra-frame prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
  • the intra-frame prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, for a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, for a total of 67 prediction modes.
  • Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU. For example, residual unit 220 may generate a residual block of a CU such that each sample in the residual block has a value equal to the difference between the corresponding sample in the pixel block of the CU and the corresponding sample in the prediction block of the CU's PU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with the TU of the CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.
  • QP: quantization parameter
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.
  • Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.
  • The loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels to compensate for distortion information and provide a better reference for subsequent encoding of pixels. For example, a deblocking filtering operation can be performed to reduce the blocking effect of the pixel blocks associated with the CU.
  • In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking effects and the SAO/ALF unit is used to remove ringing effects.
  • SAO/ALF: sample adaptive offset/adaptive loop filtering
  • Decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • Figure 3 is a schematic block diagram of a video decoder related to an embodiment of the present application.
  • the video decoder 300 includes an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filtering unit 350 and a decoded image cache 360 . It should be noted that the video decoder 300 may include more, less, or different functional components.
  • Video decoder 300 can receive the code stream.
  • Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream.
  • the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • prediction unit 320 includes inter prediction unit 321 and intra estimation unit 322.
  • Intra estimation unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra estimation unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the codestream.
  • the inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) according to syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predictive block for the PU based on one or more reference blocks of the PU.
  • Inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.
  • Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360 .
  • the video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into CUs.
  • the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block.
  • the residual block may also be called residual information.
  • The residual block is transformed and quantized by the transform/quantization unit 230, which removes information to which human eyes are insensitive, eliminating visual redundancy.
  • The residual block before transformation and quantization by the transform/quantization unit 230 may be called a time-domain residual block, and after transformation and quantization it may be called a frequency residual block or frequency-domain residual block.
  • The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230 and may perform entropy encoding on them to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
  • the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstruction block.
  • the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the blocks to obtain a decoded image.
  • the encoding end also needs similar operations as the decoding end to obtain the decoded image.
  • The decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for inter-frame prediction of subsequent frames.
  • the block division information determined by the encoding end as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary.
  • The decoding end determines, by parsing the code stream and analyzing existing information, the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering, and other mode or parameter information as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • the current block can be the current coding unit (CU) or the current prediction unit (PU), etc.
  • Figure 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application; the embodiment is applied to the video decoders shown in Figures 1 and 3.
  • the method in the embodiment of this application includes:
  • the embodiment of the present application does not limit the specific size of the current image block.
  • the current image block in the embodiment of this application is a CTU.
  • one frame of image is divided into several CTUs, and this application does not limit the size of the CTU.
  • the size of a CTU is 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32, etc.
  • the current image block in the embodiment of the present application is a CU, for example, one CTU is divided into one or more CUs.
  • the current image block in the embodiment of the present application is a TU or PU, for example, a CU is divided into one or more TUs or PUs.
  • the current image block in the embodiment of the present application only includes chrominance components, which can be understood as chrominance blocks.
  • the current image block in the embodiment of the present application only includes the brightness component, which can be understood as a brightness block.
  • the current image block includes both luma and chrominance components.
  • If the current image block is divided into multiple CUs, the quantization coefficients of the current image block include the quantization coefficients corresponding to the multiple CUs.
  • the entropy decoding unit 310 can decode the code stream to obtain prediction information, quantization information, etc. of the current image block.
  • The prediction unit 320 uses intra prediction or inter prediction for the current image block based on the prediction information to generate the prediction block of the current image block.
  • the inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed image block of the current image block.
  • Video distortion will be caused during the video encoding process.
  • the embodiment of the present application performs post-processing on the reconstructed image blocks, that is, using the image blocks as enhancement units to enhance the image quality, so as to improve the quality of the reconstructed image blocks.
  • If the current image block includes multiple CUs, each CU can be decoded separately to obtain the reconstruction block of each CU, and the reconstruction blocks of the CUs are combined to obtain the reconstructed image block of the current image block.
  • S402. Determine the quantization parameter corresponding to the current image block, and perform inverse quantization on the quantization coefficient based on the quantization parameter to obtain the transformation coefficient of the current image block.
  • the quantization parameters of the current image block include quantization parameters corresponding to the multiple CUs.
  • the quantization parameters corresponding to these multiple CUs may be the same or different.
  • The quantization parameter of the current image block in the embodiment of the present application can be expressed in the form of a matrix. For example, if the size of the current image block is 16×16, the quantization parameter of the current image block is a 16×16 matrix, where each element is the quantization parameter of the pixel at the corresponding position in the current image block; a sketch of building such a matrix is given below.
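  • A minimal sketch of assembling such a per-pixel QP matrix from per-CU QPs (the cu_rects layout is a hypothetical input format, not defined by the patent):

```python
import numpy as np

def qp_map(block_size: int, cu_rects) -> np.ndarray:
    """Per-pixel QP matrix for an image block: each element holds the QP of the
    CU covering that pixel. cu_rects: iterable of (y, x, h, w, qp) tuples."""
    m = np.zeros((block_size, block_size), dtype=np.int32)
    for y, x, h, w, qp in cu_rects:
        m[y:y + h, x:x + w] = qp
    return m

# A 16x16 block split into two 16x8 CUs coded with different QPs:
qp = qp_map(16, [(0, 0, 16, 8, 32), (0, 8, 16, 8, 37)])
```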
  • the embodiments of this application do not limit the specific process of determining the quantization parameters corresponding to the current image block.
  • the codec uses the default quantization parameter as the quantization parameter corresponding to the current image block.
  • For example, the decoder can directly determine the default quantization parameter as the quantization parameter corresponding to the current image block.
  • the encoding end writes the quantization parameters corresponding to the current image block determined during the encoding process into the code stream. In this way, the decoding end can determine the quantization parameters corresponding to the current image block by decoding the code stream.
  • the decoder can use the same calculation method as the encoder to determine the quantization parameter corresponding to the current image block through calculation.
  • After the decoder determines the quantization parameter corresponding to the current image block, it uses the quantization parameter to inversely quantize the quantization coefficients of the current image block to obtain the transform coefficients of the current image block.
  • For example, if the current image block includes multiple CUs, the decoder uses the quantization parameter corresponding to each CU to inversely quantize the quantization coefficients of that CU to obtain the transform coefficients of the CU.
  • the transformation coefficient of the current image block is inversely transformed to obtain the residual block of the current image block.
  • intra-frame or inter-frame prediction methods are used to obtain the prediction block of the current image block, and the prediction value of the current image block and the residual block are added to obtain the reconstructed image block of the current image block.
  • If the current image block includes multiple CUs, the quantization parameters corresponding to the current image block are the quantization parameters of these multiple CUs.
  • Each CU in the current image block is decoded separately to obtain the reconstruction block of each CU, and the reconstruction blocks of the CUs are combined to obtain the reconstructed image block of the current image block, as sketched below.
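  • A minimal per-CU decode sketch (an illustration under a simplified scalar dequantizer; real codecs use integer scaling tables, and inv_transform stands in for the codec's inverse transform):

```python
import numpy as np

def q_step(qp: int) -> float:
    # Simplified QP-to-step mapping (an assumption for illustration).
    return 2.0 ** ((qp - 4) / 6)

def reconstruct_cu(levels: np.ndarray, qp: int, pred: np.ndarray, inv_transform) -> np.ndarray:
    """Dequantize one CU's coefficients with that CU's own QP, inverse-transform
    them into a residual, and add the CU's prediction block."""
    coeffs = levels * q_step(qp)        # inverse quantization with this CU's QP
    residual = inv_transform(coeffs)    # e.g. an inverse DCT
    return pred + residual

# The reconstructed image block is then assembled by placing each CU's
# reconstruction at that CU's position inside the current image block.
```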
  • the quantization parameters QP corresponding to different image blocks may be different.
  • the quantization parameter QP includes a quantization step size.
  • During encoding, the transform coefficients of an image block are quantized. The larger the quantization step size, the greater the image loss; the smaller the quantization step size, the smaller the image loss. Therefore, in order to improve the enhancement effect for the current image block, the embodiment of the present application takes the impact of the quantization parameter QP corresponding to the current image block into account during the quality enhancement process, thereby improving the quality enhancement of the current image block.
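  • The step-size/loss relationship can be made concrete with a small experiment (a sketch using the H.264/HEVC-style convention that the step size doubles every 6 QP units; the exact mapping is an assumption here):

```python
import random

def q_step(qp: int) -> float:
    # Step size doubles every 6 QP units (H.264/HEVC-style convention).
    return 2.0 ** ((qp - 4) / 6)

def round_trip(coeff: float, qp: int) -> float:
    """Quantize, then dequantize one transform coefficient (plain scalar quantizer)."""
    level = round(coeff / q_step(qp))   # quantization at the encoder
    return level * q_step(qp)           # inverse quantization at the decoder

random.seed(0)
coeffs = [random.uniform(-200.0, 200.0) for _ in range(10000)]
for qp in (22, 32, 42):
    mse = sum((c - round_trip(c, qp)) ** 2 for c in coeffs) / len(coeffs)
    print(f"QP={qp}  step={q_step(qp):6.1f}  MSE={mse:8.2f}")  # distortion grows with QP
```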
  • Based on this, embodiments of the present application perform quality enhancement on the reconstructed image block of the current image block based on the quantization parameter corresponding to the current image block.
  • In this way, the enhancement effect of reconstructed image blocks can be improved.
  • In the embodiment of the present application, image quality is enhanced in units of image blocks.
  • When the enhanced image blocks are used as reference blocks for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
  • the embodiment of the present application performs image quality enhancement on an image block basis. Compared with image quality enhancement on the entire frame image, more attention can be paid to the enhancement of finer features in the image block, thereby further improving the enhancement quality of the image block.
  • different enhancement models are trained in advance for different quantization parameters, and the enhancement models under different quantization parameters are used to enhance the quality of image blocks under the quantization parameters.
  • The decoder can select, based on the quantization parameter of the current image block, the target enhancement model corresponding to that quantization parameter from enhancement models corresponding to multiple different quantization parameters, and use the target enhancement model to perform quality enhancement on the reconstructed image block of the current image block.
  • the decoder obtains a universal enhancement model.
  • the enhancement model is trained based on different image blocks and their corresponding quantization parameters, and fully learns the effects of different quantization parameters on image block quality enhancement.
  • In this way, reconstructed image blocks obtained under different quantization parameters can be enhanced with high quality based on their respective quantization parameters.
  • the decoder can enhance the quality of the reconstructed image block through the universal enhancement model based on the quantization parameters to obtain an enhanced image block.
  • After the decoder determines the reconstructed image block and quantization parameter of the current image block according to the above steps, in order to reduce the distortion of the reconstructed image block and improve its quality, the reconstructed image block and the corresponding quantization parameter are input into the pre-trained enhancement model for image enhancement, as shown in Figure 5, finally obtaining the enhanced image block of the current image block.
  • The following embodiments take the universal enhancement model as an example to introduce the quality enhancement of reconstructed image blocks.
  • In some embodiments, the decoder fuses the reconstructed image block and the quantization parameter and then inputs the result into the enhancement model.
  • The fusion of the reconstructed image block and the quantization parameter includes at least the following examples (see the sketch after this list):
  • Example 1: assuming the size of the reconstructed image block is N1×N2 (where N1 and N2 may be the same or different), the reconstructed image block and the quantization parameter are multiplied and the result is input into the enhancement model. Specifically, each pixel of the reconstructed image block is multiplied by the quantization parameter to obtain an N1×N2 matrix, and the matrix is input into the enhancement model.
  • Example 2: the reconstructed image block and the quantization parameter are spliced and then input into the enhancement model. Specifically, the quantization parameter is expanded into a matrix of size N1×N2, the reconstructed image block of size N1×N2 is spliced with the N1×N2 quantization parameter matrix, and the result is input into the enhancement model.
  • In addition to the above examples, the decoder can also use other fusion methods to fuse the reconstructed image block and the corresponding quantization parameter before inputting the result into the enhancement model for quality enhancement.
  • In some embodiments, to prevent features with smaller absolute values from being overwhelmed by features with larger absolute values, the decoder first normalizes the reconstructed image block and the quantization parameter to a unified range before inputting them into the enhancement model, so that all features are treated equally. Then, based on the normalized reconstructed image block and quantization parameter, the enhanced image block of the reconstructed image block is obtained. For example, the normalized reconstructed image block and quantization parameter are spliced and then input into the enhancement model for quality enhancement, improving the quality enhancement effect.
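  • A minimal PyTorch sketch of the two fusion modes plus normalization (the normalization constants, 255 for 8-bit samples and 63 for the maximum QP, are illustrative assumptions):

```python
import torch

def fuse_block_and_qp(recon: torch.Tensor, qp: float, mode: str = "splice") -> torch.Tensor:
    """Fuse an N1xN2 reconstructed block (shape (1, 1, N1, N2)) with its QP
    before feeding the enhancement model."""
    recon = recon / 255.0                          # normalize samples to [0, 1]
    qp_plane = torch.full_like(recon, qp / 63.0)   # QP tiled to an N1xN2 plane, normalized
    if mode == "multiply":   # Example 1: element-wise product
        return recon * qp_plane
    if mode == "splice":     # Example 2: concatenate along the channel axis
        return torch.cat([recon, qp_plane], dim=1)
    raise ValueError(mode)

# enhanced = enhancement_model(fuse_block_and_qp(recon_block, qp=32))
```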
  • the reconstructed image block in this embodiment of the present application is the reconstructed image block of the current image block under the first component.
  • the first component may be a luminance component or a chrominance component.
  • In some embodiments, the above S404 includes the following steps S404-A and S404-B:
  • The feature information of the reconstructed image block is extracted based on the quantization parameter.
  • For example, the quantization parameter and the reconstructed image block are input into a neural network layer, and the feature information of the reconstructed image block is extracted.
  • The feature information is then analyzed and different weights are assigned to different features: larger weights are assigned to important features in the feature information to highlight their impact, and smaller weights are assigned to relatively unimportant features to weaken their impact. The feature information of the reconstructed image block is then weighted according to the weights corresponding to the features to obtain the first feature information of the reconstructed image block.
  • That is, the enhancement model of the embodiment of the present application includes a first feature extraction module, which is used to extract at least one feature from the input information (i.e., the reconstructed image block and quantization parameter) and to assign different weights to the extracted features for feature weighting.
  • Specifically, the reconstructed image block and quantization parameter are input into the first feature extraction module. The first feature extraction module performs feature extraction, extracting at least one feature; it assigns a larger weight to important features among the at least one feature to highlight their impact and a smaller weight to relatively unimportant features to weaken their impact, and then performs weighting according to the weights corresponding to the at least one feature to obtain the first feature information of the reconstructed image block.
  • An enhanced image block of the reconstructed image block is then determined based on the first feature information of the reconstructed image block.
  • Embodiments of the present application can allocate different weights to different features to highlight the influence of important features and weaken the influence of unimportant features, further improving the quality enhancement effect of reconstructed image blocks.
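  • The embodiments do not fix a concrete weighting mechanism beyond convolutional layers and an attention mechanism; as one common realization, here is an SE-style channel-attention sketch (an illustrative assumption, not the patent's definitive design):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: learn one weight per feature channel so that
    important channels are emphasized and unimportant ones are damped."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global summary per channel
            nn.Conv2d(channels, channels // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # excite
            nn.Sigmoid(),                                   # weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)   # weight each channel of the feature information
```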
  • the embodiments of the present application do not limit the network model of the first feature extraction module.
  • the first feature extraction module includes multiple convolutional layers and attention mechanisms.
  • the above S404-A includes the following steps:
  • In some embodiments, the decoder performs N iterations of feature weighting on the reconstructed image block based on the quantization parameter to obtain the Nth feature information of the reconstructed image block. Specifically, feature weighting is performed on the reconstructed image block based on the quantization parameter to obtain the first feature information of the reconstructed image block; then feature weighting is performed on the first feature information based on the quantization parameter to obtain the second feature information; iterating in this way, feature weighting is performed on the (N-1)th feature information based on the quantization parameter to obtain the Nth feature information of the reconstructed image block. It should be noted that the embodiment of the present application does not limit the specific method of feature weighting; for example, the quantization parameter and the (i-1)th feature information are input into a neural network with a feature weighting function to obtain the ith feature information of the reconstructed image block.
  • In some embodiments, the first feature extraction module of the embodiment of the present application includes N first feature extraction units. If N is greater than 1, these N first feature extraction units are connected in series; that is, the output of the previous first feature extraction unit is the input of the next first feature extraction unit. Based on the quantization parameter, the N first feature extraction units perform feature weighting on the reconstructed image block to obtain the Nth feature information of the reconstructed image block. As shown in Figure 7, each of the N first feature extraction units is used to extract at least one feature, assigning a larger weight to important features among the at least one feature to highlight their impact and a smaller weight to relatively unimportant features to weaken their impact.
  • The latter first feature extraction unit performs feature extraction and weight assignment on the feature information output by the previous first feature extraction unit to further highlight the important features.
  • In this way, the important features of the reconstructed image block are strongly enhanced and the non-important features are weakly enhanced, thereby improving the enhancement effect of the reconstructed image block.
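  • A minimal sketch of the serial arrangement (the unit internals and the choice of 64 channels and N = 4 are illustrative placeholders):

```python
import torch
import torch.nn as nn

def make_unit(channels: int) -> nn.Module:
    # Placeholder for one "first feature extraction unit"; any module whose
    # output shape matches its input can stand in here.
    return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

n_units = 4                                                       # N
chain = nn.Sequential(*[make_unit(64) for _ in range(n_units)])   # units in series

x = torch.randn(1, 64, 32, 32)   # features of the fused block + QP input
m_n = chain(x)                   # the Nth feature information M_N
```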
  • If N = 1, the above S404-A1 includes: the decoder fuses the reconstructed image block and the quantization parameter and then inputs the result into the first feature extraction unit.
  • The first feature extraction unit performs feature extraction, extracting at least one feature, assigns different weights according to the importance of the features, and then weights the at least one feature according to those weights to obtain the first piece of feature information.
  • The first feature information of the reconstructed image block is determined based on this first piece of feature information; for example, the first piece of feature information is determined as the first feature information of the reconstructed image block.
  • If N is greater than 1, the above S404-A1 includes: after the decoder fuses the reconstructed image block and the quantization parameter, the result is input into the first of the first feature extraction units. This first feature extraction unit performs feature weighting, that is, it extracts at least one feature, determines a weight for each of the at least one feature, and then weights the at least one feature according to the weights to obtain the feature information output by the first first feature extraction unit.
  • For ease of description, this feature information is recorded as the first feature information M1.
  • The first feature information M1 is input into the second first feature extraction unit for feature weighting to obtain the second feature information M2, and so on. For the ith of the N first feature extraction units,
  • the (i-1)th feature information Mi-1 output by the (i-1)th first feature extraction unit is input into the ith first feature extraction unit for feature weighting to obtain the ith feature information Mi,
  • until the Nth feature information MN output by the Nth first feature extraction unit is obtained.
  • The first feature information of the reconstructed image block is determined according to the Nth feature information MN; for example, the Nth feature information is determined as the first feature information of the reconstructed image block.
  • the first feature extraction unit includes at least one convolution layer and an attention mechanism.
  • In some embodiments, performing feature weighting on the (i-1)th feature information of the reconstructed image block based on the quantization parameter to obtain the ith feature information of the reconstructed image block includes the following steps:
  • The decoder performs multi-scale feature extraction on the (i-1)th feature information to obtain M feature information of different scales. Then, the M feature information of different scales are weighted to obtain the ith weighted feature information: for example, according to the importance of the features, larger weights are assigned to important features and smaller weights to unimportant features, and the M feature information of different scales are then weighted based on the weight of each feature to obtain the ith weighted feature information of the reconstructed image block. Finally, the ith feature information is determined based on the ith weighted feature information; for example, the ith weighted feature information is determined as the ith feature information.
  • In this embodiment, the first feature extraction unit includes a multi-scale extraction layer, which extracts features at multiple scales.
  • The input of the ith first feature extraction unit is the output of the (i-1)th first feature extraction unit.
  • The (i-1)th feature information Mi-1 output by the (i-1)th first feature extraction unit is input into the multi-scale extraction layer in the ith first feature extraction unit.
  • The multi-scale extraction layer is used to extract features at multiple scales, for example extracting feature information at M different scales.
  • Then, different weights are assigned to different features in the M feature information (D1, D2, ..., DM) extracted by the multi-scale extraction layer at different scales, and a weighting operation is performed to obtain the ith weighted feature information G1.
  • The ith feature information Mi is determined based on the ith weighted feature information G1; for example, the ith weighted feature information G1 is used as the ith feature information Mi output by the ith first feature extraction unit.
  • the first feature extraction unit in the embodiment of the present application performs multi-scale feature extraction to better explore the relationship between the input reconstructed image blocks and the real image blocks, so as to further improve the enhancement effect of the reconstructed image blocks.
  • In some embodiments, the above multi-scale extraction layer is composed of a convolution layer and a down-sampling layer, where the convolution layer is used to output feature information and the down-sampling layer is used to down-sample the feature information output by the convolution layer to obtain M feature information at different scales.
  • the above-mentioned multi-scale extraction layer includes M first feature extraction layers of different scales, and each first feature extraction layer can extract feature information at a corresponding scale.
  • In this case, the above S404-A11 includes: extracting the M feature information D1, D2, ..., DM of different scales of the (i-1)th feature information through the M first feature extraction layers of different scales.
  • the embodiments of this application do not limit the specific network structure of the above-mentioned first feature extraction layer.
  • For example, the above first feature extraction layer includes a convolution layer, and the convolution kernels of the convolution layers included in different first feature extraction layers are different.
  • For example, the first feature extraction unit includes two first feature extraction layers: the convolution kernel size of one first feature extraction layer is 3×3, and the convolution kernel size of the other first feature extraction layer is 5×5. The 3×3 and 5×5 convolution kernels perform feature extraction on the input (i-1)th feature information to obtain two pieces of feature information at different scales.
  • At least one first feature extraction layer includes an activation function.
  • the decoder can fuse M feature information of different scales and perform weighting processing to obtain the i-th weighted feature information.
  • the embodiments of the present application do not limit the specific method of fusing the feature information of M different scales, for example, adding or multiplying the feature information of M different scales.
  • the above S404-A12 includes:
  • S404-A12-1: splice the M pieces of feature information of different scales to obtain the first spliced feature information; perform weighting on the first spliced feature information to obtain the ith weighted feature information.
  • For example, after splicing the M pieces of feature information of different scales, the first spliced feature information X is obtained, and X is weighted to obtain the ith weighted feature information G1: important features in X are assigned larger weights and unimportant features smaller weights, and the features in X are then weighted according to the weight of each feature to obtain the ith weighted feature information G1.
  • the embodiment of the present application does not limit the specific implementation method of weighting M feature information of different scales in the above-mentioned S404-A12 to obtain the i-th weighted feature information.
• in some embodiments, weighting the spliced feature information in the above S404-A12-1 to obtain the i-th weighted feature information includes:
  • the first feature extraction unit also includes a weighting processing layer, which is used to perform weighting processing on features at multiple scales.
• the decoder inputs the i-1-th feature information M_{i-1} output by the i-1-th first feature extraction unit into the multi-scale extraction layer, that is, into the M first feature extraction layers of different scales; these M first feature extraction layers of different scales output M pieces of feature information D_1, D_2, …, D_M. After splicing these M pieces of feature information, feature weighting processing is performed on the spliced feature information to obtain weighted feature information with the first number of channels; the i-th weighted feature information is then obtained from the weighted feature information with the first number of channels.
  • the first feature extraction unit in the embodiment of the present application also includes a second feature extraction layer, which is used to change the number of channels.
• the decoder can input the weighted feature information with the first number of channels into the second feature extraction layer to change the number of channels, and output the i-th weighted feature information G_i.
  • the number of channels of feature information output by each of the above-mentioned N first feature extraction units may be the same.
• for example, the number of channels of the i-th weighted feature information may be the same as the number of channels of the i-1-th feature information.
  • the embodiments of the present application do not limit the specific network structure of the weighted processing layer.
  • the weighted processing layer includes a neuron attention mechanism.
  • the embodiment of the present application does not limit the network structure of the second feature extraction layer, for example, it includes a 1X1 convolution layer.
  • the i-th weighted feature information is determined as the i-th feature information.
  • the sum of the i-th weighted feature information and the i-1th feature information is determined as the i-th feature information.
  • the following uses an example to introduce the network structure of the i-th first feature extraction unit in the embodiment of the present application.
• the i-th first feature extraction unit in the embodiment of the present application includes a multi-scale extraction layer, a weighting processing layer and a second feature extraction layer, where the multi-scale extraction layer includes 2 first feature extraction layers.
• the network structures of the two first feature extraction layers are basically the same, each including a convolution layer and an activation function.
• the convolution kernels included in the two first feature extraction layers are of different sizes: one convolution kernel size is 3X3, and the other convolution kernel size is 5X5.
• the activation function included in the two first feature extraction layers is ReLU. It should be noted that the activation function can also be another form of activation function.
  • the second feature extraction layer includes a convolution layer with a convolution kernel size of 1X1 to reduce the number of feature channels.
• the i-1-th feature information M_{i-1} output by the i-1-th first feature extraction unit is input into the 2 first feature extraction layers respectively, and the 2 first feature extraction layers perform multi-scale feature extraction and output feature information C_1 and feature information C_2; next, the feature information C_1 and the feature information C_2 are spliced to obtain the first spliced feature information X.
• the above feature information C_1, C_2 and X are determined by the following formula (1):

$C_1=\sigma(W_1*M_{i-1}),\quad C_2=\sigma(W_2*M_{i-1}),\quad X=\mathrm{Concat}(C_1,C_2) \qquad (1)$

where $\sigma$ represents the ReLU activation function, $W_1$ and $W_2$ represent the 3X3 and 5X5 convolution kernels respectively, $*$ represents the convolution operation, and $\mathrm{Concat}$ represents the splicing operation. It should be noted that the size of the convolution kernel and the type of activation function can be changed according to actual needs.
• the first spliced feature information X is input into the weighting processing layer for feature weighting processing; specifically, larger weights are assigned to important features to highlight their impact, and smaller weights are assigned to relatively unimportant features to weaken their impact.
• the weighting processing layer outputs weighted feature information Y with the first number of channels.
• the feature information Y is input into the second feature extraction layer to reduce the number of feature channels; specifically, the i-th weighted feature information D_3 is obtained from Y through a 1X1 convolution operation.
• D_3 is added to the input M_{i-1} to obtain the i-th feature information M_i output by the i-th first feature extraction unit.
• the above feature information D_3 and M_i are determined by the following formula (2):

$D_3=\sigma(W_3*Y),\quad M_i=D_3+M_{i-1} \qquad (2)$

where $\sigma$ represents the ReLU activation function, $W_3$ represents the 1X1 convolution kernel, and $*$ represents the convolution operation. It should be noted that the size of the convolution kernel and the type of activation function can be changed according to actual needs.
  • the weighted processing layer described above includes a neuron attention mechanism.
  • the network structure of the neuron attention mechanism is shown in Figure 12, including depth convolution (Depth wise Conv), point convolution (Point wise Conv) and activation function.
• through the Depthwise convolution operation, the spliced feature X is convolved on each channel separately; after the ReLU activation function, the information of the different feature maps is fused through the Pointwise convolution operation; the weight feature map V is then obtained through the Sigmoid activation function; finally, the corresponding elements of V and X are multiplied to obtain the weighted feature information Y with the first number of channels.
• the above-mentioned weighted feature information Y with the first number of channels can be determined by the following formula (3):

$V=\delta\big(W_p*\sigma(W_d*X)\big),\quad Y_c=V_c\otimes X_c \qquad (3)$

where $Y_c$ represents the features of the feature information Y on one channel, $X_c$ represents the features of the feature information X on one channel, $\sigma$ represents the ReLU activation function, $\delta$ represents the Sigmoid activation function, $W_d$ and $W_p$ represent the Depthwise convolution and the Pointwise convolution respectively, and $\otimes$ represents the operation of multiplying corresponding elements. It should be noted that the type of activation function can be changed according to actual needs.
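• to make the structure above concrete, the following is a minimal PyTorch sketch of one such unit: two parallel 3X3/5X5 convolutions for multi-scale extraction (formula (1)), a neuron-attention weighting layer built from Depthwise and Pointwise convolutions (formula (3)), a 1X1 convolution to restore the channel count, and a residual connection (formula (2)). The class names and the channel count of 64 are illustrative assumptions, not a configuration fixed by the present application.

```python
import torch
import torch.nn as nn

class NeuronAttention(nn.Module):
    """Weighting layer of formula (3): Y = sigmoid(Wp * relu(Wd * X)) ⊙ X."""
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolution: one 3x3 filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Pointwise (1x1) convolution fuses information across feature maps.
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = torch.sigmoid(self.pointwise(self.relu(self.depthwise(x))))
        return v * x  # multiply corresponding elements of V and X

class MSNAUnit(nn.Module):
    """One first feature extraction unit (multi-scale extraction + weighting)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)  # C1 = relu(W1 * M_{i-1})
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)  # C2 = relu(W2 * M_{i-1})
        self.relu = nn.ReLU(inplace=True)
        self.attention = NeuronAttention(2 * channels)              # weights the spliced X
        self.reduce = nn.Conv2d(2 * channels, channels, 1)          # 1x1 conv restores channels

    def forward(self, m_prev: torch.Tensor) -> torch.Tensor:
        c1 = self.relu(self.branch3(m_prev))
        c2 = self.relu(self.branch5(m_prev))
        x = torch.cat([c1, c2], dim=1)   # first spliced feature information X
        y = self.attention(x)            # weighted features with the first channel number
        d3 = self.relu(self.reduce(y))   # i-th weighted feature information
        return d3 + m_prev               # M_i = D3 + M_{i-1}, formula (2)
```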
• the above takes the i-th first feature extraction unit among the N first feature extraction units as an example to introduce the process of extracting the i-th feature information.
• the other first feature extraction units among the N first feature extraction units may refer to the process by which the above-mentioned i-th first feature extraction unit extracts the i-th feature information, so that the final N-th feature information extracted by the N-th first feature extraction unit can be obtained.
  • step S404-A2 is performed to determine the first feature information of the reconstructed image block based on the Nth piece of feature information.
  • the implementation process of the above S404-A2 includes but is not limited to the following methods:
  • Method 1 determine the Nth feature information as the first feature information of the reconstructed image block.
• Method 2: the above-mentioned S404-A2 includes S404-A2-1: obtain the first feature information of the reconstructed image block based on at least one piece of the first N-1 pieces of feature information and the N-th feature information.
• as described above, the decoder performs N iterative feature weighting processes on the reconstructed image block based on the quantization parameter; for example, based on the quantization parameter, the i-1-th feature information of the reconstructed image block is subjected to feature weighting processing to obtain the i-th feature information of the reconstructed image block, where i is a positive integer from 1 to N, and this is repeatedly executed to obtain the N-th feature information of the reconstructed image block.
• in this way, the decoder can obtain the first feature information of the reconstructed image block based on at least one piece of the first N-1 pieces of feature information and the N-th feature information; for example, after splicing at least one piece of the first N-1 pieces of feature information with the N-th feature information, feature extraction is performed to obtain the first feature information of the reconstructed image block.
• in some embodiments, in addition to the N first feature extraction units, the first feature extraction module includes a second feature extraction unit, which is used to re-extract the feature information output by at least one of the first N-1 first feature extraction units together with the N-th feature information output by the N-th first feature extraction unit, to obtain deeper feature information.
• specifically, the decoder inputs the feature information output by at least one of the first N-1 first feature extraction units among the N first feature extraction units, together with the N-th feature information M_N output by the N-th first feature extraction unit, into the second feature extraction unit.
• for example, the feature information output by at least one of the first N-1 first feature extraction units is spliced with the N-th feature information M_N and then input into the second feature extraction unit.
  • the second feature extraction unit performs deeper feature extraction to obtain the first feature information F1 of the reconstructed image block.
• in some embodiments, the above-mentioned S404-A2-1 also includes: splicing the feature information output by at least one of the first N-1 first feature extraction units, the N-th feature information, the reconstructed image block and the quantization parameter, and then inputting the result into the second feature extraction unit to obtain the first feature information of the reconstructed image block.
• that is, the reconstructed image block and the quantization parameter are input into the second feature extraction unit together with the feature information output by at least one first feature extraction unit and the N-th feature information, so that the feature extraction process is supervised by the reconstructed image block and the quantization parameter, and first feature information that better meets the requirements is output (a minimal sketch follows).
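• as a hedged illustration of this second feature extraction unit, the sketch below splices the intermediate feature information along the channel axis and re-extracts it with a single 1X1 convolution; the single-convolution structure and the class name are assumptions made for illustration, since the present application does not fix this structure.

```python
import torch
import torch.nn as nn

class SecondFeatureExtractionUnit(nn.Module):
    def __init__(self, num_inputs: int, channels: int = 64):
        super().__init__()
        # Fuse num_inputs feature maps (each with `channels` channels) into one.
        self.fuse = nn.Conv2d(num_inputs * channels, channels, 1)

    def forward(self, features):
        # features: e.g. [M_1, ..., M_N] (optionally plus C2 and the spliced
        # reconstructed block / QP plane), all with the same spatial size.
        return self.fuse(torch.cat(features, dim=1))  # deeper feature information F1
```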
• in some embodiments of the present application, the shallow feature information of the reconstructed image block (that is, the second feature information) is extracted first, and the second feature information is then used to determine the first feature information of the reconstructed image block.
  • S404-A includes: extracting the second feature information of the reconstructed image block based on the quantization parameter; performing feature weighting processing on the second feature information to obtain the first feature information of the reconstructed image block.
  • shallow feature extraction is performed on the reconstructed image blocks to obtain the second feature information of the reconstructed image blocks.
• for example, the reconstructed image block and the quantization parameter are spliced to obtain spliced information, and shallow feature extraction is performed on the spliced information to obtain the second feature information of the reconstructed image block.
• then, the first feature information of the reconstructed image block is determined; for example, deep feature extraction is performed based on the second feature information to obtain the first feature information of the reconstructed image block.
• the embodiments of this application do not limit the specific manner of performing feature extraction on the spliced information to obtain the second feature information.
  • feature extraction is performed on the spliced information through a second feature extraction module to obtain the second feature information.
  • the enhanced model of the embodiment of the present application also includes a second feature extraction module.
  • the second feature extraction module is used to extract shallow feature information of the reconstructed image block.
  • the extracted shallow feature information is input into the first feature extraction module for deep feature extraction.
• the decoder first splices the reconstructed image block and the quantization parameter to obtain the spliced information, and then inputs the spliced information into the second feature extraction module for shallow feature extraction to obtain the second feature information C2 of the reconstructed image block.
  • the second feature information C2 of the shallow layer is input into the first feature extraction module to obtain the first feature information F1 of the reconstructed image block.
  • the above-mentioned S404-A2-1 includes: splicing at least one of the first N-1 feature information, the N-th feature information and the second feature information to obtain the second spliced feature information;
  • the second splicing feature information is used for feature re-extraction to obtain the first feature information of the reconstructed image block.
• for example, at least one piece of the first N-1 pieces of feature information, the N-th feature information M_N and the second feature information C2 are spliced and input into the second feature extraction unit to obtain the first feature information F1 of the reconstructed image block.
  • the embodiments of this application do not limit the specific network structure of the second feature extraction module.
  • the above-mentioned second feature extraction module includes at least one convolutional layer.
  • the decoding end obtains the second feature information of the reconstructed image block through the two convolutional layers.
  • the following takes the second feature extraction module including two convolution layers as an example to introduce the process of determining the second feature information.
  • the decoder normalizes the reconstructed image blocks to obtain the normalized reconstructed image blocks.
  • the quantization parameter QP is normalized to obtain the normalized quantization parameter.
• the spliced and normalized reconstructed image block and quantization parameter are input into the second feature extraction module; the first convolutional layer outputs feature C_1, the feature C_1 is input into the second convolution layer, and the second convolution layer outputs the second feature information C_2.
• the above-mentioned second feature information C_2 can be determined by the following formula (4):

$C_1=\sigma\big(W_1*\mathrm{Concat}(\tilde{R},\widetilde{QP})\big),\quad C_2=\sigma(W_2*C_1) \qquad (4)$

where $\tilde{R}$ and $\widetilde{QP}$ represent the normalized reconstructed image block and the normalized quantization parameter, $\sigma$ represents the ReLU activation function, $W_1$ and $W_2$ represent the 3X3 convolution kernels, $*$ represents the convolution operation, and $\mathrm{Concat}$ represents the splicing operation. It should be noted that the size of the convolution kernel and the type of activation function can be changed according to actual needs.
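• a minimal sketch of this shallow extraction step of formula (4), assuming single-channel input blocks and the normalization constants 255 (pixel values) and 63 (QP), which the present application does not specify:

```python
import torch
import torch.nn as nn

class ShallowExtractor(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(2, channels, 3, padding=1)         # C1 = relu(W1 * (R ⊕ QP))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # C2 = relu(W2 * C1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, block: torch.Tensor, qp: float) -> torch.Tensor:
        r = block / 255.0                  # normalized reconstructed image block
        q = torch.full_like(r, qp / 63.0)  # normalized QP expanded to a plane
        x = torch.cat([r, q], dim=1)       # splicing operation
        return self.relu(self.conv2(self.relu(self.conv1(x))))  # second feature info C2
```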
• in the above method, after the second feature information of the reconstructed image block is determined, feature weighting processing is performed on the second feature information to obtain the first feature information of the reconstructed image block, and the enhanced image block of the reconstructed image block is determined based on the first feature information of the reconstructed image block.
  • the embodiment of the present application does not limit the specific method of determining the enhanced image block of the reconstructed image block based on the first characteristic information of the reconstructed image block in S404-B.
  • the first feature information of the reconstructed image block may be determined as the enhanced image block.
  • the above-mentioned S404-B includes: performing non-linear mapping on the first feature information of the reconstructed image block to obtain an enhanced image block.
  • the embodiment of the present application does not limit the specific method of performing nonlinear mapping on the first feature information of the reconstructed image block in S404-B to obtain the enhanced image block.
• for example, a nonlinear mapping method is used to process the first feature information of the reconstructed image block so that the size of the processed first feature information is consistent with the size of the reconstructed image block, and the processed first feature information is then used as the enhanced image block.
  • the enhanced model of the embodiment of the present application also includes a reconstruction module, which is used to perform further non-linear mapping on the first feature information extracted by the first feature extraction module.
  • the above-mentioned S404-B includes: non-linear mapping of the first feature information of the reconstructed image block through the reconstruction module to obtain the enhanced image block.
  • the embodiments of this application do not limit the network model of the reconstruction module.
  • the reconstruction module includes at least one convolutional layer.
  • the reconstruction module includes two convolutional layers.
• for example, the decoder inputs the first feature information F1 of the reconstructed image block output by the first feature extraction module into the reconstruction module, and the enhanced image block O1 of the reconstructed image block is obtained through the convolution operations of the two convolutional layers.
• the enhanced image block O1 can be determined by the following formula (5):

$O_1=W_5*\sigma\big(W_4*\sigma(W_3*F_1)\big) \qquad (5)$

where $F_1$ represents the first feature information of the reconstructed image block, $\sigma$ represents the ReLU activation function, $W_3$, $W_4$ and $W_5$ represent 3X3 convolution kernels, and $*$ represents the convolution operation. It should be noted that the size of the convolution kernel and the type of activation function can be changed according to actual needs.
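• a sketch of such a reconstruction module; the surrounding text mentions two convolutional layers while formula (5) names three kernels, so the depth chosen here (three 3X3 convolutions, following the formula) is an assumption:

```python
import torch.nn as nn

class Reconstruction(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),  # maps back to one channel: O1
        )

    def forward(self, f1):
        return self.body(f1)  # enhanced image block O1 of the reconstructed block
```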
  • the decoding end performs quality enhancement on the reconstructed image block based on the quantization parameters through the above steps to obtain an enhanced image block of the reconstructed image block.
• in some embodiments, before performing quality enhancement on the reconstructed image block, the decoder first needs to determine whether quality enhancement of the reconstructed image block is allowed; that is to say, when the decoding end determines that the effect of quality enhancement on the reconstructed image block is greater than that of no enhancement, the decoder performs quality enhancement on the reconstructed image block.
  • the decoding end determines whether to allow quality enhancement of the reconstructed image block, including but not limited to the following:
  • Method 1 decode the code stream to obtain a first flag, which is used to indicate whether quality enhancement of the reconstructed image block of the current image block is allowed.
• that is, the encoding end determines whether to perform quality enhancement on the reconstructed image block of the current image block, and notifies the decoding end of its judgment result through the first flag, so that the decoding end and the encoding end adopt consistent image enhancement operations.
• if the encoding end performs quality enhancement on the reconstructed image block of the current image block, the value of the first flag is set to the first value, for example, 1; if the encoding end does not perform quality enhancement on the reconstructed image block of the current image block, the value of the first flag is set to the second value, for example, 0.
• in this way, the decoder first obtains the first flag by decoding the code stream, and determines whether to allow quality enhancement of the reconstructed image block of the current image block based on the first flag. For example, if the value of the first flag is 1, the decoder determines to use the method of the embodiment of the present application to enhance the quality of the reconstructed image block, that is, to enhance the quality of the reconstructed image block based on the quantization parameter; if the value of the first flag is 0, the decoder determines not to enhance the quality of the reconstructed image block, and instead uses the existing loop filtering method to filter the reconstructed image block.
  • the above-mentioned first flag may be a sequence-level flag.
  • the above-mentioned first flag may be a frame-level flag.
  • the above-mentioned first flag may be a slice-level flag.
  • the first flag may be a block-level flag, such as a CTU-level flag or a CU-level flag.
  • Method 2 The decoder determines whether to enhance the quality of the reconstructed image block by itself.
• specifically, the decoder first performs quality enhancement on the reconstructed image block based on the quantization parameter to obtain a test enhanced image block; the decoder then determines the image quality corresponding to the test enhanced image block and to the unenhanced reconstructed image block. If the image quality of the test enhanced image block is greater than the image quality of the reconstructed image block, the enhancement method of the embodiment of the present application achieves a significant enhancement effect; at this time, the decoder determines the test enhanced image block as the enhanced image block of the reconstructed image block, directly outputs it for display, and/or saves it in the decoded image buffer as an intra-frame reference for subsequent image blocks.
• otherwise, the reconstructed image block is directly output for display after loop filtering, and/or the loop-filtered reconstructed image block is saved in the decoded image buffer as an intra-frame reference for subsequent image blocks.
  • the embodiments of this application do not limit the method of determining image quality.
• for example, the peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) or the structural similarity (Structural SIMilarity, SSIM) may be used.
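• the selection logic of Method 2 can be sketched as follows; PSNR is used here, and a known reference is assumed purely to illustrate the comparison, since the present application does not fix how the decoder measures quality:

```python
import torch

def psnr(a: torch.Tensor, b: torch.Tensor, peak: float = 255.0) -> float:
    mse = torch.mean((a - b) ** 2)
    return float(10 * torch.log10(peak ** 2 / mse))

def select_block(reconstructed, test_enhanced, reference):
    # Keep the test enhanced image block only if its quality is higher.
    if psnr(test_enhanced, reference) > psnr(reconstructed, reference):
        return test_enhanced  # output / cache the enhanced block
    return reconstructed      # fall back to the loop-filtered reconstruction
```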
  • the reconstructed image block is a reconstructed image block that has been processed by loop filtering.
  • the decoder determines the prediction block of the current image block and the residual block of the current image block, and adds the residual block and the prediction block to obtain the reconstructed image block. Then a loop filter is used to filter the reconstructed image blocks, and the filtered reconstructed image blocks are input into the enhancement model for quality enhancement.
  • embodiments of the present application may use an enhancement model to enhance the quality of the reconstructed image blocks, and then perform loop filtering processing.
• alternatively, after quality enhancement, loop filtering is no longer performed.
  • the enhanced image blocks can be displayed and stored in the decoded image cache as a reference for other image blocks.
  • the decoder can also display the enhanced image blocks and store the unenhanced reconstructed image blocks in the decoded image cache as a reference for other image blocks.
  • the decoder can also display the reconstructed image blocks and store the enhanced image blocks in the decoded image cache as a reference for other image blocks.
  • the training of the enhancement model can be completed by other devices, and the decoder directly uses the trained enhancement model to perform quality enhancement.
  • the training of the enhanced model can be completed by the decoder.
  • the decoder uses training data to train the enhanced model, and uses the trained enhanced model to perform quality enhancement.
• the training set of the enhanced model consists of the 800 images used for training from the super-resolution DIV2K dataset.
• these 800 pictures are coded with VTM8.2 (Versatile Video Coding Test Model 8.2, the VVC test platform version 8.2), with the quantization parameter QP set to 22, 27, 32 and 37, in All-Intra (AI) mode, and with the loop filtering tools LMCS, DB, SAO and ALF turned off; a total of 3200 coded images are obtained.
  • These 3200 encoded images are used as the input of the network model, and their corresponding unencoded original images are used as the real values to form the final training set.
  • random cropping is used to randomly crop image blocks of size 128X128 in each image as input to the enhanced model.
• the initial learning rate of the enhanced model is set to 1X10^-2, and the learning rate is halved every 30 iterations (epochs).
• the training was finally completed on the PyTorch 1.6.0 platform.
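• the learning-rate schedule above maps directly onto a standard step scheduler; the optimizer choice (Adam) and the stand-in model below are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the enhancement model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # initial lr 1e-2
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(90):
    # ... one training epoch over the 128X128 random crops would run here ...
    scheduler.step()  # lr: 1e-2 (epochs 0-29), 5e-3 (30-59), 2.5e-3 (60-89)
```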
• the image processing method provided by the embodiment of the present application decodes the code stream to obtain the quantization coefficient of the current image block; determines the quantization parameter corresponding to the current image block, and performs inverse quantization on the quantization coefficient based on the quantization parameter to obtain the transform coefficient of the current image block; determines the reconstructed image block of the current image block according to the transform coefficient; and, based on the quantization parameter, performs quality enhancement on the reconstructed image block to obtain an enhanced image block. Since the quantization parameters corresponding to different image blocks may be different, in order to improve the enhancement accuracy of the image blocks, this application performs quality enhancement on the reconstructed image blocks based on the quantization parameters, which can improve the enhancement effect. In addition, this application performs image quality enhancement in units of image blocks, so that when the enhanced image blocks are used as reference blocks for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
  • Figure 19 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 19 can be understood as a more specific way of the image processing method shown in Figure 4.
  • the image processing method according to the embodiment of the present application includes the following steps:
  • Method 1 decode the code stream to obtain the first flag, and determine whether quality enhancement of the reconstructed image block of the current image block is allowed based on the first flag.
• Method 2: The decoder performs quality enhancement on the reconstructed image block based on the quantization parameter to obtain a test enhanced image block; it determines the first image quality of the test enhanced image block and the second image quality of the reconstructed image block, and determines whether to enhance the quality of the reconstructed image block of the current image block based on the first image quality and the second image quality.
• if the decoder determines to enhance the quality of the reconstructed image block of the current image block, the following step S505 is performed.
• if the decoding end determines not to perform quality enhancement on the reconstructed image block of the current image block, the following step S508 is performed.
  • an enhancement model is used to perform quality enhancement on the reconstructed image block to obtain an enhanced image block.
• the enhanced model of the embodiment of the present application includes a second feature extraction module, a first feature extraction module and a reconstruction module, where the second feature extraction module is used to extract shallow features (that is, the second feature information of the reconstructed image block), the first feature extraction module is used to extract deep features (that is, the first feature information of the reconstructed image block), and the reconstruction module is used to perform nonlinear mapping on the deep features to obtain the final enhanced image block.
  • the second feature extraction module shown in Figure 18 includes two convolutional layers.
  • the structure of the second feature extraction module provided by this embodiment of the application includes but is not limited to that shown in Figure 18.
• Figure 18 shows that the first feature extraction module includes N first feature extraction units and one second feature extraction unit, where the N first feature extraction units are connected in series, and the outputs of the N first feature extraction units and the output of the second feature extraction module are input into the second feature extraction unit.
  • the network structure of the first feature extraction module provided by the embodiment of this application includes but is not limited to that shown in Figure 18.
• Figure 18 shows that the reconstruction module includes two convolutional layers, but the network structure of the reconstruction module provided by this embodiment of the application includes, but is not limited to, what is shown in Figure 18.
• the first feature extraction unit in the embodiment of the present application includes a multi-scale extraction layer and a neuron attention mechanism, wherein the multi-scale extraction layer is used for multi-scale feature extraction, and the neuron attention mechanism is used for feature weighting.
• the first feature extraction unit in the embodiment of this application can therefore also be called a multi-scale and neuron attention (Multi-scale and Neuron Attention, MSNA for short) unit.
  • the enhanced model of the embodiment of the present application is also called a neural network model based on neuron attention mechanism (Neuron Attention-based CNN, referred to as NACNN).
  • the decoder normalizes the reconstructed image blocks and quantization parameters, then splices them, and inputs the splicing results into the enhancement model.
• specifically, the first convolution layer in the second feature extraction module performs a convolution operation on the spliced reconstructed image block and quantization parameter to obtain feature information C1, and the feature information C1 is then input into the second convolution layer for a convolution operation to obtain the second feature information C2.
  • the decoding end inputs the second feature information C2 into the first feature extraction module.
  • the first first feature extraction unit in the first feature extraction module performs multi-scale feature extraction and feature weighting on C2 to obtain the first feature information M1.
  • the first feature information M1 is input into the second first feature extraction unit for multi-scale feature extraction and feature weighting to obtain the second feature information M2.
• and so on, until the N-th feature information MN output by the N-th first feature extraction unit is obtained; the N-th feature information MN is then spliced with the second feature information C2 and input into the second feature extraction unit to obtain the first feature information F1 of the reconstructed image block.
  • the decoder inputs the first feature information F1 into the reconstruction module for reconstruction.
• the first convolution layer in the reconstruction module performs a convolution operation on F1 to obtain the feature information C3, and the second convolution layer in the reconstruction module performs a convolution operation on the feature information C3 to obtain the feature information C4.
• the feature information C4 is output as the enhanced image block O1 of the reconstructed image block.
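• tying the flow together, a sketch of the full forward pass, reusing the hypothetical ShallowExtractor, MSNAUnit, SecondFeatureExtractionUnit and Reconstruction modules sketched earlier; the choice of N = 4 units is arbitrary:

```python
import torch
import torch.nn as nn

class NACNN(nn.Module):
    def __init__(self, n_units: int = 4, channels: int = 64):
        super().__init__()
        self.shallow = ShallowExtractor(channels)       # second feature extraction module
        self.units = nn.ModuleList(MSNAUnit(channels) for _ in range(n_units))
        self.deep = SecondFeatureExtractionUnit(n_units + 1, channels)  # fuses C2, M1..MN
        self.reconstruct = Reconstruction(channels)     # reconstruction module

    def forward(self, block: torch.Tensor, qp: float) -> torch.Tensor:
        c2 = self.shallow(block, qp)   # second feature information C2
        feats, m = [c2], c2
        for unit in self.units:        # M1, M2, ..., MN in series
            m = unit(m)
            feats.append(m)
        f1 = self.deep(feats)          # first feature information F1
        return self.reconstruct(f1)    # enhanced image block O1
```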
  • step S508 is performed to perform loop filtering on the reconstructed image block.
• in the embodiment of the present application, before the decoder performs quality enhancement on the reconstructed image block, it first determines whether to perform quality enhancement on the reconstructed image block, thereby improving the reliability of image processing.
  • the image processing method provided by the embodiment of the present application is introduced above by taking the decoding end as an example. On this basis, the image processing method provided by the embodiment of the present application is introduced below by taking the encoding end as an example.
  • FIG. 20 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the embodiment of the present application is applied to the encoder shown in FIGS. 1 and 2 .
  • the method in the embodiment of this application includes:
  • the encoder receives a video stream, which is composed of a series of image frames, and performs video encoding on each frame of image in the video stream.
  • the video encoder divides the image frames into blocks to obtain the current encoding block.
  • the embodiment of the present application does not limit the specific size of the current image block.
  • the current image block in the embodiment of this application is a CTU.
  • one frame of image is divided into several CTUs, and this application does not limit the size of the CTU.
• the size of a CTU is 128X128, 64X64, 32X32, etc.
  • the current image block in the embodiment of the present application is a CU, for example, one CTU is divided into one or more CUs.
  • the current image block in the embodiment of the present application is a TU or PU, for example, a CU is divided into one or more TUs or PUs.
  • the current image block in the embodiment of the present application only includes chrominance components, which can be understood as chrominance blocks.
• the current image block in the embodiment of the present application only includes the luma component, which can be understood as a luma block.
  • the current image block includes both luma and chrominance components.
• the encoding process in this embodiment of the present application is as follows: divide the current image frame into blocks to obtain the current image block, use the intra-frame or inter-frame prediction method to predict the current image block and obtain the predicted value of the current image block, and subtract the predicted value from the original value of the current image block to obtain the residual value of the current image block. Determine the transform method corresponding to the current image block, use this transform method to transform the residual value, and obtain the transform coefficient. Use the determined quantization parameter to quantize the transform coefficient to obtain the quantized coefficient, and encode the quantized coefficient to obtain a code stream.
• the encoding end also includes a decoding process. Specifically, as shown in Figure 2, the inverse quantization/transform unit 240 inversely quantizes the quantization coefficient based on the determined quantization parameter to obtain the transform coefficient, and then inversely transforms the transform coefficient to obtain the residual block.
  • the reconstruction unit 250 adds the prediction block and the residual block to obtain a reconstructed image block of the current image block.
  • the encoding end performs block division on the current image frame to obtain multiple CUs. For each CU, according to the above method, a reconstructed block of each CU can be obtained. In this way, the reconstructed blocks of the CUs included in the current image block are combined to obtain the reconstructed image block of the current image block.
  • the quantization parameters of the current image block include quantization parameters corresponding to the multiple CUs.
  • the quantization parameters corresponding to these multiple CUs may be the same or different.
  • the quantization parameter of the current image block in the embodiment of the present application can be expressed in the form of a matrix.
  • the size of the current image block is 16X16
  • the quantization parameter of the current image block can be a 16X16 matrix
  • Each element in this matrix is the quantized parameter of the pixel at the corresponding position in the current image block.
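• for illustration, such a QP matrix can be built by filling each CU's region with its own value (the sizes and QP values below are assumed):

```python
import torch

qp_plane = torch.empty(16, 16)
qp_plane[:, :8] = 32  # left CU of the block coded with QP 32
qp_plane[:, 8:] = 37  # right CU of the block coded with QP 37
```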
  • the embodiments of this application do not limit the specific process of determining the quantization parameters corresponding to the current image block.
  • the codec uses the default quantization parameter as the quantization parameter corresponding to the current image block.
  • the encoding end determines the quantization parameter corresponding to the current image block through calculation.
  • the encoding end can write the determined quantization parameters into the code stream, so that the decoding end can determine the quantization parameters corresponding to the current image block by decoding the code stream.
  • the quantization parameters QP corresponding to different image blocks may be different.
  • the quantization parameter QP includes a quantization step size.
• the quantization step size is used to quantize the transform coefficients of the image block: the larger the quantization step size, the greater the image loss, and the smaller the quantization step size, the smaller the image loss (see the worked example below). Therefore, in order to improve the enhancement effect on the current image block, the embodiment of the present application considers the influence of the quantization parameter QP corresponding to the current image block during the quality enhancement process of the current image block.
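• a small worked example of this effect, using plain scalar quantization with illustrative values:

```python
coeff = 13.7  # one transform coefficient
for step in (1.0, 4.0, 16.0):
    level = round(coeff / step)  # quantized coefficient written to the stream
    recon = level * step         # inverse quantization at the decoding end
    print(step, level, recon, abs(coeff - recon))  # the error grows with the step
```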
• the embodiments of the present application perform quality enhancement on the reconstructed image block of the current image block based on the quantization parameter corresponding to the current image block, so that the enhancement effect of the reconstructed image block can be improved.
• in addition, image quality is enhanced in units of image blocks; when the enhanced image blocks are used as reference blocks for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
  • the embodiment of the present application performs image quality enhancement on an image block basis. Compared with image quality enhancement on the entire frame image, more attention can be paid to the enhancement of finer features in the image block, thereby further improving the enhancement quality of the image block.
• the encoding end performs quality enhancement on the reconstructed image block through an enhancement model based on the quantization parameter to obtain an enhanced image block. Specifically, after the encoding end determines the reconstructed image block of the current image block according to the above steps, in order to reduce the distortion of the reconstructed image block and improve its quality, as shown in Figure 5, the reconstructed image block and the corresponding quantization parameter are input into the pre-trained enhancement model to perform image enhancement, and the enhanced image block of the current image block is finally obtained. It should be noted that this enhancement model is trained based on different image blocks and their corresponding quantization parameters.
  • the enhancement model in the embodiment of the present application can be any neural network model that can enhance the quality of image blocks.
  • the embodiment of the present application does not limit the specific network model of the enhancement model.
  • the encoding end fuses the reconstructed image blocks and quantization parameters and then inputs the enhancement model.
  • the fusion method of reconstructed image blocks and quantization parameters includes at least the following examples:
  • Example 1 assuming that the size of the above reconstructed image block is N1*N2, where N1 and N2 can be the same or different.
• the product of the reconstructed image block and the quantization parameter is input into the enhancement model; specifically, each pixel of the reconstructed image block is multiplied by the quantization parameter to obtain an N1*N2 matrix, and the matrix is input into the enhancement model.
• Example 2: the reconstructed image block and the quantization parameter are spliced and then input into the enhancement model; specifically, the quantization parameter is expanded into a matrix of size N1*N2, the reconstructed image block of size N1*N2 is spliced with the quantization parameter matrix of size N1*N2, and the result is input into the enhancement model.
  • the encoding end can also use other fusion methods to fuse the reconstructed image and the corresponding quantization parameters, and then input the enhancement model for quality enhancement.
• in some embodiments, in order to prevent features with smaller absolute values from being overwhelmed by features with larger absolute values, the encoding end first normalizes the reconstructed image block and the quantization parameter to a unified range before inputting them into the enhancement model, so that all features are treated equally; then, the enhanced image block of the reconstructed image block is obtained based on the normalized reconstructed image block and quantization parameter. For example, the normalized reconstructed image block and quantization parameter are spliced and then input into the enhancement model for quality enhancement, which improves the quality enhancement effect.
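• both fusion examples can be sketched in a few lines; the tensor shapes and the normalization constant 63 are assumptions:

```python
import torch

block = torch.rand(1, 1, 64, 64)  # normalized reconstructed image block (N1 = N2 = 64)
qp = 32 / 63.0                    # normalized quantization parameter

# Example 1: multiply each pixel of the block by the quantization parameter.
fused_mul = block * qp                           # still an N1*N2 matrix

# Example 2: expand the QP into an N1*N2 matrix and splice it with the block.
qp_plane = torch.full_like(block, qp)
fused_cat = torch.cat([block, qp_plane], dim=1)  # two channels into the model
```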
  • the reconstructed image block in this embodiment of the present application is the reconstructed image block of the current image block under the first component.
  • the first component may be a luminance component or a chrominance component.
• the above S604 includes the following steps S604-A and S604-B:
• S604-A: based on the quantization parameter, perform feature weighting processing on the reconstructed image block to obtain the first feature information of the reconstructed image block.
• S604-B: determine the enhanced image block according to the first feature information.
  • the characteristic information of the reconstructed image block is extracted based on the quantized parameters.
  • the quantized parameters and the reconstructed image block are input into a neural network layer, and the characteristic information of the reconstructed image block is extracted.
• the feature information is then analyzed and different weights are assigned to different features, for example, larger weights are assigned to important features in the feature information to highlight their impact, and smaller weights are assigned to relatively unimportant features to weaken their impact; the feature information of the reconstructed image block is then weighted according to the weight corresponding to each feature to obtain the first feature information of the reconstructed image block.
• the enhanced model of the embodiment of the present application includes a first feature extraction module, which is used to extract at least one feature from the input information (that is, the reconstructed image block and the quantization parameter), and to assign different weights to the at least one extracted feature so as to perform feature weighting processing.
  • the enhanced model in the embodiment of the present application includes a first feature extraction module, which is used to weight the extracted features.
  • the reconstructed image blocks and quantified parameters are input into the first feature extraction module.
• the first feature extraction module performs feature extraction to extract at least one feature, allocates a larger weight to the important features among the at least one feature to highlight their impact and smaller weights to the relatively unimportant features to weaken their impact, and then performs weighting processing according to the weight corresponding to the at least one feature to obtain the first feature information of the reconstructed image block.
  • an enhanced image block of the reconstructed image block is determined based on the first characteristic information of the reconstructed image block.
  • Embodiments of the present application can allocate different weights to different features to highlight the influence of important features and weaken the influence of unimportant features, further improving the quality enhancement effect of reconstructed image blocks.
  • the embodiments of the present application do not limit the network model of the first feature extraction module.
  • the first feature extraction module includes multiple convolutional layers and attention mechanisms.
• in some embodiments, the above S604-A includes the following steps:
• S604-A1: based on the quantization parameter, perform N feature weighting iterations on the reconstructed image block to obtain the N-th feature information of the reconstructed image block.
• S604-A2: determine the first feature information of the reconstructed image block based on the N-th feature information.
• specifically, the encoding end performs N feature weighting iterations on the reconstructed image block based on the quantization parameter to obtain the N-th feature information of the reconstructed image block: based on the quantization parameter, feature weighting processing is performed on the reconstructed image block to obtain the first piece of feature information of the reconstructed image block; then, based on the quantization parameter, feature weighting processing is performed on the first piece of feature information to obtain the second piece of feature information of the reconstructed image block; this proceeds iteratively until feature weighting processing is performed on the N-1-th feature information based on the quantization parameter to obtain the N-th feature information of the reconstructed image block. It should be noted that the embodiment of the present application does not limit the specific method of feature weighting processing; for example, the quantization parameter and the i-1-th feature information are input into a neural network with a feature weighting function to obtain the i-th feature information of the reconstructed image block.
• the first feature extraction module of the embodiment of the present application includes N first feature extraction units; if N is greater than 1, these N first feature extraction units are connected in series, that is, the output of the previous first feature extraction unit is the input of the next first feature extraction unit. Based on the quantization parameter, the N first feature extraction units perform feature weighting processing on the reconstructed image block to obtain the N-th feature information of the reconstructed image block. As shown in Figure 7, among the N first feature extraction units, each first feature extraction unit is used to extract at least one feature, assign a larger weight to the important features in the at least one feature to highlight their impact, and assign smaller weights to relatively unimportant features to weaken their impact.
  • the latter first feature extraction unit performs feature extraction and weight distribution on the feature information output by the previous first feature extraction unit to further highlight the important features.
• in this way, the important features of the reconstructed image block are mainly enhanced, and the non-important features in the reconstructed image block are weakly enhanced, thereby improving the enhancement effect of the reconstructed image block.
• if N is equal to 1, the above S604-A1 includes: after the encoding end fuses the reconstructed image block and the quantization parameter, the result is input into the first feature extraction unit; the first feature extraction unit performs feature extraction, extracts at least one feature, and assigns different weights according to the importance of the features; then, weighting is performed on the at least one feature according to the different weights to obtain one piece of feature information; finally, the first feature information of the reconstructed image block is determined based on this piece of feature information, for example, this piece of feature information is determined as the first feature information of the reconstructed image block.
• if N is greater than 1, the above S604-A1 includes: after the encoding end fuses the reconstructed image block and the quantization parameter, the result is input into the first of the N first feature extraction units; the first first feature extraction unit performs feature weighting processing, that is, it extracts at least one feature, determines a weight for each of the at least one feature, and then weights the at least one feature according to the weights to obtain the feature information output by the first first feature extraction unit.
  • this feature information is recorded as the first feature information M 1 .
• the first feature information M_1 is input into the second first feature extraction unit for feature weighting processing to obtain the second feature information M_2, and so on; for the i-th first feature extraction unit among the N first feature extraction units, the i-1-th feature information M_{i-1} output by the i-1-th first feature extraction unit is input into the i-th first feature extraction unit for feature weighting processing to obtain the i-th feature information M_i, until the N-th feature information M_N output by the N-th first feature extraction unit is obtained.
  • the first characteristic information of the reconstructed image block is determined according to the Nth piece of characteristic information M N , for example, the Nth piece of characteristic information is determined as the first characteristic information of the reconstructed image block.
  • the first feature extraction unit includes at least one convolution layer and an attention mechanism.
  • feature weighting processing is performed on the i-1th feature information of the reconstructed image block, and obtaining the i-th feature information of the reconstructed image block includes the following steps:
• specifically, the encoding end performs multi-scale feature extraction on the i-1-th feature information to obtain M pieces of feature information of the i-1-th feature information at different scales, and then weights the M pieces of feature information of different scales to obtain the i-th weighted feature information; for example, according to the importance of the features, a larger weight is assigned to important features and a smaller weight to unimportant features, and the M pieces of feature information of different scales are then weighted based on the weight of each feature to obtain the i-th weighted feature information of the reconstructed image block.
  • the i-th feature information is determined based on the i-th weighted feature information. For example, the i-th weighted feature information is determined as the i-th feature information.
• the first feature extraction unit includes a multi-scale extraction layer, where the multi-scale extraction layer is used to extract features at multiple scales.
• the input of the i-th first feature extraction unit is the output of the i-1-th first feature extraction unit.
• specifically, the i-1-th feature information M_{i-1} output by the i-1-th first feature extraction unit is input into the multi-scale extraction layer in the i-th first feature extraction unit.
• the multi-scale extraction layer is used to extract multi-scale features, for example, extracting feature information at M different scales.
• different weights are assigned to different features in the M pieces of feature information (D_1, D_2, …, D_M) extracted by the multi-scale extraction layer at different scales, and a weighting operation is performed to obtain the i-th weighted feature information G_i.
• the i-th feature information M_i is determined based on the i-th weighted feature information G_i; for example, G_i is used as the i-th feature information M_i output by the i-th first feature extraction unit.
  • the first feature extraction unit in the embodiment of the present application performs multi-scale feature extraction to better explore the relationship between the input reconstructed image blocks and the real image blocks, so as to further improve the enhancement effect of the reconstructed image blocks.
• the above-mentioned multi-scale extraction layer is composed of a convolution layer and a down-sampling layer.
  • the convolution layer is used to output feature information
  • the down-sampling layer is used to down-sample the feature information output by the convolution layer, obtaining M feature information at different scales.
  • the above-mentioned multi-scale extraction layer includes M first feature extraction layers of different scales, and each first feature extraction layer can extract feature information at a corresponding scale.
• the above-mentioned S604-A11 includes: extracting M pieces of feature information D_1, D_2, …, D_M of the i-1-th feature information at different scales through M first feature extraction layers of different scales.
  • the embodiments of this application do not limit the specific network structure of the above-mentioned first feature extraction layer.
• the above-mentioned first feature extraction layer includes a convolution layer, and the convolution kernels of the convolution layers included in different first feature extraction layers are different.
  • the first feature extraction unit includes two first feature extraction layers.
• the size of the convolution kernel of one first feature extraction layer is 3X3, and the size of the convolution kernel of the other first feature extraction layer is 5X5.
• the 3X3 and 5X5 convolution kernels are used to perform feature extraction on the input i-1-th feature information M_{i-1}, obtaining feature information D_1 and feature information D_2 at two different scales.
  • At least one first feature extraction layer includes an activation function.
  • the encoding end can fuse M feature information of different scales and perform weighting processing to obtain the i-th weighted feature information.
  • the embodiments of the present application do not limit the specific method of fusing the feature information of M different scales, for example, adding or multiplying the feature information of M different scales.
  • the above S604-A12 includes:
  • S604-A12-1 Splice M pieces of feature information of different scales to obtain the first spliced feature information; perform weighting processing on the first spliced feature information to obtain the i-th weighted feature information.
• the first spliced feature information X is obtained, and X is weighted to obtain the i-th weighted feature information G_i; for example, a larger weight is assigned to important features in X and a smaller weight to unimportant features, and the features in X are then weighted according to the weight of each feature to obtain the i-th weighted feature information G_i.
  • the embodiment of the present application does not limit the specific implementation method of weighting M feature information of different scales in the above-mentioned S604-A12 to obtain the i-th weighted feature information.
• in some embodiments, weighting the spliced feature information in the above S604-A12-1 to obtain the i-th weighted feature information includes:
  • the first feature extraction unit also includes a weighting processing layer, which is used to perform weighting processing on features at multiple scales.
• the encoding end inputs the i-1-th feature information M_{i-1} output by the i-1-th first feature extraction unit into the multi-scale extraction layer, that is, into the M first feature extraction layers of different scales; these M first feature extraction layers of different scales output M pieces of feature information D_1, D_2, …, D_M. After splicing these M pieces of feature information, feature weighting processing is performed on the spliced feature information to obtain weighted feature information with the first number of channels; the i-th weighted feature information is then obtained from the weighted feature information with the first number of channels.
  • the first feature extraction unit in the embodiment of the present application also includes a second feature extraction layer, which is used to change the number of channels.
• the encoding end can input the weighted feature information with the first number of channels into the second feature extraction layer to change the number of channels, and output the i-th weighted feature information G_i.
  • the number of channels of feature information output by each of the above-mentioned N first feature extraction units may be the same.
• for example, the number of channels of the i-th weighted feature information may be the same as the number of channels of the i-1-th feature information.
  • the embodiments of the present application do not limit the specific network structure of the weighted processing layer.
  • the weighted processing layer includes a neuron attention mechanism.
  • the embodiment of the present application does not limit the network structure of the second feature extraction layer, for example, it includes a 1X1 convolution layer.
  • S604-A13 is executed to determine the i-th feature information based on the i-th weighted feature information.
  • the i-th weighted feature information is determined as the i-th feature information.
  • the sum of the i-th weighted feature information and the i-1th feature information is determined as the i-th feature information.
  • the following uses an example to introduce the network structure of the i-th first feature extraction unit in the embodiment of the present application.
  • the i-th first feature extraction unit in the embodiment of the present application includes a multi-scale extraction layer, a weighting processing layer and a second feature extraction layer, where the multi-scale extraction layer includes 2 first feature extraction layers.
  • the network structures of the two first feature extraction layers are basically the same, each including a convolution layer and an activation function.
  • the convolution kernels included in the two first feature extraction layers are of different sizes: one convolution kernel size is 3X3, and the other convolution kernel size is 5X5.
  • the activation function included in both first feature extraction layers is ReLU. It should be noted that the activation function can also take other forms.
  • the second feature extraction layer includes a convolution layer with a convolution kernel size of 1X1 to reduce the number of feature channels.
  • the i-1th feature information Mi-1 output by the i-1th first feature extraction unit is input into the 2 first feature extraction layers respectively, and the 2 first feature extraction layers perform multi-scale feature extraction and output feature information C1 and feature information C2. Next, feature information C1 and feature information C2 are spliced to obtain the first spliced feature information X.
  • the above feature information C1, C2 and X are determined by the following formula (1), which, consistent with the description above, can be written as: C1 = ReLU(Conv3X3(Mi-1)), C2 = ReLU(Conv5X5(Mi-1)), X = Concat(C1, C2).
  • X is then input into the weighting processing layer for feature weighting processing: larger weights are assigned to important features to highlight their impact, and smaller weights are assigned to relatively unimportant features to weaken their impact.
  • the weighting processing layer outputs weighted feature information with the first number of channels; this feature information is input into the second feature extraction layer to reduce the number of feature channels, specifically, the i-th weighted feature information D3 is obtained through a 1X1 convolution operation, and D3 is added to the input Mi-1 to obtain the i-th feature information Mi output by the i-th first feature extraction unit.
  • the above feature information D3 and Mi are determined by the following formula (2), which, consistent with the description above, can be written as: D3 = Conv1X1(W), Mi = D3 + Mi-1, where W denotes the weighted feature information output by the weighting processing layer.
  • the weighted processing layer described above includes a neuron attention mechanism.
  • the network structure of the neuron attention mechanism is shown in Figure 12, including depthwise convolution (Depthwise Conv), pointwise convolution (Pointwise Conv) and activation functions.
  • through the depthwise convolution operation, the spliced feature X is convolved channel by channel; the result then passes through the ReLU activation function, after which the information of the different feature maps is fused through the pointwise convolution operation. The weight feature map V is then obtained through the Sigmoid activation function, and the corresponding elements of V and X are multiplied to obtain the weighted feature information with the first number of channels.
  • the above weighted feature information with the first number of channels can be determined by the above formula (3), which, consistent with the description above, can be written as: W = X ⊗ Sigmoid(PWConv(ReLU(DWConv(X)))), where ⊗ denotes element-wise multiplication.
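For illustration, a minimal PyTorch sketch of such a neuron attention mechanism follows; the depthwise kernel size, channel count, and module name are assumptions, while the depthwise conv -> ReLU -> pointwise conv -> Sigmoid -> multiply structure follows the description above.

```python
import torch
import torch.nn as nn

class NeuronAttention(nn.Module):
    """Weighting processing layer: depthwise conv -> ReLU -> pointwise conv -> Sigmoid -> scale."""
    def __init__(self, channels: int = 128):
        super().__init__()
        # Depthwise conv: each channel of the spliced feature X is convolved separately.
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.relu = nn.ReLU()
        # Pointwise (1x1) conv fuses information across the different feature maps.
        self.pw = nn.Conv2d(channels, channels, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weight feature map V, then element-wise multiplication with X (formula (3)).
        v = self.sigmoid(self.pw(self.relu(self.dw(x))))
        return x * v
```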
  • the above takes the i-th first feature extraction unit among the N first feature extraction units as an example to describe the extraction of the i-th feature information; the other first feature extraction units among the N first feature extraction units extract feature information in the same way as the above-mentioned i-th unit, so by repeating this process the final Nth feature information extracted by the Nth first feature extraction unit can be obtained.
  • step S603-A2 is performed to determine the first feature information of the reconstructed image block based on the Nth feature information output by the Nth first feature extraction unit.
  • the implementation process of the above S604-A2 includes but is not limited to the following methods:
  • Method 1: determine the Nth feature information as the first feature information of the reconstructed image block.
  • the above-mentioned S604-A2 includes S604-A2-1: obtain the first feature information of the reconstructed image block based on the Nth feature information and at least one of the first N-1 pieces of feature information.
  • the encoding end performs N iterations of feature weighting processing on the reconstructed image block based on the quantization parameter: for example, based on the quantization parameter, feature weighting processing is performed on the i-1th feature information of the reconstructed image block to obtain the i-th feature information of the reconstructed image block, where i is a positive integer from 1 to N; this is repeated to obtain the Nth feature information of the reconstructed image block, so that at least one of the first N-1 pieces of feature information, together with the Nth feature information, can be used to obtain the first feature information of the reconstructed image block.
  • at least one of the first N-1 pieces of feature information is spliced with the Nth feature information, and feature extraction is then performed to obtain the first feature information of the reconstructed image block.
  • the first feature extraction module includes, in addition to the N first feature extraction units, a second feature extraction unit, which is used to re-extract the feature information output by at least one of the first N-1 first feature extraction units together with the Nth feature information output by the Nth first feature extraction unit, so as to obtain deeper feature information.
  • the feature information output by at least one of the first N-1 first feature extraction units among the N first feature extraction units, together with the Nth feature information MN output by the Nth first feature extraction unit, is input into the second feature extraction unit.
  • the feature information output by at least one of the first N-1 first feature extraction units is spliced with the Nth feature information MN and then input into the second feature extraction unit.
  • the second feature extraction unit performs deeper feature extraction to obtain the first feature information F1 of the reconstructed image block.
  • the above S604-A2-1 also includes: splicing the feature information output by at least one of the first N-1 first feature extraction units, the Nth feature information, the reconstructed image block, and the quantization parameter, and then inputting the result into the second feature extraction unit to obtain the first feature information of the reconstructed image block.
  • the reconstructed image block and the quantization parameter are input into the second feature extraction unit together with the feature information output by at least one first feature extraction unit and the Nth feature information, and feature extraction is performed in the unit, so that the feature extraction process is supervised by the reconstructed image block and the quantization parameter and first feature information that better meets the requirements is output.
  • the embodiments of the present application first extract the shallow feature information of the reconstructed image block (that is, the second feature information), and then use the second feature information to determine the first feature information of the reconstructed image block.
  • S604-A includes: extracting the second feature information of the reconstructed image block based on the quantization parameter; performing feature weighting processing on the second feature information to obtain the first feature information of the reconstructed image block.
  • shallow feature extraction is performed on the reconstructed image blocks to obtain the second feature information of the reconstructed image blocks.
  • the reconstructed image block and the quantization parameter are spliced to obtain splicing information, and shallow feature extraction is performed on the splicing information.
  • the first feature information of the reconstructed image block is determined from the second feature information; for example, deep feature extraction is performed based on the second feature information to obtain the first feature information of the reconstructed image block.
  • the embodiments of this application do not limit the specific manner in which feature extraction is performed on the splicing information to obtain the second feature information.
  • feature extraction is performed on the spliced information through a second feature extraction module to obtain the second feature information.
  • the enhanced model of the embodiment of the present application also includes a second feature extraction module.
  • the second feature extraction module is used to extract shallow feature information of the reconstructed image block.
  • the extracted shallow feature information is input into the first feature extraction module for deep feature extraction.
  • the encoding end first splices the reconstructed image block and the quantization parameter to obtain the splicing information, and then inputs the splicing information into the second feature extraction module for shallow feature extraction to obtain the second feature information of the reconstructed image block.
  • the second feature information C2 of the shallow layer is input into the first feature extraction module to obtain the first feature information F1 of the reconstructed image block.
  • the above-mentioned S604-A2-1 includes: splicing at least one of the first N-1 feature information, the N-th feature information and the second feature information to obtain the second spliced feature information;
  • feature re-extraction is performed on the second spliced feature information to obtain the first feature information of the reconstructed image block.
  • for example, at least one of the first N-1 pieces of feature information, the Nth feature information MN and the second feature information C2 are spliced and input into the second feature extraction unit to obtain the first feature information F1 of the reconstructed image block.
  • the embodiments of this application do not limit the specific network structure of the second feature extraction module.
  • the above-mentioned second feature extraction module includes at least one convolutional layer.
  • the encoding end passes the splicing information through the two convolutional layers to obtain the second feature information of the reconstructed image block.
  • the above second feature information C2 can be determined by the following formula (4), which, consistent with the description above, can be written as: C1 = Conv(Concat(Y, QP)), C2 = Conv(C1), where Y denotes the reconstructed image block and QP the quantization parameter spliced with it.
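The following is a minimal sketch of this shallow feature extraction step, assuming the quantization parameter is broadcast to a plane with the same spatial size as the block and spliced along the channel dimension; the kernel sizes and channel counts are illustrative.

```python
import torch
import torch.nn as nn

class ShallowFeatureExtraction(nn.Module):
    """Second feature extraction module: splice (block, QP plane) -> two conv layers."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(2, out_channels, 3, padding=1)             # yields C1
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)  # yields C2

    def forward(self, block: torch.Tensor, qp: float) -> torch.Tensor:
        # Broadcast the (normalized) quantization parameter to a plane and splice it
        # with the reconstructed image block along the channel dimension.
        qp_plane = torch.full_like(block, qp)
        x = torch.cat([block, qp_plane], dim=1)
        return self.conv2(self.conv1(x))  # second feature information C2
```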
  • the second feature information is input into the first feature extraction module for deep feature extraction to obtain the first feature information of the reconstructed image block, and the enhanced image block of the reconstructed image block is determined based on the first feature information.
  • the embodiment of the present application does not limit the specific method of determining the enhanced image block of the reconstructed image block based on the first characteristic information of the reconstructed image block in S603-B.
  • in the above method, after the second feature information of the reconstructed image block is determined, feature weighting processing is performed on the second feature information to obtain the first feature information of the reconstructed image block, and the enhanced image block of the reconstructed image block is determined based on the first feature information of the reconstructed image block.
  • the embodiment of the present application does not limit the specific method of determining the enhanced image block of the reconstructed image block based on the first characteristic information of the reconstructed image block in S604-B.
  • the first feature information of the reconstructed image block may be determined as the enhanced image block.
  • the above-mentioned S604-B includes: performing non-linear mapping on the first feature information of the reconstructed image block to obtain an enhanced image block.
  • the embodiment of the present application does not limit the specific method of performing nonlinear mapping on the first feature information of the reconstructed image block in S604-B to obtain the enhanced image block.
  • a nonlinear mapping method is used to process the first feature information of the reconstructed image block so that the size of the processed first feature information is consistent with the size of the reconstructed image block, and the processed first feature information is then used as the enhanced image block.
  • the enhanced model of the embodiment of the present application also includes a reconstruction module, which is used to perform further non-linear mapping on the first feature information extracted by the first feature extraction module.
  • the above-mentioned S604-B includes: non-linear mapping of the first feature information of the reconstructed image block through the reconstruction module to obtain the enhanced image block.
  • the embodiments of this application do not limit the network model of the reconstruction module.
  • the reconstruction module includes at least one convolutional layer.
  • the reconstruction module includes two convolutional layers.
  • the encoding end inputs the first feature information F1 of the reconstructed image block output by the first feature extraction module into the reconstruction module, and the convolution operations of the two convolutional layers are performed to obtain the enhanced image block O1 of the reconstructed image block.
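A minimal sketch of such a reconstruction module follows; the kernel sizes, channel counts, and the absence of an activation between the two layers are assumptions, with only the two-convolution structure taken from the description above.

```python
import torch
import torch.nn as nn

class ReconstructionModule(nn.Module):
    """Maps the first feature information F1 back to an image block O1."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, 3, padding=1)  # yields C3
        self.conv2 = nn.Conv2d(in_channels, 1, 3, padding=1)            # yields C4 = O1

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        return self.conv2(self.conv1(f1))  # enhanced image block O1
```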
  • the encoding end performs quality enhancement on the reconstructed image block based on the quantization parameters through the above steps to obtain an enhanced image block of the reconstructed image block.
  • the encoding end before performing quality enhancement on the reconstructed image block, the encoding end first needs to determine whether to perform quality enhancement on the reconstructed image block. That is to say, when the encoding end determines that the effect of quality enhancement on the reconstructed image block is greater than the effect of no enhancement, the quality enhancement of the reconstructed image block is used.
  • the encoding end determines whether to enhance the quality of the reconstructed image block, including but not limited to the following:
  • Method 1: the configuration file includes a first flag, which is used to indicate whether to perform quality enhancement on the reconstructed image block of the current image block.
  • the encoding end can determine, based on the first flag, whether to perform quality enhancement on the reconstructed image block of the current image block. For example, if the value of the first flag is a first value, such as 1, the encoding end determines that quality enhancement is to be performed on the reconstructed image block of the current image block, and the method in the above embodiment is executed. If the value of the first flag is a second value, for example 0, the encoding end determines not to perform quality enhancement on the reconstructed image block of the current image block, and instead filters the reconstructed image block using the existing loop filtering method.
  • Method 2: the encoding end determines by itself whether to enhance the quality of the reconstructed image block.
  • the encoding end first performs quality enhancement on the reconstructed image block based on the quantization parameter to obtain a test enhanced image block, and then determines the image quality of the test enhanced image block and of the unenhanced reconstructed image block. If the image quality of the test enhanced image block is greater than the image quality of the reconstructed image block, the enhancement method of the embodiment of the present application achieves a noticeable enhancement effect.
  • the above-determined test enhanced image block is determined to be the enhanced image block of the reconstructed image block and is directly output for display, and/or the above-determined test enhanced image block is saved in the decoded image buffer as an intra-frame reference for subsequent image blocks.
  • otherwise, the reconstructed image block is loop filtered and directly output for display, and/or the loop-filtered reconstructed image block is saved in the decoded image buffer as an intra-frame reference for subsequent image blocks.
  • the encoding end writes a first flag in the code stream.
  • the first flag is used to indicate whether to perform quality enhancement on the reconstructed image block of the current image block, so that the decoding end determines, based on the first flag, whether to perform quality enhancement on the reconstructed image block of the current image block, ensuring consistency between the encoding and decoding ends.
  • if the encoding end determines to perform quality enhancement on the reconstructed image block of the current image block, the value of the first flag is set to a first value, for example 1; if the encoding end determines not to perform quality enhancement on the reconstructed image block of the current image block, the value of the first flag is set to the second value, for example 0. In this way, the decoding end first obtains the first flag by decoding the code stream, and determines based on the first flag whether to enhance the quality of the reconstructed image block of the current image block.
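As an illustration of Method 2, the following sketch compares the test enhanced block against the plain reconstruction and sets the first flag accordingly. The function names and the use of PSNR as the quality metric are assumptions; the text above only requires comparing the two image qualities.

```python
import math
import torch

def psnr(x: torch.Tensor, ref: torch.Tensor, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio against the original (uncompressed) block."""
    mse = torch.mean((x - ref) ** 2).item()
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def decide_enhancement(model, recon, qp, original):
    """Encoder-side decision: enhance only if the test block beats the reconstruction."""
    test_enhanced = model(recon, qp)               # test enhanced image block
    first_quality = psnr(test_enhanced, original)  # first image quality
    second_quality = psnr(recon, original)         # second image quality
    first_flag = 1 if first_quality > second_quality else 0
    return first_flag, (test_enhanced if first_flag else recon)
```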
  • the above-mentioned first flag may be a sequence-level flag.
  • the above-mentioned first flag may be a frame-level flag.
  • the above-mentioned first flag may be a slice-level flag.
  • the first flag may be a block-level flag, such as a CTU-level flag.
  • the reconstructed image block is a reconstructed image block that has been processed by loop filtering.
  • the encoding end determines the prediction block of the current image block and the residual block of the current image block, and adds the residual block and the prediction block to obtain the reconstructed image block.
  • the loop filter is then used to filter the reconstructed image blocks, and the filtered reconstructed image blocks are input into the enhancement model for quality enhancement.
  • embodiments of the present application may use an enhancement model to enhance the quality of the reconstructed image blocks, and then perform loop filtering processing.
  • alternatively, after the quality enhancement, loop filtering is no longer performed.
  • the enhanced image blocks can be displayed and stored in the decoded image cache as a reference for other image blocks.
  • the encoding end can also display the enhanced image blocks and store the unenhanced reconstructed image blocks in the decoded image cache as a reference for other image blocks.
  • the encoding end can also display the reconstructed image blocks and store the enhanced image blocks in the decoded image cache as a reference for other image blocks.
  • the solutions of the embodiments of the present application were tested, for example by implementing them in the VVC test software VTM8.2; the test sequences used are those given in the common test conditions, namely the sequences of Class A, Class B, Class C and Class E defined for the VVC test software VTM8.2.
  • the results in Table 1 are obtained by encoding in All-Intra (AI) mode with QP settings of 32, 37, 42, and 47.
  • BD-rate is a metric for measuring the performance of an encoding algorithm: it indicates the change in bit rate, at equal PSNR, of the new encoding algorithm compared to the original algorithm. A negative overall value indicates that performance has improved, and the larger the absolute value, the greater the performance improvement.
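For reference, a common way to compute BD-rate from four (bit rate, PSNR) points per curve is sketched below. This is the standard Bjøntegaard calculation in general use, not code taken from the patent.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bitrate change (%) of the test codec vs. the anchor at equal PSNR."""
    log_ra, log_rt = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of PSNR for each rate-distortion curve.
    poly_a = np.polyfit(psnr_anchor, log_ra, 3)
    poly_t = np.polyfit(psnr_test, log_rt, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative means bitrate savings
```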
  • the encoding end determines the quantization parameter of the current image block and encodes the current image block based on the quantization parameter to obtain the quantization coefficient of the current image block; based on the quantization parameter of the current image block, the quantization coefficient is inversely quantized to obtain the residual block of the current image block; a reconstructed image block is obtained according to the residual block; and, based on the quantization parameter, the quality of the reconstructed image block is enhanced to obtain an enhanced image block. Since the quantization parameters corresponding to different image blocks may differ, and in order to improve the enhancement accuracy of the image blocks, this application performs quality enhancement on the reconstructed image blocks based on the quantization parameter, which can improve the enhancement effect. In addition, this application performs image quality enhancement in units of image blocks, so that when an enhanced image block is used as a reference block for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
  • Figure 21 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 21 can be understood as a more specific way of the image processing method shown in Figure 20.
  • the image processing method according to the embodiment of the present application includes the following steps:
  • Method 1: obtain a first flag, and determine based on the first flag whether quality enhancement of the reconstructed image block of the current image block is allowed.
  • Method 2: perform quality enhancement on the reconstructed image block based on the quantization parameter to obtain a test enhanced image block; determine the first image quality of the test enhanced image block and the second image quality of the reconstructed image block; and determine, based on the first image quality and the second image quality, whether to perform quality enhancement on the reconstructed image block of the current image block.
  • an enhancement model is used to perform quality enhancement on the reconstructed image block to obtain an enhanced image block.
  • the enhancement model of the embodiment of the present application includes a second feature extraction module, a first feature extraction module and a reconstruction module, where the second feature extraction module is used to extract shallow features (i.e., the second feature information of the reconstructed image block), the first feature extraction module is used to extract deep features (i.e., the first feature information of the reconstructed image block), and the reconstruction module is used to perform nonlinear mapping of the deep features to obtain the final enhanced image block.
  • the second feature extraction module shown in Figure 18 includes two convolutional layers.
  • the structure of the second feature extraction module provided by this embodiment of the application includes but is not limited to that shown in Figure 18 .
  • Figure 18 shows that the first feature extraction module includes N first feature extraction units and one second feature extraction unit, where the N first feature extraction units are connected in series, and the outputs of the N first feature extraction units, together with the output of the second feature extraction module, are input into the second feature extraction unit.
  • the network structure of the first feature extraction module provided by the embodiment of this application includes but is not limited to that shown in Figure 18.
  • Figure 18 shows that the reconstruction module includes two convolutional layers; however, the network structure of the reconstruction module provided by this embodiment of the application includes, but is not limited to, what is shown in Figure 18.
  • the first feature extraction unit in the embodiment of the present application includes a multi-scale extraction layer and a neuron attention mechanism, wherein the multi-scale extraction layer is used for multi-scale feature extraction, and the neuron attention mechanism is used for feature weighting.
  • the first feature extraction unit in the embodiment of this application can also be called a multi-scale and neuron attention (Multi-Scale and Neuron Attention, MSNA for short) unit.
  • the enhancement model of the embodiment of the present application is also called a neural network model based on the neuron attention mechanism (Neuron Attention-based CNN, NACNN for short).
  • the encoding end normalizes the reconstructed image blocks and quantization parameters, then splices them, and inputs the splicing results into the enhancement model.
  • the first convolution layer in the second feature extraction module performs a convolution operation on the spliced reconstructed image block and quantization parameter to obtain feature information C1, and feature information C1 is then input into the second convolution layer for a convolution operation to obtain the second feature information C2.
  • the encoding end inputs the second feature information C2 into the first feature extraction module.
  • the first first feature extraction unit in the first feature extraction module performs multi-scale feature extraction and feature weighting on C2 to obtain the first feature information M1.
  • the first feature information M1 is input into the second first feature extraction unit for multi-scale feature extraction and feature weighting to obtain the second feature information M2.
  • the Nth feature information MN output by the Nth first feature extraction unit is obtained.
  • after splicing the N pieces of feature information M1, M2, ..., MN extracted by the units of the first feature extraction module with the second feature information C2 output by the second feature extraction module, the encoding end inputs the result into the second feature extraction unit of the first feature extraction module, where the number of channels is changed and the first feature information F1 of the reconstructed image block is output.
  • the encoding end inputs the first feature information F1 into the reconstruction module for reconstruction.
  • the first convolution layer in the reconstruction module performs a convolution operation on F1 to obtain feature information C3, and the second convolution layer in the reconstruction module performs a convolution operation on feature information C3 to obtain feature information C4; feature information C4 is output as the enhanced image block O1 of the reconstructed image block.
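Putting the pieces together, a compact end-to-end sketch of this pipeline follows. It reuses the hypothetical modules sketched earlier (MultiScaleExtraction, NeuronAttention, ShallowFeatureExtraction, ReconstructionModule); the number of units N, the channel counts, and the QP normalization divisor are illustrative assumptions rather than values fixed by the text.

```python
import torch
import torch.nn as nn

class MSNAUnit(nn.Module):
    """One first feature extraction unit: multi-scale convs -> splice -> attention -> 1x1 conv -> residual."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.multi_scale = MultiScaleExtraction(channels)
        self.attention = NeuronAttention(2 * channels)
        self.reduce = nn.Conv2d(2 * channels, channels, 1)  # second feature extraction layer

    def forward(self, m_prev: torch.Tensor) -> torch.Tensor:
        c1, c2 = self.multi_scale(m_prev)
        x = torch.cat([c1, c2], dim=1)       # first spliced feature information X
        d3 = self.reduce(self.attention(x))  # i-th weighted feature information D3
        return d3 + m_prev                   # residual connection, formula (2)

class NACNN(nn.Module):
    """Shallow extraction -> N MSNA units -> splice -> 1x1 fusion -> reconstruction."""
    def __init__(self, n_units: int = 4, channels: int = 64):
        super().__init__()
        self.shallow = ShallowFeatureExtraction(channels)
        self.units = nn.ModuleList(MSNAUnit(channels) for _ in range(n_units))
        self.fuse = nn.Conv2d((n_units + 1) * channels, channels, 1)  # second feature extraction unit
        self.reconstruct = ReconstructionModule(channels)

    def forward(self, block: torch.Tensor, qp: int) -> torch.Tensor:
        c2 = self.shallow(block, qp / 63.0)  # QP normalization divisor is an assumption
        feats, m = [c2], c2
        for unit in self.units:              # M1, M2, ..., MN
            m = unit(m)
            feats.append(m)
        f1 = self.fuse(torch.cat(feats, dim=1))  # first feature information F1
        return self.reconstruct(f1)              # enhanced image block O1
```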
  • S708: perform loop filtering on the reconstructed image block. That is, if it is determined not to perform quality enhancement on the reconstructed image block, step S708 is performed to apply loop filtering to the reconstructed image block.
  • the encoding end before the encoding end performs quality enhancement on the reconstructed image block, it first determines whether to perform quality enhancement on the reconstructed image block, thereby improving the reliability of image processing.
  • the encoding end before using the enhancement model to enhance the quality of the reconstructed image block, the encoding end first determines whether to use the enhancement model to enhance the quality of the reconstructed image block, thereby improving the reliability of image processing.
  • FIG. 4 to FIG. 21 are only examples of the present application and should not be understood as limitations of the present application.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of this application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that three relationships can exist. Specifically, A and/or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this article generally indicates that the related objects are an "or" relationship.
  • Figure 22 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device 10 includes:
  • Decoding unit 11 is used to decode the code stream and obtain the quantization coefficient of the current image block
  • the determination unit 12 is configured to determine the quantization parameter corresponding to the current image block, and perform inverse quantization on the quantization coefficient based on the quantization parameter to obtain the transformation coefficient of the current image block;
  • Reconstruction unit 13 configured to determine a reconstructed image block of the current image block according to the transformation coefficient
  • the enhancement unit 14 is configured to input the reconstructed image block and the quantization parameter into an enhancement model to perform image enhancement to obtain an enhanced image block.
  • the enhancement unit 14 is specifically configured to perform feature weighting processing on the reconstructed image block based on the quantization parameter to obtain the first feature information of the reconstructed image block; according to the first feature information, Determine the enhanced image block.
  • the enhancement unit 14 is specifically configured to perform feature weighting processing on the i-1th feature information of the reconstructed image block based on the quantization parameter to obtain the i-th feature information of the reconstructed image block.
  • where i is a positive integer from 1 to N; this is repeated to obtain the Nth feature information of the reconstructed image block, and if i is 1, the i-1th feature information is the reconstructed image block; the first feature information of the reconstructed image block is determined according to the Nth feature information.
  • the enhancement unit 14 is specifically configured to extract feature information of M different scales of the i-1th feature information, where M is a positive integer greater than 1;
  • the M pieces of feature information of different scales are weighted to obtain the i-th weighted feature information; the i-th feature information is determined based on the i-th weighted feature information.
  • the enhancement unit 14 is specifically configured to extract feature information of M different scales of the i-1th feature information through M first feature extraction layers of different scales.
  • the first feature extraction layer includes a convolutional layer, and the convolutional layers included in different first feature extraction layers have different convolution kernel sizes.
  • At least one first feature extraction layer includes an activation function.
  • the enhancement unit 14 is specifically configured to splice the M pieces of feature information of different scales to obtain the first spliced feature information, and to perform weighting processing on the first spliced feature information to obtain the i-th weighted feature information.
  • the enhancement unit 14 is specifically configured to weight the first spliced feature information through a weighting processing layer to obtain weighted feature information with a first number of channels, and to obtain the i-th weighted feature information according to the weighted feature information with the first number of channels.
  • the number of channels of the i-th weighted feature information is the same as the number of channels of the i-1 th feature information.
  • the weighted processing layer includes a neuron attention mechanism.
  • the enhancement unit 14 is specifically configured to determine the sum of the i-th weighted feature information and the i-1 th feature information as the i-th feature information.
  • the enhancement unit 14 is specifically configured to determine the i-th weighted feature information as the i-th feature information.
  • the enhancement unit 14 is specifically configured to extract second feature information of the reconstructed image block based on the quantization parameter, and to perform feature weighting processing on the second feature information to obtain the first feature information of the reconstructed image block.
  • the enhancement unit 14 is specifically configured to splice the reconstructed image block and the quantization parameter to obtain splicing information; perform feature extraction on the splicing information to obtain the second feature information.
  • the enhancement unit 14 is specifically configured to perform feature extraction on the spliced information through a second feature extraction module to obtain the second feature information.
  • the second feature extraction module includes at least one convolutional layer.
  • the enhancement unit 14 is specifically configured to obtain the first feature information of the reconstructed image block according to the Nth feature information and at least one of the first N-1 pieces of feature information.
  • the enhancement unit 14 is specifically configured to splice at least one of the first N-1 pieces of feature information, the Nth feature information, and the second feature information to obtain the second spliced feature information, and to perform feature re-extraction on the second spliced feature information to obtain the first feature information of the reconstructed image block.
  • the enhancement unit 14 is specifically configured to non-linearly map the first feature information of the reconstructed image block to obtain the enhanced image block.
  • the enhancement unit 14 is specifically configured to non-linearly map the first feature information of the reconstructed image block through a reconstruction module to obtain the enhanced image block.
  • the reconstruction module includes at least one convolutional layer.
  • the decoding unit 11 is also used to decode the code stream to obtain a first flag, which is used to indicate whether to perform quality enhancement on the reconstructed image block of the current image block;
  • the enhancement unit 14 is also configured to perform quality enhancement on the reconstructed image block based on the quantization parameter when it is determined, according to the first flag, that quality enhancement of the reconstructed image block is allowed, so as to obtain the enhanced image block.
  • the enhancement unit 14 is further configured to perform quality enhancement on the reconstructed image block based on the quantization parameter to obtain a test enhanced image block; determine the first image quality of the test enhanced image block and the Reconstruct the second image quality of the image block; if the first image quality is greater than the second image quality, determine the test enhanced image block as the enhanced image block of the reconstructed image block.
  • the reconstructed image block is a reconstructed image block of the current image block under the first component.
  • the first component is a brightness component or a chrominance component.
  • the enhancement unit 14 is specifically configured to normalize the reconstructed image block and the quantization parameter, and to obtain the enhanced image block based on the normalized reconstructed image block and quantization parameter.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device 10 shown in FIG. 22 can execute the image processing method on the decoding end of the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 10 are respectively intended to implement the image processing method on the decoding end and other methods. The corresponding process in , for the sake of brevity, will not be repeated here.
  • Figure 23 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device 20 includes:
  • the determination unit 21 is used to determine the quantization parameter of the current image block, and encode the current image block based on the quantization parameter to obtain the quantization coefficient of the current image block;
  • the inverse quantization unit 22 is configured to perform inverse quantization on the quantization coefficient based on the quantization parameter of the current image block to obtain the residual block of the current image block;
  • Reconstruction unit 23 configured to obtain a reconstructed image block of the current image block according to the residual block
  • the enhancement unit 24 is configured to perform quality enhancement on the reconstructed image block based on the quantization parameter to obtain an enhanced image block.
  • the enhancement unit 24 is specifically configured to perform feature weighting processing on the reconstructed image block based on the quantization parameter to obtain the first feature information of the reconstructed image block; according to the first feature information, Determine the enhanced image block.
  • the enhancement unit 24 is specifically configured to perform feature weighting processing on the i-1th feature information of the reconstructed image block based on the quantization parameter to obtain the i-th feature information of the reconstructed image block.
  • where i is a positive integer from 1 to N; the process is repeated to obtain the Nth feature information of the reconstructed image block, and if i is 1, the i-1th feature information is the reconstructed image block; the first feature information of the reconstructed image block is determined according to the Nth feature information.
  • the enhancement unit 24 is specifically configured to extract feature information of M different scales of the i-1th feature information, where M is a positive integer greater than 1;
  • the M pieces of feature information of different scales are weighted to obtain the i-th weighted feature information; the i-th feature information is determined based on the i-th weighted feature information.
  • the enhancement unit 24 is specifically configured to extract feature information of M different scales of the i-1th feature information through M first feature extraction layers of different scales.
  • the first feature extraction layer includes a convolutional layer, and the convolutional layers included in different first feature extraction layers have different convolution kernel sizes.
  • At least one first feature extraction layer includes an activation function.
  • the enhancement unit 24 is specifically configured to splice the M pieces of feature information of different scales to obtain the first spliced feature information, and to perform weighting processing on the first spliced feature information to obtain the i-th weighted feature information.
  • the enhancement unit 24 is specifically configured to weight the first spliced feature information through a weighting processing layer to obtain weighted feature information with a first number of channels, and to obtain the i-th weighted feature information according to the weighted feature information with the first number of channels.
  • the number of channels of the i-th weighted feature information is the same as the number of channels of the i-1 th feature information.
  • the weighted processing layer includes a neuron attention mechanism.
  • the enhancement unit 24 is specifically configured to determine the sum of the i-th weighted feature information and the i-1 th feature information as the i-th feature information.
  • the enhancement unit 24 is specifically configured to determine the i-th weighted feature information as the i-th feature information.
  • the enhancement unit 24 is specifically configured to extract second feature information of the reconstructed image block based on the quantization parameter, and to perform feature weighting processing on the second feature information to obtain the first feature information of the reconstructed image block.
  • the enhancement unit 24 is specifically configured to splice the reconstructed image block and the quantization parameter to obtain splicing information; perform feature extraction on the splicing information to obtain the second feature information.
  • the enhancement unit 24 is specifically configured to perform feature extraction on the spliced information through a second feature extraction module to obtain the second feature information.
  • the second feature extraction module includes at least one convolutional layer.
  • the enhancement unit 24 is specifically configured to obtain the first feature information of the reconstructed image block according to the Nth feature information and at least one of the first N-1 pieces of feature information.
  • the enhancement unit 24 is specifically configured to splice at least one of the first N-1 pieces of feature information, the Nth feature information, and the second feature information to obtain the second spliced feature information, and to perform feature re-extraction on the second spliced feature information to obtain the first feature information of the reconstructed image block.
  • the enhancement unit 24 is specifically configured to non-linearly map the first feature information of the reconstructed image block to obtain the enhanced image block.
  • the enhancement unit 24 is specifically configured to non-linearly map the first feature information of the reconstructed image block through a reconstruction module to obtain the enhanced image block.
  • the reconstruction module includes at least one convolutional layer.
  • the encoding unit 22 is also configured to write a first flag in the code stream, where the first flag is used to indicate whether to perform quality enhancement on the reconstructed image block of the current image block.
  • the reconstructed image block is a reconstructed image block of the current image block under the first component.
  • the first component is a brightness component or a chrominance component.
  • the enhancement unit 24 is specifically configured to normalize the reconstructed image block and the quantization parameter, and to obtain the enhanced image block based on the normalized reconstructed image block and quantization parameter.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device 20 shown in FIG. 23 may correspond to the subject performing the image processing method at the encoding end in the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 20 are respectively intended to implement the corresponding processes of the image processing method at the encoding end; for the sake of brevity, they are not repeated here.
  • the software unit may be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • Figure 24 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application.
  • the electronic device 30 may include:
  • Memory 33 and processor 32, where the memory 33 is used to store the computer program 34 and transmit the program code to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 may be configured to perform the steps in the above method 200 according to instructions in the computer program 34 .
  • the processor 32 may include but is not limited to:
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • FPGA Field Programmable Gate Array
  • the memory 33 includes but is not limited to:
  • Non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. Volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache.
  • By way of example but not limitation, many forms of RAM (Random Access Memory) are available, for example:
  • static random access memory (Static RAM, SRAM)
  • dynamic random access memory (Dynamic RAM, DRAM)
  • synchronous dynamic random access memory (Synchronous DRAM, SDRAM)
  • double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM)
  • enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM)
  • synchronous link dynamic random access memory (Synch Link DRAM, SLDRAM)
  • direct Rambus random access memory (Direct Rambus RAM, DR RAM)
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the tasks provided by this application.
  • the one or more units may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • Transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices. Specifically, it can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include an antenna, and the number of antennas may be one or more.
  • the various components of the electronic device 30 are connected by a bus system, where in addition to the data bus, the bus system also includes a power bus, a control bus and a status signal bus.
  • Figure 25 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
  • the video encoding and decoding system 40 may include: a video encoder 41 and a video decoder 42, where the video encoder 41 is used to perform the video encoding method involved in the embodiments of the present application, and the video decoder 42 is used to perform the video decoding method involved in the embodiments of the present application.
  • This application also provides a computer storage medium on which a computer program is stored.
  • when the computer program is executed by a computer, the computer can perform the method of the above method embodiments.
  • embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to perform the method of the above method embodiments.
  • This application also provides a code stream, which is generated by the above encoding method.
  • the code stream includes a first flag.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center over a wired connection (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application can be integrated into a processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an image processing method, apparatus, device, system, and storage medium: based on a quantization parameter, quality enhancement is performed on a reconstructed image block to obtain an enhanced image block. Since the quantization parameters corresponding to different image blocks may differ, and in order to improve the enhancement accuracy of the image blocks, the present application performs quality enhancement on the reconstructed image block based on the quantization parameter, which improves the enhancement effect. In addition, the present application performs image quality enhancement in units of image blocks, so that when an enhanced image block is used as a reference block for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.

Description

Image processing method, apparatus, device, system, and storage medium

Technical Field

The present application relates to the technical field of video encoding and decoding, and in particular to an image processing method, apparatus, device, system, and storage medium.

Background Art

Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, and video players. As video technology develops, the amount of data contained in video is large; to facilitate the transmission of video data, video devices apply video compression techniques so that video data can be transmitted or stored more efficiently.

Video compression introduces distortion. To reduce the distortion, the reconstructed image needs to be processed; however, current image processing methods give unsatisfactory results.

Summary of the Invention

Embodiments of the present application provide an image processing method, apparatus, device, system, and storage medium: by inputting a reconstructed image block and a quantization parameter into an enhancement model for image enhancement, an enhanced image block is obtained, improving the quality-enhancement effect on the reconstructed image block.

In a first aspect, embodiments of the present application provide an image processing method, including:

decoding a code stream to obtain a quantization coefficient of a current image block;

determining a quantization parameter corresponding to the current image block, and performing inverse quantization on the quantization coefficient based on the quantization parameter to obtain a transform coefficient of the current image block;

determining a reconstructed image block of the current image block according to the transform coefficient; and

performing quality enhancement on the reconstructed image block based on the quantization parameter to obtain an enhanced image block.

In a second aspect, embodiments of the present application provide an image processing method, including:

determining a quantization parameter of a current image block, and encoding the current image block based on the quantization parameter to obtain a quantization coefficient of the current image block;

performing inverse quantization on the quantization coefficient based on the quantization parameter of the current image block to obtain a residual block of the current image block;

obtaining a reconstructed image block of the current image block according to the residual block; and

performing quality enhancement on the reconstructed image block based on the quantization parameter to obtain an enhanced image block.

In a third aspect, the present application provides an image processing apparatus for performing the method in the first aspect or any implementation thereof. Specifically, the apparatus (a decoder) includes functional units for performing the method in the first aspect or any implementation thereof.

In a fourth aspect, the present application provides an image processing apparatus for performing the method in the second aspect or any implementation thereof. Specifically, the apparatus (an encoder) includes functional units for performing the method in the second aspect or any implementation thereof.

In a fifth aspect, a video decoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to perform the method in the first aspect or any implementation thereof.

In a sixth aspect, a video encoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to perform the method in the second aspect or any implementation thereof.

In a seventh aspect, a video encoding and decoding system is provided, including a video encoder and a video decoder. The video encoder is used to perform the method in the second aspect or any implementation thereof, and the video decoder is used to perform the method in the first aspect or any implementation thereof.

In an eighth aspect, a chip is provided for implementing the method in any of the first to second aspects or any implementation thereof. Specifically, the chip includes a processor for calling and running a computer program from a memory, so that a device equipped with the chip performs the method in any of the first to second aspects or any implementation thereof.

In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, where the computer program causes a computer to perform the method in any of the first to second aspects or any implementation thereof.

In a tenth aspect, a computer program product is provided, including computer program instructions that cause a computer to perform the method in any of the first to second aspects or any implementation thereof.

In an eleventh aspect, a computer program is provided which, when run on a computer, causes the computer to perform the method in any of the first to second aspects or any implementation thereof.

In a twelfth aspect, a code stream is provided, the code stream being generated by the method in the second aspect above or any implementation thereof.

Based on the above technical solutions, image enhancement is performed on the reconstructed image block based on the quantization parameter to obtain an enhanced image block. Since the quantization parameters corresponding to different image blocks may differ, and in order to improve the enhancement accuracy of the image blocks, the present application performs quality enhancement on the reconstructed image block of the current image block based on the quantization parameter, which improves the enhancement effect. In addition, the present application performs image quality enhancement in units of image blocks, so that when an enhanced image block is used as a reference block for other image blocks in intra-frame prediction, a more accurate reference can be provided, thereby improving the accuracy of intra-frame prediction.
Brief Description of the Drawings

FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of image processing involved in the present application;

FIG. 6 is a schematic diagram of an enhancement model involved in the present application;

FIG. 7 is another schematic diagram of an enhancement model involved in the present application;

FIG. 8 is a schematic structural diagram of a first feature extraction layer involved in the present application;

FIG. 9 is another schematic structural diagram of the first feature extraction layer involved in the present application;

FIG. 10 is another schematic structural diagram of the first feature extraction layer involved in the present application;

FIG. 11 is another schematic structural diagram of the first feature extraction layer involved in the present application;

FIG. 12 is a schematic structural diagram of a weighting processing layer involved in the present application;

FIG. 13 is a schematic structural diagram of an enhancement model involved in the present application;

FIG. 14 is another schematic structural diagram of an enhancement model involved in the present application;

FIG. 15 is another schematic structural diagram of an enhancement model involved in the present application;

FIG. 16 is another schematic structural diagram of an enhancement model involved in the present application;

FIG. 17 is another schematic structural diagram of an enhancement model involved in the present application;

FIG. 18 is another schematic structural diagram of an enhancement model involved in the present application;

FIG. 19 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 20 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 21 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 22 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application;

FIG. 23 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application;

FIG. 24 is a schematic block diagram of an electronic device provided by an embodiment of the present application;

FIG. 25 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
具体实施方式
本申请可应用于图像编解码领域、视频编解码领域、硬件视频编解码领域、专用电路视频编解码领域、实时视频编解码领域等。例如,本申请的方案可结合至音视频编码标准(audio video coding standard,简称AVS),例如,H.264/音视频编码(audio video coding,简称AVC)标准,H.265/高效视频编码(high efficiency video coding,简称HEVC)标准以及H.266/多功能视频编码(versatile video coding,简称VVC)标准。或者,本申请的方案可结合至其它专属或行业标准而操作,所述标准包含ITU-TH.261、ISO/IECMPEG-1Visual、ITU-TH.262或ISO/IECMPEG-2Visual、ITU-TH.263、ISO/IECMPEG-4Visual,ITU-TH.264(还称为ISO/IECMPEG-4AVC),包含可分级视频编解码(SVC)及多视图视频编解码(MVC)扩展。应理解,本申请的技术不限于任何特定编解码标准或技术。
为了便于理解,首先结合图1对本申请实施例涉及的视频编解码系统进行介绍。
图1为本申请实施例涉及的一种视频编解码系统的示意性框图。需要说明的是,图1只是一种示例,本申请实施例的视频编解码系统包括但不限于图1所示。如图1所示,该视频编解码系统100包含编码设备110和解码设备120。其中编码设备用于对视频数据进行编码(可以理解成压缩)产生码流,并将码流传输给解码设备。解码设备对编码设备编码产生的码流进行解码,得到解码后的视频数据。
本申请实施例的编码设备110可以理解为具有视频编码功能的设备,解码设备120可以理解为具有视频解码功能的设备,即本申请实施例对编码设备110和解码设备120包括更广泛的装置,例如包含智能手机、台式计算机、移动计算装置、笔记本(例如,膝上型)计算机、平板计算机、机顶盒、电视、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机等。
在一些实施例中,编码设备110可以经由信道130将编码后的视频数据(如码流)传输给解码设备120。信道130可以包括能够将编码后的视频数据从编码设备110传输到解码设备120的一个或多个媒体和/或装置。
在一个实例中,信道130包括使编码设备110能够实时地将编码后的视频数据直接发射到解码设备120的一个或多个通信媒体。在此实例中,编码设备110可根据通信标准来调制编码后的视频数据,且将调制后的视频数据发射到解码设备120。其中通信媒体包含无线通信媒体,例如射频频谱,可选的,通信媒体还可以包含有线通信媒体,例如一根或多根物理传输线。
在另一实例中,信道130包括存储介质,该存储介质可以存储编码设备110编码后的视频数据。存储介质包含多种本地存取式数据存储介质,例如光盘、DVD、快闪存储器等。在该实例中,解码设备120可从该存储介质中获取编码后的视频数据。
在另一实例中,信道130可包含存储服务器,该存储服务器可以存储编码设备110编码后的视频数据。在此实例中,解码设备120可以从该存储服务器中下载存储的编码后的视频数据。可选的,该存储服务器可以存储编码后的视频数据且可以将该编码后的视频数据发射到解码设备120,例如web服务器(例如,用于网站)、文件传送协议(FTP)服务器等。
一些实施例中,编码设备110包含视频编码器112及输出接口113。其中,输出接口113可以包含调制器/解调器(调制解调器)和/或发射器。
在一些实施例中,编码设备110除了包括视频编码器112和输入接口113外,还可以包括视频源111。
视频源111可包含视频采集装置(例如,视频相机)、视频存档、视频输入接口、计算机图形系统中的至少一个,其中,视频输入接口用于从视频内容提供者处接收视频数据,计算机图形系统用于产生视频数据。
视频编码器112对来自视频源111的视频数据进行编码,产生码流。视频数据可包括一个或多个图像(picture)或图像序列(sequence of pictures)。码流以比特流的形式包含了图像或图像序列的编码信息。编码信息可以包含编码图像数据及相关联数据。相关联数据可包含序列参数集(sequence parameter set,简称SPS)、图像参数集(picture parameter set,简称PPS)及其它语法结构。SPS可含有应用于一个或多个序列的参数。PPS可含有应用于一个或多个图像的参数。语法结构是指码流中以指定次序排列的零个或多个语法元素的集合。
视频编码器112经由输出接口113将编码后的视频数据直接传输到解码设备120。编码后的视频数据还可存储于存储介质或存储服务器上，以供解码设备120后续读取。
在一些实施例中,解码设备120包含输入接口121和视频解码器122。
在一些实施例中,解码设备120除包括输入接口121和视频解码器122外,还可以包括显示装置123。
其中,输入接口121包含接收器及/或调制解调器。输入接口121可通过信道130接收编码后的视频数据。
视频解码器122用于对编码后的视频数据进行解码,得到解码后的视频数据,并将解码后的视频数据传输至显示装置123。
显示装置123显示解码后的视频数据。显示装置123可与解码设备120整合或在解码设备120外部。显示装置123可包括多种显示装置,例如液晶显示器(LCD)、等离子体显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。
此外,图1仅为实例,本申请实施例的技术方案不限于图1,例如本申请的技术还可以应用于单侧的视频编码或单侧的视频解码。
下面对本申请实施例涉及的视频编码框架进行介绍。
图2是本申请实施例涉及的视频编码器的示意性框图。应理解,该视频编码器200可用于对图像进行有损压缩(lossy compression),也可用于对图像进行无损压缩(lossless compression)。该无损压缩可以是视觉无损压缩(visually lossless compression),也可以是数学无损压缩(mathematically lossless compression)。
该视频编码器200可应用于亮度色度(YCbCr,YUV)格式的图像数据上。例如,YUV比例可以为4:2:0、4:2:2或者4:4:4,Y表示明亮度(Luma),Cb(U)表示蓝色色度,Cr(V)表示红色色度,U和V表示为色度(Chroma)用于描述色彩及饱和度。例如,在颜色格式上,4:2:0表示每4个像素有4个亮度分量,2个色度分量(YYYYCbCr),4:2:2表示每4个像素有4个亮度分量,4个色度分量(YYYYCbCrCbCr),4:4:4表示全像素显示(YYYYCbCrCbCrCbCrCbCr)。
例如，该视频编码器200读取视频数据，针对视频数据中的每帧图像，将一帧图像划分成若干个编码树单元（coding tree unit，CTU），在一些例子中，CTU可被称作“树型块”、“最大编码单元”（Largest Coding Unit，简称LCU）或“编码树型块”（coding tree block，简称CTB）。每一个CTU可以与图像内的具有相等大小的像素块相关联。每一像素可对应一个亮度（luminance或luma）采样及两个色度（chrominance或chroma）采样。因此，每一个CTU可与一个亮度采样块及两个色度采样块相关联。一个CTU大小例如为128×128、64×64、32×32等。一个CTU又可以继续被划分成若干个编码单元（Coding Unit，CU）进行编码，CU可以为矩形块也可以为方形块。CU可以进一步划分为预测单元（prediction unit，简称PU）和变换单元（transform unit，简称TU），进而使得编码、预测、变换分离，处理的时候更灵活。在一种示例中，CTU以四叉树方式划分为CU，CU以四叉树方式划分为TU、PU。
视频编码器及视频解码器可支持各种PU大小。假定特定CU的大小为2N×2N,视频编码器及视频解码器可支持2N×2N或N×N的PU大小以用于帧内预测,且支持2N×2N、2N×N、N×2N、N×N或类似大小的对称PU以用于帧间预测。视频编码器及视频解码器还可支持2N×nU、2N×nD、nL×2N及nR×2N的不对称PU以用于帧间预测。
在一些实施例中,如图2所示,该视频编码器200可包括:预测单元210、残差单元220、变换/量化单元230、反变换/量化单元240、重建单元250、环路滤波单元260、解码图像缓存270和熵编码单元280。需要说明的是,视频编码器200可包含更多、更少或不同的功能组件。
可选的，在本申请中，当前块（current block）可以称为当前编码单元（CU）或当前预测单元（PU）等。预测块也可称为预测图像块或图像预测块，重建图像块也可称为重建块或图像重建块。
在一些实施例中,预测单元210包括帧间预测单元211和帧内估计单元212。由于视频的一个帧中的相邻像素之间存在很强的相关性,在视频编解码技术中使用帧内预测的方法消除相邻像素之间的空间冗余。由于视频中的相邻帧之间存在着很强的相似性,在视频编解码技术中使用帧间预测方法消除相邻帧之间的时间冗余,从而提高编码效率。
帧间预测单元211可用于帧间预测,帧间预测可以包括运动估计(motion estimation)和运动补偿(motion compensation),可以参考不同帧的图像信息,帧间预测使用运动信息从参考帧中找到参考块,根据参考块生成预测块,用于消除时间冗余;帧间预测所使用的帧可以为P帧和/或B帧,P帧指的是向前预测帧,B帧指的是双向预测帧。帧间预测使用运动信息从参考帧中找到参考块,根据参考块生成预测块。运动信息包括参考帧所在的参考帧列表,参考帧索引,以及运动矢量。运动矢量可以是整像素的或者是分像素的,如果运动矢量是分像素的,那么需要在参考帧中使用插值滤波做出所需的分像素的块,这里把根据运动矢量找到的参考帧中的整像素或者分像素的块叫参考块。有的技术会直接把参考块作为预测块,有的技术会在参考块的基础上再处理生成预测块。在参考块的基础上再处理生成预测块也可以理解为把参考块作为预测块然后再在预测块的基础上处理生成新的预测块。
帧内估计单元212只参考同一帧图像的信息,预测当前码图像块内的像素信息,用于消除空间冗余。帧内预测所使用的帧可以为I帧。
帧内预测有多种预测模式,以国际数字视频编码标准H系列为例,H.264/AVC标准有8种角度预测模式和1种非角度预测模式,H.265/HEVC扩展到33种角度预测模式和2种非角度预测模式。HEVC使用的帧内预测模式有平面模式(Planar)、DC和33种角度模式,共35种预测模式。VVC使用的帧内模式有Planar、DC和65种角度模式,共67种预测模式。
需要说明的是,随着角度模式的增加,帧内预测将会更加精确,也更加符合对高清以及超高清数字视频发展的需求。
残差单元220可基于CU的像素块及PU的预测块来产生CU的残差块。举例来说,残差单元220可产生CU的残差块,使得残差块中的每一采样具有等于以下两者之间的差的值:CU的像素块中的采样,及CU的PU的预测块中的对应采样。
变换/量化单元230可量化变换系数。变换/量化单元230可基于与CU相关联的量化参数（QP）值来量化与CU的TU相关联的变换系数。视频编码器200可通过调整与CU相关联的QP值来调整应用于与CU相关联的变换系数的量化程度。
反变换/量化单元240可分别将逆量化及逆变换应用于量化后的变换系数,以从量化后的变换系数重建残差块。
重建单元250可将重建后的残差块的采样加到预测单元210产生的一个或多个预测块的对应采样,以产生与TU相关联的重建图像块。通过此方式重建CU的每一个TU的采样块,视频编码器200可重建CU的像素块。
环路滤波单元260用于对反变换与反量化后的像素进行处理,弥补失真信息,为后续编码像素提供更好的参考,例如可执行消块滤波操作以减少与CU相关联的像素块的块效应。
在一些实施例中,环路滤波单元260包括去块滤波单元和样点自适应补偿/自适应环路滤波(SAO/ALF)单元,其中去块滤波单元用于去方块效应,SAO/ALF单元用于去除振铃效应。
解码图像缓存270可存储重建后的像素块。帧间预测单元211可使用含有重建后的像素块的参考图像来对其它图像的PU执行帧间预测。另外,帧内估计单元212可使用解码图像缓存270中的重建后的像素块来对在与CU相同的图像中的其它PU执行帧内预测。
熵编码单元280可接收来自变换/量化单元230的量化后的变换系数。熵编码单元280可对量化后的变换系数执行一个或多个熵编码操作以产生熵编码后的数据。
图3是本申请实施例涉及的视频解码器的示意性框图。
如图3所示,视频解码器300包含:熵解码单元310、预测单元320、反量化/变换单元330、重建单元340、环路滤波单元350及解码图像缓存360。需要说明的是,视频解码器300可包含更多、更少或不同的功能组件。
视频解码器300可接收码流。熵解码单元310可解析码流以从码流提取语法元素。作为解析码流的一部分,熵解码单元310可解析码流中的经熵编码后的语法元素。预测单元320、反量化/变换单元330、重建单元340及环路滤波单元350可根据从码流中提取的语法元素来解码视频数据,即产生解码后的视频数据。
在一些实施例中,预测单元320包括帧间预测单元321和帧内估计单元322。
帧内估计单元322可执行帧内预测以产生PU的预测块。帧内估计单元322可使用帧内预测模式以基于空间相邻PU的像素块来产生PU的预测块。帧内估计单元322还可根据从码流解析的一个或多个语法元素来确定PU的帧内预测模式。
帧间预测单元321可根据从码流解析的语法元素来构造第一参考图像列表(列表0)及第二参考图像列表(列表1)。此外,如果PU使用帧间预测编码,则熵解码单元310可解析PU的运动信息。帧间预测单元321可根据PU的运动信息来确定PU的一个或多个参考块。帧间预测单元321可根据PU的一个或多个参考块来产生PU的预测块。
反量化/变换单元330可逆量化(即,解量化)与TU相关联的变换系数。反量化/变换单元330可使用与TU的CU相关联的QP值来确定量化程度。
在逆量化变换系数之后,反量化/变换单元330可将一个或多个逆变换应用于逆量化变换系数,以便产生与TU相关联的残差块。
重建单元340使用与CU的TU相关联的残差块及CU的PU的预测块以重建CU的像素块。例如,重建单元340可将残差块的采样加到预测块的对应采样以重建CU的像素块,得到重建图像块。
环路滤波单元350可执行消块滤波操作以减少与CU相关联的像素块的块效应。
视频解码器300可将CU的重建图像存储于解码图像缓存360中。视频解码器300可将解码图像缓存360中的重建图像作为参考图像用于后续预测,或者,将重建图像传输给显示装置呈现。
视频编解码的基本流程如下：在编码端，将一帧图像划分成CU，针对当前块（即当前CU），预测单元210使用帧内预测或帧间预测产生当前块的预测块。残差单元220可基于预测块与当前块的原始块计算残差块，即预测块和当前块的原始块的差值，该残差块也可称为残差信息。该残差块经由变换/量化单元230变换与量化等过程，可以去除人眼不敏感的信息，以消除视觉冗余。可选的，经过变换/量化单元230变换与量化之前的残差块可称为时域残差块，经过变换/量化单元230变换与量化之后的时域残差块可称为频率残差块或频域残差块。熵编码单元280接收到变换/量化单元230输出的量化后的变换系数，可对该量化后的变换系数进行熵编码，输出码流。例如，熵编码单元280可根据目标上下文模型以及二进制码流的概率信息消除字符冗余。
在解码端,熵解码单元310可解析码流得到当前块的预测信息、量化系数矩阵等,预测单元320基于预测信息对当前块使用帧内预测或帧间预测产生当前块的预测块。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到重建块。重建块组成重建图像,环路滤波单元350基于图像或基于块对重建图像进行环路滤波,得到解码图像。编码端同样需要和解码端类似的操作获得解码图像。该解码图像也可以称为重建图像,重建图像可以为后续的帧作为帧间预测的参考帧。
需要说明的是,编码端确定的块划分信息,以及预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息等在必要时携带在码流中。解码端通过解析码流及根据已有信息进行分析确定与编码端相同的块划分信息,预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息,从而保证编码端获得的解码图像和解码端获得的解码图像相同。
当前块(current block)可以是当前编码单元(CU)或当前预测单元(PU)等。
上述是基于块的混合编码框架下的视频编解码器的基本流程,随着技术的发展,该框架或流程的一些模块或步骤可能会被优化,本申请适用于该基于块的混合编码框架下的视频编解码器的基本流程,但不限于该框架及流程。
下面结合具体的实施例,对本申请实施例提供的图像编解码方法进行介绍。
首先结合图4,以解码端为例,对本申请实施例提供的视频解码方法进行介绍。
图4为本申请一实施例提供的视频解码方法流程示意图,本申请实施例应用于图1和图3所示视频解码器。
如图4所示,本申请实施例的方法包括:
S401、解码码流,得到当前图像块的量化系数。
本申请实施例对当前图像块的具体大小不做限制。
在一些实施例中,本申请实施例的当前图像块为CTU,例如,将一帧图像划分成若干个CTU,且本申请对CTU的大小不做限制,例如一个CTU大小为128×128、64×64、32×32等。
在一些实施例中,本申请实施例的当前图像块为CU,例如,将一个CTU划分为一个或多个CU。
在一些实施例中,本申请实施例的当前图像块为TU或PU,例如,将一个CU划分为一个或多个TU或PU。
在一些实施例中,本申请实施例的当前图像块只包括色度分量,可以理解为色度块。
在一些实施例中,本申请实施例的当前图像块只包括亮度分量,可以理解为亮度块。
在一些实施例中，该当前图像块既包括亮度分量又包括色度分量。
需要说明的是,若当前图像块包括多个CU时,则当前图像块的量化系数包括这多个CU对应的量化系数。
如图3所示,在解码端,熵解码单元310可解码码流得到当前图像块的预测信息、量化信息等,预测单元320基于预测信息对当前图像块使用帧内预测或帧间预测产生当前图像块的预测值。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到当前图像块的重建图像块。
视频编码的过程中会造成视频失真,为了减少失真,本申请实施例对重建图像块进行后处理,即以图像块为增强单元进行图像质量增强,以提高重建图像块的质量。
需要说明的是,若当前图像块包括多个CU时,则可以对每个CU分别进行解码,得到每个CU的重建块,每个CU的重建块进行组合,得到当前图像块的重建图像块。
S402、确定当前图像块对应的量化参数,并基于量化参数,对量化系数进行反量化,得到当前图像块的变换系数。
需要说明的是,若当前图像块包括多个CU时,则当前图像块的量化参数包括这多个CU对应的量化参数。可选的,这多个CU对应的量化参数可以相同,也可以不同。
在一种可能的实现方式中,本申请实施例中当前图像块的量化参数可以以矩阵的形式表示,例如当前图像块的大小为16X16,则当前图像块的量化参数为一个16X16的矩阵,该矩阵中的每一个元素为当前图像块中对应位置的像素点的量化参数。
本申请实施例对确定当前图像块对应的量化参数的具体过程不做限制。
在一些实施例中，编解码将默认的量化参数作为当前图像块对应的量化参数，此时，解码端可以直接将默认的量化参数，确定为当前图像块对应的量化参数。
在一些实施例中,编码端将编码过程中确定的当前图像块对应的量化参数写入码流,这样,解码端可以通过解码码流,确定出当前图像块对应的量化参数。
在一些实施例中,解码端可以采用与编码端相同的计算方法,通过计算确定出当前图像块对应的量化参数。
解码端确定出当前图像块对应的量化参数时,使用量化参数对当前图像块的量化系数进行反量化,得到当前图像块的变换系数。
例如,若当前图像块包括多个CU时,针对每个CU,使用该CU对应的量化参数对该CU的量化系数进行反量化,得到该CU的变换系数。
S403、根据变换系数,确定当前图像块的重建图像块。
具体的,对当前图像块的变换系数进行反变换,得到当前图像块的残差块。同时,使用帧内或帧间等预测方式,得到当前图像块的预测块,将当前图像块的预测值与残差块进行相加,得到当前图像块的重建图像块。
需要说明的是,若当前图像块包括多个CU时,则当前图像块对应的量化参数为这多个CU的量化参数,在进行解码时,对当前图像块中的每个CU单独进行解码,得到每个CU的重建块,将每个CU的重建块进行组合,得到当前图像块的重建图像块。
S404、基于量化参数,对重建图像块进行质量增强,得到增强图像块。
在视频编码过程中,不同的图像块所对应的量化参数QP可能不同。在一些实施例中,量化参数QP包括量化步长,在视频编码过程中,对图像块的变换系数进行量化,其中量化步长越大,图像损失越大,量化步长越小,图像损失越小。因此,本申请实施例为了提高对当前图像块的增强效果,在当前图像块的质量增强过程中,考虑该当前图像块对应的量化参数QP,对质量增强的影响,进而提高当前图像块的质量增强效果。
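示例性的，以H.264/AVC、H.265/HEVC等标准为例，量化步长Qstep与量化参数QP之间近似满足如下指数关系（具体常数以各标准的规定为准，此处仅作说明）：

Qstep(QP) ≈ 2^((QP-4)/6)

即QP每增加6，量化步长约增大一倍，量化引入的失真也随之增大，这也是在质量增强过程中引入QP信息的直接动机。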
由于不同的量化参数对图像块进行反量化时的损失也不同，为了提高图像块的增强效果，本申请实施例基于当前图像块对应的量化参数，对当前图像块的重建图像块进行质量增强，可以提高重建图像块的增强效果。
本申请实施例,以图像块为单位进行图像质量增强,这样使用增强后的图像块作为帧内预测中其他图像块的参考块时,可以提供更精准的参考,进而可以提高帧内预测的准确性。
另外,本申请实施例以图像块为单位进行图像质量增强,相比于对整帧图像进行图像质量增强,可以更加注重图像块中较精细特征的增强,进一步提升图像块的增强质量。
本申请实施例,对基于量化参数,对重建图像块进行质量增强,得到增强图像块的方式不做限制。
在一种可能的实现方式中,针对不同的量化参数,事先训练成不同的增强模型,不同量化参数下的增强模型用于对该量化参数下的图像块进行质量增强。这样,解码端可以根据当前图像块的量化参数,从多个不同量化参数对应的增强模型中,选取当前图像块的量化参数对应的目标增强模型,使用该目标增强模型对当前图像块的重建图像块进行质量增强。
在另一种可能的实现方式中，解码端获取一通用增强模型，该增强模型是基于不同的图像块和其对应的量化参数训练得到的，充分学习了不同的量化参数对图像块质量增强的影响，进而可以基于不同量化参数，对经过不同量化参数反量化后得到的重建图像块进行高质量增强。这样，解码端可以基于量化参数，通过该通用增强模型对重建图像块进行质量增强，得到增强图像块。具体的，解码端根据上述步骤，确定出当前图像块的重建图像块和量化参数后，为了降低重建图像块的失真，提升重建图像块的质量，如图5所示，将该重建图像块和对应的量化参数输入预先训练好的增强模型中，进行图像增强，最终得到该当前图像块的增强图像块。
下面实施例均以通用增强模型为例,对重建图像块的质量增强进行介绍。
在一些实施例中,解码端将重建图像块和量化参数进行融合后,输入增强模型。其中,重建图像块和量化参数的融合方法至少包括如下几种示例:
示例1,假设上述重建图像块的大小为N1*N2,其中N1和N2可以相同也可以不同。将重建图像块与量化参数相乘后,输入增强模型。具体是,重建图像块的每一个像素点上乘以量化参数,得到N1*N2的矩阵,将该矩阵输入增强模型。
示例2，将重建图像块与量化参数拼接后，输入增强模型。具体是，将量化参数扩展为大小为N1*N2的矩阵，将大小为N1*N2的重建图像块与大小为N1*N2的量化参数矩阵进行拼接后，输入增强模型。
需要说明的是，除了上述示例1和示例2所示的融合方法外，解码端还可以采用其他的融合方法，将重建图像块和对应的量化参数进行融合后，输入增强模型进行质量增强。
在一些实施例中,为了防止绝对值较小的特征被绝对值较大的特征覆盖,则解码端在将上述重建图像块和量化参数输入增强模型之前,先对重建图像块和量化参数进行归一化处理,以使所有特征得到平等处理。接着,基于归一化处理后的重建图像块和量化参数,得到重建图像块的增强图像块。例如,将归一化处理后的重建图像块和量化参数拼接后,输入增强模型进行质量增强,以提高质量增强的效果。
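下面给出上述两种融合方式以及归一化处理的一个示意性实现（Python/PyTorch，仅为示意；其中张量形状、像素最大值255、QP最大值63等均为本示例所做的假设，并非对本申请方案的限定）：

    import torch

    def fuse_block_and_qp(rec_block: torch.Tensor, qp: int, mode: str = "concat") -> torch.Tensor:
        """rec_block：形状为(1, 1, H, W)的重建图像块（以亮度分量为例）。"""
        # 归一化处理：像素值与QP分别除以各自假设的最大值，使所有特征得到平等处理
        x = rec_block.float() / 255.0
        qp_plane = torch.full_like(x, float(qp) / 63.0)  # 将标量QP扩展为与块同尺寸的矩阵
        if mode == "mul":
            return x * qp_plane                  # 示例1：逐像素相乘
        return torch.cat([x, qp_plane], dim=1)   # 示例2：在通道维度上拼接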
本申请实施例的重建图像块为当前图像块在第一分量下的重建图像块。
其中,第一分量可以为亮度分量或色度分量。
在一些实施例中,上述S404包括如下S404-A和S404-B的步骤:
S404-A、基于量化参数,对重建图像块进行特征加权处理,得到重建图像的第一特征信息;
S404-B、根据第一特征信息,确定增强图像块。
本申请实施例对S404-A的具体实现方式不做限制。
在一种可能的实现方式中,基于量化参数,提取重建图像块的特征信息,例如,将量化参数和重建图像块输入一个神经网络层中,提取出重建图像块的特征信息,接着,对这些特征信息分析,为不同的特征分配不同的权重,例如为特征信息中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响,接着根据特征对应的权重对重建图像块的特征信息进行加权处理,得到该重建图像块的第一特征信息。
在另一种可能的实现方式中,如图6所示,本申请实施例的增强模型包括第一特征提取模块,该第一特征提取模块用于提取输入信息(即重建图像块和量化参数)的至少一个特征,并对提取的至少一个特征赋予不同的权重,以进行特征加权处理。
如图6所示,本申请实施例的增强模型包括第一特征提取模块,该第一特征提取模块用于对提取的特征进行加权处理。具体是,将重建图像块和量化参数(可选的,通过上述示例1或示例2的方法进行融合后)输入第一特征提取模块中,该第一特征提取模块进行特征提取,提取出至少一个特征,并为这至少一个特征中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响,接着根据至少一个特征对应的权重进行加权处理,得到该重建图像块的第一特征信息。最后,根据该重建图像块的第一特征信息,确定重建图像块的增强图像块。
本申请实施例可以为不同的特征分配不同的权重,以凸显出重要特征的影响,削弱不重要特征的影响,进一步提高了重建图像块的质量增强效果。
本申请实施例对第一特征提取模块的网络模型不做限制,例如该第一特征提取模块包括多个卷积层和注意力机制等。
在另一种可能的实现方式中,上述S404-A包括如下步骤:
S404-A1、基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息,i为从1到N的正整数,重复执行,得到重建图像块的第N个特征信息,若i为1时,则第i-1个特征信息为重建图像块;
S404-A2、根据第N个特征信息,确定重建图像块的第一特征信息。
在该实现方式中,解码端基于量化参数,对重建图像块进行N个特征加权迭代处理,得到重建图像块的第N个特征信息。具体是,基于量化参数,对重建图像块进行特征加权处理,得到重建图像块的第1个特征信息,接着,基于量化参数,对第1个特征信息进行特征加权处理,得到重建图像块的第2个特征信息,迭代进行,基于量化参数,对第N-1个特征信息进行特征加权处理,得到重建图像块的第N个特征信息。需要说明的是,本申请实施例对特征加权处理的具体方式不做限制,示例性的,将量化参数和第i-1个特征信息输入具有特征加权处理功能的神经网络中,得到重建图像块的第i个特征信息。
在一些实施例中,如图7所示,本申请实施例的第一特征提取模块包括N个第一特征提取单元,若N大于1时,这N个第一特征提取单元串联连接,即前一个第一特征提取单元的输出为下一个第一特征提取单元的输入。基于量化参数,通过这N个第一特征提取单元对重建图像块进行特征加权处理,得到重建图像块的第N个特征信息。如图7所示,这N个第一特征提取单元中,每个第一特征提取单元用于提取至少一个特征,并为这至少一个特征中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响。其中,后一个第一特征提取单元对前一个第一特征提取单元输出的特征信息再进行特征提取和权重分配,以进一步凸显重要特征,经过N个第一特征提取单元的处理,可以对重建图像块的重要特征进行主要增强,对重建图像块中的非重要特征进行较弱的增强,进而提升了重建图像块的增强效果。
在一种示例中，若N为1，即第一特征提取模块包括一个第一特征提取单元时，则上述S404-A1包括，解码端将重建图像块和量化参数进行融合后，输入该第一特征提取单元中，该第一特征提取单元进行特征提取，提取出至少一个特征，并根据至少一个特征中各特征的重要程度分配不同的权重，接着，根据不同的权重对至少一个特征进行加权处理，得到第一个特征信息。最后，根据第一个特征信息，确定重建图像块的第一特征信息，例如，将该第一个特征信息，确定为重建图像块的第一特征信息。
在另一种示例中，若N大于1，即第一特征提取模块包括多个第一特征提取单元时，则上述S404-A1包括，解码端将重建图像块和量化参数进行融合后，输入第一个第一特征提取单元中，该第一个第一特征提取单元进行特征加权处理，即提取至少一个特征，并为至少一个特征中的每个特征确定权重，再根据权重对至少一个特征进行加权，得到该第一个第一特征提取单元输出的特征信息，为了便于描述，将该特征信息记为第一个特征信息M_1。接着，将该第一个特征信息M_1输入第二个第一特征提取单元中进行特征加权处理，得到第二个特征信息M_2，依次类推，对于N个第一特征提取单元中的第i个第一特征提取单元，将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}，输入第i个第一特征提取单元中进行特征加权处理，得到第i个特征信息M_i，最后，得到第N个第一特征提取单元输出的第N个特征信息M_N。根据第N个特征信息M_N，确定重建图像块的第一特征信息，例如，将该第N个特征信息，确定为重建图像块的第一特征信息。
本申请实施例对第一特征提取单元的具体网络结构不做限制,例如,第一特征提取单元包括至少一个卷积层和注意力机制。
在一些实施例中,上述S404-A1中基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息包括如下步骤:
S404-A11、提取第i-1个特征信息的M个不同尺度的特征信息,M为大于1的正整数;
S404-A12、对M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;
S404-A13、根据第i个加权特征信息,确定第i个特征信息。
具体的，解码端对第i-1个特征信息进行多尺度特征提取，得到第i-1个特征信息的M个不同尺度的特征信息，接着，对M个不同尺度的特征信息进行加权处理，得到第i个加权特征信息，例如，根据特征的重要程度，为重要特征分配较大的权重，为不重要的特征分配较小的权重，再根据各特征的权重，对M个不同尺度的特征信息进行加权，得到重建图像块的第i个加权特征信息。最后，根据第i个加权特征信息，确定第i个特征信息，例如，将第i个加权特征信息，确定为第i个特征信息。
在一种示例中,若解码端通过第一特征提取单元提取重建图像块的第i个特征信息,则如图8所示,第一特征提取单元包括多尺度提取层,其中多尺度提取层用于提取多个尺度下的特征。
示例性的，如图8所示，以N个第一特征提取单元中的第i个第一特征提取单元为例，该第i个第一特征提取单元的输入为第i-1个第一特征提取单元的输出。将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}输入第i个第一特征提取单元中的多尺度提取层中，该多尺度提取层用于提取多尺度特征，例如提取M个不同尺度下的特征信息。接着，为多尺度提取层提取的M个不同尺度下的特征信息（D_1、D_2…D_M）中不同的特征分配不同的权重，并进行加权运算，得到第i个加权特征信息G_1。最后，根据该第i个加权特征信息G_1，确定第i个特征信息M_i，例如将该第i个加权特征信息G_1，作为第i个第一特征提取单元输出的第i个特征信息M_i。
由上述可知,本申请实施例的第一特征提取单元进行多尺度的特征提取,以更好的探索输入的重建图像块与真实图像块之间的关系,以进一步提升重建图像块的增强效果。
在一些示例中,上述多尺度提取层由卷积层和下采样层组合,例如,卷积层用于输出特征信息,下采样层用于对卷积层输出的特征信息进行下采样,得到M个不同尺度下的特征信息。
在另一种示例中,如图9所示,上述多尺度提取层包括M个不同尺度的第一特征提取层,每个第一特征提取层可以提取对应尺度下的特征信息。
在图9的基础上,上述S404-A11包括:通过M个不同尺度的第一特征提取层,提取第i-1个特征信息的M个不同尺度的特征信息D 1、D 2…D M
本申请实施例对上述第一特征提取层的具体网络结构不做限制。
在一些实施例中，上述第一特征提取层包括卷积层，且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
示例的，假设M=2，即第一特征提取单元包括2个第一特征提取层，假设一个第一特征提取层的卷积核的大小为3X3，另一个第一特征提取层的卷积核的大小为5X5，使用3X3和5X5的卷积核对输入的第i-1个特征信息M_{i-1}进行特征提取，得到特征信息C_1和特征信息C_2。
在一些实施例中,上述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
根据上述步骤，将第i-1个特征信息输入多尺度提取层，得到M个不同尺度的特征信息D_1、D_2…D_M，接着，执行S404-A12，对M个不同尺度的特征信息D_1、D_2…D_M进行加权处理，得到第i个加权特征信息G_1。
解码端可以将M个不同尺度的特征信息融合后进行加权处理,得到第i个加权特征信息。
本申请实施例对M个不同尺度的特征信息进行融合的具体方式不做限制,例如将M个不同尺度的特征信息进行相加或相乘等。
在一些实施例中,上述S404-A12包括:
S404-A12-1、将M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;对第一拼接特征信息进行加权处理,得到第i个加权特征信息。
具体的，是将M个不同尺度的特征信息D_1、D_2…D_M在通道上进行拼接后，得到第一拼接特征信息X，对X进行加权处理，得到第i个加权特征信息G_1，例如为X中的重要特征分配较大的权重，为不重要的特征分配较小的权重，再根据各特征的权重，对X中特征进行加权，得到第i个加权特征信息G_1。
本申请实施例对上述S404-A12中对M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息的具体实现方式不做限制。
在一种可能的实现方式中,上述S404-A12-1中对所述拼接特征信息进行加权处理,得到所述第i个加权特征信息 包括:
S404-A12-11、通过加权处理层对第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;
S404-A12-12、根据具有第一通道数的加权特征信息,得到第i个加权特征信息。
例如图8所示，第一特征提取单元还包括加权处理层，加权处理层用于对多个尺度下的特征进行加权处理。这样，解码端将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}输入多尺度提取层，例如输入M个不同尺度的第一特征提取层，这M个不同尺度的第一特征提取层输出M个特征信息D_1、D_2…D_M，将这M个特征信息D_1、D_2…D_M进行拼接后，输入加权处理层，加权处理层对拼接后的特征信息进行特征加权处理，得到具有第一通道数的加权特征信息Y。接着，根据具有第一通道数的加权特征信息Y，得到第i个加权特征信息。
在一些实施例中，由于对M个不同尺度的特征信息进行拼接后输入加权处理层，加权处理层输出的特征信息的通道数可能与第i-1个特征信息的通道数不同，因此，如图10所示，本申请实施例的第一特征提取单元还包括第二特征提取层，该第二特征提取层用于改变通道数。此时，解码端可以将具有第一通道数的加权特征信息Y输入第二特征提取层进行通道数的变化，输出第i个加权特征信息G_1。
在一些实施例中,上述N个第一特征提取单元中每个第一特征提取单元输出的特征信息的通道数可以相同,例如,第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
本申请实施例对上述加权处理层的具体网络结构不做限制,示例性的,加权处理层包括神经元注意力机制。
本申请实施例对上述第二特征提取层的网络结构也不做限制,例如包括1X1的卷积层。
根据上述步骤,得到第i个加权特征信息后,执行S404-A13,根据第i个加权特征信息,确定第i个特征信息。
在一种示例中,将第i个加权特征信息,确定为第i个特征信息。
在另一种示例中,将第i个加权特征信息与第i-1个特征信息之和,确定为第i个特征信息。
下面通过举例,对本申请实施例中,第i个第一特征提取单元的网络结构进行介绍。
如图11所示，本申请实施例的第i个第一特征提取单元包括多尺度提取层、加权处理层和第二特征提取层，其中多尺度提取层包括2个第一特征提取层，2个第一特征提取层的网络结构基本一致，均包括一个卷积层和一个激活函数，但是，2个第一特征提取层所包括的卷积核大小不一样，其中一个卷积核大小为3X3，另一个卷积核大小为5X5。示例性的，2个第一特征提取层包括的激活函数为ReLU，需要说明的是，该激活函数还可以是其他形式的激活函数。第二特征提取层包括一个卷积核大小为1X1的卷积层，以进行特征通道数的降低。
具体的，如图11所示，将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}分别输入2个第一特征提取层，2个第一特征提取层进行多尺度特征提取，输出特征信息C_1和特征信息C_2。接着，将特征信息C_1和特征信息C_2进行拼接后得到第一拼接特征信息X。
示例性的，上述特征信息C_1、C_2和X，通过如下公式(1)确定出：
C_1 = σ(W_1 * M_{i-1})，C_2 = σ(W_2 * M_{i-1})，X = Concat(C_1, C_2)    (1)
其中，σ表示ReLU激活函数，W_1和W_2表示3X3和5X5的卷积核，*表示卷积操作，Concat表示在通道维度上的拼接。需要说明的是，卷积核的大小，以及激活函数的类型可以根据实际需要进行更改。
接着，将第一拼接特征信息X输入加权处理层进行特征加权处理，具体是为重要的特征分配较大的权重以突出该特征的影响，为相对不重要的特征分配较小的权重以削弱该特征的影响。加权处理层输出具有第一通道数的加权特征信息Y，再将该特征信息Y输入第二特征提取层进行特征通道数的降低，具体是，Y通过1X1的卷积操作得到第i个加权特征信息D_3，实现通道上的特征数的减少，D_3与输入M_{i-1}进行相加得到第i个第一特征提取单元输出的第i个特征信息M_i。
示例性的，上述特征信息D_3和M_i，通过如下公式(2)确定出：
D_3 = σ(W_3 * Y)，M_i = D_3 + M_{i-1}    (2)
其中，σ表示ReLU激活函数，W_3表示1X1的卷积核，*表示卷积操作。需要说明的是，卷积核的大小，以及激活函数的类型可以根据实际需要进行更改。
本申请实施例对加权处理层的具体网络结构不做限制。
在一些实施例中,上述加权处理层包括神经元注意力机制。
示例性的，神经元注意力机制的网络结构如图12所示，包括深度卷积（Depthwise Conv）、点卷积（Pointwise Conv）和激活函数。具体的，通过Depthwise卷积操作，将拼接特征X在不同通道上进行卷积，接着经过ReLU激活函数，通过Pointwise卷积操作，使得不同特征图的信息得到融合。再接着，经过Sigmoid激活函数得到权重特征图V，V与X进行对应元素相乘，得到具有第一通道数的加权特征信息Y。
示例性的，上述具有第一通道数的加权特征信息Y，可以通过如下公式(3)确定出：
V = δ(W_p * σ(W_d * X))，Y_c = V_c ⊙ X_c    (3)
其中，Y_c表示特征信息Y在一个通道上的特征，X_c表示特征信息X在一个通道上的特征，σ表示ReLU激活函数，δ表示Sigmoid激活函数，W_d和W_p分别表示Depthwise卷积和Pointwise卷积，⊙表示对应元素相乘的操作。需要说明的是，激活函数的类型可以根据实际需要进行更改。
根据上述公式(3)确定出具有第一通道数的加权特征信息Y后，将该Y代入上述公式(2)中，确定出第i个特征信息M_i。
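结合上述公式(1)至公式(3)，下面给出加权处理层（神经元注意力机制）与第i个第一特征提取单元的一个示意性PyTorch实现（通道数、Depthwise卷积核大小等超参数均为本示例的假设，卷积核大小与激活函数可按实际需要更改）：

    import torch
    import torch.nn as nn

    class NeuronAttention(nn.Module):
        """加权处理层示意：Depthwise卷积 -> ReLU -> Pointwise卷积 -> Sigmoid -> 对应元素相乘。"""
        def __init__(self, channels: int):
            super().__init__()
            self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)  # W_d
            self.pw = nn.Conv2d(channels, channels, 1)                              # W_p
        def forward(self, x):
            v = torch.sigmoid(self.pw(torch.relu(self.dw(x))))  # 权重特征图V，见公式(3)
            return v * x                                        # Y = V ⊙ X

    class MSNAUnit(nn.Module):
        """第一特征提取单元示意：多尺度提取 + 加权处理 + 1X1卷积降通道 + 残差相加。"""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # W_1：3X3分支
            self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)  # W_2：5X5分支
            self.attn = NeuronAttention(2 * channels)
            self.reduce = nn.Conv2d(2 * channels, channels, 1)        # W_3：1X1降通道
        def forward(self, m_prev):
            c1 = torch.relu(self.conv3(m_prev))  # 公式(1)
            c2 = torch.relu(self.conv5(m_prev))
            x = torch.cat([c1, c2], dim=1)       # 第一拼接特征信息X
            y = self.attn(x)                     # 公式(3)
            d3 = torch.relu(self.reduce(y))      # 公式(2)：D_3 = σ(W_3 * Y)
            return d3 + m_prev                   # M_i = D_3 + M_{i-1}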
上述以N个第一特征提取单元中的第i个第一特征提取单元提取第i个特征信息为例,对于N个第一特征提取单元中的其他第一特征提取单元,参照上述第i个第一特征提取单元提取第i个特征信息的过程,进而可以得到最终的第N个第一特征提取单元提取的第N个特征信息。
接着,执行上述S404-A2的步骤,以根据第N个特征信息,确定重建图像块的第一特征信息。
其中,上述S404-A2的实现过程包括但不限于如下几种方式:
方式1,将第N个特征信息,确定为重建图像块的第一特征信息。
方式2,上述S404-A2包括S404-A2-1:根据第N个特征信息的前N-1个特征信息中的至少一个,以及第N个特征信息,得到重建图像块的第一特征信息。
由上述可知,解码端基于量化参数对重建图像块进行N次迭代特征加权处理,例如基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息,i为从1到N的正整数,重复执行,得到重建图像块的第N个特征信息。这样为了提高第一特征信息的获取准确性,解码端可以根据第N个特征信息的前N-1个特征信息中的至少一个,以及第N个特征信息,得到重建图像块的第一特征信息,例如,将前N-1个特征信息中的至少一个与第N个特征信息进行拼接后,再进行特征提取,得到重建图像块的第一特征信息。
在一些实施例中，如图13所示，第一特征提取模块除了包括N个第一特征提取单元外，还包括第二特征提取单元，该第二特征提取单元用于对前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息和第N个第一特征提取单元输出的第N个特征信息进行特征再提取，以得到更深层次的特征信息。具体的，为了防止随着网络的深入，当前图像块的重建图像块的特征丢失，则解码端将N个第一特征提取单元中前N-1个第一特征提取单元中的至少一个第一特征提取单元输出的特征信息，与第N个第一特征提取单元输出的第N个特征信息M_N输入该第二特征提取单元中，例如，将前N-1个第一特征提取单元中的至少一个第一特征提取单元输出的特征信息，与第N个特征信息M_N进行拼接后输入第二特征提取单元中。该第二特征提取单元进行更深层次的特征提取，得到重建图像块的第一特征信息F1。
在一些实施例中，上述S404-A2-1还包括：将前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息、第N个特征信息、重建图像块、量化参数拼接后，输入第二特征提取单元中，得到重建图像块的第一特征信息。
在该实施例中,为了进一步生成符合要求的第一特征信息,则将重建图像块和量化参数,与至少一个第一特征提取模块输出的特征信息和第N个特征信息一同输入第二特征提取单元中进行特征提取,使得特征提取过程中受到重建图像块和量化参数的监督,输出更符合要求的第一特征信息。
在一些实施例中,为了提高对重建图像块的第一特征信息的确定准确性,则本申请实施例先提取重建图像块的浅层特征信息(即第二特征信息),再根据第二特征信息,确定重建图像块的第一特征信息。
基于此,S404-A包括:基于量化参数,提取重建图像块的第二特征信息;对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息。
具体的,基于量化参数,对重建图像块进行浅层特征提取,得到重建图像块的第二特征信息,例如,将重建图像块和量化参数进行拼接,得到拼接信息,对拼接信息进行浅层特征提取,得到重建图像块的第二特征信息。接着,基于该第二特征信息,确定重建图像块的第一特征信息,例如,基于第二特征信息进行深度特征提取,得到重建图像块的第一特征信息。
本申请实施例对拼接信息进行特征提取,得到第二特征信息的具体方式不做限制。
在一种可能的实现方式中,通过第二特征提取模块对拼接信息进行特征提取,得到第二特征信息。
例如,如图14所示,本申请实施例的增强模型除了包括第一特征提取模块外,还包括第二特征提取模块,该第二特征提取模块用于提取重建图像块的浅层特征信息,并将提取的浅层特征信息输入第一特征提取模块进行深层次特征提取。具体的,如图14所示,解码端先将重建图像和量化参数进行拼接,得到拼接信息,接着将拼接信息输入第二特征提取模块进行浅层特征提取,得到重建图像块的第二特征信息C2。接着,将该浅层的第二特征信息C2输入第一特征提取模块中,得到重建图像块的第一特征信息F1。
此时,在一些实施例中,上述S404-A2-1包括:对前N-1个特征信息中的至少一个、第N个特征信息以及第二特征信息拼接,得到第二拼接特征信息;对第二拼接特征信息进行特征再提取,得到重建图像块的第一特征信息。
示例性的，如图15所示，将前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息、第N个特征信息M_N以及第二特征信息C2拼接后，输入第二特征提取单元中，得到重建图像块的第一特征信息F1。
本申请实施例对第二特征提取模块的具体网络结构不做限制。
在一种示例中,上述第二特征提取模块包括至少一个卷积层。
示例性的,若第二特征提取模块包括两个卷积层,则解码端通过两个卷积层,得到重建图像块的第二特征信息。
下面以第二特征提取模块包括两个卷积层为例,对确定第二特征信息的过程进行介绍。
举例说明，如图16所示，解码端将重建图像块进行归一化处理，得到归一化处理后的重建图像块R'；同理，对量化参数QP进行归一化处理，得到归一化处理后的量化参数Q'。接着，将R'与Q'拼接后，输入第二特征提取模块，第一个卷积层输出特征C_1，将该特征C_1输入第二个卷积层，第二个卷积层输出第二特征信息C_2。
在一种示例中，上述第二特征信息C_2，可以通过如下公式(4)确定出：
C_1 = σ(W_1 * (R' · Q'))，C_2 = σ(W_2 * C_1)    (4)
其中，σ表示ReLU激活函数，W_1和W_2表示3X3的卷积核，*表示卷积操作，·表示拼接运算。需要说明的是，卷积核的大小，以及激活函数的类型可以根据实际需要进行更改。
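作为上述第二特征提取模块的一个示意性实现（输入通道数2对应拼接后的重建块与QP矩阵，通道数64为本示例的假设），可以写为：

    import torch
    import torch.nn as nn

    class ShallowFeature(nn.Module):
        """第二特征提取模块示意：两个3X3卷积层，按公式(4)提取浅层特征C_2。"""
        def __init__(self, in_ch: int = 2, channels: int = 64):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, channels, 3, padding=1)     # W_1
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # W_2
        def forward(self, x):                   # x：归一化并拼接后的输入
            c1 = torch.relu(self.conv1(x))      # C_1
            return torch.relu(self.conv2(c1))   # 第二特征信息C_2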
根据上述方法,确定出重建图像块的第二特征信息后,对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息,根据重建图像块的第一特征信息,确定重建图像块的增强图像块。
本申请实施例对上述S404-B中根据重建图像块的第一特征信息,确定重建图像块的增强图像块的具体方式不做限制。
在一些实施例中,若重建图像块的第一特征信息与重建图像块的大小一致时,则可以将该第一特征信息,确定为增强图像块。
在一些实施例中,上述S404-B包括:对重建图像块的第一特征信息进行非线性映射,得到增强图像块。
本申请实施例对S404-B中对重建图像块的第一特征信息进行非线性映射,得到增强图像块的具体方式不做限制。
例如,使用非线性映射方式,对重建图像块的第一特征信息进行处理,以使处理后的重建图像块的第一特征信息的大小与重建图像块的大小一致,进而将处理后的重建图像块的第一特征信息作为增强图像块。
在一些实施例中,如图17所示,本申请实施例的增强模型还包括重建模块,该重建模块用于对第一特征提取模块提取的第一特征信息做进一步非线性映射。
在图17的基础上,上述S404-B包括:通过重建模块,对重建图像块的第一特征信息非线性映射,得到增强图像块。
本申请实施例对重建模块的网络模型不做限制。
在一些实施例中,重建模块包括至少一个卷积层。
示例性的，如图18所示，重建模块包括2个卷积层，解码端将第一特征提取模块输出的重建图像块的第一特征信息F1输入重建模块，经过两个卷积层的卷积操作，得到重建图像块的增强图像块O1。
在一种示例中，增强图像块O1可以通过如下公式(5)确定出：
O1 = W_5 * σ(W_4 * σ(W_3 * F1))    (5)
其中，F1表示重建图像的第一特征信息，σ表示ReLU激活函数，W_3、W_4和W_5表示3X3的卷积核，*表示卷积操作。需要说明的是，卷积核的大小，以及激活函数的类型可以根据实际需要进行更改。
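作为重建模块的一个示意性实现（按图18取两个卷积层，输出通道数按亮度分量取1，均为示意性假设），可以写为：

    import torch
    import torch.nn as nn

    class ReconModule(nn.Module):
        """重建模块示意：对第一特征信息F1做非线性映射，输出增强图像块。"""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, 1, 3, padding=1)
        def forward(self, f1):
            return self.conv2(torch.relu(self.conv1(f1)))  # 增强图像块O1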
本申请实施例,解码端通过上述步骤,基于量化参数,对重建图像块进行质量增强,得到重建图像块的增强图像块。
在一些实施例中,解码端在对重建图像块进行质量增强之前,首先需要判断是否允许对该重建图像块进行质量增强。也就是说,解码端在确定对重建图像块进行质量增强时带来的效果大于不增强的效果时,对重建图像块进行质量增强。
其中,解码端判断是否允许对该重建图像块进行质量增强的方式包括但不限于如下几种:
方式1,解码码流,得到第一标志,该第一标志用于指示是否允许对当前图像块的重建图像块进行质量增强。
在该方式1中,编码端判断是否对当前图像块的重建图像块进行质量增强,并通过该第一标志将编码端的判断结果通知给解码端,以使解码端与编码端采用一致的图像增强策略。具体是,若编码端对当前图像块的重建图像块进行质量增强时,则将第一标志的值置为第一数值,例如置为1,若编码端不对当前图像块的重建图像块进行质量增强时,则将第一标志的值置为第二数值,例如置为0。这样,解码端通过解码码流,首先得到该第一标志,并根据该第一标志判断是否允许对当前图像块的重建图像块进行质量增强,例如,若第一标志的取值为1时,则解码端确定使用本申请实施例的方法,对重建图像块进行质量增强,即基于量化系数,对重建图像块进行质量增强。若第一标志的取值为0时,则解码端确定不对重建图像块进行质量增强,而是使用已有的环路滤波方式,对重建图像块进行滤波处理。
可选的,上述第一标志可以为序列级标志。
可选的,上述第一标志可以为帧级标志。
可选的,上述第一标志可以为片级标志。
可选的,上述第一标志可以为块级标志,例如为CTU级标志或者为CU级标志。
方式2,解码端自行判断是否对该重建图像块进行质量增强。
具体的，解码端首先基于量化参数，对重建图像块进行质量增强，得到测试增强图像块，接着，解码端确定该测试增强图像块和未增强的重建图像块分别对应的图像质量，若测试增强图像块的图像质量大于重建图像块的图像质量时，说明采用本申请实施例的增强方法可以实现显著的增强效果，此时，解码端将上述确定的测试增强图像块确定为重建图像块的增强图像块直接输出进行显示，和/或将上述确定的测试增强图像块保存在解码图像缓存中，作为后续图像块的帧内参考。
若上述测试增强图像块的图像质量小于或等于重建图像块的图像质量时,说明使用本申请实施例的增强方法无法实现显著的增强效果,此时,对重建图像块进行环路滤波后直接输出进行显示,和/或将环路滤波后的重建图像块保存在解码图像缓存中,作为后续图像块的帧内参考。
本申请实施例对确定图像质量的方式不做限制，例如通过峰值信噪比（Peak Signal-to-Noise Ratio，PSNR）或结构相似性（Structural Similarity，SSIM）作为评价图像质量的指标。
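以PSNR为例，上述质量判断可以按如下示意方式实现（以能够获得参考图像块为前提，例如在编码端；函数接口与峰值取值均为本示例的假设）：

    import math
    import torch

    def psnr(a: torch.Tensor, b: torch.Tensor, peak: float = 255.0) -> float:
        mse = torch.mean((a.float() - b.float()) ** 2).item()
        return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

    def choose_block(rec, enhanced, reference):
        # 若测试增强图像块质量更高则采用增强图像块，否则退回原重建图像块
        return enhanced if psnr(enhanced, reference) > psnr(rec, reference) else rec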
在一些实施例中,上述重建图像块为经过环路滤波处理后的重建图像块。例如,解码端确定出当前图像块的预测块,以及当前图像块的残差块,将残差块与预测块相加,得到重建图像块。再采用环路滤波器对重建图像块进行滤波处理,将滤波处理后的重建图像块输入增强模型中进行质量增强。
在一些实施例中,本申请实施例可以使用增强模型对重建图像块进行质量增强后,再经过环路滤波处理。
在一些实施例中,本申请实施例对重建图像块通过增强模型进行质量增强后,不再进行环路滤波处理。
在一些实施例中，本申请实施例使用增强模型对重建图像块进行质量增强后，可以将增强图像块进行显示，且将增强图像块存放在解码图像缓存中，作为其他图像块的参考。
可选的，解码端还可以将增强图像块进行显示，将未增强的重建图像块存放在解码图像缓存中，作为其他图像块的参考。
可选的，解码端还可以将重建图像块进行显示，将增强图像块存放在解码图像缓存中，作为其他图像块的参考。
需要说明的是,在使用上述增强模型进行质量增强之前,首先需要对增强模型进行训练。
在一些实施例中,增强模型的训练可以由其他设备完成,解码端直接使用训练好的增强模型进行质量增强。
在一些实施例中,增强模型的训练可以由解码端完成,例如,解码端使用训练数据,对增强模型进行训练,并使用训练好的增强模型进行质量增强。
在一些实施例中，增强模型的训练集由超分重建DIV2K数据集的800张用于训练的图像构成。使用VTM8.2（Versatile Video Coding and Test Model 8.2，VVC测试平台8.2版本），在量化参数QP设置为22、27、32、37，AI模式且关闭环路滤波LMCS、DB、SAO和ALF的配置下对这800张图片进行编码，共得到3200张编码图像。这3200张编码图像作为网络模型的输入，其对应的未编码的原始图像作为真实值，构成最终的训练集。
可选的,为了扩充数据集的多样性,在加载训练集时,采用随机裁剪的方式,在每个图像中随机裁剪出大小为128X128的图像块作为增强模型的输入。
可选的，增强模型的初始学习率设为1×10^-2，每隔30个迭代（epoch）学习率下降为原来的1/2。示例性的，最终在Pytorch1.6.0平台上完成了训练。
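上述训练配置可以按如下方式示意（优化器类型与损失函数在本文中未给出，Adam与L1损失均为本示例的假设；NACNN模型的定义参见后文示意代码，num_epochs与train_loader为示意变量）：

    import torch

    model = NACNN()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)   # 初始学习率1×10^-2
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)  # 每30个epoch减半
    criterion = torch.nn.L1Loss()

    for epoch in range(num_epochs):
        # rec_patch:(B,1,128,128)的重建图像块；qp_map:已归一化的同形状QP矩阵；gt_patch:原始图像块
        for rec_patch, qp_map, gt_patch in train_loader:
            loss = criterion(model(rec_patch, qp_map), gt_patch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()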
本申请实施例提供的图像处理方法，解码码流，得到当前图像块的量化系数；确定当前图像块对应的量化参数，并基于量化参数，对量化系数进行反量化，得到当前图像块的变换系数；根据变换系数，确定当前图像块的重建图像块；基于量化参数，对重建图像块进行质量增强，得到增强图像块。由于不同的图像块对应的量化参数可能不同，为了提高图像块的增强准确性，本申请基于量化参数对重建图像块进行质量增强，可以提高增强效果。另外，本申请以图像块为单位进行图像质量增强，这样使用增强后的图像块作为帧内预测中其他图像块的参考块时，可以提供更精准的参考，进而可以提高帧内预测的准确性。
图19为本申请一实施例提供的图像处理方法流程示意图。图19可以理解为图4所示的图像处理方法的一种更为具体方式,如图19所示,本申请实施例的图像处理方法包括如下步骤:
S501、解码码流,得到当前图像块的量化系数;
S502、确定当前图像块对应的量化参数,并基于所述量化参数,对所述量化系数进行反量化,得到所述当前图像块的变换系数;
S503、根据变换系数,确定当前图像块的重建图像块。
上述S501至S503的具体实现过程可以参照上述S401至S403的描述,在此不再赘述。
S504、判断是否对当前图像块的重建图像块进行质量增强。
方式一,解码码流,得到第一标志,根据第一标志确定是否允许对当前图像块的重建图像块进行质量增强。
方式二,解码端基于量化参数,对重建图像块进行质量增强,得到测试增强图像块;确定测试增强图像块的第一图像质量和重建图像块的第二图像质量,根据第一图像质量和第二图像质量,确定是否对当前图像块的重建图像块进行质量增强。
若解码端确定对当前图像块的重建图像块进行质量增强时,则执行如下S505的步骤。
若解码端确定不对当前图像块的重建图像块进行质量增强时,则执行如下S508的步骤。
S505、基于量化参数,对重建图像块进行特征提取,得到重建图像块的第二特征信息。
S506、对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息。
S507、对重建图像块的第一特征信息非线性映射,得到增强图像块。
在一些实施例中,使用增强模型对重建图像块进行质量增强,得到增强图像块。
示例性的,如图18所示,本申请实施例的增强模型包括第二特征提取模块、第一特征提取模块和重建模块,其中第二特征提取模块用于提取浅层特征(即重建图像块的第二特征信息),第一特征提取模块用于提取深层特征(即重建图像块的第一特征信息),重建模块用于对深层特征进行非线性映射,得到最终的增强图像块。
示例性的，图18中示出的第二特征提取模块包括两个卷积层，但是，本申请实施例提供的第二特征提取模块的结构包括但不限于图18所示。
示例性的，图18示出了第一特征提取模块包括N个第一特征提取单元和一个第二特征提取单元，且N个第一特征提取单元串联连接，且N个第一特征提取单元的输出以及第二特征提取模块的输出均输入至第二特征提取单元中。但是，本申请实施例提供的第一特征提取模块的网络结构包括但不限于图18所示。
示例性的，图18示出了重建模块包括两个卷积层，但是本申请实施例提供的重建模块的网络结构包括但不限于图18所示。
在一些实施例中，本申请实施例的第一特征提取单元包括多尺度提取层和神经元注意力机制，其中，多尺度提取层用于进行多尺度的特征提取，神经元注意力机制用于特征加权。此时，本申请实施例的第一特征提取单元也可以称为多尺度神经元注意（Multi-scale and Neuron Attention，简称MSNA）单元。
在一些实施例中，本申请实施例的增强模型也称为基于神经元注意力机制的神经网络模型（Neuron Attention-based CNN，简称NACNN）。
具体的，如图18所示，解码端将重建图像块和量化参数进行归一化处理后，进行拼接，将拼接结果输入增强模型，第二特征提取模块中的第一个卷积层对重建图像块和量化参数进行卷积操作，得到特征信息C1，再将特征信息C1输入第二个卷积层中进行卷积操作，得到第二特征信息C2。解码端将第二特征信息C2输入第一特征提取模块中，第一特征提取模块中的第一个第一特征提取单元对C2进行多尺度特征提取和特征加权，得到第一个特征信息M1。接着，将第一个特征信息M1输入第二个第一特征提取单元中进行多尺度特征提取和特征加权，得到第二个特征信息M2。依次类推，得到第N个第一特征提取单元输出的第N个特征信息MN。解码端将第一特征提取模块中各单元提取的N个特征信息M1、M2……MN，以及第二特征提取模块输出的第二特征信息C2进行拼接后，输入第一特征提取模块中的第二特征提取单元中进行通道数的变化，输出重建图像块的第一特征信息F1。然后，解码端将第一特征信息F1输入重建模块中进行重建，具体是，经过重建模块中的第一个卷积层进行卷积操作，得到特征信息C3，重建模块中的第二个卷积层对特征信息C3进行卷积操作，得到特征信息C4。最后，根据特征信息C4，得到重建图像块的增强图像块O1，例如，将特征信息C4输出为重建图像块的增强图像块O1。
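与上述流程对应，增强模型的整体组装可以示意如下（沿用前文ShallowFeature、MSNAUnit、ReconModule的示意实现；单元个数N、通道数、像素与QP的最大值等均为本示例的假设）：

    import torch
    import torch.nn as nn

    class NACNN(nn.Module):
        """增强模型整体示意：浅层特征提取 + N个MSNA单元 + 1X1卷积聚合 + 重建模块。"""
        def __init__(self, n_units: int = 4, channels: int = 64):
            super().__init__()
            self.shallow = ShallowFeature(in_ch=2, channels=channels)
            self.units = nn.ModuleList(MSNAUnit(channels) for _ in range(n_units))
            # 第二特征提取单元：对拼接后的(C2, M1…MN)做通道数变换，得到F1
            self.fuse = nn.Conv2d(channels * (n_units + 1), channels, 1)
            self.recon = ReconModule(channels)

        def forward(self, rec_block, qp):
            x = rec_block.float() / 255.0                  # 归一化（最大像素值255为示意假设）
            if not torch.is_tensor(qp):
                qp = torch.full_like(x, float(qp) / 63.0)  # 标量QP扩展为同尺寸矩阵并归一化
            c2 = self.shallow(torch.cat([x, qp], dim=1))   # 第二特征信息C2
            feats, m = [c2], c2
            for unit in self.units:
                m = unit(m)                                # M1、M2……MN
                feats.append(m)
            f1 = self.fuse(torch.cat(feats, dim=1))        # 第一特征信息F1
            return self.recon(f1)                          # 增强图像块O1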
需要说明的是,上述S505至S507的具体实现过程可以参照上述S404的具体描述,在此不再赘述。
S508、对重建图像块进行环路滤波。
在一些实施例中,若解码端确定不使用增强模型对当前图像块的重建图像块进行质量增强时,则执行S508的步骤,对重建图像块进行环路滤波。
本申请实施例提供的图像处理方法，解码端在对重建图像块进行质量增强之前，首先判断是否对重建图像块进行质量增强，进而提高了图像处理的可靠性。
上文以解码端为例,对本申请实施例的图像处理方法进行介绍,在此基础上,下面以编码端为例,对本申请实施例提供的图像处理方法进行介绍。
图20为本申请一实施例提供的图像处理方法流程示意图,本申请实施例应用于图1和图2所示编码器。如图20所示,本申请实施例的方法包括:
S601、确定当前图像块的量化参数,并基于量化参数对当前图像块进行编码,得到当前图像块的量化系数;
S602、基于当前图像块的量化参数,对量化系数进行反量化,得到当前图像块的残差块;
S603、根据残差块,得到当前图像块的重建图像块。
在图像编码过程中,编码器接收视频流,该视频流由一系列图像帧组成,针对视频流中的每一帧图像进行视频编码,视频编码器对图像帧进行块划分,得到当前编码块。
本申请实施例对当前图像块的具体大小不做限制。
在一些实施例中,本申请实施例的当前图像块为CTU,例如,将一帧图像划分成若干个CTU,且本申请对CTU的大小不做限制,例如一个CTU大小为128×128、64×64、32×32等。
在一些实施例中,本申请实施例的当前图像块为CU,例如,将一个CTU划分为一个或多个CU。
在一些实施例中,本申请实施例的当前图像块为TU或PU,例如,将一个CU划分为一个或多个TU或PU。
在一些实施例中,本申请实施例的当前图像块只包括色度分量,可以理解为色度块。
在一些实施例中,本申请实施例的当前图像块只包括亮度分量,可以理解为亮度块。
在一些实施例中，该当前图像块既包括亮度分量又包括色度分量。
在一些实施例中,若当前图像块包括一个CU时,本申请实施例的编码过程为,对当前图像帧进行块划分,得到当前图像块,采用帧内或帧间预测方法,对当前图像块进行预测,得到当前图像块的预测值。当前图像块的原始值与预测值相减,得到当前图像块的残差值。确定当前图像块对应的变换方式,使用该变换方式对残差值进行变换,得到变换系数。使用确定的量化参数,对变换系数进行量化,得到量化系数,对量化系数进行编码,得到码流。
同时,编码端还包括解码过程,具体是,如图2所示,反量化/变换单元240基于确定的量化参数对量化系数进行反量化,得到变换系数,再对变换系数进行反变换得到残差块。重建单元250将预测块和残差块相加得到当前图像块的重建图像块。
在一些实施例中，若当前图像块包括多个CU时，则编码端对当前图像帧进行块划分，得到多个CU，针对每个CU根据上述方法，可以得到每个CU的重建块。这样，将当前图像块所包括的CU的重建块进行组合，得到当前图像块的重建图像块。
需要说明的是,若当前图像块包括多个CU时,则当前图像块的量化参数包括这多个CU对应的量化参数。可选的,这多个CU对应的量化参数可以相同,也可以不同。
在一种可能的实现方式中,本申请实施例中当前图像块的量化参数可以以矩阵的形式表示,例如当前图像块的大小为16X16,则当前图像块的量化参数可以为一个16X16的矩阵,该矩阵中的每一个元素为当前图像块中对应位置 的像素点的量化参数。
本申请实施例对确定当前图像块对应的量化参数的具体过程不做限制。
在一些实施例中,编解码将默认的量化参数作为当前图像块对应的量化参数。
在一些实施例中,编码端通过计算,确定出当前图像块对应的量化参数。可选的,此时,编码端可以将确定的量化参数写入码流,这样,解码端可以通过解码码流,确定出当前图像块对应的量化参数。
S604、基于量化参数,对重建图像块进行质量增强,得到增强图像块。
在视频编码过程中,不同的图像块所对应的量化参数QP可能不同。在一些实施例中,量化参数QP包括量化步长,在视频编码过程中,对图像块的变换系数进行量化,其中量化步长越大,图像损失越大,量化步长越小,图像损失越小。因此,本申请实施例为了提高对当前图像块的增强效果,在当前图像块的质量增强过程中,考虑该当前图像块对应的量化参数QP的影响。
由于不同的量化参数对图像块进行反量化时的损失也不同，为了提高图像块的增强效果，本申请实施例基于当前图像块对应的量化参数，对当前图像块的重建图像块进行质量增强，可以提高重建图像块的增强效果。
本申请实施例,以图像块为单位进行图像质量增强,这样使用增强后的图像块作为帧内预测中其他图像块的参考块时,可以提供更精准的参考,进而可以提高帧内预测的准确性。
另外,本申请实施例以图像块为单位进行图像质量增强,相比于对整帧图像进行图像质量增强,可以更加注重图像块中较精细特征的增强,进一步提升图像块的增强质量。
本申请实施例,对基于量化参数,对重建图像块进行质量增强,得到增强图像块的方式不做限制。
在一种实施例中，编码端基于量化参数，通过增强模型对重建图像块进行质量增强，得到增强图像块。具体的，编码端根据上述步骤，确定出当前图像块的重建图像块后，为了降低重建图像块的失真，提升重建图像块的质量，如图5所示，将该重建图像块和对应的量化参数输入预先训练好的增强模型中，进行图像增强，最终得到该当前图像块的增强图像块。需要说明的是，该增强模型是基于不同的图像块和其对应的量化参数训练得到的，充分学习了量化参数对图像块质量增强的影响，进而可以基于不同量化参数，对经过不同量化参数反量化后得到的重建图像块进行高质量增强。本申请实施例的增强模型可以为任一可以对图像块进行质量增强的神经网络模型，本申请实施例对增强模型的具体网络模型不做限制。
在一些实施例中,编码端将重建图像块和量化参数进行融合后,输入增强模型。其中,重建图像块和量化参数的融合方法至少包括如下几种示例:
示例1,假设上述重建图像块的大小为N1*N2,其中N1和N2可以相同也可以不同。将重建图像块与量化参数相乘后,输入增强模型。具体是,重建图像块的每一个像素点上乘以量化参数,得到N1*N2的矩阵,将该矩阵输入增强模型。
示例2，将重建图像块与量化参数拼接后，输入增强模型。具体是，将量化参数扩展为大小为N1*N2的矩阵，将大小为N1*N2的重建图像块与大小为N1*N2的量化参数矩阵进行拼接后，输入增强模型。
需要说明的是，除了上述示例1和示例2所示的融合方法外，编码端还可以采用其他的融合方法，将重建图像块和对应的量化参数进行融合后，输入增强模型进行质量增强。
在一些实施例中,为了防止绝对值较小的特征被绝对值较大的特征覆盖,则编码端在将上述重建图像块和量化参数输入增强模型之前,先对重建图像块和量化参数进行归一化处理,以使所有特征得到平等处理。接着,基于归一化处理后的重建图像块和量化参数,得到重建图像块的增强图像块。例如,将归一化处理后的重建图像块和量化参数拼接后,输入增强模型进行质量增强,以提高质量增强的效果。
本申请实施例的重建图像块为当前图像块在第一分量下的重建图像块。
其中,第一分量可以为亮度分量或色度分量。
在一些实施例中,上述S604包括如下S604-A和S604-B的步骤:
S604-A、基于量化参数,对重建图像块进行特征加权处理,得到重建图像的第一特征信息;
S604-B、根据第一特征信息,确定增强图像块。
本申请实施例对S604-A的具体实现方式不做限制。
在一种可能的实现方式中,基于量化参数,提取重建图像块的特征信息,例如,将量化参数和重建图像块输入一个神经网络层中,提取出重建图像块的特征信息,接着,对这些特征信息分析,为不同的特征分配不同的权重,例如为特征信息中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响,接着根据特征对应的权重对重建图像块的特征信息进行加权处理,得到该重建图像块的第一特征信息。
在另一种可能的实现方式中,如图6所示,本申请实施例的增强模型包括第一特征提取模块,该第一特征提取模块用于提取输入信息(即重建图像块和量化参数)的至少一个特征,并对提取的至少一个特征赋予不同的权重,以进行特征加权处理。
如图6所示,本申请实施例的增强模型包括第一特征提取模块,该第一特征提取模块用于对提取的特征进行加权处理。具体是,将重建图像块和量化参数(可选的,通过上述示例1或示例2的方法进行融合后)输入第一特征提取模块中,该第一特征提取模块进行特征提取,提取出至少一个特征,并为这至少一个特征中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响,接着根据至少一个特征对应的权重进行加权处理,得到该重建图像块的第一特征信息。最后,根据该重建图像块的第一特征信息,确定重建图像块的增强图像块。
本申请实施例可以为不同的特征分配不同的权重,以凸显出重要特征的影响,削弱不重要特征的影响,进一步提高了重建图像块的质量增强效果。
本申请实施例对第一特征提取模块的网络模型不做限制,例如该第一特征提取模块包括多个卷积层和注意力机制等。
在另一种可能的实现方式中,上述S604-A包括如下步骤:
S604-A1、基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息,i为从1到N的正整数,重复执行,得到重建图像块的第N个特征信息,若i为1时,则第i-1个特征信息为重建图像块;
S604-A2、根据第N个特征信息,确定重建图像块的第一特征信息。
在该实现方式中,编码端基于量化参数,对重建图像块进行N个特征加权迭代处理,得到重建图像块的第N个特征信息。具体是,基于量化参数,对重建图像块进行特征加权处理,得到重建图像块的第1个特征信息,接着,基于量化参数,对第1个特征信息进行特征加权处理,得到重建图像块的第2个特征信息,迭代进行,基于量化参数,对第N-1个特征信息进行特征加权处理,得到重建图像块的第N个特征信息。需要说明的是,本申请实施例对特征加权处理的具体方式不做限制,示例性的,将量化参数和第i-1个特征信息输入具有特征加权处理功能的神经网络中,得到重建图像块的第i个特征信息。
在一些实施例中,如图7所示,本申请实施例的第一特征提取模块包括N个第一特征提取单元,若N大于1时,这N个第一特征提取单元串联连接,即前一个第一特征提取单元的输出为下一个第一特征提取单元的输入。基于量化参数,通过这N个第一特征提取单元对重建图像块进行特征加权处理,得到重建图像块的第N个特征信息。如图7所示,这N个第一特征提取单元中,每个第一特征提取单元用于提取至少一个特征,并为这至少一个特征中的重要特征分配较大的权重以突出该特征的影响,为相对不重要的特征分配较小的权重以削弱该特征的影响。其中,后一个第一特征提取单元对前一个第一特征提取单元输出的特征信息再进行特征提取和权重分配,以进一步凸显重要特征,经过N个第一特征提取单元的处理,可以对重建图像块的重要特征进行主要增强,对重建图像块中的非重要特征进行较弱的增强,进而提升了重建图像块的增强效果。
在一种示例中，若N为1，即第一特征提取模块包括一个第一特征提取单元时，则上述S604-A1包括，编码端将重建图像块和量化参数进行融合后，输入该第一特征提取单元中，该第一特征提取单元进行特征提取，提取出至少一个特征，并根据至少一个特征中各特征的重要程度分配不同的权重，接着，根据不同的权重对至少一个特征进行加权处理，得到第一个特征信息。最后，根据第一个特征信息，确定重建图像块的第一特征信息，例如，将该第一个特征信息，确定为重建图像块的第一特征信息。
在另一种示例中，若N大于1，即第一特征提取模块包括多个第一特征提取单元时，则上述S604-A1包括，编码端将重建图像块和量化参数进行融合后，输入第一个第一特征提取单元中，该第一个第一特征提取单元进行特征加权处理，即提取至少一个特征，并为至少一个特征中的每个特征确定权重，再根据权重对至少一个特征进行加权，得到该第一个第一特征提取单元输出的特征信息，为了便于描述，将该特征信息记为第一个特征信息M_1。接着，将该第一个特征信息M_1输入第二个第一特征提取单元中进行特征加权处理，得到第二个特征信息M_2，依次类推，对于N个第一特征提取单元中的第i个第一特征提取单元，将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}，输入第i个第一特征提取单元中进行特征加权处理，得到第i个特征信息M_i，最后，得到第N个第一特征提取单元输出的第N个特征信息M_N。根据第N个特征信息M_N，确定重建图像块的第一特征信息，例如，将该第N个特征信息，确定为重建图像块的第一特征信息。
本申请实施例对第一特征提取单元的具体网络结构不做限制,例如,第一特征提取单元包括至少一个卷积层和注意力机制。
在一些实施例中,上述S604-A1中基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息包括如下步骤:
S604-A11、提取第i-1个特征信息的M个不同尺度的特征信息,M为大于1的正整数;
S604-A12、对M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;
S604-A13、根据第i个加权特征信息,确定第i个特征信息。
具体的，编码端对第i-1个特征信息进行多尺度特征提取，得到第i-1个特征信息的M个不同尺度的特征信息，接着，对M个不同尺度的特征信息进行加权处理，得到第i个加权特征信息，例如，根据特征的重要程度，为重要特征分配较大的权重，为不重要的特征分配较小的权重，再根据各特征的权重，对M个不同尺度的特征信息进行加权，得到重建图像块的第i个加权特征信息。最后，根据第i个加权特征信息，确定第i个特征信息，例如，将第i个加权特征信息，确定为第i个特征信息。
在一种示例中,若编码端通过第一特征提取单元提取重建图像块的第i个特征信息,则如图8所示,第一特征提取单元包括多尺度提取层,其中多尺度提取层用于提取多个尺度下的特征。
示例性的，如图8所示，以N个第一特征提取单元中的第i个第一特征提取单元为例，该第i个第一特征提取单元的输入为第i-1个第一特征提取单元的输出。将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}输入第i个第一特征提取单元中的多尺度提取层中，该多尺度提取层用于提取多尺度特征，例如提取M个不同尺度下的特征信息。接着，为多尺度提取层提取的M个不同尺度下的特征信息（D_1、D_2…D_M）中不同的特征分配不同的权重，并进行加权运算，得到第i个加权特征信息G_1。最后，根据该第i个加权特征信息G_1，确定第i个特征信息M_i，例如将该第i个加权特征信息G_1，作为第i个第一特征提取单元输出的第i个特征信息M_i。
由上述可知,本申请实施例的第一特征提取单元进行多尺度的特征提取,以更好的探索输入的重建图像块与真实图像块之间的关系,以进一步提升重建图像块的增强效果。
在一些示例中,上述多尺度提取层由卷积层和下采样层组合,例如,卷积层用于输出特征信息,下采样层用于对卷积层输出的特征信息进行下采样,得到M个不同尺度下的特征信息。
在另一种示例中,如图9所示,上述多尺度提取层包括M个不同尺度的第一特征提取层,每个第一特征提取层可以提取对应尺度下的特征信息。
在图9的基础上,上述S604-A11包括:通过M个不同尺度的第一特征提取层,提取第i-1个特征信息的M个不同尺度的特征信息D 1、D 2…D M
本申请实施例对上述第一特征提取层的具体网络结构不做限制。
在一些实施例中，上述第一特征提取层包括卷积层，且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
示例的，假设M=2，即第一特征提取单元包括2个第一特征提取层，假设一个第一特征提取层的卷积核的大小为3X3，另一个第一特征提取层的卷积核的大小为5X5，使用3X3和5X5的卷积核对输入的第i-1个特征信息M_{i-1}进行特征提取，得到特征信息C_1和特征信息C_2。
在一些实施例中,上述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
根据上述步骤，将第i-1个特征信息输入多尺度提取层，得到M个不同尺度的特征信息D_1、D_2…D_M，接着，执行S604-A12，对M个不同尺度的特征信息D_1、D_2…D_M进行加权处理，得到第i个加权特征信息G_1。
编码端可以将M个不同尺度的特征信息融合后进行加权处理,得到第i个加权特征信息。
本申请实施例对M个不同尺度的特征信息进行融合的具体方式不做限制,例如将M个不同尺度的特征信息进行相加或相乘等。
在一些实施例中,上述S604-A12包括:
S604-A12-1、将M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;对第一拼接特征信息进行加权处理,得到第i个加权特征信息。
具体的，是将M个不同尺度的特征信息D_1、D_2…D_M在通道上进行拼接后，得到第一拼接特征信息X，对X进行加权处理，得到第i个加权特征信息G_1，例如为X中的重要特征分配较大的权重，为不重要的特征分配较小的权重，再根据各特征的权重，对X中特征进行加权，得到第i个加权特征信息G_1。
本申请实施例对上述S604-A12中对M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息的具体实现方式不做限制。
在一种可能的实现方式中,上述S604-A12-1中对所述拼接特征信息进行加权处理,得到所述第i个加权特征信息包括:
S604-A12-11、通过加权处理层对第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;
S604-A12-12、根据具有第一通道数的加权特征信息,得到第i个加权特征信息。
例如图8所示，第一特征提取单元还包括加权处理层，加权处理层用于对多个尺度下的特征进行加权处理。这样，编码端将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}输入多尺度提取层，例如输入M个不同尺度的第一特征提取层，这M个不同尺度的第一特征提取层输出M个特征信息D_1、D_2…D_M，将这M个特征信息D_1、D_2…D_M进行拼接后，输入加权处理层，加权处理层对拼接后的特征信息进行特征加权处理，得到具有第一通道数的加权特征信息Y。接着，根据具有第一通道数的加权特征信息Y，得到第i个加权特征信息。
在一些实施例中，由于对M个不同尺度的特征信息进行拼接后输入加权处理层，加权处理层输出的特征信息的通道数可能与第i-1个特征信息的通道数不同，因此，如图10所示，本申请实施例的第一特征提取单元还包括第二特征提取层，该第二特征提取层用于改变通道数。此时，编码端可以将具有第一通道数的加权特征信息Y输入第二特征提取层进行通道数的变化，输出第i个加权特征信息G_1。
在一些实施例中,上述N个第一特征提取单元中每个第一特征提取单元输出的特征信息的通道数可以相同,例如,第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
本申请实施例对上述加权处理层的具体网络结构不做限制,示例性的,加权处理层包括神经元注意力机制。
本申请实施例对上述第二特征提取层的网络结构也不做限制,例如包括1X1的卷积层。
根据上述步骤,得到第i个加权特征信息后,执行S604-A13,根据第i个加权特征信息,确定第i个特征信息。
在一种示例中,将第i个加权特征信息,确定为第i个特征信息。
在另一种示例中,将第i个加权特征信息与第i-1个特征信息之和,确定为第i个特征信息。
下面通过举例,对本申请实施例中,第i个第一特征提取单元的网络结构进行介绍。
如图11所示，本申请实施例的第i个第一特征提取单元包括多尺度提取层、加权处理层和第二特征提取层，其中多尺度提取层包括2个第一特征提取层，2个第一特征提取层的网络结构基本一致，均包括一个卷积层和一个激活函数，但是，2个第一特征提取层所包括的卷积核大小不一样，其中一个卷积核大小为3X3，另一个卷积核大小为5X5。示例性的，2个第一特征提取层包括的激活函数为ReLU，需要说明的是，该激活函数还可以是其他形式的激活函数。第二特征提取层包括一个卷积核大小为1X1的卷积层，以进行特征通道数的降低。
具体的，如图11所示，将第i-1个第一特征提取单元输出的第i-1个特征信息M_{i-1}分别输入2个第一特征提取层，2个第一特征提取层进行多尺度特征提取，输出特征信息C_1和特征信息C_2。接着，将特征信息C_1和特征信息C_2进行拼接后得到第一拼接特征信息X。示例性的，上述特征信息C_1、C_2和X，通过上述公式(1)确定出。
接着，将第一拼接特征信息X输入加权处理层进行特征加权处理，具体是为重要的特征分配较大的权重以突出该特征的影响，为相对不重要的特征分配较小的权重以削弱该特征的影响。加权处理层输出具有第一通道数的加权特征信息Y，再将该特征信息Y输入第二特征提取层进行特征通道数的降低，具体是，Y通过1X1的卷积操作得到第i个加权特征信息D_3，实现通道上的特征数的减少，D_3与输入M_{i-1}进行相加得到第i个第一特征提取单元输出的第i个特征信息M_i。示例性的，上述特征信息D_3和M_i，通过上述公式(2)确定出。
本申请实施例对加权处理层的具体网络结构不做限制。
在一些实施例中,上述加权处理层包括神经元注意力机制。
示例性的，神经元注意力机制的网络结构如图12所示，包括深度卷积（Depthwise Conv）、点卷积（Pointwise Conv）和激活函数。具体的，通过Depthwise卷积操作，将拼接特征X在不同通道上进行卷积，接着经过ReLU激活函数，通过Pointwise卷积操作，使得不同特征图的信息得到融合。再接着，经过Sigmoid激活函数得到权重特征图V，V与X进行对应元素相乘，得到具有第一通道数的加权特征信息Y。示例性的，上述具有第一通道数的加权特征信息Y，可以通过上述公式(3)确定出。
根据上述公式(3)确定出具有第一通道数的加权特征信息Y后，将该Y代入上述公式(2)中，确定出第i个特征信息M_i。
上述以N个第一特征提取单元中的第i个第一特征提取单元提取第i个特征信息为例,对于N个第一特征提取单元中的其他第一特征提取单元,参照上述第i个第一特征提取单元提取第i个特征信息的过程,进而可以得到最终的第N个第一特征提取单元提取的第N个特征信息。
接着，执行上述S604-A2的步骤，以根据第N个第一特征提取单元输出的第N个特征信息，确定重建图像块的第一特征信息。
其中,上述S604-A2的实现过程包括但不限于如下几种方式:
方式1,将第N个特征信息,确定为重建图像块的第一特征信息。
方式2,上述S604-A2包括S604-A2-1:根据第N个特征信息的前N-1个特征信息中的至少一个,以及第N个特征信息,得到重建图像块的第一特征信息。
由上述可知,编码端基于量化参数对重建图像块进行N次迭代特征加权处理,例如基于量化参数,对重建图像块的第i-1个特征信息进行特征加权处理,得到重建图像块的第i个特征信息,i为从1到N的正整数,重复执行,得到重建图像块的第N个特征信息,这样可以根据第N个特征信息的前N-1个特征信息中的至少一个,以及第N个特征信息,得到重建图像块的第一特征信息,例如,将前N-1个特征信息中的至少一个与第N个特征信息进行拼接后,再进行特征提取,得到重建图像块的第一特征信息。
在一些实施例中，如图13所示，第一特征提取模块除了包括N个第一特征提取单元外，还包括第二特征提取单元，该第二特征提取单元用于对前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息和第N个第一特征提取单元输出的第N个特征信息进行特征再提取，以得到更深层次的特征信息。具体的，将N个第一特征提取单元中前N-1个第一特征提取单元中的至少一个第一特征提取单元输出的特征信息，与第N个第一特征提取单元输出的第N个特征信息M_N输入该第二特征提取单元中，例如，将前N-1个第一特征提取单元中的至少一个第一特征提取单元输出的特征信息，与第N个特征信息M_N进行拼接后输入第二特征提取单元中。该第二特征提取单元进行更深层次的特征提取，得到重建图像块的第一特征信息F1。
在一些实施例中，上述S604-A2-1还包括：将前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息、第N个特征信息、重建图像块、量化参数拼接后，输入第二特征提取单元中，得到重建图像块的第一特征信息。
在该实施例中,为了进一步生成符合要求的第一特征信息,则将重建图像块和量化参数,与至少一个第一特征提取模块输出的特征信息和第N个特征信息一同输入第二特征提取单元中进行特征提取,使得特征提取过程中受到重建图像块和量化参数的监督,输出更符合要求的第一特征信息。
在一些实施例中,为了提高对重建图像块的第一特征信息的确定准确性,则本申请实施例先提取重建图像块的浅层特征信息(即第二特征信息),再根据第二特征信息,确定重建图像块的第一特征信息。
基于此,S604-A包括:基于量化参数,提取重建图像块的第二特征信息;对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息。
具体的,基于量化参数,对重建图像块进行浅层特征提取,得到重建图像块的第二特征信息,例如,将重建图像块和量化参数进行拼接,得到拼接信息,对拼接信息进行浅层特征提取,得到重建图像块的第二特征信息。接着,基于该第二特征信息,确定重建图像块的第一特征信息,例如,基于第二特征信息进行深度特征提取,得到重建图像块的第一特征信息。
本申请实施例对拼接信息进行特征提取,得到第二特征信息的具体方式不做限制。
在一种可能的实现方式中,通过第二特征提取模块对拼接信息进行特征提取,得到第二特征信息。
例如,如图14所示,本申请实施例的增强模型除了包括第一特征提取模块外,还包括第二特征提取模块,该第二特征提取模块用于提取重建图像块的浅层特征信息,并将提取的浅层特征信息输入第一特征提取模块进行深层次特征提取。具体的,如图14所示,编码端先将重建图像和量化参数进行拼接,得到拼接信息,接着将拼接信息输入第二特征提取模块进行浅层特征提取,得到重建图像块的第二特征信息C2。接着,将该浅层的第二特征信息C2输入第一特征提取模块中,得到重建图像块的第一特征信息F1。
此时,在一些实施例中,上述S604-A2-1包括:对前N-1个特征信息中的至少一个、第N个特征信息以及第二特征信息拼接,得到第二拼接特征信息;对第二拼接特征信息进行特征再提取,得到重建图像块的第一特征信息。
示例性的，如图15所示，将前N-1个第一特征提取单元中至少一个第一特征提取单元输出的特征信息、第N个特征信息M_N以及第二特征信息C2拼接后，输入第二特征提取单元中，得到重建图像块的第一特征信息F1。
本申请实施例对第二特征提取模块的具体网络结构不做限制。
在一种示例中,上述第二特征提取模块包括至少一个卷积层。
示例性的,若第二特征提取模块包括两个卷积层,则编码端通过两个卷积层,得到重建图像块的第二特征信息。
在一种示例中，上述第二特征信息C_2，可以通过上述公式(4)确定出。
根据上述方法,确定出重建图像块的第二特征信息后,对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息,根据重建图像块的第一特征信息,确定重建图像块的增强图像块。
本申请实施例对上述S604-B中根据重建图像块的第一特征信息,确定重建图像块的增强图像块的具体方式不做限制。
在一些实施例中,若重建图像块的第一特征信息与重建图像块的大小一致时,则可以将该第一特征信息,确定为增强图像块。
在一些实施例中,上述S604-B包括:对重建图像块的第一特征信息进行非线性映射,得到增强图像块。
本申请实施例对S604-B中对重建图像块的第一特征信息进行非线性映射,得到增强图像块的具体方式不做限制。
例如,使用非线性映射方式,对重建图像块的第一特征信息进行处理,以使处理后的重建图像块的第一特征信息的大小与重建图像块的大小一致,进而将处理后的重建图像块的第一特征信息作为增强图像块。
在一些实施例中,如图17所示,本申请实施例的增强模型还包括重建模块,该重建模块用于对第一特征提取模块提取的第一特征信息做进一步非线性映射。
在图17的基础上,上述S604-B包括:通过重建模块,对重建图像块的第一特征信息非线性映射,得到增强图像块。
本申请实施例对重建模块的网络模型不做限制。
在一些实施例中,重建模块包括至少一个卷积层。
示例性的,如图18所示,重建模块包括2个卷积层,编码端将第一特征提取模块输出的重建图像块的第一特征信息F1输入重建模块,经过两个卷积层的卷积操作,得到重建图像块的增强图像块O1。
本申请实施例,编码端通过上述步骤,基于量化参数,对重建图像块进行质量增强,得到重建图像块的增强图像块。
在一些实施例中,编码端在对重建图像块进行质量增强之前,首先需要判断是否对该重建图像块进行质量增强。也就是说,编码端在确定对重建图像块进行质量增强时带来的效果大于不增强的效果时,使用对重建图像块进行质量增强。
其中,编码端判断是否对该重建图像块进行质量增强的方式包括但不限于如下几种:
方式1,配置文件中包括第一标志,该第一标志用于指示是否对当前图像块的重建图像块进行质量增强。这样,编码端可以根据该第一标志,确定是否对当前图像块的重建图像块进行质量增强,例如若第一标志的取值为第一数值,例如1时,则编码端确定对当前图像块的重建图像块进行质量增强,则执行上述实施例的方法。若第一标志的取值为第二数值,例如0时,则编码端确定不对当前图像块的重建图像块进行质量增强,而是使用已有的环路滤波方式,对重建图像块进行滤波处理。
方式2,编码端自行判断是否对该重建图像块进行质量增强。
具体的,编码端首先基于量化参数,对重建图像块进行质量增强,得到测试增强图像块,接着,确定该测试增强图像块和未增强的重建图像块分别对应的图像质量,若测试增强图像块的图像质量大于重建图像块的图像质量时,说明采用本申请实施例的增强方法可以实现显著的增强效果,此时,将上述确定的测试增强图像块确定为重建图像块的增强图像块直接输出进行显示,和/或将上述确定的测试增强图像块保存在解码图像缓存中,作为后续图像块的帧内参考。
若上述测试增强图像块的图像质量小于或等于重建图像块的图像质量时,说明使用增强模型无法实现显著的增强效果,此时,将重建图像块进行环路滤波后直接输出进行显示,和/或将环路滤波后的重建图像块保存在解码图像缓存中,作为后续图像块的帧内参考。
在一些实施例中,编码端在码流中写入第一标志,该第一标志用于指示是否对当前图像块的重建图像块进行质量增强,以使解码端根据该第一标志,确定是否对当前图像块的重建图像块进行质量增强,保证编解码端的一致。
具体是,若编码端确定对当前图像块的重建图像块进行质量增强时,则将第一标志的值置为第一数值,例如置为1,若编码端确定不对当前图像块的重建图像块进行质量增强时,则将第一标志的值置为第二数值,例如置为0。这样,解码端通过解码码流,首先得到该第一标志,并根据该第一标志判断是否对当前图像块的重建图像块进行质量增强。
可选的,上述第一标志可以为序列级标志。
可选的,上述第一标志可以为帧级标志。
可选的,上述第一标志可以为片级标志。
可选的,上述第一标志可以为块级标志,例如为CTU级标志。
在一些实施例中,上述重建图像块为经过环路滤波处理后的重建图像块。例如,编码端确定出当前图像块的预测块,以及当前图像块的残差块,将残差块与预测块相加,得到重建图像块。再采用环路滤波器对重建图像块进行滤波 处理,将滤波处理后的重建图像块输入增强模型中进行质量增强。
在一些实施例中,本申请实施例可以使用增强模型对重建图像块进行质量增强后,再经过环路滤波处理。
在一些实施例中,本申请实施例对重建图像块通过增强模型进行质量增强后,不再进行环路滤波处理。
在一些实施例中，本申请实施例使用增强模型对重建图像块进行质量增强后，可以将增强图像块进行显示，且将增强图像块存放在解码图像缓存中，作为其他图像块的参考。
可选的，编码端还可以将增强图像块进行显示，将未增强的重建图像块存放在解码图像缓存中，作为其他图像块的参考。
可选的，编码端还可以将重建图像块进行显示，将增强图像块存放在解码图像缓存中，作为其他图像块的参考。
需要说明的是,在使用上述增强模型进行质量增强之前,首先需要对增强模型进行训练。
进一步的，为了说明本申请实施例提供的图像处理方法的有益效果，将本申请实施例的方案进行测试，例如，在VVC测试软件VTM8.2中实现，所用的测试序列为通用测试条件中给定的Class A、Class B、Class C和Class E序列。表1中的结果是在QP为32、37、42、47的设置下，以AI模式编码得到的。
表1提出的方法在VTM8.2中BD-rate的测试结果
（表1以图片形式给出，此处不再复现；表中列出了各测试序列在Y、Cb和Cr分量上的BD-rate数值。）
上述表1中的参数解释:
Class为视频类别,Sequence为具体的测试序列,Y,Cb和Cr表示视频亮度和色度三种分量的性能。表1中的数值为BD-rate,BD-rate是衡量算法性能的一种方式,表示新的编码算法相对于原始算法在码率和PSNR上的变化情况,整体为负值说明性能有所提升,且绝对值越大,说明性能提升越多。
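BD-rate通常通过对（码率，PSNR）测试点做三次多项式拟合并在重叠质量区间上积分求平均差值得到。下面给出经典Bjøntegaard方法的一个示意性Python实现（接口与变量名均为示意）：

    import numpy as np

    def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
        """返回BD-rate(%)，负值表示新算法相对原始算法节省码率。"""
        p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)  # 拟合 log(rate) = p(PSNR)
        p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
        lo = max(min(psnr_anchor), min(psnr_test))             # 公共PSNR积分区间
        hi = min(max(psnr_anchor), max(psnr_test))
        int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
        int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
        avg_log_diff = (int_t - int_a) / (hi - lo)
        return (np.exp(avg_log_diff) - 1.0) * 100.0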
本申请实施例提供的图像处理方法，编码端确定当前图像块的量化参数，并基于量化参数对当前图像块进行编码，得到当前图像块的量化系数；基于当前图像块的量化参数，对量化系数进行反量化，得到当前图像块的残差块；根据残差块，得到当前图像块的重建图像块；基于量化参数，对重建图像块进行质量增强，得到增强图像块。由于不同的图像块对应的量化参数可能不同，为了提高图像块的增强准确性，本申请基于量化参数对重建图像块进行质量增强，可以提高增强效果。另外，本申请以图像块为单位进行图像质量增强，这样使用增强后的图像块作为帧内预测中其他图像块的参考块时，可以提供更精准的参考，进而可以提高帧内预测的准确性。
图21为本申请一实施例提供的图像处理方法流程示意图。图21可以理解为图20所示的图像处理方法的一种更为具体方式,如图21所示,本申请实施例的图像处理方法包括如下步骤:
S701、确定当前图像块的量化参数,并基于量化参数对当前图像块进行编码,得到当前图像块的量化系数;
S702、基于当前图像块的量化参数,对量化系数进行反量化,得到当前图像块的残差块;
S703、根据残差块,得到当前图像块的重建图像块。
上述S701至S703的具体实现过程可以参照上述S601至S603的描述，在此不再赘述。
S704、判断是否对当前图像块的重建图像块进行质量增强。
方式一,获得第一标志,根据第一标志确定是否允许对当前图像块的重建图像块进行质量增强。
方式二,基于量化参数,对重建图像块进行质量增强,得到测试增强图像块;确定测试增强图像块的第一图像质量和重建图像块的第二图像质量,根据第一图像质量和第二图像质量,确定是否对当前图像块的重建图像块进行质量增强。
若确定对当前图像块的重建图像块进行质量增强时,则执行如下S705的步骤。
若确定不对当前图像块的重建图像块进行质量增强时,则执行如下S708的步骤。
S705、基于量化参数,对重建图像块进行特征提取,得到重建图像块的第二特征信息。
S706、对第二特征信息进行特征加权处理,得到重建图像块的第一特征信息。
S707、对重建图像块的第一特征信息非线性映射,得到增强图像块。
在一些实施例中,使用增强模型对重建图像块进行质量增强,得到增强图像块。
示例性的,如图18所示,本申请实施例的增强模型包括第二特征提取模块、第一特征提取模块和重建模块,其中第二特征提取模块用于提取浅层特征(即重建图像块的第二特征信息),第一特征提取模块用于提取深层特征(即重建图像块的第一特征信息),重建模块用于对深层特征进行非线性映射,得到最终的增强图像块。
示例性的,图18中示出的第二特征提取模块包括两个卷积层,但是,本申请实施例提供的第二特征提取模块的结构包括但不限于图18所示。
示例性的，图18示出了第一特征提取模块包括N个第一特征提取单元和一个第二特征提取单元，且N个第一特征提取单元串联连接，且N个第一特征提取单元的输出以及第二特征提取模块的输出均输入至第二特征提取单元中。但是，本申请实施例提供的第一特征提取模块的网络结构包括但不限于图18所示。
示例性的，图18示出了重建模块包括两个卷积层，但是本申请实施例提供的重建模块的网络结构包括但不限于图18所示。
在一些实施例中，本申请实施例的第一特征提取单元包括多尺度提取层和神经元注意力机制，其中，多尺度提取层用于进行多尺度的特征提取，神经元注意力机制用于特征加权。此时，本申请实施例的第一特征提取单元也可以称为多尺度神经元注意（Multi-scale and Neuron Attention，简称MSNA）单元。
在一些实施例中，本申请实施例的增强模型也称为基于神经元注意力机制的神经网络模型（Neuron Attention-based CNN，简称NACNN）。
具体的，如图18所示，编码端将重建图像块和量化参数进行归一化处理后，进行拼接，将拼接结果输入增强模型，第二特征提取模块中的第一个卷积层对重建图像块和量化参数进行卷积操作，得到特征信息C1，再将特征信息C1输入第二个卷积层中进行卷积操作，得到第二特征信息C2。编码端将第二特征信息C2输入第一特征提取模块中，第一特征提取模块中的第一个第一特征提取单元对C2进行多尺度特征提取和特征加权，得到第一个特征信息M1。接着，将第一个特征信息M1输入第二个第一特征提取单元中进行多尺度特征提取和特征加权，得到第二个特征信息M2。依次类推，得到第N个第一特征提取单元输出的第N个特征信息MN。编码端将第一特征提取模块中各单元提取的N个特征信息M1、M2……MN，以及第二特征提取模块输出的第二特征信息C2进行拼接后，输入第一特征提取模块中的第二特征提取单元中进行通道数的变化，输出重建图像块的第一特征信息F1。然后，编码端将第一特征信息F1输入重建模块中进行重建，具体是，经过重建模块中的第一个卷积层进行卷积操作，得到特征信息C3，重建模块中的第二个卷积层对特征信息C3进行卷积操作，得到特征信息C4。最后，根据特征信息C4，得到重建图像块的增强图像块O1，例如，将特征信息C4输出为重建图像块的增强图像块O1。
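沿用前文NACNN的示意实现，对一个64X64的重建亮度块在QP=37下进行质量增强的调用方式可示意如下（模型未加载训练权重，数值仅为示意）：

    import torch

    model = NACNN().eval()                       # 实际使用时应加载训练好的权重
    rec = torch.randint(0, 256, (1, 1, 64, 64), dtype=torch.uint8)
    with torch.no_grad():
        enhanced = model(rec, qp=37)
    print(enhanced.shape)                        # torch.Size([1, 1, 64, 64])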
需要说明的是,上述S705至S707的具体实现过程可以参照上述S604的具体描述,在此不再赘述。
S708、对重建图像块进行环路滤波。
在一些实施例中,若编码端确定不使用增强模型对当前图像块的重建图像块进行质量增强时,则执行S708的步骤,对重建图像块进行环路滤波。
本申请实施例提供的图像处理方法，编码端在使用增强模型对重建图像块进行质量增强之前，首先判断是否对重建图像块进行质量增强，进而提高了图像处理的可靠性。
应理解,图4至图21仅为本申请的示例,不应理解为对本申请的限制。
以上结合附图详细描述了本申请的优选实施方式,但是,本申请并不限于上述实施方式中的具体细节,在本申请的技术构思范围内,可以对本申请的技术方案进行多种简单变型,这些简单变型均属于本申请的保护范围。例如,在上述具体实施方式中所描述的各个具体技术特征,在不矛盾的情况下,可以通过任何合适的方式进行组合,为了避免不必要的重复,本申请对各种可能的组合方式不再另行说明。又例如,本申请的各种不同的实施方式之间也可以进行任意组合,只要其不违背本申请的思想,其同样应当视为本申请所公开的内容。
还应理解,在本申请的各种方法实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。另外,本申请实施例中,术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系。具体地,A和/或B可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
上文结合图4至图21,详细描述了本申请的方法实施例,下文结合图22至图24,详细描述本申请的装置实施例。
图22是本申请一实施例提供的图像处理装置的示意性框图。
如图22所示,图像处理装置10包括:
解码单元11,用于解码码流,得到当前图像块的量化系数;
确定单元12,用于确定当前图像块对应的量化参数,并基于所述量化参数,对所述量化系数进行反量化,得到所述当前图像块的变换系数;
重建单元13,用于根据所述变换系数,确定所述当前图像块的重建图像块;
增强单元14,用于将所述重建图像块和所述量化参数,输入增强模型进行图像增强,得到增强图像块。
在一些实施例中,增强单元14,具体用于基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息;根据所述第一特征信息,确定所述增强图像块。
在一些实施例中,增强单元14,具体用于基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,所述i为从1到N的正整数,重复执行,得到所述重建图像块的第N个特征信息,若i为1时,则所述第i-1个特征信息为所述重建图像块;根据所述第N个特征信息,确定所述重建图像块的第一特征信息。
在一些实施例中,增强单元14,具体用于提取所述第i-1个特征信息的M个不同尺度的特征信息,所述M为大于1的正整数;对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;根据所述第i个加权特征 信息,确定所述第i个特征信息。
在一些实施例中,增强单元14,具体用于通过M个不同尺度的第一特征提取层,提取所述第i-1个特征信息的M个不同尺度的特征信息。
在一些实施例中,所述第一特征提取层包括卷积层,且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
在一些实施例中,所述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
在一些实施例中,增强单元14,具体用于将所述M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;对所述第一拼接特征信息进行加权处理,得到所述第i个加权特征信息。
在一些实施例中,增强单元14,具体用于通过加权处理层对所述第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;根据所述具有第一通道数的加权特征信息,得到所述第i个加权特征信息。
在一些实施例中,所述第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
可选的,所述加权处理层包括神经元注意力机制。
在一些实施例中,增强单元14,具体用于将所述第i个加权特征信息与所述第i-1个特征信息之和,确定为所述第i个特征信息。
在一些实施例中,增强单元14,具体用于将所述第i个加权特征信息,确定为所述第i个特征信息。
在一些实施例中,增强单元14,具体用于基于所述量化参数,提取所述重建图像块的第二特征信息;对所述第二特征信息进行特征加权处理,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元14,具体用于将所述重建图像块和所述量化参数进行拼接,得到拼接信息;对所述拼接信息进行特征提取,得到所述第二特征信息。
在一些实施例中,增强单元14,具体用于通过第二特征提取模块对所述拼接信息进行特征提取,得到所述第二特征信息。
可选的,所述第二特征提取模块包括至少一个卷积层。
在一些实施例中,增强单元14,具体用于根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元14,具体用于对所述前N-1个特征信息中的至少一个、所述第N个特征信息以及所述第二特征信息拼接,得到第二拼接特征信息;对所述第二拼接特征信息进行特征再提取,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元14,具体用于对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
在一些实施例中,增强单元14,具体用于通过重建模块,对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
可选的,所述重建模块包括至少一个卷积层。
在一些实施例中,解码单元11,还用于解码码流,得到第一标志,所述第一标志用于指示是否对所述当前图像块的重建图像块进行质量增强;增强单元14,还用于根据所述第一标志,在确定允许对所述重建图像块进行质量增强时,基于所述量化参数,对所述重建图像块进行质量增强,得到所述增强图像块。
在一些实施例中,增强单元14,还用于基于所述量化参数,对所述重建图像块进行质量增强,得到测试增强图像块;确定所述测试增强图像块的第一图像质量和所述重建图像块的第二图像质量;若所述第一图像质量大于所述第二图像质量,则将所述测试增强图像块确定为所述重建图像块的增强图像块。
可选的,所述重建图像块为所述当前图像块在第一分量下的重建图像块。
可选的,所述第一分量为亮度分量或色度分量。
在一些实施例中,增强单元14,具体用于对所述重建图像块和所述量化参数进行归一化处理;基于归一化处理后的所述重建图像块和所述量化参数,得到所述增强图像块。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图22所示的装置10可以执行本申请实施例解码端的图像处理方法,并且装置10中的各个单元的前述和其它操作和/或功能分别为了实现上述解码端的图像处理方法等各个方法中的相应流程,为了简洁,在此不再赘述。
图23是本申请一实施例提供的图像处理装置的示意性框图。
如图23所示,图像处理装置20包括:
确定单元21,用于确定当前图像块的量化参数,并基于所述量化参数对所述当前图像块进行编码,得到所述当前图像块的量化系数;
反量化单元22,用于基于所述当前图像块的量化参数,对所述量化系数进行反量化,得到所述当前图像块的残差块;
重建单元23,用于根据所述残差块,得到所述当前图像块的重建图像块;
增强单元24,用于基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块。
在一些实施例中,增强单元24,具体用于基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息;根据所述第一特征信息,确定所述增强图像块。
在一些实施例中,增强单元24,具体用于基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,所述i为从1到N的正整数,重复进行,得到所述重建图像块的第N个特征信息,若i为1时,则所述第i-1个特征信息为所述重建图像块;根据所述第N个特征信息,确定所述重建图像块的第一特征信息。
在一些实施例中,增强单元24,具体用于提取所述第i-1个特征信息的M个不同尺度的特征信息,所述M为大于1的正整数;对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;根据所述第i个加权特征 信息,确定所述第i个特征信息。
在一些实施例中,增强单元24,具体用于通过M个不同尺度的第一特征提取层,提取所述第i-1个特征信息的M个不同尺度的特征信息。
在一些实施例中,所述第一特征提取层包括卷积层,且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
在一些实施例中,所述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
在一些实施例中,增强单元24,具体用于将所述M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;对所述第一拼接特征信息进行加权处理,得到所述第i个加权特征信息。
在一些实施例中,增强单元24,具体用于通过加权处理层对所述第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;根据所述具有第一通道数的加权特征信息,得到所述第i个加权特征信息。
在一些实施例中,所述第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
可选的,所述加权处理层包括神经元注意力机制。
在一些实施例中,增强单元24,具体用于将所述第i个加权特征信息与所述第i-1个特征信息之和,确定为所述第i个特征信息。
在一些实施例中,增强单元24,具体用于将所述第i个加权特征信息,确定为所述第i个特征信息。
在一些实施例中,增强单元24,具体用于基于所述量化参数,提取所述重建图像块的第二特征信息;对所述第二特征信息进行特征加权处理,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元24,具体用于将所述重建图像块和所述量化参数进行拼接,得到拼接信息;对所述拼接信息进行特征提取,得到所述第二特征信息。
在一些实施例中,增强单元24,具体用于通过第二特征提取模块对所述拼接信息进行特征提取,得到所述第二特征信息。
可选的,所述第二特征提取模块包括至少一个卷积层。
在一些实施例中,增强单元24,具体用于根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元24,具体用于对所述前N-1个特征信息中的至少一个、所述第N个特征信息以及所述第二特征信息拼接,得到第二拼接特征信息;对所述第二拼接特征信息进行特征再提取,得到所述重建图像块的第一特征信息。
在一些实施例中,增强单元24,具体用于对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
在一些实施例中,增强单元24,具体用于通过重建模块,对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块,所述重建模块包括至少一个卷积层。
可选的,所述重建模块包括至少一个卷积层。
在一些实施例中,编码单元22,还用于在码流中写入第一标志,所述第一标志用于指示是否对所述当前图像块的重建图像块进行质量增强。
可选的,所述重建图像块为所述当前图像块在第一分量下的重建图像块。
可选的,所述第一分量为亮度分量或色度分量。
在一些实施例中,增强单元24,具体用于对所述重建图像块和所述量化参数进行归一化处理;基于归一化处理后的所述重建图像块和所述量化参数,得到所述增强图像块。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图23所示的装置20可以对应于执行本申请实施例编码点的图像处理方法中的相应主体,并且装置20中的各个单元的前述和其它操作和/或功能分别为了实现编码端的图像处理方法等各个方法中的相应流程,为了简洁,在此不再赘述。
上文中结合附图从功能单元的角度描述了本申请实施例的装置和系统。应理解,该功能单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件单元组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件单元组合执行完成。可选地,软件单元可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图24是本申请实施例提供的电子设备的示意性框图。
如图24所示，该电子设备30可以为本申请实施例所述的视频编码器，或者视频解码器，该电子设备30可包括：
存储器33和处理器32,该存储器33用于存储计算机程序34,并将该程序代码34传输给该处理器32。换言之,该处理器32可以从存储器33中调用并运行计算机程序34,以实现本申请实施例中的方法。
例如，该处理器32可用于根据该计算机程序34中的指令执行上述方法实施例中的步骤。
在本申请的一些实施例中,该处理器32可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器33包括但不限于:
易失性存储器和/或非易失性存储器。其中，非易失性存储器可以是只读存储器（Read-Only Memory，ROM）、可编程只读存储器（Programmable ROM，PROM）、可擦除可编程只读存储器（Erasable PROM，EPROM）、电可擦除可编程只读存储器（Electrically EPROM，EEPROM）或闪存。易失性存储器可以是随机存取存储器（Random Access Memory，RAM），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（Static RAM，SRAM）、动态随机存取存储器（Dynamic RAM，DRAM）、同步动态随机存取存储器（Synchronous DRAM，SDRAM）、双倍数据速率同步动态随机存取存储器（Double Data Rate SDRAM，DDR SDRAM）、增强型同步动态随机存取存储器（Enhanced SDRAM，ESDRAM）、同步连接动态随机存取存储器（synch link DRAM，SLDRAM）和直接内存总线随机存取存储器（Direct Rambus RAM，DR RAM）。
在本申请的一些实施例中,该计算机程序34可以被分割成一个或多个单元,该一个或者多个单元被存储在该存储器33中,并由该处理器32执行,以完成本申请提供的方法。该一个或多个单元可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述该计算机程序34在该电子设备30中的执行过程。
如图24所示,该电子设备30还可包括:
收发器33,该收发器33可连接至该处理器32或存储器33。
其中,处理器32可以控制该收发器33与其他设备进行通信,具体地,可以向其他设备发送信息或数据,或接收其他设备发送的信息或数据。收发器33可以包括发射机和接收机。收发器33还可以进一步包括天线,天线的数量可以为一个或多个。
应当理解,该电子设备30中的各个组件通过总线系统相连,其中,总线系统除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。
图25是本申请实施例提供的视频编解码系统的示意性框图。
如图25所示,该视频编解码系统40可包括:视频编码器41和视频解码器42,其中视频编码器41用于执行本申请实施例涉及的视频编码方法,视频解码器42用于执行本申请实施例涉及的视频解码方法。
本申请还提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被计算机执行时使得该计算机能够执行上述方法实施例的方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。
本申请还提供了一种码流,该码流是通过上述编码方式生成的。可选的,该码流中包括第一标志。
当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地产生按照本申请实施例该的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。例如,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以该权利要求的保护范围为准。

Claims (58)

  1. 一种图像处理方法,其特征在于,包括:
    解码码流,得到当前图像块的量化系数;
    确定当前图像块对应的量化参数,并基于所述量化参数,对所述量化系数进行反量化,得到所述当前图像块的变换系数;
    根据所述变换系数,确定所述当前图像块的重建图像块;
    基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息;
    根据所述第一特征信息,确定所述增强图像块。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息,包括:
    基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,所述i为从1到N的正整数,重复执行,得到所述重建图像块的第N个特征信息,若i为1时,则所述第i-1个特征信息为所述重建图像块;
    根据所述第N个特征信息,确定所述重建图像块的第一特征信息。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,包括:
    提取所述第i-1个特征信息的M个不同尺度的特征信息,所述M为大于1的正整数;
    对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;
    根据所述第i个加权特征信息,确定所述第i个特征信息。
  5. 根据权利要求4所述的方法,其特征在于,所述提取所述第i-1个特征信息的M个不同尺度的特征信息,包括:
    通过M个不同尺度的第一特征提取层,提取所述第i-1个特征信息的M个不同尺度的特征信息。
  6. 根据权利要求5所述的方法,其特征在于,所述第一特征提取层包括卷积层,且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
  7. 根据权利要求6所述的方法,其特征在于,所述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
  8. 根据权利要求4所述的方法,其特征在于,所述对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息,包括:
    将所述M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;
    对所述第一拼接特征信息进行加权处理,得到所述第i个加权特征信息。
  9. 根据权利要求8所述的方法,其特征在于,所述对所述拼接特征信息进行加权处理,得到所述第i个加权特征信息,包括:
    通过加权处理层对所述第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;
    根据所述具有第一通道数的加权特征信息,得到所述第i个加权特征信息。
  10. 根据权利要求9所述的方法,其特征在于,所述第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
  11. 根据权利要求9所述的方法,其特征在于,所述加权处理层包括神经元注意力机制。
  12. 根据权利要求4-11任一项所述的方法,其特征在于,所述根据所述第i个加权特征信息,确定所述第i个特征信息,包括:
    将所述第i个加权特征信息与所述第i-1个特征信息之和,确定为所述第i个特征信息。
  13. 根据权利要求4-11任一项所述的方法,其特征在于,所述根据所述第i个加权特征信息,得到所述第i个特征信息,包括:
    将所述第i个加权特征信息,确定为所述第i个特征信息。
  14. 根据权利要求3-11任一项所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息,包括:
    基于所述量化参数,提取所述重建图像块的第二特征信息;
    对所述第二特征信息进行特征加权处理,得到所述重建图像块的第一特征信息。
  15. 根据权利要求14所述的方法,其特征在于,所述基于所述量化参数,提取所述重建图像块的第二特征信息,包括:
    将所述重建图像块和所述量化参数进行拼接,得到拼接信息;
    对所述拼接信息进行特征提取,得到所述第二特征信息。
  16. 根据权利要求15所述的方法,其特征在于,所述对所述拼接信息进行特征提取,得到所述第二特征信息,包括:
    通过第二特征提取模块对所述拼接信息进行特征提取,得到所述第二特征信息。
  17. 根据权利要求14所述的方法,其特征在于,所述根据所述第N个特征信息,确定所述重建图像块的第一特征信息,包括:
    根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块 的第一特征信息。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块的第一特征信息,包括:
    对所述前N-1个特征信息中的至少一个、所述第N个特征信息以及所述第二特征信息拼接,得到第二拼接特征信息;
    对所述第二拼接特征信息进行特征再提取,得到所述重建图像块的第一特征信息。
  19. 根据权利要求2-11任一项所述的方法,其特征在于,所述根据所述第一特征信息,得到所述增强图像块,包括:
    对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
  20. 根据权利要求19所述的方法,其特征在于,所述对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块,包括:
    通过重建模块,对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
  21. 根据权利要求1-11任一项所述的方法,其特征在于,所述方法还包括:
    解码码流,得到第一标志,所述第一标志用于指示是否允许对所述当前图像块的重建图像块进行质量增强;
    所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    根据所述第一标志,在确定允许对所述重建图像块进行质量增强时,基于所述量化参数,对所述重建图像块进行质量增强,得到所述增强图像块。
  22. 根据权利要求1-11任一项所述的方法,其特征在于,所述方法还包括:
    基于所述量化参数,对所述重建图像块进行质量增强,得到测试增强图像块;
    确定所述测试增强图像块的第一图像质量和所述重建图像块的第二图像质量;
    所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    若所述第一图像质量大于所述第二图像质量,则将所述测试增强图像块确定为所述重建图像块的增强图像块。
  23. 根据权利要求1-11任一项所述的方法,其特征在于,所述重建图像块为所述当前图像块在第一分量下的重建图像块。
  24. 根据权利要求23所述的方法,其特征在于,所述第一分量为亮度分量或色度分量。
  25. 根据权利要求1-11任一项所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    对所述重建图像块和所述量化参数进行归一化处理;
    基于归一化处理后的所述重建图像块和所述量化参数,得到所述增强图像块。
  26. 一种图像处理方法,其特征在于,包括:
    确定当前图像块的量化参数,并基于所述量化参数对所述当前图像块进行编码,得到所述当前图像块的量化系数;
    基于所述当前图像块的量化参数,对所述量化系数进行反量化,得到所述当前图像块的残差块;
    根据所述残差块,得到所述当前图像块的重建图像块;
    基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块。
  27. 根据权利要求26所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息;
    根据所述第一特征信息,确定所述增强图像块。
  28. 根据权利要求27所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息,包括:
    基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,所述i为从1到N的正整数,重复进行,得到所述重建图像块的第N个特征信息,若i为1时,则所述第i-1个特征信息为所述重建图像块;
    根据所述第N个特征信息,确定所述重建图像块的第一特征信息。
  29. 根据权利要求28所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块的第i-1个特征信息进行特征加权处理,得到所述重建图像块的第i个特征信息,包括:
    提取所述第i-1个特征信息的M个不同尺度的特征信息,所述M为大于1的正整数;
    对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息;
    根据所述第i个加权特征信息,确定所述第i个特征信息。
  30. 根据权利要求29所述的方法,其特征在于,所述提取所述第i-1个特征信息的M个不同尺度的特征信息,包括:
    通过M个不同尺度的第一特征提取层,提取所述第i-1个特征信息的M个不同尺度的特征信息。
  31. 根据权利要求30所述的方法,其特征在于,所述第一特征提取层包括卷积层,且不同的第一特征提取层所包括的卷积层的卷积核大小不同。
  32. 根据权利要求31所述的方法,其特征在于,所述M个不同尺度的第一特征提取层中,至少一个第一特征提取层包括激活函数。
  33. 根据权利要求29所述的方法,其特征在于,所述对所述M个不同尺度的特征信息进行加权处理,得到第i个加权特征信息,包括:
    将所述M个不同尺度的特征信息进行拼接,得到第一拼接特征信息;
    对所述第一拼接特征信息进行加权处理,得到所述第i个加权特征信息。
  34. 根据权利要求33所述的方法,其特征在于,所述对所述拼接特征信息进行加权处理,得到所述第i个加权特征信息,包括:
    通过加权处理层对所述第一拼接特征信息进行加权处理,得到具有第一通道数的加权特征信息;
    根据所述具有第一通道数的加权特征信息,得到所述第i个加权特征信息。
  35. 根据权利要求34所述的方法,其特征在于,所述第i个加权特征信息的通道数,与所述第i-1个特征信息的通道数相同。
  36. 根据权利要求34所述的方法,其特征在于,所述加权处理层包括神经元注意力机制。
  37. 根据权利要求29-36任一项所述的方法,其特征在于,所述根据所述第i个加权特征信息,确定所述第i个特征信息,包括:
    将所述第i个加权特征信息与所述第i-1个特征信息之和,确定为所述第i个特征信息。
  38. 根据权利要求29-36任一项所述的方法,其特征在于,所述根据所述第i个加权特征信息,得到所述第i个特征信息,包括:
    将所述第i个加权特征信息,确定为所述第i个特征信息。
  39. 根据权利要求28-36任一项所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行特征加权处理,得到所述重建图像块的第一特征信息,包括:
    基于所述量化参数,提取所述重建图像块的第二特征信息;
    对所述第二特征信息进行特征加权处理,得到所述重建图像块的第一特征信息。
  40. 根据权利要求39所述的方法,其特征在于,所述基于所述量化参数,提取所述重建图像块的第二特征信息,包括:
    将所述重建图像块和所述量化参数进行拼接,得到拼接信息;
    对所述拼接信息进行特征提取,得到所述第二特征信息。
  41. 根据权利要求40所述的方法,其特征在于,所述对所述拼接信息进行特征提取,得到所述第二特征信息,包括:
    通过第二特征提取模块对所述拼接信息进行特征提取,得到所述第二特征信息。
  42. 根据权利要求39所述的方法,其特征在于,所述根据所述第N个特征信息,确定所述重建图像块的第一特征信息,包括:
    根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块的第一特征信息。
  43. 根据权利要求42所述的方法,其特征在于,所述根据所述第N个特征信息的前N-1个特征信息中的至少一个,以及所述第N个特征信息,得到所述重建图像块的第一特征信息,包括:
    对所述前N-1个特征信息中的至少一个、所述第N个特征信息以及所述第二特征信息拼接,得到第二拼接特征信息;
    对所述第二拼接特征信息进行特征再提取,得到所述重建图像块的第一特征信息。
  44. 根据权利要求27-36任一项所述的方法,其特征在于,所述根据所述第一特征信息,得到所述增强图像块,包括:
    对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块。
  45. 根据权利要求44所述的方法,其特征在于,所述对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块,包括:
    通过重建模块,对所述重建图像块的第一特征信息非线性映射,得到所述增强图像块,所述重建模块包括至少一个卷积层。
  46. 根据权利要求26-36任一项所述的方法,其特征在于,所述方法还包括:
    获得第一标志,所述第一标志用于指示是否允许对所述当前图像块的重建图像块进行质量增强;
    所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    根据所述第一标志,在确定允许对所述重建图像块进行质量增强时,基于所述量化参数,对所述重建图像块进行质量增强,得到所述增强图像块。
  47. 根据权利要求26-36任一项所述的方法,其特征在于,所述方法还包括:
    基于所述量化参数,对所述重建图像块进行质量增强,得到测试增强图像块;
    确定所述测试增强图像块的第一图像质量和所述重建图像块的第二图像质量;
    所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    若所述第一图像质量大于所述第二图像质量,则将所述测试增强图像块确定为所述重建图像块的增强图像块。
  48. 根据权利要求26-36任一项所述的方法,其特征在于,所述方法还包括:
    在码流中写入第一标志,所述第一标志用于指示是否允许对所述当前图像块的重建图像块进行质量增强。
  49. 根据权利要求26-36任一项所述的方法,其特征在于,所述重建图像块为所述当前图像块在第一分量下的重建图像块。
  50. 根据权利要求49所述的方法,其特征在于,所述第一分量为亮度分量或色度分量。
  51. 根据权利要求26-36任一项所述的方法,其特征在于,所述基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块,包括:
    对所述重建图像块和所述量化参数进行归一化处理;
    基于归一化处理后的所述重建图像块和所述量化参数,得到所述增强图像块。
  52. 一种图像处理装置,其特征在于,包括:
    解码单元,用于解码码流,得到当前图像块的量化系数;
    确定单元,用于确定当前图像块对应的量化参数,并基于所述量化参数,对所述量化系数进行反量化,得到所述当前图像块的变换系数;
    重建单元,用于根据所述变换系数,确定所述当前图像块的重建图像块;
    增强单元,用于将所述重建图像块和所述量化参数,输入增强模型进行图像增强,得到增强图像块。
  53. 一种图像处理装置,其特征在于,包括:
    确定单元,用于确定当前图像块的量化参数,并基于所述量化参数对所述当前图像块进行编码,得到所述当前图像块的量化系数;
    反量化单元,用于基于所述当前图像块的量化参数,对所述量化系数进行反量化,得到所述当前图像块的残差块;
    重建单元,用于根据所述残差块,得到所述当前图像块的重建图像块;
    增强单元,用于基于所述量化参数,对所述重建图像块进行质量增强,得到增强图像块。
  54. 一种视频解码器,其特征在于,包括处理器和存储器;
    所示存储器用于存储计算机程序;
    所述处理器用于调用并运行所述存储器中存储的计算机程序,以实现如上述权利要求1至25任一项所述的方法。
  55. 一种视频编码器,其特征在于,包括处理器和存储器;
    所示存储器用于存储计算机程序;
    所述处理器用于调用并运行所述存储器中存储的计算机程序,以实现如上述权利要求26至51任一项所述的方法。
  56. 一种编解码系统,其特征在于,包括:
    根据权利要求54所述的视频解码器;
    以及根据权利要求55所述的视频编码器。
  57. 一种计算机可读存储介质,其特征在于,用于存储计算机程序;
    所述计算机程序使得计算机执行如上述权利要求1至25或26至51任一项所述的方法。
  58. 一种码流,其特征在于,所述码流是通过如上述权利要求26至51任一项所述的方法生成的。
PCT/CN2022/083382 2022-03-28 2022-03-28 图像处理方法、装置、设备、系统、及存储介质 WO2023184088A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/083382 WO2023184088A1 (zh) 2022-03-28 2022-03-28 图像处理方法、装置、设备、系统、及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/083382 WO2023184088A1 (zh) 2022-03-28 2022-03-28 图像处理方法、装置、设备、系统、及存储介质

Publications (1)

Publication Number Publication Date
WO2023184088A1 true WO2023184088A1 (zh) 2023-10-05

Family

ID=88198559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083382 WO2023184088A1 (zh) 2022-03-28 2022-03-28 图像处理方法、装置、设备、系统、及存储介质

Country Status (1)

Country Link
WO (1) WO2023184088A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106612436A (zh) * 2016-01-28 2017-05-03 四川用联信息技术有限公司 一种基于dct变换下的视觉感知修正图像压缩方法
WO2017093188A1 (en) * 2015-11-30 2017-06-08 Telefonaktiebolaget Lm Ericsson (Publ) Encoding and decoding of pictures in a video
CN109302608A (zh) * 2017-07-25 2019-02-01 华为技术有限公司 图像处理方法、设备及系统
WO2020117781A1 (en) * 2018-12-04 2020-06-11 Interdigital Vc Holdings, Inc. Method and apparatus for video encoding and decoding with adjusting the quantization parameter to block size

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017093188A1 (en) * 2015-11-30 2017-06-08 Telefonaktiebolaget Lm Ericsson (Publ) Encoding and decoding of pictures in a video
CN106612436A (zh) * 2016-01-28 2017-05-03 四川用联信息技术有限公司 一种基于dct变换下的视觉感知修正图像压缩方法
CN109302608A (zh) * 2017-07-25 2019-02-01 华为技术有限公司 图像处理方法、设备及系统
WO2020117781A1 (en) * 2018-12-04 2020-06-11 Interdigital Vc Holdings, Inc. Method and apparatus for video encoding and decoding with adjusting the quantization parameter to block size

Similar Documents

Publication Publication Date Title
WO2017071480A1 (zh) 参考帧编解码的方法与装置
EP3711302B1 (en) Spatially adaptive quantization-aware deblocking filter
US11589055B2 (en) Method and apparatus of mode- and size-dependent block-level restrictions for position dependent prediction combination
WO2023039859A1 (zh) 视频编解码方法、设备、系统、及存储介质
WO2023279961A1 (zh) 视频图像的编解码方法及装置
CN114125446A (zh) 图像编码方法、解码方法和装置
WO2022194137A1 (zh) 视频图像的编解码方法及相关设备
CN113822824B (zh) 视频去模糊方法、装置、设备及存储介质
WO2022111233A1 (zh) 帧内预测模式的译码方法和装置
WO2022266955A1 (zh) 图像解码及处理方法、装置及设备
CN116965029A (zh) 使用卷积神经网络对图像进行译码的装置和方法
WO2023044868A1 (zh) 视频编解码方法、设备、系统、及存储介质
WO2023184088A1 (zh) 图像处理方法、装置、设备、系统、及存储介质
WO2023220969A1 (zh) 视频编解码方法、装置、设备、系统及存储介质
WO2023000182A1 (zh) 图像编解码及处理方法、装置及设备
WO2023206420A1 (zh) 视频编解码方法、装置、设备、系统及存储介质
WO2023184248A1 (zh) 视频编解码方法、装置、设备、系统及存储介质
WO2023220946A1 (zh) 视频编解码方法、装置、设备、系统及存储介质
WO2023173255A1 (zh) 图像编解码方法、装置、设备、系统、及存储介质
WO2023184747A1 (zh) 视频编解码方法、装置、设备、系统及存储介质
WO2023092404A1 (zh) 视频编解码方法、设备、系统、及存储介质
RU2786022C1 (ru) Устройство и способ для ограничений уровня блока в зависимости от режима и размера
WO2023221599A1 (zh) 图像滤波方法、装置及设备
WO2023044919A1 (zh) 视频编解码方法、设备、系统、及存储介质
WO2023151365A1 (zh) 图像滤波方法、装置、设备及存储介质、程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933946

Country of ref document: EP

Kind code of ref document: A1