WO2023092404A1 - Video encoding/decoding method, device, system, and storage medium - Google Patents

Video encoding/decoding method, device, system, and storage medium

Info

Publication number
WO2023092404A1
WO2023092404A1 · PCT/CN2021/133240
Authority
WO
WIPO (PCT)
Prior art keywords: cnnlf, model, target, models, under
Prior art date
Application number
PCT/CN2021/133240
Other languages
English (en)
French (fr)
Inventor
戴震宇
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to PCT/CN2021/133240
Publication of WO2023092404A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 - Filters, e.g. for pre-processing or post-processing
    • H04N19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop

Definitions

  • the present application relates to the technical field of video coding and decoding, and in particular to a video coding and decoding method, device, system, and storage medium.
  • Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, or video players, among others.
  • video devices implement video compression technology to enable more effective transmission or storage of video data.
  • Loop filtering is an important part of video coding and decoding, which is used to filter the reconstructed image and improve the effect of the reconstructed image.
  • Loop filtering includes a residual-neural-network-based in-loop filter, i.e., the Convolutional Neural Network based In-Loop Filter (CNNLF for short).
  • In existing schemes, the CNNLF model is selected based on the quantization parameter (QP). However, the QP fluctuates during encoding.
  • Therefore, when the CNNLF model is selected based on the QP, the selection of the CNNLF model is inaccurate, resulting in a poor filtering effect.
  • Embodiments of the present application provide a video encoding and decoding method, device, system, and storage medium, which improve the selection accuracy of the CNNLF model so as to improve the filtering effect.
  • the present application provides a video coding method, including:
  • determining a target reconstruction block of the current block, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model;
  • determining the target CNNLF model, where the target CNNLF model is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, the selection probability is obtained by prediction on the target reconstruction block through a neural network model, and N is a positive integer;
  • the target reconstruction block is filtered using the target CNNLF model.
  • the embodiment of the present application provides a video decoding method, including:
  • obtaining a target reconstruction block of the current block by decoding the code stream, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model;
  • the target reconstruction block is filtered using the target CNNLF model.
  • the present application provides a video encoder, configured to execute the method in the above first aspect or various implementations thereof.
  • the encoder includes a functional unit configured to execute the method in the above first aspect or its implementations.
  • the present application provides a video decoder, configured to execute the method in the above second aspect or various implementations thereof.
  • the decoder includes a functional unit for executing the method in the above second aspect or its various implementations.
  • a fifth aspect provides a video encoder, including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
  • a sixth aspect provides a video decoder, including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
  • a video codec system including a video encoder and a video decoder.
  • the video encoder is configured to execute the method in the above first aspect or its various implementations
  • the video decoder is configured to execute the method in the above second aspect or its various implementations.
  • the chip includes: a processor, configured to call and run a computer program from the memory, so that a device installed with the chip executes the method in any one of the above-mentioned first to second aspects or the implementations thereof.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • a computer program product including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to second aspects or the method in each implementation manner.
  • a computer program which, when running on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • a code stream is provided, and the code stream is generated based on the method in the first aspect above.
  • the decoding end obtains the target reconstruction block of the current block by decoding the code stream, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model; then, the target CNNLF model is determined, where the target CNNLF model is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, and the selection probability is obtained by prediction on the target reconstruction block through the neural network model; finally, the target CNNLF model is used to filter the target reconstruction block.
  • the target CNNLF model of this application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, which improves the selection accuracy of the target CNNLF model, so that the filtering effect can be improved when filtering is performed based on the accurately selected target CNNLF model, thereby improving the decoding performance.
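To make this combination of criteria concrete, the following Python sketch (not part of the patent; all names are illustrative) shows one plausible encoder-side policy consistent with the flag semantics described later in this document: when the rate-distortion-optimal model coincides with the model the network rates most probable, only a one-bit flag needs to be signalled; otherwise the flag plus an explicit model index is written.

```python
# Illustrative sketch only; the patent defines the flag semantics but not
# this exact policy. rd_costs and probs are per-model lists of length N.

def encoder_choose_and_signal(rd_costs, probs):
    best_cost_idx = min(range(len(rd_costs)), key=lambda i: rd_costs[i])
    best_prob_idx = max(range(len(probs)), key=lambda i: probs[i])
    if best_cost_idx == best_prob_idx:
        # First flag = first value: the decoder re-derives the model from
        # its own probability prediction, so no model index is written.
        return best_cost_idx, {"first_flag": 1}
    # First flag = second value: the RD-optimal model is signalled explicitly.
    return best_cost_idx, {"first_flag": 0, "model_index": best_cost_idx}

# Toy usage with N = 4 candidate CNNLF models:
print(encoder_choose_and_signal([9.1, 7.4, 8.0, 8.8], [0.1, 0.6, 0.2, 0.1]))
# -> (1, {'first_flag': 1})
```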
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
  • Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application
  • Fig. 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a loop filter unit involved in the present application.
  • FIG. 5 is a schematic flowchart of a video decoding method provided in an embodiment of the present application.
  • Fig. 6 is a schematic diagram of a network structure of a neural network model
  • FIG. 7 is a schematic diagram of an implementation process of an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a selection probability involved in the embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a CNNLF model corresponding to the luminance component
  • Fig. 10 is a schematic structural diagram of a CNNLF model corresponding to the chroma component
  • FIG. 11 is a schematic diagram of a residual block involved in an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a video decoding method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a selection probability involved in the embodiment of the present application.
  • FIG. 14 is a schematic flowchart of a video decoding method provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a selection probability involved in the embodiment of the present application.
  • FIG. 16 is a schematic flowchart of a video encoding method provided in an embodiment of the present application.
  • FIG. 17 is a schematic flowchart of a video coding method provided in an embodiment of the present application.
  • FIG. 18 is a schematic flowchart of a video encoding method provided in an embodiment of the present application.
  • Fig. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • Fig. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • Fig. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • Fig. 22 is a schematic block diagram of a video codec system provided by an embodiment of the present application.
  • the application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc.
  • the solution of the present application can be combined with video coding standards such as the audio video coding standard (AVS for short), the H.264/Advanced Video Coding (AVC for short) standard, the H.265/High Efficiency Video Coding (HEVC for short) standard, and the H.266/Versatile Video Coding (VVC for short) standard.
  • the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
  • the video codec system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function
  • the decoding device 120 can be understood as a device having a video decoding function; that is, the encoding device 110 and the decoding device 120 in the embodiment of the present application cover a wide range of devices, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
  • encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
  • the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
  • the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
  • the decoding device 120 may acquire encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120, such as a web server (eg, for a website), a file transfer protocol (FTP) server, and the like.
  • the encoding device 110 includes a video encoder 112 and an output interface 113 .
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • the video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more pictures or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoding information may include encoded image data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • An SPS may contain parameters that apply to one or more sequences.
  • a PPS may contain parameters applied to one or more images.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
  • the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
  • the decoding device 120 includes an input interface 121 and a video decoder 122 .
  • the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
  • the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
  • the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding, that is, to encoding only or to decoding only.
  • Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chroma, and Cr (V) denotes red chroma; U and V together denote chroma (Chroma), which describes hue and saturation.
  • 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr), and 4:4:4 means full sampling (YYYYCbCrCbCrCbCrCbCr).
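As a quick illustration of what these ratios mean in terms of data volume, the small helper below (illustrative only, not from the patent) computes per-frame sample counts for each format; the chroma fractions follow directly from the ratio definitions above.

```python
# Per-frame sample counts for the YUV subsampling ratios described above.

def yuv_sample_counts(width: int, height: int, fmt: str):
    luma = width * height
    chroma_fraction = {"4:2:0": 0.25, "4:2:2": 0.5, "4:4:4": 1.0}[fmt]
    chroma = int(luma * chroma_fraction)   # samples per chroma plane (Cb or Cr)
    return luma, chroma, chroma            # (Y, Cb, Cr)

print(yuv_sample_counts(1920, 1080, "4:2:0"))  # (2073600, 518400, 518400)
```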
  • the video encoder 200 reads video data, and divides a frame of image into several coding tree units (coding tree units, CTUs) for each frame of image in the video data.
  • a CTU may also be called a "tree block", a "largest coding unit" (LCU for short), or a "coding tree block" (CTB for short).
  • Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
  • each CTU may be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32 and so on.
  • a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
  • the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that encoding, prediction, and transformation are separated, and processing is more flexible.
  • a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
  • the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N ⁇ 2N, video encoders and video decoders may support 2N ⁇ 2N or N ⁇ N PU sizes for intra prediction, and support 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, NxN or similarly sized symmetric PUs for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter prediction.
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded picture buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
  • the current block may be called a current coding unit (CU) or a current prediction unit (PU).
  • a predicted block may also be called a predicted image block or an image predicted block, and a reconstructed image block may also be called a reconstructed block or an image reconstructed image block.
  • the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Because there is a strong similarity between adjacent frames in a video, the inter-frame prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • the inter-frame prediction can refer to image information of different frames.
  • the inter-frame prediction uses motion information to find a reference block from the reference frame, and generates a prediction block according to the reference block to eliminate temporal redundancy;
  • Frames used for inter-frame prediction may be P frames and/or B frames, P frames refer to forward predictive frames, and B frames refer to bidirectional predictive frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can be of integer-pixel or sub-pixel precision. If the motion vector has sub-pixel precision, interpolation filtering must be used in the reference frame to generate the required sub-pixel block.
  • the block of integer pixels or sub-pixels found in the reference frame according to the motion vector is called a reference block.
  • Some technologies directly use the reference block as the prediction block, while other technologies further process the reference block to generate the prediction block. Further processing the reference block to generate a prediction block can also be understood as taking the reference block as the prediction block and then processing it to generate a new prediction block.
  • the intra-frame estimation unit 212 refers only to information within the same frame image to predict the pixel information in the current coded image block, so as to eliminate spatial redundancy.
  • a frame used for intra prediction may be an I frame.
  • the intra prediction modes used by HEVC include the planar mode (Planar), DC and 33 angular modes, a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angular modes, a total of 67 prediction modes.
  • Newer codecs also adopt Matrix-based Intra Prediction (MIP for short), which predicts with offline-trained matrices, and the Cross-Component Linear Model (CCLM for short) prediction mode.
  • In MIP, for a rectangular prediction block with a width of W and a height of H, MIP selects the W reconstructed pixels in the row above the block and the H reconstructed pixels in the column to its left as input. If the pixels at these positions have not been reconstructed, the pixels at the unreconstructed positions are set to a default value; for example, for 10-bit pixels, the default filling value is 512. MIP generates predicted values in three main steps: reference pixel averaging, matrix-vector multiplication, and linear interpolation upsampling.
  • MIP works on blocks with a size of 4x4 to 64x64.
  • the MIP mode selects the appropriate prediction matrix according to the lengths of the rectangle's sides: for a rectangle with a short side of 4, there are 16 sets of matrix parameters to choose from; for a rectangle with a short side of 8, there are 8 sets of matrix parameters to choose from; for other rectangles, there are 6 sets of matrix parameters to choose from.
  • MIP uses the candidate matrices for prediction, and the index of the matrix with the lowest cost is encoded into the code stream, so that the decoder can read it and use the corresponding matrix parameters for prediction.
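The numpy sketch below walks through the three MIP steps named above (reference averaging, matrix-vector multiplication, linear interpolation upsampling). It is illustrative only: the real matrices are trained offline and defined by the standard, so a random matrix stands in here, and the averaging and upsampling rules are simplified.

```python
import numpy as np

def mip_predict(top_refs, left_refs, W, H, bitdepth=10):
    # Step 1: reference pixel averaging - reduce each boundary to 4 samples
    # (assumes W and H are multiples of 4, as in the 4x4..64x64 MIP range).
    bt = np.asarray(top_refs, dtype=np.float64).reshape(4, -1).mean(axis=1)
    bl = np.asarray(left_refs, dtype=np.float64).reshape(4, -1).mean(axis=1)
    boundary = np.concatenate([bt, bl])          # 8-entry boundary vector

    # Step 2: matrix-vector multiplication with a (placeholder) trained
    # matrix yields a small 4x4 downsampled prediction block.
    A = np.random.rand(16, 8)
    small_pred = (A @ boundary).reshape(4, 4)

    # Step 3: linear interpolation upsamples the 4x4 block to W x H.
    pred = np.empty((H, W))
    for i, y in enumerate(np.linspace(0, 3, H)):
        for j, x in enumerate(np.linspace(0, 3, W)):
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, 3), min(x0 + 1, 3)
            fy, fx = y - y0, x - x0
            pred[i, j] = ((1 - fy) * (1 - fx) * small_pred[y0, x0]
                          + (1 - fy) * fx * small_pred[y0, x1]
                          + fy * (1 - fx) * small_pred[y1, x0]
                          + fy * fx * small_pred[y1, x1])
    return np.clip(pred, 0, (1 << bitdepth) - 1)

# 8x8 block; unreconstructed neighbours filled with the 10-bit default 512.
pred = mip_predict(np.full(8, 512), np.full(8, 512), W=8, H=8)
```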
  • In this way, intra-frame prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
  • the residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, residual unit 220 may generate a residual block for the CU such that each sample in the residual block has a value equal to the difference between a sample in the CU's pixel block and the corresponding sample in the prediction blocks of the CU's PUs.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
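The relationship between QP and quantization strength can be sketched as follows. In HEVC/VVC-style codecs the quantization step roughly doubles for every increase of 6 in QP; the code below uses that well-known approximation (illustrative only, not the exact standard arithmetic).

```python
# QP-driven quantization/dequantization, using the approximate rule that
# the step size doubles every 6 QP units.

def qstep(qp: int) -> float:
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff: float, qp: int) -> int:
    return round(coeff / qstep(qp))

def dequantize(level: int, qp: int) -> float:
    return level * qstep(qp)

# A larger QP means coarser quantization: fewer bits, more distortion.
print(quantize(100.0, 22), quantize(100.0, 37))   # -> 12 2
```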
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
  • the loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels, compensate for distortion information, and provide a better reference for subsequently encoded pixels. For example, a deblocking filtering operation can be performed to reduce the blocking effect.
  • the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, wherein the deblocking filtering unit is used for deblocking, and the SAO/ALF unit is used to remove ringing effects.
  • the decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • Fig. 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
  • the video decoder 300 can receive code streams.
  • the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
  • the prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • the prediction unit 320 includes an intra estimation unit 321 and an inter prediction unit 322 .
  • Intra estimation unit 321 may perform intra prediction to generate a predictive block for a PU. Intra estimation unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 321 may also determine the intra prediction mode of the PU from one or more syntax elements parsed from the codestream.
  • the inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
  • Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain the reconstructed image block.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
  • the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, the difference between the predicted block and the original block of the current block, and the residual block may also be called residual information.
  • the residual block can be transformed and quantized by the transform/quantization unit 230 to remove information to which human eyes are not sensitive, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block.
  • the entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy coding on the quantized transform coefficients to output a code stream.
  • the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the entropy decoding unit 310 can analyze the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
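The per-block decoding flow just described can be condensed into a few lines. The sketch below is a toy rendering: the unit numbers match the text, but the function bodies are simplified stand-ins (e.g. an identity inverse transform), not a real decoder.

```python
import numpy as np

def inverse_quantize(levels, step=8.0):        # part of unit 330
    return levels * step

def inverse_transform(coeffs):                 # part of unit 330
    return coeffs   # identity transform, for illustration only

def loop_filter(block):                        # unit 350
    return np.clip(block, 0, 255)

prediction = np.full((4, 4), 120.0)            # output of prediction unit 320
levels = np.ones((4, 4))                       # entropy-decoded by unit 310
residual = inverse_transform(inverse_quantize(levels))
reconstruction = prediction + residual         # reconstruction unit 340
decoded = loop_filter(reconstruction)          # loop-filtered decoded block
```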
  • the encoding end also needs similar operations to the decoding end to obtain the decoded image.
  • the decoded image may also be referred to as a reconstructed image, and the reconstructed image may serve as a reference frame for inter-frame prediction of subsequent frames.
  • the block division information determined by the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering and the like, is carried in the code stream when necessary.
  • the decoding end parses the code stream and, from the available information, determines the same block division information and the same prediction, transform, quantization, entropy coding, loop filtering and other mode or parameter information as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
  • FIG. 4 is a schematic structural diagram of a loop filtering unit involved in the present application.
  • the loop filtering unit mainly includes a deblocking filter (DeBlocking Filter, DBF for short), sample adaptive offset (Sample Adaptive Offset, SAO for short), and an adaptive loop filter (Adaptive Loop Filter, ALF for short).
  • It may also include a loop filter based on a residual neural network (Convolutional Neural Network based In-Loop Filter, CNNLF for short).
  • In order to improve the accuracy of CNNLF model selection, when the preset N CNNLF models are used to filter the target reconstruction block of the current block, the embodiments of the present application determine, through rate-distortion optimization (Rate Distortion Optimization, RDO for short), the rate-distortion costs respectively corresponding to the N CNNLF models, input the target reconstruction block of the current block into the neural network model, and predict the selection probabilities respectively corresponding to the N CNNLF models through the neural network model; the target CNNLF model is then selected from the N CNNLF models according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models, so as to realize accurate selection of the target CNNLF model.
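The rate-distortion half of that decision can be sketched as an exhaustive search over the N candidate filters, scoring each by J = D + λ·R. The Python below is illustrative only; the distortion metric, the rate term, and the toy "models" are placeholders, not the patent's definitions.

```python
import numpy as np

def sse(a, b):   # distortion: sum of squared errors, one common choice
    return float(np.sum((a - b) ** 2))

def bits_to_signal_model(idx):   # hypothetical side-information cost
    return 2                      # e.g. a fixed-length index for N = 4

def rd_costs_for_models(models, recon_block, original_block, lam):
    costs = []
    for idx, model in enumerate(models):
        filtered = model(recon_block)
        costs.append(sse(filtered, original_block)
                     + lam * bits_to_signal_model(idx))
    return costs

# Toy usage: four "CNNLF models" as simple callables over a toy block.
recon, orig = np.random.rand(8, 8), np.random.rand(8, 8)
models = [lambda x, s=s: x * s for s in (0.9, 0.95, 1.0, 1.05)]
print(rd_costs_for_models(models, recon, orig, lam=10.0))
```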
  • the video decoding method provided by the embodiment of the present application is introduced below with reference to FIG. 5 and taking the decoding end as an example.
  • FIG. 5 is a schematic flowchart of a video decoding method provided by an embodiment of the present application, and the embodiment of the present application is applied to the video decoders shown in FIG. 1 and FIG. 3.
  • the method of the embodiment of the present application includes:
  • the current block is also referred to as a current decoding block, a current decoding unit, a decoding block, a block to be decoded, a current block to be decoded, and the like.
  • the decoding process involved in the embodiment of the present application may be as follows: the entropy decoding unit 310 parses the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the loop filtering unit 350 performs loop filtering on the reconstructed block to obtain a decoded block.
  • the target reconstruction block in this embodiment is a reconstruction block to be input into the target CNNLF model.
  • If the reconstruction block of the current block needs to pass through the DBF and/or SAO before the CNNLF, the reconstruction block after the DBF and/or SAO is determined as the target reconstruction block of the current block.
  • the target CNNLF model is determined according to the rate-distortion cost and selection probability corresponding to the preset N CNNLF models respectively, and the selection probability is obtained by predicting the target reconstruction block through the neural network model, and N is a positive integer.
  • the present application does not limit the specific network structure of the neural network, for example, it may be a convolutional neural network.
  • K and L are both positive integers.
  • At least one convolutional layer in the K convolutional layers is connected with a downsampling unit, and the downsampling unit is used for downsampling the feature map output by the convolutional layer.
  • the downsampling unit may be a maximum pooling layer or an average pooling layer.
  • In one example, the neural network model includes 3 convolutional layers (Conv), the convolution kernel size of each convolutional layer is 3x3, and each convolutional layer is followed by a max pooling layer whose kernel size is 2x2; that is to say, each max pooling layer realizes 2x downsampling.
  • the reconstruction block is input into the neural network model, and the neural network model outputs the selection probability corresponding to each CNNLF model in the N CNNLF models, such as a, b, ... n.
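A PyTorch rendering of that classifier is sketched below. PyTorch itself, the channel widths, and the 128x128 input size are assumptions of this sketch, not the patent's specification: three 3x3 convolutions, each followed by a 2x2 max-pooling layer, ending in a fully connected layer whose softmax output gives the selection probabilities a, b, ..., n.

```python
import torch
import torch.nn as nn

class ModelSelector(nn.Module):
    def __init__(self, n_models: int = 4, in_ch: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 2x downsampling: 128 -> 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 32 -> 16
        )
        self.head = nn.Linear(64 * 16 * 16, n_models)

    def forward(self, recon_block):
        x = self.features(recon_block)
        return torch.softmax(self.head(x.flatten(1)), dim=1)

# One 128x128 single-channel reconstruction block -> N selection probabilities.
probs = ModelSelector()(torch.rand(1, 1, 128, 128))
```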
  • the neural network of the present application is obtained by training with reconstruction blocks and the CNNLF models corresponding to those reconstruction blocks. The training process may be as follows: a training reconstruction block is input into the neural network model to obtain the predicted selection probabilities respectively corresponding to the N CNNLF models output by the neural network model; the predicted selection probabilities corresponding to the N CNNLF models are compared with the true value of the CNNLF model corresponding to the training reconstruction block to obtain a loss, and the neural network model is updated according to the loss.
  • Then the next training reconstruction block is input into the updated neural network model, the predicted selection probabilities respectively corresponding to the N CNNLF models output by the neural network model are obtained, the predicted selection probabilities are compared with the true value of the CNNLF model corresponding to the next training reconstruction block to obtain a loss, and the neural network model is updated according to the loss, until the training end condition is met.
  • the training end condition may be that the number of training times reaches a preset number, or that the loss reaches a preset loss.
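A minimal training loop for this procedure might look as follows, reusing the ModelSelector sketch above. The negative log-likelihood loss and the optimizer are assumptions, since the text only says a loss is computed from the predictions and the ground-truth model.

```python
import torch
import torch.nn as nn

def train(selector, loader, epochs=2, lr=1e-4):
    opt = torch.optim.Adam(selector.parameters(), lr=lr)
    nll = nn.NLLLoss()   # consumes log-probabilities
    for _ in range(epochs):   # or stop when the loss reaches a preset value
        for recon_blocks, true_model_idx in loader:
            probs = selector(recon_blocks)                 # softmax output
            loss = nll(torch.log(probs + 1e-9), true_model_idx)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Toy data: batches of random "reconstruction blocks" with random labels.
toy_loader = [(torch.rand(4, 1, 128, 128), torch.randint(0, 4, (4,)))
              for _ in range(2)]
train(ModelSelector(), toy_loader)
```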
  • the above-mentioned target reconstruction block is input into the trained neural network model, and the neural network model can output the probabilities that the N CNNLF models are respectively selected to filter the target reconstruction block.
  • the target CNNLF model in the embodiment of the present application is determined based on the rate-distortion costs and selection probabilities corresponding to N CNNLF models respectively. Compared with determining the target CNNLF model according to the quantization step size, the selection accuracy of the target CNNLF model can be improved. When filtering based on an accurately selected target CNNLF model, the filtering effect can be improved.
  • the target CNNLF model is determined by the decoder according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models.
  • the target CNNLF model is determined by the encoder according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models.
  • the implementation methods of determining the target CNNLF model in S502 above include but are not limited to the following:
  • Method 1: decode the code stream to obtain a first flag, where the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models;
  • Specifically, after the encoder selects the target CNNLF model from the N CNNLF models according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, it writes the first flag into the code stream; the first flag indicates whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models.
  • the decoder determines the target CNNLF model according to the value of the first flag. For example, if the value of the first flag is the first value, the decoder selects the model corresponding to the maximum selection probability from the N CNNLF models. CNNLF model, the CNNLF model corresponding to the maximum selection probability is determined as the target CNNLF model.
  • If the value of the first flag is the second value, the decoder selects the CNNLF model with the smallest rate-distortion cost from the N CNNLF models, and determines the CNNLF model with the smallest rate-distortion cost as the target CNNLF model.
  • the present application does not limit the specific values of the above-mentioned first value and the second value.
  • the first value is 1.
  • the second value is 0.
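Putting Method 1 together, the decoder-side logic can be sketched as below. The toy bit reader and the stand-in selector are illustrative; only the flag semantics come from the text.

```python
def decode_target_model(stream, cnnlf_models, selector, target_recon_block):
    first_flag = stream.pop(0)   # toy bit reader standing in for the parser
    if first_flag == 1:          # first value: max-probability model
        probs = selector(target_recon_block)
        return cnnlf_models[max(range(len(probs)), key=lambda i: probs[i])]
    # Second value: the index of the min-RD-cost model follows in the stream.
    return cnnlf_models[stream.pop(0)]

models = ["model0", "model1", "model2", "model3"]
selector = lambda block: [0.1, 0.6, 0.2, 0.1]    # stand-in for the network
print(decode_target_model([1], models, selector, None))     # -> model1
print(decode_target_model([0, 3], models, selector, None))  # -> model3
```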
  • the manner of determining the target CNNLF model includes but is not limited to the following examples:
  • Example 1: if the value of the first flag is the first value, input the target reconstruction block into the neural network model, obtain the selection probabilities respectively corresponding to the N CNNLF models output by the neural network model, and determine the CNNLF model corresponding to the maximum selection probability among the N CNNLF models as the target CNNLF model.
  • the first value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models.
  • the decoding end decodes the code stream to obtain the first flag. If the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models.
  • At this time, the switch is connected to the corresponding node, and the selection probabilities corresponding to the N CNNLF models are predicted through the neural network model. Specifically, the target reconstruction block of the current block is input into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models predicted by the neural network model, and the CNNLF model corresponding to the maximum selection probability is then determined as the target CNNLF model.
  • the index of the CNNLF model corresponding to the maximum selection probability can also be carried directly in the code stream by the encoding end. In this way, the decoder can directly decode the index of the CNNLF model corresponding to the maximum selection probability from the code stream, and determine the CNNLF model corresponding to the index as the target CNNLF model without calculating it by itself, thereby reducing the amount of data to be processed for decoding and improving the decoding efficiency.
  • the decoder predicts the selection probabilities corresponding to the N CNNLF models through the neural network model, and determines the CNNLF model corresponding to the maximum selection probability as the target CNNLF model , compared to the rate-distortion cost method that must transmit the index of the target CNNLF model in the code stream, this example 1 reduces codewords and reduces coding complexity.
  • Example 2: if the value of the first flag is the second value, decode the code stream to obtain the index of the target CNNLF model, and determine the CNNLF model corresponding to the index of the target CNNLF model among the N CNNLF models as the target CNNLF model, where the second value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the smallest rate-distortion cost among the N CNNLF models.
  • the encoder determines that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost
  • the encoder carries the index of the target CNNLF model (that is, the CNNLF model corresponding to the minimum rate-distortion cost) in the code stream.
  • the decoding end decodes the code stream to obtain the first flag. If the value of the first flag is the second value, the second value indicates that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models.
  • the switch is connected to the following nodes, and the decoder continues to decode the code stream to obtain the index of the target CNNLF model, and then the CNNLF model corresponding to the index of the target CNNLF model among the N CNNLF models is determined as the target CNNLF model.
  • the decoder directly decodes from the code stream to obtain the index of the target CNNLF model, and then obtains the target CNNLF model.
  • the method is simple and can quickly obtain the target CNNLF model, thereby improving the decoding efficiency.
  • In Method 1, the decoder determines the target CNNLF model through the first flag carried in the code stream, which is simple and fast.
  • Method 2: if the code stream does not carry the first flag but carries the index of the target CNNLF model, the decoder directly decodes the code stream to obtain the index of the target CNNLF model, and then determines the CNNLF model corresponding to that index among the N CNNLF models as the target CNNLF model.
  • This method is simple, and the decoding end does not need to perform other unnecessary operations, thereby improving decoding efficiency.
  • the target CNNLF model is a neural network model.
  • the target reconstruction block is input into the target CNNLF model, and after the processing of each layer in the target CNNLF model, the filtered reconstruction block is finally output.
  • the filtered reconstruction block is closer to the original image block, thereby improving the decoding performance.
  • the embodiment of the present application further includes a second flag at the sequence level, where the second flag is used to indicate whether the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model.
  • if the value of the second flag is the third value, it indicates that the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model.
  • if the value of the second flag is the fourth value, it indicates that the current sequence does not allow the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model.
  • the third value is different from the fourth value.
  • the present application does not limit the specific values of the above-mentioned third value and the fourth value.
  • the third value is 1.
  • the fourth value is 0.
  • Before executing this embodiment, or before executing the above S502, the decoder needs to judge, according to the sequence-level second flag, whether the current block is allowed to determine the target CNNLF model by combining the rate-distortion cost and the neural network model. If it is judged that the current block allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model, for example when the value of the second flag is the third value, the decoding end executes the method of the embodiment of the present application, or executes the above S502, and determines the target CNNLF model through the above Method 1 or Method 2.
  • the field adaptive_model_selection_enable_flag can be used to represent the second flag.
  • the second flag is added to the sequence header, as specifically shown in Table 1:
  • When decoding, the decoder first parses the sequence header information in Table 1 above, parses the second flag adaptive_model_selection_enable_flag from the sequence header information, and judges whether the current block can use the method of combining the rate-distortion cost and the neural network model to determine the target CNNLF model. If it is judged that the current block can use this method, the method of the embodiment of the present application is executed to determine the target CNNLF model; if it is judged that the current block cannot use this method, the solution of the embodiment of the present application is skipped and another solution is used to determine the target CNNLF model.
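The sequence-level gate can be sketched as a thin wrapper around the Method 1 logic, reusing decode_target_model from the sketch above. The header dictionary and the QP-based fallback rule are illustrative assumptions; only the flag name comes from the text.

```python
def select_model_with_gate(seq_header, stream, cnnlf_models, selector, block):
    # Third value (here 1): adaptive model selection is allowed, so use the
    # decode_target_model sketch from Method 1 above.
    if seq_header.get("adaptive_model_selection_enable_flag") == 1:
        return decode_target_model(stream, cnnlf_models, selector, block)
    # Fourth value (here 0): fall back to another scheme, e.g. a
    # hypothetical QP-based selection as in earlier approaches.
    qp = seq_header.get("qp", 32)
    return cnnlf_models[min(qp // 16, len(cnnlf_models) - 1)]
```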
  • the decoding end obtains the target reconstruction block of the current block by decoding the code stream, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model; then, the target CNNLF model is determined, where the target CNNLF model is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, and the selection probability is obtained by prediction on the target reconstruction block through the neural network model; finally, the target CNNLF model is used to filter the target reconstruction block.
  • the target CNNLF model of this application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, which improves the selection accuracy of the target CNNLF model, so that the filtering effect can be improved when filtering is performed based on the accurately selected target CNNLF model, thereby improving the decoding performance.
  • the loop filter based on the residual neural network designs different network structures for the luminance component and the chrominance component, as shown in FIG. 9 and FIG. 10.
  • FIG. 9 is a schematic structural diagram of a CNNLF model corresponding to the luminance component, and FIG. 10 is a schematic structural diagram of a CNNLF model corresponding to the chrominance component.
  • the CNNLF model consists of convolutional layers, activation layers, residual blocks, skip connections, etc.
  • the network structure of the residual block is shown in Figure 11; it consists of a convolutional layer CONV, an activation layer RELU, and a skip connection.
  • the CNNLF model also includes a global skip connection from input to output, which enables the network to focus on learning the residual and accelerates the convergence of the network.
  • the luminance component is introduced as one of the inputs to guide the filtering of the chrominance component.
  • Figure 9 and Figure 10 are only a network structure of the CNNLF model corresponding to the luminance component and the chrominance component involved in this application, and the network structure of the CNNLF model involved in the embodiment of the application includes but is not limited to Figure 9 and Figure 10 shows.
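For concreteness, the sketch below renders the described luma structure in PyTorch: residual blocks built from CONV + RELU + a skip connection (as in Figure 11), plus the global input-to-output skip connection so the network learns the filtering residual. The framework, channel count, and number of blocks are assumptions of this sketch, not the patent's values.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return x + self.relu(self.conv(x))   # CONV + RELU + skip (Fig. 11)

class LumaCNNLF(nn.Module):
    def __init__(self, ch: int = 64, n_blocks: int = 8):
        super().__init__()
        self.head = nn.Conv2d(1, ch, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 1, kernel_size=3, padding=1)

    def forward(self, recon):
        # Global skip connection: the network only learns the residual
        # between the reconstruction and the original image.
        return recon + self.tail(self.body(self.head(recon)))

filtered = LumaCNNLF()(torch.rand(1, 1, 64, 64))
```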
  • In terms of model training, multiple (for example 4) I-frame luminance component models, multiple (for example 4) non-I-frame luminance component models, multiple (for example 4) chroma U component models, and multiple (for example 4) chroma V component models are trained offline.
  • Exemplarily, the DIV2K image data set is used, and the images are converted from RGB into single-frame video sequences in YUV4:2:0 format as label data. HPM (High Performance Model, the AVS high-performance test model) is then used to encode the sequences under the All Intra configuration, with traditional filters such as DBF, SAO and ALF turned off and the quantization parameter set to 27 to 50.
  • the quantization parameter may also be set to other values, and the present application is not limited thereto.
  • the reconstructed sequences obtained by encoding are divided into 4 intervals according to the QP ranges 27~31, 32~37, 38~44, and 45~50 (this division of quantization intervals is only an example, and the application is not limited thereto), and cut into 128x128 image blocks (the image block size is only an example, and the application is not limited thereto) as training data, so as to respectively train multiple (for example 4) I-frame luminance component models, multiple (for example 4) chroma U component models, and multiple (for example 4) chroma V component models.
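The data preparation just described amounts to binning reconstructed frames by QP interval (one CNNLF model per interval) and cropping fixed-size patches. A small illustrative helper:

```python
import numpy as np

QP_INTERVALS = [(27, 31), (32, 37), (38, 44), (45, 50)]   # one model each

def qp_interval_index(qp: int) -> int:
    for idx, (lo, hi) in enumerate(QP_INTERVALS):
        if lo <= qp <= hi:
            return idx
    raise ValueError(f"QP {qp} outside the trained range 27-50")

def crop_patches(frame, size=128):
    # Non-overlapping size x size crops used as training samples.
    h, w = frame.shape[:2]
    return [frame[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

print(qp_interval_index(38))                     # -> 2 (third interval)
print(len(crop_patches(np.zeros((256, 384)))))   # -> 6 patches
```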
  • For the non-I-frame luminance component models, the BVI-DVC (BVI Digital Video Compression) data set is used: HPM-ModAI encodes it under the Random Access configuration, with traditional filters such as DBF, SAO and ALF turned off and the CNNLF of the I frames turned on; the encoded and reconstructed non-I-frame data is collected, and multiple (for example 4) non-I-frame luminance component models are trained respectively.
  • the CNNLF models corresponding to the chroma component and the luminance component are different.
  • the CNNLF model corresponding to the chroma component is used to filter the chroma component of the current block.
  • the CNNLF model corresponding to the luminance component is used to filter the luminance component of the current block, as described in FIG. 12 and FIG. 14 below.
  • FIG. 12 is a schematic flowchart of a video decoding method provided by an embodiment of the present application. If the current block includes chroma components, this embodiment mainly introduces the process of determining the target CNNLF model under the chroma components and the process of using the determined target CNNLF model to filter the target reconstruction block under the chroma components. As shown in Figure 12, including:
  • reconstruction blocks under chroma components are also referred to as chroma reconstruction blocks.
  • the target CNNLF model under the chroma component is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the chroma component, and the selection probability is obtained by prediction on the target reconstruction block under the chroma component through the neural network model.
  • In this embodiment, the N CNNLF models of the embodiment of the present application are CNNLF models under the chroma component, for example, including 4 chroma U component models and 4 chroma V component models.
  • the implementation of determining the target CNNLF model under the chroma component in S602 above includes but is not limited to the following:
• Method 1: the code stream includes a first flag, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component. In this case, the above S602 includes the following S602-A1 and S602-A2:
• Specifically, after the encoder selects the target CNNLF model under the chroma component from the N CNNLF models under the chroma component according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the chroma component, it writes the first flag into the code stream, where the first flag indicates whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component.
• The decoding end obtains the code stream, decodes the code stream to obtain the first flag, and determines the target CNNLF model under the chroma component according to the first flag, so that the target CNNLF model under the chroma component is determined simply and quickly.
• In one possible implementation, the decoder determines the target CNNLF model under the chroma component according to the value of the first flag. If the value of the first flag is the first value, the decoder selects the CNNLF model corresponding to the maximum selection probability from the N CNNLF models under the chroma component, and determines it as the target CNNLF model under the chroma component. If the value of the first flag is the second value, the decoder selects the CNNLF model with the smallest rate-distortion cost from the N CNNLF models under the chroma component, and determines it as the target CNNLF model under the chroma component.
• In this implementation, the manner of determining the target CNNLF model under the chroma component according to the first flag includes but is not limited to the following examples:
• Example 1: if the value of the first flag is the first value, the target reconstruction block under the chroma component is input into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model, and the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component is determined as the target CNNLF model under the chroma component. The first value is used to indicate that the target CNNLF model under the chroma component is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component.
• Specifically, the decoder decodes the code stream to obtain the first flag. If the value of the first flag is the first value, it indicates that the target CNNLF model under the chroma component is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component; at this time, the selection probabilities corresponding to the N CNNLF models under the chroma component are predicted by the neural network model. Specifically, the target reconstruction block of the current block under the chrominance component is input into the neural network model to obtain the selection probabilities corresponding to the N CNNLF models under the chrominance component predicted by the neural network model, and the CNNLF model corresponding to the maximum selection probability is then determined as the target CNNLF model under the chroma component.
• In this Example 1, the decoder itself uses the neural network model to predict the selection probabilities corresponding to the N CNNLF models under the chroma component, and determines the CNNLF model corresponding to the maximum selection probability as the target CNNLF model under the chroma component. In this way, the index of the target CNNLF model under the chroma component does not need to be transmitted in the code stream, which reduces codewords, reduces coding complexity, and improves encoding and decoding performance.
• In some embodiments, inputting the above-mentioned target reconstruction block under the chrominance component into the neural network model and obtaining the selection probabilities respectively corresponding to the N CNNLF models under the chrominance component output by the neural network model specifically includes: obtaining the neural network model corresponding to the chroma component, inputting the target reconstruction block under the chroma component into the neural network model corresponding to the chroma component, and obtaining the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by that neural network model.
• Example 2: if the value of the first flag is the second value, the code stream is decoded to obtain the index of the target CNNLF model under the chroma component, and the CNNLF model corresponding to that index among the N CNNLF models under the chroma component is determined as the target CNNLF model under the chroma component, where the second value is used to indicate that the target CNNLF model under the chroma component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the chroma component.
• Specifically, if the encoder determines that the target CNNLF model under the chroma component is the CNNLF model corresponding to the minimum rate-distortion cost, the encoder carries the index of the target CNNLF model under the chroma component (that is, the CNNLF model corresponding to the minimum rate-distortion cost) in the code stream.
• The decoding end decodes the code stream to obtain the first flag. If the value of the first flag is the second value, the second value indicates that the target CNNLF model under the chroma component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the chroma component. In this case, the decoder continues to decode the code stream to obtain the index of the target CNNLF model under the chroma component, and determines the CNNLF model corresponding to that index among the N CNNLF models under the chroma component as the target CNNLF model under the chroma component.
• In this Example 2, the decoder directly decodes the code stream to obtain the index of the target CNNLF model under the chroma component, and thereby obtains the target CNNLF model under the chroma component. The method is simple, and the target CNNLF model under the chrominance component can be obtained quickly, thereby improving decoding efficiency.
  • the above-mentioned first flag may be represented by a field chroma_nn_rdo_equal_flag.
  • the index of the target CNNLF model under the chroma component can be represented by the field chroma_cnnlf_model_index.
  • the above chroma_cnnlf_model_index is carried in the image header.
  • the definition of the image header is shown in Table 2 below:
• The above-mentioned chroma_nn_rdo_equal_flag is also called the image-level chroma model selection consistency flag. If it is 0, it means that the target CNNLF model under the chroma component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the chroma component; in this case, chroma_cnnlf_model_index in Table 2 continues to be decoded, and the CNNLF model corresponding to chroma_cnnlf_model_index is determined as the target CNNLF model under the chroma component.
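As a reading aid, the sketch below shows how a decoder might act on chroma_nn_rdo_equal_flag as just described; the reader methods are hypothetical stand-ins for the entropy decoder, and the flag values (nonzero for the consistency case, 0 for the explicit-index case) follow the text.

```python
def select_chroma_cnnlf_model(reader, chroma_models, prob_net, target_rec):
    """Decoder-side selection sketch (Method 1). `reader.read_flag` and
    `reader.read_index` are hypothetical entropy-decoder helpers."""
    if reader.read_flag("chroma_nn_rdo_equal_flag") == 1:
        # Consistency case: re-derive the choice locally with the neural
        # network; no model index is present in the bitstream.
        probs = prob_net(target_rec)  # selection probabilities for the N models
        idx = max(range(len(probs)), key=probs.__getitem__)
    else:
        # Flag is 0: the minimum-RD-cost model was chosen at the encoder,
        # so its index is read from the image header.
        idx = reader.read_index("chroma_cnnlf_model_index")
    return chroma_models[idx]
```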
• In this Method 1, the decoder determines the target CNNLF model under the chrominance component through the first flag carried in the code stream, which is simple and fast.
• Method 2: if the code stream does not carry the first flag but carries the index of the target CNNLF model under the chroma component, the decoder directly obtains the index of the target CNNLF model under the chroma component by decoding the code stream, and then determines the CNNLF model corresponding to that index among the N CNNLF models under the chroma component as the target CNNLF model under the chroma component.
• After the target CNNLF model under the chroma component is determined in the above manner, step S603 is performed.
• The target CNNLF model under the chroma component is a neural network model. The target reconstruction block under the chroma component is input into the target CNNLF model under the chroma component and, after the processing of each layer in the target CNNLF model under the chroma component, the filtered reconstruction block under the chroma component is finally output. The filtered reconstruction block under the chroma component is closer to the original chroma block, thereby improving the decoding performance.
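The filtering step itself is a single forward pass. The sketch below assumes a callable network that predicts a residual correction added back through a skip connection, in keeping with the residual-CNN framing of CNNLF; the actual network structures of Figures 9 and 10 are not reproduced, and the 8-bit clipping range is an assumption.

```python
import numpy as np

def apply_cnnlf(target_rec: np.ndarray, model) -> np.ndarray:
    """Filter one reconstruction block with the selected target CNNLF model.

    `model` is assumed to map a block to a same-shaped residual correction;
    the skip connection reflects the residual-network design of CNNLF.
    """
    residual = model(target_rec.astype(np.float32))
    filtered = target_rec + residual          # refine, rather than replace, the block
    return np.clip(filtered, 0, 255)          # assumed 8-bit sample range
```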
• In some embodiments, the embodiment of the present application further includes a sequence-level second flag, which is used to indicate whether the chroma component of the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model under the chroma component.
• If the value of the second flag is the third value, it indicates that the chroma component of the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model under the chroma component. If the value of the second flag is the fourth value, it indicates that the chrominance component of the current sequence does not allow this method to be used. The third value is different from the fourth value.
• That is to say, before executing the method of this embodiment, the decoder needs to judge, according to the sequence-level second flag, whether the chrominance component of the current block is allowed to use the method of combining the rate-distortion cost and the neural network model to determine the target CNNLF model under the chroma component. If it is judged that this is allowed, for example, when the value of the second flag is the third value, the decoder executes the method of the embodiment of the present application, or executes the above S602, to determine the target CNNLF model under the chrominance component.
• In the video decoding method provided by the embodiment of the present application, the decoding end obtains the target reconstruction block under the chroma component of the current block by decoding the code stream, where the target reconstruction block under the chroma component is a reconstruction block to be input into the target CNNLF model; then the target CNNLF model under the chroma component is determined, where the target CNNLF model under the chroma component is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the chroma component, and the selection probabilities are obtained by predicting the target reconstruction block under the chroma component through a neural network model (such as the neural network model corresponding to the chroma component); finally, the target CNNLF model under the chroma component is used to filter the target reconstruction block under the chroma component.
• That is, the target CNNLF model under the chroma component in this application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the chroma component, which improves the selection accuracy of the target CNNLF model under the chroma component. In this way, when filtering is performed based on the accurately selected target CNNLF model under the chroma component, the filtering effect of the chroma component can be improved, thereby improving the decoding performance of the chroma component.
  • the above embodiments introduce the process of determining the target CNNLF model under the chroma component and the process of using the target CNNLF model under the chroma component to filter the target reconstruction block under the chroma component.
  • the following describes the process of determining the target CNNLF model under the luminance component and the process of using the target CNNLF model under the luminance component to filter the target reconstruction block under the luminance component according to the embodiment of the present application with reference to FIG. 14 .
• FIG. 14 is a schematic flowchart of a video decoding method provided by an embodiment of the present application. If the current block includes a luminance component, this embodiment mainly introduces the process of determining the target CNNLF model under the luminance component and the process of using the determined target CNNLF model to filter the target reconstruction block under the luminance component. As shown in Figure 14, the method includes:
  • a reconstruction block under the luma component is also referred to as a luma reconstruction block.
• The target CNNLF model under the luminance component is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the luminance component, where the selection probabilities are obtained by predicting the target reconstruction block under the luminance component through the neural network model.
• It should be noted that, in this embodiment, the N CNNLF models in the embodiments of the present application are CNNLF models under the luminance component, for example, including 4 I-frame luminance component models and 4 non-I-frame luminance component models.
  • the implementation methods of determining the target CNNLF model under the luminance component in the above S702 include but are not limited to the following:
• Method 1: the code stream includes a first flag, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component. In this case, the above S702 includes the following S702-A1 and S702-A2:
• Specifically, after the encoder selects the target CNNLF model under the luminance component from the N CNNLF models under the luminance component according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the luminance component, it writes the first flag into the code stream, where the first flag indicates whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component.
• The decoding end obtains the code stream, decodes the code stream to obtain the first flag, and determines the target CNNLF model under the luminance component according to the first flag, so that the target CNNLF model under the luminance component is determined simply and quickly.
• In one possible implementation, the decoder determines the target CNNLF model under the luminance component according to the value of the first flag. If the value of the first flag is the first value, the decoder selects the CNNLF model corresponding to the maximum selection probability from the N CNNLF models under the luminance component, and determines it as the target CNNLF model under the luminance component. If the value of the first flag is the second value, the decoder selects the CNNLF model with the smallest rate-distortion cost from the N CNNLF models under the luminance component, and determines it as the target CNNLF model under the luminance component.
• In this implementation, the method of determining the target CNNLF model under the luminance component according to the first flag in the above S702-A2 includes but is not limited to the following examples:
• Example 1: if the value of the first flag is the first value, the target reconstruction block under the luminance component is input into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model, and the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component is determined as the target CNNLF model under the luminance component. The first value is used to indicate that the target CNNLF model under the luminance component is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component.
• Specifically, the decoding end decodes the code stream to obtain the first flag. If the value of the first flag is the first value, it indicates that the target CNNLF model under the luminance component is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component; at this time, the neural network model is used to predict the selection probabilities corresponding to the N CNNLF models under the luminance component.
• Specifically, the target reconstruction block of the current block under the luminance component is input into the neural network model to obtain the selection probabilities corresponding to the N CNNLF models under the luminance component predicted by the neural network model, and the CNNLF model corresponding to the maximum selection probability is then determined as the target CNNLF model under the luminance component.
• In this Example 1, the decoder itself predicts the selection probabilities corresponding to the N CNNLF models under the luminance component through the neural network model, and determines the CNNLF model corresponding to the maximum selection probability as the target CNNLF model under the luminance component. In this way, the index of the target CNNLF model under the luminance component does not need to be transmitted in the code stream, which reduces codewords, reduces coding complexity, and improves encoding and decoding performance.
• In some embodiments, inputting the target reconstruction block under the luminance component into the neural network model and obtaining the corresponding selection probabilities specifically includes: obtaining the neural network model corresponding to the luminance component, inputting the target reconstruction block under the luminance component into the neural network model corresponding to the luminance component, and obtaining the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model corresponding to the luminance component.
• Example 2: if the value of the first flag is the second value, the code stream is decoded to obtain the index of the target CNNLF model under the luminance component, and the CNNLF model corresponding to that index among the N CNNLF models under the luminance component is determined as the target CNNLF model under the luminance component, where the second value is used to indicate that the target CNNLF model under the luminance component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the luminance component.
• Specifically, if the encoding end determines that the target CNNLF model under the luminance component is the CNNLF model corresponding to the minimum rate-distortion cost, the encoding end carries the index of the target CNNLF model under the luminance component (that is, the CNNLF model corresponding to the minimum rate-distortion cost) in the code stream.
• The decoding end decodes the code stream to obtain the first flag. If the value of the first flag is the second value, the second value indicates that the target CNNLF model under the luminance component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the luminance component. In this case, the decoder continues to decode the code stream to obtain the index of the target CNNLF model under the luminance component, and determines the CNNLF model corresponding to that index among the N CNNLF models under the luminance component as the target CNNLF model under the luminance component.
• In this Example 2, the decoder directly decodes the code stream to obtain the index of the target CNNLF model under the luminance component, and thereby obtains the target CNNLF model under the luminance component. The method is simple, and the target CNNLF model under the luminance component can be obtained quickly, thereby improving decoding efficiency.
  • the above-mentioned first flag may be represented by a field luma_nn_rdo_equal_flag.
  • the index of the target CNNLF model under the above luma component may be represented by the field luma_cnnlf_model_index.
  • the above luma_cnnlf_model_index is carried in the image header.
  • the definition of the image header is shown in Table 3 below:
• The above-mentioned luma_nn_rdo_equal_flag is also called the image-level luminance model selection consistency flag. If it is 0, it means that the target CNNLF model under the luminance component is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the luminance component; in this case, luma_cnnlf_model_index in Table 3 continues to be decoded, and the CNNLF model corresponding to luma_cnnlf_model_index is determined as the target CNNLF model under the luminance component.
• In some embodiments, the first flag includes the above-mentioned luma_nn_rdo_equal_flag and the above-mentioned chroma_nn_rdo_equal_flag; the luma_nn_rdo_equal_flag, the chroma_nn_rdo_equal_flag, the chroma_cnnlf_model_index corresponding to the chroma component, and the luma_cnnlf_model_index corresponding to the luma component can be carried together in the image header.
  • the definition of the image header is shown in Table 4 below:
• In this Method 1, the decoder determines the target CNNLF model under the luminance component through the first flag carried in the code stream, which is simple and fast.
• Method 2: if the code stream does not carry the first flag but carries the index of the target CNNLF model under the luminance component, the decoder directly obtains the index of the target CNNLF model under the luminance component by decoding the code stream, and then determines the CNNLF model corresponding to that index among the N CNNLF models under the luminance component as the target CNNLF model under the luminance component.
  • This method is simple, and the decoding end does not need to perform other unnecessary operations, thereby improving decoding efficiency.
• After the target CNNLF model under the luminance component is determined in the above manner, step S703 is performed.
• The target CNNLF model under the luminance component is a neural network model. The target reconstruction block under the luminance component is input into the target CNNLF model under the luminance component and, after the processing of each layer in the target CNNLF model under the luminance component, the filtered reconstruction block under the luminance component is finally output. The filtered reconstruction block under the luminance component is closer to the original luminance block, thereby improving the decoding performance.
• In some embodiments, the embodiment of the present application further includes a sequence-level second flag, which is used to indicate whether the luminance component of the current sequence is allowed to use the method of combining the rate-distortion cost and the neural network model to determine the target CNNLF model under the luminance component.
• If the value of the second flag is the third value, it indicates that the luminance component of the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model under the luminance component. If the value of the second flag is the fourth value, it indicates that the luminance component of the current sequence does not allow this method to be used. The third value is different from the fourth value.
• That is to say, before executing the method of this embodiment, the decoder needs to judge, according to the sequence-level second flag, whether the luminance component of the current block is allowed to use the method of combining the rate-distortion cost and the neural network model to determine the target CNNLF model under the luminance component. If it is judged that this is allowed, for example, when the value of the second flag is the third value, the decoder executes the method of the embodiment of the present application, or executes the above S702, to determine the target CNNLF model under the luminance component.
• In the video decoding method provided by the embodiment of the present application, the decoding end obtains the target reconstruction block under the luminance component of the current block by decoding the code stream, where the target reconstruction block under the luminance component is a reconstruction block to be input into the target CNNLF model; then the target CNNLF model under the luminance component is determined, where the target CNNLF model under the luminance component is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the luminance component, and the selection probabilities are obtained by predicting the target reconstruction block under the luminance component through a neural network model (such as the neural network model corresponding to the luminance component); finally, the target reconstruction block under the luminance component is filtered using the target CNNLF model under the luminance component.
• That is, the target CNNLF model under the luminance component of the present application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the luminance component, which improves the selection accuracy of the target CNNLF model under the luminance component. In this way, when filtering is performed based on the accurately selected target CNNLF model under the luminance component, the filtering effect of the luminance component can be improved, thereby improving the decoding performance of the luminance component.
• BD-rate is one of the main parameters for evaluating the performance of a video coding algorithm; it indicates the change in bit rate and PSNR (Peak Signal-to-Noise Ratio) of video coded by the new algorithm relative to the original algorithm.
• The technical solution of this application is compared with the deep-learning-based model adaptive selection scheme (that is, the scheme that selects the target CNNLF model according to the selection probabilities predicted by the neural network model), and the BD-rate performance under the Random Access and Low Delay B configurations is shown in Table 7 and Table 8.
• Table 7: RA performance of this scheme compared with the deep-learning-based model adaptive selection scheme
• Table 8: LDB performance of this scheme compared with the deep-learning-based model adaptive selection scheme
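For reference, the BD-rate figures reported in Tables 7 and 8 are conventionally obtained with the Bjontegaard method: fit log-rate as a cubic polynomial of PSNR for each scheme, then average the gap between the two fits over the overlapping PSNR range. A compact sketch of that standard computation (not part of the patent text) follows.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta-rate: average bitrate change (%) of the test codec
    versus the reference at equal quality, using the standard cubic fit of
    log-rate over PSNR integrated across the overlapping PSNR range."""
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_log_diff = ((np.polyval(int_test, hi) - np.polyval(int_test, lo))
                    - (np.polyval(int_ref, hi) - np.polyval(int_ref, lo))) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative = bitrate saving
```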
  • FIG. 16 is a schematic flowchart of a video encoding method provided by an embodiment of the present application, and the embodiment of the present application is applied to the video encoder shown in FIG. 1 and FIG. 2 .
  • the method of the embodiment of the present application includes:
  • the video encoder receives a video stream, which is composed of a series of image frames, performs video encoding for each frame of image in the video stream, and divides the image frames into blocks to obtain the current block.
  • the current block is also referred to as a current coding block, a current image block, a coding block, a current coding unit, a current block to be coded, a current image block to be coded, and the like.
  • the block divided by the traditional method includes not only the chrominance component of the current block position, but also the luminance component of the current block position.
  • the separation tree technology can divide separate component blocks, such as a separate luma block and a separate chrominance block, where the luma block can be understood as only containing the luma component of the current block position, and the chrominance block can be understood as containing only the current block The chroma component of the position. In this way, the luma component and the chrominance component at the same position can belong to different blocks, and the division can have greater flexibility. If the separation tree is used in CU partitioning, some CUs contain both luma and chroma components, some CUs only contain luma components, and some CUs only contain chroma components.
  • the current block in the embodiment of the present application only includes chroma components, which may be understood as a chroma block.
  • the current block in this embodiment of the present application only includes a luma component, which may be understood as a luma block.
  • the current block includes both luma and chroma components.
  • the encoding process at the encoding end may be: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block.
  • the residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
• The entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy coding on the quantized transform coefficients to output a code stream.
  • the inverse quantization/transformation unit 240 performs inverse quantization and inverse transformation on the quantized coefficient matrix to obtain a residual block.
  • the reconstruction unit 250 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed block undergoes loop filtering through the loop filtering unit 260 .
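The per-block encoding path just described can be summarized in a few lines. In the sketch below the unit numbers mirror the text, while all method names are illustrative placeholders rather than an actual encoder API.

```python
def encode_block(original, enc):
    """Illustrative per-block flow; unit numbers follow the text above."""
    pred = enc.predict(original)                        # prediction unit 210 (intra/inter)
    resid = original - pred                             # residual unit 220
    coeffs = enc.transform_quantize(resid)              # transform/quantization unit 230
    enc.entropy_code(coeffs)                            # entropy coding unit 280 -> code stream
    resid_rec = enc.inverse_quantize_transform(coeffs)  # inverse quantization/transform unit 240
    rec = pred + resid_rec                              # reconstruction unit 250
    return enc.loop_filter(rec)                         # loop filtering unit 260 (DBF/SAO/CNNLF)
```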
  • the target reconstruction block in this embodiment is a reconstruction block to be input into the target CNNLF model.
• If the reconstruction block of the current block needs to pass through DBF and/or SAO before CNNLF, the reconstruction block after DBF and/or SAO is determined as the target reconstruction block of the current block.
  • the rate-distortion cost corresponding to the CNNLF model is calculated when the CNNLF model is used to filter the target reconstruction block.
• In one example, the rate-distortion cost RDcost corresponding to the CNNLF model is determined according to the following formula (1):

RDcost = D1 + λ1 × R1    (1)

where D1 is the distortion when the CNNLF model is used to filter the target reconstruction block, λ1 is the Lagrangian multiplier, and R1 is the bit amount required when the CNNLF model is used to filter the target reconstruction block.
  • the rate-distortion cost RDcost1 corresponding to the CNNLF model is determined according to the following formula (2):
• In some embodiments, the above-mentioned distortion D is determined according to the following formula (3), where Dnet1 is the loss of the target reconstruction block after being filtered by the CNNLF model, and Drec1 is the loss of the target reconstruction block before being filtered by the CNNLF model:
• In one example, the pixel difference between the reconstruction block obtained after the target reconstruction block is filtered by the CNNLF model and the current block is determined as Dnet1, and the pixel difference between the reconstruction block before the target reconstruction block is filtered by the CNNLF model (that is, the target reconstruction block) and the current block is determined as Drec1.
  • other methods may also be used to determine Dnet1 and Drec1, which is not limited in this application.
• According to the above method, the rate-distortion cost corresponding to each CNNLF model among the N CNNLF models can be determined.
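Under this reading, the per-model cost can be evaluated as in the following sketch. The sum-of-squared-differences distortion and the D1 = Dnet1 - Drec1 form are assumptions consistent with the variable definitions above, not a verbatim reproduction of formulas (1) to (3).

```python
import numpy as np

def sse(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared pixel differences, one way to measure the 'pixel
    difference' the text uses for Dnet1 and Drec1."""
    return float(np.sum((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def cnnlf_rd_costs(target_rec, original, models, lam1, rate_bits):
    """Per-model RD cost sketch, RDcost = D1 + lambda1 * R1 (formula (1)).

    Taking D1 = Dnet1 - Drec1 is an assumed reading: filtering that reduces
    distortion lowers the cost. `models` are callables returning the filtered
    block; `rate_bits` approximates R1.
    """
    drec1 = sse(target_rec, original)             # loss before CNNLF filtering
    costs = []
    for model in models:
        dnet1 = sse(model(target_rec), original)  # loss after CNNLF filtering
        costs.append((dnet1 - drec1) + lam1 * rate_bits)
    return costs
```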
• The aforementioned preset neural network model is pre-trained and can predict the selection probabilities corresponding to the N CNNLF models based on an image block. Based on this, as shown in Figure 8, the target reconstruction block is input into the preset neural network model to obtain the selection probabilities corresponding to the N CNNLF models output by the neural network model, for example, selection probability 1, selection probability 2, ..., selection probability N. The higher the selection probability corresponding to a CNNLF model, the greater the probability that the CNNLF model is used to filter the target reconstruction block.
• In this way, the CNNLF model is selected based on the feature information of the target reconstruction block, which not only enriches the selection methods for the CNNLF model, but also takes the feature information of the target reconstruction block into account in the selection process, so that a CNNLF model matching the feature information of the target reconstruction block can be selected, improving the selection accuracy of the CNNLF model.
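The preset neural network can be pictured as a small classifier ending in a softmax over the N candidate models. The toy stand-in below captures only that interface (one reconstruction block in, N selection probabilities out); the hand-picked features are purely illustrative, and a real deployment would use a trained CNN.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()

class SelectionNet:
    """Toy stand-in for the preset neural network: maps a reconstruction
    block to N selection probabilities. Only the interface matters here."""

    def __init__(self, n_models: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_models, 4))  # acts on 4 illustrative features

    def __call__(self, block: np.ndarray) -> np.ndarray:
        b = block.astype(np.float64)
        feats = np.array([b.mean(), b.std(),
                          np.abs(np.diff(b, axis=0)).mean(),   # vertical activity
                          np.abs(np.diff(b, axis=1)).mean()])  # horizontal activity
        return softmax(self.w @ feats)  # probabilities over the N models, sum to 1
```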
• According to the above steps, the rate-distortion cost and selection probability corresponding to each CNNLF model among the N CNNLF models can be determined, and then the target CNNLF model is selected from the N CNNLF models according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models.
  • the methods for selecting the target CNNLF model from the N CNNLF models in the above S804 include but are not limited to the following examples:
• Example 1: M CNNLF models with the smallest rate-distortion costs are selected from the N CNNLF models, and P CNNLF models with the highest selection probabilities are selected from the N CNNLF models. If the M CNNLF models and the P CNNLF models have one CNNLF model in common, that common CNNLF model is determined as the target CNNLF model. If the M CNNLF models and the P CNNLF models have multiple CNNLF models in common, the CNNLF model with the smallest rate-distortion cost among the common CNNLF models is determined as the target CNNLF model. Alternatively, if the M CNNLF models and the P CNNLF models have multiple CNNLF models in common, the CNNLF model with the highest selection probability among the common CNNLF models is determined as the target CNNLF model.
• Example 2: the above S804 includes the following steps S804-A1 to S804-A3:
• That is, the encoding end selects the first CNNLF model with the smallest rate-distortion cost from the N CNNLF models according to the rate-distortion costs corresponding to the N CNNLF models, selects the second CNNLF model with the highest selection probability from the N CNNLF models according to the selection probabilities corresponding to the N CNNLF models, and then determines the target CNNLF model according to the first CNNLF model and the second CNNLF model.
• If the first CNNLF model is the same as the second CNNLF model, the first CNNLF model or the second CNNLF model is determined as the target CNNLF model. This ensures that the selected target CNNLF model not only matches the feature information of the target reconstructed image but is also the CNNLF model with the smallest rate-distortion cost, making the selected target CNNLF model the optimal CNNLF model, thereby improving the selection accuracy of the target CNNLF model and improving the filtering effect.
• If the first CNNLF model is different from the second CNNLF model, the first CNNLF model is determined as the target CNNLF model. This is because the rate-distortion cost is a more important consideration than the feature information of the target reconstructed image during decoding; therefore, when the first CNNLF model is different from the second CNNLF model, the first CNNLF model with the smallest rate-distortion cost is selected as the target CNNLF model, thereby ensuring that CNNLF filtering does not introduce too much distortion and that decoding accuracy is guaranteed.
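Example 2 therefore reduces to a short decision rule. The sketch below also reports whether the two criteria agreed, since that agreement is what an image-level model selection consistency flag such as chroma_nn_rdo_equal_flag can convey to the decoder.

```python
def choose_target_model(rd_costs, sel_probs):
    """Example 2 decision rule: take the min-RD-cost model (first) and the
    max-probability model (second); if they coincide, that model is the
    target, otherwise fall back to the min-RD-cost model."""
    first = min(range(len(rd_costs)), key=rd_costs.__getitem__)
    second = max(range(len(sel_probs)), key=sel_probs.__getitem__)
    agrees = (first == second)
    return first, agrees  # target model index, and whether RD and NN agreed
```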
• This application determines the target CNNLF model according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models. This ensures that the selection of the target CNNLF model considers not only the rate-distortion cost but also the selection probability, thereby improving the selection accuracy of the target CNNLF model, so that subsequent filtering based on the accurately selected target CNNLF model can improve the filtering effect.
• The target CNNLF model is a neural network model. The target reconstruction block is input into the target CNNLF model and, after the processing of each layer in the target CNNLF model, the filtered reconstruction block is finally output. The filtered reconstruction block is closer to the original image block, thereby improving the encoding performance.
• In some embodiments, the method of the embodiment of the present application further includes: writing a first flag into the code stream, so that the decoding end can determine the target CNNLF model by decoding the first flag, thereby increasing the speed of determining the target CNNLF model.
  • the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models.
  • the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the smallest rate-distortion cost among the N CNNLF models.
• In some embodiments, if the value of the first flag is the second value, the encoding end also writes the index of the target CNNLF model into the code stream. In this way, the decoder directly determines the target CNNLF model by decoding the index of the target CNNLF model, which is simple and fast.
• In some embodiments, the encoding end does not write the first flag into the code stream, but directly writes the index of the determined target CNNLF model. This saves codewords and means the decoding end does not need to perform other operations: it directly parses the index of the target CNNLF model from the code stream and then determines the target CNNLF model, realizing rapid determination of the target CNNLF model in a simple manner.
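Putting the signalling variants together, the encoder side might look like the sketch below; the writer methods are hypothetical stand-ins for the entropy coder, and the generic field names stand for the luma or chroma variants of the syntax elements.

```python
def write_model_choice(writer, target_idx, agrees, use_flag=True):
    """Encoder-side signalling sketch. With `use_flag`, the consistency flag
    is written and the model index only in the disagreement case (saving
    codewords when the decoder can re-derive the choice with its own
    network); without it, the index is always written directly."""
    if use_flag:
        writer.write_flag("nn_rdo_equal_flag", 1 if agrees else 0)
        if not agrees:
            writer.write_index("cnnlf_model_index", target_idx)
    else:
        writer.write_index("cnnlf_model_index", target_idx)
```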
• In some embodiments, before the above S802 is performed, the method of the embodiment of the present application further includes: obtaining a second flag, where the second flag is used to indicate whether the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model.
• If the value of the second flag is the third value, it indicates that the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model. If the value of the second flag is the fourth value, it indicates that the current sequence does not allow this method to be used. The third value is different from the fourth value.
  • the present application does not limit the specific values of the above-mentioned third value and the fourth value.
  • the third value is 1.
  • the fourth value is 0.
• That is to say, before executing the method of the embodiment of this application or executing the above S802, the encoder needs to judge, according to the sequence-level second flag, whether the current block is allowed to use the method of combining the rate-distortion cost and the neural network model to determine the target CNNLF model. If it is determined that this is allowed, for example, when the value of the second flag is the third value, the method of the embodiment of the present application is executed, or the above S802 is executed, to determine the target CNNLF model.
• If the second flag indicates that the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model, the method of the embodiment of the present application is executed, or the above S802 is executed, that is, the rate-distortion costs respectively corresponding to the N CNNLF models when the N CNNLF models are respectively used to filter the target reconstruction block are determined.
• In the video encoding method provided by the embodiment of the present application, the encoding end obtains the target reconstruction block of the current block, where the target reconstruction block is a reconstruction block to be input into the target CNNLF model; determines the rate-distortion costs respectively corresponding to the preset N CNNLF models when they are respectively used to filter the target reconstruction block; inputs the target reconstruction block into the preset neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models output by the neural network model; selects the target CNNLF model from the N CNNLF models according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models; and filters the target reconstruction block using the target CNNLF model.
• That is, the target CNNLF model in this application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, which improves the selection accuracy of the target CNNLF model, so that when filtering is performed based on the accurately selected target CNNLF model, the filtering effect can be improved, thereby improving the encoding performance.
• FIG. 17 is a schematic flowchart of a video encoding method provided by an embodiment of the present application. If the current block includes chroma components, this embodiment mainly introduces the process of determining the target CNNLF model under the chroma component and the process of using the determined target CNNLF model to filter the target reconstruction block under the chroma component. As shown in Figure 17, the method includes:
• Specifically, when the preset N CNNLF models under the chroma component are respectively used to filter the target reconstruction block under the chroma component, the rate-distortion cost corresponding to each CNNLF model under the chroma component is determined. In one example, the rate-distortion cost corresponding to each CNNLF model among the N CNNLF models under the chroma component is calculated according to the above formula (1) or (2).
• It should be noted that the above S903 may be executed before the above S902, after the above S902, or in parallel with S902, which is not limited in this application.
• In some embodiments, the neural network model corresponding to the chroma component is different from the neural network model corresponding to the luminance component; therefore, the above S903 includes: inputting the target reconstruction block under the chroma component into the neural network model corresponding to the chroma component, and obtaining the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model corresponding to the chroma component, for example, selection probability 1, selection probability 2, ..., selection probability N.
• According to the above steps, the rate-distortion cost and selection probability corresponding to each of the N CNNLF models under the chroma component can be determined, and then the target CNNLF model is selected from the N CNNLF models under the chroma component according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the chroma component.
  • the method of selecting the target CNNLF model under the chroma component from the N CNNLF models under the chroma component in the above S904 includes but is not limited to the following examples:
• Example 1: M CNNLF models with the smallest rate-distortion costs are selected from the N CNNLF models under the chroma component, and P CNNLF models with the highest selection probabilities are selected from the N CNNLF models under the chroma component. If the M CNNLF models and the P CNNLF models under the chroma component have one CNNLF model in common, that common CNNLF model is determined as the target CNNLF model under the chroma component. If they have multiple CNNLF models in common, the CNNLF model with the smallest rate-distortion cost among the common CNNLF models is determined as the target CNNLF model under the chroma component; or, the CNNLF model with the highest selection probability among the common CNNLF models is determined as the target CNNLF model under the chroma component.
• Example 2: the above S904 includes the following steps S904-A1 to S904-A3:
• S904-A3: Determine a target CNNLF model under the chroma component according to the first CNNLF model and the second CNNLF model.
• If the first CNNLF model is the same as the second CNNLF model, the first CNNLF model or the second CNNLF model is determined as the target CNNLF model under the chrominance component. If the first CNNLF model is different from the second CNNLF model, the first CNNLF model is determined as the target CNNLF model under the chrominance component.
• In this way, the encoding end of this application determines the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the chroma component, and selects the target CNNLF model under the chroma component from the N CNNLF models under the chroma component according to those rate-distortion costs and selection probabilities, realizing accurate determination of the target CNNLF model under the chroma component.
• In this embodiment, the target reconstruction block is the target reconstruction block of the current block under the chroma component, and the N CNNLF models are the N CNNLF models under the chroma component. In some embodiments, a first flag is written into the code stream, where the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component.
• If the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component; if the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the smallest rate-distortion cost among the N CNNLF models under the chroma component.
• In some embodiments, before the above S902, the method further includes: acquiring a second flag, where the second flag is used to indicate whether the chrominance component of the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model. If the second flag indicates that this is allowed, S902 is performed, that is, the rate-distortion costs respectively corresponding to the N CNNLF models when they are used to filter the target reconstruction block are determined.
• In some embodiments, if the rate-distortion cost corresponding to the target CNNLF model meets the requirement, the above filtering step is performed, that is, the target CNNLF model under the chroma component is used to filter the target reconstruction block under the chroma component.
  • HPM-ModAI sets a frame-level switch to control whether to enable CNNLF for the chrominance component.
• In one example, the frame-level switch is determined by the following equation (4), where Dnet2 is the distortion after filtering, Drec2 is the distortion before filtering, R2 is the number of CTUs in the current frame, and λ2 is consistent with the λ of the adaptive correction filter:
• In the video encoding method provided by the embodiment of the present application, the encoding end obtains the target reconstruction block of the current block under the chrominance component, where the target reconstruction block is a reconstruction block to be input into the target CNNLF model; determines the rate-distortion costs respectively corresponding to the preset N CNNLF models under the chroma component when they are respectively used to filter the target reconstruction block; inputs the target reconstruction block into the preset neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model; selects the target CNNLF model from the N CNNLF models under the chroma component according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the chroma component; and uses the target CNNLF model under the chroma component to filter the target reconstruction block under the chroma component.
• That is, the target CNNLF model under the chroma component in this application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the chroma component, which improves the selection accuracy of the target CNNLF model under the chroma component, so that when filtering is performed based on the accurately selected target CNNLF model under the chroma component, the filtering effect of the chroma component can be improved, thereby improving the decoding performance of the chroma component.
• FIG. 18 is a schematic flowchart of a video encoding method provided by an embodiment of the present application. If the current block includes a luminance component, this embodiment mainly introduces the process of determining the target CNNLF model under the luminance component and the process of using the determined target CNNLF model to filter the target reconstruction block under the luminance component. As shown in Figure 18, the method includes:
• Specifically, when the preset N CNNLF models under the luminance component are respectively used to filter the target reconstruction block under the luminance component, the rate-distortion cost corresponding to each CNNLF model under the luminance component is determined. In one example, the rate-distortion cost corresponding to each CNNLF model among the N CNNLF models under the luminance component is calculated according to the above formula (1) or (2).
• In some embodiments, the neural network model corresponding to the luminance component is different from the neural network model corresponding to the chroma component; therefore, the above S1003 includes: inputting the target reconstruction block under the luminance component into the neural network model corresponding to the luminance component, and obtaining the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model corresponding to the luminance component, for example, selection probability 1, selection probability 2, ..., selection probability N.
• According to the above steps, the rate-distortion cost and selection probability corresponding to each of the N CNNLF models under the luminance component can be determined, and then the target CNNLF model is selected from the N CNNLF models under the luminance component according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the luminance component.
  • the method of selecting the target CNNLF model under the brightness component from the N CNNLF models under the brightness component in the above S1004 includes but is not limited to the following examples:
• Example 1: M CNNLF models with the smallest rate-distortion costs are selected from the N CNNLF models under the luminance component, and P CNNLF models with the highest selection probabilities are selected from the N CNNLF models under the luminance component. If the M CNNLF models and the P CNNLF models under the luminance component have one CNNLF model in common, that common CNNLF model is determined as the target CNNLF model under the luminance component. If they have multiple CNNLF models in common, the CNNLF model with the smallest rate-distortion cost among the common CNNLF models is determined as the target CNNLF model under the luminance component; or, the CNNLF model with the highest selection probability among the common CNNLF models is determined as the target CNNLF model under the luminance component.
  • Example 2 the above S1004 includes the following steps from S1004-A1 to S1004-A3:
• If the first CNNLF model is the same as the second CNNLF model, the first CNNLF model or the second CNNLF model is determined as the target CNNLF model under the luminance component. If the first CNNLF model is different from the second CNNLF model, the first CNNLF model is determined as the target CNNLF model under the luminance component.
• In this way, the encoding end of this application determines the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the luminance component, and selects the target CNNLF model under the luminance component from the N CNNLF models under the luminance component according to those rate-distortion costs and selection probabilities, realizing accurate determination of the target CNNLF model under the luminance component.
• In this embodiment, the target reconstruction block is the target reconstruction block of the current block under the luminance component, and the N CNNLF models are the N CNNLF models under the luminance component. In some embodiments, a first flag is written into the code stream, where the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component.
• If the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component; if the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the smallest rate-distortion cost among the N CNNLF models under the luminance component.
• In some embodiments, before the above S1002, the method further includes: acquiring a second flag, where the second flag is used to indicate whether the luminance component of the current sequence allows the method of combining the rate-distortion cost and the neural network model to be used to determine the target CNNLF model. If the second flag indicates that this is allowed, S1002 is performed, that is, the rate-distortion costs respectively corresponding to the N CNNLF models when they are used to filter the target reconstruction block are determined.
• In some embodiments, if the rate-distortion cost corresponding to the target CNNLF model meets the requirement, the above step S1005 is performed, that is, the target CNNLF model under the luminance component is used to filter the target reconstruction block under the luminance component.
  • HPM-ModAI sets a frame-level switch to control whether to enable CNNLF for the luminance component.
• In one example, the frame-level switch is determined by the following equation (5), where Dnet3 is the distortion after filtering, Drec3 is the distortion before filtering, R3 is the number of CTUs in the current frame, and λ3 is consistent with the λ of the adaptive correction filter:
• When the frame-level switch is turned on, whether to enable CNNLF for each CTU is further determined through rate-distortion optimization.
  • the CTU-level switch is determined by the following formula (6):
• In the video encoding method provided by the embodiment of the present application, the encoding end obtains the target reconstruction block under the luminance component of the current block, where the target reconstruction block is a reconstruction block to be input into the target CNNLF model; determines the rate-distortion costs respectively corresponding to the preset N CNNLF models under the luminance component when they are respectively used to filter the target reconstruction block; inputs the target reconstruction block into the preset neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model; selects the target CNNLF model from the N CNNLF models under the luminance component according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models under the luminance component; and filters the target reconstruction block under the luma component using the target CNNLF model under the luma component.
  • the target CNNLF model under the luminance component of the present application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models under the luminance component, which improves the selection accuracy of the target CNNLF model under the luminance component; in this way, when filtering is performed based on the accurately selected target CNNLF model under the luminance component, the filtering effect of the luminance component can be improved, thereby improving the decoding performance of the luminance component.
  • FIGS. 5 to 18 are only examples of the present application, and should not be construed as limiting the present application.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this article generally indicates that the contextual objects are an "or" relationship.
  • Fig. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • the video decoder 10 includes:
  • the decoding unit 11 is used to decode the code stream to obtain the target reconstruction block of the current block, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model;
  • the determination unit 12 is configured to determine the target CNNLF model, where the target CNNLF model is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the preset N CNNLF models, the selection probabilities are predicted by the neural network model based on the target reconstruction block, and N is a positive integer;
  • a filtering unit 13 configured to use the target CNNLF model to filter the target reconstruction block.
  • the decoding unit 11 is further configured to decode the code stream to obtain a first flag, where the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models;
  • the determining unit 12 is specifically configured to determine the target CNNLF model according to the first flag.
  • the determination unit 12 is specifically configured to, if the value of the first flag is a first value, input the target reconstruction block into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models output by the neural network model, where the first value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models; and to determine the CNNLF model corresponding to the maximum selection probability among the N CNNLF models as the target CNNLF model.
  • the determination unit 12 is specifically configured to, if the value of the first flag is a second value, decode the code stream to obtain the index of the target CNNLF model, where the second value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models; and to determine the CNNLF model corresponding to the index of the target CNNLF model among the N CNNLF models as the target CNNLF model.
  • the N CNNLF models are N CNNLF models under the chroma component, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component.
  • the determining unit 12 is specifically configured to, if the value of the first flag is a first value, input the target reconstruction block under the chroma component into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model, where the first value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component; and to determine the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component as the target CNNLF model.
  • the determining unit 12 is specifically configured to obtain the neural network model corresponding to the chroma component, and to input the target reconstruction block under the chroma component into the neural network model corresponding to the chroma component, to obtain the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model corresponding to the chroma component.
  • the determination unit 12 is specifically configured to, if the value of the first flag is a second value, decode the code stream to obtain the index of the target CNNLF model, where the second value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the chroma component; and to determine the CNNLF model corresponding to the index of the target CNNLF model among the N CNNLF models under the chroma component as the target CNNLF model.
  • the N CNNLF models are N CNNLF models under the luminance component, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component.
  • the determining unit 12 is specifically configured to, if the value of the first flag is a first value, input the target reconstruction block under the luminance component into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model, where the first value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component; and to determine the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component as the target CNNLF model.
  • the determining unit 12 is specifically configured to obtain the neural network model corresponding to the luminance component, and to input the target reconstruction block under the luminance component into the neural network model corresponding to the luminance component, to obtain the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model corresponding to the luminance component.
  • the determination unit 12 is specifically configured to, if the value of the first flag is a second value, decode the code stream to obtain the index of the target CNNLF model, where the second value is used to indicate that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the luminance component; and to determine the CNNLF model corresponding to the index of the target CNNLF model among the N CNNLF models under the luminance component as the target CNNLF model.
  • the determining unit 12 is further configured to obtain a second flag, the second flag is used to indicate whether the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model; if the second flag indicates that the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model, then determine the target CNNLF model.
  • the neural network model includes K layers of convolutional layers and L layers of fully connected layers, and both K and L are positive integers.
  • a downsampling unit is connected after at least one convolutional layer in the K convolutional layers, and the downsampling unit is used for downsampling the feature map output by the convolutional layer.
  • an activation function is connected after at least one fully-connected layer of the L-layer fully-connected layers.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the video decoder 10 shown in FIG. 19 can execute the decoding method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the video decoder 10 are for realizing the above-mentioned decoding method and other methods. For the sake of brevity, the corresponding process will not be repeated here.
  • Fig. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • the video encoder 20 may include:
  • the coding unit 21 is used to obtain the target reconstruction block of the current block, where the target reconstruction block is the reconstruction block to be input into the target residual-neural-network-based in-loop filter (CNNLF) model;
  • a cost determination unit 22 configured to determine the rate-distortion costs corresponding to the N CNNLF models respectively when the target reconstruction block is filtered using the preset N CNNLF models, and the N is a positive integer;
  • a selection probability determination unit 23 configured to input the target reconstruction block into a preset neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models output by the neural network model;
  • a model determination unit 24 configured to select a target CNNLF model from the N CNNLF models according to the rate-distortion costs and selection probabilities respectively corresponding to the N CNNLF models;
  • a filtering unit 25 configured to use the target CNNLF model to filter the target reconstruction block.
  • the model determination unit 24 is specifically configured to select the first CNNLF model with the smallest rate-distortion cost from the N CNNLF models according to the rate-distortion costs respectively corresponding to the N CNNLF models; According to the selection probabilities corresponding to the N CNNLF models respectively, the second CNNLF model with the largest selection probability is selected from the N CNNLF models; according to the first CNNLF model and the second CNNLF model, determine the target CNNLF Model.
  • the model determining unit 24 is specifically configured to determine the first CNNLF model or the second CNNLF model as the target CNNLF model if the first CNNLF model is the same as the second CNNLF model; and to determine the first CNNLF model as the target CNNLF model if the first CNNLF model is different from the second CNNLF model.
  • the coding unit 21 is further configured to write a first flag in the code stream, where the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models.
  • if the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models; if the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models.
  • the coding unit 21 is further configured to write the index of the target CNNLF model in the code stream.
  • the N CNNLF models are N CNNLF models under the chroma component, and the cost determining unit 22 is specifically configured to determine, when the N CNNLF models under the chroma component are used to filter the target reconstruction block under the chroma component, the rate-distortion costs respectively corresponding to the N CNNLF models under the chroma component; the target reconstruction block under the chroma component is input into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model.
  • the cost determination unit 22 is specifically configured to obtain the neural network model corresponding to the chroma component; input the target reconstruction block under the chroma component into the neural network model corresponding to the chroma component, The selection probabilities respectively corresponding to the N CNNLF models under the chroma component output by the neural network model corresponding to the chroma component are obtained.
  • the N CNNLF models are N CNNLF models under the chroma component, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component. If the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the chroma component; if the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the chroma component.
  • the N CNNLF models are N CNNLF models under the luminance component, and the cost determination unit 22 is specifically configured to determine, when the N CNNLF models under the luminance component are used to respectively filter the target reconstruction block under the luminance component, the rate-distortion costs respectively corresponding to the N CNNLF models under the luminance component; the target reconstruction block under the luminance component is input into the neural network model to obtain the selection probabilities respectively corresponding to the N CNNLF models under the luminance component output by the neural network model.
  • the cost determination unit 22 is specifically configured to obtain the neural network model corresponding to the brightness component; input the target reconstruction block under the brightness component into the neural network model corresponding to the brightness component, and obtain the The selection probabilities respectively corresponding to the N CNNLF models under the brightness component output by the neural network model corresponding to the brightness component.
  • the N CNNLF models are N CNNLF models under the luminance component, and the first flag is used to indicate whether the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component. If the value of the first flag is the first value, it indicates that the target CNNLF model is the CNNLF model corresponding to the maximum selection probability among the N CNNLF models under the luminance component; if the value of the first flag is the second value, it indicates that the target CNNLF model is the CNNLF model corresponding to the minimum rate-distortion cost among the N CNNLF models under the luminance component.
  • the cost determination unit 22 is further configured to obtain a second flag, the second flag is used to indicate whether the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model; if the second flag indicates that the current sequence allows the combination of the rate-distortion cost and the neural network model to determine the target CNNLF model, it is determined to use the N CNNLF models respectively for the When the target reconstruction block performs filtering, the rate-distortion costs corresponding to the N CNNLF models respectively.
  • the neural network model includes K layers of convolutional layers and L layers of fully connected layers, and both K and L are positive integers.
  • a downsampling unit is connected after at least one convolutional layer in the K convolutional layers, and the downsampling unit is used for downsampling the feature map output by the convolutional layer.
  • an activation function is connected after at least one fully-connected layer of the L-layer fully-connected layers.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the video encoder 20 shown in FIG. 20 may correspond to the corresponding subject in the encoding method of the embodiments of the present application, and the aforementioned and other operations and/or functions of each unit in the video encoder 20 are for realizing the corresponding processes in the encoding method; for the sake of brevity, they will not be repeated here.
  • the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
  • each step of the method embodiments in the embodiments of the present application can be completed by an integrated logic circuit of hardware in the processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of the hardware and software units in the decoding processor.
  • the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • Fig. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application, and the electronic device 30 may include:
  • a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and to transmit the program code 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
  • the processor 32 may include, but is not limited to:
  • Digital Signal Processor (DSP)
  • Application Specific Integrated Circuit (ASIC)
  • Field Programmable Gate Array (FPGA)
  • the memory 33 includes but is not limited to:
  • the non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • Static Random Access Memory (SRAM)
  • Dynamic Random Access Memory (DRAM)
  • Synchronous Dynamic Random Access Memory (SDRAM)
  • Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate SDRAM, DDR SDRAM)
  • Enhanced Synchronous Dynamic Random Access Memory (Enhanced SDRAM, ESDRAM)
  • Synchlink Dynamic Random Access Memory (Synchlink DRAM, SLDRAM)
  • Direct Rambus Random Access Memory (Direct Rambus RAM, DR RAM)
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the present application.
  • the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include antennas, and the number of antennas may be one or more.
  • the bus system includes not only a data bus, but also a power bus, a control bus, and a status signal bus.
  • Fig. 22 is a schematic block diagram of a video codec system provided by an embodiment of the present application.
  • the video codec system 40 may include: a video encoder 41 and a video decoder 42, wherein the video encoder 41 is used to execute the video encoding method involved in the embodiment of the present application, and the video decoder 42 is used to execute The video decoding method involved in the embodiment of the present application.
  • the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
  • the present application also provides a code stream, which is generated according to the above encoding method.
  • the code stream includes the above-mentioned first flag, or includes the first flag and the second flag.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (digital video disc, DVD)), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.


Abstract

The present application provides a video encoding and decoding method, device, system, and storage medium. In the present application, the target CNNLF model is determined according to the rate-distortion costs and selection probabilities respectively corresponding to N preset CNNLF models, where the selection probabilities are predicted by a neural network model based on the target reconstruction block, and the target CNNLF model is used to filter the target reconstruction block. That is, since the target CNNLF model of the present application is determined according to the rate-distortion costs and selection probabilities respectively corresponding to the N preset CNNLF models, the selection accuracy of the target CNNLF model is improved; in this way, when filtering is performed based on the accurately selected target CNNLF model, the filtering effect can be improved, thereby improving decoding performance.

Description

视频编解码方法、设备、系统、及存储介质 技术领域
本申请涉及视频编解码技术领域,尤其涉及一种视频编解码方法、设备、系统、及存储介质。
背景技术
数字视频技术可以并入多种视频装置中,例如数字电视、智能手机、计算机、电子阅读器或视频播放器等。随着视频技术的发展,视频数据所包括的数据量较大,为了便于视频数据的传输,视频装置执行视频压缩技术,以使视频数据更加有效的传输或存储。
环路滤波为视频编解码中的重要一环,用于对重建图像进行滤波,提高重建图像的效果。环路滤波中包括基于残差神经网络的环路滤波器(Convolutional Neural Network based In-Loop Filter,简称CNNLF)。目前设计了多种不同的CNNLF模型,这多种不同的CNNLF模型与不同的量化步长(Quantization Parameter,QP)对应。在确定使用CNNLF对当前块的目标重建块进行滤波时,根据当前块对应的QP,从这多种不同的CNNLF模型中选出一个CNNLF模型对当前块的重建块进行滤波。
但是,在一些情况下,在编码时QP会发生波动,此时,基于QP选择CNNLF模型时,存在CNNLF模型选择不准确,造成滤波效果差的问题。
发明内容
本申请实施例提供了一种视频编解码方法、设备、系统、及存储介质,提出了一种CNNLF模型的选择准确性,提升滤波效果。
第一方面,本申请提供了一种视频编码方法,包括:
解码码流,得到当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
确定所述目标CNNLF模型,所述目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,所述选中概率是通过神经网络模型基于所述目标重建块预测得到的,所述N为正整数;
使用所述目标CNNLF模型对所述目标重建块进行滤波。
第二方面,本申请实施例提供一种视频解码方法,包括:
获取当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,所述N为正整数;
将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率;
根据所述N个CNNLF模型分别对应的率失真代价和选中概率,从所述N个CNNLF模型中选出目标CNNLF模型;
使用所述目标CNNLF模型对所述目标重建块进行滤波。
第三方面,本申请提供了一种视频编码器,用于执行上述第一方面或其各实现方式中的方法。具体地,该编码器包括用于执行上述第一方面或其各实现方式中的方法的功能单元。
第四方面,本申请提供了一种视频解码器,用于执行上述第二方面或其各实现方式中的方法。具体地,该解码器 包括用于执行上述第二方面或其各实现方式中的方法的功能单元。
第五方面,提供了一种视频编码器,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第一方面或其各实现方式中的方法。
第六方面,提供了一种视频解码器,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第二方面或其各实现方式中的方法。
第七方面,提供了一种视频编解码系统,包括视频编码器和视频解码器。视频编码器用于执行上述第一方面或其各实现方式中的方法,视频解码器用于执行上述第二方面或其各实现方式中的方法。
第八方面,提供了一种芯片,用于实现上述第一方面至第二方面中的任一方面或其各实现方式中的方法。具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的设备执行如上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第九方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第十方面,提供了一种计算机程序产品,包括计算机程序指令,该计算机程序指令使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第十一方面,提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第十二方面,提供了一种码流,码流是基于上述第一方面的方法生成的。
基于以上技术方案,解码端通过解码码流,得到当前块的目标重建块,该目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;接着,确定目标CNNLF模型,该目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,其中选中概率是通过神经网络模型基于目标重建块预测得到的;最后使用目标CNNLF模型对目标重建块进行滤波。即本申请的目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了目标CNNLF模型的选择准确性,这样基于准确选择的目标CNNLF模型进行滤波时,可以提高滤波的效果,进而提升解码性能。
附图说明
图1为本申请实施例涉及的一种视频编解码系统的示意性框图;
图2是本申请实施例涉及的视频编码器的示意性框图;
图3是本申请实施例涉及的视频解码器的示意性框图;
图4为本申请涉及的环路滤波单元的结构示意图;
图5为本申请实施例提供的视频解码方法的一种流程示意图;
图6为神经网络模型的一种网络结构示意图;
图7为本申请实施例的一种实现过程示意图;
图8为本申请实施例涉及的一种选中概率示意图;
图9为亮度分量对应的一种CNNLF模型结构示意图;
图10为色度分量对应的一种CNNLF模型结构示意图;
图11为本申请实施例涉及的残差块示意图;
图12为本申请一实施例提供的视频解码方法流程示意图;
图13为本申请实施例涉及的一种选中概率示意图;
图14为本申请一实施例提供的视频解码方法流程示意图;
图15为本申请实施例涉及的一种选中概率示意图;
图16为本申请实施例提供的视频编码方法的一种流程示意图;
图17为本申请实施例提供的视频编码方法的一种流程示意图;
图18为本申请实施例提供的视频编码方法的一种流程示意图;
图19是本申请一实施例提供的视频解码器的示意性框图;
图20是本申请一实施例提供的视频编码器的示意性框图;
图21是本申请实施例提供的电子设备的示意性框图;
图22是本申请实施例提供的视频编解码系统的示意性框图。
具体实施方式
本申请可应用于图像编解码领域、视频编解码领域、硬件视频编解码领域、专用电路视频编解码领域、实时视频编解码领域等。例如,本申请的方案可结合至音视频编码标准(audio video coding standard,简称AVS),例如,H.264/音视频编码(audio video coding,简称AVC)标准,H.265/高效视频编码(high efficiency video coding,简称HEVC)标准以及H.266/多功能视频编码(versatile video coding,简称VVC)标准。或者,本申请的方案可结合至其它专属或行业标准而操作,所述标准包含ITU-TH.261、ISO/IECMPEG-1Visual、ITU-TH.262或ISO/IECMPEG-2Visual、ITU-TH.263、ISO/IECMPEG-4Visual,ITU-TH.264(还称为ISO/IECMPEG-4AVC),包含可分级视频编解码(SVC)及多视图视频编解码(MVC)扩展。应理解,本申请的技术不限于任何特定编解码标准或技术。
为了便于理解,首先结合图1对本申请实施例涉及的视频编解码系统进行介绍。
图1为本申请实施例涉及的一种视频编解码系统的示意性框图。需要说明的是,图1只是一种示例,本申请实施例的视频编解码系统包括但不限于图1所示。如图1所示,该视频编解码系统100包含编码设备110和解码设备120。其中编码设备用于对视频数据进行编码(可以理解成压缩)产生码流,并将码流传输给解码设备。解码设备对编码设备编码产生的码流进行解码,得到解码后的视频数据。
本申请实施例的编码设备110可以理解为具有视频编码功能的设备,解码设备120可以理解为具有视频解码功能的设备,即本申请实施例对编码设备110和解码设备120包括更广泛的装置,例如包含智能手机、台式计算机、移动计算装置、笔记本(例如,膝上型)计算机、平板计算机、机顶盒、电视、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机等。
在一些实施例中,编码设备110可以经由信道130将编码后的视频数据(如码流)传输给解码设备120。信道130可以包括能够将编码后的视频数据从编码设备110传输到解码设备120的一个或多个媒体和/或装置。
在一个实例中,信道130包括使编码设备110能够实时地将编码后的视频数据直接发射到解码设备120的一个或多个通信媒体。在此实例中,编码设备110可根据通信标准来调制编码后的视频数据,且将调制后的视频数据发射到解码设备120。其中通信媒体包含无线通信媒体,例如射频频谱,可选的,通信媒体还可以包含有线通信媒体,例如一根或多根物理传输线。
在另一实例中,信道130包括存储介质,该存储介质可以存储编码设备110编码后的视频数据。存储介质包含多种本地存取式数据存储介质,例如光盘、DVD、快闪存储器等。在该实例中,解码设备120可从该存储介质中获取编码后的视频数据。
在另一实例中,信道130可包含存储服务器,该存储服务器可以存储编码设备110编码后的视频数据。在此实例中,解码设备120可以从该存储服务器中下载存储的编码后的视频数据。可选的,该存储服务器可以存储编码后的视频数据且可以将该编码后的视频数据发射到解码设备120,例如web服务器(例如,用于网站)、文件传送协议(FTP)服务器等。
一些实施例中,编码设备110包含视频编码器112及输出接口113。其中,输出接口113可以包含调制器/解调器(调制解调器)和/或发射器。
在一些实施例中,编码设备110除了包括视频编码器112和输入接口113外,还可以包括视频源111。
视频源111可包含视频采集装置(例如,视频相机)、视频存档、视频输入接口、计算机图形系统中的至少一个,其中,视频输入接口用于从视频内容提供者处接收视频数据,计算机图形系统用于产生视频数据。
视频编码器112对来自视频源111的视频数据进行编码,产生码流。视频数据可包括一个或多个图像(picture)或图像序列(sequence of pictures)。码流以比特流的形式包含了图像或图像序列的编码信息。编码信息可以包含编码图像数据及相关联数据。相关联数据可包含序列参数集(sequence parameter set,简称SPS)、图像参数集(picture parameter set,简称PPS)及其它语法结构。SPS可含有应用于一个或多个序列的参数。PPS可含有应用于一个或多个图像的参数。语法结构是指码流中以指定次序排列的零个或多个语法元素的集合。
视频编码器112经由输出接口113将编码后的视频数据直接传输到解码设备120。编码后的视频数据还可存储于存储介质或存储服务器上,以供解码设备120后续读取。
在一些实施例中,解码设备120包含输入接口121和视频解码器122。
在一些实施例中,解码设备120除包括输入接口121和视频解码器122外,还可以包括显示装置123。
其中,输入接口121包含接收器及/或调制解调器。输入接口121可通过信道130接收编码后的视频数据。
视频解码器122用于对编码后的视频数据进行解码,得到解码后的视频数据,并将解码后的视频数据传输至显示装置123。
显示装置123显示解码后的视频数据。显示装置123可与解码设备120整合或在解码设备120外部。显示装置123可包括多种显示装置,例如液晶显示器(LCD)、等离子体显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。
此外,图1仅为实例,本申请实施例的技术方案不限于图1,例如本申请的技术还可以应用于单侧的视频编码或单侧的视频解码。
下面对本申请实施例涉及的视频编码框架进行介绍。
图2是本申请实施例涉及的视频编码器的示意性框图。应理解,该视频编码器200可用于对图像进行有损压缩(lossy compression),也可用于对图像进行无损压缩(lossless compression)。该无损压缩可以是视觉无损压缩(visually lossless compression),也可以是数学无损压缩(mathematically lossless compression)。
该视频编码器200可应用于亮度色度(YCbCr,YUV)格式的图像数据上。例如,YUV比例可以为4:2:0、4:2:2或者4:4:4,Y表示明亮度(Luma),Cb(U)表示蓝色色度,Cr(V)表示红色色度,U和V表示为色度(Chroma)用于描述色彩及饱和度。例如,在颜色格式上,4:2:0表示每4个像素有4个亮度分量,2个色度分量(YYYYCbCr),4:2:2表示每4个像素有4个亮度分量,4个色度分量(YYYYCbCrCbCr),4:4:4表示全像素显示(YYYYCbCrCbCrCbCrCbCr)。
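As an illustrative aside (not part of the patent), the sampling ratios above can be made concrete with a small sketch that computes luma and chroma plane dimensions for the three formats; the helper name plane_sizes is hypothetical.

```python
def plane_sizes(width: int, height: int, fmt: str):
    if fmt == "4:2:0":   # one Cb and one Cr sample per 2x2 luma block
        return (width, height), (width // 2, height // 2)
    if fmt == "4:2:2":   # chroma halved horizontally only
        return (width, height), (width // 2, height)
    if fmt == "4:4:4":   # full-resolution chroma
        return (width, height), (width, height)
    raise ValueError(f"unknown chroma format: {fmt}")

# e.g. a 128x128 CTU in 4:2:0 has a 128x128 luma block and two 64x64 chroma blocks
print(plane_sizes(128, 128, "4:2:0"))
```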
例如,该视频编码器200读取视频数据,针对视频数据中的每帧图像,将一帧图像划分成若干个编码树单元(coding tree unit,CTU),在一些例子中,CTB可被称作“树型块”、“最大编码单元”(Largest Coding unit,简称LCU)或“编码树型块”(coding tree block,简称CTB)。每一个CTU可以与图像内的具有相等大小的像素块相关联。每一像素可对应一个亮度(luminance或luma)采样及两个色度(chrominance或chroma)采样。因此,每一个CTU可与一个亮度采样块及两个色度采样块相关联。一个CTU大小例如为128×128、64×64、32×32等。一个CTU又可以继续被划分成若干个编码单元(Coding Unit,CU)进行编码,CU可以为矩形块也可以为方形块。CU可以进一步划分为预测单元(prediction Unit,简称PU)和变换单元(transform unit,简称TU),进而使得编码、预测、变换分离,处理的时候更灵活。在一种示例中,CTU以四叉树方式划分为CU,CU以四叉树方式划分为TU、PU。
视频编码器及视频解码器可支持各种PU大小。假定特定CU的大小为2N×2N,视频编码器及视频解码器可支持2N×2N或N×N的PU大小以用于帧内预测,且支持2N×2N、2N×N、N×2N、N×N或类似大小的对称PU以用于帧间预测。视频编码器及视频解码器还可支持2N×nU、2N×nD、nL×2N及nR×2N的不对称PU以用于帧间预测。
在一些实施例中,如图2所示,该视频编码器200可包括:预测单元210、残差单元220、变换/量化单元230、反变换/量化单元240、重建单元250、环路滤波单元260、解码图像缓存270和熵编码单元280。需要说明的是,视频编码器200可包含更多、更少或不同的功能组件。
可选的,在本申请中,当前块(current block)可以称为当前编码单元(CU)或当前预测单元(PU)等。预测块也可称为预测图像块或图像预测块,重建图像块也可称为重建块或图像重建图像块。
在一些实施例中,预测单元210包括帧间预测单元211和帧内估计单元212。由于视频的一个帧中的相邻像素之间存在很强的相关性,在视频编解码技术中使用帧内预测的方法消除相邻像素之间的空间冗余。由于视频中的相邻帧之间存在着很强的相似性,在视频编解码技术中使用帧间预测方法消除相邻帧之间的时间冗余,从而提高编码效率。
帧间预测单元211可用于帧间预测,帧间预测可以参考不同帧的图像信息,帧间预测使用运动信息从参考帧中找到参考块,根据参考块生成预测块,用于消除时间冗余;帧间预测所使用的帧可以为P帧和/或B帧,P帧指的是向前预测帧,B帧指的是双向预测帧。运动信息包括参考帧所在的参考帧列表,参考帧索引,以及运动矢量。运动矢量可以是整像素的或者是分像素的,如果运动矢量是分像素的,那么需要再参考帧中使用插值滤波做出所需的分像素的块,这里把根据运动矢量找到的参考帧中的整像素或者分像素的块叫参考块。有的技术会直接把参考块作为预测块,有的技术会在参考块的基础上再处理生成预测块。在参考块的基础上再处理生成预测块也可以理解为把参考块作为预测块然后再在预测块的基础上处理生成新的预测块。
帧内估计单元212只参考同一帧图像的信息,预测当前码图像块内的像素信息,用于消除空间冗余。帧内预测所使用的帧可以为I帧。
帧内预测有多种预测模式,以国际数字视频编码标准H系列为例,H.264/AVC标准有8种角度预测模式和1种非角度预测模式,H.265/HEVC扩展到33种角度预测模式和2种非角度预测模式。HEVC使用的帧内预测模式有平面模式(Planar)、DC和33种角度模式,共35种预测模式。VVC使用的帧内模式有Planar、DC和65种角度模式,共67种预测模式。对于亮度分量有基于训练得到的预测矩阵(Matrix based intra prediction,MIP)预测模式,对于色度分量,有CCLM预测模式。
在MIP技术中,对于一个宽度为W,高度为H的矩形预测块,MIP会选取该块上方一行的W个重建像素点和左侧一列的H个重建像素点作为输入。如果这些位置的像素还未被重建,则未重建位置的像素会被置为默认值,例如对于10bit的像素,填充的默认值为512。MIP产生预测值主要基于三个步骤,分别是参考像素取均值,矩阵向量相乘和线性插值上采样。
MIP作用于4x4至64x64大小的块,对于一个长方形的预测块,MIP模式会根据矩形边长来选择合适的预测矩阵;对于短边为4的矩形,共有16套矩阵参数供选择;对于短边为8的矩形,共有8套矩阵参数供选择;其它矩形,共有6套矩阵参数供选择。MIP会利用供选择的矩阵进行预测,代价最小的一个矩阵的索引将会编入码流供解码端读取改矩阵参数用于预测。
需要说明的是,随着角度模式的增加,帧内预测将会更加精确,也更加符合对高清以及超高清数字视频发展的需求。
残差单元220可基于CU的像素块及CU的PU的预测块来产生CU的残差块。举例来说,残差单元220可产生CU的残差块,使得残差块中的每一采样具有等于以下两者之间的差的值:CU的像素块中的采样,及CU的PU的预测块中的对应采样。
变换/量化单元230可量化变换系数。变换/量化单元230可基于与CU相关联的量化参数(QP)值来量化与CU的 TU相关联的变换系数。视频编码器200可通过调整与CU相关联的QP值来调整应用于与CU相关联的变换系数的量化程度。
反变换/量化单元240可分别将逆量化及逆变换应用于量化后的变换系数,以从量化后的变换系数重建残差块。
重建单元250可将重建后的残差块的采样加到预测单元210产生的一个或多个预测块的对应采样,以产生与TU相关联的重建图像块。通过此方式重建CU的每一个TU的采样块,视频编码器200可重建CU的像素块。
环路滤波单元260用于对反变换与反量化后的像素进行处理,弥补失真信息,为后续编码像素提供更好的参考,例如可执行消块滤波操作以减少与CU相关联的像素块的块效应。
在一些实施例中,环路滤波单元260包括去块滤波单元和样点自适应补偿/自适应环路滤波(SAO/ALF)单元,其中去块滤波单元用于去方块效应,SAO/ALF单元用于去除振铃效应。
解码图像缓存270可存储重建后的像素块。帧间预测单元211可使用含有重建后的像素块的参考图像来对其它图像的PU执行帧间预测。另外,帧内估计单元212可使用解码图像缓存270中的重建后的像素块来对在与CU相同的图像中的其它PU执行帧内预测。
熵编码单元280可接收来自变换/量化单元230的量化后的变换系数。熵编码单元280可对量化后的变换系数执行一个或多个熵编码操作以产生熵编码后的数据。
图3是本申请实施例涉及的视频解码器的示意性框图。
如图3所示,视频解码器300包含:熵解码单元310、预测单元320、反量化/变换单元330、重建单元340、环路滤波单元350及解码图像缓存360。需要说明的是,视频解码器300可包含更多、更少或不同的功能组件。
视频解码器300可接收码流。熵解码单元310可解析码流以从码流提取语法元素。作为解析码流的一部分,熵解码单元310可解析码流中的经熵编码后的语法元素。预测单元320、反量化/变换单元330、重建单元340及环路滤波单元350可根据从码流中提取的语法元素来解码视频数据,即产生解码后的视频数据。
在一些实施例中,预测单元320包括帧内估计单元321和帧间预测单元322。
帧内估计单元321可执行帧内预测以产生PU的预测块。帧内估计单元321可使用帧内预测模式以基于空间相邻PU的像素块来产生PU的预测块。帧内估计单元321还可根据从码流解析的一个或多个语法元素来确定PU的帧内预测模式。
帧间预测单元322可根据从码流解析的语法元素来构造第一参考图像列表(列表0)及第二参考图像列表(列表1)。此外,如果PU使用帧间预测编码,则熵解码单元310可解析PU的运动信息。帧间预测单元322可根据PU的运动信息来确定PU的一个或多个参考块。帧间预测单元322可根据PU的一个或多个参考块来产生PU的预测块。
反量化/变换单元330可逆量化(即,解量化)与TU相关联的变换系数。反量化/变换单元330可使用与TU的CU相关联的QP值来确定量化程度。
在逆量化变换系数之后,反量化/变换单元330可将一个或多个逆变换应用于逆量化变换系数,以便产生与TU相关联的残差块。
重建单元340使用与CU的TU相关联的残差块及CU的PU的预测块以重建CU的像素块。例如,重建单元340可将残差块的采样加到预测块的对应采样以重建CU的像素块,得到重建图像块。
环路滤波单元350可执行消块滤波操作以减少与CU相关联的像素块的块效应。
视频解码器300可将CU的重建图像存储于解码图像缓存360中。视频解码器300可将解码图像缓存360中的重建图像作为参考图像用于后续预测,或者,将重建图像传输给显示装置呈现。
视频编解码的基本流程如下:在编码端,将一帧图像划分成块,针对当前块,预测单元210使用帧内预测或帧间预测产生当前块的预测块。残差单元220可基于预测块与当前块的原始块计算残差块,即预测块和当前块的原始块的 差值,该残差块也可称为残差信息。该残差块经由变换/量化单元230变换与量化等过程,可以去除人眼不敏感的信息,以消除视觉冗余。可选的,经过变换/量化单元230变换与量化之前的残差块可称为时域残差块,经过变换/量化单元230变换与量化之后的时域残差块可称为频率残差块或频域残差块。熵编码单元280接收到变化量化单元230输出的量化后的变化系数,可对该量化后的变化系数进行熵编码,输出码流。例如,熵编码单元280可根据目标上下文模型以及二进制码流的概率信息消除字符冗余。
在解码端,熵解码单元310可解析码流得到当前块的预测信息、量化系数矩阵等,预测单元320基于预测信息对当前块使用帧内预测或帧间预测产生当前块的预测块。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到重建块。重建块组成重建图像,环路滤波单元350基于图像或基于块对重建图像进行环路滤波,得到解码图像。编码端同样需要和解码端类似的操作获得解码图像。该解码图像也可以称为重建图像,重建图像可以为后续的帧作为帧间预测的参考帧。
需要说明的是,编码端确定的块划分信息,以及预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息等在必要时携带在码流中。解码端通过解析码流及根据已有信息进行分析确定与编码端相同的块划分信息,预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息,从而保证编码端获得的解码图像和解码端获得的解码图像相同。
上述是基于块的混合编码框架下的视频编解码器的基本流程,随着技术的发展,该框架或流程的一些模块或步骤可能会被优化,本申请适用于该基于块的混合编码框架下的视频编解码器的基本流程,但不限于该框架及流程。
图4为本申请涉及的环路滤波单元的结构示意图。在一些实施例中,环路滤波单元主要包含去块滤波器(DeBlocking Filter,简称DBF),样值自适应补偿(Sample adaptive Offset,简称SAO)和自适应修正滤波器(Adaptive loop filter,简称ALF)。在一些实施例中,在高性能-模块化智能编码测试模型(High Performance-Modular Artificial Intelligence Model,简称HPM-ModAI)中,采用了基于残差神经网络的环路滤波器(Convolutional Neural Network based In-Loop Filter,简称CNNLF)作为智能环路滤波模块的基线方案,并位于SAO和ALF之间,如图4所示。在编码测试时,按照智能编码通用测试条件,对于All Intra(全帧内)配置,打开ALF,关闭DBF和SAO;对于随机访问(Random Access)和低延迟(Low Delay)配置,打开I帧的DBF,打开ALF,关闭SAO。
目前设计了多种不同的CNNLF模型,这多种不同的CNNLF模型与不同的量化步长(Quantization Parameter,QP)对应。在确定使用CNNLF对当前块的目标重建块进行滤波时,根据当前块对应的QP,从这多种不同的CNNLF模型中选出一个CNNLF模型对当前块的重建块进行滤波。但是,在一些情况下,在编码时QP会发生波动,此时,基于QP选择CNNLF模型时,存在CNNLF模型选择不准确,造成滤波效果差的问题。
为了解决上述技术问题,本申请实施例通过确定使用N个预设的CNNLF模型分别对当前块的目标重建块进行滤波时,这N个CNNLF模型分别对应的率失真代价(Rate Distortion Optimization,简称RDO),并将当前块的目标重建块输入神经网络模型,通过神经网络模型预测这N个CNNLF模型分别对应的选中概率,根据N个CNNLF模型分别对应的率失真代价和选中概率,从N个CNNLF模型中选出目标CNNLF模型,实现目标CNNLF模型的准确选择,基于准确选择的目标CNNLF模型对当前块的目标重建块进行滤波时,可以提高滤波效果。
下面结合图5,以解码端为例,对本申请实施例提供的视频解码方法进行介绍。
图5为本申请实施例提供的视频解码方法的一种流程示意图,本申请实施例应用于图1和图2所示视频解码器。如图5所示,本申请实施例的方法包括:
S501、解码码流,得到当前块的目标重建块。
在一些实施例中,当前块也称为当前解码块、当前解码单元、解码块、待解码块、待解码的当前块等。
如图3所示,本申请实施例涉及的解码过程可以是,熵解码单元310可解析码流得到当前块的预测信息、量化系数矩阵等,预测单元320基于预测信息对当前块使用帧内预测或帧间预测产生当前块的预测块。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到重建块。环路滤波单元350对重建块进行环路滤波,得到解码块。
需要说明的是,本实施例中的目标重建块为待输入目标CNNLF模型的重建块。
例如,图4所示,若当前块的重建块在进行CNNLF之前,还需要经过DBF和/或SAO时,则将经过DBF和/或SAO的重建块确定为当前块的目标重建块。
S502、确定目标CNNLF模型。
其中,目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,而选中概率是通过神经网络模型基于目标重建块预测得到的,N为正整数。
本申请对神经网络的具体网络结构不做限制,例如可以为卷积神经网络。
在一些实施例中,如神经网络模型包括K层卷积层和L层全连接层,K、L均为正整数。
在一些实施例中,K层卷积层中的至少一个卷积层之后连接有下采样单元,所述下采样单元用于对所述卷积层输出的特征图进行下采样。
可选的,下采样单元可以是最大池化层或平均池化层。
在一种具体的示例中,如图6所示,该神经网络模型包括3个卷积层(Conv),每个卷积层的卷积核大小为3X3,每个卷积层之后连接一个最大池化层,最大池化层的卷积核大小为2X2,也就是说,最大池化层实现2倍的下采样。最后一个最大池化层之后连接两个全连接层,第一个全连接层之后连接非线性激活函数ReLU,最后一个全连接层连接Softmax。如图6所示,将重建块输入该神经网络模型中,该神经网络模型输出N个CNNLF模型中每个CNNLF模型对应的选中概率,例如a,b,……n。
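A minimal PyTorch sketch of the Fig. 6 probability network described above: three 3x3 convolutions, each followed by 2x2 max pooling, then two fully connected layers with ReLU after the first and Softmax after the last. The channel widths, the single-channel input, and the 64x64 block size are illustrative assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class ModelSelector(nn.Module):
    def __init__(self, num_models: int = 4, block_size: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.MaxPool2d(2),
        )
        feat = block_size // 8  # three 2x max-pooling stages
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * feat * feat, 128), nn.ReLU(),
            nn.Linear(128, num_models), nn.Softmax(dim=1),
        )

    def forward(self, recon_block: torch.Tensor) -> torch.Tensor:
        # one selection probability per CNNLF model (a, b, ..., n in the text)
        return self.classifier(self.features(recon_block))

probs = ModelSelector()(torch.randn(1, 1, 64, 64))  # shape (1, 4), rows sum to 1
```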
本申请的神经网络是经过重建块和重建块对应的CNNLF模型训练得到的,其中训练过程可以是,将训练重建块输入神经网络模型中,得到该神经网络模型输入的N个CNNLF模型分别对应的选中概率预测值,将N个CNNLF模型分别对应的选中概率预测值与该训练重建块对应的CNNLF模型的真值进行比较,得到损失,根据该损失对神经网模型进行更新。将下一个训练重建块输入更新后的神经网络模型中,得到神经网络模型输入的N个CNNLF模型分别对应的选中概率预测值,将N个CNNLF模型分别对应的选中概率预测值与该下一个训练重建块对应的CNNLF模型的真值进行比较,得到损失,根据该损失对神经网模型进行更新。依次进行,直到满足训练结束条件为止,得到训练好的神经网络模型。可选的,训练结束条件可以是训练次数到达预设次数,或者为损失到达预设损失。
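A hedged sketch of the training procedure described above: the predicted selection probabilities are compared against the ground-truth CNNLF model index for each training reconstruction block, and the network is updated from the loss. The tiny stand-in network, the Adam optimizer, the learning rate, and the NLL-over-log-probabilities loss (equivalent to cross entropy after the Softmax head) are assumptions for illustration.

```python
import torch
import torch.nn as nn

num_models = 4
net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, num_models), nn.Softmax(dim=1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.NLLLoss()

def train_step(recon_block: torch.Tensor, true_model_idx: torch.Tensor) -> float:
    probs = net(recon_block)                      # predicted selection probabilities
    loss = loss_fn(torch.log(probs + 1e-9), true_model_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# one step on a dummy 64x64 reconstruction block whose best model is model 2
train_step(torch.randn(1, 1, 64, 64), torch.tensor([2]))
```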
经过上述训练,在实际使用时,将上述目标重建块输入训练好的神经网络模型,该神经网络模型可以输出这N个CNNLF模型分别被选中用于对目标重建块进行滤波的概率。
本申请实施例的目标CNNLF模型是基于N个CNNLF模型分别对应的率失真代价和选中概率确定的,相比于根据量化步长确定目标CNNLF模型相比,可以提高目标CNNLF模型的选择准确性,基于准确选择的目标CNNLF模型进行滤波时,可以提高滤波的效果。
在一些实施例中,上述目标CNNLF模型是解码端根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的。
在一些实施例中,上述目标CNNLF模型是编码端根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的。
上述S502中确定目标CNNLF模型的实现方式,包括但不限于如下几种:
方式一,若码流中包括第一标志,则上述S502包括如下S502-A1和S502-A2:
S502-A1、解码码流,得到第一标志,第一标志用于指示目标CNNLF模型是否为N个CNNLF模型中最大选中概率对应的CNNLF模型;
S502-A2、根据第一标志,确定目标CNNLF模型。
在该方式一中,编码端根据预设的N个CNNLF模型分别对应的率失真代价和选中概率,从N个CNNLF模型中选出目标CNNLF模型后,在码流在写入第一标志,通过该第一标志指示目标CNNLF模型是否为N个CNNLF模型中最大选中概率对应的CNNLF模型。这样解码端获得码流后,解码码流,得到第一标志,并根据第一标志来确定目标CNNLF模型,实现对目标CNNLF模型的简单快速确定。
在一种示例中,若第一标志的取值为第一数值时,则表示目标CNNLF模型为N个CNNLF模型中最大选中概率对应的CNNLF模型;若第一标志的取值为第二数值时,则表示目标CNNLF模型为N个CNNLF模型中最小率失真代价对应的CNNLF模型。这样,解码端则根据第一标志的取值,来确定目标CNNLF模型,例如,若第一标志的取值为第一数值时,则解码端从N个CNNLF模型中选出最大选中概率对应的CNNLF模型,将最大选中概率对应的CNNLF模型确定为目标CNNLF模型。若第一标志的取值为第二数值时,则解码端从N个CNNLF模型中选中率失真代价最小的CNNLF模型,将率失真代价最小的CNNLF模型确定为目标CNNLF模型。
本申请对上述第一数值和第二数值的具体取值不做限制。
可选的,第一数值为1。
可选的,第二数值为0。
具体的,上述S502-A2中根据第一标志,确定目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,若第一标志的取值为第一数值时,则将目标重建块输入神经网络模型中,得到神经网络模型输出的N个CNNLF模型分别对应的选中概率,将N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为目标CNNLF模型。其中,第一数值用于指示目标CNNLF模型为N个CNNLF模型中最大选中概率对应的CNNLF模型。
在该示例1中,解码端解码码流,得到第一标志,若第一标志的取值为第一数值时,则指示目标CNNLF模型为N个CNNLF模型中最大选中概率对应的CNNLF模型,此时,如图7所示,开关与上面的节点连接,则通过神经网络模型来预测N个CNNLF模型分别对应的选中概率。具体是,将当前块的目标重建块输入神经网络模型中,得到该神经网络模型预测的N个CNNLF模型分别对应的选中概率,进而将最大选中概率对应的CNNLF模型,确定为目标CNNLF模型。
在一些实施例中,若第一标志的取值为第一数值时,除了上述示例1中解码端通过神经网络模型预测N个CNNLF模型分别对应的选中概率,将最大选中概率对应的CNNLF模型确定为目标CNNLF模型外,还可以通过编码端直接将最大选中概率对应的CNNLF模型的索引携带在码流中。这样,解码端可以直接从码流中解码出最大选中概率对应的CNNLF模型的索引,并将该索引对应的CNNLF模型确定为目标CNNLF模型,而无需自行计算,进而降低了解码需处理的数据量,提高了解码效率。
该示例1中,若第一标志的取值为第一数值时,解码端自行通过神经网络模型预测N个CNNLF模型分别对应的选中概率,并将最大选中概率对应的CNNLF模型确定为目标CNNLF模型,相比于率失真代价方式必须在码流中传输目标CNNLF模型的索引,该示例1减少码字,降低编码复杂度。
示例2,若第一标志的取值为第二数值时,则解码码流,得到目标CNNLF模型的索引,并将N个CNNLF模型中目标CNNLF模型的索引对应的CNNLF模型,确定为目标CNNLF模型,其中第二数值用于指示目标CNNLF模型为N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在该示例2中,编码端若确定目标CNNLF模型为最小率失真代价对应的CNNLF模型,则编码端将目标CNNLF模型(即最小率失真代价对应的CNNLF模型)索引携带在码流中。解码端解码码流,得到第一标志,若第一标志的取值为第二数值,该第二数值指示目标CNNLF模型为N个CNNLF模型中最小率失真代价对应的CNNLF模型,此时,如图7所示,开关与下面的节点连接,则解码端继续解码码流,得到目标CNNLF模型的索引,进而将N个CNNLF 模型中目标CNNLF模型的索引对应的CNNLF模型,确定为目标CNNLF模型。
在该示例2中,若第一标志的取值为第二数值时,解码端直接从码流中解码得到目标CNNLF模型的索引,进而得到目标CNNLF模型,其方式简单,可以快速得到目标CNNLF模型,进而提高解码效率。
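A decoder-side sketch, assuming Python, of examples 1 and 2 above: when the first flag takes the first value, the decoder runs the probability network itself and picks the maximum-probability model; when it takes the second value, it reads the model index from the code stream. The predict_probs callable, the read_model_index callable, and the models list are assumed inputs; only the control flow mirrors the text.

```python
FIRST_VALUE, SECOND_VALUE = 1, 0

def determine_target_model(first_flag, recon_block, models, read_model_index, predict_probs):
    if first_flag == FIRST_VALUE:
        probs = predict_probs(recon_block)          # N selection probabilities
        return models[max(range(len(probs)), key=probs.__getitem__)]
    if first_flag == SECOND_VALUE:
        return models[read_model_index()]           # index parsed from the code stream
    raise ValueError("invalid first flag")

# toy usage with stand-in models and a fixed probability predictor
models = ["cnnlf_0", "cnnlf_1", "cnnlf_2", "cnnlf_3"]
print(determine_target_model(1, None, models, lambda: 0,
                             lambda _: [0.1, 0.7, 0.1, 0.1]))  # -> cnnlf_1
```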
上述方式一,解码端通过码流中携带的第一标志,来确定目标CNNLF模型,其方式简单,速度快。
方式二,若码流中不携带第一标志,且携带目标CNNLF模型的索引时,解码端直接通过解码码流,得到目标CNNLF模型的索引,进而将N个CNNLF模型中该索引对应的CNNLF模型,确定为目标CNNLF模型。该方式简单,解码端无需进行其他不必要的操作,提高解码效率。
S503、使用目标CNNLF模型对目标重建块进行滤波。
目标CNNLF模型为神经网络模型,将目标重建块输入目标CNNLF模型中,经过目标CNNLF模型中各层的处理,最终输出滤波后的重建块,滤波后的重建块更接近原始图像块,进而提升了解码性能。
在一些实施例中,本申请实施例还包括序列级的第二标志,该第二标志用于指示当前序列是否允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。
例如,若第二标志的取值为第三数值,则指示当前序列允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。
再例如,若第二标志的取值为第四数值,则指示当前序列不允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。其中第三数值与第四数值不同。
本申请对上述第三数值和第四数值的具体取值不做限制。
可选的,第三数值为1。
可选的,第四数值为0。
解码端在执行本实施例之前,或者执行上述S502之前,需要根据该序列级的第二标志,判断当前块是否可以使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。若判断当前块允许使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型,例如第二标志的取值为第三数值时,则解码端执行本申请实施例的方式,或者执行上述S502,通过上述方式一或方式二,确定目标CNNLF模型。
可选的,可以使用字段adaptive_model_selection_enable_flag来表示第二字段。
在一种示例中,在序列头中添加第二字段,具体如表1所示:
表1
[Table 1, rendered as an image in the source: sequence header definition, including the adaptive_model_selection_enable_flag syntax element]
在解码时,解码端先解析上述表1中的序列头信息,并从序列头信息中解析出第二标志adaptive_model_selection_enable_flag,根据该第二标志adaptive_model_selection_enable_flag的取值,判断当前块是否可以使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型,若判断当前块可以使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型时,执行本申请实施例的方式,确定出目标CNNLF模型。若判断出当前块不能使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型时,则跳过本申请实施例的方案,使用其他的方案,确定目标CNNLF模型。
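A minimal sketch of the sequence-level gate just described: the decoder checks adaptive_model_selection_enable_flag from the parsed sequence header before applying the combined rate-distortion / neural-network selection. The dictionary-based header and helper name are illustrative assumptions.

```python
def allows_adaptive_selection(seq_header: dict) -> bool:
    # per the text: third value (1) = allowed, fourth value (0) = not allowed
    return seq_header.get("adaptive_model_selection_enable_flag", 0) == 1

if allows_adaptive_selection({"adaptive_model_selection_enable_flag": 1}):
    print("use RD-cost + neural-network selection for the target CNNLF model")
else:
    print("skip this scheme and determine the target CNNLF model another way")
```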
本申请实施例提供的视频解码方法,解码端通过解码码流,得到当前块的目标重建块,该目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;接着,确定目标CNNLF模型,该目标CNNLF模型是根据 预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,其中选中概率是通过神经网络模型基于目标重建块预测得到的;最后使用目标CNNLF模型对目标重建块进行滤波。即本申请的目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了目标CNNLF模型的选择准确性,这样基于准确选择的目标CNNLF模型进行滤波时,可以提高滤波的效果,进而提升解码性能。
在一些实施例中,基于残差神经网络的环路滤波器对于亮度分量和色度分量分别设计了不同的网络结构,如图9和图10所示,图9为亮度分量对应的一种CNNLF模型结构示意图,图10为色度分量对应的一种CNNLF模型结构示意图。
如图9所示,对于亮度分量,CNNLF模型由卷积层、激活层、残差块、跳转连接等组成。其中残差块的网络结构如图11所示,由卷积层CONV、激活层RELU和跳转连接组成。CNNLF模型中还包括一条从输入到输出的全局跳转连接,使网络专注于学习残差,加速了网络的收敛过程。可选的,在HPM-ModAI中,亮度分量网络的残差块数量N=20。
如图10所示,对于色度分量,引入了亮度分量作为输入之一来指导色度分量的滤波,整个网络由卷积层、激活层、残差块、池化层、跳转连接等部分组成。由于分辨率的不一致,色度分量首先进行了上采样。为了避免在上采样过程中引入其他噪声,可选的,可以通过直接拷贝邻近像素来完成分辨率的扩大。在网络末端,使用池化层来完成色度分量的下采样。可选的,在HPM-ModAI中,色度分量网络的残差块数量N=10。
需要说明的是,上述图9和图10只是本申请涉及的亮度分量和色度分量对应的CNNLF模型的一种网络结构,本申请实施例涉及的CNNLF模型的网络结构包括但不限于图9和图10所示。
CNNLF模型在训练阶段,离线的训练了多个(例如4个)I帧亮度分量模型,多个(例如4个)非I帧亮度分量模型,多个(例如4个)色度U分量模型,多个(例如4个)色度V分量模型。示例性的,使用DIV2K图像数据集,将图像从RGB转换成YUV4:2:0格式的单帧视频序列,作为标签数据。然后使用HPM(High Performance Model,AVS的高性能测试模型)在All Intra(全帧内)配置下对序列进行编码,关闭DBF,SAO和ALF等传统滤波器,量化步长设置为27到50,可选的,量化步长还可以设置为其他数值,本申请不局限于此。对于编码得到的重建序列,按照QP27~31,32~37,38~44,45~50为范围划分为4个区间(该量化区间的划分只是一种示例,本申请不局限于此),切割为128X128(图像块的大小只是一种示例,本申请不局限于此)的图像块作为训练数据,分别训练了多个(例如4个)I帧亮度分量模型,多个(例如4个)色度U分量模型,多个(例如4个)色度V分量模型。进一步的,使用BVI-DVC(BVI-Digital Video Compression,BVI数字视频压缩)视频数据集,使用HPM-ModAI在Random Access(随机访问)配置下编码,关闭DBF,SAO和ALF等传统滤波器,并打开I帧的CNNLF,收集编码重建的非I帧数据,分别训练了多个(例如4个)非I帧亮度分量模型。
由上述可知,色度分量和亮度分量对应的CNNLF模型不同,这样,若当前块包括色度分量时,则使用色度分量对应的CNNLF模型对当前块的色度分量进行滤波,若当前块包括亮度分量时,则使用亮度分量对应的CNNLF模型对当前块的亮度分量进行滤波,具体参照上述图12和图14所示。
图12为本申请一实施例提供的视频解码方法流程示意图。若当前块包括色度分量时,则本实施例主要对色度分量下目标CNNLF模型的确定过程,以及使用确定的目标CNNLF模型对色度分量下的目标重建块进行滤波的过程进行介绍。如图12所示,包括:
S601、解码码流,得到当前块在色度分量下的重建块。
本实施例中,解码得到的是当前块在色度分量下的重建块。
在一些实施例中,在色度分量下的重建块也称为色度重建块。
上述S601的具体实现过程参照上述S501的描述,在此不再赘述。
S602、确定色度分量下的目标CNNLF模型。
其中,色度分量下的目标CNNLF模型是根据预设的色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,而选中概率是通过神经网络模型基于色度分量下的目标重建块预测得到的。
本实施例相比于上述实施例,本申请实施例的N个CNNLF模型为色度分量下的CNNLF模型,例如,包括4种色度U分量模型,4种色度V分量模型。
上述S602中确定色度分量下的目标CNNLF模型的实现方式,包括但不限于如下几种:
方式一,若码流中包括第一标志,该第一标志用于指示目标CNNLF模型是否为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型时,则上述S602包括如下S602-A1和S602-A2:
S602-A1、解码码流,得到第一标志;
S602-A2、根据第一标志,确定色度分量下的目标CNNLF模型。
在该方式一中,编码端根据预设的色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从色度分量下的N个CNNLF模型中选出色度分量下的目标CNNLF模型后,在码流在写入第一标志,通过该第一标志指示目标CNNLF模型是否为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。解码端获得码流中,解码码流,得到第一标志,并根据第一标志来确定色度分量下的目标CNNLF模型,实现对色度分量下的目标CNNLF模型的简单快速确定。
在一种示例中,若第一标志的取值为第一数值时,则表示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;若第一标志的取值为第二数值时,则表示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。这样,解码端则根据第一标志的取值,来确定色度分量下的目标CNNLF模型,例如,若第一标志的取值为第一数值时,则解码端从色度分量下的N个CNNLF模型中选出最大选中概率对应的CNNLF模型,将最大选中概率对应的CNNLF模型确定为色度分量下的目标CNNLF模型。若第一标志的取值为第二数值时,则解码端从色度分量下的N个CNNLF模型中选中率失真代价最小的CNNLF模型,将率失真代价最小的CNNLF模型确定为色度分量下的目标CNNLF模型。
具体的,上述S602-A2中根据第一标志,确定色度分量下的目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,若第一标志的取值为第一数值时,则将色度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率,将色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为色度分量下的目标CNNLF模型。其中,第一数值用于指示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在该示例1中,解码端解码码流,得到第一标志,若第一标志的取值为第一数值时,则指示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,此时,通过神经网络模型来预测色度分量下的N个CNNLF模型分别对应的选中概率。具体是,将当前块在色度分量下的目标重建块输入神经网络模型中,得到该神经网络模型预测的色度分量下的N个CNNLF模型分别对应的选中概率,进而将最大选中概率对应的CNNLF模型,确定为色度分量下的目标CNNLF模型。
该示例1中,若第一标志的取值为第一数值时,解码端自行通过神经网络模型预测色度分量下的N个CNNLF模型分别对应的选中概率,并将最大选中概率对应的CNNLF模型确定为色度分量下的目标CNNLF模型,相比于率失真代价方式必须在码流中传输色度分量下的目标CNNLF模型的索引,该示例1减少码字,降低编码复杂度,提升编解码性能。
在一些实施例中,若色度分量与亮度分量对应的神经网络模型不同时,则上述将色度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率具体包括:获取色度分 量对应的神经网络模型,将色度分量下的目标重建块输入色度分量对应的神经网络模型中,得到色度分量对应的神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率。
示例2,若第一标志的取值为第二数值时,则解码码流,得到色度分量下的目标CNNLF模型的索引,并将色度分量下的N个CNNLF模型中目标CNNLF模型的索引对应的CNNLF模型,确定为色度分量下的目标CNNLF模型,其中第二数值用于指示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在该示例2中,编码端若确定色度分量下的目标CNNLF模型为最小率失真代价对应的CNNLF模型,则编码端将色度分量下的目标CNNLF模型(即最小率失真代价对应的CNNLF模型)索引携带在码流中。解码端解码码流,得到第一标志,若第一标志的取值为第二数值,该第二数值指示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型,此时,解码端继续解码码流,得到色度分量下的目标CNNLF模型的索引,进而将色度分量下的N个CNNLF模型中目标CNNLF模型的索引对应的CNNLF模型,确定为色度分量下的目标CNNLF模型。
在该示例2中,若第一标志的取值为第二数值时,解码端直接从码流中解码得到色度分量下的目标CNNLF模型的索引,进而得到色度分量下的目标CNNLF模型,其方式简单,可以快速得到色度分量下的目标CNNLF模型,进而提高解码效率。
可选的,上述第一标志可以用字段chroma_nn_rdo_equal_flag表示。
可选的,上述色度分量下的目标CNNLF模型的索引可以用字段chroma_cnnlf_model_index表示。
在一种可能的实现方式中,上述chroma_cnnlf_model_index携带在图像头中。示例性的,图像头定义如下表2所示:
表2
[Table 2, rendered as an image in the source: picture header definition, including chroma_nn_rdo_equal_flag and chroma_cnnlf_model_index]
在一些实施例中,上述chroma_nn_rdo_equal_flag也称为图像级色度模型选择一致性标志,若为0表示色度分量下的目标CNNLF模型为色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型,进而在表2中继续解码chroma_cnnlf_model_index,将chroma_cnnlf_model_index对应的CNNLF模型确定为色度分量下的目标CNNLF模型。
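A hedged code sketch of the Table 2 parsing order: chroma_cnnlf_model_index is read only when chroma_nn_rdo_equal_flag is 0 (the minimum-rate-distortion-cost case). The BitReader stub and the ue(v)-style read_ue descriptor are assumptions; the actual entropy coding of these elements is not specified here.

```python
class BitReader:
    """Hypothetical minimal reader over a list of already-decoded values."""
    def __init__(self, values):
        self.values = list(values)
    def read_bit(self):
        return self.values.pop(0)
    def read_ue(self):
        return self.values.pop(0)  # stand-in for an exp-Golomb style parse

def parse_chroma_model_syntax(reader: BitReader) -> dict:
    syntax = {"chroma_nn_rdo_equal_flag": reader.read_bit()}
    if syntax["chroma_nn_rdo_equal_flag"] == 0:
        syntax["chroma_cnnlf_model_index"] = reader.read_ue()
    return syntax

print(parse_chroma_model_syntax(BitReader([0, 2])))  # flag 0 -> index follows
```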
上述方式一,解码端通过码流中携带的第一标志,来确定色度分量下的目标CNNLF模型,其方式简单,速度快。
方式二,若码流中不携带第一标志,且携带色度分量下的目标CNNLF模型的索引时,解码端直接通过解码码流,得到色度分量下的目标CNNLF模型的索引,进而将色度分量下的N个CNNLF模型中该索引对应的CNNLF模型,确定为色度分量下的目标CNNLF模型。该方式简单,解码端无需进行其他不必要的操作,提高解码效率。
通过上述方式一或方式二,确定出色度分量下的目标CNNLF模型后,执行如下S603的步骤。
S603、使用色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波。
色度分量下的目标CNNLF模型为神经网络模型,例如,如上述图10所示,将色度分量下的目标重建块输入色度分量下的目标CNNLF模型中,经过色度分量下的目标CNNLF模型中各层的处理,最终输出滤波后的色度分量下的重建块,滤波后的色度分量下的重建块更接近原始色度块,进而提升了解码性能。
在一些实施例中,本申请实施例还包括序列级的第二标志,该第二标志用于指示当前序列的色度分量是否允许使用率失真代价与神经网络模型相结合的方式来确定色度分量下的目标CNNLF模型。
例如,若第二标志的取值为第三数值,则指示当前序列的色度分量允许使用率失真代价与神经网络模型相结合的方式来确定色度分量下的目标CNNLF模型。
再例如,若第二标志的取值为第四数值,则指示当前序列的色度分量不允许使用率失真代价与神经网络模型相结合的方式来确定色度分量下的目标CNNLF模型。其中第三数值与第四数值不同。
这样,解码端在执行本实施例之前,或者执行上述S602之前,需要根据该序列级的第二标志,判断当前块的色度分量是否可以使用率失真代价与神经网络模型相结合的方式来确定色度分量下的目标CNNLF模型。若判断当前块的色度分量允许使用率失真代价与神经网络模型相结合的方式确定色度分量下的目标CNNLF模型,例如第二标志的取值为第三数值时,则解码端执行本申请实施例的方式,或者执行上述S602,确定色度分量下的目标CNNLF模型。
本申请实施例提供的视频解码方法,若当前块包括色度分量时,则解码端通过解码码流,得到当前块在色度分量下的目标重建块,该色度分量下的目标重建块为待输入目标CNNLF模型的重建块;接着,确定色度分量下的目标CNNLF模型,该色度分量下的目标CNNLF模型是根据预设的色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,其中选中概率是通过神经网络模型(例如色度分量对应的神经网络模型)基于色度分量下的目标重建块预测得到的;最后使用色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波。即本申请的色度分量下的目标CNNLF模型是根据预设的色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了色度分量下的目标CNNLF模型的选择准确性,这样基于准确选择的色度分量下的目标CNNLF模型进行滤波时,可以提高色度分量的滤波效果,进而提升色度分量的解码性能。
上述实施例对色度分量下的目标CNNLF模型的确定过程,以及使用色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波的过程进行介绍。下面结合图14,对本申请实施例的亮度分量下的目标CNNLF模型的确定过程,以及使用亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波的过程进行介绍。
图14为本申请一实施例提供的视频解码方法流程示意图。若当前块包括亮度分量时,则本实施例主要对亮度分量下目标CNNLF模型的确定过程,以及使用确定的目标CNNLF模型对亮度分量下的目标重建块进行滤波的过程进行介绍。如图14所示,包括:
S701、解码码流,得到当前块在亮度分量下的重建块。
本实施例中,解码得到的是当前块在亮度分量下的重建块。
在一些实施例中,在亮度分量下的重建块也称为亮度重建块。
上述S701的具体实现过程参照上述S501的描述,在此不再赘述。
S702、确定亮度分量下的目标CNNLF模型。
其中,亮度分量下的目标CNNLF模型是根据预设的亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,而选中概率是通过神经网络模型基于亮度分量下的目标重建块预测得到的。
本实施例相比于上述实施例,本申请实施例的N个CNNLF模型为亮度分量下的CNNLF模型,例如,包括4种亮度U分量模型,4种亮度V分量模型。
上述S702中确定亮度分量下的目标CNNLF模型的实现方式,包括但不限于如下几种:
方式一,若码流中包括第一标志,该第一标志用于指示目标CNNLF模型是否为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型时,则上述S702包括如下S702-A1和S702-A2:
S702-A1、解码码流,得到第一标志;
S702-A2、根据第一标志,确定亮度分量下的目标CNNLF模型。
在该方式一中,编码端根据预设的亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从亮度分量下的N个CNNLF模型中选出亮度分量下的目标CNNLF模型后,在码流在写入第一标志,通过该第一标志指示目标CNNLF模型是否为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。解码端获得码流中,解码码流,得到第一标志,并根据第一标志来确定亮度分量下的目标CNNLF模型,实现对亮度分量下的目标CNNLF模型的简单快速确定。
在一种示例中,若第一标志的取值为第一数值时,则表示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;若第一标志的取值为第二数值时,则表示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。这样,解码端则根据第一标志的取值,来确定亮度分量下的目标CNNLF模型,例如,若第一标志的取值为第一数值时,则解码端从亮度分量下的N个CNNLF模型中选出最大选中概率对应的CNNLF模型,将最大选中概率对应的CNNLF模型确定为亮度分量下的目标CNNLF模型。若第一标志的取值为第二数值时,则解码端从亮度分量下的N个CNNLF模型中选中率失真代价最小的CNNLF模型,将率失真代价最小的CNNLF模型确定为亮度分量下的目标CNNLF模型。
具体的,上述S702-A2中根据第一标志,确定亮度分量下的目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,若第一标志的取值为第一数值时,则将亮度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率,将亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为亮度分量下的目标CNNLF模型。其中,第一数值用于指示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在该示例1中,解码端解码码流,得到第一标志,若第一标志的取值为第一数值时,则指示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,此时,通过神经网络模型来预测亮度分量下的N个CNNLF模型分别对应的选中概率。具体是,将当前块在亮度分量下的目标重建块输入神经网络模型中,得到该神经网络模型预测的亮度分量下的N个CNNLF模型分别对应的选中概率,进而将最大选中概率对应的CNNLF模型,确定为亮度分量下的目标CNNLF模型。
该示例1中,若第一标志的取值为第一数值时,解码端自行通过神经网络模型预测亮度分量下的N个CNNLF模型分别对应的选中概率,并将最大选中概率对应的CNNLF模型确定为亮度分量下的目标CNNLF模型,相比于率失真代价方式必须在码流中传输亮度分量下的目标CNNLF模型的索引,该示例1减少码字,降低编码复杂度,提升编解码性能。
在一些实施例中,若亮度分量与亮度分量对应的神经网络模型不同时,则上述将亮度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率具体包括:获取亮度分量对应的神经网络模型,将亮度分量下的目标重建块输入亮度分量对应的神经网络模型中,得到亮度分量对应的神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率。
示例2,若第一标志的取值为第二数值时,则解码码流,得到亮度分量下的目标CNNLF模型的索引,并将亮度分量下的N个CNNLF模型中目标CNNLF模型的索引对应的CNNLF模型,确定为亮度分量下的目标CNNLF模型,其中第二数值用于指示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在该示例2中,编码端若确定亮度分量下的目标CNNLF模型为最小率失真代价对应的CNNLF模型,则编码端 将亮度分量下的目标CNNLF模型(即最小率失真代价对应的CNNLF模型)索引携带在码流中。解码端解码码流,得到第一标志,若第一标志的取值为第二数值,该第二数值指示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型,此时,解码端继续解码码流,得到亮度分量下的目标CNNLF模型的索引,进而将亮度分量下的N个CNNLF模型中目标CNNLF模型的索引对应的CNNLF模型,确定为亮度分量下的目标CNNLF模型。
在该示例2中,若第一标志的取值为第二数值时,解码端直接从码流中解码得到亮度分量下的目标CNNLF模型的索引,进而得到亮度分量下的目标CNNLF模型,其方式简单,可以快速得到亮度分量下的目标CNNLF模型,进而提高解码效率。
可选的,上述第一标志可以用字段luma_nn_rdo_equal_flag表示。
可选的,上述亮度分量下的目标CNNLF模型的索引可以用字段luma_cnnlf_model_index表示。
在一种可能的实现方式中,上述luma_cnnlf_model_index携带在图像头中。示例性的,图像头定义如下表3所示:
表3
[Table 3, rendered as an image in the source: picture header definition, including luma_nn_rdo_equal_flag and luma_cnnlf_model_index]
在一些实施例中,上述luma_nn_rdo_equal_flag也称为图像级亮度模型选择一致性标志,若为0表示亮度分量下的目标CNNLF模型为亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型,进而在表3中继续解码luma_cnnlf_model_index,将luma_cnnlf_model_index对应的CNNLF模型确定为亮度分量下的目标CNNLF模型。
在一些实施例中,第一标志包括上述luma_nn_rdo_equal_flag和上述chroma_nn_rdo_equal_flag,上述luma_nn_rdo_equal_flag和上述chroma_nn_rdo_equal_flag,以及色度分量对应的chroma_cnnlf_model_index和亮度分量对应的luma_cnnlf_model_index可以共同携带在图像头中。示例性的,图像头定义如下表4所示:
表4
[Table 4, rendered as an image in the source: picture header definition carrying luma_nn_rdo_equal_flag, chroma_nn_rdo_equal_flag, luma_cnnlf_model_index, and chroma_cnnlf_model_index]
上述方式一,解码端通过码流中携带的第一标志,来确定亮度分量下的目标CNNLF模型,其方式简单,速度快。
方式二,若码流中不携带第一标志,且携带亮度分量下的目标CNNLF模型的索引时,解码端直接通过解码码流,得到亮度分量下的目标CNNLF模型的索引,进而将亮度分量下的N个CNNLF模型中该索引对应的CNNLF模型,确定为亮度分量下的目标CNNLF模型。该方式简单,解码端无需进行其他不必要的操作,提高解码效率。
通过上述方式一或方式二,确定出亮度分量下的目标CNNLF模型后,执行如下S703的步骤。
S703、使用亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波。
亮度分量下的目标CNNLF模型为神经网络模型,例如,如上述图9所示,将亮度分量下的目标重建块输入亮度分量下的目标CNNLF模型中,经过亮度分量下的目标CNNLF模型中各层的处理,最终输出滤波后的亮度分量下的重建块,滤波后的亮度分量下的重建块更接近原始亮度块,进而提升了解码性能。
在一些实施例中,本申请实施例还包括序列级的第二标志,该第二标志用于指示当前序列的亮度分量是否允许使用率失真代价与神经网络模型相结合的方式来确定亮度分量下的目标CNNLF模型。
例如,若第二标志的取值为第三数值,则指示当前序列的亮度分量允许使用率失真代价与神经网络模型相结合的方式来确定亮度分量下的目标CNNLF模型。
再例如,若第二标志的取值为第四数值,则指示当前序列的亮度分量不允许使用率失真代价与神经网络模型相结合的方式来确定亮度分量下的目标CNNLF模型。其中第三数值与第四数值不同。
这样,解码端在执行本实施例之前,或者执行上述S702之前,需要根据该序列级的第二标志,判断当前块的亮度分量是否可以使用率失真代价与神经网络模型相结合的方式来确定亮度分量下的目标CNNLF模型。若判断当前块的亮度分量允许使用率失真代价与神经网络模型相结合的方式确定亮度分量下的目标CNNLF模型,例如第二标志的取值为第三数值时,则解码端执行本申请实施例的方式,或者执行上述S702,确定亮度分量下的目标CNNLF模型。
本申请实施例提供的视频解码方法,若当前块包括亮度分量时,则解码端通过解码码流,得到当前块在亮度分量下的目标重建块,该亮度分量下的目标重建块为待输入目标CNNLF模型的重建块;接着,确定亮度分量下的目标CNNLF模型,该亮度分量下的目标CNNLF模型是根据预设的亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,其中选中概率是通过神经网络模型(例如亮度分量对应的神经网络模型)基于亮度分量下的目标重建块预测得到的;最后使用亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波。即本申请的亮度分量下的目标CNNLF模型是根据预设的亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了亮度分量下的目标CNNLF模型的选择准确性,这样基于准确选择的亮度分量下的目标CNNLF模型进行滤波时,可以提高亮度分量的滤波效果,进而提升亮度分量的解码性能。
进一步的,通过试验对本申请实施例提供的技术方案进行验证。
具体的,以HPM-ModAI中的4个非I帧亮度分量模型为例,在AVS3智能编码参考软件HPM11.1-ModAI6.1上实现,测试结果如表5和表6所示。
表5本方案对比anchor(HPM11.1-ModAI6.1)的RA(Random Access,随机访问)性能
[Table 5, rendered as an image in the source: Random Access BD-rate results versus anchor HPM11.1-ModAI6.1; per the text, average Y/U/V BD-rate changes are −1.19%/0.47%/0.45%]
如表5所示,在智能编码通用测试条件Random Access(随机访问)配置下对AVS3要求的测试序列进行测试,对比anchor(锚)为HPM11.1-ModAI6.1,本申请技术方案在Y,U,V分量上BD-rate平均变化分别为-1.19%,0.47%,0.45%,说明本申请技术提升了编码性能。其中,BD-rate是评价视频编码算法性能的主要参数之一,表示新算法编码的视频相对于原来的算法在码率和PSNR(Peak Signal-to-Noise Ratio,峰值信噪比)上的变化情况。在视频编码中,码率低表示压缩量大,PSNR值高表示客观质量好。
表6本方案对比(anchor HPM11.1-ModAI6.1)的LDB(Low Delay B,低延迟B)性能
[Table 6, rendered as an image in the source: Low Delay B BD-rate results versus anchor HPM11.1-ModAI6.1; per the text, average Y/U/V BD-rate changes are −0.91%/0.31%/−0.03%]
如表6所示,在智能编码通用测试条件Low Delay(低延时)B配置下,本申请技术方案在Y,U,V分量上BD-rate平均变化分别为-0.91%,0.31%,-0.03%,说明本申请技术提升了编码性能。
进一步地,将本申请的技术方案与基于深度学习的模型自适应选择方案(即通过神经网络模型预测的选中概率,选择目标CNNLF模型)进行了比较,在Random Access和Low Delay B配置下的BD-rate性能如表7,表8所示。
表7本方案对比基于深度学习的模型自适应选择方案的RA性能
[Table 7, rendered as an image in the source: Random Access BD-rate comparison against the deep-learning-based adaptive model selection scheme]
表8本方案对比基于深度学习的模型自适应选择方案的LDB性能
[Table 8, rendered as an image in the source: Low Delay B BD-rate comparison against the deep-learning-based adaptive model selection scheme]
由上述表7和表8可知,由于目前只针对非I帧亮度分量进行了测试,所以在色度分量由于整体比特数据的增加, 存在一定性能损失。进一步的,考虑到对于亮度和色度不同的性能度量尺度,例如常用的8:1:1或10:1:1,所以整体BD-rate性能仍是增益的。
上文对本申请实施例的解码方法进行介绍,在此基础上,下面对本申请实施例提供的编码方法进行介绍。
图16为本申请实施例提供的视频编码方法的一种流程示意图,本申请实施例应用于图1和图2所示视频编码器。如图16所示,本申请实施例的方法包括:
S801、获取当前块的目标重建块,该目标重建块为待输入目标CNNLF模型的重建块。
在视频编码过程中,视频编码器接收视频流,该视频流由一系列图像帧组成,针对视频流中的每一帧图像进行视频编码,视频编码器对图像帧进行块划分,得到当前块。
在一些实施例中,当前块也称为当前编码块、当前图像块、编码块、当前编码单元、当前待编码块、当前待编码的图像块等。
在块划分时,传统方法划分后的块既包含了当前块位置的色度分量,又包含了当前块位置的亮度分量。而分离树技术(dual tree)可以划分单独分量块,例如单独的亮度块和单独的色度块,其中亮度块可以理解为只包含当前块位置的亮度分量,色度块理解为只包含当前块位置的色度分量。这样相同位置的亮度分量和色度分量可以属于不同的块,划分可以有更大的灵活性。如果分离树用在CU划分中,那么有的CU既包含亮度分量又包含色度分量,有的CU只包含亮度分量,有的CU只包含色度分量。
在一些实施例中,本申请实施例的当前块只包括色度分量,可以理解为色度块。
在一些实施例中,本申请实施例的当前块只包括亮度分量,可以理解为亮度块。
在一些实施例中,该当前块即包括亮度分量又包括色度分量。
如图2所示,编码端编码过程可以是:在编码端,将一帧图像划分成块,针对当前块,预测单元210使用帧内预测或帧间预测产生当前块的预测块。残差单元220可基于预测块与当前块的原始块计算残差块。该残差块经由变换/量化单元230变换与量化等过程,可以去除人眼不敏感的信息,以消除视觉冗余。熵编码单元280接收到变化量化单元230输出的量化后的变化系数,可对该量化后的变化系数进行熵编码,输出码流。同时,反量化/变换单元240对量化系数矩阵进行反量化、反变换得到残差块。重建单元250将预测块和残差块相加得到重建块。重建块经过环路滤波单元260进行环路滤波。
需要说明的是,本实施例中的目标重建块为待输入目标CNNLF模型的重建块。例如,图4所示,若当前块的重建块在进行CNNLF之前,还需要经过DBF和/或SAO时,则将经过DBF和/或SAO的重建块确定为当前块的目标重建块。
S802、确定使用预设的N个CNNLF模型分别对目标重建块进行滤波时,N个CNNLF模型分别对应的率失真代价,N为正整数。
本步骤中,针对N个CNNLF模型中的每一个CNNLF模型,计算使用该CNNLF模型对目标重建块进行滤波时,该CNNLF模型对应的率失真代价。
在一种示例中,根据如下公式(1)确定该CNNLF模型对应的率失真代价RDcost:
RDcost1=D1+λ1*R1   (1)
其中,D1为使用该CNNLF模型对目标重建块进行滤波时的失真,λ1为拉格朗日乘子,R1为使用该CNNLF模型对该目标重建块进行滤波时需要的比特量。
在另一种示例中,根据如下公式(2)确定该CNNLF模型对应的率失真代价RDcost1:
RDcost1=D1   (2)
可选的,上述失真D根据如下公式(3)确定:
D1=Dnet1-Drec1   (3)
其中,Dnet1为目标重建块经过CNNLF模型滤波后的损失,Drec1为目标重建块经过CNNLF模型滤波前的损失。例如,将目标重建块经过CNNLF模型滤波后的重建块与当前块的像素差,确定为Dnet1,将目标重建块经过CNNLF模型滤波前的重建块(即目标重建块)与当前块的像素差,确定为Drec1。可选的,还可以使用其他方式确定Dnet1和Drec1,本申请对此不作限制。
根据上述方式,可以确定出N个CNNLF模型中每个CNNLF模型对应的率失真代价。
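A direct, runnable transcription of formulas (1) and (3) above across all N candidate models, assuming a sum-of-squared-error pixel difference as the distortion measure (the text allows other measures); the candidate filters are passed in as plain callables for illustration.

```python
def sse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rd_costs(filters, target_recon, original, lam, bits_per_model):
    costs = []
    for f, r1 in zip(filters, bits_per_model):
        d1 = sse(f(target_recon), original) - sse(target_recon, original)  # formula (3)
        costs.append(d1 + lam * r1)                                        # formula (1)
    return costs

# toy example: two candidate "filters" over a 4-pixel block
orig, recon = [10, 20, 30, 40], [12, 18, 33, 37]
print(rd_costs([lambda b: b, lambda b: [p + 1 for p in b]], recon, orig, 0.5, [2, 2]))
```

Setting lam to 0 recovers the distortion-only variant of formula (2).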
S803、将目标重建块输入预设的神经网络模型中,得到神经网络模型输出的N个CNNLF模型分别对应的选中概率。
需要说明的是,上述S803与上述S802在执行时没有先后顺序要求,即上述S803可以在上述S802之前执行,或者在S802之后执行,或者与S802同步执行,本申请对此不作限制。
上述预设的神经网络模型为预先训练好的,可以基于图像块预测出N个CNNLF模型分别对应的选中概率。基于此,如图8所示,将目标重建块输入预设的神经网络模型中,得到神经网络模型输出的N个CNNLF模型分别对应的选中概率,例如为选中概率1、选中概率2……、选中概率N。其中CNNLF模型对应的选中概率越高,说明该CNNLF模型被用于对目标重建块进行滤波的概率越大。
该步骤中,基于目标重建块的特征信息来选择CNNLF模型,不仅丰富了CNNLF模型的选择方式,且在CNNLF模型的选择过程中考虑了目标重建块的特征信息,以选出符合目标重建块的特征信息的CNNLF模型,进而提高了CNNLF模型的选择准确性。
S804、根据N个CNNLF模型分别对应的率失真代价和选中概率,从N个CNNLF模型中选出目标CNNLF模型。
根据上述S803和S804的步骤,可以确定出N个CNNLF模型中每一个CNNLF模型对应的率失真代价和选中概率,接着,根据N个CNNLF模型分别对应的率失真代价和选中概率,从N个CNNLF模型中选出目标CNNLF模型。
上述S804中从N个CNNLF模型中选出目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,从N个CNNLF模型中选出率失真代价最小的M个CNNLF模型,在从N个CNNLF模型中选出选中概率最大的P个CNNLF模型。若M个CNNLF模型与P个CNNLF模型中相同的CNNLF模型为一个时,将该相同的一个CNNLF模型确定为目标CNNLF模型。若M个CNNLF模型与P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中率失真代价最小的CNNLF模型,确定为目标CNNLF模型。或者,若M个CNNLF模型与P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中选中概率最大的CNNLF模型,确定为目标CNNLF模型。
示例2,上述S804包括如下S804-A1至S804-A3的步骤:
S804-A1、根据N个CNNLF模型分别对应的率失真代价,从N个CNNLF模型中选出率失真代价最小的第一CNNLF模型;
S804-A2、根据N个CNNLF模型分别对应的选中概率,从N个CNNLF模型中选出选中概率最大的第二CNNLF模型;
S804-A3、根据第一CNNLF模型和第二CNNLF模型,确定目标CNNLF模型。
需要说明的是,上述S804-A2与上述S804-A1在执行时没有先后顺序要求,即上述S804-A2可以在上述S804-A1之前执行,或者在S804-A1之后执行,或者与S804-A1同步执行,本申请对此不作限制。
该示例2中,编码端根据N个CNNLF模型分别对应的率失真代价,从N个CNNLF模型中选出率失真代价最小的第一CNNLF模型,并根据N个CNNLF模型分别对应的选中概率,从N个CNNLF模型中选出选中概率最大的第二CNNLF模型,进而根据第一CNNLF模型和第二CNNLF模型,确定目标CNNLF模型。
上述S804-A3的实现方式包括但不限于如下几种示例:
在一种示例中,若第一CNNLF模型与第二CNNLF模型相同,则将第一CNNLF模型或第二CNNLF模型确定为目标CNNLF模型。这样可以保证选出的目标CNNLF模型不仅满足目标重建图像的特征信息,且是率失真代价最小的CNNLF模型,使得选出的目标CNNLF模型为最佳CNNLF模型,进而提高目标CNNLF模型的选择准确性,提高滤波效果。
在另一种示例中,若第一CNNLF模型与第二CNNLF模型不相同,则将第一CNNLF模型确定为目标CNNLF模型。这是由于率失真代价的考量相比于目标重建图像的特征信息更为重要,因此,当第一CNNLF模型与第二CNNLF模型不相同时,选择率失真代价最小的第一CNNLF模型作为目标CNNLF模型,进而保证CNNLF滤波不会带来过多的失真,保证编解码的准确性。
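将S804-A1至S804-A3串联起来,示例2的选择过程可示意为如下Python草图。函数同时返回目标模型是否为最大选中概率对应的模型,该布尔值可供后文第一标志的写入使用(函数名为虚构的示意):

```python
def select_target_model(rd_costs, probs):
    """示例2(S804-A1至S804-A3)的选择草图。"""
    first = min(range(len(rd_costs)), key=lambda i: rd_costs[i])  # S804-A1:率失真代价最小
    second = max(range(len(probs)), key=lambda i: probs[i])       # S804-A2:选中概率最大
    if first == second:
        return first, True   # 两者相同:目标模型同时也是最大选中概率对应的模型
    return first, False      # 两者不同:以率失真代价最小的第一CNNLF模型为目标
```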
本申请根据N个CNNLF模型分别对应的率失真代价和选中概率确定目标CNNLF模型。这样保证选择出的目标CNNLF模型不仅考虑了率失真代价,且考虑了选中概率,进而提高了目标CNNLF模型的选择准确性,这样后续基于准确选择出的目标CNNLF模型进行滤波时,可以提高滤波效果。
S805、使用目标CNNLF模型对目标重建块进行滤波。
目标CNNLF模型为神经网络模型,将目标重建块输入目标CNNLF模型中,经过目标CNNLF模型中各层的处理,最终输出滤波后的重建块,滤波后的重建块更接近原始图像块,进而提升了编码性能。
在一些实施例中,本申请实施例还包括:
在码流中写入第一标志,该第一标志用于指示目标CNNLF模型是否为N个CNNLF模型中最大选中概率对应的CNNLF模型。这样解码端可以通过解码第一标志,确定目标CNNLF模型,提高目标CNNLF模型确定速度。
在一种示例中,若第一标志的取值为第一数值时,则指示目标CNNLF模型为N个CNNLF模型中最大选中概率对应的CNNLF模型。
在另一种示例中,若第一标志的取值为第二数值时,则指示目标CNNLF模型为N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,若第一标志的取值为第二数值时,则编码端还在码流中写入目标CNNLF模型的索引。这样,解码端直接通过解码目标CNNLF模型的索引,确定出目标CNNLF模型,方式简单,速度快。
在一些实施例中,编码端在码流中不写入第一标志,而是直接写入确定的目标CNNLF模型的索引,可以节省码字,且使得解码端无需进行其他操作,直接从码流中解析出目标CNNLF模型的索引,进而确定出目标CNNLF模型,实现目标CNNLF模型的快速确定,且方式简单。
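上述两种语法设计(写入第一标志并按需补写索引,或不写第一标志而直接写索引)可用如下草图对比说明。其中bs及其write_flag、write_index接口均为虚构的码流写入示意接口,并非某一真实编解码器的API:

```python
def signal_target_model(bs, target_idx, is_max_prob_model, use_first_flag=True):
    """目标CNNLF模型的两种示意性信令方式。"""
    if use_first_flag:
        bs.write_flag(1 if is_max_prob_model else 0)  # 第一标志
        if not is_max_prob_model:
            bs.write_index(target_idx)                # 取第二数值时补写模型索引
    else:
        bs.write_index(target_idx)                    # 直接写索引,可节省第一标志的码字
```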
在一些实施例中,在执行上述S802之前,本申请实施例还包括:获取第二标志,该第二标志用于指示当前序列是否允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。
例如,若第二标志的取值为第三数值,则指示当前序列允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。
再例如,若第二标志的取值为第四数值,则指示当前序列不允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。其中第三数值与第四数值不同。
本申请对上述第三数值和第四数值的具体取值不做限制。
可选的,第三数值为1。
可选的,第四数值为0。
本申请中,编码端在执行本申请实施例的方法,或执行上述S802之前,需要根据该序列级的第二标志,判断当前块是否可以使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。若判断当前块允许使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型,例如第二标志的取值为第三数值时,则执行本申请实施例的方法,或者执行上述S802,确定目标CNNLF模型。
若第二标志指示当前序列允许使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型时,则执行本申请实施例的方法,或执行上述S802,即确定使用N个CNNLF模型分别对目标重建块进行滤波时,N个CNNLF模型分别对应的率失真代价。
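序列级第二标志的门控逻辑可示意如下,其中假设第三数值为1、第四数值为0(与上文的可选取值一致):

```python
THIRD_VALUE, FOURTH_VALUE = 1, 0   # 可选取值,仅为示意

def rdo_nn_selection_allowed(second_flag):
    """第二标志取第三数值时,才允许以率失真代价与神经网络模型
    相结合的方式确定目标CNNLF模型(即才执行上述S802)。"""
    return second_flag == THIRD_VALUE
```

编码端在进入S802之前先作该判定,第二标志取第四数值时整段选择流程被跳过。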
本申请实施例提供的视频编码方法,编码端通过获取当前块的目标重建块,该目标重建块为待输入目标CNNLF模型的重建块;确定使用预设的N个CNNLF模型分别对目标重建块进行滤波时,N个CNNLF模型分别对应的率失真代价;将目标重建块输入预设的神经网络模型中,得到神经网络模型输出的N个CNNLF模型分别对应的选中概率;根据N个CNNLF模型分别对应的率失真代价和选中概率,从N个CNNLF模型中选出目标CNNLF模型;使用目标CNNLF模型对目标重建块进行滤波。即本申请的目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了目标CNNLF模型的选择准确性,这样基于准确选择的目标CNNLF模型进行滤波时,可以提高滤波的效果,进而提升编码性能。
图17为本申请实施例提供的视频编码方法的一种流程示意图。若当前块包括色度分量时,则本实施例主要对色度分量下目标CNNLF模型的确定过程,以及使用确定的目标CNNLF模型对色度分量下的目标重建块进行滤波的过程进行介绍。如图17所示,包括:
S901、获取当前块在色度分量下的目标重建块。
S902、确定使用色度分量下的N个CNNLF模型分别对色度分量下的目标重建块进行滤波时,色度分量下的N个CNNLF模型分别对应的率失真代价。
本步骤中,针对色度分量下的N个CNNLF模型中的每一个CNNLF模型,计算使用色度分量下的该CNNLF模型对色度分量下的目标重建块进行滤波时,色度分量下的该CNNLF模型对应的率失真代价。
例如根据上述公式(1)或(2)计算得到色度分量下的N个CNNLF模型中每个CNNLF模型对应的率失真代价。
S903、将色度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率。
需要说明的是,上述S903与上述S902在执行时没有先后顺序要求,即上述S903可以在上述S902之前执行,或者在S902之后执行,或者与S902同步执行,本申请对此不作限制。
在一些实施例中,色度分量对应的神经网络模型与亮度分量对应的神经网络模型不同,因此,上述S903包括:
S903-A1、获取色度分量对应的神经网络模型;
S903-A2、将色度分量下的目标重建块输入色度分量对应的神经网络模型中,得到色度分量对应的神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率。
例如,如图13所示,将色度分量下的目标重建块输入色度分量对应的神经网络模型中,得到色度分量对应的神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率,例如为选中概率1、选中概率2……、选中概率N。其中CNNLF模型对应的选中概率越高,说明该CNNLF模型被用于对色度分量下的目标重建块进行滤波的概率越大。
S904、根据色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从色度分量下的N个CNNLF模型中选出目标CNNLF模型。
根据上述S902和S903的步骤,可以确定出色度分量下的N个CNNLF模型中每一个CNNLF模型对应的率失真代价和选中概率,接着,根据色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从色度分量下的N个CNNLF模型中选出目标CNNLF模型。
上述S904中从色度分量下的N个CNNLF模型中选出色度分量下的目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,从色度分量下的N个CNNLF模型中选出率失真代价最小的M个CNNLF模型,再从色度分量下的N个CNNLF模型中选出选中概率最大的P个CNNLF模型。若色度分量下的M个CNNLF模型与色度分量下的P个CNNLF模型中相同的CNNLF模型为一个时,将该相同的一个CNNLF模型确定为色度分量下的目标CNNLF模型。若色度分量下的M个CNNLF模型与色度分量下的P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中率失真代价最小的CNNLF模型,确定为色度分量下的目标CNNLF模型。或者,若色度分量下的M个CNNLF模型与色度分量下的P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中选中概率最大的CNNLF模型,确定为色度分量下的目标CNNLF模型。
示例2,上述S904包括如下S904-A1至S904-A3的步骤:
S904-A1、根据色度分量下的N个CNNLF模型分别对应的率失真代价,从色度分量下的N个CNNLF模型中选出率失真代价最小的第一CNNLF模型;
S904-A2、根据色度分量下的N个CNNLF模型分别对应的选中概率,从色度分量下的N个CNNLF模型中选出选中概率最大的第二CNNLF模型;
S904-A3、根据第一CNNLF模型和第二CNNLF模型,确定色度分量下的目标CNNLF模型。
需要说明的是,上述S904-A1与上述S904-A2在执行时没有先后顺序要求,即上述S904-A2可以在上述S904-A1之前执行,或者在S904-A1之后执行,或者与S904-A1同步执行,本申请对此不作限制。
例如,若第一CNNLF模型与第二CNNLF模型相同,则将第一CNNLF模型或第二CNNLF模型确定为色度分量下的目标CNNLF模型。
再例如,若第一CNNLF模型与第二CNNLF模型不相同,则将第一CNNLF模型确定为色度分量下的目标CNNLF模型。
本申请编码端通过确定色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,并根据色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从色度分量下的N个CNNLF模型中选出色度分量下的目标CNNLF模型,实现对色度分量下的目标CNNLF模型的准确确定。
S905、使用色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波。
具体参照上述S603的描述,在此不再赘述。
在一些实施例中,若目标重建块为当前块在色度分量下的目标重建块,N个CNNLF模型为色度分量下的N个CNNLF模型时,则在码流中写入第一标志,该第一标志用于指示目标CNNLF模型是否为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
例如,若第一标志的取值为第一数值时,则指示目标CNNLF模型为色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
再例如,若第一标志的取值为第二数值时,则指示目标CNNLF模型为色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,在上述S902之前,方法还包括:获取第二标志,该第二标志用于指示当前序列的色度分量是否允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。若第二标志指示当前序列的色度分量允许使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型时,则执行S902,确定使用N个CNNLF模型分别对目标重建块进行滤波时,N个CNNLF模型分别对应的率失真代价。
在一些实施例中,根据上述S904的方法,确定出色度分量下的目标CNNLF模型后,使用目标CNNLF模型对色度分量下的目标重建块进行滤波之前,还需要判断该目标CNNLF模型对应的率失真代价,在该目标CNNLF模型对应的率失真代价满足要求时,执行上述S905的步骤,使用该色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波。
示例性的,HPM-ModAI为色度分量设置了帧级开关控制是否打开CNNLF。具体的,帧级开关由如下式(4)决定:
RDcost2=D2+λ2*R2   (4)
其中,D2=Dnet2-Drec2,指色度分量下的目标CNNLF对目标重建块处理后减少的失真,Dnet2为滤波后的失真,Drec2为滤波前的失真,R2为当前帧的CTU个数,λ2与自适应修正滤波器的λ保持一致。当RDcost2为负时,打开帧级CNNLF,执行上述S905,否则跳过上述S905。
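式(4)的帧级开关判定可用如下Python草图示意,失真差与λ的具体来源沿用上文定义,参数组织方式为假设:

```python
def chroma_frame_cnnlf_on(d_net2, d_rec2, num_ctu, lam2):
    """式(4)的草图:RDcost2为负时打开色度分量的帧级CNNLF。"""
    rd_cost2 = (d_net2 - d_rec2) + lam2 * num_ctu   # D2 + λ2 * R2
    return rd_cost2 < 0
```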
本申请实施例提供的视频编码方法,若当前块包括色度分量时,编码端通过获取当前块在色度分量下的目标重建块,该目标重建块为待输入目标CNNLF模型的重建块;确定使用预设的色度分量下的N个CNNLF模型分别对目标重建块进行滤波时,色度分量下的N个CNNLF模型分别对应的率失真代价;将目标重建块输入预设的神经网络模型中,得到神经网络模型输出的色度分量下的N个CNNLF模型分别对应的选中概率;根据色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从色度分量下的N个CNNLF模型中选出目标CNNLF模型;使用色度分量下的目标CNNLF模型对色度分量下的目标重建块进行滤波。即本申请的色度分量下的目标CNNLF模型是根据预设的色度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了色度分量下的目标CNNLF模型的选择准确性,这样基于准确选择的色度分量下的目标CNNLF模型进行滤波时,可以提高色度分量的滤波效果,进而提升色度分量的编码性能。
图18为本申请实施例提供的视频编码方法的一种流程示意图。若当前块包括亮度分量时,则本实施例主要对亮度分量下目标CNNLF模型的确定过程,以及使用确定的目标CNNLF模型对亮度分量下的目标重建块进行滤波的过程进行介绍。如图18所示,包括:
S1001、获取当前块在亮度分量下的目标重建块。
S1002、确定使用亮度分量下的N个CNNLF模型分别对亮度分量下的目标重建块进行滤波时,亮度分量下的N个CNNLF模型分别对应的率失真代价。
本步骤中,针对亮度分量下的N个CNNLF模型中的每一个CNNLF模型,计算使用亮度分量下的该CNNLF模型对亮度分量下的目标重建块进行滤波时,亮度分量下的该CNNLF模型对应的率失真代价。
例如根据上述公式(1)或(2)计算得到亮度分量下的N个CNNLF模型中每个CNNLF模型对应的率失真代价。
S1003、将亮度分量下的目标重建块输入神经网络模型中,得到神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率。
需要说明的是,上述S1003与上述S1002在执行时没有先后顺序要求,即上述S1003可以在上述S1002之前执行,或者在S1002之后执行,或者与S1002同步执行,本申请对此不作限制。
在一些实施例中,亮度分量对应的神经网络模型与色度分量对应的神经网络模型不同,因此,上述S1003包括:
S1003-A1、获取亮度分量对应的神经网络模型;
S1003-A2、将亮度分量下的目标重建块输入亮度分量对应的神经网络模型中,得到亮度分量对应的神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率。
例如,如图15所示,将亮度分量下的目标重建块输入亮度分量对应的神经网络模型中,得到亮度分量对应的神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率,例如为选中概率1、选中概率2……、选中概率N。其中CNNLF模型对应的选中概率越高,说明该CNNLF模型被用于对亮度分量下的目标重建块进行滤波的概率越大。
S1004、根据亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从亮度分量下的N个CNNLF模型中选出目标CNNLF模型。
根据上述S1002和S1003的步骤,可以确定出亮度分量下的N个CNNLF模型中每一个CNNLF模型对应的率失真代价和选中概率,接着,根据亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从亮度分量下的N个CNNLF模型中选出目标CNNLF模型。
上述S1004中从亮度分量下的N个CNNLF模型中选出亮度分量下的目标CNNLF模型的方式包括但不限于如下几种示例:
示例1,从亮度分量下的N个CNNLF模型中选出率失真代价最小的M个CNNLF模型,再从亮度分量下的N个CNNLF模型中选出选中概率最大的P个CNNLF模型。若亮度分量下的M个CNNLF模型与亮度分量下的P个CNNLF模型中相同的CNNLF模型为一个时,将该相同的一个CNNLF模型确定为亮度分量下的目标CNNLF模型。若亮度分量下的M个CNNLF模型与亮度分量下的P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中率失真代价最小的CNNLF模型,确定为亮度分量下的目标CNNLF模型。或者,若亮度分量下的M个CNNLF模型与亮度分量下的P个CNNLF模型中相同的CNNLF模型为多个时,将该相同的多个CNNLF模型中选中概率最大的CNNLF模型,确定为亮度分量下的目标CNNLF模型。
示例2,上述S1004包括如下S1004-A1至S1004-A3的步骤:
S1004-A1、根据亮度分量下的N个CNNLF模型分别对应的率失真代价,从亮度分量下的N个CNNLF模型中选出率失真代价最小的第一CNNLF模型;
S1004-A2、根据亮度分量下的N个CNNLF模型分别对应的选中概率,从亮度分量下的N个CNNLF模型中选出选中概率最大的第二CNNLF模型;
S1004-A3、根据第一CNNLF模型和第二CNNLF模型,确定亮度分量下的目标CNNLF模型。
需要说明的是,上述S1004-A1与上述S1004-A2在执行时没有先后顺序要求,即上述S1004-A2可以在上述S1004-A1之前执行,或者在S1004-A1之后执行,或者与S1004-A1同步执行,本申请对此不作限制。
例如,若第一CNNLF模型与第二CNNLF模型相同,则将第一CNNLF模型或第二CNNLF模型确定为亮度分量下的目标CNNLF模型。
再例如,若第一CNNLF模型与第二CNNLF模型不相同,则将第一CNNLF模型确定为亮度分量下的目标CNNLF模型。
本申请编码端通过确定亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,并根据亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从亮度分量下的N个CNNLF模型中选出亮度分量下的目标CNNLF模型,实现对亮度分量下的目标CNNLF模型的准确确定。
S1005、使用亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波。
具体参照上述S603的描述,在此不再赘述。
在一些实施例中,若目标重建块为当前块在亮度分量下的目标重建块,N个CNNLF模型为亮度分量下的N个CNNLF模型时,则在码流中写入第一标志,该第一标志用于指示目标CNNLF模型是否为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
例如,若第一标志的取值为第一数值时,则指示目标CNNLF模型为亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
再例如,若第一标志的取值为第二数值时,则指示目标CNNLF模型为亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,在上述S1002之前,方法还包括:获取第二标志,该第二标志用于指示当前序列的亮度分量是否允许使用率失真代价与神经网络模型相结合的方式来确定目标CNNLF模型。若第二标志指示当前序列的亮度分量允许使用率失真代价与神经网络模型相结合的方式确定目标CNNLF模型时,则执行S1002,确定使用N个CNNLF模型分别对目标重建块进行滤波时,N个CNNLF模型分别对应的率失真代价。
在一些实施例中,根据上述S1004的方法,确定出亮度分量下的目标CNNLF模型后,使用目标CNNLF模型对亮度分量下的目标重建块进行滤波之前,还需要判断该目标CNNLF模型对应的率失真代价,在该目标CNNLF模型对应的率失真代价满足要求时,执行上述S1005的步骤,使用该亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波。
示例性的,HPM-ModAI为亮度分量设置了帧级开关控制是否打开CNNLF。具体的,帧级开关由如下式(5)决定:
RDcost3=D3+λ3*R3   (5)
其中,D3=Dnet3-Drec3,指亮度分量下的目标CNNLF对目标重建块处理后减少的失真,Dnet3为滤波后的失真,Drec3为滤波前的失真,R3为当前帧的CTU个数,λ3与自适应修正滤波器的λ保持一致。当RDcost3为负时,打开帧级CNNLF,否则关闭,跳过上述S1005。
当帧级开关打开时,进一步通过率失真优化判断每个CTU是否打开CNNLF。示例性的,CTU级开关由如下公式(6)决定:
RDcost4=D3   (6)
若RDcost4小于预设值,则执行上述S1005,否则跳过上述S1005。
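式(5)与式(6)的两级开关可以合并为如下草图。其中各CTU的失真差列表与预设值threshold均为假设的组织方式,仅用于说明判定顺序:

```python
def luma_cnnlf_switches(d_net3, d_rec3, num_ctu, lam3, ctu_dists, threshold=0.0):
    """亮度分量的帧级开关(式(5))与CTU级开关(式(6))草图。
    ctu_dists为各CTU上滤波前后损失之差的列表(假设的组织方式)。"""
    frame_on = (d_net3 - d_rec3) + lam3 * num_ctu < 0   # 式(5):RDcost3为负则打开
    if not frame_on:
        return False, []
    ctu_on = [d < threshold for d in ctu_dists]         # 式(6):逐CTU判定是否滤波
    return True, ctu_on
```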
本申请实施例提供的视频编码方法,若当前块包括亮度分量时,编码端通过获取当前块在亮度分量下的目标重建块,该目标重建块为待输入目标CNNLF模型的重建块;确定使用预设的亮度分量下的N个CNNLF模型分别对目标重建块进行滤波时,亮度分量下的N个CNNLF模型分别对应的率失真代价;将目标重建块输入预设的神经网络模型中,得到神经网络模型输出的亮度分量下的N个CNNLF模型分别对应的选中概率;根据亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率,从亮度分量下的N个CNNLF模型中选出目标CNNLF模型;使用亮度分量下的目标CNNLF模型对亮度分量下的目标重建块进行滤波。即本申请的亮度分量下的目标CNNLF模型是根据预设的亮度分量下的N个CNNLF模型分别对应的率失真代价和选中概率确定的,提高了亮度分量下的目标CNNLF模型的选择准确性,这样基于准确选择的亮度分量下的目标CNNLF模型进行滤波时,可以提高亮度分量的滤波效果,进而提升亮度分量的编码性能。
应理解,图5至图18仅为本申请的示例,不应理解为对本申请的限制。
以上结合附图详细描述了本申请的优选实施方式,但是,本申请并不限于上述实施方式中的具体细节,在本申请的技术构思范围内,可以对本申请的技术方案进行多种简单变型,这些简单变型均属于本申请的保护范围。例如,在上述具体实施方式中所描述的各个具体技术特征,在不矛盾的情况下,可以通过任何合适的方式进行组合,为了避免不必要的重复,本申请对各种可能的组合方式不再另行说明。又例如,本申请的各种不同的实施方式之间也可以进行任意组合,只要其不违背本申请的思想,其同样应当视为本申请所公开的内容。
还应理解,在本申请的各种方法实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。另外,本申请实施例中,术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系。具体地,A和/或B可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
上文结合图5至图18,详细描述了本申请的方法实施例,下文结合图19至图21,详细描述本申请的装置实施例。
图19是本申请一实施例提供的视频解码器的示意性框图。
如图19所示,视频解码器10包括:
解码单元11,用于解码码流,得到当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
确定单元12,用于确定所述目标CNNLF模型,所述目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,所述选中概率是通过神经网络模型基于所述目标重建块预测得到的,所述N为正整数;
滤波单元13,用于使用所述目标CNNLF模型对所述目标重建块进行滤波。
在一些实施例中,解码单元11,还用于解码所述码流,得到第一标志,所述第一标志用于指示所述目标CNNLF模型是否为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;
确定单元12,具体用于根据所述第一标志,确定所述目标CNNLF模型。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第一数值时,则将所述目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;将所述N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述N个CNNLF模型中最小率失真代价对应的CNNLF模型;将所述N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
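解码端依据第一标志确定目标CNNLF模型的流程可示意为如下Python草图。其中bs的读取接口与nn_model.predict推理接口均为虚构,第一数值、第二数值的取值仅为假设:

```python
FIRST_VALUE, SECOND_VALUE = 1, 0   # 示意取值,正文未作限定

def decode_target_model_index(bs, nn_model, target_rec_block):
    """解码端根据第一标志确定目标CNNLF模型索引的草图。"""
    first_flag = bs.read_flag()
    if first_flag == FIRST_VALUE:
        probs = nn_model.predict(target_rec_block)   # N个CNNLF模型的选中概率
        return max(range(len(probs)), key=lambda i: probs[i])
    return bs.read_index()                           # 第二数值:直接解析模型索引
```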
在一些实施例中,若所述目标重建块为所述当前块在色度分量下的目标重建块时,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第一数值时,则将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;将所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
在一些实施例中,确定单元12,具体用于获取所述色度分量对应的神经网络模型;将所述色度分量下的目标重建块输入所述色度分量对应的神经网络模型中,得到所述色度分量对应的神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型;将所述色度分量下的N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
在一些实施例中,若所述目标重建块为所述当前块在亮度分量下的目标重建块时,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第一数值时,则将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;将所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
在一些实施例中,确定单元12,具体用于获取所述亮度分量对应的神经网络模型;将所述亮度分量下的目标重建块输入所述亮度分量对应的神经网络模型中,得到所述亮度分量对应的神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,确定单元12,具体用于若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型;将所述亮度分量下的N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
在一些实施例中,确定单元12,还用于获取第二标志,所述第二标志用于指示当前序列是否允许使用率失真代价与所述神经网络模型相结合的方式来确定所述目标CNNLF模型;若所述第二标志指示所述当前序列允许使用率失真代价与所述神经网络模型相结合的方式确定所述目标CNNLF模型时,则确定目标CNNLF模型。
在一些实施例中,所述神经网络模型包括K层卷积层和L层全连接层,所述K、L均为正整数。
在一些实施例中,所述K层卷积层中的至少一个卷积层之后连接有下采样单元,所述下采样单元用于对所述卷积层输出的特征图进行下采样。
在一些实施例中,所述L层全连接层中的至少一个全连接层之后连接有激活函数。
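上述“K层卷积层(可接下采样单元)+ L层全连接层(可接激活函数)”的结构,可用如下PyTorch风格的Python草图示意。通道数、K、L、N等超参数均为假设取值,并非本申请的限定实现:

```python
import torch
import torch.nn as nn

class ModelSelectNet(nn.Module):
    """K层卷积(每层后接平均池化作为下采样单元)加L层全连接(带激活函数),
    输出N个CNNLF模型的选中概率(示意结构,超参数为假设值)。"""
    def __init__(self, in_ch=1, K=3, L=2, N=4, width=32, feat=64):
        super().__init__()
        convs, ch = [], in_ch
        for _ in range(K):
            convs += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(),
                      nn.AvgPool2d(2)]              # 下采样单元的一种示意实现
            ch = width
        self.convs = nn.Sequential(*convs)
        self.pool = nn.AdaptiveAvgPool2d(1)         # 使全连接层与输入块尺寸解耦
        fcs, dim = [], width
        for _ in range(L - 1):
            fcs += [nn.Linear(dim, feat), nn.ReLU()]   # 全连接层之后连接激活函数
            dim = feat
        fcs += [nn.Linear(dim, N)]
        self.fcs = nn.Sequential(*fcs)

    def forward(self, x):
        h = self.pool(self.convs(x)).flatten(1)
        return torch.softmax(self.fcs(h), dim=1)    # N个模型的选中概率,按行归一化
```

例如,以64×64的亮度重建块为输入,ModelSelectNet()(torch.randn(1, 1, 64, 64))将输出形状为(1, 4)的选中概率分布。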
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图19所示的视频解码器10可以执行本申请实施例的解码方法,并且视频解码器10中的各个单元的前述和其它操作和/或功能分别为了实现上述解码方法等各个方法中的相应流程,为了简洁,在此不再赘述。
图20是本申请一实施例提供的视频编码器的示意性框图。
如图20所示,该视频编码器20可以包括:
编码单元21,用于获取当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
代价确定单元22,用于确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,所述N为正整数;
选中概率确定单元23,用于将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率;
模型确定单元24,用于根据所述N个CNNLF模型分别对应的率失真代价和选中概率,从所述N个CNNLF模型中选出目标CNNLF模型;
滤波单元25,用于使用所述目标CNNLF模型对所述目标重建块进行滤波。
在一些实施例中,模型确定单元24,具体用于根据所述N个CNNLF模型分别对应的率失真代价,从所述N个CNNLF模型中选出率失真代价最小的第一CNNLF模型;根据所述N个CNNLF模型分别对应的选中概率,从所述N个CNNLF模型中选出选中概率最大的第二CNNLF模型;根据所述第一CNNLF模型和所述第二CNNLF模型,确定所述目标CNNLF模型。
在一些实施例中,模型确定单元24,具体用于若所述第一CNNLF模型与所述第二CNNLF模型相同,则将所述第一CNNLF模型或所述第二CNNLF模型确定为所述目标CNNLF模型;若所述第一CNNLF模型与所述第二CNNLF模型不相同,则将所述第一CNNLF模型确定为所述目标CNNLF模型。
在一些实施例中,编码单元21,还用于在码流中写入第一标志,所述第一标志用于指示所述目标CNNLF模型是否为所述N个CNNLF模型中最大选中概率对应的CNNLF模型。
在一种实施例中,若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,若所述第一标志的取值为所述第二数值时,则编码单元21,还用于在所述码流中写入所述目标CNNLF模型的索引。
在一些实施例中,若所述目标重建块为所述当前块在色度分量下的目标重建块,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述代价确定单元22,具体用于确定使用所述色度分量下的N个CNNLF模型分别对所述色度分量下的目标重建块进行滤波时,所述色度分量下的N个CNNLF模型分别对应的率失真代价;将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,代价确定单元22,具体用于获取所述色度分量对应的神经网络模型;将所述色度分量下的目标重建块输入所述色度分量对应的神经网络模型中,得到所述色度分量对应的神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,若所述目标重建块为所述当前块在色度分量下的目标重建块时,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在一些实施例中,若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,若所述目标重建块为所述当前块在亮度分量下的目标重建块,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述代价确定单元22,具体用于确定使用所述亮度分量下的N个CNNLF模型分别对所述亮度分量下的目标重建块进行滤波时,所述亮度分量下的N个CNNLF模型分别对应的率失真代价;将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,代价确定单元22,具体用于获取所述亮度分量对应的神经网络模型;将所述亮度分量下的目标重建块输入所述亮度分量对应的神经网络模型中,得到所述亮度分量对应的神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
在一些实施例中,若所述目标重建块为所述当前块在亮度分量下的目标重建块时,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
在一些实施例中,若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
在一些实施例中,代价确定单元22,还用于获取第二标志,所述第二标志用于指示当前序列是否允许使用率失真代价与所述神经网络模型相结合的方式来确定所述目标CNNLF模型;若所述第二标志指示所述当前序列允许使用率失真代价与所述神经网络模型相结合的方式确定所述目标CNNLF模型时,则确定使用所述N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价。
在一些实施例中,所述神经网络模型包括K层卷积层和L层全连接层,所述K、L均为正整数。
在一些实施例中,所述K层卷积层中的至少一个卷积层之后连接有下采样单元,所述下采样单元用于对所述卷积层输出的特征图进行下采样。
在一些实施例中,所述L层全连接层中的至少一个全连接层之后连接有激活函数。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图20所示的视频编码器20可以对应于执行本申请实施例的编码方法中的相应主体,并且视频编码器20中的各个单元的前述和其它操作和/或功能分别为了实现编码方法等各个方法中的相应流程,为了简洁,在此不再赘述。
上文中结合附图从功能单元的角度描述了本申请实施例的装置和系统。应理解,该功能单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件单元组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件单元组合执行完成。可选地,软件单元可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图21是本申请实施例提供的电子设备的示意性框图。
如图21所示,该电子设备30可以为本申请实施例所述的视频编码器,或者视频解码器,该电子设备30可包括:
存储器33和处理器32,该存储器33用于存储计算机程序34,并将该计算机程序34传输给该处理器32。换言之,该处理器32可以从存储器33中调用并运行计算机程序34,以实现本申请实施例中的方法。
例如,该处理器32可用于根据该计算机程序34中的指令执行上述方法200中的步骤。
在本申请的一些实施例中,该处理器32可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器33包括但不限于:
易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
在本申请的一些实施例中,该计算机程序34可以被分割成一个或多个单元,该一个或者多个单元被存储在该存储器33中,并由该处理器32执行,以完成本申请提供的方法。该一个或多个单元可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述该计算机程序34在该电子设备30中的执行过程。
如图21所示,该电子设备30还可包括:
收发器33,该收发器33可连接至该处理器32或存储器33。
其中,处理器32可以控制该收发器33与其他设备进行通信,具体地,可以向其他设备发送信息或数据,或接收其他设备发送的信息或数据。收发器33可以包括发射机和接收机。收发器33还可以进一步包括天线,天线的数量可以为一个或多个。
应当理解,该电子设备30中的各个组件通过总线系统相连,其中,总线系统除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。
图22是本申请实施例提供的视频编解码系统的示意性框图。
如图22所示,该视频编解码系统40可包括:视频编码器41和视频解码器42,其中视频编码器41用于执行本申请实施例涉及的视频编码方法,视频解码器42用于执行本申请实施例涉及的视频解码方法。
本申请还提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被计算机执行时使得该计算机能够执行上述方法实施例的方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。
本申请还提供了一种码流,该码流是根据上述编码方法生成的,可选的,该码流中包括上述第一标志,或者包括第一标志和第二标志。
当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地产生按照本申请实施例该的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。例如,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以该权利要求的保护范围为准。

Claims (41)

  1. 一种视频解码方法,其特征在于,包括:
    解码码流,得到当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
    确定所述目标CNNLF模型,所述目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,所述选中概率是通过神经网络模型基于所述目标重建块预测得到的,所述N为正整数;
    使用所述目标CNNLF模型对所述目标重建块进行滤波。
  2. 根据权利要求1所述的方法,其特征在于,所述确定目标CNNLF模型,包括:
    解码所述码流,得到第一标志,所述第一标志用于指示所述目标CNNLF模型是否为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;
    根据所述第一标志,确定所述目标CNNLF模型。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第一数值时,则将所述目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;
    将所述N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
  4. 根据权利要求2所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述N个CNNLF模型中最小率失真代价对应的CNNLF模型;
    将所述N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
  5. 根据权利要求2-4任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在色度分量下的目标重建块时,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第一数值时,则将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;
    将所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
  7. 根据权利要求6所述的方法,其特征在于,所述将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率,包括:
    获取所述色度分量对应的神经网络模型;
    将所述色度分量下的目标重建块输入所述色度分量对应的神经网络模型中,得到所述色度分量对应的神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
  8. 根据权利要求5所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型;
    将所述色度分量下的N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
  9. 根据权利要求2-4任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在亮度分量下的目标重建块时,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
  10. 根据权利要求9所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第一数值时,则将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率,所述第一数值用于指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;
    将所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型,确定为所述目标CNNLF模型。
  11. 根据权利要求10所述的方法,其特征在于,所述将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率,包括:
    获取所述亮度分量对应的神经网络模型;
    将所述亮度分量下的目标重建块输入所述亮度分量对应的神经网络模型中,得到所述亮度分量对应的神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
  12. 根据权利要求9所述的方法,其特征在于,所述根据所述第一标志,确定所述目标CNNLF模型,包括:
    若所述第一标志的取值为第二数值时,则解码所述码流,得到所述目标CNNLF模型的索引,所述第二数值用于指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型;
    将所述亮度分量下的N个CNNLF模型中所述目标CNNLF模型的索引对应的CNNLF模型,确定为所述目标CNNLF模型。
  13. 根据权利要求1-4任一项所述的方法,其特征在于,所述方法还包括:
    获取第二标志,所述第二标志用于指示当前序列是否允许使用率失真代价与所述神经网络模型相结合的方式来确定所述目标CNNLF模型;
    所述确定目标CNNLF模型,包括:
    若所述第二标志指示所述当前序列允许使用率失真代价与所述神经网络模型相结合的方式确定所述目标CNNLF模型时,则确定目标CNNLF模型。
  14. 根据权利要求1-4任一项所述的方法,其特征在于,所述神经网络模型包括K层卷积层和L层全连接层,所述K、L均为正整数。
  15. 根据权利要求14所述的方法,其特征在于,所述K层卷积层中的至少一个卷积层之后连接有下采样单元,所述下采样单元用于对所述卷积层输出的特征图进行下采样。
  16. 根据权利要求14所述的方法,其特征在于,所述L层全连接层中的至少一个全连接层之后连接有激活函数。
  17. 一种视频编码方法,其特征在于,包括:
    获取当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
    确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,所述N为正整数;
    将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率;
    根据所述N个CNNLF模型分别对应的率失真代价和选中概率,从所述N个CNNLF模型中选出目标CNNLF模型;
    使用所述目标CNNLF模型对所述目标重建块进行滤波。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述N个CNNLF模型分别对应的率失真代价和选中概率,从所述N个CNNLF模型中选出目标CNNLF模型,包括:
    根据所述N个CNNLF模型分别对应的率失真代价,从所述N个CNNLF模型中选出率失真代价最小的第一CNNLF模型;
    根据所述N个CNNLF模型分别对应的选中概率,从所述N个CNNLF模型中选出选中概率最大的第二CNNLF模型;
    根据所述第一CNNLF模型和所述第二CNNLF模型,确定所述目标CNNLF模型。
  19. 根据权利要求18所述的方法,其特征在于,所述根据所述第一CNNLF模型和所述第二CNNLF模型,确定所述目标CNNLF模型,包括:
    若所述第一CNNLF模型与所述第二CNNLF模型相同,则将所述第一CNNLF模型或所述第二CNNLF模型确定为所述目标CNNLF模型;
    若所述第一CNNLF模型与所述第二CNNLF模型不相同,则将所述第一CNNLF模型确定为所述目标CNNLF模型。
  20. 根据权利要求19所述的方法,其特征在于,所述方法还包括:
    在码流中写入第一标志,所述第一标志用于指示所述目标CNNLF模型是否为所述N个CNNLF模型中最大选中概率对应的CNNLF模型。
  21. 根据权利要求20所述的方法,其特征在于,
    若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述N个CNNLF模型中最大选中概率对应的CNNLF模型;
    若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述N个CNNLF模型中最小率失真代价对应的CNNLF模型。
  22. 根据权利要求21所述的方法,其特征在于,若所述第一标志的取值为所述第二数值时,则所述方法还包括:
    在所述码流中写入所述目标CNNLF模型的索引。
  23. 根据权利要求17-22任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在色度分量下的目标重建块,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,包括:
    确定使用所述色度分量下的N个CNNLF模型分别对所述色度分量下的目标重建块进行滤波时,所述色度分量下的N个CNNLF模型分别对应的率失真代价;
    所述将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率,包括:
    将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
  24. 根据权利要求23所述的方法,其特征在于,所述将所述色度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率,包括:
    获取所述色度分量对应的神经网络模型;
    将所述色度分量下的目标重建块输入所述色度分量对应的神经网络模型中,得到所述色度分量对应的神经网络模型输出的所述色度分量下的N个CNNLF模型分别对应的选中概率。
  25. 根据权利要求20-22任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在色度分量下的目标重建块时,则所述N个CNNLF模型为所述色度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
  26. 根据权利要求25所述的方法,其特征在于,
    若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;
    若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述色度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
  27. 根据权利要求17-22任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在亮度分量下的目标重建块,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,包括:
    确定使用所述亮度分量下的N个CNNLF模型分别对所述亮度分量下的目标重建块进行滤波时,所述亮度分量下的N个CNNLF模型分别对应的率失真代价;
    所述将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率,包括:
    将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
  28. 根据权利要求27所述的方法,其特征在于,所述将所述亮度分量下的目标重建块输入所述神经网络模型中,得到所述神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率,包括:
    获取所述亮度分量对应的神经网络模型;
    将所述亮度分量下的目标重建块输入所述亮度分量对应的神经网络模型中,得到所述亮度分量对应的神经网络模型输出的所述亮度分量下的N个CNNLF模型分别对应的选中概率。
  29. 根据权利要求20-22任一项所述的方法,其特征在于,若所述目标重建块为所述当前块在亮度分量下的目标重建块时,则所述N个CNNLF模型为所述亮度分量下的N个CNNLF模型,所述第一标志用于指示所述目标CNNLF模型是否为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型。
  30. 根据权利要求29所述的方法,其特征在于,
    若所述第一标志的取值为第一数值时,则指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最大选中概率对应的CNNLF模型;
    若所述第一标志的取值为第二数值时,则指示所述目标CNNLF模型为所述亮度分量下的N个CNNLF模型中最小率失真代价对应的CNNLF模型。
  31. 根据权利要求17-22任一项所述的方法,其特征在于,所述确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价之前,所述方法还包括:
    获取第二标志,所述第二标志用于指示当前序列是否允许使用率失真代价与所述神经网络模型相结合的方式来确定所述目标CNNLF模型;
    所述确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,包括:
    若所述第二标志指示所述当前序列允许使用率失真代价与所述神经网络模型相结合的方式确定所述目标CNNLF模型时,则确定使用所述N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价。
  32. 根据权利要求17-22任一项所述的方法,其特征在于,所述神经网络模型包括K层卷积层和L层全连接层,所述K、L均为正整数。
  33. 根据权利要求32所述的方法,其特征在于,所述K层卷积层中的至少一个卷积层之后连接有下采样单元,所述下采样单元用于对所述卷积层输出的特征图进行下采样。
  34. 根据权利要求32所述的方法,其特征在于,所述L层全连接层中的至少一个全连接层之后连接有激活函数。
  35. 一种视频解码器,其特征在于,包括:
    解码单元,用于解码码流,得到当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
    确定单元,用于确定所述目标CNNLF模型,所述目标CNNLF模型是根据预设的N个CNNLF模型分别对应的率失真代价和选中概率确定的,所述选中概率是通过神经网络模型基于所述目标重建块预测得到的,所述N为正整数;
    滤波单元,用于使用所述目标CNNLF模型对所述目标重建块进行滤波。
  36. 一种视频编码器,其特征在于,包括:
    编码单元,用于获取当前块的目标重建块,所述目标重建块为待输入目标基于残差神经网络的环路滤波器CNNLF模型的重建块;
    代价确定单元,用于确定使用预设的N个CNNLF模型分别对所述目标重建块进行滤波时,所述N个CNNLF模型分别对应的率失真代价,所述N为正整数;
    选中概率确定单元,用于将所述目标重建块输入预设的神经网络模型中,得到所述神经网络模型输出的所述N个CNNLF模型分别对应的选中概率;
    模型确定单元,用于根据所述N个CNNLF模型分别对应的率失真代价和选中概率,从所述N个CNNLF模型中选出目标CNNLF模型;
    滤波单元,用于使用所述目标CNNLF模型对所述目标重建块进行滤波。
  37. 一种视频解码器,其特征在于,包括处理器和存储器;
    所述存储器用于存储计算机程序;
    所述处理器用于调用并运行所述存储器中存储的计算机程序,以实现上述权利要求1至16任一项所述的方法。
  38. 一种视频编码器,其特征在于,包括处理器和存储器;
    所述存储器用于存储计算机程序;
    所述处理器用于调用并运行所述存储器中存储的计算机程序,以实现如上述权利要求17至34任一项所述的方法。
  39. 一种视频编解码系统,其特征在于,包括:
    根据权利要求37所述的视频编码器;
    以及根据权利要求38所述的视频解码器。
  40. 一种计算机可读存储介质,其特征在于,用于存储计算机程序;
    所述计算机程序使得计算机执行如上述权利要求1至16或17至34任一项所述的方法。
  41. 一种码流,其特征在于,所述码流是基于如上述权利要求17至34任一项所述的方法生成的。