WO2022257049A1 - Encoding and decoding method, code stream, encoder, decoder, and storage medium (编解码方法、码流、编码器、解码器以及存储介质) - Google Patents

Encoding and decoding method, code stream, encoder, decoder, and storage medium

Info

Publication number
WO2022257049A1
Authority
WO
WIPO (PCT)
Prior art keywords
network model
value
current block
identification information
syntax element
Prior art date
Application number
PCT/CN2021/099234
Other languages
English (en)
French (fr)
Inventor
戴震宇
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2021/099234 (WO2022257049A1)
Priority to CN202180098998.1A (CN117461315A)
Publication of WO2022257049A1
Priority to US18/534,485 (US20240107015A1)

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop

Definitions

  • the embodiments of the present application relate to the technical field of image processing, and in particular, to a codec method, a code stream, an encoder, a decoder, and a storage medium.
  • the traditional loop filter mainly includes a deblocking filter, a sample adaptive compensation filter, and an adaptive correction filter.
  • HPM-ModAI High Performance-Modular Artificial Intelligence Model
  • AVS3 third-generation audio and video coding standard
  • the residual neural network-based loop filter (hereinafter referred to as CNNLF) is used as the baseline scheme of the intelligent loop filter module, and is set between the sample adaptive compensation filter and the adaptive correction filter.
  • Embodiments of the present application provide an encoding and decoding method, a code stream, an encoder, a decoder, and a storage medium, which can not only reduce complexity, but also improve encoding performance, thereby improving encoding and decoding efficiency.
  • the embodiment of the present application provides a decoding method applied to a decoder, and the method includes:
  • when the first syntax element identification information indicates that the current block allows model selection using a preset selection network model, determine the preset selection network model of the current block, and determine the loop filter network model used by the current block according to the preset selection network model;
  • a loop filtering network model is used to filter the current block to obtain a reconstructed image block of the current block.
  • the embodiment of the present application provides an encoding method applied to an encoder, and the method includes:
  • when the first syntax element identification information indicates that the current block allows model selection using a preset selection network model, determine the preset selection network model of the current block, and determine the loop filter network model used by the current block according to the preset selection network model;
  • a loop filtering network model is used to filter the current block to obtain a reconstructed image block of the current block.
  • the embodiment of the present application provides a code stream, which is generated by bit coding according to the information to be encoded; wherein the information to be encoded includes at least one of the following: the value of the first syntax element identification information, the value of the second syntax element identification information, the value of the first luma syntax element identification information, the value of the second luma syntax element identification information, and the value of the chroma syntax element identification information;
  • the first syntax element identification information is used to indicate whether the current block is allowed to use the preset selection network model for model selection
  • the second syntax element identification information indicates whether the video sequence uses the loop filter network model for filtering processing
  • the first luma syntax element identification information is used to indicate whether the luminance component of the current frame is filtered using the luminance loop filtering network model;
  • the second luminance syntax element identification information is used to indicate whether the luminance component of the current block is filtered using the luminance loop filtering network model.
  • the chroma syntax element identification information is used to indicate whether the chroma component of the current frame is filtered using the chroma loop filter network model; the video sequence includes the current frame, and the current frame includes the current block.
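To make the hierarchy of these five syntax elements concrete, the sketch below models them as a plain data structure; the field names are illustrative inventions for this example, not the bitstream syntax names defined by the application.

```python
from dataclasses import dataclass

# Hypothetical field names mirroring the five syntax elements described above.
@dataclass
class LoopFilterFlags:
    model_adaptive_selection_enable_flag: int  # first syntax element (block level)
    nn_filter_enable_flag: int                 # second syntax element (sequence level)
    luma_frame_flag: int                       # first luma syntax element (frame level)
    luma_block_flag: int                       # second luma syntax element (block level)
    chroma_frame_flag: int                     # chroma syntax element (frame level)

flags = LoopFilterFlags(1, 1, 1, 0, 1)
# Block-level luma filtering applies only if both the frame-level and
# block-level luma flags are set.
luma_filtered = bool(flags.luma_frame_flag and flags.luma_block_flag)
```

Here the frame-level luma flag is set but the block-level one is not, so this block's luminance component would not be filtered.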
  • the embodiment of the present application provides an encoder, which includes a first determining unit, a first selecting unit, and a first filtering unit; wherein,
  • the first determining unit is configured to determine the value of the first syntax element identification information
  • the first selection unit is configured to determine the preset selection network model of the current block when the first syntax element identification information indicates that the current block allows model selection using the preset selection network model, and to determine, according to the preset selection network model, the loop filter network model used by the current block;
  • the first filtering unit is configured to use a loop filtering network model to filter the current block to obtain a reconstructed image block of the current block.
  • the embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
  • a first memory for storing a computer program capable of running on the first processor
  • the first processor is configured to execute the method of the second aspect when running the computer program.
  • the embodiment of the present application provides a decoder, which includes an analysis unit, a second selection unit, and a second filtering unit; wherein,
  • the parsing unit is configured to parse the code stream and determine the value of the first syntax element identification information
  • the second selection unit is configured to determine the preset selection network model of the current block when the first syntax element identification information indicates that the current block uses the preset selection network model for model selection, and to determine, according to the preset selection network model, the loop filter network model used by the current block;
  • the second filtering unit is configured to use the loop filtering network model to filter the current block to obtain a reconstructed image block of the current block.
  • the embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
  • a second memory for storing a computer program capable of running on the second processor
  • the second processor is configured to execute the method of the first aspect when running the computer program.
  • the embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed, the method of the first aspect or the method of the second aspect is implemented.
  • the embodiment of the present application provides a codec method, a code stream, an encoder, a decoder, and a storage medium.
  • the value of the first syntax element identification information is determined; when the first syntax element identification information indicates that the current block allows model selection using the preset selection network model, the preset selection network model of the current block is determined, and the loop filter network model used by the current block is determined according to the preset selection network model; the loop filter network model is then used to filter the current block to obtain the reconstructed image block of the current block.
  • in this way, the preset selection network model is used to select among at least one candidate loop filter network model, and the current block is then filtered according to the selected loop filter network model; this not only improves encoding performance and thereby encoding and decoding efficiency, but also makes the final output reconstructed image block closer to the original image block, improving video image quality.
  • Fig. 1 is an application schematic diagram of a coding framework provided by related technologies
  • Fig. 2 is an application schematic diagram of another encoding framework provided by related technologies
  • FIG. 3A is a schematic diagram of a detailed framework of a video coding system provided by an embodiment of the present application.
  • FIG. 3B is a detailed schematic diagram of a video decoding system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a decoding method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the application of a coding framework provided by the embodiment of the present application.
  • FIG. 6A is a schematic diagram of the network structure composition of a luma loop filter network model provided by the embodiment of the present application.
  • FIG. 6B is a schematic diagram of the network structure composition of a chroma loop filter network model provided by the embodiment of the present application.
  • FIG. 6C is a schematic diagram of the network structure composition of another luma loop filter network model provided by the embodiment of the present application.
  • FIG. 6D is a schematic diagram of the network structure composition of another chroma loop filter network model provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a network structure composition of a residual block provided by an embodiment of the present application.
  • FIG. 8A is a schematic diagram of the composition and structure of a preset selection network model provided by the embodiment of the present application.
  • FIG. 8B is a schematic diagram of the composition and structure of another preset selection network model provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of the overall framework of a network model based on preset selection provided by the embodiment of the present application.
  • FIG. 10 is a schematic flowchart of another decoding method provided by the embodiment of the present application.
  • FIG. 11 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the composition and structure of an encoder provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a specific hardware structure of an encoder provided in an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a decoder provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a specific hardware structure of a decoder provided by an embodiment of the present application.
  • references to "some embodiments" describe a subset of all possible embodiments; it should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other when there is no conflict.
  • "first/second/third" in the embodiments of the present application is only used to distinguish similar objects, and does not represent a specific ordering of objects. Understandably, the specific order or sequence of "first/second/third" may be interchanged where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
  • the first image component, the second image component, and the third image component are generally used to represent the coding block (Coding Block, CB); these three image components are respectively a luminance component, a blue chrominance component, and a red chrominance component. Specifically, the luminance component is usually represented by the symbol Y, the blue chrominance component by the symbol Cb or U, and the red chrominance component by the symbol Cr or V; thus, the video image can be expressed in the YCbCr format or in the YUV format.
  • JVET Joint Video Experts Team
  • VVC Versatile Video Coding
  • VTM VVC Test Model, the reference software test platform of VVC
  • AVS Audio Video coding Standard
  • HPM High-Performance Model
  • HPM-ModAI High Performance-Modular Artificial Intelligence Model
  • DBF Deblocking Filter
  • SAO Sample Adaptive Offset
  • ALF Adaptive Loop Filter (adaptive correction filter)
  • QP Quantization Parameter
  • the digital video compression technology mainly compresses huge digital image and video data, so as to facilitate transmission and storage.
  • although the existing digital video compression standards can already save a lot of video data, it is still necessary to pursue better digital video compression techniques to reduce the bit rate of digital video.
  • in the process of digital video encoding, the encoder reads unequal numbers of pixels, including luminance and chrominance components, for original video sequences in different color formats; that is, the encoder reads a black-and-white or color image, then divides the image into blocks and passes the block data to the encoder for encoding.
  • the encoder usually adopts a hybrid coding framework, which generally includes operations such as intra prediction and inter prediction, transform/quantization, inverse quantization/inverse transform, loop filtering, and entropy coding; the specific processing flow is shown in FIG. 1.
  • intra-frame prediction only refers to the information of the same frame image, and predicts the pixel information in the current divided block to eliminate spatial redundancy
  • inter-frame prediction can include motion estimation and motion compensation, which can refer to the image information of different frames; motion estimation searches for the motion vector information that best matches the current divided block, and is used to eliminate temporal redundancy
  • transformation converts the predicted image block into the frequency domain and redistributes its energy; combined with quantization, it can remove information to which the human eye is not sensitive, and is used to eliminate visual redundancy
  • entropy coding can eliminate character redundancy according to the current context model and the probability information of the binary code stream
  • loop filtering mainly processes the inversely transformed and inversely quantized pixels to compensate for distortion information and provide a better reference for subsequently encoded pixels.
  • the traditional loop filtering module mainly includes a deblocking filter (hereinafter abbreviated as DBF), a sample adaptive compensation filter (hereinafter abbreviated as SAO), and an adaptive correction filter (hereinafter abbreviated as ALF).
  • DBF deblocking filter
  • SAO sample adaptive compensation filter
  • ALF adaptive correction filter
  • CNNLF residual neural network-based loop filter
  • in HPM-ModAI, the QP range can be divided into 4 intervals (27~31, 32~37, 38~44, 45~50), and 4 I-frame luminance component models, 4 non-I-frame luminance component models, 4 chroma U component models, and 4 chroma V component models are provided, for a total of 16 different CNNLF models.
  • one of the 16 CNNLF models can be selected for use according to different color components and QP.
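As a hedged illustration of this lookup, the sketch below maps a color-component category and a QP value to one of the 16 model indices using the four QP intervals listed above; the function name, category labels, and index layout are assumptions for illustration, not the reference software's actual implementation.

```python
# Four QP intervals from the description above, and four component categories
# with four models each (16 CNNLF models in total).
QP_INTERVALS = [(27, 31), (32, 37), (38, 44), (45, 50)]
COMPONENTS = ["I_luma", "nonI_luma", "chroma_U", "chroma_V"]

def cnnlf_model_index(component: str, qp: int) -> int:
    """Return an index into the hypothetical 16-model set."""
    comp_idx = COMPONENTS.index(component)
    for qp_idx, (lo, hi) in enumerate(QP_INTERVALS):
        if lo <= qp <= hi:
            return comp_idx * 4 + qp_idx
    raise ValueError(f"QP {qp} outside the supported range 27-50")

print(cnnlf_model_index("chroma_U", 35))  # → 9 (third component, second interval)
```

Note that this fixed mapping is exactly what the next paragraph identifies as suboptimal: when the per-frame QP drifts from the initial QP, the interval lookup may pick a model that is not the best filter for that frame.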
  • the QP of each frame will fluctuate compared with the initial QP during encoding, resulting in the selected CNNLF model not necessarily being the model with the best filtering effect for the frame.
  • An embodiment of the present application provides an encoding method.
  • determine the value of the first syntax element identification information; when the first syntax element identification information indicates that the current block allows model selection using a preset selection network model, determine the preset selection network model of the current block, and determine the loop filter network model used by the current block according to the preset selection network model; filter the current block with the loop filter network model to obtain the reconstructed image block of the current block.
  • the embodiment of the present application also provides a decoding method.
  • the code stream is parsed to determine the value of the first syntax element identification information; when the first syntax element identification information indicates that the current block is allowed to use the preset selection network model for model selection, the preset selection network model of the current block is determined, and the loop filter network model used by the current block is determined according to the preset selection network model; the loop filter network model is used to filter the current block to obtain the reconstructed image block of the current block.
  • in this way, the preset selection network model is used to select among at least one candidate loop filter network model, and the current block is then filtered according to the selected loop filter network model; this not only improves encoding performance and thereby encoding and decoding efficiency, but also makes the final output reconstructed image block closer to the original image block, improving video image quality.
  • the video coding system 10 includes a transform and quantization unit 101, an intra frame estimation unit 102, an intra frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded image buffer unit 110, and the like; the filtering unit 108 can implement DBF filtering/SAO filtering/ALF filtering, and the encoding unit 109 can implement header information encoding and context-based adaptive binary arithmetic coding (Context-based Adaptive Binary Arithmetic Coding, CABAC).
  • a video coding block can be obtained by dividing the coding tree unit (Coding Tree Unit, CTU), and the residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the video coding block, including transforming the residual information from the pixel domain to the transform domain, and quantizes the obtained transform coefficients to further reduce the bit rate;
  • the intra frame estimation unit 102 and the intra frame prediction unit 103 are used to perform intra-frame prediction on the video coding block; specifically, the intra frame estimation unit 102 and the intra frame prediction unit 103 are used to determine the intra-frame prediction mode to be used to encode the video coding block;
  • the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-frame predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames, so as to provide temporal prediction information;
  • the motion estimation performed by the motion estimation unit 105 is the process used to generate motion vectors, and the motion vectors can estimate the motion of the video
  • the context content can be based on adjacent coding blocks, and can be used to encode the information indicating the determined intra-frame prediction mode and output the code stream of the video signal; the decoded image buffer unit 110 is used to store the reconstructed video coding blocks for prediction reference. As the video image encoding progresses, new reconstructed video coding blocks will be continuously generated, and these reconstructed video coding blocks will be stored in the decoded image buffer unit 110.
  • the video decoding system 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, and a decoded image buffer unit 206, etc., wherein the decoding unit 201 can implement header information decoding and CABAC decoding, and filtering unit 205 can implement DBF filtering/SAO filtering/ALF filtering.
  • the code stream of the video signal is output; the code stream is input into the video decoding system 20, and first passes through the decoding unit 201 to obtain decoded transform coefficients; the transform coefficients are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain; the intra prediction unit 203 is operable to generate prediction data for the video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture;
  • the motion compensation unit 204 determines the prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to generate a predictive block of the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204; the video quality of the decoded video signal can be improved by the filtering unit 205 in order to remove block artifacts; the decoded video blocks are then stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for the output of the video signal, that is, the restored original video signal is obtained.
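The decoder stage order described above can be sketched as a toy pipeline; every function here is an illustrative stand-in (a real decoder's entropy decoding, inverse transform/quantization, prediction, and loop filtering are far more complex), and the data is a list of pixel values rather than an actual block.

```python
def entropy_decode(block):
    return block["coeffs"]                      # decoding unit 201

def inverse_quantize_transform(coeffs):
    return [c * 2 for c in coeffs]              # inverse transform/quant 202 (toy)

def decode_block(block, predictor, loop_filter):
    residual = inverse_quantize_transform(entropy_decode(block))
    prediction = predictor(block)               # intra 203 / motion comp. 204
    recon = [r + p for r, p in zip(residual, prediction)]
    return loop_filter(recon)                   # filtering unit 205

# Identity loop filter and a constant predictor, for illustration only.
out = decode_block({"coeffs": [1, 2]}, lambda b: [10, 10], lambda x: x)
```

The point of the sketch is the ordering: the residual and the prediction are combined first, and the loop filter (where this application's model selection sits) operates on the reconstruction before it reaches the decoded picture buffer.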
  • the method provided in the embodiment of the present application can be applied to the filtering unit 108 shown in FIG. 3A (indicated by a bold black box), and can also be applied to the filtering unit 205 shown in FIG. 3B (indicated by a bold black box). That is to say, the method in the embodiment of the present application can be applied not only to a video encoding system (referred to as an "encoder" for short), but also to a video decoding system (referred to as a "decoder" for short), and can even be applied to both a video encoding system and a video decoding system; this is not limited here.
  • the "current block” specifically refers to the block currently to be encoded in the video image (also referred to as “coding block” for short); when the embodiment of the present application is applied For a decoder, the “current block” specifically refers to a block currently to be decoded in a video image (may also be referred to simply as a "decoded block”).
  • FIG. 4 shows a schematic flowchart of a decoding method provided in an embodiment of the present application.
  • the method may include:
  • S401 Parse the code stream, and determine a value of the first syntax element identification information.
  • each decoding block may include the first image component, the second image component, and the third image component; the current block is the decoding block in the video image on which loop filtering of the first image component, the second image component, or the third image component is currently to be performed.
  • the embodiment of the present application can divide them into two types of color components such as luminance component and chrominance component.
  • if the current block performs operations such as luminance component prediction, inverse transform and inverse quantization, and loop filtering, the current block can also be called a luma block; or, if the current block performs operations such as chrominance component prediction, inverse transform and inverse quantization, and loop filtering, the current block may also be called a chroma block.
  • the embodiment of the present application specifically provides a loop filtering method, in particular a model adaptive selection method based on deep learning, which is applied to the filtering unit 205 shown in FIG. 3B.
  • the filtering unit 205 may include a deblocking filter (DBF), a sample adaptive compensation filter (SAO), a residual neural network-based loop filter (CNNLF), and an adaptive correction filter (ALF).
  • DBF deblocking filter
  • SAO sample adaptive compensation filter
  • CNNLF residual neural network-based loop filter
  • ALF adaptive correction filter
  • the embodiment of the present application proposes a model adaptive selection module based on deep learning, which is used to adaptively select a loop filter network model (such as a CNNLF model) to improve coding performance.
  • the loop filter can also include a model adaptive selection module (Model Adaptive Selection, MAS), and the model adaptive selection module is located between SAO filtering and CNNLF filtering.
  • the use of the model adaptive selection module does not depend on the switches of DBF, SAO, CNNLF and ALF, but is placed before CNNLF in position.
  • the model adaptive selection module can be regarded as a preset selection network model composed of a multi-layer convolutional neural network and a multi-layer fully connected neural network; using the preset selection network model, an appropriate model can be selected for CNNLF filtering.
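Since the actual selection network is a multi-layer convolutional plus fully connected model whose weights are not given here, a faithful implementation is out of scope; the toy sketch below illustrates only the final decision step, picking the candidate CNNLF model whose score (as such a network would output) is highest. The function name is an assumption for illustration.

```python
# Toy stand-in for the model adaptive selection module's decision step:
# scores[i] plays the role of the network's confidence for candidate model i.
def select_cnnlf_model(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

print(select_cnnlf_model([0.1, 0.7, 0.2]))  # → 1
```

In the described scheme this choice replaces the fixed QP-interval lookup, letting the filter used per block track the content rather than only the initial QP.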
  • a first syntax element identification information can be set, and whether the current block allows model selection using the preset selection network model can then be determined according to the value of the first syntax element identification information obtained through decoding.
  • the method may also include:
  • the value of the first syntax element identification information is the first value, it is determined that the first syntax element identification information indicates that the current block allows model selection using a preset selection network model; or,
  • the value of the first syntax element identification information is the second value, it is determined that the first syntax element identification information indicates that the current block is not allowed to use the preset selection network model for model selection.
  • the first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
  • the first syntax element identification information may be a parameter written in a profile (profile), or a value of a flag (flag), which is not limited in this embodiment of the present application.
  • the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
  • for a flag, generally, the first value may be 1 and the second value may be 0, but this is not limited.
  • the preset selection network model can be regarded as a neural network
  • the identification information of the first syntax element can be regarded as an enabling flag for model adaptive selection based on a neural network, which can be represented by model_adaptive_selection_enable_flag here.
  • model_adaptive_selection_enable_flag can be used to indicate whether the current block allows model selection using the preset selection network model.
  • since the preset selection network model performs model selection among multiple candidate loop filter network models, before determining whether the current block is allowed to use the preset selection network model for model selection, it is first necessary to determine whether the current block uses the loop filtering network model for filtering processing.
  • in this way, if the current block uses the loop filtering network model for filtering processing, the preset selection network model can be used for model selection; otherwise, if the current block does not use the loop filtering network model for filtering processing, there is no need to use the preset selection network model for model selection.
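The gating described above can be summarized as a small predicate: model selection is consulted only when the block is filtered with the loop filter network model at all, and then only if the enable flag permits it. The argument names are illustrative, not actual syntax element names.

```python
def use_model_selection(block_uses_nn_filter: bool,
                        model_adaptive_selection_enable_flag: bool) -> bool:
    # No NN-based loop filtering for this block -> no model selection needed.
    if not block_uses_nn_filter:
        return False
    # Otherwise the first syntax element decides whether selection is allowed.
    return model_adaptive_selection_enable_flag
```

For example, a block filtered only by DBF/SAO/ALF never invokes the preset selection network model, regardless of the enable flag.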
  • a sequence header identification information can be set, for example, a second syntax element identification information, which is used to indicate whether the current video sequence uses the loop filtering network model for filtering processing.
  • the method may further include: parsing the code stream, and determining the value of the second syntax element identification information; wherein, the second syntax element identification information is used to indicate whether the video sequence is filtered using a loop filtering network model .
  • the method may also include:
  • the value of the second syntax element identification information is the first value, it is determined that the second syntax element identification information indicates that the video sequence uses a loop filtering network model for filtering processing; or,
  • the value of the second syntax element identification information is the second value, it is determined that the second syntax element identification information indicates that the video sequence does not use the loop filtering network model for filtering processing.
  • first value and the second value are different.
  • the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
  • the embodiment of the present application does not make any limitation on this.
  • the video sequence includes at least one frame, and the at least one frame may include the current frame.
  • the embodiment of the present application needs to further determine whether the current frame in the video sequence uses the loop filtering network model for filtering processing; that is, it is also necessary to set a third syntax element identification information.
  • the method may further include: parsing the code stream, and determining the value of the third syntax element identification information; wherein the third syntax element identification information is used to indicate whether the current frame in the video sequence uses the loop filtering network model for filtering processing.
  • the method may further include:
  • the code stream is parsed to determine a value of the third syntax element identification information.
  • the parsing the code stream to determine the value of the third syntax element identification information may include:
  • the third syntax element identification information may be referred to as the first luminance syntax element identification information at this time, which is used to indicate whether the luminance component of the current frame is filtered using the luminance loop filtering network model.
  • the third syntax element identification information at this time may be called chroma syntax element identification information, which is used to indicate whether the chroma components of the current frame are filtered using the chroma loop filter network model.
  • the current frame may be divided into blocks, and the at least one block may include the current block.
  • even if the current frame uses the loop filtering network model for filtering processing, this does not mean that every block in the current frame uses it; CTU-level syntax element identification information may also be involved to further determine whether the current block uses the loop filtering network model for filtering processing.
  • the two types of color components, the luma component and the chrominance component, will be described below as examples.
  • the parsing the bitstream to determine the value of the first syntax element identification information may include:
  • if the first luminance syntax element identification information indicates that the luminance component of the current frame is filtered using the luminance loop filter network model, parse the code stream and determine the value of the second luminance syntax element identification information;
  • the step of parsing the code stream and determining the value of the first syntax element identification information is performed.
  • the frame-level syntax element may be referred to as the first luma syntax element identification information, represented by luma_frame_flag; the CTU-level syntax element may be referred to as the second luma syntax element identification information, represented by luma_ctu_flag.
  • the embodiment of the present application can also set a luminance frame-level switch and a luminance CTU-level switch, which determine whether the luminance loop filter network model is used for filtering by controlling whether the loop filter network model of the luminance component is enabled. Therefore, in some embodiments, the method may further include: setting a luma frame-level switch and a luma CTU-level switch, where the current block is located in the current frame. The luma frame-level switch can be used to control whether the luminance component of the current frame is filtered using the luminance loop filtering network model, and the luma CTU-level switch can be used to control whether the luminance component of the current block is filtered using the luminance loop filtering network model.
  • the method may further include:
  • the value of the first luminance syntax element identification information is the first value, it is determined that the first luminance syntax element identification information indicates that the luminance component of the current frame is filtered using the luminance loop filtering network model; or,
  • the value of the first luminance syntax element identification information is the second value, it is determined that the first luminance syntax element identification information indicates that the luminance component of the current frame does not use the luminance loop filter network model for filtering processing.
  • the method may also include:
  • the value of the first luma syntax element identification information is the first value, turn on the luma frame-level switch; or,
  • if the value of the first luma syntax element identification information is the second value, turn off the luma frame-level switch.
  • the first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
  • the value of the first luma syntax element identification information may be a parameter written in a profile, or the value of a flag, which is not limited in this embodiment of the present application.
  • the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
  • the embodiment of the present application does not make any limitation on this.
  • if the value of the first luma syntax element identification information is 1, the luma frame-level switch can be turned on, that is, the frame-level loop filter network model is invoked; at this time it can be determined that the luminance component of the current frame is filtered using the luminance loop filter network model. Otherwise, if the value of the first luma syntax element identification information is 0, the luma frame-level switch can be turned off, that is, the frame-level loop filtering network model is not invoked, and it can be determined that the luminance component of the current frame does not use the luminance loop filtering network model for filtering processing. At this time, the next frame can be obtained from the video sequence and determined as the current frame, and the step of parsing the code stream and determining the value of the first luma syntax element identification information is then continued.
  • the method may further include:
  • the value of the second luminance syntax element identification information is the first value, it is determined that the second luminance syntax element identification information indicates that the luminance component of the current block is filtered using the luminance loop filtering network model; or,
  • if the value of the second luminance syntax element identification information is the second value, it is determined that the second luminance syntax element identification information indicates that the luminance component of the current block does not use the luminance loop filter network model for filtering processing.
  • the method may also include:
  • the value of the second brightness syntax element identification information is the first value, turn on the brightness CTU level switch; or,
  • if the value of the second luma syntax element identification information is the second value, turn off the luma CTU-level switch.
  • first value and the second value are different.
  • the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
  • the embodiment of the present application does not make any limitation on this.
  • if the value of the second luminance syntax element identification information is 1, turn on the luminance CTU-level switch, that is, call the CTU-level loop filter network model; at this time it can be determined that the luminance component of the current block is filtered using the luminance loop filter network model. Otherwise, if the value of the second luminance syntax element identification information is 0, the luminance CTU-level switch can be turned off, that is, the CTU-level loop filter network model is not called, and it can be determined that the luminance component of the current block does not use the luminance loop filter network model for filtering processing.
  • the next block can be obtained from the current frame and determined as the current block, and the steps of parsing the code stream and determining the value of the second luma syntax element identification information are then continued until all blocks included in the current frame are processed, after which the next frame is loaded to continue processing.
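The frame-level and CTU-level luma switch logic above (parse luma_frame_flag per frame; only when it is on, parse luma_ctu_flag per block; then move on to the next block or frame) can be sketched as follows. This is a hedged illustration: `decode_luma_flags` and its input layout are hypothetical stand-ins for real bitstream parsing and filter invocation.

```python
# Minimal sketch of the luma frame-level / CTU-level switch hierarchy.
# Each frame carries a luma_frame_flag; when it is 1, a luma_ctu_flag is
# parsed for every CTU-sized block, and only blocks with flag 1 are
# filtered with the luma loop filter network model.

def decode_luma_flags(frames):
    """frames: list of dicts like
    {"luma_frame_flag": 0 or 1, "ctu_flags": [0/1, ...]}.
    Returns, per frame, the list of block indices that use the luma CNNLF."""
    filtered = []
    for frame in frames:
        if frame["luma_frame_flag"] != 1:
            # Luma frame-level switch off: no block in this frame uses the
            # model; skip straight to the next frame.
            filtered.append([])
            continue
        # Frame-level switch on: check luma_ctu_flag for every block in turn.
        used = [i for i, flag in enumerate(frame["ctu_flags"]) if flag == 1]
        filtered.append(used)
    return filtered
```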
  • the parsing the bitstream to determine the value of the first syntax element identification information may include:
  • the step of parsing the code stream and determining the value of the first syntax element identification information is performed.
  • the frame-level syntax element may be called chroma syntax element identification information, represented by chroma_frame_flag.
  • if the chroma syntax element identification information indicates that the chroma component of the current frame is filtered using the chroma loop filter network model, then by default all blocks included in the current frame are filtered using the chroma loop filtering network model; if the chroma syntax element identification information indicates that the chroma component of the current frame does not use the chroma loop filtering network model for filtering processing, then by default none of the blocks included in the current frame are filtered using the chroma loop filtering network model.
  • the method may further include: setting a chroma frame-level switch.
  • the chroma frame-level switch can be used to control whether the chroma component of the current frame is filtered using the chroma loop filter network model.
  • the method may further include:
  • the value of the chroma syntax element identification information is the first value, it is determined that the chroma syntax element identification information indicates that the chroma components of the current frame are filtered using the chroma loop filter network model; or,
  • if the value of the chroma syntax element identification information is the second value, it is determined that the chroma syntax element identification information indicates that the chroma component of the current frame does not use the chroma loop filter network model for filtering processing.
  • the method may also include:
  • the value of the chroma syntax element identification information is the first value, turn on the chroma frame-level switch; or,
  • if the value of the chroma syntax element identification information is the second value, turn off the chroma frame-level switch.
  • first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
  • either the chroma syntax element identification information or the chroma frame-level switch can be a parameter written in a profile, or the value of a flag, and this embodiment of the present application does not make any restrictions on this.
  • the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
  • the embodiment of the present application does not make any limitation on this.
  • if the value of the chroma syntax element identification information is 1, the chroma frame-level switch can be turned on, that is, the frame-level loop filter network model is invoked; at this time it can be determined that the chroma component of the current frame is filtered using the chroma loop filter network model, and by default each block corresponding to the chroma component of the current frame is filtered using the chroma loop filter network model.
  • otherwise, if the value of the chroma syntax element identification information is 0, the chroma frame-level switch can be turned off, that is, the frame-level loop filtering network model is not called, and at this time it can be determined that each block corresponding to the chroma component of the current frame is not filtered using the chroma loop filter network model.
  • at this time, the next frame can be obtained from the video sequence and determined as the current frame, and the step of parsing the code stream and determining the value of the chroma syntax element identification information is then continued.
  • the syntax element identification information can be set separately for the luma component and the chroma component and then determined by parsing the code stream; alternatively, the syntax element identification information may be set only for the current block and/or the current frame and then determined by parsing the bitstream.
  • the syntax element identification information (such as the second syntax element identification information, the first luma syntax element identification information, the second luma syntax element identification information, and the chroma syntax element identification information, etc.) is not specifically limited here.
  • the decoder can determine whether the current block uses the loop filter network model (including the luma loop filter network model or the chroma loop filter network model) for filtering processing.
  • the code stream can be further analyzed to obtain the value of the first syntax element identification information, and then determine whether the current block allows model selection using a preset selection network model.
  • the loop filtering network model used by the current block can be determined according to the preset selection network model.
  • determining, according to the preset selection network model, the loop filtering network model used by the current block may include:
  • a loop filtering network model used by the current block is determined according to respective output values corresponding to at least one candidate loop filtering network model.
  • the determining the loop filtering network model used by the current block according to the corresponding output values of at least one candidate loop filtering network model may include:
  • a target value is determined from output values corresponding to at least one candidate loop filter network model, and the candidate loop filter network model corresponding to the target value is used as the loop filter network model used by the current block.
  • determining the target value from the corresponding output values of at least one candidate loop filtering network model may include: selecting the maximum value from the corresponding output values of at least one candidate loop filtering network model, The maximum value is used as the target value.
  • the output values corresponding to at least one candidate loop filter network model can be obtained respectively; and then the target value (such as the maximum value ), the candidate loop filter network model corresponding to the target value (such as the maximum value) is used as the loop filter network model used by the current block.
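The target-value step above reduces to an argmax over the candidate models' output values. A minimal sketch, with a hypothetical helper name:

```python
# Picking the loop filter network model from the selection network's outputs:
# the target value is the maximum output (e.g. a probability), and the
# candidate model at that index is used for the current block.

def pick_model(output_values):
    """output_values: one score per candidate loop filter network model.
    Returns (index_of_chosen_model, target_value)."""
    target = max(output_values)
    return output_values.index(target), target
```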
  • the output value may be a probability value.
  • the respective output values of the at least one candidate loop filter network model may be used to reflect the respective probability distributions of the at least one candidate loop filter network model.
  • the preset selection network models here are also different.
  • the preset selection network model corresponding to the luma component may be called a luma selection network model
  • the preset selection network model corresponding to the chroma component may be called a chroma selection network model. Therefore, in some embodiments, the determining the preset selection network model of the current block may include:
  • if the color component type of the current block is a luma component, determining the luma selection network model of the current block; or,
  • if the color component type of the current block is a chroma component, determining the chroma selection network model of the current block.
  • the candidate loop filtering network models are also different.
  • the candidate loop filter network model corresponding to the luma component may be called a candidate luma loop filter network model
  • if the color component type of the current block is a luma component, the corresponding output values of at least one candidate luma loop filtering network model are determined according to the luma selection network model; or,
  • if the color component type of the current block is a chroma component, the corresponding output values of at least one candidate chroma loop filtering network model are determined according to the chroma selection network model.
  • as for the color component type, it may include a luma component and a chroma component.
  • if the color component type of the current block is a luma component, the current block at this time can be called a luma block; it is then necessary to determine the luma selection network model of the current block, and then determine, according to the luma selection network model, the corresponding output values of at least one candidate luma loop filter network model.
  • if the color component type of the current block is a chroma component, the current block at this time can be called a chroma block; it is then necessary to determine the chroma selection network model of the current block, and then determine, according to the chroma selection network model, the corresponding output values of at least one candidate chroma loop filter network model.
  • the candidate loop filter network model corresponding to at least one luma component may be referred to simply as a “candidate luma loop filter network model”;
  • the candidate loop filter network model corresponding to at least one chroma component may be referred to simply as a “candidate chroma loop filter network model”.
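The luma/chroma split described in the surrounding bullets can be sketched as a small dispatcher. The selector arguments are hypothetical callables (one score per candidate CNNLF model); only the branching by color component type comes from the text:

```python
# Dispatch by color component type: a luma block uses the luma selection
# network model over the candidate luma CNNLF models, a chroma block uses
# the chroma counterparts. The selectors are hypothetical stand-ins for
# the trained selection network models.

def outputs_for_block(component_type, luma_selector, chroma_selector, recon_block):
    if component_type == "luma":
        return luma_selector(recon_block)    # scores for candidate luma CNNLF models
    elif component_type == "chroma":
        return chroma_selector(recon_block)  # scores for candidate chroma CNNLF models
    raise ValueError("unknown color component type: %r" % component_type)
```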
  • the method may further include:
  • the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • the first neural network structure is trained by using the luminance components of the training samples in the first training set to obtain at least one candidate luminance loop filter network model.
  • At least one candidate luminance loop filter network model is determined by performing model training on the first neural network structure according to at least one training sample, and the at least one candidate luminance loop filter network model and the color component type and quantization parameter There is a corresponding relationship between them.
  • the first neural network structure includes at least one of the following: a convolutional layer, an activation layer, a residual block, and a skip connection layer.
  • the first neural network structure may include a first convolution module, a first residual module, a second convolution module and a first connection module.
  • the input of the first neural network structure is a reconstructed brightness frame, and the output is an original brightness frame;
  • the first neural network structure includes: a first convolution module 601, a first residual module 602 , the second convolution module 603 and the first connection module 604 .
  • the first convolution module 601, the first residual module 602, the second convolution module 603 and the first connection module 604 are connected in sequence, and the first connection module 604 is also connected to the input of the first convolution module 601.
  • the first convolution module may consist of one convolution layer and one activation layer
  • the second convolution module may consist of two convolutional layers and one activation layer;
  • the connection module can be composed of a skip connection layer;
  • the first residual module can include several residual blocks
  • each residual block can be composed of two convolutional layers and one activation layer.
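The first neural network structure just described (first convolution module, residual blocks, second convolution module, global skip connection) can be written down as a plain layer plan. This is purely structural and illustrative; the number of residual blocks is an assumption, as the text only says "several":

```python
# Structural plan of the first neural network structure (luma CNNLF):
# conv + ReLU, then N residual blocks (each: two convs and one activation),
# then a two-conv module with one activation, then a global skip connection.

def build_luma_cnnlf_plan(num_residual_blocks=8):  # block count is assumed
    plan = [("conv3x3", "relu")]                   # first convolution module
    plan += [("residual_block", "conv3x3", "relu", "conv3x3")] * num_residual_blocks
    plan += [("conv3x3", "relu", "conv3x3")]       # second convolution module
    plan += [("global_skip_connection",)]          # first connection module
    return plan
```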
  • the method may further include:
  • the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • At least one candidate chroma loop filter network model is determined by performing model training on the second neural network structure according to at least one training sample, and at least one candidate chroma loop filter network model and color component type and quantization parameter There is a corresponding relationship between them.
  • the second neural network structure includes at least one of the following: a sampling layer, a convolutional layer, an activation layer, a residual block, a pooling layer, and a skip connection layer.
  • the second neural network structure may include an upsampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module .
  • the input of the second neural network structure is a reconstructed luma frame and a reconstructed chrominance frame, and the output is an original chrominance frame;
  • the second neural network structure includes: an upsampling module 605, a third Convolution module 606 , fourth convolution module 607 , fusion module 608 , second residual module 609 , fifth convolution module 610 and second connection module 611 .
  • the input of the up-sampling module 605 is the reconstructed chroma frame, and the up-sampling module 605 is connected to the third convolution module 606; the input of the fourth convolution module 607 is the reconstructed luminance frame; both the third convolution module 606 and the fourth convolution module 607 are connected to the fusion module 608; the fusion module 608, the second residual module 609, the fifth convolution module 610 and the second connection module 611 are connected in sequence, and the second connection module 611 is also connected to the input of the up-sampling module 605.
  • the third convolution module may consist of one convolutional layer and one activation layer;
  • the fourth convolution module may consist of one convolutional layer and one activation layer;
  • the fifth convolution module can be composed of two convolutional layers, one activation layer and one pooling layer;
  • the connection module can be composed of a skip connection layer;
  • the second residual module can include several residual blocks, and each residual block can consist of two convolutional layers and one activation layer.
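Similarly, the second neural network structure (chroma CNNLF) can be summarized as a structural plan: an upsampled chroma branch, a parallel luma branch, fusion, residual blocks, a convolution module ending in a pooling layer, and a skip connection from the upsampled chroma input. Again illustrative only; the residual block count is an assumption:

```python
# Structural plan of the second neural network structure (chroma CNNLF).
# The reconstructed chroma frame is upsampled (by copying adjacent pixels)
# and convolved; the reconstructed luma frame is convolved in parallel;
# the two are fused, passed through residual blocks, then downsampled back
# via a conv module with an average pooling layer; a skip connection comes
# from the upsampled chroma input.

def build_chroma_cnnlf_plan(num_residual_blocks=8):  # block count is assumed
    return {
        "chroma_branch": ["upsample(copy_neighbors)", "conv3x3", "relu"],
        "luma_branch":   ["conv3x3", "relu"],
        "after_fusion":  (["fuse"]
                          + ["residual_block"] * num_residual_blocks
                          + ["conv3x3", "relu", "conv3x3", "avgpool2x2"]),
        "skip_from":     "upsampled_chroma",
    }
```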
  • CNNLF designs different network structures for the luma component and the chrominance component respectively.
  • a first neural network structure is designed, see FIG. 6C for details;
  • a second neural network structure is designed, see FIG. 6D for details.
  • the entire network structure can be composed of convolutional layers, activation layers, residual blocks, skip connection layers and other parts.
  • the convolution kernel of the convolutional layer can be 3×3, that is, it can be represented by 3×3 Conv;
  • the activation layer can use a linear rectification function (Rectified Linear Unit, ReLU), also called a rectified linear unit, which is a commonly used activation function in artificial neural networks, usually referring to the nonlinear function represented by the ramp function and its variants.
  • the network structure of the residual block is shown in the dotted box in Figure 7, which can be composed of a convolutional layer (Conv), an activation layer (ReLU), and a skip connection layer.
  • the skip connection layer refers to a global skip connection from input to output included in the network structure, which enables the network to focus on learning residuals and accelerates the convergence process of the network.
  • the luminance component is introduced here as one of the inputs to guide the filtering of the chrominance component.
  • the entire network structure can be composed of a convolutional layer, an activation layer, a residual block, a pooling layer, a skip connection layer and other components. Because the resolutions are inconsistent, the chroma component needs to be up-sampled first. In order to avoid introducing other noise during the upsampling process, the resolution can be expanded by directly copying adjacent pixels to obtain an enlarged chroma frame (Enlarged chroma frame).
  • a pooling layer (such as an average pooling layer, represented by 2×2 AvgPool) is also used to complete the downsampling of the chroma component.
  • 4 I-frame luminance component models, 4 non-I-frame luminance component models, 4 chroma U component models, and 4 chroma V component models can be trained offline, giving a total of 16 candidate loop filter network models.
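The 16 offline-trained candidate models above can be pictured as a lookup table keyed by frame category, color component, and QP bucket. The bucket names below are hypothetical placeholders, since the text does not give the exact QP split:

```python
# Lookup table for the 16 candidate CNNLF models: 4 I-frame luma models,
# 4 non-I-frame luma models, 4 chroma-U models and 4 chroma-V models,
# one per (hypothetical) QP bucket.

QP_BUCKETS = ("qp_a", "qp_b", "qp_c", "qp_d")  # assumed bucket names

CANDIDATE_MODELS = {
    (category, component, qp): f"cnnlf_{category}_{component}_{qp}"
    for category, component in [("I", "luma"), ("nonI", "luma"),
                                ("any", "U"), ("any", "V")]
    for qp in QP_BUCKETS
}
# 4 (category, component) pairs x 4 QP buckets = 16 candidate models.
```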
  • the preset selection network model of the current block also needs to be determined.
  • the corresponding preset selection network models are also different.
  • the preset selection network model corresponding to the luma component may be called a luma selection network model
  • the preset selection network model corresponding to the chroma component may be called a chroma selection network model.
  • the determining the brightness selection network model of the current block may include:
  • the selected candidate brightness selection network model is determined as the brightness selection network model of the current block.
  • the determining the chroma selection network model of the current block may include:
  • the selected candidate chrominance selection network model is determined as the chrominance selection network model of the current block.
  • the preset selection network model of the current block is not only related to quantization parameters, but also related to color component types; different color component types correspond to different preset selection network models.
  • for the luma component, the preset selection network model can be a luma selection network model related to the luma component; for the chroma component, the preset selection network model may be a chroma selection network model related to chroma components.
  • at least one candidate luma selection network model and at least one candidate chroma selection network model can be trained in advance.
  • the candidate luma selection network model corresponding to the quantization parameter can be selected from the at least one candidate luma selection network model, that is, the luma selection network model of the current block;
  • the candidate chroma selection network model corresponding to the quantization parameter is selected from the at least one candidate chroma selection network model, that is, the chroma selection network model of the current block.
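Selecting the preset selection network model by color component type and quantization parameter, as the bullets above describe, might look like the following. Matching by nearest trained QP is an assumption; the text only says the candidate model "corresponding to the quantization parameter" is chosen:

```python
# Choose the luma or chroma selection network model for the current block
# from pre-trained candidates keyed by training QP. Nearest-QP matching is
# an assumed policy; the model dictionaries are hypothetical examples.

def choose_selection_model(component, qp, luma_models, chroma_models):
    """luma_models / chroma_models: dict mapping a training QP to a model id."""
    models = luma_models if component == "luma" else chroma_models
    nearest_qp = min(models, key=lambda trained_qp: abs(trained_qp - qp))
    return models[nearest_qp]
```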
  • the method may further include:
  • the second training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • At least one candidate luma selection network model has a corresponding relationship with luma components and quantization parameters
  • at least one candidate chrominance selection network model has a corresponding relationship with chrominance components and quantization parameters.
  • the at least one candidate luma selection network model and the at least one candidate chroma selection network model are respectively determined by performing model training on the third neural network structure according to at least one training sample, and each of the at least one candidate luma selection network model and the at least one candidate chroma selection network model has a corresponding relationship with a color component type and a quantization parameter.
  • the third neural network structure may include at least one of the following: a convolutional layer, a pooling layer, a fully connected layer, and an activation layer.
  • the third neural network structure may include a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence.
  • the sixth convolution module may include several convolution sub-modules, and each convolution sub-module may consist of one convolutional layer and one pooling layer; the fully connected module may include several fully connected sub-modules, and each fully connected sub-module may consist of one fully connected layer and one activation layer.
  • the preset selection network model can be composed of a multi-layer convolutional neural network and multi-layer fully connected layers; deep learning is then performed using the training samples to obtain the preset selection network model of the current block, such as the luma selection network model or the chroma selection network model.
  • deep learning is a kind of machine learning, and machine learning is the necessary path to realize artificial intelligence.
  • the concept of deep learning originates from the research of artificial neural networks, and a multi-layer perceptron with multiple hidden layers is a deep learning structure.
  • Deep learning can discover the distributed feature representation of data by combining low-level features to form more abstract high-level representation attribute categories or features.
  • Convolutional Neural Networks (CNN) are a class of feedforward neural networks that involve convolution computation and have a deep structure, and are one of the representative algorithms of deep learning.
  • the preset selection network model here may be a convolutional neural network structure.
  • the embodiment of the present application also designs a third neural network structure, see FIG. 8A and FIG. 8B for details.
  • the input of the third neural network structure is the reconstructed frame, and the output is the probability distribution of each candidate loop filter network model when the current block uses the loop filter network model.
  • the third neural network structure includes: a sixth convolution module 801 and a fully connected module 802, and the sixth convolution module 801 and the fully connected module 802 are connected in sequence.
  • the sixth convolution module 801 can include several convolution sub-modules, each convolution sub-module can be composed of one layer of convolution layer and one layer of pooling layer;
  • the fully connected module 802 can include several fully connected sub-modules, and each fully connected sub-module can consist of one fully connected layer and one activation layer.
  • the third neural network structure may consist of a multi-layer convolutional neural network and a multi-layer fully connected neural network.
  • the network structure may include K layers of convolution layers, M layers of pooling layers, L layers of fully connected layers and N layers of activation layers, and K, M, L, and N are all integers greater than or equal to 1.
• the network structure shown in Figure 8B can be composed of 3 convolutional layers and 2 fully connected layers, with a pooling layer after each convolutional layer; the convolution kernel of the convolutional layers can be 3×3, i.e., denoted 3×3 Conv; the pooling layer can be a max pooling layer, denoted 2×2 MaxPool; in addition, an activation layer is set after each fully connected layer, and the activation layer can be a linear activation function or a nonlinear activation function, such as ReLU and Softmax.
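The Softmax activation mentioned above turns the network's final fully connected outputs into a probability distribution over the candidate models. A minimal sketch (the function name and inputs are illustrative, not part of the standard):

```python
import math

def softmax(logits):
    """Convert the final fully connected layer's raw outputs (logits)
    into a probability distribution over candidate CNNLF models."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting values are non-negative and sum to 1, so the largest one can be read directly as the most likely candidate model.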
  • the probability distribution of the at least one candidate loop filtering network model may also be determined.
  • the determining the corresponding output values of at least one candidate loop filter network model according to the preset selection network model may include:
  • the input reconstructed image block is input into the preset selection network model to obtain the corresponding output values of at least one candidate loop filtering network model.
  • the loop filtering network model may refer to the aforementioned CNNLF model.
• the input reconstructed image block of the CNNLF model is used as the input of the preset selection network model, and the output of the preset selection network model is the respective probability distributions of the at least one candidate loop filter network model. That is, after obtaining the probability values of the at least one candidate loop filter network model, the loop filter network model used by the current block may be determined according to the magnitudes of the probability values. Specifically, the maximum probability value may be selected from the probability values of the at least one candidate loop filter network model, and the candidate loop filter network model corresponding to the maximum probability value may be determined as the loop filter network model used by the current block.
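The selection rule described above reduces to an argmax over the candidate models' probability values. A sketch under that reading (names are illustrative):

```python
def select_model(prob_values):
    """Pick the candidate loop filter network model whose output
    probability is largest; return that model's index number."""
    best_index, best_prob = 0, prob_values[0]
    for i, p in enumerate(prob_values):
        if p > best_prob:
            best_index, best_prob = i, p
    return best_index
```

For example, with probability values [0.1, 0.7, 0.2] the second candidate model (index 1) would be used for the current block.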
• the preset selection network model includes the luma selection network model and the chroma selection network model; correspondingly, the input reconstructed image block includes the input reconstructed luma image block and the input reconstructed chroma image block.
• the determining, according to the preset selection network model, the loop filter network model used by the current block may include:
  • a target value is determined from output values corresponding to at least one candidate luma loop filter network model, and the candidate luma loop filter network model corresponding to the target value is used as the luma loop filter network model used by the current block.
• the determining, according to the preset selection network model, the loop filter network model used by the current block may include:
  • the target value is determined from output values corresponding to at least one candidate chroma loop filter network model, and the candidate chroma loop filter network model corresponding to the target value is used as the chroma loop filter network model used by the current block.
  • the selected loop filter network model can be used to filter the current block.
  • S403 Perform filtering processing on the current block by using the loop filter network model to obtain a reconstructed image block of the current block.
  • the loop filtering network model described in the embodiment of the present application may be a CNNLF model.
  • the CNNLF filtering process can be performed on the current block by using the selected CNNLF model to obtain a reconstructed image block of the current block.
• the input reconstructed image block can be obtained after filtering by the deblocking filter and the sample adaptive compensation filter.
  • the method may further include: after determining the reconstructed image block of the current block, continue to filter the reconstructed image block by using an adaptive correction filter.
  • FIG. 9 shows a schematic diagram of an overall framework of using a preset selection network model provided by an embodiment of the present application.
  • the input of the network structure is the input reconstructed luminance image block or the input reconstructed chrominance image block of the CNNLF model
  • the output of the network structure is the probability value of each CNNLF model
  • the input reconstructed image block is obtained after filtering through the deblocking filter (DBF) and the sample adaptive compensation filter (SAO), and then through the model adaptive selection module and the CNNLF model
• DBF deblocking filter
  • SAO sample adaptive compensation filter
  • the obtained reconstructed image block can also be input into an adaptive correction filter (ALF) to continue filtering processing.
  • ALF adaptive correction filter
• This embodiment also provides a decoding method, which is applied to a decoder: determine the value of the first syntax element identification information by parsing the code stream; when the first syntax element identification information indicates that the current block allows the use of a preset selection network model for model selection, determine the preset selection network model of the current block, and determine, according to the preset selection network model, the loop filter network model used by the current block; perform filtering processing on the current block by using the loop filter network model to obtain the reconstructed image block of the current block.
• the preset selection network model is used to select among at least one candidate loop filter network model, and the current block is then filtered according to the selected loop filter network model; this can not only improve the coding performance and thereby the encoding and decoding efficiency, but also make the finally output reconstructed image block closer to the original image block, improving the video image quality.
  • FIG. 10 shows a schematic flowchart of another decoding method provided by the embodiment of the present application. As shown in Figure 10, the method may include:
  • S1001 Parse the code stream, and determine the value of the first syntax element identification information.
  • S1003 Determine the loop filtering network model used by the current block from at least one candidate loop filtering network model according to the index number of the loop filtering network model.
  • S1004 Perform filtering processing on the current block by using a loop filter network model to obtain a reconstructed image block of the current block.
• a first syntax element identification information can be set, and then the determination can be made according to the value of the first syntax element identification information obtained by decoding.
  • the first syntax element identification information may be represented by model_adaptive_selection_enable_flag.
• if the value of model_adaptive_selection_enable_flag is the first value, it can be determined that the current block allows model selection using a preset selection network model; or, if the value of model_adaptive_selection_enable_flag is the second value, it can be determined that the current block does not allow model selection using the preset selection network model.
  • the first value may be 1, and the second value may be 0, but there is no limitation here.
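The decoder-side check described above can be sketched as follows, assuming the common convention of first value = 1 and second value = 0 (the function name is illustrative):

```python
FIRST_VALUE, SECOND_VALUE = 1, 0  # assumed convention; the spec allows other pairs

def model_selection_allowed(model_adaptive_selection_enable_flag):
    """Decoder-side check: does the current block allow model selection
    with the preset selection network model?"""
    return model_adaptive_selection_enable_flag == FIRST_VALUE
```

When the flag parsed from the code stream equals the first value, the decoder proceeds to the model adaptive selection step; otherwise it skips it.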
• for the model adaptive selection module on the decoder side, the index number of the CNNLF model selected by the model adaptive selection module on the encoder side can be encoded and written into the code stream, and then the CNNLF model used by the current block can be determined in the decoder according to the parsed index number and used for filtering, thereby reducing the complexity of the decoder.
• the number of convolutional layers, the number of fully connected layers, and the nonlinear activation function can all be adjusted.
• besides the CNNLF model, the model adaptive selection module may also perform model adaptive selection for other efficient neural network filter models, and this embodiment of the present application does not impose any limitation thereto.
  • the embodiment of the present application proposes a model adaptive selection module based on deep learning, which is used for adaptive selection of the CNNLF model to improve coding performance.
• the model adaptive selection module can be regarded as a preset selection network model composed of a multi-layer convolutional neural network and a multi-layer fully connected neural network. Its input is the input reconstructed image block of the CNNLF model, and the output is the probability distribution of each CNNLF model.
  • the position of the model adaptive selection module in the encoder/decoder is shown in Figure 5.
  • the use of the model adaptive selection module does not depend on the switch of DBF, SAO, ALF, and CNNLF, but it is placed before CNNLF in position.
  • the decoder acquires and parses the code stream, and when it is parsed to the loop filter module, it will be processed according to the preset filter order.
  • the preset filter sequence is DBF filter ⁇ SAO filter ⁇ model adaptive selection module ⁇ CNNLF filter ⁇ ALF filter.
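The preset filter order above can be sketched as a simple sequential pipeline; every filter argument here is an illustrative callable standing in for the real module:

```python
def loop_filter_pipeline(block, dbf, sao, select_model_fn, cnnlf_models, alf):
    """Apply the loop filters in the preset order:
    DBF -> SAO -> model adaptive selection -> CNNLF -> ALF."""
    block = dbf(block)
    block = sao(block)
    model_index = select_model_fn(block)      # model adaptive selection module
    block = cnnlf_models[model_index](block)  # filter with the chosen CNNLF model
    return alf(block)
```

Note that, as the surrounding text states, the selection module sits immediately before CNNLF but does not depend on whether DBF, SAO, or ALF are switched on; a fuller sketch would make each stage conditional on its own enable flag.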
• according to model_adaptive_selection_enable_flag, it is judged whether the current block is allowed to use the model adaptive selection module for model selection. If model_adaptive_selection_enable_flag is "1", then try to perform model adaptive selection module processing on the current block, and jump to (b); if model_adaptive_selection_enable_flag is "0", then jump to (e);
• the input reconstructed luma image block of the CNNLF model is used as the input of the model adaptive selection module, and the output is the probability distribution of each luma CNNLF model. The model with the largest probability value is selected as the CNNLF model of the current luma image block, and CNNLF filtering is performed on the current luma image block to obtain the final reconstructed image block;
• the input reconstructed chroma image block of the CNNLF model is used as the input of the model adaptive selection module, and the output is the probability distribution of each chroma CNNLF model. The model with the largest probability value is selected as the CNNLF model of the current chroma image block, and CNNLF filtering is performed on the current chroma image block to obtain the final reconstructed image block;
  • the enable flag for neural network-based model adaptive selection is model_adaptive_selection_enable_flag.
• the embodiment of the present application introduces a model adaptive selection technology based on deep learning: the input reconstructed image block of the CNNLF model of HPM-ModAI is fed into a neural network structure of multi-layer convolutional layers plus fully connected layers, which outputs the probability distribution of each CNNLF model; an appropriate CNNLF model is adaptively selected for the input reconstructed image block, and the input reconstructed image block is then fed into the selected CNNLF model for filtering, so that the finally output reconstructed image block is closer to the original image block, which can improve coding performance.
• the comparison anchor is HPM11.0-ModAI6.0; the average BD-rate changes on the Y, U, and V components are -1.01%, 0.00%, and 0.04% respectively, as shown in Table 2; the average BD-rate changes under the low delay B configuration of the intelligent coding general test condition are -0.86%, -0.21%, and -0.30%, as shown in Table 3.
• Table 2 and Table 3 show that this technical solution improves the coding performance. Specifically, this technical solution can bring a good performance gain to the existing AVS3 intelligent coding reference software HPM-ModAI.
• the introduction of the model adaptive selection technology based on deep learning can not only improve the coding performance and thereby the encoding and decoding efficiency, but can also make the finally output reconstructed image block closer to the original image block, which can improve the video image quality.
  • FIG. 11 shows a schematic flowchart of an encoding method provided in an embodiment of the present application. As shown in Figure 11, the method may include:
  • S1101 Determine a value of the first syntax element identification information.
• each coding block may include a first image component, a second image component, and a third image component; the current block is the coding block in the video image on which loop filtering of the first image component, the second image component, or the third image component is currently to be performed.
  • the embodiment of the present application can divide them into two types of color components such as luminance component and chrominance component.
• if the current block performs operations such as luma component prediction, inverse transform and inverse quantization, and loop filtering, the current block can also be called a luma block; or, if the current block performs operations such as chroma component prediction, inverse transform and inverse quantization, and loop filtering, the current block may also be called a chroma block.
• the embodiment of the present application specifically provides a loop filtering method, especially a model adaptive selection method based on deep learning, which is applied to the filtering unit 108 part shown in FIG. 3A.
• the filtering unit 108 may include a deblocking filter (DBF), a sample adaptive compensation filter (SAO), a residual neural network-based loop filter (CNNLF) and an adaptive correction filter (ALF).
• DBF deblocking filter
• SAO sample adaptive compensation filter
• CNNLF residual neural network-based loop filter
• ALF adaptive correction filter
• the embodiment of the present application proposes a model adaptive selection module based on deep learning; for details, see the model adaptive selection module shown in the figure, which is used for adaptive selection of the CNNLF model to improve coding performance.
  • whether the current block is allowed to use a preset selection network model for model selection may be indicated by a first syntax element identification information.
  • the determining the value of the identification information of the first syntax element includes:
• if the current block allows model selection using a preset selection network model, then determine that the value of the first syntax element identification information is the first value; or,
  • the method further includes: encoding the value of the identification information of the first syntax element, and writing the encoded bits into the code stream.
  • a first syntax element identification information may be set to indicate whether the current block allows to use a preset selection network model for model selection.
• if the current block allows the use of the preset selection network model for model selection, then it can be determined that the value of the first syntax element identification information is the first value; if the current block does not allow the use of the preset selection network model for model selection, then it may be determined that the value of the first syntax element identification information is the second value.
• the value of the first syntax element identification information is written into the code stream for transmission to the decoder, so that the decoder can obtain, by parsing the code stream, whether the current block allows model selection using a preset selection network model.
  • the first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
  • the first syntax element identification information may be a parameter written in a profile (profile), or a value of a flag (flag), which is not limited in this embodiment of the present application.
• for example, the first value can be set to 1 and the second value to 0; or, the first value can be set to true and the second value to false; or, the first value can be set to 0 and the second value to 1; or, the first value can be set to false and the second value to true.
• for a flag, generally, the first value may be 1 and the second value may be 0, but this is not limited.
  • the preset selection network model can be regarded as a neural network
  • the identification information of the first syntax element can be regarded as an enabling flag for model adaptive selection based on a neural network, which can be represented by model_adaptive_selection_enable_flag here.
  • model_adaptive_selection_enable_flag can be used to indicate whether the current block allows model selection using a preset selection network model.
• model_adaptive_selection_enable_flag equal to 1 indicates that the current block allows model selection using the preset selection network model; model_adaptive_selection_enable_flag equal to 0 indicates that it does not.
• the loop filter network model used by the current block can be determined according to the preset selection network model.
• the determining, according to the preset selection network model, the loop filter network model used by the current block may include:
  • a loop filtering network model used by the current block is determined according to respective output values corresponding to at least one candidate loop filtering network model.
  • the determining the loop filtering network model used by the current block according to the respective output values of at least one candidate loop filtering network model may include:
  • a target value is determined from output values corresponding to at least one candidate loop filter network model, and the candidate loop filter network model corresponding to the target value is used as the loop filter network model used by the current block.
• the determining the target value from the corresponding output values of the at least one candidate loop filter network model may include: selecting the maximum value from the output values corresponding to the at least one candidate loop filter network model, and taking the maximum value as the target value.
• the output values corresponding to the at least one candidate loop filter network model can be obtained respectively; then the target value (such as the maximum value) is determined from them, and the candidate loop filter network model corresponding to the target value (such as the maximum value) is used as the loop filter network model used by the current block.
  • the output value may be a probability value.
  • the respective output values of the at least one candidate loop filter network model may be used to reflect the respective probability distributions of the at least one candidate loop filter network model.
• for different color component types, the preset selection network models here are also different.
  • the preset selection network model corresponding to the luma component may be called a luma selection network model
  • the preset selection network model corresponding to the chroma component may be called a chroma selection network model. Therefore, in some embodiments, the determining the preset selection network model of the current block may include:
• if the color component type of the current block is a luma component, the luma selection network model of the current block is determined;
• if the color component type of the current block is a chroma component, the chroma selection network model of the current block is determined.
• for different color component types, the candidate loop filter network models are also different.
  • the candidate loop filter network model corresponding to the luma component may be called a candidate luma loop filter network model
• if the color component type of the current block is a luma component, the corresponding output values of at least one candidate luma loop filter network model are determined according to the luma selection network model;
• if the color component type of the current block is a chroma component, the corresponding output values of at least one candidate chroma loop filter network model are determined according to the chroma selection network model.
• for the color component type, it may include a luma component and a chroma component.
• if the color component type of the current block is a luma component, the current block at this time can be called a luma block; it is then necessary to determine the luma selection network model of the current block, and the corresponding output values of at least one candidate luma loop filter network model can then be determined according to the luma selection network model.
• if the color component type of the current block is a chroma component, the current block at this time can be called a chroma block; the chroma selection network model of the current block needs to be determined, and the respective output values of at least one candidate chroma loop filter network model can then be determined according to the chroma selection network model.
  • candidate loop filter network model corresponding to at least one luminance component which may be referred to simply as a “candidate luminance loop filter network model”
  • candidate loop filter network model corresponding to at least one chrominance component may be referred to simply as “candidate chroma loop filter network models”
  • the method may further include:
  • the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • the first neural network structure is trained by using the luminance components of the training samples in the first training set to obtain at least one candidate luminance loop filter network model.
  • At least one candidate luminance loop filter network model is determined by performing model training on the first neural network structure according to at least one training sample, and at least one candidate luminance loop filter network model is related to the color component type and quantization parameter have a corresponding relationship.
  • the first neural network structure includes at least one of the following: a convolutional layer, an activation layer, a residual block, and a skip connection layer.
  • the first neural network structure may include a first convolution module, a first residual module, a second convolution module and a first connection module.
  • the first convolution module can be composed of one convolution layer and one activation layer
  • the second convolution module can be composed of two convolution layers and one activation layer
• the first connection module can be composed of a skip connection layer
  • the first residual module can include several residual blocks
  • each residual block can be composed of two convolutional layers and one activation layer.
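The residual blocks above follow the standard pattern output = input + F(input), where F is the two-convolutions-plus-activation path, and the addition is the skip connection. A toy sketch with F abstracted as a callable (names are illustrative):

```python
def residual_block(x, transform):
    """Sketch of a residual block: the two-convolution-plus-activation
    path is abstracted as `transform`; the skip connection adds the
    block's input back onto the transformed output element-wise."""
    return [xi + ti for xi, ti in zip(x, transform(x))]
```

The skip connection means the block only has to learn the residual correction to its input, which is what makes such filters well suited to refining an already reconstructed image block.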
  • the method may further include:
  • the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • the second neural network structure includes at least one of the following: a sampling layer, a convolutional layer, an activation layer, a residual block, a pooling layer, and a skip connection layer.
  • the second neural network structure may include an upsampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module .
  • the third convolution module can be composed of one layer of convolution layer and one layer of activation layer
  • the fourth convolution module can be composed of one layer of convolution layer and one layer of activation layer
• the fifth convolution module can be composed of two convolutional layers, one activation layer and one pooling layer
• the second connection module can be composed of a skip connection layer
  • the second residual module can include several residual blocks, and each A residual block can consist of two convolutional layers and one activation layer.
  • CNNLF designs different network structures for the luma component and the chrominance component respectively.
  • the luminance component it designed the first neural network structure, see Figure 6A and Figure 6C for details
  • the chrominance component it designed the second neural network structure, see Figure 6B and Figure 6D for details.
  • the preset selection network model of the current block also needs to be determined.
  • the corresponding preset selection network models are also different.
  • the preset selection network model corresponding to the luma component may be called a luma selection network model
  • the preset selection network model corresponding to the chroma component may be called a chroma selection network model.
  • the determining the brightness selection network model of the current block may include:
  • the selected candidate brightness selection network model is determined as the brightness selection network model of the current block.
  • the determining the chroma selection network model of the current block may include:
  • the selected candidate chrominance selection network model is determined as the chrominance selection network model of the current block.
  • the preset selection network model of the current block is not only related to quantization parameters, but also related to color component types. Among them, different color component types correspond to different preset selection network models.
• the preset selection network model can be a luma selection network model related to the luma component; or, the preset selection network model may be a chroma selection network model related to the chroma component.
• at least one candidate luma selection network model and at least one candidate chroma selection network model can be trained in advance.
• in this way, the candidate luma selection network model corresponding to the quantization parameter can be selected from the at least one candidate luma selection network model, that is, the luma selection network model of the current block;
• and the candidate chroma selection network model corresponding to the quantization parameter is selected from the at least one candidate chroma selection network model, that is, the chroma selection network model of the current block.
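Since each trained candidate selection network model corresponds to a quantization parameter, the per-block lookup can be sketched as below. The dictionary structure and the nearest-QP fallback are assumptions for illustration; the text only specifies that the model corresponding to the QP is selected:

```python
def pick_by_qp(candidates, qp):
    """Pick the trained candidate selection network model matching the
    current block's quantization parameter. `candidates` maps a QP value
    to a trained model (illustrative structure). If there is no exact
    match, fall back to the nearest available QP (an assumption here)."""
    best_qp = min(candidates, key=lambda k: abs(k - qp))
    return candidates[best_qp]
```

A separate table of candidates would be kept per color component type, so the luma and chroma selection network models are looked up independently.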
  • the method may further include:
  • the second training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter
  • At least one candidate luma selection network model has a corresponding relationship with luma components and quantization parameters
  • at least one candidate chrominance selection network model has a corresponding relationship with chrominance components and quantization parameters.
• the at least one candidate luma selection network model and the at least one candidate chroma selection network model are respectively determined by performing model training on the third neural network structure according to at least one training sample, and each of the at least one candidate luma selection network model and the at least one candidate chroma selection network model has a corresponding relationship with a color component type and a quantization parameter.
  • the third neural network structure includes at least one of the following: a convolutional layer, a pooling layer, a fully connected layer, and an activation layer.
  • the third neural network structure may include a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence.
  • the sixth convolution module can include several convolution sub-modules, and each convolution sub-module can be composed of one layer of convolution layer and one layer of pooling layer;
  • the fully connected module can include several fully connected sub-modules, each A fully connected sub-module can consist of a fully connected layer and an activation layer.
• the third neural network structure can be composed of a multi-layer convolutional neural network and a multi-layer fully connected neural network, which is then trained on the training samples through deep learning to obtain the preset selection network model of the current block, such as the luma selection network model or the chroma selection network model.
• the third neural network structure can be composed of 3 convolutional layers and 2 fully connected layers, with a pooling layer after each convolutional layer; the convolution kernel of the convolutional layers can be 3×3, i.e., denoted 3×3 Conv; the pooling layer can be a max pooling layer, denoted 2×2 MaxPool; in addition, an activation layer is set after each fully connected layer, and the activation layer can be a linear activation function or a nonlinear activation function, such as ReLU and Softmax.
  • the probability distribution of the at least one candidate loop filter network model may also be determined.
  • the determining the corresponding output values of at least one candidate loop filter network model according to the preset selection network model may include:
  • the input reconstructed image block is input into the preset selection network model to obtain the corresponding output values of at least one candidate loop filtering network model.
  • the loop filtering network model may refer to the aforementioned CNNLF model.
• the input reconstructed image block of the CNNLF model is used as the input of the preset selection network model, and the output of the preset selection network model is the probability distribution of the at least one candidate loop filter network model. That is, after obtaining the probability values of the at least one candidate loop filter network model, the loop filter network model used by the current block may be determined according to the magnitudes of the probability values. Specifically, the maximum probability value may be selected from the probability values of the at least one candidate loop filter network model, and the candidate loop filter network model corresponding to the maximum probability value may be determined as the loop filter network model used by the current block.
• the preset selection network model includes the luma selection network model and the chroma selection network model; correspondingly, the input reconstructed image block includes the input reconstructed luma image block and the input reconstructed chroma image block.
• the determining, according to the preset selection network model, the loop filter network model used by the current block may include:
  • a target value is determined from output values corresponding to at least one candidate luma loop filter network model, and the candidate luma loop filter network model corresponding to the target value is used as the luma loop filter network model used by the current block.
• the determining, according to the preset selection network model, the loop filter network model used by the current block may include:
  • the target value is determined from output values corresponding to at least one candidate chroma loop filter network model, and the candidate chroma loop filter network model corresponding to the target value is used as the chroma loop filter network model used by the current block.
  • the selected loop filter network model can be used to filter the current block.
  • the method may further include:
• taking the CNNLF model as an example, according to the CNNLF model selected by the model adaptive selection module on the encoder side, its index number is encoded and written into the code stream;
• in this way, the decoder can directly determine the CNNLF model used by the current block according to the index number and perform filtering processing, thereby reducing the complexity of the decoder.
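The index signaling described above can be sketched as a pair of encoder/decoder helpers. The "bitstream" here is modelled as a plain list of symbols purely for illustration; a real codec would entropy-code the index:

```python
def encode_model_index(bitstream, index):
    """Encoder side: append the selected CNNLF model's index number
    to the code stream (modelled here as a list of symbols)."""
    bitstream.append(index)

def decode_model_index(bitstream, candidates):
    """Decoder side: read the index number back and return the matching
    candidate CNNLF model, avoiding re-running the selection network."""
    index = bitstream.pop(0)
    return candidates[index]
```

This is the complexity trade-off the text describes: the encoder runs the selection network once and spends a few bits on the index, so the decoder never has to evaluate the selection network at all.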
  • S1103 Use the loop filter network model to filter the current block to obtain a reconstructed image block of the current block.
  • the preset selection network model is for model selection of multiple candidate loop filter network models
  • After determining whether the current block is allowed to use the preset selection network model for model selection, if the current block is allowed to use the preset selection network model for model selection, then after the loop filter network model used by the current block is selected, it is necessary to further determine whether the current block uses the loop filter network model for filtering processing. In this way, if it is determined that the current block uses the loop filter network model for filtering processing, the loop filter network model can be used for the filtering processing.
  • The syntax element identification information can be set separately for the luma component and the chroma component and then determined by parsing the code stream; it is also possible to set the syntax element identification information only for the current block and/or the current frame and then determine it by parsing the code stream.
  • The syntax element identification information (such as the second syntax element identification information, the first luma syntax element identification information, the second luma syntax element identification information, and the chroma syntax element identification information, etc.) is not specifically limited here.
  • A sequence header identification information can be set; for example, a second syntax element identification information can be set to indicate whether the current video sequence uses the loop filter network model for filtering. Therefore, in some embodiments, the method may also include:
  • if it is determined that the video sequence uses the loop filter network model for filtering processing, determining that the value of the second syntax element identification information is the first value; or,
  • if it is determined that the video sequence does not use the loop filter network model for filtering processing, determining that the value of the second syntax element identification information is the second value.
  • the method further includes: encoding the value of the identification information of the second syntax element, and writing the encoded bits into the code stream.
  • first value and the second value are different.
  • The first value can be set to 1 and the second value can be set to 0; or, the first value can be set to true and the second value can be set to false; or, the first value can be set to 0 and the second value can be set to 1; or, the first value can be set to false and the second value can be set to true.
  • the embodiment of the present application does not make any limitation on this.
  • the video sequence includes at least one frame, and the at least one frame may include the current frame.
  • the embodiment of the present application needs to further determine whether the current frame in the video sequence uses the loop filter network model for filtering processing, that is, it is also necessary to set a third syntax element identification information.
  • The meaning of the third syntax element identification information differs depending on whether it applies to the luma component or the chroma component.
  • For the luma component of the current frame, the third syntax element identification information at this time is the first luma syntax element identification information, which is used to indicate whether the luma component of the current frame is filtered using the luma loop filter network model; for the chroma component of the current frame, the third syntax element identification information at this time can be regarded as the chroma syntax element identification information, which is used to indicate whether the chroma component of the current frame is filtered using the chroma loop filter network model.
  • the loop filtering network model is determined to be a luma loop filtering network model.
  • the method may also include:
  • the value of the first luma syntax element identification information is determined.
  • the determining the value of the first luma syntax element identification information according to the first rate-distortion cost and the second rate-distortion cost may include:
  • if the first rate-distortion cost value is less than the second rate-distortion cost value, determining that the value of the first luma syntax element identification information is the first value; and/or, if the first rate-distortion cost value is greater than or equal to the second rate-distortion cost value, determining that the value of the first luma syntax element identification information is the second value.
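The flag decision above reduces to a simple cost comparison; the cost values used below are illustrative:

```python
# Minimal sketch of the rate-distortion decision: the frame-level luma flag is
# set to the first value (1) only when filtering with the luma loop filter
# network model gives the strictly smaller rate-distortion cost.
def luma_frame_flag(cost_filtered, cost_unfiltered):
    return 1 if cost_filtered < cost_unfiltered else 0

flag_on = luma_frame_flag(95.0, 100.0)    # filtering helps, flag = 1
flag_off = luma_frame_flag(100.0, 100.0)  # a tie favours not filtering, flag = 0
```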
  • the method further includes: encoding the value of the identification information of the first luma syntax element, and writing the encoded bits into the code stream.
  • the method may further include:
  • the third rate-distortion cost is less than the fourth rate-distortion cost, determine that the value of the second luma syntax element identification information is the first value; and/or, if the third rate-distortion cost is greater than or equal to the fourth rate-distortion cost value, determine that the value of the identification information of the second luma syntax element is the second value.
  • the method further includes: encoding the value of the identification information of the second luma syntax element, and writing the encoded bits into the code stream.
  • the frame-level syntax element may be referred to as first luma syntax element identification information
  • the CTU-level syntax element may be referred to as second luma syntax element identification information.
  • the first luma syntax element identification information and the second luma syntax element identification information are flag information
  • the first luma syntax element identification information may be represented by luma_frame_flag
  • the second luma syntax element identification information may be represented by luma_ctu_flag.
  • both the value of the first luminance syntax element identification information and the value of the second luminance syntax element identification information can be determined in a rate-distortion cost manner.
  • the method may further include: performing block division on the current frame, and determining at least one division block; wherein, at least one division block includes the current block;
  • the determination of the first rate-distortion cost value for filtering the luminance component of the current frame using the luminance loop filter network model may include:
  • the calculation of the second rate-distortion cost value of the luminance component of the current frame that is not filtered using the luminance loop filter network model may include:
  • the calculated fourth rate-distortion cost value is accumulated to obtain the second rate-distortion cost value.
  • the distortion value may be determined according to the mean square error.
  • the reconstructed image block of the luminance component of each block can be obtained;
  • by calculating the mean square error between the reconstructed image block and the original image block, the mean square error value of each block can be obtained, and the mean square error value of the current frame can be obtained through cumulative calculation;
  • where D is the mean square error value of the current frame, R is the number of blocks included in the current frame, and λ is consistent with the λ of the adaptive correction filter (ALF).
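The frame-level cost described above, per-block mean square errors accumulated into D plus a rate term weighted by λ, can be sketched as follows. The λ value in the example is illustrative, since the text only states that it matches the λ of the adaptive correction filter:

```python
# Hedged sketch of the frame rate-distortion cost: D is the accumulated
# per-block mean square error, R the number of blocks, and J = D + lambda * R.
def block_mse(rec_block, org_block):
    n = len(rec_block)
    return sum((r - o) ** 2 for r, o in zip(rec_block, org_block)) / n

def frame_rd_cost(rec_blocks, org_blocks, lam):
    d = sum(block_mse(r, o) for r, o in zip(rec_blocks, org_blocks))
    r = len(rec_blocks)              # number of blocks in the current frame
    return d + lam * r

# Two blocks of two samples each; only the last sample differs by 1.
cost = frame_rd_cost([[1, 2], [3, 4]], [[1, 2], [3, 5]], lam=0.1)  # 0.5 + 0.2
```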
  • the fourth rate-distortion cost value of the luminance component of each block that is not filtered by the luminance loop filter network model can also be calculated, and then the second rate-distortion cost value of the current frame can be obtained through cumulative calculation.
  • the distortion value can also be determined according to the mean square error; the mean square error here refers to the mean square error value between the reconstructed image block output by the luma loop filter network model and the original image block. The other calculation operations are the same as for the first rate-distortion cost value and will not be described in detail here.
  • the filtering network model performs filtering processing.
  • if the first rate-distortion cost value is greater than or equal to the second rate-distortion cost value, the next frame can be obtained from the video sequence, the next frame is determined as the current frame, and the calculation of the first rate-distortion cost value and the second rate-distortion cost value is continued. Otherwise, if the first rate-distortion cost value is smaller than the second rate-distortion cost value, it can be determined that the value of the first luma syntax element identification information is 1, which means that the luma component of the current frame needs to be filtered using the luma loop filter network model.
  • the embodiment of the present application can also set a luminance frame-level switch and a luminance CTU-level switch, and determine whether to use the luminance loop filter network model for filtering processing by controlling whether they are turned on.
  • the method may further include: setting a brightness frame-level switch, and the brightness frame-level switch is used to control whether the brightness component of the current frame is filtered using the brightness loop filtering network model;
  • the method may also include:
  • if the first rate-distortion cost value is smaller than the second rate-distortion cost value, turning on the luma frame-level switch; or,
  • if the first rate-distortion cost value is greater than or equal to the second rate-distortion cost value, turning off the luma frame-level switch.
  • the method may further include: setting a luma CTU level switch, the luma CTU level switch is used to control whether the luma component of the current block is filtered using the luma loop filter network model;
  • the method may also include:
  • Whether the luma frame-level switch or the luma CTU-level switch is turned on can also be determined in a rate-distortion cost manner.
  • For either switch, the decision may be made according to the calculated rate-distortion cost value.
  • where D represents the distortion change of the current frame after processing by the luma loop filter network model, that is, D = D_out − D_rec (D_out is the distortion after processing by the luma loop filter network model, and D_rec is the distortion before processing by the luma loop filter network model); R is the number of blocks included in the current frame; and λ is consistent with the λ of the adaptive correction filter (ALF).
  • where D represents the distortion change of the current block after processing by the luma loop filter network model, that is, D = D_out − D_rec (D_out is the distortion after processing by the luma loop filter network model, and D_rec is the distortion before processing by the luma loop filter network model).
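Formula (2) is not reproduced in this excerpt, so the sketch below only assumes the switch is turned on when the distortion change D = D_out − D_rec plus the rate term λ·R is negative, that is, when applying the model lowers the overall cost; treat the exact criterion as an assumption:

```python
# Assumed sketch of a CTU-level switch decision using the symbols defined in
# the text: D = D_out - D_rec (negative when the model reduces distortion),
# lambda as the ALF lambda, and a per-block rate term.
def ctu_switch_on(d_out, d_rec, lam, rate_bits=1):
    d = d_out - d_rec                 # distortion change caused by the model
    return d + lam * rate_bits < 0    # on only when the model lowers the cost

on = ctu_switch_on(d_out=90.0, d_rec=100.0, lam=1.0)    # model helps: True
off = ctu_switch_on(d_out=100.0, d_rec=100.0, lam=1.0)  # no gain: False
```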
  • the filtering process on the current block using the loop filter network model may include: if the third rate-distortion cost value is smaller than the fourth rate-distortion cost value, using the luma loop filter network model to perform filtering processing on the current block.
  • For the luma component, two syntax elements are required: a frame-level syntax element and a CTU-level syntax element. Only when the CTU-level syntax element (that is, the second luma syntax element identification information) indicates that the current block uses the luma loop filter network model for filtering processing, that is, when the third rate-distortion cost value is smaller than the fourth rate-distortion cost value, can the luma loop filter network model be used to filter the current block. Only then is the current block allowed to use the preset selection network model for model selection, that is, the step of determining the value of the first syntax element identification information needs to be performed.
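The two-level gating for the luma component can be sketched as follows; the flag names follow luma_frame_flag and luma_ctu_flag from the text, while the filter itself is a stand-in function:

```python
# Sketch of two-level gating: the luma CNNLF model is applied to a block only
# when both the frame-level flag and the CTU-level flag indicate filtering.
def apply_luma_filter(block, luma_frame_flag, luma_ctu_flag, model=None):
    if luma_frame_flag == 1 and luma_ctu_flag == 1:
        return [model(x) for x in block]  # filter with the selected model
    return block                          # pass through unfiltered

identity_plus_one = lambda x: x + 1       # stand-in for the CNNLF model
filtered = apply_luma_filter([1, 2], 1, 1, identity_plus_one)   # [2, 3]
skipped = apply_luma_filter([1, 2], 1, 0, identity_plus_one)    # [1, 2]
```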
  • the method further includes:
  • the fifth rate-distortion cost is less than the sixth rate-distortion cost, determine that the value of the chroma syntax element identification information is the first value; and/or, if the fifth rate-distortion cost is greater than or equal to the sixth rate-distortion cost value, then determine the value of the chroma syntax element identification information as the second value.
  • the method further includes: encoding the value of the chroma syntax element identification information, and writing the encoded bits into the code stream.
  • The frame-level syntax element may be called chroma syntax element identification information; assuming the chroma syntax element identification information is flag information, it may be represented by chroma_frame_flag.
  • The first value can be set to 1 and the second value can be set to 0; or, the first value can be set to true and the second value can be set to false; or, the first value can be set to 0 and the second value can be set to 1; or, the first value can be set to false and the second value can be set to true.
  • the first value may be 1, and the second value may be 0, but this is not limited thereto.
  • If the chroma syntax element identification information indicates that the chroma component of the current frame uses the chroma loop filter network model for filtering, then the blocks included in the current frame use the chroma loop filter network model for filtering by default; if the chroma syntax element identification information indicates that the chroma component of the current frame does not use the chroma loop filter network model for filtering, then the blocks included in the current frame do not use the chroma loop filter network model for filtering by default. Therefore, it is no longer necessary to set a CTU-level syntax element for the chroma component, and similarly, it is not necessary to set a CTU-level switch.
  • the method may further include: setting a chroma frame-level switch, the chroma frame-level switch is used to control whether the chroma component of the current frame is filtered using the chroma loop filter network model;
  • the method may also include:
  • the chroma frame level switch is turned off.
  • the distortion value can also be determined according to the mean square error, and the other calculation operations are the same as those for calculating the first rate-distortion cost value and the second rate-distortion cost value, and will not be described in detail here.
  • The implementation of determining whether the chroma frame-level switch is turned on is the same as that of determining whether the luma frame-level switch is turned on, and will not be described in detail here.
  • the chroma frame-level switch can be turned on, and it can also be determined that the value of the chroma syntax element identification information is 1, which means that the chroma component of the current frame needs to be filtered using the chroma loop filter network model; after the processing of the current frame is completed, the next frame continues to be loaded for processing.
  • the chroma frame-level switch can be turned off, and it can also be determined that the value of the chroma syntax element identification information is 0, which means that the chroma component of the current frame does not need to be filtered using the chroma loop filter network model.
  • In this case, the next frame can be obtained from the video sequence and determined as the current frame, and the next frame is loaded for processing to determine the value of the syntax element identification information of the next frame.
  • the filtering process on the current block using the chroma loop filter network model may include: if the fifth rate-distortion cost value is less than the sixth rate-distortion cost value, using the chroma loop filter network model to perform filtering processing on the current block.
  • the loop filtering network model described in the embodiment of the present application may be a CNNLF model.
  • the selected CNNLF model can be used to perform CNNLF filtering on the current block to obtain a reconstructed image block of the current block.
  • The CNNLF model can involve two stages: offline training and inference testing.
  • 4 I-frame luminance component models, 4 non-I-frame luminance component models, 4 chroma U component models, 4 chroma V component models, and a total of 16 models can be trained offline.
  • a preset image data set such as DIV2K, which contains 1000 high-definition images (2K resolution), of which 800 are used for training, 100 for validation, and 100 for testing; the images are converted from RGB format into single-frame video sequences in YUV 4:2:0 format as label data.
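Preparing label data as described, RGB images converted to YUV before 4:2:0 chroma subsampling, might look like the following; the patent does not specify the conversion matrix, so a common BT.601 full-range matrix is assumed:

```python
# Hedged sketch of the per-pixel RGB -> YUV step of label preparation
# (BT.601 full-range coefficients assumed; after this, the U and V planes
# would be subsampled 2:1 horizontally and vertically to obtain 4:2:0).
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, u, v

y, u, v = rgb_to_yuv(255, 255, 255)   # white maps to max luma, neutral chroma
```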
  • a preset video dataset such as BVI-DVC
  • HPM-ModAI to encode under the Random Access configuration
  • turn off traditional filters such as DBF, SAO, and ALF
  • turn on the CNNLF model of I frames and collect the encoded and reconstructed
  • four non-I frame luminance component models were trained respectively.
  • HPM-ModAI sets a frame-level switch and a CTU-level switch for the luminance component to control whether to call the CNNLF model, and sets a frame-level switch for the chroma component to control whether to call the CNNLF model.
  • a switch can usually be represented by a flag.
  • a CTU-level switch is set to control whether to invoke the CNNLF model. Specifically, the CTU-level switch is determined by formula (2).
  • the encoder can determine whether the current frame or the current block uses the CNNLF model for filtering processing through the rate-distortion cost method, so as to determine the reconstruction of the current block Image blocks.
  • the input reconstructed image block may be obtained after being filtered by the deblocking filter and the sample adaptive compensation filter.
  • the method may further include: after determining the reconstructed image block of the current block, continue to filter the reconstructed image block by using an adaptive correction filter.
  • the input reconstructed image block is obtained after filtering through the deblocking filter (DBF) and the sample adaptive compensation filter (SAO), and then obtained through the model adaptive selection module and the CNNLF model
  • the reconstructed image block can also be input into an adaptive correction filter (ALF) to continue filtering processing.
  • DBF: deblocking filter
  • SAO: sample adaptive compensation filter
  • ALF: adaptive correction filter
  • the number of convolutional layers, the number of fully connected layers, the non-linear activation functions, etc. can all be tuned.
  • the loop filter network model targeted by the model adaptive selection module can also perform model adaptive selection for other efficient neural network filter models, which is not limited here.
  • the embodiment of the present application proposes a model adaptive selection module based on deep learning, which is used for adaptive selection of the CNNLF model to improve coding performance.
  • the model adaptive selection module can be regarded as a preset selection network model composed of a multi-layer convolutional neural network and a multi-layer fully connected neural network. Its input is the input reconstructed image block of the CNNLF model, and its output is the probability distribution of each CNNLF model being selected.
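A toy stand-in for this preset selection network is sketched below: a single linear layer followed by softmax, which reproduces the shape of the module's output, a probability distribution over the candidate CNNLF models. The weights and features are invented for the example; the real module uses multiple convolutional and fully connected layers:

```python
import math

# Toy stand-in for the model adaptive selection module: one linear layer plus
# softmax, producing a probability distribution over candidate CNNLF models.
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_probabilities(features, weights):
    # one score per candidate model: dot product of features with that row
    scores = [sum(f * w for f, w in zip(features, row)) for row in weights]
    return softmax(scores)

# Invented 2-d features and weights for three candidate models.
probs = select_probabilities([0.5, 1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```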
  • the position of the model adaptive selection module in the encoder/decoder is shown in Figure 5.
  • the use of the model adaptive selection module does not depend on the switches of DBF, SAO, ALF, or CNNLF, but it is positioned before CNNLF.
  • the preset filter sequence is DBF filter ⁇ SAO filter ⁇ model adaptive selection module ⁇ CNNLF filter ⁇ ALF filter.
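The preset filter order above can be sketched as a simple chain; each stage is a stand-in function that only records its position:

```python
# Sketch of the preset filter order. The model adaptive selection module only
# chooses which CNNLF model the CNNLF stage will use, so here every stage is
# modelled as a pass-through that records the order of execution.
order = []

def stage(name):
    def run(x):
        order.append(name)
        return x
    return run

pipeline = [stage("DBF"), stage("SAO"),
            stage("model_adaptive_selection"), stage("CNNLF"), stage("ALF")]

block = object()
for f in pipeline:
    block = f(block)   # run the chain on a dummy block
```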
  • First, judge according to model_adaptive_selection_enable_flag whether the model adaptive selection module is allowed to be used for model selection for the current block. If model_adaptive_selection_enable_flag is "1", try to perform model adaptive selection module processing on the current block and jump to (b); if model_adaptive_selection_enable_flag is "0", jump to (e);
  • the input reconstructed luma image block of the CNNLF model is used as the input of the model adaptive selection module, and the output is the probability distribution of each luma CNNLF model. The model with the largest probability value is selected as the CNNLF model of the current luma image block, and CNNLF filtering is performed on the current luma image block to obtain the final reconstructed image block;
  • the input reconstructed chroma image block of the CNNLF model is used as the input of the model adaptive selection module, and the output is the probability distribution of each chroma CNNLF model. The model with the largest probability value is selected as the CNNLF model of the current chroma image block, and CNNLF filtering is performed on the current chroma image block to obtain the final reconstructed image block;
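These steps can be sketched from the decoder's perspective as follows; the selector and the CNNLF models are stand-ins, and the fall-through branch (e) is modelled as a pass-through without the selection module:

```python
# Hedged sketch of the branching described in steps (a)-(e): when the enable
# flag is 1, the selection network chooses the CNNLF model with the largest
# probability; otherwise the block is processed without the selection module.
def decode_block(block, model_adaptive_selection_enable_flag, models, selector):
    if model_adaptive_selection_enable_flag != 1:
        return block                       # branch (e): module not used
    probs = selector(block)                # selection network on the input block
    chosen = probs.index(max(probs))       # model with the largest probability
    return models[chosen](block)           # CNNLF filtering with the chosen model

models = [lambda b: b + "-m0", lambda b: b + "-m1"]   # stand-in CNNLF models
selector = lambda b: [0.3, 0.7]                       # stand-in selection output
out = decode_block("blk", 1, models, selector)        # model 1 is chosen
```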
  • This embodiment provides an encoding method, which is applied to an encoder.
  • The method includes: determining the value of the first syntax element identification information; when the first syntax element identification information indicates that the current block is allowed to use the preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, the loop filter network model used by the current block; and using the loop filter network model to filter the current block to obtain the reconstructed image block of the current block.
  • In this way, the preset selection network model is used to select among at least one candidate loop filter network model, and the current block is then filtered according to the selected loop filter network model. Not only can the coding performance, and thus the coding and decoding efficiency, be improved, but the finally output reconstructed image block can also be closer to the original image block, improving video image quality.
  • The embodiment of the present application provides a code stream, which is generated by performing bit coding according to information to be encoded; wherein the information to be encoded includes at least one of the following: the value of the first syntax element identification information, the value of the second syntax element identification information, the value of the first luma syntax element identification information, the value of the second luma syntax element identification information, and the value of the chroma syntax element identification information.
  • the video sequence includes a current frame, and the current frame includes a current block.
  • the first syntax element identification information is used to indicate whether the current block is allowed to use the preset selection network model for model selection
  • the second syntax element identification information indicates whether the video sequence uses the loop filter network model for filtering processing
  • the first luma syntax element identification information is used to indicate whether the luma component of the current frame is filtered using the luma loop filter network model
  • the second luminance syntax element identification information is used to indicate whether the luminance component of the current block is filtered using the luminance loop filtering network model.
  • the chroma syntax element identification information is used to indicate whether the chroma component of the current frame is filtered using the chroma in-loop filter network model.
  • FIG. 12 shows a schematic structural diagram of an encoder 120 provided in the embodiment of the present application.
  • the encoder 120 may include: a first determining unit 1201, a first selecting unit 1202, and a first filtering unit 1203; wherein,
  • the first determining unit 1201 is configured to determine the value of the first syntax element identification information
  • the first selection unit 1202 is configured to determine the preset selection network model of the current block when the first syntax element identification information indicates that the current block is allowed to use the preset selection network model for model selection, and to determine, according to the preset selection network model, the loop filter network model used by the current block;
  • the first filtering unit 1203 is configured to use a loop filtering network model to perform filtering processing on the current block to obtain a reconstructed image block of the current block.
  • the first selection unit 1202 is further configured to determine, according to the preset selection network model, the output values corresponding to at least one candidate loop filter network model; and to determine, according to the output values corresponding to the at least one candidate loop filter network model, the loop filter network model used by the current block.
  • the first determination unit 1201 is further configured to determine the input reconstructed image block of the loop filter network model; and input the input reconstructed image block into the preset selection network model to obtain at least one candidate loop filter network model the corresponding output value.
  • the first determining unit 1201 is further configured to determine the target value from the output values corresponding to at least one candidate loop filter network model, and to use the candidate loop filter network model corresponding to the target value as the loop filter network model used by the current block.
  • the first determining unit 1201 is further configured to select a maximum value from output values corresponding to at least one candidate loop filter network model, and use the maximum value as a target value.
  • the encoder 120 may further include an encoding unit 1204;
  • the first determining unit 1201 is further configured to determine the index number of the loop filtering network model corresponding to the loop filtering network model;
  • the encoding unit 1204 is configured to encode the index number of the loop filter network model, and write the encoded bits into the code stream.
  • the first determining unit 1201 is further configured to determine that the value of the first syntax element identification information is the first value if the current block allows model selection using a preset selection network model; or, if the current block If it is not allowed to use the preset selection network model for model selection, it is determined that the value of the identification information of the first syntax element is the second value.
  • the encoding unit 1204 is further configured to encode the value of the identification information of the first syntax element, and write the encoded bits into the code stream.
  • the video sequence includes a current frame, and the current frame includes a current block; correspondingly, the first determining unit 1201 is further configured to determine the second syntax element if the video sequence is determined to use a loop filtering network model for filtering processing The value of the identification information is the first value; or, if the video sequence is determined not to use the in-loop filtering network model for filtering processing, the value of the identification information of the second syntax element is determined to be the second value.
  • the encoding unit 1204 is further configured to encode the value of the second syntax element identification information, and write the encoded bits into the code stream.
  • the first determining unit 1201 is further configured to determine the first rate-distortion cost value of filtering the luma component of the current frame using the luma loop filter network model; and to determine the second rate-distortion cost value of not filtering the luma component of the current frame using the luma loop filter network model; and if the first rate-distortion cost value is less than the second rate-distortion cost value, to determine that the value of the first luma syntax element identification information is the first value; and/or, if the first rate-distortion cost value is greater than or equal to the second rate-distortion cost value, to determine that the value of the first luma syntax element identification information is the second value.
  • the encoding unit 1204 is further configured to encode the value of the first luma syntax element identification information, and write the encoded bits into the code stream.
  • the first determining unit 1201 is further configured to determine, when the first rate-distortion cost value is smaller than the second rate-distortion cost value, the third rate-distortion cost value of filtering the luma component of the current block using the luma loop filter network model; and to determine the fourth rate-distortion cost value of not filtering the luma component of the current block using the luma loop filter network model; and if the third rate-distortion cost value is less than the fourth rate-distortion cost value, to determine that the value of the second luma syntax element identification information is the first value; and/or, if the third rate-distortion cost value is greater than or equal to the fourth rate-distortion cost value, to determine that the value of the second luma syntax element identification information is the second value.
  • the encoding unit 1204 is further configured to encode the value of the second luma syntax element identification information, and write the encoded bits into the code stream.
  • the first determination unit 1201 is further configured to use the luma loop filter network model to perform filtering on the current block if the third rate-distortion cost is smaller than the fourth rate-distortion cost.
  • the first determining unit 1201 is further configured to determine, when the color component type of the current frame is a chroma component, the fifth rate-distortion cost value of filtering the chroma component of the current frame using the chroma loop filter network model; and to determine the sixth rate-distortion cost value of not filtering the chroma component of the current frame using the chroma loop filter network model; and if the fifth rate-distortion cost value is less than the sixth rate-distortion cost value, to determine that the value of the chroma syntax element identification information is the first value; and/or, if the fifth rate-distortion cost value is greater than or equal to the sixth rate-distortion cost value, to determine that the value of the chroma syntax element identification information is the second value.
  • the encoding unit 1204 is further configured to encode the value of the chroma syntax element identification information, and write the encoded bits into the code stream.
  • the first determining unit 1201 is further configured to use the chroma loop filter network model to perform filtering on the current block if the fifth rate-distortion cost value is smaller than the sixth rate-distortion cost value.
  • the first determining unit 1201 is further configured to determine the brightness selection network model of the current block when the color component type of the current block is a brightness component; and when the color component type of the current block is a chroma component, Determine the chroma selection network model for the current block;
  • the first determining unit 1201 is further configured to determine the output values corresponding to the at least one candidate luma loop filter network model according to the luma selection network model when the color component type of the current block is a luma component; and determine the output values corresponding to the at least one candidate chroma loop filter network model according to the chroma selection network model when the color component type of the current block is a chroma component.
  • the encoder 120 may further include a first training unit 1205;
  • the first determining unit 1201 is further configured to determine a first training set, wherein the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter;
  • the first training unit 1205 is configured to use the luminance components of the training samples in the first training set to train the first neural network structure to obtain at least one candidate luminance loop filter network model.
  • the first neural network structure includes a first convolution module, a first residual module, a second convolution module and a first connection module; the first convolution module, the first residual module, the second convolution module and the first connection module are connected in sequence, and the first connection module is also connected to the input of the first convolution module.
  • the first convolution module consists of one convolutional layer and one activation layer;
  • the second convolution module consists of two convolutional layers and one activation layer;
  • the first connection module consists of a skip-connection layer;
  • the first residual module includes several residual blocks, where each residual block consists of two convolutional layers and one activation layer.
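The dataflow of the first neural network structure (convolution, stacked residual blocks with skip connections, then a global skip back to the input) can be illustrated with a toy forward pass. Everything below is an illustrative assumption: the "convolutions" are scalar multiplications, and the weights are hand-picked, purely to show how the residual and connection modules compose:

```python
import numpy as np

def relu(x):
    # the activation layer used throughout this sketch
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # two convolutional layers (modelled here as scalar weights) plus one
    # activation layer, with the residual block's own skip connection
    return x + w2 * relu(w1 * x)

def first_network_forward(x, n_res_blocks=2):
    """Toy sketch of the first neural network structure:
    first convolution module (conv + activation)
    -> first residual module (several residual blocks)
    -> second convolution module (two convs + activation)
    -> first connection module (skip connection back to the network input).
    Real models convolve 2-D image blocks; scalars are used here only to
    make the dataflow executable and inspectable.
    """
    y = relu(1.0 * x)                     # first convolution module
    for _ in range(n_res_blocks):         # first residual module
        y = residual_block(y, 1.0, 0.0)   # identity weights for the sketch
    y = 1.0 * relu(1.0 * y)               # second convolution module
    return y + x                          # first connection module (skip)
```

With the identity-like weights chosen here, a non-negative input simply doubles, which makes the global skip connection visible in the output.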
  • the first determining unit 1201 is further configured to determine a first training set, wherein the first training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter;
  • the first training unit 1205 is further configured to use the luminance component and chrominance component of the training samples in the first training set to train the second neural network structure to obtain at least one candidate chrominance loop filter network model.
  • the second neural network structure includes an upsampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module;
  • the upsampling module is connected to the third convolution module;
  • the third convolution module and the fourth convolution module are both connected to the fusion module;
  • the fusion module, the second residual module, the fifth convolution module and the second connection module are connected in sequence;
  • the second connection module is also connected to the input of the upsampling module.
  • the third convolution module consists of one convolutional layer and one activation layer;
  • the fourth convolution module consists of one convolutional layer and one activation layer;
  • the fifth convolution module consists of two convolutional layers, one activation layer and one pooling layer;
  • the second connection module consists of a skip-connection layer;
  • the second residual module includes several residual blocks, where each residual block consists of two convolutional layers and one activation layer.
  • the first selection unit 1202 is further configured to determine at least one candidate luma selection network model when the color component type of the current block is a luma component; and determine the quantization parameter of the current block, and select, from the at least one candidate luma selection network model, a candidate luma selection network model corresponding to the quantization parameter;
  • the first determining unit 1201 is further configured to determine the selected candidate luma selection network model as the luma selection network model of the current block.
  • the first selection unit 1202 is further configured to determine at least one candidate chroma selection network model when the color component type of the current block is a chroma component; and determine the quantization parameter of the current block, and select, from the at least one candidate chroma selection network model, a candidate chroma selection network model corresponding to the quantization parameter;
  • the first determining unit 1201 is further configured to determine the selected candidate chroma selection network model as the chroma selection network model of the current block.
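The quantization-parameter-based choice of a candidate selection network model amounts to a lookup keyed by QP. A minimal sketch follows; the dictionary keying and the nearest-QP fallback are illustrative assumptions (the description only states that a candidate corresponds to the quantization parameter, not how a missing QP is handled):

```python
def select_model_by_qp(candidate_models: dict, qp: int):
    """Pick the candidate selection network model corresponding to the
    current block's quantization parameter.

    `candidate_models` maps assumed training QPs to models. If the exact
    QP was not trained for, fall back to the nearest trained QP — an
    illustrative policy, not one mandated by the description above.
    """
    if qp in candidate_models:
        return candidate_models[qp]
    nearest_qp = min(candidate_models, key=lambda trained: abs(trained - qp))
    return candidate_models[nearest_qp]
```

For example, with candidates trained at QPs 27, 32 and 38, a block coded at QP 34 would fall back to the QP-32 model under this policy.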
  • the first determining unit 1201 is further configured to determine a second training set, wherein the second training set includes at least one training sample, and the training sample is obtained according to at least one quantization parameter;
  • the first training unit 1205 is also configured to use the luma components of the training samples in the second training set to train the third neural network structure to obtain at least one candidate luma selection network model; and use the chroma components of the training samples in the second training set to train the third neural network structure to obtain at least one candidate chroma selection network model; wherein the at least one candidate luma selection network model has a correspondence with the luma component and the quantization parameter, and the at least one candidate chroma selection network model has a correspondence with the chroma component and the quantization parameter.
  • the third neural network structure includes a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence; wherein the sixth convolution module includes several convolutional sub-modules, and each convolutional sub-module consists of one convolutional layer and one pooling layer; the fully connected module includes several fully connected sub-modules, and each fully connected sub-module consists of one fully connected layer and one activation layer.
  • the first determining unit 1201 is further configured to: when the color component type of the current block is the luma component, determine the input reconstructed luma image block of the luma loop filter network model; input the reconstructed luma image block into the luma selection network model to obtain the output values corresponding to the at least one candidate luma loop filter network model; determine a target value from the output values corresponding to the at least one candidate luma loop filter network model, and use the candidate luma loop filter network model corresponding to the target value as the luma loop filter network model used by the current block; or, when the color component type of the current block is a chroma component, determine the input reconstructed chroma image block of the chroma loop filter network model; input the reconstructed chroma image block into the chroma selection network model to obtain the output values corresponding to the at least one candidate chroma loop filter network model; determine a target value from the output values corresponding to the at least one candidate chroma loop filter network model, and use the candidate chroma loop filter network model corresponding to the target value as the chroma loop filter network model used by the current block.
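The selection step reduces to an argmax over the selection network's outputs. The decoder-side description elsewhere in this document specifies the target value as the maximum output value, and this sketch assumes the same; the function name is illustrative:

```python
def choose_loop_filter_model(output_values, candidate_models):
    """Given one output value per candidate loop filter network model
    (as produced by the selection network for the reconstructed image
    block), take the maximum output value as the target value and return
    the corresponding candidate as the model used by the current block.
    """
    target_index = max(range(len(output_values)),
                       key=lambda i: output_values[i])
    return candidate_models[target_index]
```

The same routine serves both the luma and chroma paths; only the candidate list and selection network differ.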
  • the loop filtering network model is a CNNLF model.
  • the input reconstructed image block is obtained after filtering by a deblocking filter and a sample adaptive offset (SAO) filter.
  • the first filtering unit 1203 is further configured to perform filtering processing on the reconstructed image block by using an adaptive correction filter after the reconstructed image block is determined.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules. Wherein, if the integrated units are implemented in the form of software function modules and are not sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
  • the embodiment of the present application provides a computer storage medium, which is applied to the encoder 120, and the computer storage medium stores a computer program, and when the computer program is executed by the first processor, it implements any one of the preceding embodiments. Methods.
  • FIG. 13 shows a schematic diagram of a specific hardware structure of the encoder 120 provided by the embodiment of the present application.
  • it may include: a first communication interface 1301 , a first memory 1302 and a first processor 1303 ; each component is coupled together through a first bus system 1304 .
  • the first bus system 1304 includes not only a data bus, but also a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the first bus system 1304 in FIG. 13. Among them:
  • the first communication interface 1301 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the first memory 1302 is used to store computer programs that can run on the first processor 1303;
  • the first processor 1303 is configured to, when running the computer program, execute:
  • when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determine the preset selection network model of the current block, and determine the loop filter network model used by the current block according to the preset selection network model;
  • a loop filtering network model is used to filter the current block to obtain a reconstructed image block of the current block.
  • the first memory 1302 in the embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
  • the first memory 1302 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • the first processor 1303 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the first processor 1303 or an instruction in the form of software.
  • the above-mentioned first processor 1303 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a ready-made programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • The processor may implement or execute the various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the first memory 1302, and the first processor 1303 reads the information in the first memory 1302, and completes the steps of the above method in combination with its hardware.
  • the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
  • the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processor (Digital Signal Processing, DSP), digital signal processing device (DSP Device, DSPD), programmable Logic device (Programmable Logic Device, PLD), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), general-purpose processor, controller, microcontroller, microprocessor, other devices for performing the functions described in this application electronic unit or its combination.
  • the techniques described herein can be implemented through modules (eg, procedures, functions, and so on) that perform the functions described herein.
  • Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
  • the first processor 1303 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an encoder, which may include a first determining unit, a first selecting unit, and a first filtering unit.
  • In this way, not only can the coding and decoding efficiency be improved, but the final output reconstructed image block can also be made closer to the original image block, thereby improving the video image quality.
  • FIG. 14 shows a schematic diagram of the composition and structure of a decoder 140 provided by the embodiment of the present application.
  • the decoder 140 may include: an analysis unit 1401, a second selection unit 1402, and a second filtering unit 1403; wherein,
  • the parsing unit 1401 is configured to parse the code stream and determine the value of the first syntax element identification information
  • the second selection unit 1402 is configured to determine the preset selection network model of the current block when the first syntax element identification information indicates that the current block uses the preset selection network model for model selection, and determine the loop filter network model used by the current block according to the preset selection network model;
  • the second filtering unit 1403 is configured to use a loop filtering network model to perform filtering processing on the current block to obtain a reconstructed image block of the current block.
  • the second selection unit 1402 is further configured to determine the output values corresponding to the at least one candidate loop filter network model according to the preset selection network model; and determine, according to the output values corresponding to the at least one candidate loop filter network model, the loop filter network model used by the current block.
  • the decoder 140 may further include a second determining unit 1404 configured to determine an input reconstructed image block of the loop filtering network model;
  • the second selection unit 1402 is further configured to input the input reconstructed image block into a preset selection network model to obtain respective output values corresponding to at least one candidate loop filtering network model.
  • the second determination unit 1404 is further configured to determine the target value from the corresponding output values of at least one candidate loop filter network model, and use the candidate loop filter network model corresponding to the target value as the current block Loop filter network model.
  • the second determining unit 1404 is further configured to select a maximum value from output values corresponding to each of the at least one candidate loop filter network model, and use the maximum value as the target value.
  • the second determination unit 1404 is further configured to determine that the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection if the value of the first syntax element identification information is the first value; or, determine that the first syntax element identification information indicates that the current block is not allowed to use a preset selection network model for model selection if the value of the first syntax element identification information is the second value.
  • the parsing unit 1401 is further configured to parse the code stream to determine the value of the second syntax element identification information; and when the second syntax element identification information indicates that the video sequence uses a loop filter network model for filtering processing, Parsing the code stream to determine the value of the third syntax element identification information; wherein, the third syntax element identification information is used to indicate whether the current frame in the video sequence uses a loop filter network model for filtering processing, and the current frame includes the current block.
  • the second determining unit 1404 is further configured to, if the value of the second syntax element identification information is the first value, determine that the second syntax element identification information indicates that the video sequence is filtered using a loop filtering network model ; or, if the value of the second syntax element identification information is the second value, it is determined that the second syntax element identification information indicates that the video sequence does not use the loop filtering network model for filtering processing.
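The sequence-level and frame-level flags described above nest: the frame-level syntax element only matters when the sequence-level one enables the loop filter network model. A minimal decoder-side sketch (function and parameter names are illustrative assumptions):

```python
def parse_model_flags(sequence_flag: int, frame_flag: int,
                      first_value: int = 1) -> bool:
    """Decoder-side sketch of the nested signalling: the frame-level
    syntax element identification information is only meaningful when
    the sequence-level syntax element equals the first value, i.e. the
    video sequence uses the loop filter network model at all.

    Returns True when the current frame should be filtered with the
    loop filter network model.
    """
    if sequence_flag != first_value:
        return False  # model disabled for the whole video sequence
    return frame_flag == first_value
```

The block-level flag chains onto the frame-level result in the same way, giving a short-circuiting sequence → frame → block hierarchy.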
  • the parsing unit 1401 is further configured to parse the code stream to obtain the first luma syntax element identification information corresponding to the luma component of the current frame, where the first luma syntax element identification information is used to indicate whether the luma component of the current frame uses the luma loop filter network model for filtering processing; or, parse the code stream to obtain the chroma syntax element identification information corresponding to the chroma component of the current frame, where the chroma syntax element identification information is used to indicate whether the chroma component of the current frame uses the chroma loop filter network model for filtering processing.
  • the parsing unit 1401 is further configured to, for the luma component of the current frame, parse the code stream to determine the value of the second luma syntax element identification information when the first luma syntax element identification information indicates that the luma component of the current frame is filtered using the luma loop filter network model; and perform the step of parsing the code stream to determine the value of the first syntax element identification information when the second luma syntax element identification information indicates that the luma component of the current block is filtered using the luma loop filter network model.
  • the second determination unit 1404 is further configured to determine that the first luma syntax element identification information indicates that the luma component of the current frame uses the luma loop filter network model for filtering processing if the value of the first luma syntax element identification information is the first value; or, determine that the first luma syntax element identification information indicates that the luma component of the current frame does not use the luma loop filter network model for filtering processing if the value is the second value.
  • the second determining unit 1404 is further configured to determine that the second luma syntax element identification information indicates that the luma component of the current block uses the luma loop filter network model for filtering processing if the value of the second luma syntax element identification information is the first value; or, determine that the second luma syntax element identification information indicates that the luma component of the current block does not use the luma loop filter network model for filtering processing if the value is the second value.
  • the parsing unit 1401 is further configured to, for the chroma component of the current frame, perform the step of parsing the code stream to determine the value of the first syntax element identification information when the chroma syntax element identification information indicates that the chroma component of the current frame is filtered using the chroma loop filter network model.
  • the second determining unit 1404 is further configured to determine that the chroma syntax element identification information indicates that the chroma component of the current frame uses the chroma loop filter network model for filtering processing if the value of the chroma syntax element identification information is the first value; or, determine that the chroma syntax element identification information indicates that the chroma component of the current frame does not use the chroma loop filter network model for filtering processing if the value is the second value.
  • the second determination unit 1404 is further configured to determine the luma selection network model of the current block when the color component type of the current block is a luma component; and determine the chroma selection network model of the current block when the color component type of the current block is a chroma component;
  • the second determining unit 1404 is further configured to determine the output values corresponding to the at least one candidate luma loop filter network model according to the luma selection network model when the color component type of the current block is a luma component; and determine the output values corresponding to the at least one candidate chroma loop filter network model according to the chroma selection network model when the color component type of the current block is a chroma component.
  • the at least one candidate luma loop filter network model is determined by performing model training on the first neural network structure according to at least one training sample, and the at least one candidate luma loop filter network model has a correspondence with the color component type and the quantization parameter.
  • the first neural network structure includes a first convolution module, a first residual module, a second convolution module and a first connection module, the first convolution module, the first residual module, the second convolution module The convolution module and the first connection module are connected in sequence, and the first connection module is also connected to the input of the first convolution module.
  • the first convolutional module consists of one convolutional layer and one activation layer
  • the second convolutional module consists of two convolutional layers and one activation layer
  • the first connection module consists of a skip-connection layer
  • the first residual module includes several residual blocks
  • the residual block consists of two convolutional layers and one activation layer.
  • the at least one candidate chroma loop filter network model is determined by performing model training on the second neural network structure according to at least one training sample, and the at least one candidate chroma loop filter network model has a correspondence with the color component type and the quantization parameter.
  • the second neural network structure includes an upsampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module; the upsampling module is connected to the third convolution module; the third convolution module and the fourth convolution module are both connected to the fusion module; the fusion module, the second residual module, the fifth convolution module and the second connection module are connected in sequence; and the second connection module is also connected to the input of the upsampling module.
  • the third convolution module consists of one convolutional layer and one activation layer;
  • the fourth convolution module consists of one convolutional layer and one activation layer;
  • the fifth convolution module consists of two convolutional layers, one activation layer and one pooling layer;
  • the second connection module consists of a skip-connection layer;
  • the second residual module includes several residual blocks, where each residual block consists of two convolutional layers and one activation layer.
  • the second selection unit 1402 is further configured to determine at least one candidate luma selection network model when the color component type of the current block is a luma component; and determine the quantization parameter of the current block, and select, from the at least one candidate luma selection network model, a candidate luma selection network model corresponding to the quantization parameter;
  • the second determining unit 1404 is further configured to determine the selected candidate luma selection network model as the luma selection network model of the current block.
  • the second selection unit 1402 is further configured to determine at least one candidate chroma selection network model when the color component type of the current block is a chroma component; and determine the quantization parameter of the current block, and select, from the at least one candidate chroma selection network model, a candidate chroma selection network model corresponding to the quantization parameter;
  • the second determining unit 1404 is further configured to determine the selected candidate chroma selection network model as the chroma selection network model of the current block.
  • the at least one candidate luma selection network model and the at least one candidate chroma selection network model are respectively determined by performing model training on the third neural network structure according to at least one training sample, and the at least one candidate luma selection network model and the at least one candidate chroma selection network model each have a correspondence with the color component type and the quantization parameter.
  • the third neural network structure includes a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence; wherein, the sixth convolution module includes several convolution sub-modules, The convolutional sub-module consists of a convolutional layer and a pooling layer; the fully-connected module includes several fully-connected sub-modules, and the fully-connected sub-module consists of a fully-connected layer and an activation layer.
  • the second determination unit 1404 is further configured to: when the color component type of the current block is the luma component, determine the input reconstructed luma image block of the luma loop filter network model; input the reconstructed luma image block into the luma selection network model to obtain the output values corresponding to the at least one candidate luma loop filter network model; determine a target value from the output values corresponding to the at least one candidate luma loop filter network model, and use the candidate luma loop filter network model corresponding to the target value as the luma loop filter network model used by the current block; or, when the color component type of the current block is a chroma component, determine the input reconstructed chroma image block of the chroma loop filter network model; input the reconstructed chroma image block into the chroma selection network model to obtain the output values corresponding to the at least one candidate chroma loop filter network model; determine a target value from the output values corresponding to the at least one candidate chroma loop filter network model, and use the candidate chroma loop filter network model corresponding to the target value as the chroma loop filter network model used by the current block.
  • the parsing unit 1401 is configured to parse the code stream and determine the loop filter network model index number when the first syntax element identification information indicates that the current block allows model selection using a preset selection network model;
  • the second determination unit 1404 is further configured to determine the loop filter network model used by the current block from at least one candidate loop filter network model according to the index number of the loop filter network model;
  • the second filtering unit 1403 is further configured to use a loop filtering network model to perform filtering processing on the current block to obtain a reconstructed image block of the current block.
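On this explicit-index path the decoder skips selection-network inference entirely: it reads the loop filter network model index number from the bitstream and indexes into its candidate list. A minimal sketch (the function name and the range check are illustrative assumptions):

```python
def model_from_index(candidate_models, model_index: int):
    """Select the loop filter network model used by the current block
    from the list of candidate loop filter network models, using the
    index number parsed from the code stream.

    The bounds check guards against a corrupt or out-of-range index;
    how a real decoder handles that error is not specified here.
    """
    if not 0 <= model_index < len(candidate_models):
        raise ValueError("loop filter network model index out of range")
    return candidate_models[model_index]
```

The selected model is then applied to the current block to produce the reconstructed image block, as the filtering unit above describes.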
  • the loop filtering network model is a CNNLF model.
  • the input reconstructed image block is obtained after filtering by a deblocking filter and a sample adaptive offset (SAO) filter.
  • the second filtering unit 1403 is further configured to, after determining the reconstructed image block, use an adaptive correction filter to perform filtering processing on the reconstructed image block.
  • a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, or it may be non-modular.
  • the components in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software functional module. Wherein, if the integrated unit is implemented as a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, applied to the decoder 140, which stores a computer program that, when executed by the second processor, implements the method described in any one of the preceding embodiments.
  • FIG. 15 shows a schematic diagram of a specific hardware structure of the decoder 140 provided by the embodiment of the present application.
  • it may include: a second communication interface 1501 , a second memory 1502 and a second processor 1503 ; all components are coupled together through a second bus system 1504 .
  • the second bus system 1504 is used to realize connection and communication between these components.
  • the second bus system 1504 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as the second bus system 1504 in FIG. 15. Wherein,
  • the second communication interface 1501 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the second memory 1502 is used to store computer programs that can run on the second processor 1503;
  • the second processor 1503 is configured to, when running the computer program, execute:
  • parsing the bitstream and determining the value of the first syntax element identification information; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, the loop filter network model used by the current block;
  • filtering the current block using the loop filter network model to obtain a reconstructed image block of the current block.
  • the second processor 1503 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the second memory 1502 is similar to that of the first memory 1302
  • the hardware function of the second processor 1503 is similar to that of the first processor 1303 ; details will not be described here.
  • This embodiment provides a decoder, which may include an analysis unit, a second selection unit, and a second filter unit.
  • the bitstream is parsed and the value of the first syntax element identification information is determined; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, the preset
  • selection network model of the current block is determined, and the loop filter network model used by the current block is determined according to the preset selection network model; the current block is filtered using the loop filter network model to obtain a reconstructed image block of the current block.
  • in this way, the preset selection network model is used to select among at least one candidate loop filter network model, and the current block is then filtered according to the selected loop filter network model; this not only improves coding performance and thus coding/decoding efficiency, but also makes the finally output reconstructed image block closer to the original image block, improving video picture quality.
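The select-then-filter behavior summarized above can be sketched in a few lines: the selection network yields one output value (a probability) per candidate loop-filter network model, and the candidate with the largest value filters the current block. This is a minimal illustration only; the function names and the toy two-model setup are assumptions, not part of the specification.

```python
def select_and_filter(block, candidate_filters, selection_net):
    """Pick the candidate filter with the highest selection-network score and apply it."""
    scores = selection_net(block)                     # one score per candidate model
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_filters[best](block), best

# Toy stand-ins: two "filter models" and a fixed selection network (illustrative only).
filters = [lambda b: [x + 1 for x in b], lambda b: [x * 2 for x in b]]
sel_net = lambda b: [0.3, 0.7]                        # model 1 gets the larger probability
filtered, idx = select_and_filter([1, 2, 3], filters, sel_net)
```

In the real scheme the scores come from a trained CNN and the candidate filters are CNNLF models; the argmax rule is the same.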

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose an encoding/decoding method, a bitstream, an encoder, a decoder, and a storage medium. The method includes: parsing a bitstream and determining the value of first syntax element identification information; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, the loop filter network model used by the current block; and filtering the current block using the loop filter network model to obtain a reconstructed image block of the current block. In this way, by introducing deep-learning-based adaptive model selection, the finally output reconstructed image block is closer to the original image block, which can improve coding performance and thus coding/decoding efficiency.

Description

Encoding and Decoding Method, Bitstream, Encoder, Decoder, and Storage Medium — Technical Field
Embodiments of the present application relate to the technical field of image processing, and in particular to an encoding/decoding method, a bitstream, an encoder, a decoder, and a storage medium.
Background
In video codec systems, loop filters are used to improve the subjective and objective quality of reconstructed pictures. Conventional loop filters mainly include the deblocking filter, the sample adaptive offset filter, and the adaptive loop filter. The High Performance-Modular Artificial Intelligence Model (HPM-ModAI) of the 3rd Audio Video coding Standard (AVS3) additionally adopts a convolutional-neural-network-based in-loop filter (hereinafter, CNNLF) as the baseline scheme of its intelligent loop-filter module, placed between the sample adaptive offset filter and the adaptive loop filter.
In the related art, although some model selection schemes exist, most of them select a better-performing model for filtering by computing the rate-distortion cost of each candidate model, and this selection process has high complexity.
Summary
Embodiments of the present application provide an encoding/decoding method, a bitstream, an encoder, a decoder, and a storage medium, which can not only reduce complexity but also improve coding performance and thus coding/decoding efficiency.
The technical solutions of the embodiments of the present application may be implemented as follows:
In a first aspect, an embodiment of the present application provides a decoding method applied to a decoder, the method including:
parsing a bitstream and determining the value of first syntax element identification information;
when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, the loop filter network model used by the current block; and
filtering the current block using the loop filter network model to obtain a reconstructed image block of the current block.
In a second aspect, an embodiment of the present application provides an encoding method applied to an encoder, the method including:
determining the value of first syntax element identification information;
when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, the loop filter network model used by the current block; and
filtering the current block using the loop filter network model to obtain a reconstructed image block of the current block.
In a third aspect, an embodiment of the present application provides a bitstream generated by bit-encoding information to be encoded, where the information to be encoded includes at least one of: the value of first syntax element identification information, the value of second syntax element identification information, the value of first luma syntax element identification information, the value of second luma syntax element identification information, and the value of chroma syntax element identification information;
where the first syntax element identification information indicates whether the current block is allowed to use a preset selection network model for model selection, the second syntax element identification information indicates whether the video sequence uses a loop filter network model for filtering, the first luma syntax element identification information indicates whether the luma component of the current frame uses a luma loop filter network model for filtering, the second luma syntax element identification information indicates whether the luma component of the current block uses a luma loop filter network model for filtering, and the chroma syntax element identification information indicates whether the chroma component of the current frame uses a chroma loop filter network model for filtering; the video sequence includes the current frame, and the current frame includes the current block.
In a fourth aspect, an embodiment of the present application provides an encoder including a first determination unit, a first selection unit, and a first filtering unit, where:
the first determination unit is configured to determine the value of first syntax element identification information;
the first selection unit is configured to, when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determine the preset selection network model of the current block and determine, according to the preset selection network model, the loop filter network model used by the current block;
the first filtering unit is configured to filter the current block using the loop filter network model to obtain a reconstructed image block of the current block.
In a fifth aspect, an embodiment of the present application provides an encoder including a first memory and a first processor, where:
the first memory is configured to store a computer program runnable on the first processor;
the first processor is configured to perform the method of the second aspect when running the computer program.
In a sixth aspect, an embodiment of the present application provides a decoder including a parsing unit, a second selection unit, and a second filtering unit, where:
the parsing unit is configured to parse a bitstream and determine the value of first syntax element identification information;
the second selection unit is configured to, when the first syntax element identification information indicates that the current block uses a preset selection network model for model selection, determine the preset selection network model of the current block and determine, according to the preset selection network model, the loop filter network model used by the current block;
the second filtering unit is configured to filter the current block using the loop filter network model to obtain a reconstructed image block of the current block.
In a seventh aspect, an embodiment of the present application provides a decoder including a second memory and a second processor, where:
the second memory is configured to store a computer program runnable on the second processor;
the second processor is configured to perform the method of the first aspect when running the computer program.
In an eighth aspect, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed, implements the method of the first aspect or the method of the second aspect.
Embodiments of the present application provide an encoding/decoding method, a bitstream, an encoder, a decoder, and a storage medium. On the encoder side, the value of first syntax element identification information is determined; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, the preset selection network model of the current block is determined and the loop filter network model used by the current block is determined according to it; the current block is filtered using the loop filter network model to obtain a reconstructed image block of the current block. On the decoder side, the bitstream is parsed to determine the value of the first syntax element identification information, and the same subsequent steps are performed. In this way, by introducing deep-learning-based adaptive model selection — using the preset selection network model to select among at least one candidate loop filter network model and then filtering the current block according to the selected loop filter network model — coding performance and thus coding/decoding efficiency can be improved, and the finally output reconstructed image block is made closer to the original image block, improving video picture quality.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an encoding framework provided in the related art;
FIG. 2 is a schematic diagram of another encoding framework provided in the related art;
FIG. 3A is a detailed framework diagram of a video encoding system provided by an embodiment of the present application;
FIG. 3B is a detailed framework diagram of a video decoding system provided by an embodiment of the present application;
FIG. 4 is a flowchart of a decoding method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an encoding framework provided by an embodiment of the present application;
FIG. 6A is a schematic diagram of the network structure of a luma loop filter network model provided by an embodiment of the present application;
FIG. 6B is a schematic diagram of the network structure of a chroma loop filter network model provided by an embodiment of the present application;
FIG. 6C is a schematic diagram of the network structure of another luma loop filter network model provided by an embodiment of the present application;
FIG. 6D is a schematic diagram of the network structure of another chroma loop filter network model provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the network structure of a residual block provided by an embodiment of the present application;
FIG. 8A is a schematic diagram of the composition of a preset selection network model provided by an embodiment of the present application;
FIG. 8B is a schematic diagram of the composition of another preset selection network model provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an overall framework based on a preset selection network model provided by an embodiment of the present application;
FIG. 10 is a flowchart of another decoding method provided by an embodiment of the present application;
FIG. 11 is a flowchart of an encoding method provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of the composition of an encoder provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of a specific hardware structure of an encoder provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of the composition of a decoder provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of a specific hardware structure of a decoder provided by an embodiment of the present application.
Detailed Description
For a more thorough understanding of the features and technical content of the embodiments of the present application, their implementation is described in detail below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present application belongs. The terms used herein are only for describing the embodiments of the present application and are not intended to limit the application.
In the following description, reference to "some embodiments" describes a subset of all possible embodiments; "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with one another where no conflict arises. It should also be pointed out that the terms "first/second/third" in the embodiments of the present application are used only to distinguish similar objects and do not imply a particular ordering of the objects; where permitted, "first/second/third" may be interchanged in a particular order or sequence so that the embodiments described here can be implemented in an order other than that illustrated or described here.
In a video picture, a coding block (CB) is generally represented by a first image component, a second image component, and a third image component; these three image components are a luma component, a blue chroma component, and a red chroma component. Specifically, the luma component is usually denoted by the symbol Y, the blue chroma component by Cb or U, and the red chroma component by Cr or V; thus a video picture may be represented in YCbCr format or in YUV format.
Before the embodiments of the present application are described in further detail, the terms involved are explained as follows:
Joint Video Experts Team (JVET)
Next-generation video coding standard H.266/Versatile Video Coding (VVC)
VVC reference software test platform (VVC Test Model, VTM)
Audio Video coding Standard (AVS)
High-Performance Model (HPM) of AVS
High Performance-Modular Artificial Intelligence Model (HPM-ModAI) of AVS
Convolutional Neural Network based in-Loop Filter (CNNLF)
DeBlocking Filter (DBF)
Sample Adaptive Offset (SAO)
Adaptive Loop Filter (ALF)
Quantization Parameter (QP)
Coding Unit (CU)
Coding Tree Unit (CTU)
It can be understood that digital video compression mainly compresses massive digital video data for transmission, storage, and so on. With the surge of Internet video and ever higher demands on video definition, although existing digital video compression standards already save considerable video data, better compression techniques are still needed to reduce the bandwidth and traffic pressure of digital video transmission.
In digital video encoding, the encoder reads pixels of the original video sequence in a given color format, including luma and chroma components; that is, it reads a monochrome or color picture. The picture is then partitioned into blocks, and the block data are handed to the encoder for encoding. Modern encoders typically use a hybrid coding framework, generally comprising intra prediction, inter prediction, transform/quantization, inverse quantization/inverse transform, loop filtering, entropy coding, and other operations; see FIG. 1 for the processing flow. Intra prediction references only information of the same picture to predict the pixels within the current partition block, removing spatial redundancy. Inter prediction may include motion estimation and motion compensation; it references pictures of other frames and uses motion estimation to search for the motion vector information best matching the current partition block, removing temporal redundancy. The transform converts the predicted image block to the frequency domain, redistributing energy; combined with quantization, it removes information the human eye is insensitive to, eliminating visual redundancy. Entropy coding removes statistical redundancy according to the current context model and the probability information of the binary bitstream. Loop filtering mainly processes pixels after inverse transform and inverse quantization, compensating distortion and providing a better reference for subsequently coded pixels.
In one possible implementation, for the loop-filtering part, the conventional loop-filter module mainly comprises the deblocking filter (hereinafter DBF), the sample adaptive offset filter (hereinafter SAO), and the adaptive loop filter (hereinafter ALF). In the HPM-ModAI application, the convolutional-neural-network-based in-loop filter (hereinafter CNNLF) is additionally adopted as the baseline scheme of the intelligent loop-filter module, placed between SAO filtering and ALF filtering, as shown in FIG. 2. In coding tests, under the common test conditions for intelligent coding: for the All Intra configuration, ALF is on and DBF and SAO are off; for the Random Access and Low Delay configurations, DBF for I frames and ALF are on and SAO is off.
Specifically, in HPM-ModAI, the QP range may be divided into four intervals — 27-31, 32-37, 38-44, and 45-50 — and a total of 16 different CNNLF models are trained: four I-frame luma component models, four non-I-frame luma component models, four chroma-U component models, and four chroma-V component models. During encoding, a corresponding model is chosen from the 16 CNNLF models according to the color component and QP. For configurations such as Random Access and Low Delay, the QP of each frame fluctuates around the initial QP during encoding, so the chosen CNNLF model is not necessarily the one that filters that frame best.
In addition, some existing technical solutions, based respectively on the AVS reference software HPM or the VVC reference software VTM, have explored neural-network loop filters with model selection. However, in most of these schemes the encoder selects the better-performing model by computing the rate-distortion cost of each CNNLF model, which has high complexity. In other words, existing solutions lack a way to adaptively select, for different coding units, a better-performing model among multiple CNNLF models for filtering.
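As a concrete illustration of the model table just described, the sketch below maps a color-component type and QP to one of the 16 CNNLF models by QP interval. The function and model-naming scheme are illustrative assumptions, not part of HPM-ModAI.

```python
# QP intervals used by HPM-ModAI to group its 16 CNNLF models
# (4 intervals x 4 component types). Model names here are illustrative only.
QP_RANGES = [(27, 31), (32, 37), (38, 44), (45, 50)]
COMPONENTS = ("i_luma", "non_i_luma", "chroma_u", "chroma_v")

def cnnlf_model_name(component, qp):
    """Return the (hypothetical) name of the CNNLF model for this component and QP."""
    if component not in COMPONENTS:
        raise ValueError("unknown component: %s" % component)
    for lo, hi in QP_RANGES:
        if lo <= qp <= hi:
            return "%s_qp%d_%d" % (component, lo, hi)
    raise ValueError("QP %d outside the trained ranges" % qp)
```

Because the per-frame QP drifts around the initial QP under Random Access / Low Delay, this table lookup alone may not pick the best-filtering model — which is the motivation for the learned selection module below.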
An embodiment of the present application provides an encoding method. On the encoder side, the value of first syntax element identification information is determined; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, the preset selection network model of the current block is determined, and the loop filter network model used by the current block is determined according to the preset selection network model; the current block is then filtered using the loop filter network model to obtain a reconstructed image block of the current block.
An embodiment of the present application also provides a decoding method. On the decoder side, the bitstream is parsed to determine the value of the first syntax element identification information; when it indicates that the current block is allowed to use a preset selection network model for model selection, the preset selection network model of the current block is determined, the loop filter network model used by the current block is determined according to it, and the current block is filtered using the loop filter network model to obtain a reconstructed image block of the current block.
In this way, by introducing deep-learning-based adaptive model selection — that is, using the preset selection network model to select among at least one candidate loop filter network model and then filtering the current block according to the selected loop filter network model — coding performance and thus coding/decoding efficiency can be improved, and the finally output reconstructed image block is made closer to the original image block, improving video picture quality.
The embodiments of the present application are described in detail below with reference to the drawings.
Referring to FIG. 3A, which shows a detailed framework diagram of a video encoding system provided by an embodiment of the present application. As shown in FIG. 3A, the video encoding system 10 includes a transform and quantization unit 101, an intra estimation unit 102, an intra prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, a coding unit 109, a decoded picture buffer unit 110, and so on, where the filtering unit 108 can implement DBF/SAO/ALF filtering, and the coding unit 109 can implement header information coding and Context-based Adaptive Binary Arithmetic Coding (CABAC). For an input original video signal, a video coding block can be obtained by Coding Tree Unit (CTU) partitioning; the residual pixel information obtained after intra or inter prediction is then processed by the transform and quantization unit 101, including transforming the residual information from the pixel domain to the transform domain and quantizing the resulting transform coefficients to further reduce the bit rate. The intra estimation unit 102 and intra prediction unit 103 perform intra prediction of the video coding block; specifically, they determine the intra prediction mode to be used to encode the video coding block. The motion compensation unit 104 and motion estimation unit 105 perform inter-prediction coding of the received video coding block relative to one or more blocks of one or more reference frames to provide temporal prediction information; motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors that can estimate the motion of the video coding block, after which the motion compensation unit 104 performs motion compensation based on the motion vectors determined by the motion estimation unit 105. After determining the intra prediction mode, the intra prediction unit 103 also supplies the selected intra prediction data to the coding unit 109, and the motion estimation unit 105 likewise sends the computed motion vector data to the coding unit 109. Furthermore, the inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block, reconstructing a residual block in the pixel domain; blocking artifacts of the reconstructed residual block are removed through the filter control analysis unit 107 and filtering unit 108, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to produce a reconstructed video coding block. The coding unit 109 encodes the various coding parameters and the quantized transform coefficients; in a CABAC-based coding algorithm, the context may be based on neighboring coding blocks and may be used to encode information indicating the determined intra prediction mode, outputting the bitstream of the video signal. The decoded picture buffer unit 110 stores reconstructed video coding blocks for prediction reference. As video picture coding proceeds, new reconstructed video coding blocks are continually generated, and all of them are stored in the decoded picture buffer unit 110.
Referring to FIG. 3B, which shows a detailed framework diagram of a video decoding system provided by an embodiment of the present application. As shown in FIG. 3B, the video decoding system 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, a decoded picture buffer unit 206, and so on, where the decoding unit 201 can implement header information decoding and CABAC decoding, and the filtering unit 205 can implement DBF/SAO/ALF filtering. After the input video signal undergoes the encoding process of FIG. 3A, the bitstream of the video signal is output. The bitstream is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain the decoded transform coefficients; these are processed by the inverse transform and inverse quantization unit 202 to produce a residual block in the pixel domain. The intra prediction unit 203 may generate prediction data of the current video decoding block based on the determined intra prediction mode and data of previously decoded blocks from the current frame or picture. The motion compensation unit 204 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to produce a predictive block of the video decoding block being decoded. A decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204. The decoded video signal passes through the filtering unit 205 to remove blocking artifacts and improve video quality; the decoded video blocks are then stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for output of the video signal, i.e., the restored original video signal is obtained.
It should be noted that the method provided by the embodiments of the present application may be applied to the filtering unit 108 shown in FIG. 3A (indicated by a bold black box) or to the filtering unit 205 shown in FIG. 3B (indicated by a bold black box). That is, the method of the embodiments may be applied to a video encoding system ("encoder" for short), to a video decoding system ("decoder" for short), or even to both at the same time, without any limitation here.
It should also be noted that when the embodiments of the present application are applied to the encoder, the "current block" specifically refers to the block currently to be encoded in a video picture (which may also be called a "coding block" for short); when applied to the decoder, the "current block" specifically refers to the block currently to be decoded in a video picture (which may also be called a "decoding block" for short).
In an embodiment of the present application, referring to FIG. 4, which shows a flowchart of a decoding method provided by an embodiment of the present application. As shown in FIG. 4, the method may include:
S401: Parse the bitstream and determine the value of the first syntax element identification information.
It should be noted that a video picture may be partitioned into multiple image blocks, and each image block currently to be decoded may be called a decoding block. Each decoding block may include a first image component, a second image component, and a third image component; the current block is the decoding block in the video picture on which loop filtering of the first, second, or third image component is currently to be performed.
Here, from the perspective of color, the first, second, and third image components may be divided into two color component types: a luma component and a chroma component. In this case, if prediction, inverse transform and inverse quantization, loop filtering, and other operations are performed on the luma component of the current block, the current block may also be called a luma block; or, if those operations are performed on the chroma component of the current block, the current block may also be called a chroma block.
It should also be noted that, on the decoder side, the embodiments of the present application specifically provide a loop filtering method, in particular a deep-learning-based adaptive model selection method, applied in the filtering unit 205 shown in FIG. 3B. Here, the filtering unit 205 may include a deblocking filter (DBF), a sample adaptive offset filter (SAO), a convolutional-neural-network-based in-loop filter (CNNLF), and an adaptive loop filter (ALF); the method described in the embodiments is specifically applied between SAO and CNNLF, so that the CNNLF model can be adaptively selected and a suitable model chosen to perform CNNLF filtering.
More specifically, the embodiments of the present application propose a deep-learning-based model adaptive selection module for adaptively selecting a loop filter network model (such as a CNNLF model) to improve coding performance. As shown in FIG. 5, in addition to DBF, SAO, CNNLF, and ALF, the loop filter may include a Model Adaptive Selection (MAS) module located between SAO filtering and CNNLF filtering. The use of the model adaptive selection module does not depend on the on/off state of DBF, SAO, CNNLF, or ALF; it is merely positioned before CNNLF. It should be noted that the model adaptive selection module can be regarded as a preset selection network model composed of multiple convolutional layers and multiple fully connected layers, and the preset selection network model can be used to choose a suitable model to perform CNNLF filtering.
Here, to allow the decoder to determine whether the current block is allowed to use the preset selection network model for model selection, first syntax element identification information may be set, and the determination made according to its decoded value. In some embodiments, the method may further include:
if the value of the first syntax element identification information is a first value, determining that the first syntax element identification information indicates that the current block is allowed to use the preset selection network model for model selection; or,
if the value of the first syntax element identification information is a second value, determining that it indicates that the current block is not allowed to use the preset selection network model for model selection.
It should be noted that the first value and the second value are different, and they may be in parameter form or numeric form. Specifically, the first syntax element identification information may be a parameter written in a profile or the value of a flag, which is not limited in the embodiments of the present application.
Taking the first syntax element identification information as a flag as an example: the first value may be set to 1 and the second value to 0; or the first value may be true and the second value false; or the first value 0 and the second value 1; or the first value false and the second value true. Exemplarily, for a flag, the first value is generally 1 and the second value 0, without any limitation.
It should also be noted that the preset selection network model can be regarded as a neural network, and the first syntax element identification information can be regarded as an enable flag for neural-network-based model adaptive selection, denoted here by model_adaptive_selection_enable_flag. Specifically, model_adaptive_selection_enable_flag may indicate whether the current block is allowed to use the preset selection network model for model selection.
Understandably, since the preset selection network model selects among multiple candidate loop filter network models, before determining whether the current block is allowed to use the preset selection network model for model selection, it is first necessary to determine whether the current block uses a loop filter network model for filtering. If the current block uses a loop filter network model for filtering, the preset selection network model can be used for model selection; otherwise, there is no need to use the preset selection network model for model selection.
In the embodiments of the present application, regarding whether the current block uses a loop filter network model for filtering, sequence-header identification information may first be set — for example, second syntax element identification information indicating whether the current video sequence uses a loop filter network model for filtering.
In some embodiments, the method may further include: parsing the bitstream and determining the value of second syntax element identification information, where the second syntax element identification information indicates whether the video sequence uses a loop filter network model for filtering.
Further, in some embodiments, the method may also include:
if the value of the second syntax element identification information is the first value, determining that it indicates that the video sequence uses a loop filter network model for filtering; or,
if the value of the second syntax element identification information is the second value, determining that it indicates that the video sequence does not use a loop filter network model for filtering.
It should be noted that the first value and the second value are different.
In the embodiments, taking the second syntax element identification information as a flag as an example: the first value may be set to 1 and the second value to 0; or true/false; or 0/1; or false/true, without any limitation.
It should also be noted that the video sequence includes at least one frame, which may include the current frame. Here, when the video sequence is determined to use a loop filter network model for filtering, it is further necessary to judge whether the current frame within the video sequence uses a loop filter network model for filtering; that is, third syntax element identification information also needs to be set.
That is, in some embodiments, the method may further include: parsing the bitstream and determining the value of third syntax element identification information, where the third syntax element identification information indicates whether the current frame within the video sequence uses a loop filter network model for filtering.
In a specific example, regarding whether the current block uses a loop filter network model for filtering, the method may further include:
parsing the bitstream and determining the value of the second syntax element identification information;
when the second syntax element identification information indicates that the video sequence uses a loop filter network model for filtering, parsing the bitstream and determining the value of the third syntax element identification information.
It should be noted that the meaning of the third syntax element identification information differs between the luma component and the chroma component, and the loop filter network models also differ. In the embodiments, the loop filter network model corresponding to the luma component may be called the luma loop filter network model, and that corresponding to the chroma component the chroma loop filter network model. Therefore, in some embodiments, parsing the bitstream and determining the value of the third syntax element identification information may include:
parsing the bitstream to obtain first luma syntax element identification information corresponding to the luma component of the current frame, which indicates whether the luma component of the current frame uses the luma loop filter network model for filtering; or,
parsing the bitstream to obtain chroma syntax element identification information corresponding to the chroma component of the current frame, which indicates whether the chroma component of the current frame uses the chroma loop filter network model for filtering.
In the embodiments, for the luma component of the current frame, the third syntax element identification information may be called first luma syntax element identification information, indicating whether the luma component of the current frame uses the luma loop filter network model for filtering. For the chroma component of the current frame, it may be called chroma syntax element identification information, indicating whether the chroma component of the current frame uses the chroma loop filter network model for filtering.
It should be noted that after the block-partitioning operation is performed on the current frame, at least one block may be obtained, which may include the current block. In the embodiments of the present application, if the current frame uses a loop filter network model for filtering, this does not mean that every block within the current frame does; CTU-level syntax element identification information may further be involved to determine whether the current block uses the loop filter network model for filtering. The two color component types, luma and chroma, are described separately below.
In one possible implementation, in the case of the luma component of the current frame, parsing the bitstream and determining the value of the first syntax element identification information may include:
when the first luma syntax element identification information indicates that the luma component of the current frame uses the luma loop filter network model for filtering, parsing the bitstream and determining the value of second luma syntax element identification information;
when the second luma syntax element identification information indicates that the luma component of the current block uses the luma loop filter network model for filtering, performing the step of parsing the bitstream and determining the value of the first syntax element identification information.
It should be noted that for the luma component two kinds of syntax elements are involved: a frame-level syntax element and a CTU-level syntax element. The frame-level syntax element may be called first luma syntax element identification information, denoted luma_frame_flag; the CTU-level syntax element may be called second luma syntax element identification information, denoted luma_ctu_flag.
It should also be noted that for the luma component, the embodiments may also set a luma frame-level switch and a luma CTU-level switch, controlling whether the loop filter network model of the luma component is enabled and thus whether the luma loop filter network model is used for filtering. Therefore, in some embodiments, the method may also include: setting a luma frame-level switch and a luma CTU-level switch, where the current block is located in the current frame. The luma frame-level switch may control whether the luma component of the current frame uses the luma loop filter network model for filtering, and the luma CTU-level switch may control whether the luma component of the current block does.
Further, for the first luma syntax element identification information, in some embodiments, the method may also include:
if the value of the first luma syntax element identification information is the first value, determining that it indicates that the luma component of the current frame uses the luma loop filter network model for filtering; or,
if the value is the second value, determining that it indicates that the luma component of the current frame does not use the luma loop filter network model for filtering.
In some embodiments, the method may further include:
if the value of the first luma syntax element identification information is the first value, turning on the luma frame-level switch; or,
if the value is the second value, turning off the luma frame-level switch.
It should be noted that the first value and the second value are different, and they may be in parameter or numeric form. Specifically, the value of the first luma syntax element identification information may be a parameter written in a profile or the value of a flag, without any limitation.
In the embodiments, taking the first luma syntax element identification information as a flag as an example: the first value may be set to 1 and the second value to 0; or true/false; or 0/1; or false/true, without any limitation.
Taking the first value as 1 and the second value as 0: if the decoded value of the first luma syntax element identification information is 1, the luma frame-level switch may be turned on, i.e., the frame-level loop filter network model is invoked, and it can be determined that the luma component of the current frame uses the luma loop filter network model for filtering. Otherwise, if the value of the first luma syntax element identification information is 0, the luma frame-level switch may be turned off, i.e., the frame-level loop filter network model is not invoked, and it can be determined that the luma component of the current frame does not use the luma loop filter network model for filtering; in this case, the next frame may be obtained from the video sequence and taken as the current frame, and then the step of parsing the bitstream and determining the value of the first luma syntax element identification information continues.
Further, for the second luma syntax element identification information, in some embodiments, the method may also include:
if the value of the second luma syntax element identification information is the first value, determining that it indicates that the luma component of the current block uses the luma loop filter network model for filtering; or,
if the value is the second value, determining that it indicates that the luma component of the current block does not use the luma loop filter network model for filtering.
In some embodiments, the method may further include:
if the value of the second luma syntax element identification information is the first value, turning on the luma CTU-level switch; or,
if the value is the second value, turning off the luma CTU-level switch.
It should be noted that the first value and the second value are different.
In the embodiments, taking the second luma syntax element identification information as another flag as an example: the first value may be set to 1 and the second value to 0; or true/false; or 0/1; or false/true, without any limitation.
Taking the first value as 1 and the second value as 0: in the case where the decoded value of the first luma syntax element identification information is 1, if the value of the second luma syntax element identification information is 1, the luma CTU-level switch may be turned on, i.e., the CTU-level loop filter network model is invoked, and it can be determined that the luma component of the current block uses the luma loop filter network model for filtering. Otherwise, if the value of the second luma syntax element identification information is 0, the luma CTU-level switch may be turned off, i.e., the CTU-level loop filter network model is not invoked, and it can be determined that the luma component of the current block does not use the luma loop filter network model for filtering; in this case, the next block may be obtained from the current frame and taken as the current block, and the step of parsing the bitstream and determining the value of the second luma syntax element identification information continues until all blocks of the current frame have been processed, after which the next frame is loaded and processing continues.
In another possible implementation, in the case of the chroma component of the current frame, parsing the bitstream and determining the value of the first syntax element identification information may include:
when the chroma syntax element identification information indicates that the chroma component of the current frame uses the chroma loop filter network model for filtering, performing the step of parsing the bitstream and determining the value of the first syntax element identification information.
It should be noted that for the chroma component a frame-level syntax element is involved. The frame-level syntax element may be called chroma syntax element identification information, denoted chroma_frame_flag.
It should also be noted that, in consideration of coding performance and computational complexity, if the chroma syntax element identification information indicates that the chroma component of the current frame uses the chroma loop filter network model for filtering, then all blocks of the current frame use the chroma loop filter network model for filtering by default; if it indicates that the chroma component of the current frame does not, then by default no block of the current frame does. Therefore, no CTU-level syntax element needs to be set for the chroma component, and likewise no CTU-level switch. In other words, for the chroma component, the embodiments may set only a frame-level switch. Thus, in some embodiments, the method may also include: setting a chroma frame-level switch, which may control whether the chroma component of the current frame uses the chroma loop filter network model for filtering.
Further, for the chroma syntax element identification information, in some embodiments, the method may also include:
if the value of the chroma syntax element identification information is the first value, determining that it indicates that the chroma component of the current frame uses the chroma loop filter network model for filtering; or,
if the value is the second value, determining that it indicates that the chroma component of the current frame does not use the chroma loop filter network model for filtering.
In some embodiments, the method may further include:
if the value of the chroma syntax element identification information is the first value, turning on the chroma frame-level switch; or,
if the value is the second value, turning off the chroma frame-level switch.
It should be noted that the first value and the second value are different, and they may be in parameter or numeric form. Specifically, both the chroma syntax element identification information and the chroma frame-level switch may be parameters written in a profile or the value of a flag, without any limitation.
In the embodiments, taking the chroma syntax element identification information as yet another flag as an example: the first value may be set to 1 and the second value to 0; or true/false; or 0/1; or false/true, without any limitation.
Taking the first value as 1 and the second value as 0: if the decoded value of the chroma syntax element identification information is 1, the chroma frame-level switch may be turned on, i.e., the frame-level loop filter network model is invoked; it can then be determined that the chroma component of the current frame uses the chroma loop filter network model for filtering, and by default every block corresponding to the chroma component of the current frame uses it. Otherwise, if the value is 0, the chroma frame-level switch may be turned off, i.e., the frame-level loop filter network model is not invoked, and it can be determined that no block corresponding to the chroma component of the current frame uses the chroma loop filter network model for filtering; in this case, the next frame may be obtained from the video sequence and taken as the current frame, and the step of parsing the bitstream and determining the value of the chroma syntax element identification information continues.
It should also be noted that, regarding whether the video sequence, the current frame, the current block, etc. use a loop filter network model for filtering, syntax element identification information may be set one by one for the luma and chroma components and then determined by parsing the bitstream; alternatively, syntax element identification information may be set only for the current block and/or the current frame and then determined by parsing the bitstream. In the embodiments of the present application, syntax element identification information may be set one by one for the video sequence, the current frame, and the current block (such as the second syntax element identification information, the first luma syntax element identification information, the second luma syntax element identification information, and the chroma syntax element identification information), without specific limitation here.
In this way, after obtaining by parsing the bitstream the values of the first luma syntax element identification information, the second luma syntax element identification information, and the chroma syntax element identification information, the decoder can determine whether the current block uses a loop filter network model (including a luma or chroma loop filter network model) for filtering. In the case where the current block uses a loop filter network model for filtering, the bitstream can be further parsed to obtain the value of the first syntax element identification information and thereby determine whether the current block is allowed to use the preset selection network model for model selection.
S402: When the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, determine the preset selection network model of the current block and, according to the preset selection network model, determine the loop filter network model used by the current block.
It should be noted that if the current block is allowed to use the preset selection network model for model selection, then after the preset selection network model of the current block is determined, the loop filter network model used by the current block can be determined according to the preset selection network model.
In some embodiments, determining the loop filter network model used by the current block according to the preset selection network model may include:
determining, according to the preset selection network model, the output value corresponding to each of at least one candidate loop filter network model;
determining the loop filter network model used by the current block according to the output values corresponding to each of the at least one candidate loop filter network model.
Further, in some embodiments, determining the loop filter network model used by the current block according to those output values may include:
determining a target value from the output values corresponding to each of the at least one candidate loop filter network model, and taking the candidate loop filter network model corresponding to the target value as the loop filter network model used by the current block.
In a specific example, determining the target value from those output values may include: selecting the maximum among the output values corresponding to each of the at least one candidate loop filter network model, and taking the maximum as the target value.
That is, the output value corresponding to each of the at least one candidate loop filter network model can be obtained according to the preset selection network model; a target value (such as the maximum) is then determined from these output values, and the candidate loop filter network model corresponding to the target value (such as the maximum) is taken as the loop filter network model used by the current block.
It should also be noted that the output value may be a probability value. Specifically, the output values corresponding to each of the at least one candidate loop filter network model may reflect the probability distribution over these candidate models.
It can also be understood that the preset selection network model differs for different color component types. In the embodiments, the preset selection network model corresponding to the luma component may be called the luma selection network model, and that corresponding to the chroma component the chroma selection network model. Therefore, in some embodiments, determining the preset selection network model of the current block may include:
when the color component type of the current block is the luma component, determining the luma selection network model of the current block;
when the color component type of the current block is the chroma component, determining the chroma selection network model of the current block.
Correspondingly, the candidate loop filter network models also differ by color component type. In the embodiments, the candidate loop filter network models corresponding to the luma component may be called candidate luma loop filter network models, and those corresponding to the chroma component candidate chroma loop filter network models. Therefore, in some embodiments, determining, according to the preset selection network model, the output value of each of the at least one candidate loop filter network model may include:
when the color component type of the current block is the luma component, determining the output value of each of at least one candidate luma loop filter network model according to the luma selection network model;
when the color component type of the current block is the chroma component, determining the output value of each of at least one candidate chroma loop filter network model according to the chroma selection network model.
That is, the color component type may include a luma component and a chroma component. In the embodiments, if the color component type of the current block is the luma component (the current block may then be called a luma block), the luma selection network model of the current block needs to be determined and then used to determine the output value of each of the at least one candidate luma loop filter network model. If it is the chroma component (the current block may then be called a chroma block), the chroma selection network model of the current block needs to be determined and then used to determine the output value of each of the at least one candidate chroma loop filter network model.
It should also be noted that both the candidate loop filter network models corresponding to the at least one luma component ("candidate luma loop filter network models" for short) and those corresponding to the at least one chroma component ("candidate chroma loop filter network models" for short) are obtained by model training.
In one possible implementation, for the at least one candidate luma loop filter network model, the method may also include:
determining a first training set, where the first training set includes at least one training sample, and the training samples are obtained according to at least one quantization parameter;
training a first neural network structure using the luma component of the training samples in the first training set to obtain the at least one candidate luma loop filter network model.
That is, the at least one candidate luma loop filter network model is determined by training the first neural network structure on at least one training sample, and these candidate models have a correspondence with the color component type and the quantization parameter.
Here, the first neural network structure includes at least one of: a convolutional layer, an activation layer, a residual block, and a skip connection layer. In a specific example, the first neural network structure may include a first convolution module, a first residual module, a second convolution module, and a first connection module.
Exemplarily, as shown in FIG. 6A, the input of the first neural network structure is a reconstructed luma frame and the output is the original luma frame; the first neural network structure includes: a first convolution module 601, a first residual module 602, a second convolution module 603, and a first connection module 604. In FIG. 6A, the first convolution module 601, first residual module 602, second convolution module 603, and first connection module 604 are connected in sequence, and the first connection module 604 is also connected to the input of the first convolution module 601.
In a more specific example, for the first neural network structure, the first convolution module may consist of one convolutional layer and one activation layer, the second convolution module of two convolutional layers and one activation layer, the connection module of a skip connection layer, and the first residual module may include several residual blocks, each of which may consist of two convolutional layers and one activation layer.
In another possible implementation, for the several candidate chroma loop filter network models, the method may also include:
determining a first training set, where the first training set includes at least one training sample, and the training samples are obtained according to at least one quantization parameter;
training a second neural network structure using the luma component and the chroma component of the training samples in the first training set to obtain the at least one candidate chroma loop filter network model.
That is, the at least one candidate chroma loop filter network model is determined by training the second neural network structure on at least one training sample, and these candidate models have a correspondence with the color component type and the quantization parameter.
Here, the second neural network structure includes at least one of: a sampling layer, a convolutional layer, an activation layer, a residual block, a pooling layer, and a skip connection layer. In a specific example, the second neural network structure may include an upsampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module, and a second connection module.
Exemplarily, as shown in FIG. 6B, the input of the second neural network structure is a reconstructed luma frame and a reconstructed chroma frame, and the output is the original chroma frame; the second neural network structure includes: an upsampling module 605, a third convolution module 606, a fourth convolution module 607, a fusion module 608, a second residual module 609, a fifth convolution module 610, and a second connection module 611. In FIG. 6B, the input of the upsampling module 605 is the reconstructed chroma frame, and the upsampling module 605 is connected to the third convolution module 606; the input of the fourth convolution module 607 is the reconstructed luma frame; the third convolution module 606 and the fourth convolution module 607 are connected to the fusion module 608; the fusion module 608, second residual module 609, fifth convolution module 610, and second connection module 611 are connected in sequence, and the second connection module 611 is also connected to the input of the upsampling module 605.
In a more specific example, for the second neural network structure, the third convolution module may consist of one convolutional layer and one activation layer, the fourth convolution module of one convolutional layer and one activation layer, the fifth convolution module of two convolutional layers, one activation layer, and one pooling layer, the connection module of a skip connection layer, and the second residual module may include several residual blocks, each of which may consist of two convolutional layers and one activation layer.
Exemplarily, taking the loop filter network model as CNNLF as an example, CNNLF uses different network structures for the luma and chroma components: the first neural network structure is designed for the luma component (see FIG. 6C), and the second neural network structure for the chroma component (see FIG. 6D).
For the luma component, as shown in FIG. 6C, the whole network structure may consist of convolutional layers, activation layers, residual blocks, skip connection layers, and so on. Here, the convolution kernel of a convolutional layer may be 3×3, denoted 3×3 Conv; the activation layer may be the Rectified Linear Unit (ReLU), a commonly used activation function in artificial neural networks, typically referring to the class of nonlinear functions represented by the ramp function and its variants. The network structure of a residual block (ResBlock) is shown in the dashed box of FIG. 7 and may consist of convolutional layers (Conv), an activation layer (ReLU), and a skip connection layer. In the network structure, the skip connection (Concat) refers to a global skip connection from input to output, which lets the network focus on learning the residual and accelerates convergence.
For the chroma component, as shown in FIG. 6D, the luma component is introduced as one of the inputs to guide the filtering of the chroma component; the whole network structure may consist of convolutional layers, activation layers, residual blocks, pooling layers, skip connection layers, and so on. Owing to the resolution mismatch, the chroma component first needs to be upsampled. To avoid introducing additional noise during upsampling, resolution enlargement may be done by directly copying neighboring pixels to obtain an enlarged chroma frame. In addition, at the end of the network structure, a pooling layer (e.g., average pooling, denoted 2×2 AvgPool) is used to downsample the chroma component. Specifically, in the HPM-ModAI application, the number of residual blocks may be set to N=20 for the luma network and N=10 for the chroma network.
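The residual block of FIG. 7 (conv → ReLU → conv, plus an identity skip) and the stacking of N such blocks can be illustrated numerically. The 1-D "convolution" below is a toy stand-in for the real 3×3 layers, so the numbers are only a demonstration of the skip-connection arithmetic, not of the trained filter.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def res_block(x, conv):
    # conv -> ReLU -> conv, then add the identity skip connection (FIG. 7)
    return [xi + yi for xi, yi in zip(x, conv(relu(conv(x))))]

def stack_res_blocks(x, conv, n):
    # the luma network stacks N=20 of these blocks, the chroma network N=10
    for _ in range(n):
        x = res_block(x, conv)
    return x

halve = lambda v: [0.5 * x for x in v]   # toy stand-in for a conv layer
out = res_block([1.0, -2.0], halve)
```

Because the block learns only the residual added to its input, an untrained (or weak) block leaves the signal nearly unchanged, which is what makes deep stacks of these blocks easy to train.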
In this way, in the model training stage, a total of 16 candidate loop filter network models can be trained offline: four I-frame luma component models, four non-I-frame luma component models, four chroma-U component models, and four chroma-V component models.
It can also be understood that if the output value of each of the at least one candidate loop filter network model is to be determined, the preset selection network model of the current block must also be determined. The preset selection network model differs by color component type: here, the one corresponding to the luma component may be called the luma selection network model, and the one corresponding to the chroma component the chroma selection network model.
In one possible implementation, in the case where the color component type of the current block is the luma component, determining the luma selection network model of the current block may include:
determining at least one candidate luma selection network model;
determining the quantization parameter of the current block, and selecting, from the at least one candidate luma selection network model, the candidate luma selection network model corresponding to the quantization parameter;
determining the selected candidate luma selection network model as the luma selection network model of the current block.
In another possible implementation, in the case where the color component type of the current block is the chroma component, determining the chroma selection network model of the current block may include:
determining at least one candidate chroma selection network model;
determining the quantization parameter of the current block, and selecting, from the at least one candidate chroma selection network model, the candidate chroma selection network model corresponding to the quantization parameter;
determining the selected candidate chroma selection network model as the chroma selection network model of the current block.
It should be noted that the preset selection network model of the current block is related not only to the quantization parameter but also to the color component type. Different color component types correspond to different preset selection network models: for the luma component, the preset selection network model may be a luma selection network model related to the luma component; for the chroma component, it may be a chroma selection network model related to the chroma component.
It should also be noted that, according to different quantization parameters — e.g., QP values of 27-31, 32-37, 38-44, 45-50 — at least one candidate luma selection network model and at least one candidate chroma selection network model can be trained in advance. Thus, after the quantization parameter of the current block is determined, the candidate luma selection network model corresponding to that quantization parameter can be selected from the at least one candidate luma selection network model, i.e., the luma selection network model of the current block; likewise, the candidate chroma selection network model corresponding to it can be selected from the at least one candidate chroma selection network model, i.e., the chroma selection network model of the current block.
Further, for the model training of the at least one candidate luma selection network model and the at least one candidate chroma selection network model, in some embodiments, the method may also include:
determining a second training set, where the second training set includes at least one training sample, and the training samples are obtained according to at least one quantization parameter;
training a third neural network structure using the luma component of the training samples in the second training set to obtain the at least one candidate luma selection network model;
training the third neural network structure using the chroma component of the training samples in the second training set to obtain the at least one candidate chroma selection network model;
where the at least one candidate luma selection network model has a correspondence with the luma component and the quantization parameter, and the at least one candidate chroma selection network model has a correspondence with the chroma component and the quantization parameter.
That is, the at least one candidate luma selection network model and the at least one candidate chroma selection network model are each determined by training the third neural network structure on at least one training sample, and both have a correspondence with the color component type and the quantization parameter.
Here, the third neural network structure may include at least one of: a convolutional layer, a pooling layer, a fully connected layer, and an activation layer. In a specific example, the third neural network structure may include a sixth convolution module and a fully connected module, connected in sequence.
In a more specific example, the sixth convolution module may include several convolution submodules, each consisting of one convolutional layer and one pooling layer; the fully connected module may include several fully connected submodules, each consisting of one fully connected layer and one activation layer.
That is, the preset selection network model may be composed of multiple convolutional layers and multiple fully connected layers, and deep learning is performed on training samples to obtain the preset selection network model of the current block, such as the luma selection network model or the chroma selection network model.
In the embodiments of the present application, deep learning is a kind of machine learning, and machine learning is the necessary path to artificial intelligence. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning forms more abstract high-level representations of attribute categories or features by combining low-level features, so as to discover distributed feature representations of data. Taking Convolutional Neural Networks (CNN) as an example: they are a class of feedforward neural networks that involve convolution computation and have deep structure, and they are one of the representative algorithms of deep learning. The preset selection network model here may be a convolutional neural network structure.
Exemplarily, both the luma selection network model and the chroma selection network model can be regarded as obtained by training the third neural network structure. That is, for the preset selection network model, the embodiments of the present application also design a third neural network structure; see FIG. 8A and FIG. 8B.
As shown in FIG. 8A, the input of the third neural network structure is a reconstructed frame, and the output is the probability distribution over the candidate loop filter network models when the current block uses a loop filter network model. In FIG. 8A, the third neural network structure includes: a sixth convolution module 801 and a fully connected module 802, connected in sequence. The sixth convolution module 801 may include several convolution submodules, each consisting of one convolutional layer and one pooling layer; the fully connected module 802 may include several fully connected submodules, each consisting of one fully connected layer and one activation layer.
In a specific example, as shown in FIG. 8B, the third neural network structure may be composed of multiple convolutional layers and multiple fully connected layers. The structure may include K convolutional layers, M pooling layers, L fully connected layers, and N activation layers, where K, M, L, and N are all integers greater than or equal to 1.
In a more specific example, K=3, M=3, L=2, N=2.
Thus, based on the network structure shown in FIG. 8B, it may consist of 3 convolutional layers and 2 fully connected layers, with a pooling layer after each convolutional layer. The convolution kernel may be 3×3, denoted 3×3 Conv; the pooling layer may be max pooling, denoted 2×2 MaxPool. In addition, an activation layer follows each fully connected layer; here, the activation layer may be a linear or nonlinear activation function, such as ReLU or Softmax.
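The K=3 / M=3 / L=2 / N=2 structure of FIG. 8B can be written down as an ordered layer list. This is a structural sketch only; layer hyperparameters such as channel counts are not specified in the text, and the layer labels below are descriptive strings rather than a real framework API.

```python
def selection_net_layers(k=3, l=2):
    """Ordered layer list for the FIG. 8B selection network (structural sketch)."""
    layers = []
    for _ in range(k):
        layers += ["Conv3x3", "MaxPool2x2"]   # each conv layer is followed by pooling
    for i in range(l):
        layers.append("FC")
        # the final activation is Softmax so the outputs form a probability distribution
        layers.append("Softmax" if i == l - 1 else "ReLU")
    return layers
```

Ending in Softmax is what lets the module's outputs be read directly as per-model probabilities, from which the maximum is taken.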
Further, according to the above implementations, after the preset selection network model and the at least one candidate loop filter network model are determined, the probability distribution over these candidate models can also be determined. In some embodiments, determining, according to the preset selection network model, the output value of each of the at least one candidate loop filter network model may include:
determining the input reconstructed image block of the loop filter network model;
inputting the input reconstructed image block into the preset selection network model to obtain the output value corresponding to each of the at least one candidate loop filter network model.
Here, the loop filter network model may refer to the aforementioned CNNLF model. Taking the output value as a probability value as an example: after the input reconstructed image block of the CNNLF model is determined, it is used as the input of the preset selection network model, and the output of the preset selection network model is the probability distribution over the candidate loop filter network models. That is, after the probability values of the candidate models are obtained, the loop filter network model used by the current block can be determined according to the magnitudes of the probability values. Specifically, the maximum probability value may be selected from the probability values of the candidate models, and the candidate corresponding to the maximum probability value determined as the loop filter network model used by the current block.
That is, for both the luma and the chroma loop filter network model, several candidate luma loop filter network models or several candidate chroma loop filter network models are first obtained through model training; the preset selection network model is then used to determine the probability values of these candidates, and the candidate with the largest probability value is chosen to determine the loop filter network model used by the current block.
It should also be noted that, according to the color component type, the preset selection network model includes the luma selection network model and the chroma selection network model; correspondingly, the input reconstructed image block may include an input reconstructed luma image block and an input reconstructed chroma image block.
In one possible implementation, when the color component type of the current block is the luma component, determining the loop filter network model used by the current block according to the preset selection network model may include:
determining the input reconstructed luma image block of the luma loop filter network model;
inputting the input reconstructed luma image block into the luma selection network model to obtain the output value of each of at least one candidate luma loop filter network model;
determining a target value from these output values, and taking the candidate luma loop filter network model corresponding to the target value as the luma loop filter network model used by the current block.
In another possible implementation, when the color component type of the current block is the chroma component, determining the loop filter network model used by the current block according to the preset selection network model may include:
determining the input reconstructed chroma image block of the chroma loop filter network model;
inputting the input reconstructed chroma image block into the chroma selection network model to obtain the output value of each of at least one candidate chroma loop filter network model;
determining a target value from these output values, and taking the candidate chroma loop filter network model corresponding to the target value as the chroma loop filter network model used by the current block.
In this way, after the loop filter network model used by the current block (including a luma or chroma loop filter network model) is determined, the selected loop filter network model can be used to filter the current block.
S403: Filter the current block using the loop filter network model to obtain a reconstructed image block of the current block.
It should be noted that the loop filter network model described in the embodiments of the present application may be a CNNLF model. The selected CNNLF model can then be used to perform CNNLF filtering on the current block to obtain the reconstructed image block of the current block.
It should also be noted that the input reconstructed image block (including an input reconstructed luma or chroma image block) may be obtained after filtering through the deblocking filter and the sample adaptive offset filter.
Further, in some embodiments, the method may also include: after the reconstructed image block of the current block is determined, continuing to filter the reconstructed image block using the adaptive loop filter.
Exemplarily, referring to FIG. 9, which shows an overall framework using the preset selection network model provided by an embodiment of the present application. As shown in FIG. 9, in combination with the network structure of FIG. 8, the input of the network is the input reconstructed luma or chroma image block of the CNNLF model, and the output is the probability value of each CNNLF model; the CNNLF model with the largest probability value is then chosen to perform neural-network filtering on the input reconstructed luma or chroma image block. In addition, as can be seen from FIG. 9, the input reconstructed image block is obtained after filtering through the deblocking filter (DBF) and the sample adaptive offset filter (SAO), and the reconstructed image block obtained via the model adaptive selection module and the CNNLF model may further be input into the adaptive loop filter (ALF) for continued filtering.
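The filter ordering of FIG. 9 (DBF → SAO → model-adaptive selection + CNNLF → ALF) is simply function composition. A toy sketch with stand-in filters, purely to make the ordering explicit (none of these lambdas resemble the real filters):

```python
def loop_filter_pipeline(block, dbf, sao, mas_cnnlf, alf):
    # DBF and SAO run first, the adaptively selected CNNLF model next, ALF last
    return alf(mas_cnnlf(sao(dbf(block))))

out = loop_filter_pipeline(
    [4.0],
    dbf=lambda b: [x + 1 for x in b],       # toy stand-ins, not real filters
    sao=lambda b: [x * 2 for x in b],
    mas_cnnlf=lambda b: b,
    alf=lambda b: [x - 3 for x in b],
)
```

Because the stages compose rather than interlock, the MAS module can be inserted before CNNLF without depending on whether DBF, SAO, or ALF are switched on, as the text notes.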
This embodiment also provides a decoding method applied to a decoder. The bitstream is parsed to determine the value of the first syntax element identification information; when the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, the preset selection network model of the current block is determined, and the loop filter network model used by the current block is determined according to it; the current block is filtered using the loop filter network model to obtain a reconstructed image block of the current block. In this way, by introducing deep-learning-based adaptive model selection — using the preset selection network model to select among at least one candidate loop filter network model and then filtering the current block according to the selected loop filter network model — coding performance and thus coding/decoding efficiency are improved, and the finally output reconstructed image block is made closer to the original image block, improving video picture quality.
In another embodiment of the present application, to reduce decoder complexity, refer to FIG. 10, which shows a flowchart of another decoding method provided by an embodiment of the present application. As shown in FIG. 10, the method may include:
S1001: Parse the bitstream and determine the value of the first syntax element identification information.
S1002: When the first syntax element identification information indicates that the current block is allowed to use a preset selection network model for model selection, parse the bitstream and determine a loop filter network model index number.
S1003: According to the loop filter network model index number, determine the loop filter network model used by the current block from at least one candidate loop filter network model.
S1004: Filter the current block using the loop filter network model to obtain a reconstructed image block of the current block.
It should be noted that, to allow the decoder to determine whether the current block is allowed to use the preset selection network model for model selection, first syntax element identification information may be set and the determination made according to its decoded value, where the first syntax element identification information may be denoted model_adaptive_selection_enable_flag.
In a specific example, if the value of model_adaptive_selection_enable_flag is the first value, it may be determined that the current block is allowed to use the preset selection network model for model selection; or, if it is the second value, that the current block is not allowed to. Exemplarily, the first value may be 1 and the second value 0, without any limitation here.
It should also be noted that, taking the CNNLF model as an example, for the decoder-side model adaptive selection module, the index number of the CNNLF model selected by the encoder-side model adaptive selection module may be encoded and written into the bitstream; the decoder can then determine the CNNLF model used by the current block simply by parsing the index number of the CNNLF model and perform filtering, thereby reducing decoder complexity.
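The index-signalling variant just described replaces the decoder-side selection network with a table lookup: the decoder parses the index the encoder wrote and applies that candidate model directly. A minimal sketch (names and toy models are illustrative):

```python
def decode_with_signalled_model(block, model_index, candidate_models):
    # Apply the CNNLF model whose index was parsed from the bitstream,
    # skipping the selection network entirely on the decoder side.
    return candidate_models[model_index](block)

toy_models = [lambda b: [x + 1 for x in b], lambda b: [x * 3 for x in b]]
decoded = decode_with_signalled_model([1, 2], 1, toy_models)
```

The trade-off is a few extra bits in the bitstream against running one forward pass of the selection network per block at the decoder.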
In addition, for the first, second, and third neural network structures in the foregoing embodiments, the number of convolutional layers, the number of fully connected layers, the nonlinear activation functions, and so on can all be adjusted. Moreover, besides the CNNLF model, the loop filter network model targeted by the model adaptive selection module may also be another efficient neural-network filter model on which adaptive model selection is performed, which is likewise not limited in the embodiments of the present application.
In short, the embodiments of the present application propose a deep-learning-based model adaptive selection module for adaptively selecting the CNNLF model to improve coding performance. The model adaptive selection module can be regarded as a preset selection network model composed of multiple convolutional layers and multiple fully connected layers, whose input is the input reconstructed image block of the CNNLF model and whose output is the probability distribution over the CNNLF models. Its position in the encoder/decoder is shown in FIG. 5; its use does not depend on the on/off state of DBF, SAO, ALF, or CNNLF, and it is merely placed before CNNLF.
In a specific example, the technical solution of the embodiments acts in the loop-filter module of the decoder, with the following specific flow:
The decoder obtains and parses the bitstream; when it reaches the loop-filter module, processing follows the preset filter order: DBF filtering → SAO filtering → model adaptive selection module → CNNLF filtering → ALF filtering. Upon entering the model adaptive selection module,
(a) first, judge from the decoded model_adaptive_selection_enable_flag whether the current block is allowed to use the model adaptive selection module for model selection. If model_adaptive_selection_enable_flag is "1", attempt model-adaptive-selection processing for the current block and go to (b); if it is "0", go to (e);
(b) judge the color component type of the current block: if the current block is a luma component block, go to (c); if a chroma component block, go to (d);
(c) for the luma component, use the input reconstructed luma image block of the CNNLF model as the input of the model adaptive selection module, whose output is the probability distribution over the luma CNNLF models. Select the model with the largest probability value as the CNNLF model of the current luma image block, and perform CNNLF filtering on the current luma image block to obtain the final reconstructed image block;
(d) for the chroma component, use the input reconstructed chroma image block of the CNNLF model as the input of the model adaptive selection module, whose output is the probability distribution over the chroma CNNLF models. Select the model with the largest probability value as the CNNLF model of the current chroma image block, and perform CNNLF filtering on the current chroma image block to obtain the final reconstructed image block;
(e) if the current frame has finished model-adaptive-selection processing, load the next frame for processing, then go to (a).
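Steps (a)-(d) above amount to the following per-block routine. This is a hedged sketch: flag handling and the model containers are illustrative, and the toy selection functions stand in for the trained networks.

```python
def mas_filter_block(block, enable_flag, is_luma, luma_models, chroma_models,
                     luma_sel, chroma_sel):
    # (a) module disabled for this block: pass through unchanged
    if enable_flag != 1:
        return block
    # (b) route by color component type
    models, sel = (luma_models, luma_sel) if is_luma else (chroma_models, chroma_sel)
    # (c)/(d) pick the CNNLF model with the largest probability and filter
    probs = sel(block)
    best = probs.index(max(probs))
    return models[best](block)

toy_luma_models = [lambda b: [x + 1 for x in b], lambda b: [x * 2 for x in b]]
toy_luma_sel = lambda b: [0.2, 0.8]          # model 1 gets the higher probability
filtered_block = mas_filter_block([1, 2], 1, True, toy_luma_models, [], toy_luma_sel, None)
passthrough = mas_filter_block([1, 2], 0, True, toy_luma_models, [], toy_luma_sel, None)
```

Step (e) — advancing to the next frame — sits in the outer per-frame loop and is omitted here.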
In implementation, the modification of the syntax elements is shown in Table 1.
Table 1
[Table 1 appears only as an image in the source (Figure PCTCN2021099234-appb-000001) and cannot be reproduced here.]
Here, the enable flag for neural-network-based model adaptive selection is model_adaptive_selection_enable_flag.
In summary, by introducing deep-learning-based adaptive model selection, the embodiments of the present application feed the input reconstructed image block of the HPM-ModAI CNNLF model into a neural network structure of multiple convolutional layers plus fully connected layers, output the probability distribution over the CNNLF models, adaptively select a suitable CNNLF model for the input reconstructed image block, and then input the block into the selected CNNLF model for filtering, so that the finally output reconstructed image block is closer to the original image block, which can improve coding performance.
Exemplarily, taking the four non-I-frame luma component models in HPM-ModAI as an example, training and testing of the model adaptive selection module were implemented; the technical solution proposed in the embodiments was implemented on the AVS3 intelligent-coding reference software HPM11.0-ModAI6.0. Under the Random Access configuration of the common test conditions for intelligent coding, the test sequences required by AVS3 were tested against the anchor HPM11.0-ModAI6.0: the average BD-rate changes on the Y, U, and V components were -1.01%, 0.00%, and 0.04% respectively, as shown in Table 2; under the Low Delay B configuration, the average BD-rate changes were -0.86%, -0.21%, and -0.30% respectively, as shown in Table 3. The data of Tables 2 and 3 show that this technical solution improves coding performance; specifically, by introducing deep-learning-based adaptive model selection, it brings a good performance gain to the existing AVS3 intelligent-coding reference software HPM-ModAI.
Table 2
[Table 2 appears only as an image in the source (Figure PCTCN2021099234-appb-000002) and cannot be reproduced here.]
Table 3
[Table 3 appears only as an image in the source (Figure PCTCN2021099234-appb-000003) and cannot be reproduced here.]
Through the above embodiments, the specific implementation of the foregoing embodiments has been elaborated in detail. It can be seen that, with the technical solutions of the foregoing embodiments, deep-learning-based adaptive model selection is introduced, which not only improves coding performance and thus coding/decoding efficiency, but also makes the finally output reconstructed image block closer to the original image block, improving video picture quality.
In yet another embodiment of the present application, refer to FIG. 11, which shows a flowchart of an encoding method provided by an embodiment of the present application. As shown in FIG. 11, the method may include:
S1101: Determine the value of the first syntax element identification information.
It should be noted that a video picture may be partitioned into multiple image blocks, and each image block currently to be encoded may be called a coding block. Each coding block may include a first image component, a second image component, and a third image component; the current block is the coding block in the video picture on which loop filtering of the first, second, or third image component is currently to be performed.
Here, from the perspective of color, the first, second, and third image components may be divided into two color component types: a luma component and a chroma component. In this case, if prediction, inverse transform and inverse quantization, loop filtering, and other operations are performed on the luma component of the current block, the current block may also be called a luma block; or, if performed on the chroma component, a chroma block.
It should also be noted that, on the encoder side, the embodiments of the present application specifically provide a loop filtering method, in particular a deep-learning-based adaptive model selection method, applied in the filtering unit 108 shown in FIG. 3A. Here, the filtering unit 108 may include a deblocking filter (DBF), a sample adaptive offset filter (SAO), a convolutional-neural-network-based in-loop filter (CNNLF), and an adaptive loop filter (ALF); the method described in the embodiments is specifically applied between SAO and CNNLF, so that the CNNLF model can be adaptively selected and CNNLF filtering performed according to the selected suitable model.
More specifically, the embodiments propose a deep-learning-based model adaptive selection module — see the model adaptive selection module shown in FIG. 5 — which can be used to adaptively select a loop filter network model (such as a CNNLF model), thereby improving coding performance.
In the embodiments of the present application, for the model adaptive selection module, whether the current block is allowed to use the preset selection network model for model selection may be indicated by first syntax element identification information. In some embodiments, determining the value of the first syntax element identification information includes:
if the current block is allowed to use the preset selection network model for model selection, determining that the value of the first syntax element identification information is the first value; or,
if the current block is not allowed to use the preset selection network model for model selection, determining that the value is the second value.
Further, the method also includes: encoding the value of the first syntax element identification information and writing the encoded bits into the bitstream.
That is, first syntax element identification information may first be set to indicate whether the current block is allowed to use the preset selection network model for model selection. Here, if the current block is allowed, the value of the first syntax element identification information may be determined to be the first value; if not, the second value. Thus, in the encoder, after the value of the first syntax element identification information is determined, it is written into the bitstream for transmission to the decoder, so that the decoder can learn, by parsing the bitstream, whether the current block is allowed to use the preset selection network model for model selection.
Here, the first value and the second value are different, and they may be in parameter or numeric form. Specifically, the first syntax element identification information may be a parameter written in a profile or the value of a flag, without any limitation.
Taking the first syntax element identification information as a flag as an example: the first value may be set to 1 and the second value to 0; or true/false; or 0/1; or false/true. Exemplarily, for a flag, the first value is generally 1 and the second value 0, without any limitation.
It should also be noted that the preset selection network model can be regarded as a neural network, and the first syntax element identification information as an enable flag for neural-network-based model adaptive selection, denoted model_adaptive_selection_enable_flag, which may indicate whether the current block is allowed to use the preset selection network model for model selection.
Thus, taking the first value as 1 and the second value as 0: if the value of model_adaptive_selection_enable_flag is 1, it may be determined that the current block is allowed to use the preset selection network model for model selection; if it is 0, that the current block is not allowed to.
S1102:当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定所述当前块使用的环路滤波网络模型。
需要说明的是,如果当前块允许使用预设选择网络模型进行模型选择,那么在确定出当前块的预设选择网络模型后,可以根据预设选择网络模型确定出当前块使用的环路滤波网络模型。
在一些实施例中,所述根据预设选择网络模型,确定当前块使用的环路滤波网络模型,可以包括:
根据预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值;
根据至少一个候选环路滤波网络模型各自对应的输出值,确定所述当前块使用的环路滤波网络模型。
进一步地,在一些实施例中,所述根据至少一个候选环路滤波网络模型各自对应的输出值,确定当前块使用的环路滤波网络模型,可以包括:
从至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选环路滤波网络模型作为当前块使用的环路滤波网络模型。
在一种具体的示例中,所述从至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,可以包括:从至少一个候选环路滤波网络模型各自对应的输出值中选取最大值,将所述最大值作为所述目标值。
也就是说,根据预设选择网络模型可以得到至少一个候选环路滤波网络模型各自对应的输出值;然后从这至少一个候选环路滤波网络模型各自对应的输出值中确定目标值(比如最大值),将目标值(比如最大值)对应的候选环路滤波网络模型作为当前块使用的环路滤波网络模型。
还需要说明的是,输出值可以为概率值。具体地,至少一个候选环路滤波网络模型各自对应的输出值,可以用于反映这至少一个候选环路滤波网络模型各自的概率分布情况。
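上述"从各候选模型的输出值中选取最大值对应模型"的选择逻辑,可以用如下示意代码说明(仅为概念性示例:输出值以概率列表表示,函数名为本示例的假设,并非标准实现):

```python
def select_model_index(probs):
    # 从各候选环路滤波网络模型的输出概率中,选取最大值对应的模型索引
    if not probs:
        raise ValueError("输出值列表不能为空")
    best = 0
    for i, p in enumerate(probs):
        if p > probs[best]:
            best = i
    return best
```

例如,当预设选择网络模型输出概率分布[0.1, 0.6, 0.2, 0.1]时,选中索引为1的候选环路滤波网络模型。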
还可以理解,针对不同的颜色分量类型,这里的预设选择网络模型也不相同。在本申请实施例中,亮度分量对应的预设选择网络模型可以称为亮度选择网络模型,色度分量对应的预设选择网络模型可以称为色度选择网络模型。因此,在一些实施例中,所述确定当前块的预设选择网络模型,可以包括:
当当前块的颜色分量类型为亮度分量时,确定所述当前块的亮度选择网络模型;
当当前块的颜色分量类型为色度分量时,确定所述当前块的色度选择网络模型。
相应地,针对不同的颜色分量类型,这里的候选环路滤波网络模型也是不同的。在本申请实施例中,亮度分量对应的候选环路滤波网络模型可以称为候选亮度环路滤波网络模型,色度分量对应的候选环路滤波网络模型可以称为候选色度环路滤波网络模型。因此,在一些实施例中,所述根据预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值,可以包括:
当当前块的颜色分量类型为亮度分量时,根据亮度选择网络模型确定至少一个候选亮度环路滤波网络模型各自对应的输出值;
当当前块的颜色分量类型为色度分量时,根据色度选择网络模型确定至少一个候选色度环路滤波网络模型各自对应的输出值。
也就是说,对于颜色分量类型而言,其可以包括亮度分量和色度分量。在本申请实施例中,如果当前块的颜色分量类型为亮度分量,这时候的当前块可以称为亮度块,那么需要确定当前块的亮度选择网络模型,然后可以根据亮度选择网络模型确定至少一个候选亮度环路滤波网络模型各自对应的输出值。如果当前块的颜色分量类型为色度分量,这时候的当前块可以称为色度块,那么需要确定当前块的色度选择网络模型,然后可以根据色度选择网络模型确定至少一个候选色度环路滤波网络模型各自对应的输出值。
还需要说明的是,无论是至少一个亮度分量对应的候选环路滤波网络模型(可简称为“候选亮度环路滤波网络模型”),还是至少一个色度分量对应的候选环路滤波网络模型(可简称为“候选色度环路滤波网络模型”),这些候选环路滤波网络模型都是通过模型训练得到的。
在一种可能的实施方式中,对于至少一个候选亮度环路滤波网络模型而言,该方法还可以包括:
确定第一训练集,其中,第一训练集包括至少一个训练样本,且训练样本是根据至少一种量化参数得到的;
利用第一训练集中训练样本的亮度分量对第一神经网络结构进行训练,得到至少一个候选亮度环路滤波网络模型。
也就是说,至少一个候选亮度环路滤波网络模型是根据至少一个训练样本对第一神经网络结构进行模型训练确定的,且至少一个候选亮度环路滤波网络模型与颜色分量类型和量化参数之间具有对应关系。
在这里,第一神经网络结构包括下述至少之一:卷积层、激活层、残差块和跳转连接层。
在一种具体的示例中,第一神经网络结构可以包括第一卷积模块、第一残差模块、第二卷积模块和第一连接模块。其中,对于第一神经网络结构而言,第一卷积模块可以由一层卷积层和一层激活层组成,第二卷积模块可以由两层卷积层和一层激活层组成,第一连接模块可以由跳转连接层组成,第一残差模块可以包括若干个残差块,且每一个残差块可以由两层卷积层和一层激活层组成。
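为便于理解上述残差块("两层卷积层和一层激活层"外加跳转连接)的工作方式,下面给出一个单通道、步长为1的纯Python极简草图(卷积实现方式与函数名均为本示例的假设,并非HPM-ModAI的实际实现):

```python
def conv2d(x, k):
    # 对单通道图像块做"same"零填充、步长为1的二维卷积(纯Python示意)
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[a][b]
            out[i][j] = s
    return out

def residual_block(x, k1, k2):
    # 残差块:卷积+ReLU激活,再一层卷积,最后与输入做跳转连接相加
    y = [[max(v, 0.0) for v in row] for row in conv2d(x, k1)]
    y = conv2d(y, k2)
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]
```

当两个卷积核均取3×3单位核时,非负输入经该残差块后恰好被放大为原来的2倍,可用于直观验证跳转连接的行为。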
在另一种可能的实施方式中,对于若干个候选色度环路滤波网络模型而言,该方法还可以包括:
确定第一训练集,其中,所述第一训练集包括至少一个训练样本,且所述训练样本是根据至少一种量化参数得到的;
利用所述第一训练集中训练样本的亮度分量和色度分量对第二神经网络结构进行训练,得到所述至少一个候选色度环路滤波网络模型;
在这里,第二神经网络结构包括下述至少之一:采样层、卷积层、激活层、残差块、池化层和跳转连接层。
在一种具体的示例中,第二神经网络结构可以包括上采样模块、第三卷积模块、第四卷积模块、融合模块、第二残差模块、第五卷积模块和第二连接模块。其中,对于第二神经网络结构而言,第三卷积模块可以由一层卷积层和一层激活层组成,第四卷积模块可以由一层卷积层和一层激活层组成,第五卷积模块可以由两层卷积层、一层激活层和一层池化层组成,第二连接模块可以由跳转连接层组成,第二残差模块可以包括若干个残差块,且每一个残差块可以由两层卷积层和一层激活层组成。
示例性地,以环路滤波网络模型为CNNLF为例,CNNLF对于亮度分量和色度分量分别设计了不同的网络结构。其中,对于亮度分量,其设计了第一神经网络结构,具体参见图6A和图6C;对于色度分量,其设计了第二神经网络结构,具体参见图6B和图6D。
还可以理解,如果确定至少一个候选环路滤波网络模型各自对应的输出值,那么还需确定当前块的预设选择网络模型。针对不同的颜色分量类型,其对应的预设选择网络模型也不相同。在这里,亮度分量对应的预设选择网络模型可以称为亮度选择网络模型,色度分量对应的预设选择网络模型可以称为色度选择网络模型。
在一种可能的实施方式中,在当前块的颜色分量类型为亮度分量的情况下,所述确定当前块的亮度选择网络模型,可以包括:
确定至少一个候选亮度选择网络模型;
确定当前块的量化参数,从至少一个候选亮度选择网络模型中选取量化参数对应的候选亮度选择网络模型;
将所选取的候选亮度选择网络模型确定为当前块的亮度选择网络模型。
在另一种可能的实施方式中,在当前块的颜色分量类型为色度分量的情况下,所述确定当前块的色度选择网络模型,可以包括:
确定至少一个候选色度选择网络模型;
确定当前块的量化参数,从至少一个候选色度选择网络模型中选取量化参数对应的候选色度选择网络模型;
将所选取的候选色度选择网络模型确定为当前块的色度选择网络模型。
需要说明的是,当前块的预设选择网络模型不仅和量化参数有关,而且还和颜色分量类型有关。其中,不同的颜色分量类型,对应有不同的预设选择网络模型,比如对于亮度分量来说,预设选择网络模型可以是与亮度分量相关的亮度选择网络模型;对于色度分量来说,预设选择网络模型可以是与色度分量相关的色度选择网络模型。
还需要说明的是,根据不同的量化参数,比如QP的取值为27~31、32~37、38~44、45~50等,预先可以训练出至少一个候选亮度选择网络模型和至少一个候选色度选择网络模型。这样,在确定出当前块的量化参数之后,可以从至少一个候选亮度选择网络模型中选取出该量化参数对应的候选亮度选择网络模型,即当前块的亮度选择网络模型;也可以从至少一个候选色度选择网络模型中选取出该量化参数对应的候选色度选择网络模型,即当前块的色度选择网络模型。
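按照上述QP区间(27~31、32~37、38~44、45~50)从预训练的候选选择网络模型中选取对应模型的映射关系,可以用如下示意代码表达(区间划分沿用正文示例,函数名为本示例的假设):

```python
def qp_to_model_group(qp):
    # 根据量化参数所在区间,返回对应候选选择网络模型的组索引(示例区间)
    groups = [(27, 31), (32, 37), (38, 44), (45, 50)]
    for index, (low, high) in enumerate(groups):
        if low <= qp <= high:
            return index
    raise ValueError("量化参数超出示例区间范围: %d" % qp)
```

例如,当前块的量化参数为37时选取第2组(索引1)的候选亮度/色度选择网络模型。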
进一步地,对于至少一个候选亮度选择网络模型和至少一个候选色度选择网络模型的模型训练,在一些实施例中,该方法还可以包括:
确定第二训练集,其中,第二训练集包括至少一个训练样本,且所述训练样本是根据至少一种量化参数得到的;
利用第二训练集中训练样本的亮度分量对第三神经网络结构进行训练,得到至少一个候选亮度选择网络模型;
利用第二训练集中训练样本的色度分量对第三神经网络结构进行训练,得到至少一个候选色度选择网络模型;
其中,至少一个候选亮度选择网络模型与亮度分量和量化参数之间具有对应关系,至少一个候选色度选择网络模型与色度分量和量化参数之间具有对应关系。
也就是说,至少一个候选亮度选择网络模型和所述至少一个候选色度选择网络模型分别是根据至少一个训练样本对第三神经网络结构进行模型训练确定的,且这至少一个候选亮度选择网络模型和所述至少一个候选色度选择网络模型均与颜色分量类型和量化参数之间具有对应关系。
需要说明的是,第三神经网络结构包括下述至少之一:卷积层、池化层、全连接层和激活层。
在一种具体的示例中,第三神经网络结构可以包括第六卷积模块和全连接模块,第六卷积模块和全连接模块顺次连接。其中,第六卷积模块可以包括若干个卷积子模块,每一个卷积子模块可以由一层卷积层和一层池化层组成;全连接模块可以包括若干个全连接子模块,每一个全连接子模块可以由一层全连接层和一层激活层组成。
也就是说,第三神经网络结构可以选择多层卷积神经网络和多层全连接层神经网络组成,然后利用训练样本进行深度学习,可以得到当前块的预设选择网络模型,比如亮度选择网络模型或者色度选择网络模型。
以图8A和图8B为例,第三神经网络结构可以由3层卷积层和2层全连接层组成,而且每层卷积层之后设置有池化层;其中,卷积层的卷积核可以为3×3,即可以用3×3 Conv表示;池化层可以采用最大值池化层,用2×2 MaxPool表示;另外,全连接层之后设置有激活层,在这里,激活层可以为线性激活函数,也可以为非线性激活函数,比如ReLU和Softmax等。
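上述第三神经网络结构中用到的2×2最大值池化与Softmax激活,可以用如下纯Python代码示意(仅演示这两种算子的行为,并非完整的选择网络实现):

```python
import math

def softmax(logits):
    # Softmax激活:将全连接层输出转换为各候选模型的概率分布
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # 减去最大值保证数值稳定
    s = sum(exps)
    return [e / s for e in exps]

def maxpool2x2(x):
    # 2×2最大值池化,步长为2(假设输入尺寸为偶数,仅为示意)
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]
```

Softmax输出的各概率之和为1,且保持最大值的位置不变,因此可直接作为候选环路滤波网络模型的概率分布使用。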
进一步地,根据上述的实施方式,在确定出预设选择网络模型和至少一个候选环路滤波网络模型之后,还可以确定这至少一个候选环路滤波网络模型的概率分布情况。在一些实施例中,所述根据所述预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值,可以包括:
确定环路滤波网络模型的输入重建图像块;
将输入重建图像块输入预设选择网络模型,得到至少一个候选环路滤波网络模型各自对应的输出值。
在这里,环路滤波网络模型可以是指前述的CNNLF模型。以输出值为概率值为例,在确定出CNNLF模型的输入重建图像块之后,将CNNLF模型的输入重建图像块作为预设选择网络模型的输入,而预设选择网络模型的输出即为这至少一个候选环路滤波网络模型的概率分布情况。即在得到这至少一个候选环路滤波网络模型的概率值之后,可以根据概率值的大小确定出当前块使用的环路滤波网络模型。具体地,可以从至少一个候选环路滤波网络模型的概率值中选取最大概率值,将最大概率值对应的候选环路滤波网络模型确定为当前块使用的环路滤波网络模型。
也就是说,无论是亮度环路滤波网络模型还是色度环路滤波网络模型,均是先通过模型训练以得到若干个候选亮度环路滤波网络模型或者若干个候选色度环路滤波网络模型,然后再利用预设选择网络模型确定出这些若干个候选亮度环路滤波网络模型或者若干个候选色度环路滤波网络模型的概率值,再选择概率值最大的候选环路滤波网络模型来确定出当前块使用的环路滤波网络模型。
还需要说明的是,根据颜色分量类型的不同,预设选择网络模型包括亮度选择网络模型和色度选择网络模型;这样,对于输入重建图像块来说,也可以包括输入重建亮度图像块和输入重建色度图像块。
在一种可能的实施方式中,当当前块的颜色分量类型为亮度分量时,所述根据预设选择网络模型,确定当前块使用的环路滤波网络模型,可以包括:
确定亮度环路滤波网络模型的输入重建亮度图像块;
将输入重建亮度图像块输入亮度选择网络模型,得到至少一个候选亮度环路滤波网络模型各自对应的输出值;
从至少一个候选亮度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选亮度环路滤波网络模型作为当前块使用的亮度环路滤波网络模型。
在另一种可能的实施方式中,当当前块的颜色分量类型为色度分量时,所述根据预设选择网络模型,确定当前块使用的环路滤波网络模型,可以包括:
确定色度环路滤波网络模型的输入重建色度图像块;
将输入重建色度图像块输入色度选择网络模型,得到至少一个候选色度环路滤波网络模型各自对应的输出值;
从至少一个候选色度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选色度环路滤波网络模型作为当前块使用的色度环路滤波网络模型。
这样,在确定出当前块使用的环路滤波网络模型(包括亮度环路滤波网络模型或者色度环路滤波网络模型)之后,可以利用所选取的环路滤波网络模型对当前块进行滤波处理。
进一步地,为了节省复杂度,在一些实施例中,在确定所述当前块使用的环路滤波网络模型之后,该方法还可以包括:
确定环路滤波网络模型对应的环路滤波网络模型索引序号;
对环路滤波网络模型索引序号进行编码,将编码比特写入码流。
这样,以CNNLF模型为例,根据在编码器侧模型自适应选择模块所选中的CNNLF模型,将其索引序号进行编码并写入码流中;如此,后续在解码器中根据解析出CNNLF模型的索引序号即可直接确定出当前块使用的CNNLF模型并进行滤波处理,从而能够降低解码器的复杂度。
S1103:利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
需要说明的是,由于预设选择网络模型是对多个候选环路滤波网络模型进行模型选择,因此在确定当前块是否允许使用预设选择网络模型进行模型选择之后,如果当前块允许使用预设选择网络模型进行模型选择,那么在选择出当前块使用的环路滤波网络模型后,还需要进一步确定当前块是否使用环路滤波网络模型进行滤波处理。这样,如果当前块确定使用环路滤波网络模型进行滤波处理,那么此时才可以使用该环路滤波网络模型进行滤波处理。
在本申请实施例中,针对视频序列、当前帧、当前块等是否使用环路滤波网络模型进行滤波处理,可以根据亮度分量和色度分量逐一设置语法元素标识信息,然后通过解析码流来确定;也可以仅针对当前块和/或当前帧设置语法元素标识信息,然后通过解析码流来确定。在本申请实施例中,对于视频序列、当前帧、当前块,可以逐一设置语法元素标识信息(如第二语法元素标识信息、第一亮度语法元素标识信息、第二亮度语法元素标识信息和色度语法元素标识信息等),但是这里并不作具体限定。
在一种可能的实施方式中,对于当前块是否使用环路滤波网络模型进行滤波处理,首先可以设置一个序列头标识信息,比如可以设置一个第二语法元素标识信息,用于指示当前的视频序列是否使用环路滤波网络模型进行滤波处理。因此,在一些实施例中,该方法还可以包括:
若视频序列确定使用环路滤波网络模型进行滤波处理,则确定第二语法元素标识信息的取值为第一值;或者,
若视频序列确定不使用环路滤波网络模型进行滤波处理,则确定第二语法元素标识信息的取值为第二值。
进一步地,该方法还包括:对第二语法元素标识信息的取值进行编码,将编码比特写入码流。
需要说明的是,第一值和第二值不同。
以第二语法元素标识信息为一flag信息为例,这时候对于第一值和第二值而言,第一值可以设置为1,第二值可以设置为0;或者,第一值还可以设置为true,第二值还可以设置为false;或者,第一值还可以设置为0,第二值还可以设置为1;或者,第一值还可以设置为false,第二值还可以设置为true。本申请实施例对此不作任何限定。
还需要说明的是,视频序列包括至少一个帧,这至少一个帧可以包括有当前帧。在这里,当视频序列确定使用环路滤波网络模型进行滤波处理时,那么本申请实施例还需要进一步判断视频序列内的当前帧是否使用环路滤波网络模型进行滤波处理,即还需要设置一个第三语法元素标识信息。对于第三语法元素标识信息而言,根据亮度分量和色度分量的不同,第三语法元素标识信息代表的含义不同。
在本申请实施例中,对于当前帧的亮度分量,这时候可以假定第三语法元素标识信息为第一亮度语法元素标识信息,用于指示当前帧的亮度分量是否使用亮度环路滤波网络模型进行滤波处理;对于当前帧的色度分量,这时候可以假定第三语法元素标识信息为色度语法元素标识信息,用于指示当前帧的色度分量是否使用色度环路滤波网络模型进行滤波处理。
这样,在确定视频序列使用环路滤波网络模型进行滤波处理之后,当当前帧的颜色分量类型为亮度分量时,确定环路滤波网络模型为亮度环路滤波网络模型。这时候,在一种可能的实施方式中,该方法还可以包括:
确定当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理的第一率失真代价值;以及确定当前帧的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第二率失真代价值;
根据第一率失真代价值和第二率失真代价值,确定第一亮度语法元素标识信息的取值。
在一种具体的示例中,所述根据第一率失真代价值和第二率失真代价值,确定第一亮度语法元素标识信息的取值,可以包括:
若第一率失真代价值小于第二率失真代价值,则确定第一亮度语法元素标识信息的取值为第一值;和/或,若第一率失真代价值大于或等于第二率失真代价值,则确定第一亮度语法元素标识信息的取值为第二值。
进一步地,该方法还包括:对第一亮度语法元素标识信息的取值进行编码,将编码比特写入码流。
需要说明的是,对于亮度分量而言,如果第一亮度语法元素标识信息的取值为第一值,即确定当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理,那么还需要继续判断当前帧内的当前块的亮度分量是否使用亮度环路滤波网络模型进行滤波处理。因此,在一些实施例中,当第一率失真代价值小于第二率失真代价值时,该方法还可以包括:
确定当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理的第三率失真代价值;以及确定当前块的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第四率失真代价值;
若第三率失真代价值小于第四率失真代价值,则确定第二亮度语法元素标识信息的取值为第一值;和/或,若第三率失真代价值大于或等于所述第四率失真代价值,则确定第二亮度语法元素标识信息的取值为第二值。
进一步地,该方法还包括:对第二亮度语法元素标识信息的取值进行编码,将编码比特写入码流。
也就是说,对于亮度分量,这里涉及到两种语法元素:帧级语法元素和CTU级语法元素。其中,帧级语法元素可以称为第一亮度语法元素标识信息,CTU级语法元素可以称为第二亮度语法元素标识信息。假定第一亮度语法元素标识信息和第二亮度语法元素标识信息为一flag信息,那么第一亮度语法元素标识信息可以用luma_frame_flag表示;第二亮度语法元素标识信息可以用luma_ctu_flag表示。在这里,无论是第一亮度语法元素标识信息的取值还是第二亮度语法元素标识信息的取值均可以利用率失真代价方式进行确定。
以第一亮度语法元素标识信息为例,对于当前帧,在一些实施例中,该方法还可以包括:对当前帧进行块划分,确定至少一个划分块;其中,至少一个划分块包括当前块;
相应地,所述确定当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理的第一率失真代价值,可以包括:
分别计算这至少一个划分块的亮度分量使用亮度环路滤波网络模型进行滤波处理的第三率失真代价值;
对计算得到的第三率失真代价值进行累加计算,得到第一率失真代价值;
所述计算当前帧的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第二率失真代价值,可以包括:
分别计算这至少一个划分块的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第四率失真代价值;
对计算得到的第四率失真代价值进行累加计算,得到第二率失真代价值。
也就是说,可以通过计算每一个块的亮度分量使用亮度环路滤波网络模型进行滤波的第三率失真代价值,然后通过累计计算得到当前帧的第一率失真代价值。在一种具体的示例中,在率失真代价值的计算中,失真值可以是根据均方误差确定的。
在一种可能的实现方式中,每一个块的亮度分量使用亮度环路滤波网络模型进行滤波后,可以得到每一个块的亮度分量的重建图像块;然后计算重建图像块与原始图像块的均方误差值,可以得到每一个块的均方误差值;利用率失真代价公式RDcost=D+λ*R可以计算得到每一个块的第三率失真代价值;其中,D为每一个块的均方误差值,R为1,λ与自适应修正滤波器的λ保持一致。最后将每一个块的第三率失真代价值进行累加,可以得到当前帧的第一率失真代价值。
在另一种可能的实现方式中,每一个块的亮度分量使用亮度环路滤波网络模型进行滤波后,可以得到每一个块的亮度分量的重建图像块;然后计算重建图像块与原始图像块的均方误差值,可以得到每一个块的均方误差值,通过累计计算得到当前帧的均方误差值;再利用率失真代价公式RDcost=D+λ*R计算得到第一率失真代价值;其中,这里的D为当前帧的均方误差值,R为当前帧包括的块数量,λ与自适应修正滤波器的λ保持一致。
还需要说明的是,还可以通过计算每一个块的亮度分量未使用亮度环路滤波网络模型进行滤波的第四率失真代价值,然后通过累计计算得到当前帧的第二率失真代价值。这里,在率失真代价值的计算中,失真值也可以是根据均方误差确定的;这时候的均方误差是指未经过亮度环路滤波网络模型的输出重建图像块与原始图像块的均方误差值,其他计算操作与计算第一率失真代价值相同,这里不再详述。
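上述"按块计算均方误差并按RDcost=D+λ*R(每个块R取1)累加得到帧级率失真代价值"的计算过程,可以用如下示意代码概括(样本以一维列表表示仅为简化假设,函数名为本示例的假设):

```python
def mse(rec, org):
    # 重建块与原始块的均方误差(失真值D)
    return sum((a - b) ** 2 for a, b in zip(rec, org)) / len(org)

def frame_rd_cost(rec_blocks, org_blocks, lam):
    # 帧级率失真代价:对每个块按 RDcost = D + λ*R(R取1)累加
    total = 0.0
    for rec, org in zip(rec_blocks, org_blocks):
        total += mse(rec, org) + lam * 1
    return total
```

对未经滤波的重建块代入同一公式,即可得到用于比较的第二率失真代价值。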
这样,以第一值为1,第二值为0为例,在得到第一率失真代价值和第二率失真代价值之后,可以通过将第一率失真代价值与第二率失真代价值进行比较;如果第一率失真代价值大于或等于第二率失真代价值,那么可以确定出第一亮度语法元素标识信息的取值为0,意味着当前帧的亮度分量不需要使用亮度环路滤波网络模型进行滤波处理,此时可以从视频序列中获取下一帧,将下一帧确定为当前帧,继续进行第一率失真代价值和第二率失真代价值的计算。否则,如果第一率失真代价值小于第二率失真代价值,那么可以确定出第一亮度语法元素标识信息的取值为1,意味着当前帧的亮度分量需要使用亮度环路滤波网络模型进行滤波处理,此时需要继续判断当前帧内的当前块的亮度分量是否使用亮度环路滤波网络模型进行滤波;即将第三率失真代价值与第四率失真代价值进行比较;如果第三率失真代价值小于第四率失真代价值,那么可以确定出第二亮度语法元素标识信息的取值为1,意味着需要对当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理;否则,如果第三率失真代价值大于或等于第四率失真代价值,那么可以确定出第二亮度语法元素标识信息的取值为0,意味着不需要对当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理,此时可以从当前帧中获取下一个块,将下一个块确定为当前块,继续进行第三率失真代价值和第四率失真代价值的计算。
另外,对于亮度分量,本申请实施例还可以设置亮度帧级开关和亮度CTU级开关,通过控制其是否打开来确定是否使用亮度环路滤波网络模型进行滤波处理。
对于亮度帧级开关,在一些实施例中,该方法还可以包括:设置亮度帧级开关,该亮度帧级开关用于控制当前帧的亮度分量是否使用亮度环路滤波网络模型进行滤波处理;
相应地,该方法还可以包括:
若第一率失真代价值小于第二率失真代价值,则打开亮度帧级开关;或者,
若第一率失真代价值大于或等于第二率失真代价值,则关闭亮度帧级开关。
对于亮度CTU级开关,在一些实施例中,该方法还可以包括:设置亮度CTU级开关,该亮度CTU级开关用于控制当前块的亮度分量是否使用亮度环路滤波网络模型进行滤波处理;
相应地,该方法还可以包括:
若第三率失真代价值小于第四率失真代价值,则打开亮度CTU级开关;或者,
若第三率失真代价值大于或等于第四率失真代价值,则关闭亮度CTU级开关。
需要说明的是,无论是亮度帧级开关还是亮度CTU级开关,也可以根据率失真代价方式来确定是否打开。在这里,一种可能的实现方式中,可以根据计算得到的率失真代价值的大小判断确定。
在另一种可能的实现方式中,对于亮度帧级开关,仍然根据RDcost=D+λ*R来确定。这里,D表示当前帧经过亮度环路滤波网络模型处理后减少的失真值,D=D_out-D_rec(D_out为亮度环路滤波网络模型处理后的失真,D_rec为亮度环路滤波网络模型处理前的失真),R为当前帧包括的块数量,λ与自适应修正滤波器的λ保持一致。这时候,当RDcost为负值时,打开亮度帧级开关,即打开帧级亮度环路滤波网络模型;否则关闭亮度帧级开关,即关闭帧级亮度环路滤波网络模型。
当亮度帧级开关打开时,对于亮度CTU级开关,可以根据RDcost=D来确定。这里,D表示当前块经过亮度环路滤波网络模型处理后减少的失真值,D=D_out-D_rec(D_out为亮度环路滤波网络模型处理后的失真,D_rec为亮度环路滤波网络模型处理前的失真)。
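上述帧级开关与CTU级开关的判决条件(RDcost为负值时打开)可以概括为如下示意函数(变量命名D_out、D_rec沿用正文符号,函数名为本示例的假设):

```python
def frame_switch_open(d_out, d_rec, num_blocks, lam):
    # 帧级开关:RDcost = (D_out - D_rec) + λ*R,RDcost为负时打开
    return (d_out - d_rec) + lam * num_blocks < 0

def ctu_switch_open(d_out, d_rec):
    # CTU级开关:RDcost = D_out - D_rec,RDcost为负时打开
    return d_out - d_rec < 0
```

可见,帧级判决中λ*R一项要求滤波带来的失真减少量必须超过块数带来的代价,帧级开关才会打开。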
这样,对于S1103来说,在一些实施例中,所述利用环路滤波网络模型对当前块进行滤波处理,可以包括:若第三率失真代价值小于第四率失真代价值,则利用亮度环路滤波网络模型对当前块进行滤波处理。
也就是说,对于亮度分量,需要两种语法元素:帧级语法元素和CTU级语法元素。只有CTU级语法元素(即第二亮度语法元素标识信息)指示当前块使用亮度环路滤波网络模型进行滤波处理时,即第三率失真代价值小于第四率失真代价值,这时候才可以利用亮度环路滤波网络模型对当前块进行滤波处理,此时当前块才有可能允许使用预设选择网络模型进行模型选择,即需要执行确定第一语法元素标识信息的取值的步骤。
进一步地,在确定视频序列使用环路滤波网络模型进行滤波处理之后,当当前帧的颜色分量类型为色度分量时,确定环路滤波网络模型为色度环路滤波网络模型。这时候,在另一种可能的实施方式中,该方法还包括:
确定当前帧的色度分量使用色度环路滤波网络模型进行滤波处理的第五率失真代价值;以及确定当前帧的色度分量未使用色度环路滤波网络模型进行滤波处理的第六率失真代价值;
若第五率失真代价值小于第六率失真代价值,则确定色度语法元素标识信息的取值为第一值;和/或,若第五率失真代价值大于或等于第六率失真代价值,则确定色度语法元素标识信息的取值为第二值。
进一步地,该方法还包括:对色度语法元素标识信息的取值进行编码,将编码比特写入码流。
需要说明的是,对于色度分量,这里涉及到帧级语法元素。其中,帧级语法元素可以称为色度语法元素标识信息,假定色度语法元素标识信息为一flag信息,那么可以用chroma_frame_flag表示。
还需要说明的是,对于第一值和第二值而言,第一值可以设置为1,第二值可以设置为0;或者,第一值还可以设置为true,第二值还可以设置为false;或者,第一值还可以设置为0,第二值还可以设置为1;或者,第一值还可以设置为false,第二值还可以设置为true。示例性地,一般情况下,第一值可以为1,第二值可以为0,但是并不作任何限定。
进一步地,由于考虑到编码性能和计算复杂度,如果色度语法元素标识信息指示当前帧的色度分量使用色度环路滤波网络模型进行滤波处理,那么当前帧所包括的块都默认使用色度环路滤波网络模型进行滤波处理;如果色度语法元素标识信息指示当前帧的色度分量不使用色度环路滤波网络模型进行滤波处理,那么当前帧所包括的块都默认不使用色度环路滤波网络模型进行滤波处理。因此,对于色度分量不再需要设置CTU级语法元素,同理也不需要设置CTU级开关。
换句话说,对于色度分量,本申请实施例可以只设置帧级开关。因此,在一些实施例中,该方法还可以包括:设置色度帧级开关,色度帧级开关用于控制当前帧的色度分量是否使用色度环路滤波网络模型进行滤波处理;
相应地,该方法还可以包括:
若第五率失真代价值小于第六率失真代价值,则打开色度帧级开关;或者,
若第五率失真代价值大于或等于第六率失真代价值,则关闭色度帧级开关。
需要说明的是,对于第五率失真代价值和第六率失真代价值,在一种具体的示例中,失真值也可以根据均方误差确定,其他计算操作与计算第一率失真代价值和第二率失真代价值相同,这里不再详述。另外,对于色度帧级开关,其是否打开则与确定亮度帧级开关是否打开的实现方式相同,这里不再详述。
这样,以第一值为1,第二值为0为例,在得到第五率失真代价值和第六率失真代价值之后,可以通过将第五率失真代价值与第六率失真代价值进行比较;如果第五率失真代价值小于第六率失真代价值,那么可以打开色度帧级开关,而且还可以确定出色度语法元素标识信息的取值为1,意味着当前帧的色度分量需要使用色度环路滤波网络模型进行滤波处理;在当前帧处理完成后,继续加载下一帧进行处理。否则,如果第五率失真代价值大于或等于第六率失真代价值,那么可以关闭色度帧级开关,而且还可以确定出色度语法元素标识信息的取值为0,意味着当前帧的色度分量不需要使用色度环路滤波网络模型进行滤波处理,此时可以从视频序列中获取下一帧,将下一帧确定为当前帧,继续加载下一帧进行处理,以确定出下一帧的语法元素标识信息的取值。
这样,对于S1103来说,在一些实施例中,所述利用环路滤波网络模型对当前块进行滤波处理,可以包括:若第五率失真代价值小于第六率失真代价值,则利用色度环路滤波网络模型对所述当前块进行滤波处理。
也就是说,对于色度分量,可以仅需要一种语法元素:帧级语法元素。只有帧级语法元素(即色度语法元素标识信息)指示当前块使用色度环路滤波网络模型进行滤波处理时,即第五率失真代价值小于第六率失真代价值,这时候才可以利用该色度环路滤波网络模型对当前块进行滤波处理。
另外,本申请实施例所述的环路滤波网络模型可以为CNNLF模型。这样,如果当前块使用CNNLF模型进行滤波处理,那么可以利用所选取的CNNLF模型可以对当前块进行CNNLF滤波处理,以得到当前块的重建图像块。
简言之,CNNLF模型的使用可以包含离线训练和推理测试两个阶段。其中,在离线训练阶段,可以离线地训练4个I帧亮度分量模型、4个非I帧亮度分量模型、4个色度U分量模型和4个色度V分量模型,共16种模型。具体地,使用预设图像数据集(例如DIV2K,该数据集有1000张高清图(2K分辨率),其中,800张作为训练,100张作为验证,100张作为测试),将图像从RGB转换成YUV4:2:0格式的单帧视频序列,作为标签数据。然后使用HPM在All Intra配置下对序列进行编码,关闭DBF,SAO和ALF等传统滤波器,量化步长设置为27到50。对于编码得到的重建序列,按照QP 27~31、32~37、38~44、45~50划分为4个区间,切割为128×128的图像块作为训练数据,分别训练了4种I帧亮度分量模型,4种色度U分量模型,4种色度V分量模型。进一步地,使用预设视频数据集(例如BVI-DVC),使用HPM-ModAI在Random Access配置下编码,关闭DBF,SAO和ALF等传统滤波器,并打开I帧的CNNLF模型,收集编码重建的非I帧数据,分别训练了4种非I帧亮度分量模型。
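上述"将重建序列切割为128×128的图像块作为训练数据"的步骤,可以用如下示意代码说明(丢弃不足一块的边缘区域属本示例的假设处理方式,实际实现可能不同):

```python
def crop_patches(height, width, patch=128):
    # 按 patch×patch 将一帧重建图像切割为训练图像块,返回各块左上角坐标
    # (不足一个完整块的边缘区域被丢弃,仅为示意)
    return [(y, x)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]
```

例如,对256×384的帧可切出2×3=6个128×128的训练图像块。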
在推理测试阶段,HPM-ModAI为亮度分量设置了帧级开关与CTU级开关以控制是否调用CNNLF模型,而为色度分量设置了帧级开关以控制是否调用CNNLF模型。在这里,开关通常可以用flag表示。另外,帧级开关由式(1)确定,其中,D=D_net-D_rec表示CNNLF处理后减少的失真(D_net为滤波后的失真,D_rec为滤波前的失真),R表示当前帧的CTU个数,λ与自适应修正滤波器的λ保持一致。当RDcost为负时,打开帧级开关,否则关闭帧级开关。
RDcost=D+λ*R            (1)
当帧级开关打开时,还需要进一步通过率失真代价方式决策每个CTU是否打开CTU级开关。在这里,设置了CTU级开关以控制是否调用CNNLF模型。具体地,CTU级开关由式(2)确定。
RDcost=D            (2)
也就是说,确定出当前块是否允许使用预设选择网络模型进行模型选择后,编码器可以通过率失真代价方式确定当前帧或者当前块是否使用CNNLF模型进行滤波处理,以便确定出当前块的重建图像块。
进一步地,对于输入重建图像块(包括输入重建亮度图像块或者输入重建色度图像块)来说,这里,输入重建图像块可以是经由去块滤波器和样值自适应补偿滤波器进行滤波处理后得到。
进一步地,在一些实施例中,该方法还可以包括:在确定出当前块的重建图像块之后,利用自适应修正滤波器继续对重建图像块进行滤波处理。
以图9为例,输入重建图像块是经由去块滤波器(DBF)和样值自适应补偿滤波器(SAO)进行滤波处理后得到的,然后经由模型自适应选择模块和CNNLF模型所得到的重建图像块还可以输入自适应修正滤波器(ALF)继续进行滤波处理。
除此之外,在本申请实施例中,针对前述实施例中的第一神经网络结构、第二神经网络结构和第三神经网络结构等,其包括的卷积层数量,全连接层数量,非线性激活函数等均可以进行调整。另外,模型自适应选择模块所针对的环路滤波网络模型,除了CNNLF模型之外,还可以是针对其他高效的神经网络滤波器模型进行模型自适应选择,这里不作任何限定。
简言之,本申请实施例提出了一种基于深度学习的模型自适应选择模块,用于对CNNLF模型进行自适应的选择,提升编码性能。模型自适应选择模块可以看作是由多层卷积神经网络和多层全连接神经网络组成的预设选择网络模型,其输入为CNNLF模型的输入重建图像块,输出为各个CNNLF模型的概率分布情况。模型自适应选择模块位于编码器/解码器中的位置如图5所示,模型自适应选择模块的使用不依赖于DBF、SAO、ALF、CNNLF的开关,只是在位置上置于CNNLF之前。
在一种具体的示例中,本申请实施例的技术方案作用在编码器的环路滤波模块中,其具体流程如下:
编码端进入环路滤波模块时,按照预设的滤波器顺序进行处理。这里,预设的滤波器顺序为DBF滤波→SAO滤波→模型自适应选择模块→CNNLF滤波→ALF滤波。当进入模型自适应选择模块时,
(a)首先根据model_adaptive_selection_enable_flag判断当前块下是否允许使用模型自适应选择模块进行模型选择。如果model_adaptive_selection_enable_flag为“1”,那么对当前块尝试进行模型自适应选择模块处理,跳转至(b);如果model_adaptive_selection_enable_flag为“0”,那么跳转至(e);
(b)判断当前块的颜色分量类型,如果当前块为亮度分量块,那么跳转至(c);如果当前块为色度分量块,那么跳转至(d);
(c)对于亮度分量,将CNNLF模型的输入重建亮度图像块作为模型自适应选择模块的输入,输出为各个亮度CNNLF模型的概率分布情况。选择其中概率值最大的模型作为当前亮度图像块的CNNLF模型,并对当前亮度图像块进行CNNLF滤波处理,得到最终的重建图像块;
(d)对于色度分量,将CNNLF模型的输入重建色度图像块作为模型自适应选择模块的输入,输出为各个色度CNNLF模型的概率分布情况。选择其中概率值最大的模型作为当前色度图像块的CNNLF模型,并对当前色度图像块进行CNNLF滤波处理,得到最终的重建图像块;
(e)如果当前帧已完成模型自适应选择模块的决策处理,那么加载下一帧进行处理,然后跳转至(a)。
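上述步骤(a)~(d)的决策流程可以概括为如下示意代码(选择网络以返回概率列表的可调用对象表示,函数与参数名均为本示例的假设,并非实际编码器接口):

```python
def model_adaptive_selection(block, enable_flag, is_luma,
                             luma_select, chroma_select):
    # 步骤(a):根据允许标志判断是否使用模型自适应选择模块
    if enable_flag != 1:
        return None  # 跳过模型自适应选择处理
    # 步骤(b)~(d):按颜色分量类型选用相应的选择网络,
    # 并取概率最大的CNNLF模型索引
    probs = luma_select(block) if is_luma else chroma_select(block)
    return max(range(len(probs)), key=lambda i: probs[i])
```

返回的索引即为选中的CNNLF模型,随后按正文所述对当前图像块执行CNNLF滤波处理。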
在一种更具体的示例中,其语法元素的修改如表1所示。
本实施例提供了一种编码方法,应用于编码器。通过确定第一语法元素标识信息的取值;当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。这样,通过引入基于深度学习的模型自适应选择技术,利用预设选择网络模型对至少一个候选环路滤波网络模型进行模型选择,再根据所选中的环路滤波网络模型对当前块进行滤波处理,不仅可以提升编码性能,进而能够提高编解码效率;而且还可以使得最终输出的重建图像块更加接近于原始图像块,能够提升视频图像质量。
在本申请的再一实施例中,本申请实施例提供了一种码流,该码流是根据待编码信息进行比特编码生成的;其中,待编码信息包括下述至少之一:第一语法元素标识信息的取值、第二语法元素标识信息的取值、第一亮度语法元素标识信息的取值、第二亮度语法元素标识信息的取值和色度语法元素标识信息的取值。
在本申请实施例中,视频序列包括当前帧,当前帧包括当前块。其中,第一语法元素标识信息用于指示当前块是否允许使用预设选择网络模型进行模型选择,第二语法元素标识信息指示视频序列是否使用环路滤波网络模型进行滤波处理,第一亮度语法元素标识信息用于指示当前帧的亮度分量是否使用亮度环路滤波网络模型进行滤波处理,第二亮度语法元素标识信息用于指示当前块的亮度分量是否使用亮度环路滤波网络模型进行滤波处理,色度语法元素标识信息用于指示当前帧的色度分量是否使用色度环路滤波网络模型进行滤波处理。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图12,其示出了本申请实施例提供的一种编码器120的组成结构示意图。如图12所示,该编码器120可以包括:第一确定单元1201、第一选择单元1202和第一滤波单元1203;其中,
第一确定单元1201,配置为确定第一语法元素标识信息的取值;
第一选择单元1202,配置为当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;
第一滤波单元1203,配置为利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
在一些实施例中,第一选择单元1202,还配置为根据预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值;以及根据至少一个候选环路滤波网络模型各自对应的输出值,确定当前块使用的环路滤波网络模型。
在一些实施例中,第一确定单元1201,还配置为确定环路滤波网络模型的输入重建图像块;以及将输入重建图像块输入预设选择网络模型,得到至少一个候选环路滤波网络模型各自对应的输出值。
在一些实施例中,第一确定单元1201,还配置为从至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选环路滤波网络模型作为当前块使用的环路滤波网络模型。
在一些实施例中,第一确定单元1201,还配置为从至少一个候选环路滤波网络模型各自对应的输出值中选取最大值,将最大值作为目标值。
在一些实施例中,参见图12,编码器120还可以包括编码单元1204;
第一确定单元1201,还配置为确定环路滤波网络模型对应的环路滤波网络模型索引序号;
编码单元1204,配置为对环路滤波网络模型索引序号进行编码,将编码比特写入码流。
在一些实施例中,第一确定单元1201,还配置为若当前块允许使用预设选择网络模型进行模型选择,则确定第一语法元素标识信息的取值为第一值;或者,若当前块不允许使用预设选择网络模型进行模型选择,则确定第一语法元素标识信息的取值为第二值。
在一些实施例中,编码单元1204,还配置为对第一语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,视频序列包括当前帧,当前帧包括当前块;相应地,第一确定单元1201,还配置为若视频序列确定使用环路滤波网络模型进行滤波处理,则确定第二语法元素标识信息的取值为第一值;或者,若视频序列确定不使用环路滤波网络模型进行滤波处理,则确定第二语法元素标识信息的取值为第二值。
在一些实施例中,编码单元1204,还配置为对第二语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,第一确定单元1201,还配置为确定当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理的第一率失真代价值;以及确定当前帧的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第二率失真代价值;以及若第一率失真代价值小于第二率失真代价值,则确定第一亮度语法元素标识信息的取值为第一值;和/或,若第一率失真代价值大于或等于第二率失真代价值,则确定第一亮度语法元素标识信息的取值为第二值。
在一些实施例中,编码单元1204,还配置为对第一亮度语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,第一确定单元1201,还配置为当第一率失真代价值小于第二率失真代价值时,确定当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理的第三率失真代价值;以及确定当前块的亮度分量未使用亮度环路滤波网络模型进行滤波处理的第四率失真代价值;以及若第三率失真代价值小于第四率失真代价值,则确定第二亮度语法元素标识信息的取值为第一值;和/或,若第三率失真代价值大于或等于第四率失真代价值,则确定第二亮度语法元素标识信息的取值为第二值。
在一些实施例中,编码单元1204,还配置为对第二亮度语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,第一确定单元1201,还配置为若第三率失真代价值小于第四率失真代价值,则利用所述亮度环路滤波网络模型对所述当前块进行滤波处理。
在一些实施例中,第一确定单元1201,还配置为当当前帧的颜色分量类型为色度分量时,确定当前帧的色度分量使用色度环路滤波网络模型进行滤波处理的第五率失真代价值;以及确定当前帧的色度分量未使用色度环路滤波网络模型进行滤波处理的第六率失真代价值;以及若第五率失真代价值小于第六率失真代价值,则确定色度语法元素标识信息的取值为第一值;和/或,若第五率失真代价值大于或等于第六率失真代价值,则确定色度语法元素标识信息的取值为第二值。
在一些实施例中,编码单元1204,还配置为对色度语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,第一确定单元1201,还配置为若第五率失真代价值小于第六率失真代价值,则利用所述色度环路滤波网络模型对所述当前块进行滤波处理。
在一些实施例中,第一确定单元1201,还配置为当当前块的颜色分量类型为亮度分量时,确定当前块的亮度选择网络模型;以及当当前块的颜色分量类型为色度分量时,确定当前块的色度选择网络模型;
相应地,第一确定单元1201,还配置为当当前块的颜色分量类型为亮度分量时,根据亮度选择网络模型确定至少一个候选亮度环路滤波网络模型各自对应的输出值;以及当当前块的颜色分量类型为色度分量时,根据色度选择网络模型确定至少一个候选色度环路滤波网络模型各自对应的输出值。
在一些实施例中,参见图12,编码器120还可以包括第一训练单元1205;
第一确定单元1201,还配置为确定第一训练集,其中,第一训练集包括至少一个训练样本,且训练样本是根据至少一种量化参数得到的;
第一训练单元1205,配置为利用第一训练集中训练样本的亮度分量对第一神经网络结构进行训练,得到至少一个候选亮度环路滤波网络模型。
在一些实施例中,第一神经网络结构包括第一卷积模块、第一残差模块、第二卷积模块和第一连接模块,所述第一卷积模块、所述第一残差模块、所述第二卷积模块和所述第一连接模块顺次连接,且所述第一连接模块还与所述第一卷积模块的输入连接。
在一些实施例中,所述第一卷积模块由一层卷积层和一层激活层组成,所述第二卷积模块由两层卷积层和一层激活层组成,所述连接模块由跳转连接层组成,所述第一残差模块包括若干个残差块,且所述残差块由两层卷积层和一层激活层组成。
在一些实施例中,第一确定单元1201,还配置为确定第一训练集,其中,第一训练集包括至少一个训练样本,且训练样本是根据至少一种量化参数得到的;
第一训练单元1205,还配置为利用第一训练集中训练样本的亮度分量和色度分量对第二神经网络结构进行训练,得到至少一个候选色度环路滤波网络模型。
在一些实施例中,第二神经网络结构包括上采样模块、第三卷积模块、第四卷积模块、融合模块、第二残差模块、第五卷积模块和第二连接模块,所述上采样模块和所述第三卷积模块连接,所述第三卷积模块和所述第四卷积模块与所述融合模块连接,所述融合模块、所述第二残差模块、所述第五卷积模块和所述第二连接模块顺次连接,且所述第二连接模块还与所述上采样模块的输入连接。
在一些实施例中,第三卷积模块由一层卷积层和一层激活层组成,第四卷积模块由一层卷积层和一层激活层组成,第五卷积模块由两层卷积层、一层激活层和一层池化层组成,连接模块由跳转连接层组成,第二残差模块包括若干个残差块,且残差块由两层卷积层和一层激活层组成。
在一些实施例中,第一选择单元1202,还配置为在当前块的颜色分量类型为亮度分量的情况下,确定至少一个候选亮度选择网络模型;以及确定当前块的量化参数,从至少一个候选亮度选择网络模型中选取量化参数对应的候选亮度选择网络模型;
第一确定单元1201,还配置为将选取的候选亮度选择网络模型确定为当前块的亮度选择网络模型。
在一些实施例中,第一选择单元1202,还配置为在当前块的颜色分量类型为色度分量的情况下,确定至少一个候选色度选择网络模型;以及确定当前块的量化参数,从至少一个候选色度选择网络模型中选取量化参数对应的候选色度选择网络模型;
第一确定单元1201,还配置为将选取的候选色度选择网络模型确定为当前块的色度选择网络模型。
在一些实施例中,第一确定单元1201,还配置为确定第二训练集,其中,第二训练集包括至少一个训练样本,且训练样本是根据至少一种量化参数得到的;
第一训练单元1205,还配置为利用第二训练集中训练样本的亮度分量对第三神经网络结构进行训练,得到至少一个候选亮度选择网络模型;以及利用第二训练集中训练样本的色度分量对第三神经网络结构进行训练,得到至少一个候选色度选择网络模型;其中,至少一个候选亮度选择网络模型与亮度分量和量化参数之间具有对应关系,至少一个候选色度选择网络模型与色度分量和量化参数之间具有对应关系。
在一些实施例中,所述第三神经网络结构包括第六卷积模块和全连接模块,所述第六卷积模块和所述全连接模块顺次连接;其中,所述第六卷积模块包括若干个卷积子模块,所述卷积子模块由一层卷积层和一层池化层组成;所述全连接模块包括若干个全连接子模块,所述全连接子模块由一层全连接层和一层激活层组成。
在一些实施例中,第一确定单元1201,还配置为当当前块的颜色分量类型为亮度分量时,确定亮度环路滤波网络模型的输入重建亮度图像块;以及将输入重建亮度图像块输入亮度选择网络模型,得到至少一个候选亮度环路滤波网络模型各自对应的输出值;以及从至少一个候选亮度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选亮度环路滤波网络模型作为当前块使用的亮度环路滤波网络模型;或者,当当前块的颜色分量类型为色度分量时,确定色度环路滤波网络模型的输入重建色度图像块;以及将输入重建色度图像块输入色度选择网络模型,得到至少一个候选色度环路滤波网络模型各自对应的输出值;以及从至少一个候选色度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选色度环路滤波网络模型作为当前块使用的色度环路滤波网络模型。
在一些实施例中,环路滤波网络模型为CNNLF模型。
在一些实施例中,输入重建图像块是经由去块滤波器和样值自适应补偿滤波器进行滤波处理后得到。
在一些实施例中,第一滤波单元1203,还配置为在确定出重建图像块之后,利用自适应修正滤波器对重建图像块进行滤波处理。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。其中,所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于编码器120,该计算机存储介质存储有计算机程序,所述计算机程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码器120的组成以及计算机存储介质,参见图13,其示出了本申请实施例提供的编码器120的具体硬件结构示意图。如图13所示,可以包括:第一通信接口1301、第一存储器1302和第一处理器1303;各个组件通过第一总线系统1304耦合在一起。可理解,第一总线系统1304用于实现这些组件之间的连接通信。第一总线系统1304除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图13中将各种总线都标为第一总线系统1304。其中,
第一通信接口1301,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器1302,用于存储能够在第一处理器1303上运行的计算机程序;
第一处理器1303,用于在运行所述计算机程序时,执行:
确定第一语法元素标识信息的取值;
当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;
利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
可以理解,本申请实施例中的第一存储器1302可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器1302旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器1303可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器1303中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器1303可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器1302,第一处理器1303读取第一存储器1302中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器1303还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
本实施例提供了一种编码器,该编码器可以包括第一确定单元、第一选择单元和第一滤波单元。这样,通过引入基于深度学习的模型自适应选择技术,不仅可以提升编码性能,进而提高编解码效率;而且还可以使得最终输出的重建图像块更加接近于原始图像块,能够提升视频图像质量。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图14,其示出了本申请实施例提供的一种解码器140的组成结构示意图。如图14所示,该解码器140可以包括:解析单元1401、第二选择单元1402和第二滤波单元1403;其中,
解析单元1401,配置为解析码流,确定第一语法元素标识信息的取值;
第二选择单元1402,配置为当第一语法元素标识信息指示当前块使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;
第二滤波单元1403,配置为利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
在一些实施例中,第二选择单元1402,还配置为根据所述预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值;以及根据所述至少一个候选环路滤波网络模型各自对应的输出值,确定所述当前块使用的环路滤波网络模型。
在一些实施例中,参见图14,解码器140还可以包括第二确定单元1404,配置为确定环路滤波网络模型的输入重建图像块;
第二选择单元1402,还配置为将输入重建图像块输入预设选择网络模型,得到至少一个候选环路滤波网络模型各自对应的输出值。
在一些实施例中,第二确定单元1404,还配置为从至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选环路滤波网络模型作为当前块使用的环路滤波网络模型。
在一些实施例中,第二确定单元1404,还配置为从所述至少一个候选环路滤波网络模型各自对应的输出值中选取最大值,将所述最大值作为所述目标值。
在一些实施例中,第二确定单元1404,还配置为若第一语法元素标识信息的取值为第一值,则确定第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择;或者,若第一语法元素标识信息的取值为第二值,则确定第一语法元素标识信息指示当前块不允许使用预设选择网络模型进行模型选择。
在一些实施例中,解析单元1401,还配置为解析码流,确定第二语法元素标识信息的取值;以及当第二语法元素标识信息指示视频序列使用环路滤波网络模型进行滤波处理时,解析码流,确定第三语法元素标识信息的取值;其中,第三语法元素标识信息用于指示视频序列内的当前帧是否使用环路滤波网络模型进行滤波处理,当前帧包括当前块。
在一些实施例中,第二确定单元1404,还配置为若第二语法元素标识信息的取值为第一值,则确定第二语法元素标识信息指示视频序列使用环路滤波网络模型进行滤波处理;或者,若第二语法元素标识信息的取值为第二值,则确定第二语法元素标识信息指示视频序列不使用环路滤波网络模型进行滤波处理。
在一些实施例中,解析单元1401,还配置为解析码流,获取当前帧的亮度分量对应的第一亮度语法元素标识信息,第一亮度语法元素标识信息用于指示当前帧的亮度分量是否使用亮度环路滤波网络模型进行滤波处理;或者,解析码流,获取当前帧的色度分量对应的色度语法元素标识信息,色度语法元素标识信息用于指示当前帧的色度分量是否使用色度环路滤波网络模型进行滤波处理。
在一些实施例中,解析单元1401,还配置为在所述当前帧的亮度分量的情况下,当第一亮度语法元素标识信息指示当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理时,解析码流,确定第二亮度语法元素标识信息的取值;以及当第二亮度语法元素标识信息指示当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理时,执行解析码流,确定第一语法元素标识信息的取值的步骤。
在一些实施例中,第二确定单元1404,还配置为若第一亮度语法元素标识信息的取值为第一值,则确定第一亮度语法元素标识信息指示当前帧的亮度分量使用亮度环路滤波网络模型进行滤波处理;或者,若第一亮度语法元素标识信息的取值为第二值,则确定第一亮度语法元素标识信息指示当前帧的亮度分量不使用亮度环路滤波网络模型进行滤波处理。
在一些实施例中,第二确定单元1404,还配置为若第二亮度语法元素标识信息的取值为第一值,则确定第二亮度语法元素标识信息指示当前块的亮度分量使用亮度环路滤波网络模型进行滤波处理;或者,若第二亮度语法元素标识信息的取值为第二值,则确定第二亮度语法元素标识信息指示当前块的亮度分量不使用亮度环路滤波网络模型进行滤波处理。
在一些实施例中,解析单元1401,还配置为在所述当前帧的色度分量的情况下,当色度语法元素标识信息指示当前帧的色度分量使用色度环路滤波网络模型进行滤波处理时,执行解析码流,确定第一语法元素标识信息的取值的步骤。
在一些实施例中,第二确定单元1404,还配置为若色度语法元素标识信息的取值为第一值,则确定色度语法元素标识信息指示当前帧的色度分量使用色度环路滤波网络模型进行滤波处理;或者,若色度语法元素标识信息的取值为第二值,则确定色度语法元素标识信息指示当前帧的色度分量不使用色度环路滤波网络模型进行滤波处理。
在一些实施例中,第二确定单元1404,还配置为当当前块的颜色分量类型为亮度分量时,确定当前块的亮度选择网络模型;以及当当前块的颜色分量类型为色度分量时,确定当前块的色度选择网络模型;
相应地,第二确定单元1404,还配置为当当前块的颜色分量类型为亮度分量时,根据亮度选择网络模型确定至少一个候选亮度环路滤波网络模型各自对应的输出值;以及当当前块的颜色分量类型为色度分量时,根据色度选择网络模型确定至少一个候选色度环路滤波网络模型各自对应的输出值。
在一些实施例中,至少一个候选亮度环路滤波网络模型是根据至少一个训练样本对第一神经网络结构进行模型训练确定的,且至少一个候选亮度环路滤波网络模型与颜色分量类型和量化参数之间具有对应关系。
在一些实施例中,第一神经网络结构包括第一卷积模块、第一残差模块、第二卷积模块和第一连接模块,第一卷积模块、第一残差模块、第二卷积模块和第一连接模块顺次连接,且第一连接模块还与第一卷积模块的输入连接。
在一些实施例中,第一卷积模块由一层卷积层和一层激活层组成,第二卷积模块由两层卷积层和一层激活层组成,连接模块由跳转连接层组成,第一残差模块包括若干个残差块,且残差块由两层卷积层和一层激活层组成。
在一些实施例中,至少一个候选色度环路滤波网络模型是根据至少一个训练样本对第二神经网络结构进行模型训练确定的,且至少一个候选色度环路滤波网络模型与颜色分量类型和量化参数之间具有对应关系。
在一些实施例中,第二神经网络结构包括上采样模块、第三卷积模块、第四卷积模块、融合模块、第二残差模块、第五卷积模块和第二连接模块,上采样模块和第三卷积模块连接,第三卷积模块和第四卷积模块与融合模块连接,融合模块、第二残差模块、第五卷积模块和第二连接模块顺次连接,且第二连接模块还与上采样模块的输入连接。
在一些实施例中,第三卷积模块由一层卷积层和一层激活层组成,第四卷积模块由一层卷积层和一层激活层组成,第五卷积模块由两层卷积层、一层激活层和一层池化层组成,连接模块由跳转连接层组成,第二残差模块包括若干个残差块,且残差块由两层卷积层和一层激活层组成。
在一些实施例中,第二选择单元1402,还配置为在当前块的颜色分量类型为亮度分量的情况下,确定至少一个候选亮度选择网络模型;以及确定当前块的量化参数,从至少一个候选亮度选择网络模型中选取量化参数对应的候选亮度选择网络模型;
第二确定单元1404,还配置为将选取的候选亮度选择网络模型确定为当前块的亮度选择网络模型。
在一些实施例中,第二选择单元1402,还配置为在当前块的颜色分量类型为色度分量的情况下,确定至少一个候选色度选择网络模型;以及确定当前块的量化参数,从至少一个候选色度选择网络模型中选取量化参数对应的候选色度选择网络模型;
第二确定单元1404,还配置为将选取的候选色度选择网络模型确定为当前块的色度选择网络模型。
在一些实施例中,至少一个候选亮度选择网络模型和至少一个候选色度选择网络模型分别是根据至少一个训练样本对第三神经网络结构进行模型训练确定的,且至少一个候选亮度选择网络模型和至少一个候选色度选择网络模型均与颜色分量类型和量化参数之间具有对应关系。
在一些实施例中,第三神经网络结构包括第六卷积模块和全连接模块,第六卷积模块和全连接模块顺次连接;其中,第六卷积模块包括若干个卷积子模块,卷积子模块由一层卷积层和一层池化层组成;全连接模块包括若干个全连接子模块,全连接子模块由一层全连接层和一层激活层组成。
在一些实施例中,第二确定单元1404,还配置为当当前块的颜色分量类型为亮度分量时,确定亮度环路滤波网络模型的输入重建亮度图像块;以及将输入重建亮度图像块输入亮度选择网络模型,得到至少一个候选亮度环路滤波网络模型各自对应的输出值;以及从至少一个候选亮度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选亮度环路滤波网络模型作为当前块使用的亮度环路滤波网络模型;或者,当当前块的颜色分量类型为色度分量时,确定色度环路滤波网络模型的输入重建色度图像块;以及将输入重建色度图像块输入色度选择网络模型,得到至少一个候选色度环路滤波网络模型各自对应的输出值;以及从至少一个候选色度环路滤波网络模型各自对应的输出值中确定目标值,将目标值对应的候选色度环路滤波网络模型作为当前块使用的色度环路滤波网络模型。
在一些实施例中,解析单元1401,配置为当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,解析码流,确定环路滤波网络模型索引序号;
第二确定单元1404,还配置为根据环路滤波网络模型索引序号,从至少一个候选环路滤波网络模型中确定当前块使用的环路滤波网络模型;
第二滤波单元1403,还配置为利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
在一些实施例中,环路滤波网络模型为CNNLF模型。
在一些实施例中,输入重建图像块是经由去块滤波器和样值自适应补偿滤波器进行滤波处理后得到。
在一些实施例中,第二滤波单元1403,还配置为在确定出重建图像块之后,利用自适应修正滤波器对重建图像块进行滤波处理。
可以理解地,在本实施例中,"单元"可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。其中,所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本实施例提供了一种计算机存储介质,应用于解码器140,该计算机存储介质存储有计算机程序,所述计算机程序被第二处理器执行时实现前述实施例中任一项所述的方法。
基于上述解码器140的组成以及计算机存储介质,参见图15,其示出了本申请实施例提供的解码器140的具体硬件结构示意图。如图15所示,可以包括:第二通信接口1501、第二存储器1502和第二处理器1503;各个组件通过第二总线系统1504耦合在一起。可理解,第二总线系统1504用于实现这些组件之间的连接通信。第二总线系统1504除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图15中将各种总线都标为第二总线系统1504。其中,
第二通信接口1501,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第二存储器1502,用于存储能够在第二处理器1503上运行的计算机程序;
第二处理器1503,用于在运行所述计算机程序时,执行:
解析码流,确定第一语法元素标识信息的取值;
当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;
利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。
可选地,作为另一个实施例,第二处理器1503还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
可以理解,第二存储器1502与第一存储器1302的硬件功能类似,第二处理器1503与第一处理器1303的硬件功能类似;这里不再详述。
本实施例提供了一种解码器,该解码器可以包括解析单元、第二选择单元和第二滤波单元。这样,通过引入基于深度学习的模型自适应选择技术,不仅可以提升编码性能,进而能够提高编解码效率;而且还可以使得最终输出的重建图像块更加接近于原始图像块,能够提升视频图像质量。
需要说明的是,在本申请中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
工业实用性
本申请实施例中,在编码器侧,确定第一语法元素标识信息的取值;当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。在解码器侧,解析码流,确定第一语法元素标识信息的取值;当第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定当前块的预设选择网络模型,并根据预设选择网络模型,确定当前块使用的环路滤波网络模型;利用环路滤波网络模型对当前块进行滤波处理,得到当前块的重建图像块。这样,通过引入基于深度学习的模型自适应选择技术,利用预设选择网络模型对至少一个候选环路滤波网络模型进行模型选择,再根据所选中的环路滤波网络模型对当前块进行滤波处理,不仅可以提升编码性能,进而能够提高编解码效率;而且还可以使得最终输出的重建图像块更加接近于原始图像块,能够提升视频图像质量。

Claims (69)

  1. 一种解码方法,应用于解码器,所述方法包括:
    解析码流,确定第一语法元素标识信息的取值;
    当所述第一语法元素标识信息指示当前块允许使用预设选择网络模型进行模型选择时,确定所述当前块的预设选择网络模型,并根据所述预设选择网络模型,确定所述当前块使用的环路滤波网络模型;
    利用所述环路滤波网络模型对所述当前块进行滤波处理,得到所述当前块的重建图像块。
  2. 根据权利要求1所述的方法,其中,所述根据所述预设选择网络模型,确定所述当前块使用的环路滤波网络模型,包括:
    根据所述预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值;
    根据所述至少一个候选环路滤波网络模型各自对应的输出值,确定所述当前块使用的环路滤波网络模型。
  3. 根据权利要求2所述的方法,其中,所述根据所述预设选择网络模型,确定至少一个候选环路滤波网络模型各自对应的输出值,包括:
    确定所述环路滤波网络模型的输入重建图像块;
    将所述输入重建图像块输入所述预设选择网络模型,得到所述至少一个候选环路滤波网络模型各自对应的输出值。
  4. 根据权利要求2所述的方法,其中,所述根据所述至少一个候选环路滤波网络模型各自对应的输出值,确定所述当前块使用的环路滤波网络模型,包括:
    从所述至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,将所述目标值对应的候选环路滤波网络模型作为所述当前块使用的环路滤波网络模型。
  5. 根据权利要求4所述的方法,其中,所述从所述至少一个候选环路滤波网络模型各自对应的输出值中确定目标值,包括:
    从所述至少一个候选环路滤波网络模型各自对应的输出值中选取最大值,将所述最大值作为所述目标值。
  6. 根据权利要求1所述的方法,其中,所述方法还包括:
    若所述第一语法元素标识信息的取值为第一值,则确定所述第一语法元素标识信息指示所述当前块允许使用所述预设选择网络模型进行模型选择;或者,
    若所述第一语法元素标识信息的取值为第二值,则确定所述第一语法元素标识信息指示所述当前块不允许使用所述预设选择网络模型进行模型选择。
  7. 根据权利要求1所述的方法,其中,所述方法还包括:
    解析所述码流,确定第二语法元素标识信息的取值;
    当所述第二语法元素标识信息指示视频序列使用所述环路滤波网络模型进行滤波处理时,解析所述码流,确定第三语法元素标识信息的取值;其中,所述第三语法元素标识信息用于指示所述视频序列内的当前帧是否使用环路滤波网络模型进行滤波处理,所述当前帧包括所述当前块。
  8. 根据权利要求7所述的方法,其中,所述方法还包括:
    若所述第二语法元素标识信息的取值为第一值,则确定所述第二语法元素标识信息指示所述视频序列使用所述环路滤波网络模型进行滤波处理;或者,
    若所述第二语法元素标识信息的取值为第二值,则确定所述第二语法元素标识信息指示所述视频序列不使用所述环路滤波网络模型进行滤波处理。
  9. 根据权利要求8所述的方法,其中,所述解析所述码流,确定第三语法元素标识信息的取值,包括:
    解析所述码流,获取所述当前帧的亮度分量对应的第一亮度语法元素标识信息,所述第一亮度语法元素标识信息用于指示所述当前帧的亮度分量是否使用亮度环路滤波网络模型进行滤波处理;或者,
    解析所述码流,获取所述当前帧的色度分量对应的色度语法元素标识信息,所述色度语法元素标识信息用于指示所述当前帧的色度分量是否使用色度环路滤波网络模型进行滤波处理。
  10. 根据权利要求9所述的方法,其中,在所述当前帧的亮度分量的情况下,所述解析码流,确定第一语法元素标识信息的取值,包括:
    当所述第一亮度语法元素标识信息指示所述当前帧的亮度分量使用所述亮度环路滤波网络模型进行滤波处理时,解析所述码流,确定第二亮度语法元素标识信息的取值;
    当所述第二亮度语法元素标识信息指示所述当前块的亮度分量使用所述亮度环路滤波网络模型进行滤波处理时,执行所述解析码流,确定第一语法元素标识信息的取值的步骤。
  11. 根据权利要求10所述的方法,其中,所述方法还包括:
    若所述第一亮度语法元素标识信息的取值为第一值,则确定所述第一亮度语法元素标识信息指示所述当前帧的亮度分量使用所述亮度环路滤波网络模型进行滤波处理;或者,
    若所述第一亮度语法元素标识信息的取值为第二值,则确定所述第一亮度语法元素标识信息指示所述当前帧的亮度分量不使用所述亮度环路滤波网络模型进行滤波处理。
  12. 根据权利要求10所述的方法,其中,所述方法还包括:
    若所述第二亮度语法元素标识信息的取值为第一值,则确定所述第二亮度语法元素标识信息指示所述当前块的亮度分量使用所述亮度环路滤波网络模型进行滤波处理;或者,
    若所述第二亮度语法元素标识信息的取值为第二值,则确定所述第二亮度语法元素标识信息指示所述当前块的亮度分量不使用所述亮度环路滤波网络模型进行滤波处理。
  13. 根据权利要求9所述的方法,其中,在所述当前帧的色度分量的情况下,所述解析码流,确定第一语法元素标识信息的取值,包括:
    当所述色度语法元素标识信息指示所述当前帧的色度分量使用所述色度环路滤波网络模型进行滤波处理时,执行所述解析码流,确定第一语法元素标识信息的取值的步骤。
  14. 根据权利要求13所述的方法,其中,所述方法还包括:
    若所述色度语法元素标识信息的取值为第一值,则确定所述色度语法元素标识信息指示所述当前帧的色度分量使用所述色度环路滤波网络模型进行滤波处理;或者,
    若所述色度语法元素标识信息的取值为第二值,则确定所述色度语法元素标识信息指示所述当前帧的色度分量不使用所述色度环路滤波网络模型进行滤波处理。
  15. The method according to claim 2, wherein said determining the preset selection network model of the current block comprises:
    when a colour component type of the current block is a luma component, determining a luma selection network model of the current block;
    when the colour component type of the current block is a chroma component, determining a chroma selection network model of the current block;
    and said determining, according to the preset selection network model, output values respectively corresponding to at least one candidate loop filtering network model comprises:
    when the colour component type of the current block is the luma component, determining, according to the luma selection network model, output values respectively corresponding to at least one candidate luma loop filtering network model;
    when the colour component type of the current block is the chroma component, determining, according to the chroma selection network model, output values respectively corresponding to at least one candidate chroma loop filtering network model.
  16. The method according to claim 15, wherein the at least one candidate luma loop filtering network model is determined by performing model training on a first neural network structure according to at least one training sample, and the at least one candidate luma loop filtering network model has a correspondence with colour component types and quantization parameters.
  17. The method according to claim 16, wherein the first neural network structure comprises a first convolution module, a first residual module, a second convolution module and a first connection module; the first convolution module, the first residual module, the second convolution module and the first connection module are connected in sequence, and the first connection module is further connected to an input of the first convolution module.
  18. The method according to claim 17, wherein the first convolution module consists of one convolutional layer and one activation layer, the second convolution module consists of two convolutional layers and one activation layer, the first connection module consists of a skip connection layer, the first residual module comprises several residual blocks, and each residual block consists of two convolutional layers and one activation layer.
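The first neural network structure described in claims 17 and 18 can be sketched as a shape trace in plain Python: a first convolution module, a chain of residual blocks, a second convolution module, and a skip connection back to the network input, so the output keeps the input's shape. The channel count (64) and the number of residual blocks (8) are illustrative assumptions, not values stated in the claims.

```python
# Shape trace of the first neural network structure (claims 17-18):
# conv module 1 -> residual module (several residual blocks) -> conv module 2
# -> skip connection adding the input of conv module 1.
# Channel count (64) and block count (8) are illustrative assumptions.

def first_network_shapes(h, w, channels=64, num_res_blocks=8):
    """Return the (C, H, W) shape after each stage; padded 3x3 convolutions
    keep H and W unchanged, so only the channel dimension varies."""
    shapes = []
    x = (1, h, w)                      # input reconstructed luma block, 1 channel
    shapes.append(("input", x))
    x = (channels, h, w)               # first conv module: conv + activation
    shapes.append(("conv_module_1", x))
    for i in range(num_res_blocks):    # residual blocks: two convs + one activation
        shapes.append((f"res_block_{i}", x))
    x = (1, h, w)                      # second conv module maps back to 1 channel
    shapes.append(("conv_module_2", x))
    shapes.append(("skip_connection", x))  # output = conv output + network input
    return shapes

trace = first_network_shapes(64, 64)
print(trace[0], trace[-1])
```

Because the skip connection adds the network input to the last convolution's output, the traced output shape necessarily matches the input shape, which is why such a structure can refine a reconstructed block in place.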
  19. The method according to claim 15, wherein the at least one candidate chroma loop filtering network model is determined by performing model training on a second neural network structure according to at least one training sample, and the at least one candidate chroma loop filtering network model has a correspondence with colour component types and quantization parameters.
  20. The method according to claim 19, wherein the second neural network structure comprises an up-sampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module; the up-sampling module is connected to the third convolution module; the third convolution module and the fourth convolution module are connected to the fusion module; the fusion module, the second residual module, the fifth convolution module and the second connection module are connected in sequence; and the second connection module is further connected to an input of the up-sampling module.
  21. The method according to claim 20, wherein the third convolution module consists of one convolutional layer and one activation layer, the fourth convolution module consists of one convolutional layer and one activation layer, the fifth convolution module consists of two convolutional layers, one activation layer and one pooling layer, the second connection module consists of a skip connection layer, the second residual module comprises several residual blocks, and each residual block consists of two convolutional layers and one activation layer.
  22. The method according to claim 15, wherein said determining the luma selection network model of the current block comprises:
    in a case where the colour component type of the current block is the luma component, determining at least one candidate luma selection network model;
    determining a quantization parameter of the current block, and selecting, from the at least one candidate luma selection network model, a candidate luma selection network model corresponding to the quantization parameter;
    determining the selected candidate luma selection network model as the luma selection network model of the current block.
  23. The method according to claim 15, wherein said determining the chroma selection network model of the current block comprises:
    in a case where the colour component type of the current block is the chroma component, determining at least one candidate chroma selection network model;
    determining a quantization parameter of the current block, and selecting, from the at least one candidate chroma selection network model, a candidate chroma selection network model corresponding to the quantization parameter;
    determining the selected candidate chroma selection network model as the chroma selection network model of the current block.
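The quantization-parameter-based selection in claims 22 and 23 amounts to indexing a table of pretrained selection network models by the current block's quantization parameter. The sketch below uses assumed anchor QPs (27, 32, 38, 45) and a nearest-anchor rule for QPs that fall between them; both are illustrative choices not fixed by the claims.

```python
# Select the candidate selection network model corresponding to the current
# block's quantization parameter (claims 22-23). One model is assumed to be
# trained per anchor QP; a block QP between anchors maps to the nearest one.
TRAINED_QPS = [27, 32, 38, 45]  # assumed anchor QPs, one model per anchor

def select_model_by_qp(block_qp, candidates):
    """candidates: dict mapping anchor QP -> model identifier."""
    nearest = min(candidates, key=lambda qp: abs(qp - block_qp))
    return candidates[nearest]

luma_candidates = {qp: f"luma_selector_qp{qp}" for qp in TRAINED_QPS}
print(select_model_by_qp(34, luma_candidates))  # -> luma_selector_qp32
```

The chroma case is identical apart from the candidate table, which is why claims 22 and 23 mirror each other clause for clause.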
  24. The method according to claim 22 or 23, wherein the at least one candidate luma selection network model and the at least one candidate chroma selection network model are each determined by performing model training on a third neural network structure according to at least one training sample, and both the at least one candidate luma selection network model and the at least one candidate chroma selection network model have a correspondence with colour component types and quantization parameters.
  25. The method according to claim 24, wherein the third neural network structure comprises a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence;
    wherein the sixth convolution module comprises several convolution sub-modules, each convolution sub-module consisting of one convolutional layer and one pooling layer; and the fully connected module comprises several fully connected sub-modules, each fully connected sub-module consisting of one fully connected layer and one activation layer.
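The third neural network structure in claim 25 is effectively a small classifier: conv+pool sub-modules shrink the spatial resolution, and fully connected sub-modules reduce the flattened features to one score per candidate loop filtering network model. The sub-module count, pooling stride, and layer widths below are assumptions for illustration only.

```python
# Shape trace of the third neural network structure (claim 25): a stack of
# conv+pool sub-modules followed by fully connected sub-modules whose final
# width equals the number of candidate loop filtering network models.
# Sub-module counts and widths are illustrative assumptions.

def selection_network_shapes(h, w, num_candidates=3,
                             conv_subs=3, fc_widths=(128, 32)):
    x = (h, w)
    for _ in range(conv_subs):         # each pooling layer halves H and W
        x = (x[0] // 2, x[1] // 2)
    flat = x[0] * x[1]                 # flattened features entering the FC module
    widths = [flat, *fc_widths, num_candidates]
    return x, widths

spatial, widths = selection_network_shapes(64, 64)
print(spatial, widths)   # final FC width equals the candidate count
```

Ending with a width equal to the candidate count is what lets the structure emit "output values respectively corresponding to at least one candidate loop filtering network model", as the surrounding claims require.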
  26. The method according to claim 15, wherein said determining, according to the preset selection network model, the loop filtering network model used by the current block comprises:
    when the colour component type of the current block is the luma component, determining an input reconstructed luma picture block of the luma loop filtering network model;
    inputting the input reconstructed luma picture block into the luma selection network model to obtain the output values respectively corresponding to the at least one candidate luma loop filtering network model;
    determining a target value from the output values respectively corresponding to the at least one candidate luma loop filtering network model, and taking the candidate luma loop filtering network model corresponding to the target value as the luma loop filtering network model used by the current block; or,
    when the colour component type of the current block is the chroma component, determining an input reconstructed chroma picture block of the chroma loop filtering network model;
    inputting the input reconstructed chroma picture block into the chroma selection network model to obtain the output values respectively corresponding to the at least one candidate chroma loop filtering network model;
    determining a target value from the output values respectively corresponding to the at least one candidate chroma loop filtering network model, and taking the candidate chroma loop filtering network model corresponding to the target value as the chroma loop filtering network model used by the current block.
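The selection flow of claim 26 can be sketched as: feed the reconstructed block to the selection network, read one output value per candidate filter, and keep the candidate whose output value is the target value (claim 35 takes the maximum). The stub selection network and the model names below are placeholders for illustration, not parts of the claimed method.

```python
# Choose the loop filtering network model for a block (claim 26): the
# selection network yields one output value per candidate; the candidate
# with the maximum output value (the target value per claim 35) is used.

def choose_filter_model(recon_block, selection_net, candidate_models):
    scores = selection_net(recon_block)          # one output value per candidate
    assert len(scores) == len(candidate_models)
    target_idx = max(range(len(scores)), key=scores.__getitem__)
    return candidate_models[target_idx]

# Stub selection network standing in for the trained classifier.
stub_net = lambda block: [0.1, 0.7, 0.2]
models = ["luma_cnnlf_A", "luma_cnnlf_B", "luma_cnnlf_C"]
print(choose_filter_model(None, stub_net, models))  # -> luma_cnnlf_B
```

The same function serves both branches of the claim; only the selection network and the candidate list differ between the luma and chroma cases.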
  27. The method according to claim 1, wherein the method further comprises:
    when the first syntax element identification information indicates that the current block is allowed to use the preset selection network model for model selection, parsing the bitstream to determine a loop filtering network model index number;
    determining, according to the loop filtering network model index number, the loop filtering network model used by the current block from at least one candidate loop filtering network model;
    filtering the current block by using the loop filtering network model to obtain a reconstructed picture block of the current block.
  28. The method according to claim 1, wherein the loop filtering network model is a residual neural network based in-loop filter (CNNLF) model.
  29. The method according to claim 3, wherein the input reconstructed picture block is obtained after filtering by a deblocking filter and a sample adaptive offset (SAO) filter.
  30. The method according to any one of claims 1 to 29, wherein the method further comprises:
    after the reconstructed picture block is determined, filtering the reconstructed picture block by using an adaptive correction filter.
  31. An encoding method, applied to an encoder, the method comprising:
    determining a value of first syntax element identification information;
    when the first syntax element identification information indicates that a current block is allowed to use a preset selection network model for model selection, determining the preset selection network model of the current block, and determining, according to the preset selection network model, a loop filtering network model used by the current block;
    filtering the current block by using the loop filtering network model to obtain a reconstructed picture block of the current block.
  32. The method according to claim 31, wherein said determining, according to the preset selection network model, the loop filtering network model used by the current block comprises:
    determining, according to the preset selection network model, output values respectively corresponding to at least one candidate loop filtering network model;
    determining, according to the output values respectively corresponding to the at least one candidate loop filtering network model, the loop filtering network model used by the current block.
  33. The method according to claim 32, wherein said determining, according to the preset selection network model, the output values respectively corresponding to the at least one candidate loop filtering network model comprises:
    determining an input reconstructed picture block of the loop filtering network model;
    inputting the input reconstructed picture block into the preset selection network model to obtain the output values respectively corresponding to the at least one candidate loop filtering network model.
  34. The method according to claim 32, wherein said determining, according to the output values respectively corresponding to the at least one candidate loop filtering network model, the loop filtering network model used by the current block comprises:
    determining a target value from the output values respectively corresponding to the at least one candidate loop filtering network model, and taking the candidate loop filtering network model corresponding to the target value as the loop filtering network model used by the current block.
  35. The method according to claim 34, wherein said determining the target value from the output values respectively corresponding to the at least one candidate loop filtering network model comprises:
    selecting a maximum value from the output values respectively corresponding to the at least one candidate loop filtering network model, and taking the maximum value as the target value.
  36. The method according to claim 31, wherein, after said determining the loop filtering network model used by the current block, the method further comprises:
    determining a loop filtering network model index number corresponding to the loop filtering network model;
    encoding the loop filtering network model index number, and writing encoded bits into a bitstream.
  37. The method according to claim 31, wherein said determining the value of the first syntax element identification information comprises:
    if the current block is allowed to use the preset selection network model for model selection, determining that the value of the first syntax element identification information is a first value; or,
    if the current block is not allowed to use the preset selection network model for model selection, determining that the value of the first syntax element identification information is a second value.
  38. The method according to claim 37, wherein the method further comprises:
    encoding the value of the first syntax element identification information, and writing encoded bits into a bitstream.
  39. The method according to claim 31, wherein a video sequence comprises a current frame, and the current frame comprises the current block; the method further comprises:
    if it is determined that the video sequence uses the loop filtering network model for filtering, determining that a value of second syntax element identification information is a first value; or,
    if it is determined that the video sequence does not use the loop filtering network model for filtering, determining that the value of the second syntax element identification information is a second value.
  40. The method according to claim 39, wherein the method further comprises:
    encoding the value of the second syntax element identification information, and writing encoded bits into a bitstream.
  41. The method according to claim 39, wherein, after it is determined that the video sequence uses the loop filtering network model for filtering, the method further comprises:
    determining a first rate-distortion cost value of filtering the luma component of the current frame by using a luma loop filtering network model, and determining a second rate-distortion cost value of not filtering the luma component of the current frame by using the luma loop filtering network model;
    if the first rate-distortion cost value is smaller than the second rate-distortion cost value, determining that a value of first luma syntax element identification information is a first value; and/or,
    if the first rate-distortion cost value is greater than or equal to the second rate-distortion cost value, determining that the value of the first luma syntax element identification information is a second value.
  42. The method according to claim 41, wherein the method further comprises:
    encoding the value of the first luma syntax element identification information, and writing encoded bits into a bitstream.
  43. The method according to claim 41, wherein, when the first rate-distortion cost value is smaller than the second rate-distortion cost value, the method further comprises:
    determining a third rate-distortion cost value of filtering the luma component of the current block by using the luma loop filtering network model, and determining a fourth rate-distortion cost value of not filtering the luma component of the current block by using the luma loop filtering network model;
    if the third rate-distortion cost value is smaller than the fourth rate-distortion cost value, determining that a value of second luma syntax element identification information is a first value; and/or,
    if the third rate-distortion cost value is greater than or equal to the fourth rate-distortion cost value, determining that the value of the second luma syntax element identification information is a second value.
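The rate-distortion decisions of claims 41 and 43 reduce to one comparison: the flag takes the first value when the cost with the network filter is strictly smaller than the cost without it, and the second value otherwise (a tie keeps the filter off). The cost figures and the 1/0 encoding of the first and second values below are illustrative assumptions.

```python
# Set a syntax element flag by rate-distortion comparison (claims 41/43):
# flag = first value (assumed 1) when filtering is strictly cheaper,
# otherwise the second value (assumed 0).
FIRST_VALUE, SECOND_VALUE = 1, 0

def rd_flag(cost_with_filter, cost_without_filter):
    return FIRST_VALUE if cost_with_filter < cost_without_filter else SECOND_VALUE

# Frame-level luma decision (claim 41) with illustrative costs:
print(rd_flag(102.4, 110.9))  # -> 1, filtering wins at frame level
# Block-level luma decision (claim 43): a tie keeps the filter off.
print(rd_flag(55.0, 55.0))    # -> 0
```

The chroma decision of claim 46 uses the same rule with the fifth and sixth rate-distortion cost values, so a single comparator serves all three claims.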
  44. The method according to claim 43, wherein the method further comprises:
    encoding the value of the second luma syntax element identification information, and writing encoded bits into a bitstream.
  45. The method according to claim 43, wherein said filtering the current block by using the loop filtering network model comprises:
    if the third rate-distortion cost value is smaller than the fourth rate-distortion cost value, filtering the current block by using the luma loop filtering network model.
  46. The method according to claim 39, wherein, after it is determined that the video sequence uses the loop filtering network model for filtering, the method further comprises:
    determining a fifth rate-distortion cost value of filtering the chroma component of the current frame by using a chroma loop filtering network model, and determining a sixth rate-distortion cost value of not filtering the chroma component of the current frame by using the chroma loop filtering network model;
    if the fifth rate-distortion cost value is smaller than the sixth rate-distortion cost value, determining that a value of chroma syntax element identification information is a first value; and/or,
    if the fifth rate-distortion cost value is greater than or equal to the sixth rate-distortion cost value, determining that the value of the chroma syntax element identification information is a second value.
  47. The method according to claim 46, wherein the method further comprises:
    encoding the value of the chroma syntax element identification information, and writing encoded bits into a bitstream.
  48. The method according to claim 46, wherein said filtering the current block by using the loop filtering network model comprises:
    if the fifth rate-distortion cost value is smaller than the sixth rate-distortion cost value, filtering the current block by using the chroma loop filtering network model.
  49. The method according to claim 32, wherein said determining the preset selection network model of the current block comprises:
    when a colour component type of the current block is a luma component, determining a luma selection network model of the current block;
    when the colour component type of the current block is a chroma component, determining a chroma selection network model of the current block;
    and said determining, according to the preset selection network model, the output values respectively corresponding to the at least one candidate loop filtering network model comprises:
    when the colour component type of the current block is the luma component, determining, according to the luma selection network model, output values respectively corresponding to at least one candidate luma loop filtering network model;
    when the colour component type of the current block is the chroma component, determining, according to the chroma selection network model, output values respectively corresponding to at least one candidate chroma loop filtering network model.
  50. The method according to claim 49, wherein the method further comprises:
    determining a first training set, wherein the first training set comprises at least one training sample, and the training samples are obtained according to at least one quantization parameter;
    training a first neural network structure by using luma components of the training samples in the first training set to obtain the at least one candidate luma loop filtering network model.
  51. The method according to claim 50, wherein the first neural network structure comprises a first convolution module, a first residual module, a second convolution module and a first connection module; the first convolution module, the first residual module, the second convolution module and the first connection module are connected in sequence, and the first connection module is further connected to an input of the first convolution module.
  52. The method according to claim 51, wherein the first convolution module consists of one convolutional layer and one activation layer, the second convolution module consists of two convolutional layers and one activation layer, the first connection module consists of a skip connection layer, the first residual module comprises several residual blocks, and each residual block consists of two convolutional layers and one activation layer.
  53. The method according to claim 49, wherein the method further comprises:
    determining a first training set, wherein the first training set comprises at least one training sample, and the training samples are obtained according to at least one quantization parameter;
    training a second neural network structure by using luma components and chroma components of the training samples in the first training set to obtain the at least one candidate chroma loop filtering network model.
  54. The method according to claim 53, wherein the second neural network structure comprises an up-sampling module, a third convolution module, a fourth convolution module, a fusion module, a second residual module, a fifth convolution module and a second connection module; the up-sampling module is connected to the third convolution module; the third convolution module and the fourth convolution module are connected to the fusion module; the fusion module, the second residual module, the fifth convolution module and the second connection module are connected in sequence; and the second connection module is further connected to an input of the up-sampling module.
  55. The method according to claim 54, wherein the third convolution module consists of one convolutional layer and one activation layer, the fourth convolution module consists of one convolutional layer and one activation layer, the fifth convolution module consists of two convolutional layers, one activation layer and one pooling layer, the second connection module consists of a skip connection layer, the second residual module comprises several residual blocks, and each residual block consists of two convolutional layers and one activation layer.
  56. The method according to claim 49, wherein said determining the luma selection network model of the current block comprises:
    in a case where the colour component type of the current block is the luma component, determining at least one candidate luma selection network model;
    determining a quantization parameter of the current block, and selecting, from the at least one candidate luma selection network model, a candidate luma selection network model corresponding to the quantization parameter;
    determining the selected candidate luma selection network model as the luma selection network model of the current block.
  57. The method according to claim 49, wherein said determining the chroma selection network model of the current block comprises:
    in a case where the colour component type of the current block is the chroma component, determining at least one candidate chroma selection network model;
    determining a quantization parameter of the current block, and selecting, from the at least one candidate chroma selection network model, a candidate chroma selection network model corresponding to the quantization parameter;
    determining the selected candidate chroma selection network model as the chroma selection network model of the current block.
  58. The method according to claim 56 or 57, wherein the method further comprises:
    determining a second training set, wherein the second training set comprises at least one training sample, and the training samples are obtained according to at least one quantization parameter;
    training a third neural network structure by using luma components of the training samples in the second training set to obtain at least one candidate luma selection network model;
    training the third neural network structure by using chroma components of the training samples in the second training set to obtain at least one candidate chroma selection network model;
    wherein the at least one candidate luma selection network model has a correspondence with the luma component and quantization parameters, and the at least one candidate chroma selection network model has a correspondence with the chroma component and quantization parameters.
  59. The method according to claim 58, wherein the third neural network structure comprises a sixth convolution module and a fully connected module, and the sixth convolution module and the fully connected module are connected in sequence;
    wherein the sixth convolution module comprises several convolution sub-modules, each convolution sub-module consisting of one convolutional layer and one pooling layer; and the fully connected module comprises several fully connected sub-modules, each fully connected sub-module consisting of one fully connected layer and one activation layer.
  60. The method according to claim 49, wherein said determining, according to the preset selection network model, the loop filtering network model used by the current block comprises:
    when the colour component type of the current block is the luma component, determining an input reconstructed luma picture block of the luma loop filtering network model;
    inputting the input reconstructed luma picture block into the luma selection network model to obtain the output values respectively corresponding to the at least one candidate luma loop filtering network model;
    determining a target value from the output values respectively corresponding to the at least one candidate luma loop filtering network model, and taking the candidate luma loop filtering network model corresponding to the target value as the luma loop filtering network model used by the current block; or,
    when the colour component type of the current block is the chroma component, determining an input reconstructed chroma picture block of the chroma loop filtering network model;
    inputting the input reconstructed chroma picture block into the chroma selection network model to obtain the output values respectively corresponding to the at least one candidate chroma loop filtering network model;
    determining a target value from the output values respectively corresponding to the at least one candidate chroma loop filtering network model, and taking the candidate chroma loop filtering network model corresponding to the target value as the chroma loop filtering network model used by the current block.
  61. The method according to claim 31, wherein the loop filtering network model is a residual neural network based in-loop filter (CNNLF) model.
  62. The method according to claim 33, wherein the input reconstructed picture block is obtained after filtering by a deblocking filter and a sample adaptive offset (SAO) filter.
  63. The method according to any one of claims 31 to 62, wherein the method further comprises:
    after the reconstructed picture block is determined, filtering the reconstructed picture block by using an adaptive correction filter.
  64. A bitstream, generated by bit encoding according to information to be encoded; wherein the information to be encoded comprises at least one of the following: a value of first syntax element identification information, a value of second syntax element identification information, a value of first luma syntax element identification information, a value of second luma syntax element identification information, and a value of chroma syntax element identification information;
    wherein the first syntax element identification information is used to indicate whether a current block is allowed to use a preset selection network model for model selection; the second syntax element identification information indicates whether a video sequence uses a loop filtering network model for filtering; the first luma syntax element identification information is used to indicate whether a luma component of a current frame uses a luma loop filtering network model for filtering; the second luma syntax element identification information is used to indicate whether a luma component of the current block uses the luma loop filtering network model for filtering; and the chroma syntax element identification information is used to indicate whether a chroma component of the current frame uses a chroma loop filtering network model for filtering; the video sequence comprises the current frame, and the current frame comprises the current block.
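The identification information summarized in claim 64 forms a sequence-, frame-, and block-level hierarchy, with lower-level flags present only when the level above enables the network filter. The sketch below serializes the hierarchy as single bits in that order; the flag names, the one-bit binarization, and the exact nesting order are illustrative assumptions, since the claims do not fix a binarization.

```python
# Illustrative serialization of the flag hierarchy in claim 64:
# sequence-level flag first, then frame-level luma/chroma flags (only when
# the sequence-level flag is set), then block-level flags for luma.

def write_flags(seq_on, frame_luma_on, frame_chroma_on,
                block_luma_on, block_allow_select):
    bits = [int(seq_on)]                     # second syntax element (sequence)
    if seq_on:
        bits.append(int(frame_luma_on))      # first luma syntax element (frame)
        bits.append(int(frame_chroma_on))    # chroma syntax element (frame)
        if frame_luma_on:
            bits.append(int(block_luma_on))  # second luma syntax element (block)
            if block_luma_on:
                bits.append(int(block_allow_select))  # first syntax element
    return bits

print(write_flags(True, True, False, True, True))  # -> [1, 1, 0, 1, 1]
```

A decoder parsing this layout would mirror the conditional reads described in claims 7 to 13: each lower-level flag is read only when its parent flag has the first value.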
  65. An encoder, comprising a first determination unit, a first selection unit and a first filtering unit; wherein
    the first determination unit is configured to determine a value of first syntax element identification information;
    the first selection unit is configured to: when the first syntax element identification information indicates that a current block is allowed to use a preset selection network model for model selection, determine the preset selection network model of the current block, and determine, according to the preset selection network model, a loop filtering network model used by the current block;
    the first filtering unit is configured to filter the current block by using the loop filtering network model to obtain a reconstructed picture block of the current block.
  66. An encoder, comprising a first memory and a first processor; wherein
    the first memory is configured to store a computer program capable of running on the first processor;
    the first processor is configured to perform, when running the computer program, the method according to any one of claims 31 to 63.
  67. A decoder, comprising a parsing unit, a second selection unit and a second filtering unit; wherein
    the parsing unit is configured to parse a bitstream to determine a value of first syntax element identification information;
    the second selection unit is configured to: when the first syntax element identification information indicates that a current block is allowed to use a preset selection network model for model selection, determine the preset selection network model of the current block, and determine, according to the preset selection network model, a loop filtering network model used by the current block;
    the second filtering unit is configured to filter the current block by using the loop filtering network model to obtain a reconstructed picture block of the current block.
  68. A decoder, comprising a second memory and a second processor; wherein
    the second memory is configured to store a computer program capable of running on the second processor;
    the second processor is configured to perform, when running the computer program, the method according to any one of claims 1 to 30.
  69. A computer storage medium, wherein the computer storage medium stores a computer program which, when executed, implements the method according to any one of claims 1 to 30, or the method according to any one of claims 31 to 63.
PCT/CN2021/099234 2021-06-09 2021-06-09 Encoding method, decoding method, code stream, encoder, decoder and storage medium WO2022257049A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2021/099234 WO2022257049A1 (zh) 2021-06-09 2021-06-09 Encoding method, decoding method, code stream, encoder, decoder and storage medium
CN202180098998.1A CN117461315A (zh) 2021-06-09 2021-06-09 Encoding method, decoding method, code stream, encoder, decoder and storage medium
US18/534,485 US20240107015A1 (en) 2021-06-09 2023-12-08 Encoding method, decoding method, code stream, encoder, decoder and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/099234 WO2022257049A1 (zh) 2021-06-09 2021-06-09 Encoding method, decoding method, code stream, encoder, decoder and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/534,485 Continuation US20240107015A1 (en) 2021-06-09 2023-12-08 Encoding method, decoding method, code stream, encoder, decoder and storage medium

Publications (1)

Publication Number Publication Date
WO2022257049A1 (zh)

Family

ID=84424757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099234 WO2022257049A1 (zh) 2021-06-09 2021-06-09 编解码方法、码流、编码器、解码器以及存储介质

Country Status (3)

Country Link
US (1) US20240107015A1 (zh)
CN (1) CN117461315A (zh)
WO (1) WO2022257049A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108184129A (zh) * 2017-12-11 2018-06-19 Peking University Video encoding and decoding method and apparatus, and neural network for image filtering
CN108520505A (zh) * 2018-04-17 2018-09-11 Shanghai Jiao Tong University Loop filtering implementation method based on joint construction of multiple networks and adaptive selection
CN110351568A (zh) * 2019-06-13 2019-10-18 Tianjin University Video loop filter based on deep convolutional network
WO2021006624A1 (ko) * 2019-07-08 2021-01-14 LG Electronics Inc. Video or image coding applying adaptive loop filter


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
H. YIN (INTEL), R. YANG (INTEL), X. FANG, S. MA, Y. YU (INTEL): "AHG9 : Adaptive convolutional neural network loop filter", 13. JVET MEETING; 20190109 - 20190118; MARRAKECH; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-M0566, 5 January 2019 (2019-01-05), XP030200692 *

Also Published As

Publication number Publication date
US20240107015A1 (en) 2024-03-28
CN117461315A (zh) 2024-01-26

Similar Documents

Publication Publication Date Title
CN111711824A Loop filtering method, apparatus and device in video encoding and decoding, and storage medium
WO2021203394A1 Loop filtering method and apparatus
JP7439841B2 In-loop filtering method and in-loop filtering apparatus
WO2022052533A1 Encoding method, decoding method, encoder, decoder, and encoding system
CN114125439A Video encoding and decoding using a cross-component linear model
WO2021134706A1 Loop filtering method and apparatus
CN118020297A End-to-end image and video coding method based on hybrid neural networks
CN116916036A Video compression method, apparatus and system
CN113784128A Picture prediction method, encoder, decoder, and storage medium
EP4245030A1 Network based image filtering for video coding
WO2022116085A1 Encoding method, decoding method, encoder, decoder, and electronic device
KR20190117352A Apparatus and method for image encoding or decoding
US20230262251A1 Picture prediction method, encoder, decoder and computer storage medium
WO2022227062A1 Encoding and decoding method, code stream, encoder, decoder, and storage medium
CN114731406A Encoding method, decoding method, encoding apparatus, and decoding apparatus
WO2022257049A1 Encoding and decoding method, code stream, encoder, decoder, and storage medium
WO2022257130A1 Encoding and decoding method, code stream, encoder, decoder, system, and storage medium
WO2022178686A1 Encoding and decoding method, encoding and decoding device, encoding and decoding system, and computer-readable storage medium
WO2024016156A1 Filtering method, encoder, decoder, code stream, and storage medium
WO2023245544A1 Encoding and decoding method, code stream, encoder, decoder, and storage medium
WO2024077573A1 Encoding and decoding method, encoder, decoder, code stream, and storage medium
WO2023197230A1 Filtering method, encoder, decoder, and storage medium
WO2023130226A1 Filtering method, decoder, encoder, and computer-readable storage medium
WO2023123398A1 Filtering method, filtering apparatus, and electronic device
CN112313950A Prediction method and apparatus for video picture component, and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21944559

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180098998.1

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE