WO2023123398A1 - Filtering method, filtering device, and electronic device - Google Patents

Filtering method, filtering device, and electronic device

Info

Publication number
WO2023123398A1
WO2023123398A1, PCT/CN2021/143804, CN2021143804W
Authority
WO
WIPO (PCT)
Prior art keywords
image block
reconstructed image
neural network
value
current reconstructed
Prior art date
Application number
PCT/CN2021/143804
Other languages
English (en)
French (fr)
Inventor
戴震宇
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2021/143804
Publication of WO2023123398A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Definitions

  • the embodiments of the present application relate to the technical field of image coding and decoding, and more specifically, to a filtering method, a filtering device, and electronic equipment.
  • Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
  • Although digital video compression standards can achieve video decompression, it is still necessary to pursue better digital video decompression technology to improve decoding performance.
  • the present application provides a filtering method, a filtering device and electronic equipment, which can improve decoding performance.
  • the present application provides a filtering method, including:
  • the present application provides a filtering method, including:
  • the present application provides a filtering device, including:
  • a parsing unit configured to parse the code stream to obtain the current reconstructed image block
  • a prediction unit configured to, when it is determined that the first neural network is used to filter the current reconstructed image block, use the second neural network to predict the features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block
  • a filtering unit configured to use the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block, to obtain a filtered reconstructed image block.
  • the present application provides a filtering device, including:
  • An acquisition unit configured to acquire the current reconstructed image block
  • a prediction unit configured to, when it is determined that the first neural network is used to filter the current reconstructed image block, use the second neural network to predict the features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block
  • a filtering unit configured to use the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block, to obtain a filtered reconstructed image block.
  • the present application provides an electronic device, including:
  • a processor adapted to implement computer instructions
  • a computer-readable storage medium, which stores computer instructions adapted to be loaded by the processor to execute the filtering method in any one of the first to second aspects above or in each implementation thereof.
  • There are one or more processors, and one or more memories.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • the embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores computer instructions, and when the computer instructions are read and executed by a processor of a computer device, the computer device executes the filtering method in any one of the above-mentioned first to second aspects.
  • the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes any one of the above-mentioned first to second aspects or the method in each implementation manner.
  • the current reconstructed image block is obtained; by introducing the first neural network and designing the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block, not only can the filtering process be realized based on a neural network, but it can also be ensured that the information used to filter the current reconstructed image block fits the original image block as closely as possible, which in turn can improve the image quality of the current reconstructed image block and improve decoding performance.
  • this application considers that the purpose of filtering is to make the current reconstructed image block closer to the original image block; using the extracted feature image block of the original image block as input to the introduced first neural network, which performs the filtering processing on the current reconstructed image block, can improve the quality of the current reconstructed image and improve the decoding performance. In addition, regarding the feature image block of the original image block, this application considers that although the encoding end can obtain it by analyzing the original image block, the decoding end cannot; by introducing the second neural network as a feature extractor, it can be ensured that the decoding end can also obtain the feature image block of the original image block, further improving the decoding performance. That is to say, the present application proposes a neural-network filtering method that filters the current reconstructed image block by using the feature image block of the original image, which can improve the image quality of the current reconstructed image block and improve the decoding performance.
  • Fig. 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • Fig. 2 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of a filtering method provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of the connection relationship between the first neural network and the second neural network provided by the embodiment of the present application.
  • Fig. 5 is a schematic diagram of a filtering unit including a first neural network and a second neural network provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a first neural network provided by an embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of a filtering method provided by an embodiment of the present application.
  • Fig. 10 is a schematic block diagram of a filtering device according to an embodiment of the present application.
  • Fig. 11 is another schematic block diagram of a filtering device according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the solution provided by the embodiments of the present application can be applied to the technical field of digital video coding, for example, the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, and the field of real-time video coding and decoding.
  • the solutions provided in the embodiments of the present application may be combined with the Audio Video coding Standard (AVS), the second generation AVS standard (AVS2) or the third generation AVS standard (AVS3).
  • AVS Audio Video coding Standard
  • For example, the solutions provided in the embodiments of the present application may also be combined with the H.265/High Efficiency Video Coding (HEVC) standard or the H.266/Versatile Video Coding (VVC) standard.
  • the solution provided by the embodiment of the present application may be used for image lossy compression (lossy compression), and may also be used for image lossless compression (lossless compression).
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • Generally, the encoder reads luminance component pixels and chrominance component pixels in unequal proportions for original video sequences of different color formats; that is, the encoder reads a black-and-white image or a color image and then encodes the image.
  • the black-and-white image may include pixels of luminance components
  • the color image may include pixels of chrominance components.
  • the color image may also include pixels of luminance components.
  • the color format of the original video sequence may be a luminance-chrominance (YCbCr, YUV) format or a red-green-blue (Red-Green-Blue, RGB) format or the like.
  • After the encoder reads a black-and-white image or a color image, it divides the image into block data and encodes the block data.
  • the block data can be a coding tree unit (Coding Tree Unit, CTU) or a coding unit block (Coding Unit, CU).
  • a coding tree unit can be further divided into several CUs.
  • the CU can be a rectangular block or a square block. That is, the encoder can perform encoding based on CTUs or CUs.
  • Today's encoders usually use a mixed frame coding mode, which generally includes operations such as intra-frame and inter-frame prediction, transformation and quantization, inverse transformation and inverse quantization, loop filtering, and entropy coding.
  • Intra-frame prediction only refers to the information of the same frame image, and predicts the pixel information in the current division block to eliminate spatial redundancy;
  • inter-frame prediction can refer to the image information of different frames and uses motion estimation to search for the motion vector information that best matches the current division block, in order to eliminate temporal redundancy; transformation converts the predicted image block into the frequency domain and redistributes its energy, and combined with quantization it can remove information to which human eyes are not sensitive, eliminating visual redundancy; entropy coding can eliminate character redundancy based on the current context model and the probability information of the binary code stream.
  • Loop filtering mainly processes the pixels after inverse transformation and inverse quantization to compensate for distortion information and provide better reference for subsequent encoded pixels.
  • Fig. 1 is a schematic block diagram of a coding framework 100 provided by an embodiment of the present application.
  • the encoding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy encoding unit 130, an inverse transform and inverse quantization unit 140, and a loop filtering unit 150.
  • the encoding framework 100 may further include a decoded image buffer unit 160 .
  • the coding framework 100 may also be referred to as a hybrid framework coding mode.
  • the intra prediction unit 180 or the inter prediction unit 170 can predict the image block to be coded to output the predicted block.
  • the residual unit 110 may calculate a residual block based on the predicted block and the image block to be encoded, that is, a difference between the predicted block and the image block to be encoded.
  • the residual block can be transformed and quantized by the transformation and quantization unit 120 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform and quantization unit 120 may be called a time domain residual block
  • the time domain residual block after being transformed and quantized by the transform and quantization unit 120 may be called a frequency residual block or a frequency-domain residual block.
  • the entropy encoding unit 130 can output a code stream based on the transform and quantization coefficients. For example, the entropy encoding unit 130 can eliminate character redundancy according to the target context model and the probability information of the binary code stream. For example, the entropy encoding unit 130 may be used for context-adaptive binary arithmetic coding (CABAC).
  • CABAC context-adaptive binary arithmetic entropy coding
  • the entropy encoding unit 130 may also be referred to as a header information encoding unit.
  • the image block to be encoded may also be called an original image block or a target image block
  • the predicted block may also be called a predicted image block or an image predicted block, and may also be called a predicted signal or predicted information
  • a reconstruction block may also be called a reconstructed image block or an image reconstruction block, and may also be called a reconstructed signal or reconstructed information.
  • the image block to be encoded may also be referred to as an encoding block or an encoding image block
  • the image block to be encoded may also be referred to as a decoding block or a decoding image block.
  • the image block to be encoded may be a CTU or a CU.
  • the encoding framework 100 calculates the residual between the predicted block and the image block to be encoded to obtain the residual block, and then transmits the residual block to the decoding end through processes such as transformation and quantization. After receiving and parsing the code stream, the decoding end obtains the residual block through steps such as inverse transformation and inverse quantization, and then superimposes the predicted block obtained at the decoding end on the residual block to obtain the reconstructed block.
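The residual relationship described above can be sketched numerically. This is a simplified illustration only: uniform scalar quantization stands in for the full transform-and-quantization pipeline, and all names are ours, not the application's.

```python
import numpy as np

def encode_block(original, predicted, qstep=4):
    """Compute the residual and quantize it (stand-in for transform + quantization)."""
    residual = original.astype(np.int32) - predicted.astype(np.int32)
    return np.round(residual / qstep).astype(np.int32)  # quantized residual carried in the stream

def decode_block(quantized_residual, predicted, qstep=4):
    """Dequantize the residual and superimpose it on the prediction to reconstruct."""
    residual = quantized_residual * qstep  # stand-in for inverse quantization + inverse transform
    return predicted.astype(np.int32) + residual

original = np.array([[52, 55], [61, 59]])
predicted = np.array([[50, 54], [60, 60]])
q = encode_block(original, predicted)
reconstructed = decode_block(q, predicted)
# the reconstruction approximates the original; the error is bounded by the quantization step
```

The loop filters discussed later operate on exactly this kind of reconstructed block, compensating the distortion that quantization introduces.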
  • the inverse transform and inverse quantization unit 140 can be used to form a decoder.
  • the intra-frame prediction unit 180 or the inter-frame prediction unit 170 can predict the image block to be encoded based on the existing reconstructed block, thereby ensuring that the understanding of the reference frame at the encoding end and the decoding end is consistent.
  • the encoder can replicate the decoder's processing loop, which in turn can generate the same predictions as the decoder.
  • the quantized transform coefficients are inversely transformed and inversely quantized by the inverse transform and inverse quantization unit 140 to copy an approximate residual block at the decoding end.
  • the loop filtering unit 150 can be used to smoothly filter out block effects caused by block-based processing and quantization.
  • the image blocks output by the loop filtering unit 150 may be stored in the decoded image buffer unit 160 so as to be used for prediction of subsequent images.
  • the intra-frame prediction unit 180 can be used for intra-frame prediction, and the intra-frame prediction only refers to the information of the same frame image to predict the pixel information in the image block to be encoded, so as to eliminate spatial redundancy;
  • the frame used for intra-frame prediction can be an I frame.
  • the image block to be encoded can refer to the upper-left image block, the upper image block, and the left image block as reference information for prediction, and the image block to be encoded in turn serves as reference information for the next image block, so that the whole image can be predicted.
  • the input digital video is in color format, such as YUV 4:2:0 format
  • every 4 pixels of each image frame of the digital video is composed of 4 Y components and 2 UV components
  • the encoding framework 100 can encode the Y component (i.e. the luma block) and the UV components (i.e. the chroma blocks) separately.
  • the decoding end can also perform corresponding decoding according to the format.
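The 4:2:0 sampling described above (4 Y samples accompanied by 2 UV samples in each 2x2 group) means the U and V planes each carry a quarter of the luma resolution. A quick illustration (function name is ours):

```python
def yuv420_plane_sizes(width, height):
    """Return the sample counts of the Y, U and V planes for one 4:2:0 frame."""
    y = width * height
    # chroma is subsampled by a factor of 2 both horizontally and vertically
    u = v = (width // 2) * (height // 2)
    return y, u, v

# a 1920x1080 frame has 2073600 luma samples and 518400 samples in each chroma plane
```

This is why the luma and chroma network structures discussed later can differ: the two kinds of planes have different resolutions and carry different amounts of visual information.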
  • the inter-frame prediction unit 170 can be used for inter-frame prediction.
  • the inter-frame prediction can refer to the image information of different frames, and use motion estimation to search for the motion vector information that best matches the image block to be encoded, so as to eliminate temporal redundancy;
  • the frame may be a P frame and/or a B frame, the P frame refers to a forward prediction frame, and the B frame refers to a bidirectional prediction frame.
  • the intra-frame prediction can use the angle prediction mode and non-angle prediction mode to predict the image block to be coded to obtain the prediction block.
  • the encoding end selects the optimal prediction mode of the image block to be coded, and transmits the prediction mode to the decoding end through the code stream.
  • the decoding end analyzes the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time domain residual block obtained through code stream transmission to obtain the reconstruction block.
  • the non-angle modes remain relatively stable, comprising the mean mode and the planar mode; the angle modes continue to increase with the evolution of digital video codec standards.
  • the H.264/AVC standard has only 8 angle prediction modes and 1 non-angle prediction mode; H.265/HEVC extends these to 33 angle prediction modes and 2 non-angle prediction modes.
  • in H.266/VVC, the intra-frame prediction modes are further expanded: for luma blocks there are 67 traditional prediction modes as well as non-traditional prediction modes such as the matrix weighted intra-frame prediction (MIP) mode.
  • the traditional prediction modes include: the planar mode with mode number 0, the DC mode with mode number 1, and the angle prediction modes with mode numbers 2 to 66.
  • Fig. 1 is only an example of the present application, and should not be construed as a limitation to the present application.
  • the loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF), a sample adaptive compensation filter (SAO) and an adaptive correction filter (ALF).
  • DBF deblocking filter
  • SAO sample adaptive compensation filter
  • ALF adaptive correction filter
  • the function of DBF is to remove the block effect
  • the function of SAO is to remove the ringing effect.
  • the encoding framework 100 may use a neural network-based loop filter algorithm to improve video compression efficiency.
  • the coding framework 100 may be a video coding hybrid framework based on a deep learning neural network.
  • for example, the result of pixel filtering computed by a neural network may be used.
  • the network structure of the loop filtering unit 150 for the luma component and for the chrominance component may be the same or different. Considering that the luma component contains more visual information, the luma component may also be used to guide the filtering of the chroma component, so as to improve the reconstruction quality of the chroma component.
  • Fig. 2 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
  • the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filter unit 260, a decoded image buffer Unit 270.
  • After the entropy decoding unit 210 receives and parses the code stream, it obtains the prediction information and the residual block in the frequency domain. For the residual block in the frequency domain, the inverse transform and inverse quantization unit 220 performs steps such as inverse transformation and inverse quantization to obtain the residual block in the time domain. The residual unit 230 then superimposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 on the time-domain residual block to obtain the reconstructed block. For example, the intra-frame prediction unit 240 or the inter-frame prediction unit 250 can obtain the prediction block by decoding the header information of the bitstream.
  • Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
  • although existing digital video compression standards can achieve video decompression, it is still necessary to pursue better digital video decompression technology to improve decoding performance.
  • the traditional loop filter module mainly includes tools such as Deblocking Filter (DBF), Sample Adaptive Offset (SAO) and Adaptive Modification Filter (ALF).
  • DBF Deblocking Filter
  • SAO Sample Adaptive Offset
  • ALF Adaptive Modification Filter
  • the decoding performance can also be improved by introducing neural network-based filters.
  • the present application provides a filtering method, filtering device and electronic equipment, which can improve decoding performance.
  • Fig. 3 is a schematic flowchart of a filtering method 300 according to an embodiment of the present application.
  • the method 300 can be implemented by a decoding framework including a neural network-based filtering unit.
  • the neural network-based filtering unit may be extended into the decoding framework described in FIG. 2 to execute the filtering method 300 .
  • the filtering method 300 may include:
  • the current reconstructed image block is obtained
  • by introducing the first neural network and designing the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block, not only can the filtering process be realized based on a neural network, but it can also be ensured that the information used to filter the current reconstructed image block fits the original image block as closely as possible, which in turn can improve the image quality of the current reconstructed image block and improve decoding performance.
  • this application considers that the purpose of filtering is to make the current reconstructed image block closer to the original image block; using the extracted feature image block of the original image block as input to the introduced first neural network, which performs the filtering processing on the current reconstructed image block, can improve the quality of the current reconstructed image and improve the decoding performance. In addition, regarding the feature image block of the original image block, this application considers that although the encoding end can obtain it by analyzing the original image block, the decoding end cannot; by introducing the second neural network as a feature extractor, it can be ensured that the decoding end can also obtain the feature image block of the original image block, further improving the decoding performance. That is to say, the present application proposes a neural-network filtering method that filters the current reconstructed image block by using the feature image block of the original image, which can improve the image quality of the current reconstructed image block and improve the decoding performance.
  • the present application may filter the current reconstructed image block based only on the feature image block of the current reconstructed image block, or may filter the current reconstructed image block based on the feature image block of the current reconstructed image block in combination with other information; this is not specifically limited in the present application.
  • the information of the current reconstructed image block and the feature image block of the current reconstructed image block may be used as the input of the first neural network, and the current reconstructed image block may be filtered to improve decoding performance.
  • the information of the currently reconstructed image block includes but is not limited to: pixel values of the color components (Y/U/V), block division information, prediction information, deblocking boundary strength information, quantization step (QP) information, and the like.
  • a luma component may be introduced as an input to direct the filtering of the chrominance components.
  • the feature image block of the current reconstructed image block is information of the original image block of the current reconstructed image block, not information of the current reconstructed image block itself.
  • the present application does not specifically limit the network structure of the first neural network.
  • the first neural network may be a loop filter based on deep learning.
  • the first neural network may be a residual neural network-based loop filter (CNNLF).
  • CNNLF residual neural network-based loop filter
  • the first neural network may include a network structure for luma components and a network structure for chrominance components.
  • the network structure for the luma component or for the chrominance component may consist of convolutional layers, activation layers, residual blocks, skip connections, and the like, where the network structure of a residual block consists of convolutional layers, an activation layer, and a skip connection; further, it may also include a global skip connection from input to output, so that the network structure can focus on learning the residual and accelerate its convergence.
  • the network structure for the chroma component may introduce the luma component as an input to guide the filtering of the chroma component.
  • the first neural network may be a loop filter based on a deep convolutional neural network.
  • the first neural network may include a multi-layer residual network.
  • information of multiple models may be introduced as input to filter the currently reconstructed image block, and the best model is selected for filtering by calculating the rate-distortion cost of each model.
  • the present application does not limit the specific positions of the first neural network and the second neural network in the encoding and decoding framework or the filtering unit.
  • the network structure formed by the first neural network and the second neural network may also be called a neural network based loop filter (Neural Network based Loop Filter, NNLF).
  • the connection relationship and the network structures of the first neural network and the second neural network will be illustrated below with reference to FIG. 4 to FIG. 8.
  • Fig. 4 is a schematic diagram of the connection relationship between the first neural network and the second neural network provided by the embodiment of the present application.
  • the present application first uses the second neural network to predict the feature image block of the original image block, and then inputs the predicted feature image block into the first neural network to perform filtering processing on the current reconstructed image block. That is to say, the neural-network loop filter used to filter the current reconstructed image block consists of two neural networks, namely the second neural network and the first neural network.
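The two-stage arrangement (a feature-extraction network feeding a filtering network that carries a global skip connection) can be expressed functionally. Here both networks are stubbed with simple numpy operations purely to show the data flow; the real networks are convolutional and learned, and every function here is our illustration, not the application's implementation:

```python
import numpy as np

def feature_extractor(rec):
    """Stand-in for the second neural network: predicts a feature image block of
    the original image from the current reconstructed block (same spatial shape)."""
    return rec - rec.mean()  # placeholder feature map

def filter_network(features, rec):
    """Stand-in for the first neural network: takes the predicted features and the
    reconstructed block, merges them channel-wise, and adds a learned residual
    correction back through a global skip connection."""
    stacked = np.stack([features, rec])      # channel-wise merge of the two inputs
    correction = 0.1 * stacked.mean(axis=0)  # placeholder for the learned mapping
    return rec + correction                  # global skip: output = input + residual

rec = np.array([[4.0, 8.0], [2.0, 6.0]])
filtered = filter_network(feature_extractor(rec), rec)
```

The point of the composition is that the decoder never needs the original image: it feeds its own reconstruction through both stages.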
  • Fig. 5 is a schematic diagram of a filtering unit including a first neural network and a second neural network provided by an embodiment of the present application.
  • the network structure formed by the first neural network and the second neural network may be located between the SAO and the ALF.
  • the use of the network structure formed by the first neural network and the second neural network does not depend on the switches of the DBF, SAO, and ALF, but in terms of position it is placed between the SAO and the ALF.
  • Fig. 6 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • the second neural network is composed of k convolutional layers, where each convolutional layer except the last is followed by a non-linear activation function (PReLU) layer.
  • the input of the second neural network is the current reconstructed image block, and the output is the predicted feature image block of the current reconstructed image block.
  • PReLU non-linear activation function
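The PReLU mentioned above is the parametric ReLU: identity for positive inputs and a learned slope for negative ones. A one-line numpy version (the slope `a` is a fixed parameter here; in the actual networks it is learned during training):

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: f(x) = x for x > 0, a * x otherwise."""
    return np.where(x > 0, x, a * x)

x = np.array([-4.0, -1.0, 0.0, 2.0])
# prelu(x) -> [-1.0, -0.25, 0.0, 2.0]
```

Unlike plain ReLU, the negative side keeps a small gradient, which helps deep filtering networks converge.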
  • FIG. 7 is a schematic structural diagram of a first neural network provided by an embodiment of the present application.
  • the first neural network first performs a convolution operation on the feature image block of the current reconstructed image block output by the second neural network, then merges the result channel-wise with the input current reconstructed image block, and feeds it to the next network layer.
  • the second layer and the last layer of the first neural network are convolutional layers, and there is a skip connection from input to output, so that the first neural network can focus on learning residuals, which accelerates the convergence of the network structure.
  • the input of the first neural network is the feature image block of the current reconstructed image block output by the second neural network and the current reconstructed image block, and the output of the first neural network is the filtered reconstructed image block.
  • Fig. 8 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
  • a residual block can be composed of two convolutional layers, where the first convolutional layer is followed by a non-linear activation function (PReLU) layer, and there is also a skip connection from input to output.
  • PReLU non-linear activation function
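A minimal single-channel sketch of the dataflow of Fig. 7 and Fig. 8 is given below. The kernel values, the single residual block, and the replacement of the channel-wise merge by an addition (since everything here is one channel) are illustrative simplifications; the point is the residual block's internal skip connection and the global input-to-output skip connection.

```python
def conv3x3(img, k):
    """3x3 convolution with zero padding on a 2-D list of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += k[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = s
    return out

def prelu(img, a=0.25):
    return [[v if v > 0 else a * v for v in row] for row in img]

def add(p, q):
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def residual_block(img, k1, k2):
    """Fig. 8: conv -> PReLU -> conv, plus a skip connection to the output."""
    y = conv3x3(prelu(conv3x3(img, k1)), k2)
    return add(img, y)

def first_network(recon, feat, p):
    """Fig. 7 (simplified): convolve the feature image block, merge it with
    the reconstructed block, run the body, and add the global skip
    connection so the network only has to learn the residual."""
    f = conv3x3(feat, p["w_feat"])        # conv on the predicted feature image block
    merged = add(f, recon)                # single-channel stand-in for the channel merge
    body = conv3x3(merged, p["w_in"])     # second layer (convolutional)
    body = residual_block(body, p["w_r1"], p["w_r2"])
    body = conv3x3(body, p["w_out"])      # last layer (convolutional)
    return add(recon, body)               # skip connection from input to output
```

With all-zero kernels the body contributes nothing and the output equals the input reconstruction, which is exactly the behavior the global skip connection guarantees at initialization.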
  • FIG. 3 to FIG. 8 are only examples of the present application, and should not be construed as limiting the present application.
  • the embodiment of the present application does not limit the structure of the neural network, including specific implementation methods such as the number of convolutional layers, the number of residual blocks, and nonlinear activation functions.
  • the method 300 may further include:
  • when the value of the sequence identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of the sequence identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image sequence;
  • sequence identifier may be carried in a sequence header of a code stream.
  • sequence_header may represent a sequence header
  • nnlf_enable_flag may be used to represent the sequence identifier.
  • when the value of nnlf_enable_flag is 1, it means that the first neural network is allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of nnlf_enable_flag is 0, it means that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs.
  • the decoding end can determine at once not to use the first neural network for filtering, which prevents the decoding end from determining, in a traversal manner for each reconstructed image block in the current reconstructed image sequence, whether to use the first neural network for filtering, and can improve decoding efficiency.
  • the present application does not specifically limit specific values of the first value and the second value.
  • the first numerical value is 1 and the second numerical value is 0; in another implementation manner, the first numerical value is 0 and the second numerical value is 1.
  • the code stream is parsed to obtain the value of the component identifier; wherein, when the value of the component identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block; when the value of the component identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block; based on the value of the component identifier, it is determined whether to use the first neural network to filter the current reconstructed image block.
  • the component identifier may be carried in the image header.
  • picture_header can be used to indicate the image header
  • nnlf_enable_flag can be used to indicate the sequence identifier
  • compIdx can be used to indicate the xth component of the currently reconstructed image block. For example, when compIdx is 0, it indicates the luminance component; when compIdx is 1, it indicates the Cb component; when compIdx is 2, it indicates the Cr component.
  • picture_nnlf_enable_flag[compIdx] indicates the component identification of the xth component.
  • when the value of picture_nnlf_enable_flag[compIdx] is 1, it means that the first neural network is allowed to filter the reconstructed image blocks of the xth component in the current reconstructed image to which the current reconstructed image block belongs; when the value of picture_nnlf_enable_flag[compIdx] is 0, it means that the first neural network is not allowed to filter the reconstructed image blocks of the xth component in the current reconstructed image to which the current reconstructed image block belongs.
  • the decoding end can determine at once not to use the first neural network for filtering, which prevents the decoding end from determining, in a traversal manner for the components of each reconstructed image block in the current reconstructed image sequence, whether to use the first neural network for filtering, and can improve decoding efficiency.
  • the value of the image block identifier is obtained by parsing the code stream; wherein, when the value of the image block identifier is the first value, it means that the first neural network is used to filter the current reconstructed image block, and when the value of the image block identifier is the second value, it means that the first neural network is not used to filter the current reconstructed image block; whether to use the first neural network to filter the current reconstructed image block is determined based on the value of the image block identifier.
  • the image block identifier may be carried in a patch.
  • picture_header can be used to indicate the image header
  • nnlf_enable_flag can be used to indicate the sequence identifier
  • compIdx can be used to indicate the xth component of the currently reconstructed image block. For example, when compIdx is 0, it indicates the luminance component; when compIdx is 1, it indicates the Cb component; when compIdx is 2, it indicates the Cr component.
  • picture_nnlf_enable_flag[compIdx] indicates the component identification of the xth component.
  • patch_nnlf_enable_flag[compIdx][LcuIdx] indicates the image block identifier of the LcuIdx-th image block of the xth component in the current reconstructed image to which the current reconstructed image block belongs. For example, when the value of patch_nnlf_enable_flag[compIdx][LcuIdx] is 1, it means that the first neural network is allowed to filter the LcuIdx-th image block of the xth component in the current reconstructed image to which the current reconstructed image block belongs.
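Putting the three levels of syntax elements together, the decoder-side gating can be sketched as follows (the dictionary layout is a hypothetical stand-in for the parsed code stream, but the flag names mirror the text: nnlf_enable_flag at the sequence level, picture_nnlf_enable_flag[compIdx] at the picture level, patch_nnlf_enable_flag[compIdx][LcuIdx] at the patch level):

```python
def use_nn_filter(syntax, comp_idx, lcu_idx):
    """Decide whether the first neural network filters one image block,
    checking the sequence-level, picture-level (per component), and
    patch-level (per image block) flags in turn."""
    if syntax["nnlf_enable_flag"] == 0:
        return False  # disabled for the whole reconstructed image sequence
    if syntax["picture_nnlf_enable_flag"][comp_idx] == 0:
        return False  # disabled for this component of the current image
    return syntax["patch_nnlf_enable_flag"][comp_idx][lcu_idx] == 1
```

The early returns are what the efficiency argument above describes: a 0 at the sequence or picture level settles the decision without traversing the finer-grained flags.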
  • the method 300 may further include:
  • At least one first training data pair is obtained, and the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
  • The goal of training the first neural network is to learn a set of network parameters such that the image output by the first neural network is closer to the target image.
  • Regarding the training set: for example, the data sets DIV2K and BVI-DVC recommended by the VVC video coding exploration experiment group can be used as the training set. First, PNG-format images or MP4-format videos are converted into YUV420-format videos to be compressed, so as to obtain the original video information of each color component; the VTM reference software test platform is then used for encoding to obtain the reconstructed video.
  • the image block of the luminance component is an image block of 128x128 size
  • the image block of the chrominance component is an image block of 64x64 size
  • the reconstructed image block and the original image block form a training data pair (data-label), which serves as the training set of the first neural network.
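The block cutting described above can be sketched as follows (plane sizes divisible by the block size are assumed, and the function and parameter names are illustrative):

```python
def make_training_pairs(recon_plane, orig_plane, component):
    """Cut co-located blocks out of a reconstructed plane (data) and the
    original plane (label): 128x128 blocks for the luminance component,
    64x64 blocks for the chrominance components."""
    size = 128 if component == "luma" else 64
    h, w = len(recon_plane), len(recon_plane[0])
    pairs = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            data = [row[x:x + size] for row in recon_plane[y:y + size]]
            label = [row[x:x + size] for row in orig_plane[y:y + size]]
            pairs.append((data, label))
    return pairs
```

A 256x256 luminance plane thus yields four 128x128 (data, label) pairs, while the co-sited chrominance planes are cut into 64x64 pairs.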
  • the feature image of the first original image is a feature image obtained by using the second neural network to predict the first original image, or the feature image of the first original image is an annotated feature image of the first original image.
  • the method 300 may further include:
  • At least one second training data pair is obtained, and the at least one second training data pair includes at least one second reconstructed image block and at least one second feature image block respectively corresponding to the at least one second reconstructed image block;
  • The goal of training the second neural network is to learn a set of network parameters such that the image output by the second neural network is closer to the feature image of the target image. Therefore, feature extraction needs to be performed on the collected images to obtain feature images.
  • An image has rich feature information, such as color features, texture features, shape features, and spatial relationship features.
  • the color and texture features are mainly used to describe the surface properties of the scene corresponding to the image or image area
  • the shape feature is mainly used to describe the contour and regional characteristics of the image
  • the spatial relationship feature is mainly used to describe the mutual spatial position or relative orientation relationship between multiple objects segmented from the image. Exemplarily, the spatial feature may include image saliency.
  • image saliency is the process of assigning a label to each pixel in an image so that pixels with the same label can share certain features.
  • Image saliency is an important visual feature in an image, which reflects the importance of vision to each area of the image.
  • Image saliency is also widely used in compression coding, edge enhancement, salient target partition and feature extraction.
  • The saliency detection task is a popular research direction in the field of machine vision, in which mathematical methods are used to detect salient information in the visual scene. For example, the minimum barrier distance (MBD) method calculates the distance between each pixel in the image and a set of background candidate pixels (generally the boundary pixels of the image); the Binary method, built on MBD, obtains a binary saliency map according to a set threshold; the robust background detection (RBD) method uses connectivity to improve the robustness of the background prior, divides the image into multiple regions with a segmentation algorithm, calculates the correlation between each region and the boundary, and determines the final salient areas; and the FT algorithm starts from the frequency domain and designs a band-pass filter whose lower low-pass cut-off frequency highlights the entire salient area and whose higher high-pass cut-off frequency displays clear boundaries while cutting off high-frequency noise.
  • Methods based on deep learning are also used in space-time domain saliency detection tasks, such as the SALI
  • The FT saliency map can be computed as S(x, y) = ||I_μ - I_ωhc(x, y)||, where I_μ represents the arithmetic mean pixel value of the image, I_ωhc represents the Gaussian-blurred pixel value of the image (the blur eliminates fine texture, noise, and coding artifacts), and ||·|| represents the Euclidean distance.
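For a single-channel image, the FT computation described above reduces to an absolute difference between the image mean and a blurred pixel value. The sketch below uses a 3x3 binomial kernel as a stand-in for the Gaussian blur (an assumption; the published FT algorithm blurs with a larger kernel and works on color feature vectors, where the distance is a true Euclidean norm):

```python
def ft_saliency(img):
    """Frequency-tuned saliency, 1-channel sketch: S(x, y) = |I_mu - I_whc(x, y)|,
    with I_mu the image mean and I_whc a 3x3 binomial blur of the image."""
    h, w = len(img), len(img[0])
    mean = sum(sum(row) for row in img) / (h * w)   # I_mu
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]      # binomial blur weights
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, norm = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        kv = kernel[dy + 1][dx + 1]
                        acc += kv * img[yy][xx]
                        norm += kv
            blurred = acc / norm                     # I_whc(x, y)
            sal[y][x] = abs(mean - blurred)          # distance in one dimension
    return sal
```

A uniform image has zero saliency everywhere, since every blurred value equals the mean; structure that deviates from the global mean survives the blur and shows up as saliency.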
  • the data sets DIV2K and BVI-DVC recommended by the VVC Video Coding Exploration Experiment Group can be used as training sets.
  • Taking image saliency as an example, the saliency feature image of the original image can be calculated by the FT algorithm and converted to the YUV420 format; the VTM reference software test platform is then used to encode and decode to obtain the reconstructed video.
  • the image block of the luminance component is a 128x128 image block
  • the image block of the chrominance component is an image block of 64x64 size
  • the reconstructed image block and the feature image block form a training data pair (data-label), which is used as the training set of the second neural network.
  • the filter used to filter the current reconstructed image block includes a first neural network and a second neural network.
  • the second neural network and the first neural network should have different training objectives, otherwise the existence of the second neural network is meaningless. Therefore, the training of the second neural network and the first neural network may be performed independently, or the first neural network may be trained based on the trained second neural network, which is not specifically limited in this application.
  • the present application does not limit specific parameters used for training the first neural network and/or the second neural network.
  • the characteristic image block of the currently reconstructed image block is used to characterize at least one of the following characteristics of the original image block of the currently reconstructed image block:
  • the filtering method according to the embodiment of the present application is described in detail from the perspective of the decoding end with reference to FIG. 3 to FIG. 8 above, and the filtering method according to the embodiment of the present application will be described below from the perspective of the encoding end in conjunction with FIG. 9 .
  • FIG. 9 is a schematic flowchart of a filtering method 400 provided by an embodiment of the present application.
  • the method 400 can be implemented by a coding framework including a neural network-based filtering unit.
  • the neural network-based filtering unit may be extended into the coding framework described in FIG. 1 to execute the filtering method 400 .
  • the filtering method 400 may include:
  • the method 400 may further include:
  • when the value of the sequence identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of the sequence identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image sequence;
  • sequence identifier may be an identifier in a sequence parameter set.
  • the sequence parameter set may include a sequence identifier (sps_aip_enabled_flag), which is a sequence-level control switch used to control whether loop filtering based on the first neural network is enabled in the current sequence; for example, if the flag is 1, it means that loop filtering based on the first neural network is allowed to be enabled in the current sequence, and if it is 0, it means that it is not allowed to be enabled.
  • sps_aip_enabled_flag: a sequence-level control switch used to control whether loop filtering based on the first neural network is enabled in the current sequence; if the flag is 1, it means that loop filtering based on the first neural network is allowed to be enabled in the current sequence, and if it is 0, it means that it is not allowed to be enabled.
  • the encoder can obtain the specific value of the sequence identifier by querying the configuration file configured by the user, that is, whether the first neural network is allowed to be used for filtering. For example, the first neural network is used to filter the current reconstructed image block to obtain the filtered rate-distortion cost of the current reconstructed image block; if the rate-distortion cost of the current reconstructed image block after filtering is greater than the rate-distortion cost of the current reconstructed image block before filtering, it is determined to use the first neural network to filter the current reconstructed image block; if the rate-distortion cost of the current reconstructed image block after filtering is less than or equal to the rate-distortion cost of the current reconstructed image block before filtering, it is determined not to use the first neural network to filter the current reconstructed image block.
  • the method 400 may also include:
  • the method 400 may also include:
  • when the value of the component identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block; when the value of the component identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block;
  • if the filtered rate-distortion cost of each reconstructed image block in the current reconstructed image that has the same component as the current reconstructed image block is less than or equal to the rate-distortion cost of that reconstructed image block before filtering, the value of the component identifier is determined to be the first value; if the current reconstructed image includes a reconstructed image block whose rate-distortion cost after filtering is greater than its rate-distortion cost before filtering, the value of the component identifier is determined to be the second value.
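The component-level decision can be sketched as follows, mirroring the comparison stated above (the list-of-costs input is an illustrative interface, not a structure defined by this application):

```python
def decide_component_flag(costs_before, costs_after):
    """Set the component identifier: the first value (1) only if no
    reconstructed image block of this component has a higher rate-distortion
    cost after filtering than before; otherwise the second value (0)."""
    for before, after in zip(costs_before, costs_after):
        if after > before:
            return 0  # one block got worse: disable the filter for this component
    return 1
```

A single image block whose filtered cost exceeds its unfiltered cost is enough to set the component identifier to the second value for the whole picture.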
  • the method 400 may also include:
  • when the value of the image block identifier is the first value, it means that the first neural network is used to filter the currently reconstructed image block; when the value of the image block identifier is the second value, it means that the first neural network is not used to filter the current reconstructed image block.
  • if the filtered rate-distortion cost of the current reconstructed image block is greater than the rate-distortion cost of the current reconstructed image block before filtering, it is determined that the value of the image block identifier is the first value; if the filtered rate-distortion cost of the current reconstructed image block is less than or equal to the rate-distortion cost of the current reconstructed image block before filtering, it is determined that the value of the image block identifier is the second value.
  • the method 400 may further include:
  • At least one first training data pair is obtained, and the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
  • the feature image of the first original image is a feature image obtained by using the second neural network to predict the first original image, or the feature image of the first original image is an annotated feature image of the first original image.
  • the method 400 may further include:
  • At least one second training data pair is obtained, and the at least one second training data pair includes at least one second reconstructed image block and at least one second feature image block respectively corresponding to the at least one second reconstructed image block;
  • the characteristic image block of the currently reconstructed image block is used to characterize at least one of the following characteristics of the original image block of the currently reconstructed image block:
  • When loop filtering is performed at the encoding end, it can be processed according to the specified filter order.
  • the neural network loop filter module (i.e., the filter composed of the first neural network and the second neural network)
  • loop filtering can be performed according to the following steps:
  • Based on nnlf_enable_flag, it is judged whether the neural network loop filter module can be used for the current reconstructed image sequence. If nnlf_enable_flag is "1", neural network loop filter module processing is attempted for the current reconstructed image sequence, that is, skip to step b; if nnlf_enable_flag is "0", the current reconstructed image sequence does not use the neural network loop filter module, and the neural-network-based loop filtering process ends.
  • Step b
  • D = D_net - D_rec
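This quantity can be read as a distortion difference. As a hedged sketch (the text does not specify the distortion measure; a sum of squared errors against the original frame is assumed here, and the function name is illustrative), D_net is the distortion of the frame after the neural network loop filter, D_rec is the distortion of the unfiltered reconstruction, and a negative D means the filter reduced distortion:

```python
def distortion_gain(orig, recon, nn_filtered):
    """Compute D = D_net - D_rec, with distortion measured as the sum of
    squared errors against the original frame (an assumed measure)."""
    def sse(a, b):
        return sum((x - y) ** 2
                   for row_a, row_b in zip(a, b)
                   for x, y in zip(row_a, row_b))
    d_net = sse(nn_filtered, orig)  # distortion with the NN loop filter
    d_rec = sse(recon, orig)        # distortion of the plain reconstruction
    return d_net - d_rec
```

In this reading, the encoder would favor the neural network loop filter when D is negative, i.e., when D_net < D_rec.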
  • the decoder acquires and parses the code stream, and when it is parsed to the loop filter, it can be processed according to the specified filter sequence.
  • the loop filtering can be performed according to the following steps:
  • Based on nnlf_enable_flag, it is judged whether the neural network loop filter module can be used for the current reconstructed image sequence. If nnlf_enable_flag is 1, neural network loop filter module processing is attempted for the current reconstructed image sequence, that is, skip to step b; if nnlf_enable_flag is 0, the current reconstructed image sequence does not use the neural network loop filter module, that is, the neural-network-based loop filter processing ends.
  • Step b
  • If the current reconstructed image has completed the neural network loop filter module decision, the next frame is loaded for processing, and the process skips to step b.
  • The sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • Fig. 10 is a schematic block diagram of a filtering device 500 according to an embodiment of the present application.
  • the filtering device 500 may include:
  • A parsing unit 510 configured to parse the code stream to obtain the current reconstructed image block
  • the prediction unit 520 is configured to determine that when the first neural network is used to filter the current reconstructed image block, the second neural network is used to predict the features of the original image block of the current reconstructed image block to obtain the current reconstructed image The feature image block of the block;
  • the filtering unit 530 is configured to use the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block to obtain a filtered reconstructed image block.
  • Before the prediction unit 520 uses the second neural network to predict the features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is also used for:
  • when the value of the sequence identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of the sequence identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image sequence;
  • the prediction unit 520 is specifically configured to:
  • when the value of the component identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block; when the value of the component identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block;
  • the prediction unit 520 is specifically configured to:
  • the value of the component identifier is the first value
  • the value of the image block identifier is obtained by parsing the code stream
  • when the value of the image block identifier is the first value, it means that the first neural network is used to filter the currently reconstructed image block; when the value of the image block identifier is the second value, it means that the first neural network is not used to filter the current reconstructed image block;
  • Before the filtering unit 530 uses the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block to obtain the filtered reconstructed image block, the filtering unit 530 is also used for:
  • At least one first training data pair is obtained, and the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
  • the feature image of the first original image is a feature image obtained by using the second neural network to predict the first original image, or the feature image of the first original image is an annotated feature image of the first original image.
  • Before the prediction unit 520 uses the second neural network to predict the features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is also used for:
  • At least one second training data pair is obtained, and the at least one second training data pair includes at least one second reconstructed image block and at least one second feature image block respectively corresponding to the at least one second reconstructed image block;
  • the characteristic image block of the currently reconstructed image block is used to characterize at least one of the following characteristics of the original image block of the currently reconstructed image block:
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment.
  • the filtering device 500 shown in FIG. 10 may correspond to the corresponding subject in the method 300 of the embodiment of the present application, that is, the aforementioned and other operations and/or functions of each unit in the filtering device 500 are for realizing the method 300, etc. Corresponding processes in each method are not repeated here to avoid repetition.
  • Fig. 11 is a schematic block diagram of a filtering device 600 according to an embodiment of the present application.
  • the filtering device 600 may include:
  • An acquisition unit 610 configured to acquire the current reconstructed image block
  • the prediction unit 620 is configured to determine that when the first neural network is used to filter the current reconstructed image block, the second neural network is used to predict the features of the original image block of the current reconstructed image block to obtain the current reconstructed image The feature image block of the block;
  • the filtering unit 630 is configured to use the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block, to obtain a filtered reconstructed image block.
  • Before the prediction unit 620 uses the second neural network to predict the features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is also used for:
  • when the value of the sequence identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of the sequence identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image sequence;
  • the prediction unit 620 is specifically configured to:
  • if the value of the sequence identifier is the first value, the first neural network is used to filter the current reconstructed image block to obtain the filtered rate-distortion cost of the current reconstructed image block;
  • if the rate-distortion cost of the current reconstructed image block after filtering is greater than the rate-distortion cost of the current reconstructed image block before filtering, it is determined to use the first neural network to filter the current reconstructed image block;
  • if the filtered rate-distortion cost of the current reconstructed image block is less than or equal to the rate-distortion cost of the current reconstructed image block before filtering, it is determined not to use the first neural network to filter the current reconstructed image block.
  • the prediction unit 620 is also used for:
  • the prediction unit 620 is also used for:
  • when the value of the component identifier is the first value, it indicates that the first neural network is allowed to filter the reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block; when the value of the component identifier is the second value, it indicates that the first neural network is not allowed to filter the reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block;
  • the prediction unit 620 is specifically configured to:
  • if the filtered rate-distortion cost of each reconstructed image block in the current reconstructed image that has the same component as the current reconstructed image block is less than or equal to the rate-distortion cost of that reconstructed image block before filtering, it is determined that the value of the component identifier is the first value; if the current reconstructed image includes a reconstructed image block whose rate-distortion cost after filtering is greater than its rate-distortion cost before filtering, it is determined that the value of the component identifier is the second value.
  • the prediction unit 620 is also used for:
  • when the value of the image block identifier is the first value, it means that the first neural network is used to filter the currently reconstructed image block; when the value of the image block identifier is the second value, it means that the first neural network is not used to filter the current reconstructed image block.
  • the prediction unit 620 is specifically configured to:
  • if the filtered rate-distortion cost of the current reconstructed image block is less than or equal to the rate-distortion cost of the current reconstructed image block before filtering, it is determined that the value of the image block identifier is the second value.
  • Before the filtering unit 630 uses the first neural network to filter the current reconstructed image block based on the feature image block of the current reconstructed image block to obtain the filtered reconstructed image block, the filtering unit 630 is also used for:
  • At least one first training data pair is obtained, and the at least one first training data pair includes at least one first reconstructed image block and the at least one first characteristic image block respectively corresponding to at least one first reconstructed image block;
  • the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is an annotated feature image of the first original image.
  • before the prediction unit 620 uses the second neural network to predict features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is further configured to:
  • obtain at least one second training data pair, where the at least one second training data pair includes at least one third reconstructed image block and at least one second feature image block corresponding to the at least one second reconstructed image block;
  • the characteristic image block of the currently reconstructed image block is used to characterize at least one of the following characteristics of the original image block of the currently reconstructed image block:
  • the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments.
  • the filtering device 600 shown in FIG. 11 may correspond to the corresponding subject executing the method 400 of the embodiment of the present application; that is, the foregoing and other operations and/or functions of the units in the filtering device 600 are respectively intended to implement the corresponding processes in the method 400 and other methods, and are not repeated here to avoid repetition.
  • the units in the filtering device 500 or the filtering device 600 involved in the embodiments of the present application may be separately or wholly combined into one or several other units, or one (or some) of the units may be further divided into multiple functionally smaller units; this can achieve the same operations without affecting the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions; in practical applications, the function of one unit may also be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the filtering device 500 or the filtering device 600 may also include other units.
  • in practical applications, these units may be implemented on a general-purpose computing device, including a general-purpose computer with processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), by running computer programs capable of executing the steps involved in the corresponding methods, so as to construct the filtering device 500 or filtering device 600 involved in the embodiments of the present application and to implement the filtering method of the embodiments of the present application.
  • the computer program may be recorded in, for example, a computer-readable storage medium, loaded into an electronic device via the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
  • the units mentioned above may be implemented in the form of hardware, by instructions in the form of software, or by a combination of software and hardware.
  • each step of the method embodiments in the embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software in a decoding processor.
  • the software may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • FIG. 12 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device 700 includes at least a processor 710 and a computer-readable storage medium 720 .
  • the processor 710 and the computer-readable storage medium 720 may be connected through a bus or in other ways.
  • the computer-readable storage medium 720 is used for storing a computer program 721
  • the computer program 721 includes computer instructions
  • the processor 710 is used for executing the computer instructions stored in the computer-readable storage medium 720 .
  • the processor 710 is the computing core and control core of the electronic device 700, which is suitable for implementing one or more computer instructions, specifically for loading and executing one or more computer instructions so as to realize corresponding method procedures or corresponding functions.
  • the processor 710 may also be called a central processing unit (Central Processing Unit, CPU).
  • the processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the computer-readable storage medium 720 may be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located away from the aforementioned processor 710.
  • the computer-readable storage medium 720 includes, but is not limited to: volatile memory and/or non-volatile memory.
  • the non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
  • the electronic device 700 may be an encoding terminal, an encoder, or an encoding framework involved in the embodiment of the present application;
  • the computer-readable storage medium 720 stores first computer instructions; the first computer instructions stored in the computer-readable storage medium 720 are loaded and executed by the processor 710 to implement the corresponding steps in the filtering method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.
  • the electronic device 700 may be a decoding terminal, a decoder or a decoding framework involved in the embodiment of the present application;
  • the computer-readable storage medium 720 stores second computer instructions; the second computer instructions stored in the computer-readable storage medium 720 are loaded and executed by the processor 710 to implement the corresponding steps in the filtering method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.
  • the embodiment of the present application further provides a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a storage device in the electronic device 700 and is used to store programs and data.
  • computer readable storage medium 720 may include a built-in storage medium in the electronic device 700 , and of course may also include an extended storage medium supported by the electronic device 700 .
  • the computer-readable storage medium provides storage space, and the storage space stores the operating system of the electronic device 700 .
  • one or more computer instructions adapted to be loaded and executed by the processor 710 are also stored in the storage space, and these computer instructions may be one or more computer programs 721 (including program codes).
  • a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium.
  • for example, the electronic device 700 may be a computer; the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, so that the computer executes the filtering method provided in the above various optional modes.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).


Abstract

Embodiments of the present application provide a filtering method, a filtering apparatus, and an electronic device. The filtering method includes: parsing a bitstream to obtain a current reconstructed image block; when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block. The filtering method provided by the present application can improve decoding performance.

Description

Filtering Method, Filtering Apparatus, and Electronic Device

Technical Field
Embodiments of the present application relate to the technical field of image and video encoding and decoding, and more specifically, to a filtering method, a filtering apparatus, and an electronic device.
Background Art
Digital video compression technology mainly compresses massive digital image and video data to facilitate transmission and storage. With the surge of Internet video and ever-higher requirements for video definition, although existing digital video compression standards can achieve video decompression, better digital video decompression techniques are still needed to improve decoding performance.
Summary of the Invention
The present application provides a filtering method, a filtering apparatus, and an electronic device, which can improve decoding performance.
In a first aspect, the present application provides a filtering method, including:
parsing a bitstream to obtain a current reconstructed image block;
when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In a second aspect, the present application provides a filtering method, including:
obtaining a current reconstructed image block;
when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In a third aspect, the present application provides a filtering apparatus, including:
a parsing unit, configured to parse a bitstream to obtain a current reconstructed image block;
a prediction unit, configured to, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In a fourth aspect, the present application provides a filtering apparatus, including:
an obtaining unit, configured to obtain a current reconstructed image block;
a prediction unit, configured to, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In a fifth aspect, the present application provides an electronic device, including:
a processor adapted to implement computer instructions; and
a computer-readable storage medium storing computer instructions adapted to be loaded by the processor to execute the filtering method in any one of the first to second aspects or their respective implementations.
In one implementation, there are one or more processors and one or more memories.
In one implementation, the computer-readable storage medium may be integrated with the processor, or may be provided separately from the processor.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions which, when read and executed by a processor of a computer device, cause the computer device to execute the filtering method in any one of the first to second aspects or their respective implementations.
In a seventh aspect, the present application provides a computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method in any one of the first to second aspects or their respective implementations.
Based on the above technical solutions, on the one hand, the second neural network is introduced and designed to predict features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block; on the other hand, the first neural network is introduced and designed to filter the current reconstructed image block based on the feature image block of the current reconstructed image block. In this way, not only is neural-network-based filtering realized, but it is also ensured that the information used to filter the current reconstructed image block fits the original image block as closely as possible; consequently, the image quality of the current reconstructed image block can be improved and decoding performance can be improved.
In addition, introducing the second neural network ensures that the encoder and the decoder have a consistent understanding of the feature image block of the current reconstructed image block, further improving decoding performance.
In other words, considering that the purpose of filtering is to make the current reconstructed image block closer to the original image block, the present application takes the extracted feature image block of the original image block as an input of the introduced first neural network to filter the current reconstructed image block, which can improve the quality of the current reconstructed image and improve decoding performance. Furthermore, regarding the feature image block of the original image block, the present application considers that although the encoder can obtain it by analyzing the original image block, the decoder cannot; by introducing the second neural network as a feature extractor, the decoder is guaranteed to be able to obtain the feature image block of the original image block, further improving decoding performance. That is, the present application proposes a neural network, or a filtering method, that filters the current reconstructed image block using the feature image block of the original image, which can improve the image quality of the current reconstructed image block and improve decoding performance.
Brief Description of the Drawings
FIG. 1 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
FIG. 2 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
FIG. 3 is a schematic flowchart of a filtering method provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of the connection relationship between the first neural network and the second neural network provided by an embodiment of the present application.
FIG. 5 is a schematic diagram of a filtering unit including the first neural network and the second neural network provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of the second neural network provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of the first neural network provided by an embodiment of the present application.
FIG. 8 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
FIG. 9 is another schematic flowchart of a filtering method provided by an embodiment of the present application.
FIG. 10 is a schematic block diagram of a filtering apparatus according to an embodiment of the present application.
FIG. 11 is another schematic block diagram of a filtering apparatus according to an embodiment of the present application.
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present application.
The solutions provided by the embodiments of the present application may be applied to the technical field of digital video coding, for example, the fields of image encoding/decoding, video encoding/decoding, hardware video encoding/decoding, dedicated-circuit video encoding/decoding, and real-time video encoding/decoding. The solutions provided by the embodiments of the present application may be combined with the Audio Video coding Standard (AVS), the second-generation AVS standard (AVS2), or the third-generation AVS standard (AVS3), including but not limited to the H.264/Audio Video coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the H.266/Versatile Video Coding (VVC) standard. The solutions provided by the embodiments of the present application may be used for lossy compression of images, and may also be used for lossless compression of images. The lossless compression may be visually lossless compression or mathematically lossless compression.
In the digital video coding process, an encoder reads unequal numbers of luma-component pixels and chroma-component pixels from original video sequences of different color formats; that is, the encoder reads a black-and-white image or a color image and then encodes it. The black-and-white image may include pixels of the luma component, and the color image may include pixels of the chroma components; optionally, the color image may also include pixels of the luma component. The color format of the original video sequence may be the luma-chroma (YCbCr, YUV) format or the Red-Green-Blue (RGB) format, among others. After reading a black-and-white or color image, the encoder divides it into block data and encodes the block data. The block data may be a Coding Tree Unit (CTU) or a Coding Unit (CU); a CTU may be further divided into several CUs, and a CU may be a rectangular or square block. That is, the encoder may encode based on CTUs or CUs. Encoders today usually adopt a hybrid-framework coding mode, which generally includes operations such as intra and inter prediction, transform and quantization, inverse transform and inverse quantization, in-loop filtering, and entropy coding. Intra prediction refers only to information within the same frame to predict the pixel information of the current partition block, eliminating spatial redundancy; inter prediction may refer to image information of different frames and use motion estimation to search for the motion vector information that best matches the current partition block, eliminating temporal redundancy; the transform converts the predicted image block into the frequency domain, redistributing the energy, and combined with quantization can remove information to which the human eye is insensitive, eliminating visual redundancy; entropy coding can eliminate character redundancy according to the current context model and the probability information of the binary bitstream. In-loop filtering mainly processes the pixels after inverse transform and inverse quantization to compensate for distortion information and provide a better reference for subsequently encoded pixels.
For ease of understanding, the encoding framework provided by the present application is briefly introduced first.
FIG. 1 is a schematic block diagram of an encoding framework 100 provided by an embodiment of the present application.
As shown in FIG. 1, the encoding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy coding unit 130, an inverse transform and inverse quantization unit 140, and an in-loop filtering unit 150. Optionally, the encoding framework 100 may further include a decoded picture buffer unit 160. The encoding framework 100 may also be called a hybrid-framework coding mode.
In the encoding framework 100, the intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded to output a prediction block. The residual unit 110 may calculate a residual block based on the prediction block and the image block to be encoded, i.e., the difference between the prediction block and the image block to be encoded. Through processes such as transform and quantization by the transform and quantization unit 120, information to which the human eye is insensitive can be removed from the residual block, eliminating visual redundancy. Optionally, the residual block before transform and quantization by the transform and quantization unit 120 may be called a time-domain residual block, and the time-domain residual block after transform and quantization may be called a frequency residual block or frequency-domain residual block. After receiving the transform-quantized coefficients output by the transform and quantization unit 120, the entropy coding unit 130 may output a bitstream based on these coefficients. For example, the entropy coding unit 130 may eliminate character redundancy according to a target context model and the probability information of the binary bitstream; for example, the entropy coding unit 130 may be used for Context-based Adaptive Binary Arithmetic Coding (CABAC). The entropy coding unit 130 may also be called a header information coding unit. Optionally, in the present application, the image block to be encoded may also be called an original image block or a target image block; the prediction block may also be called a predicted image block or an image prediction block, or a prediction signal or prediction information; the reconstructed block may also be called a reconstructed image block or an image reconstruction block, or a reconstruction signal or reconstruction information. Further, on the encoder side, the image block to be encoded may also be called an encoding block or an encoding image block, and on the decoder side, it may also be called a decoding block or a decoding image block. The image block to be encoded may be a CTU or a CU.
In short, the encoding framework 100 computes the residual between the prediction block and the image block to be encoded to obtain a residual block, which is transmitted to the decoder via processes such as transform and quantization. After the decoder receives and parses the bitstream, it obtains the residual block through steps such as inverse transform and inverse quantization, and superimposes the prediction block predicted at the decoder onto the residual block to obtain the reconstructed block.
It should be noted that the inverse transform and inverse quantization unit 140, the in-loop filtering unit 150, and the decoded picture buffer unit 160 in the encoding framework 100 may be used to form a decoder. In effect, the intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded based on existing reconstructed blocks, thereby ensuring that the encoder and the decoder have a consistent understanding of the reference frames. In other words, the encoder may replicate the decoder's processing loop and thus produce the same prediction as the decoder. Specifically, the quantized transform coefficients are inverse-transformed and inverse-quantized by the inverse transform and inverse quantization unit 140 to replicate the decoder's approximate residual block. The approximate residual block plus the prediction block may pass through the in-loop filtering unit 150 to smoothly filter out influences such as blocking artifacts caused by block-based processing and quantization. The image blocks output by the in-loop filtering unit 150 may be stored in the decoded picture buffer unit 160 for use in predicting subsequent images.
The intra prediction unit 180 may be used for intra prediction, which refers only to information within the same frame to predict the pixel information of the image block to be encoded, eliminating spatial redundancy; the frame used for intra prediction may be an I-frame. For example, according to a left-to-right, top-to-bottom coding order, the image block to be encoded may refer to the upper-left image block, the upper image block, and the left image block as reference information, and the block itself in turn serves as reference information for the next image block; in this way, the entire image can be predicted. If the input digital video is in a color format, e.g., YUV 4:2:0, every 4 pixels of each image frame consist of 4 Y components and 2 UV components, and the encoding framework 100 may encode the Y component (i.e., luma blocks) and the UV components (i.e., chroma blocks) separately. Similarly, the decoder may decode accordingly by format. The inter prediction unit 170 may be used for inter prediction, which may refer to image information of different frames and use motion estimation to search for the motion vector information that best matches the image block to be encoded, eliminating temporal redundancy; the frames used for inter prediction may be P-frames and/or B-frames, where a P-frame is a forward-predicted frame and a B-frame is a bidirectionally predicted frame.
For the intra prediction process, intra prediction may predict the image block to be encoded with the help of angular and non-angular prediction modes to obtain a prediction block; based on the rate-distortion information computed from the prediction block and the image block to be encoded, the optimal prediction mode for the image block to be encoded is selected and transmitted to the decoder via the bitstream. The decoder parses the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time-domain residual block obtained from the bitstream to obtain the reconstructed block. Over generations of digital video codec standards, the non-angular modes have remained relatively stable, consisting of a mean (DC) mode and a planar mode, while the angular modes have kept increasing as the standards evolved. Taking the international H-series digital video coding standards as an example, H.264/AVC has only 8 angular prediction modes and 1 non-angular prediction mode; H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. In H.266/VVC, the intra prediction modes are further extended: for luma blocks there are 67 traditional prediction modes plus the non-traditional Matrix weighted intra-frame prediction (MIP) mode; the traditional prediction modes include the planar mode of mode number 0, the DC mode of mode number 1, and the angular prediction modes of mode numbers 2 to 66.
It should be understood that FIG. 1 is only an example of the present application and should not be construed as a limitation of the present application.
For example, the in-loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF), sample adaptive offset (SAO), and an adaptive loop filter (ALF). The function of the DBF is to remove blocking artifacts, and the function of SAO is to remove ringing artifacts. In other embodiments of the present application, the encoding framework 100 may adopt a neural-network-based in-loop filtering algorithm to improve video compression efficiency. In other words, the encoding framework 100 may be a video coding hybrid framework based on deep-learning neural networks. In one implementation, on top of the deblocking filter and sample adaptive offset, a neural-network-based computation may be applied to the pixel-filtered result. The network structures of the in-loop filtering unit 150 for the luma component and the chroma components may be the same or different. Considering that the luma component contains more visual information, the luma component may also be used to guide the filtering of the chroma components, to improve the reconstruction quality of the chroma components.
FIG. 2 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
As shown in FIG. 2, the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, an in-loop filtering unit 260, and a decoded picture buffer unit 270.
After the entropy decoding unit 210 receives and parses the bitstream to obtain the prediction block and the frequency-domain residual block, the frequency-domain residual block goes through steps such as inverse transform and inverse quantization by the inverse transform and inverse quantization unit 220 to obtain a time-domain residual block; the residual unit 230 superimposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 onto the time-domain residual block obtained after inverse transform and inverse quantization by the inverse transform and inverse quantization unit 220, to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the bitstream.
Digital video compression technology mainly compresses massive digital image and video data to facilitate transmission and storage. With the surge of Internet video and ever-higher requirements for video definition, although existing digital video compression standards can achieve video decompression, better digital video decompression techniques are still needed to improve decoding performance. In addition, in video codec standards such as AVS3 or VVC, the traditional in-loop filtering module mainly includes tools such as the deblocking filter (DBF), sample adaptive offset (SAO), and adaptive loop filter (ALF). However, with the development of deep learning technology, decoding performance can also be improved by introducing neural-network-based filters.
In view of this, the present application provides a filtering method, a filtering apparatus, and an electronic device, which can improve decoding performance.
FIG. 3 is a schematic flowchart of a filtering method 300 according to an embodiment of the present application. The method 300 may be implemented by a decoding framework including a neural-network-based filtering unit. In one implementation, a neural-network-based filtering unit may be added to the decoding framework described in FIG. 2 to execute the filtering method 300.
As shown in FIG. 3, the filtering method 300 may include:
S310, parsing a bitstream to obtain a current reconstructed image block;
S320, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
S330, filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In this embodiment, on the one hand, the second neural network is introduced and designed to predict features of the original image block of the current reconstructed image block to obtain the feature image block of the current reconstructed image block; on the other hand, the first neural network is introduced and designed to filter the current reconstructed image block based on the feature image block of the current reconstructed image block. In this way, not only is neural-network-based filtering realized, but it is also ensured that the information used to filter the current reconstructed image block fits the original image block as closely as possible; consequently, the image quality of the current reconstructed image block can be improved and decoding performance can be improved.
In addition, introducing the second neural network ensures that the encoder and the decoder have a consistent understanding of the feature image block of the current reconstructed image block, further improving decoding performance.
In other words, considering that the purpose of filtering is to make the current reconstructed image block closer to the original image block, the present application takes the extracted feature image block of the original image block as an input of the introduced first neural network to filter the current reconstructed image block, which can improve the quality of the current reconstructed image and improve decoding performance. Furthermore, regarding the feature image block of the original image block, the present application considers that although the encoder can obtain it by analyzing the original image block, the decoder cannot; by introducing the second neural network as a feature extractor, the decoder is guaranteed to be able to obtain the feature image block of the original image block, further improving decoding performance. That is, the present application proposes a neural network, or a filtering method, that filters the current reconstructed image block using the feature image block of the original image, which can improve the image quality of the current reconstructed image block and improve decoding performance.
It should be noted that the present application may filter the current reconstructed image block based only on its feature image block, or may combine other information to filter the current reconstructed image block based on its feature image block; the present application does not specifically limit this.
Exemplarily, the information of the current reconstructed image block and the feature image block of the current reconstructed image block may be taken as inputs of the first neural network to filter the current reconstructed image block, to improve decoding performance. Optionally, the information of the current reconstructed image block includes but is not limited to: pixel values of the color components (Y/U/V), block partition information, prediction information, deblocking boundary strength information, quantization step (QP) information, etc. For example, the luma component may be introduced as an input to guide the filtering of the chroma components. It should be noted that the feature image block of the current reconstructed image block is information about the original image block of the current reconstructed image block and does not belong to the information of the current reconstructed image block itself.
Furthermore, the present application does not specifically limit the network structure of the first neural network either.
Exemplarily, the first neural network may be a deep-learning-based in-loop filter.
Exemplarily, the first neural network may be an in-loop filter based on a residual neural network (CNNLF).
Exemplarily, the first neural network may include a network structure for the luma component and a network structure for the chroma components. Optionally, the network structure for the luma component or for the chroma components may consist of convolutional layers, activation layers, residual blocks, skip connections, etc.; the network structure of a residual block consists of convolutional layers, activation layers, and skip connections. Further, a global skip connection from input to output may be included, so that the network structure focuses on learning the residual, accelerating the convergence of the network structure. Optionally, the network structure for the chroma components may introduce the luma component as an input to guide the filtering of the chroma components.
Exemplarily, the first neural network may be an in-loop filter based on a deep convolutional neural network.
Exemplarily, the first neural network may adopt a multi-layer residual network. Optionally, information of multiple modes may be introduced as input to filter the current reconstructed image block, and the best model may be selected for filtering by computing the rate-distortion cost of each model.
In addition, the present application does not limit the specific positions of the first neural network and the second neural network in the codec framework or the filtering unit. Exemplarily, the network structure formed by the first neural network and the second neural network may also be called a Neural Network based Loop Filter (NNLF).
The connection relationship and network structures of the first neural network and the second neural network are described below by way of example with reference to FIG. 4 to FIG. 8.
FIG. 4 is a schematic diagram of the connection relationship between the first neural network and the second neural network provided by an embodiment of the present application.
As shown in FIG. 4, considering that the original image block of the current reconstructed image block cannot be obtained at the decoder, and that encoding and decoding must match each other, in theory the encoder also cannot directly use the feature image block of the original image block. Therefore, the present application first predicts the feature image block of the original image block through the second neural network, and then inputs the predicted feature image block into the first neural network to filter the current reconstructed image block. That is, the neural-network loop filter used to filter the current reconstructed image block consists of two neural networks, namely the second neural network and the first neural network.
FIG. 5 is a schematic diagram of a filtering unit including the first neural network and the second neural network provided by an embodiment of the present application.
As shown in FIG. 5, the network structure formed by the first neural network and the second neural network may be located between SAO and ALF. Optionally, the use of this network structure does not depend on the on/off state of DBF, SAO, or ALF; it is merely positioned between SAO and ALF.
FIG. 6 is a schematic structural diagram of the second neural network provided by an embodiment of the present application.
As shown in FIG. 6, the second neural network consists of k convolutional layers, where each convolutional layer except the last is followed by a nonlinear activation function (PReLU) layer. The input of the second neural network is the current reconstructed image block, and the output is the predicted feature image block of the current reconstructed image block.
FIG. 7 is a schematic structural diagram of the first neural network provided by an embodiment of the present application.
As shown in FIG. 7, the first neural network first performs a convolution operation on the feature image block of the current reconstructed image block output by the second neural network, then concatenates the result channel-wise with the input current reconstructed image block and feeds it into the next layer of the network. The second and last layers of the first neural network are convolutional layers, and there is a skip connection from the input to the output, so that the first neural network focuses on learning the residual, accelerating the convergence of the network structure. n residual blocks may be cascaded in the middle of the first neural network. The inputs of the first neural network are the feature image block of the current reconstructed image block output by the second neural network and the current reconstructed image block, and the output of the first neural network is the filtered reconstructed image block.
FIG. 8 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
As shown in FIG. 8, a residual block may consist of two convolutional layers, where the first convolutional layer is followed by a nonlinear activation function (PReLU) layer, and there is likewise a skip connection from input to output.
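As a concrete illustration of the structures sketched for FIG. 6 and FIG. 8, the following is a minimal NumPy sketch of a k-layer feature-prediction network (convolution plus PReLU after every layer except the last) and a residual block with a skip connection. The 3×3 kernels, single channel, and random weights are assumptions for illustration only, not the trained networks of this application:

```python
import numpy as np

def conv3x3(x, w):
    """'Same'-padded 3x3 convolution of a single-channel image x with kernel w."""
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += w[di, dj] * p[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def prelu(x, a=0.25):
    """Parametric ReLU: identity for x > 0, slope a for x <= 0."""
    return np.where(x > 0, x, a * x)

def second_network(rec, kernels):
    """k conv layers; PReLU after every layer except the last (cf. FIG. 6)."""
    x = rec
    for i, w in enumerate(kernels):
        x = conv3x3(x, w)
        if i < len(kernels) - 1:
            x = prelu(x)
    return x  # predicted feature image block

def residual_block(x, w1, w2):
    """Two conv layers, PReLU after the first, plus a skip connection (cf. FIG. 8)."""
    return x + conv3x3(prelu(conv3x3(x, w1)), w2)
```

In a real filter these blocks would operate on multi-channel tensors with learned weights; the sketch only shows the data flow each figure describes.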
Of course, FIG. 3 to FIG. 8 are only examples of the present application and should not be construed as limitations of the present application.
For example, the embodiments of the present application do not limit specific implementations of the neural network structure, including the number of convolutional layers, the number of residual blocks, and the nonlinear activation functions.
In some embodiments, before S320, the method 300 may further include:
parsing the bitstream to obtain the value of a sequence identifier;
where the value of the sequence identifier being a first value indicates that the first neural network is allowed to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs, and the value of the sequence identifier being a second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image sequence;
determining, based on the value of the sequence identifier, whether to use the first neural network to filter the current reconstructed image block.
Exemplarily, the sequence identifier may be carried in the sequence header of the bitstream.
Exemplarily, the format of the sequence header is described below with reference to Table 1.
Table 1
Figure PCTCN2021143804-appb-000001
As shown in Table 1, sequence_header may denote the sequence header, and nnlf_enable_flag may denote the sequence identifier. For example, when the value of nnlf_enable_flag is 1, it indicates that the first neural network is allowed to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of nnlf_enable_flag is 0, it indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs.
In this embodiment, if the value of the sequence identifier is the second value, the decoder can determine at once, for all reconstructed image blocks in the current reconstructed image sequence, that the first neural network is not used for filtering. This avoids the decoder having to determine, by traversal, whether to use the first neural network for each reconstructed image block in the current reconstructed image sequence, and can improve decoding efficiency.
It should be understood that the present application does not specifically limit the concrete values of the first value and the second value. For example, in one implementation the first value is 1 and the second value is 0; in another implementation the first value is 0 and the second value is 1.
In some embodiments, if the value of the sequence identifier is the first value, the bitstream is parsed to obtain the value of a component identifier; where the value of the component identifier being the first value indicates that the first neural network is allowed to filter reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block, and the value of the component identifier being the second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block; and whether to use the first neural network to filter the current reconstructed image block is determined based on the value of the component identifier.
Exemplarily, the component identifier may be carried in the picture header.
Exemplarily, the format of the picture header is described below with reference to Table 2.
Table 2
Figure PCTCN2021143804-appb-000002
As shown in Table 2, picture_header may denote the picture header, nnlf_enable_flag may denote the sequence identifier, and compIdx may denote the x-th component of the current reconstructed image block. For example, compIdx equal to 0 denotes the luma component; compIdx equal to 1 denotes the Cb component; compIdx equal to 2 denotes the Cr component. picture_nnlf_enable_flag[compIdx] denotes the component identifier of the x-th component. For example, when the value of picture_nnlf_enable_flag[compIdx] is 1, it indicates that the first neural network is allowed to filter reconstructed image blocks of the x-th component in the current reconstructed image to which the current reconstructed image block belongs; when the value of picture_nnlf_enable_flag[compIdx] is 0, it indicates that the first neural network is not allowed to filter reconstructed image blocks of the x-th component in the current reconstructed image to which the current reconstructed image block belongs.
In this embodiment, if the value of the component identifier is the second value, the decoder can determine at once, for the reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block, that the first neural network is not used for filtering. This avoids the decoder having to determine, by traversal, whether to use the first neural network for the component of each reconstructed image block in the current reconstructed image sequence, and can improve decoding efficiency.
In some embodiments, if the value of the component identifier is the first value, the value of an image block identifier is obtained by parsing the bitstream; where the value of the image block identifier being the first value indicates that the first neural network is used to filter the current reconstructed image block, and the value of the image block identifier being the second value indicates that the first neural network is not used to filter the current reconstructed image block; and whether to use the first neural network to filter the current reconstructed image block is determined based on the value of the image block identifier.
Exemplarily, the image block identifier may be carried in a patch.
Exemplarily, the patch format is described below with reference to Table 3.
Table 3
Figure PCTCN2021143804-appb-000003
As shown in Table 3, picture_header may denote the picture header, nnlf_enable_flag may denote the sequence identifier, and compIdx may denote the x-th component of the current reconstructed image block. For example, compIdx equal to 0 denotes the luma component; compIdx equal to 1 denotes the Cb component; compIdx equal to 2 denotes the Cr component. picture_nnlf_enable_flag[compIdx] denotes the component identifier of the x-th component. patch_nnlf_enable_flag[compIdx][LcuIdx] denotes the image block identifier of the LcuIdx-th image block of the x-th component in the current reconstructed image to which the current reconstructed image block belongs. For example, when the value of patch_nnlf_enable_flag[compIdx][LcuIdx] is 1, it indicates that the first neural network is allowed to filter the LcuIdx-th image block of the x-th component in the current reconstructed image to which the current reconstructed image block belongs; when the value of patch_nnlf_enable_flag[compIdx][LcuIdx] is 0, it indicates that the first neural network is not allowed to filter the LcuIdx-th image block of the x-th component in the current reconstructed image to which the current reconstructed image block belongs.
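The three-level signalling above (sequence header, then per-component picture header, then per-patch flag) can be sketched as a small decoder-side decision routine. The flag names follow Tables 1 to 3; the simple list-based container layout is an assumption for illustration only:

```python
def use_nnlf(seq_flag, pic_flags, patch_flags, comp_idx, lcu_idx):
    """Return True if the first neural network filters the given block.

    seq_flag:    nnlf_enable_flag (sequence level)
    pic_flags:   picture_nnlf_enable_flag[compIdx]
    patch_flags: patch_nnlf_enable_flag[compIdx][LcuIdx]
    """
    if seq_flag == 0:             # whole sequence opts out in one shot
        return False
    if pic_flags[comp_idx] == 0:  # whole component of this picture opts out
        return False
    return patch_flags[comp_idx][lcu_idx] == 1
```

The early returns mirror the efficiency argument in the text: a 0 at a higher level lets the decoder skip the per-block traversal entirely.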
In some embodiments, before S330, the method 300 may further include:
obtaining a reconstructed image of a first original image and a feature image of the first original image;
obtaining, based on the reconstructed image of the first original image and the feature image of the first original image, at least one first training data pair, where the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
filtering, by using the first neural network, the at least one first reconstructed image block based respectively on the at least one first feature image block, to obtain at least one filtered second reconstructed image block;
adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
Exemplarily, for the first neural network, the goal is to train a set of network parameters such that the image output by the first neural network is closer to the target image. For example, the datasets DIV2K and BVI-DVC recommended by the VVC video-coding exploration experiment group are used as the training set: first, PNG-format images or MP4-format videos are converted into YUV420-format videos to be compressed, obtaining the original video information of each color component; then the VTM reference software test platform is used for encoding to obtain the reconstructed videos. With luma-component image blocks of size 128x128 and chroma-component image blocks of size 64x64, reconstructed image blocks and original image blocks are paired to form training data pairs (data-label) as the training set of the first neural network.
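The (data, label) pairing described above can be sketched as follows. The non-overlapping tiling and the 128x128 / 64x64 block sizes follow the text, while the array layout and helper name are assumptions for illustration only:

```python
import numpy as np

def make_pairs(rec_plane, orig_plane, block):
    """Tile a reconstructed plane and its original into aligned (data, label) blocks."""
    h, w = rec_plane.shape
    pairs = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            pairs.append((rec_plane[y:y + block, x:x + block],
                          orig_plane[y:y + block, x:x + block]))
    return pairs

# luma planes would use block=128, chroma planes block=64
# (chroma is half resolution in YUV420)
```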
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is an annotated feature image of the first original image.
In some embodiments, before S320, the method 300 may further include:
obtaining a reconstructed image of a second original image and an annotated feature image of the second original image;
obtaining, based on the reconstructed image of the second original image and the feature image of the second original image, at least one second training data pair, where the at least one second training data pair includes at least one third reconstructed image block and at least one second feature image block corresponding to the at least one second reconstructed image block;
predicting the at least one third reconstructed image by using the second neural network, to obtain at least one third feature image block;
adjusting the second neural network based on the difference between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
Exemplarily, for the second neural network, the goal is to train a set of network parameters such that the image output by the second neural network is closer to the feature image of the target image; therefore, the feature image of the target image needs to be collected. An image has rich feature information, such as color features, texture features, shape features, and spatial relationship features. Color and texture features mainly describe the surface properties of the scene corresponding to the image or image region; shape features mainly describe the contour and region features of the image; spatial relationship features mainly describe the mutual spatial positions or relative directional relationships among multiple objects segmented from the image. Exemplarily, the spatial features may include image saliency. In the field of machine vision, image saliency assigns a label to each pixel in the image such that pixels with the same label share certain characteristics. Image saliency is an important visual feature of an image, reflecting how much the visual system values each region; it is also widely used in compression coding, edge enhancement, salient object partitioning, and feature extraction.
Taking image saliency as an example, saliency detection is a popular research direction in machine vision, and mathematical methods can be used to detect visually salient information in the spatial domain. For example, methods based on Minimum Barrier Distance (MBD) compute the distance between each pixel in the image and a set of background candidate pixels (generally the image's boundary pixels); the Binary method, building on MBD, obtains a binary saliency map according to a set threshold; the Robust Background Detection (RBD) method uses connectivity to improve the robustness of the background prior, partitions the image into multiple regions with a segmentation algorithm, computes the relatedness of each region to the boundary, and determines the final salient region; the FT algorithm starts from the frequency domain and designs a band-pass filter that highlights the whole salient region via a lower low-pass cutoff frequency, and shows clear boundaries while truncating high-frequency noise via a higher high-pass cutoff frequency. Deep-learning-based methods are also used for spatio-temporal saliency detection; for example, the SALICON model detects salient regions by learning high-level semantic information in images with deep neural networks and has good detection performance.
Taking the FT algorithm as an example, the image saliency S is obtained by the following formula:
S(x, y) = ‖I_μ − I_ωhc(x, y)‖
where I_μ denotes the arithmetic-mean pixel value of the image, I_ωhc denotes the Gaussian-blurred pixel value of the image (to eliminate fine texture, noise, and coding artifacts), and ‖·‖ denotes the Euclidean distance.
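The FT formula above can be sketched for a single-channel image as follows. A separable 5-tap binomial kernel stands in for the Gaussian blur, and for one channel the Euclidean distance reduces to an absolute difference; both are simplifying assumptions for illustration:

```python
import numpy as np

def ft_saliency(img):
    """FT-style saliency for a single-channel image: ||I_mu - I_whc(x, y)||."""
    k = np.array([1, 4, 6, 4, 1], dtype=float)
    k /= k.sum()  # normalized binomial kernel approximating a Gaussian
    p = np.pad(img.astype(float), 2, mode='edge')
    # separable blur: convolve rows, then columns
    blur = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, p)
    blur = np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, blur)
    return np.abs(img.mean() - blur)
```

A flat image yields zero saliency everywhere, and regions far from the global mean stand out, which is the behaviour the band-pass description above implies.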
It should be noted that the solutions provided by the present application are not limited to the example features above; other feature information may also be used, and the present application does not specifically limit this.
Exemplarily, the datasets DIV2K and BVI-DVC recommended by the VVC video-coding exploration experiment group may be used as the training set. Taking image saliency as an example, the salient feature image of the original image may be computed by the FT algorithm and converted to the YUV420 format; then the VTM reference software test platform is used for encoding and decoding to obtain the reconstructed videos. With luma-component image blocks of size 128x128 and chroma-component image blocks of size 64x64, reconstructed image blocks and feature image blocks are paired to form training data pairs (data-label) as the training set of the second neural network.
It should be noted that, in the present application, the filter used to filter the current reconstructed image block includes the first neural network and the second neural network. The second neural network and the first neural network should have different training objectives; otherwise the existence of the second neural network is meaningless. Therefore, the second and first neural networks may be trained independently of each other, or the first neural network may be trained based on the trained second neural network; the present application does not specifically limit this. In addition, the present application does not limit the specific parameters used to train the first neural network and/or the second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
a color feature, a texture feature, a shape feature, a spatial feature.
The filtering method according to the embodiments of the present application has been described above in detail from the decoder's perspective with reference to FIG. 3 to FIG. 8; below, the filtering method according to the embodiments of the present application is described from the encoder's perspective with reference to FIG. 9.
FIG. 9 is a schematic flowchart of a filtering method 400 provided by an embodiment of the present application. The method 400 may be implemented by an encoding framework including a neural-network-based filtering unit. In one implementation, a neural-network-based filtering unit may be added to the encoding framework described in FIG. 1 to execute the filtering method 400.
As shown in FIG. 9, the filtering method 400 may include:
S410, obtaining a current reconstructed image block;
S420, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
S430, filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In some embodiments, before S420, the method 400 may further include:
obtaining the value of a sequence identifier;
where the value of the sequence identifier being a first value indicates that the first neural network is allowed to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs, and the value of the sequence identifier being a second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image sequence;
determining, based on the value of the sequence identifier, whether to use the first neural network to filter the current reconstructed image block.
Exemplarily, the sequence identifier may be an identifier in a sequence parameter set.
Table 4
Figure PCTCN2021143804-appb-000004
Figure PCTCN2021143804-appb-000005
As shown in Table 4, the sequence parameter set (seq_parameter_set_rbsp) may include a sequence identifier (sps_aip_enabled_flag), which is a sequence-level control switch used to control whether in-loop filtering based on the first neural network is allowed to be enabled for the current sequence: if this flag is 1, in-loop filtering based on the first neural network is allowed to be enabled for the current sequence; if it is 0, it is not allowed. In a specific implementation, whether the sequence parameter set enables the sequence identifier may be controlled by user settings; when the sequence identifier is enabled in the sequence parameter set, the encoder may obtain the specific value of the sequence identifier, i.e., whether the flag is 1 or 0, by querying a user-configured configuration file.
In some embodiments, if the value of the sequence identifier is the first value, the first neural network is used to filter the current reconstructed image block to obtain the post-filtering rate-distortion cost of the current reconstructed image block; if the post-filtering rate-distortion cost of the current reconstructed image block is greater than its pre-filtering rate-distortion cost, it is determined that the first neural network is used to filter the current reconstructed image block; if the post-filtering rate-distortion cost of the current reconstructed image block is less than or equal to its pre-filtering rate-distortion cost, it is determined that the first neural network is not used to filter the current reconstructed image block.
In some embodiments, the method 400 may further include:
writing the value of the sequence identifier into a bitstream obtained by encoding a current residual block derived from the current reconstructed image block.
In some embodiments, the method 400 may further include:
generating the value of a component identifier if the value of the sequence identifier is the first value;
where the value of the component identifier being the first value indicates that the first neural network is allowed to filter reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block, and the value of the component identifier being the second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block;
writing the value of the component identifier into a bitstream obtained by encoding a current residual block derived from the current reconstructed image block.
In some embodiments, if the post-filtering rate-distortion cost of each reconstructed image block in the current reconstructed image that has the same component as the current reconstructed image block is less than or equal to its pre-filtering rate-distortion cost, the value of the component identifier is determined to be the first value; if the current reconstructed image includes a reconstructed image block whose post-filtering rate-distortion cost is greater than its pre-filtering rate-distortion cost, the value of the component identifier is determined to be the second value.
In some embodiments, the method 400 may further include:
generating the value of an image block identifier if the value of the component identifier is the first value;
where the value of the image block identifier being the first value indicates that the first neural network is used to filter the current reconstructed image block, and the value of the image block identifier being the second value indicates that the first neural network is not used to filter the current reconstructed image block.
In some embodiments, if the post-filtering rate-distortion cost of the current reconstructed image block is greater than its pre-filtering rate-distortion cost, the value of the image block identifier is determined to be the first value; if the post-filtering rate-distortion cost of the current reconstructed image block is less than or equal to its pre-filtering rate-distortion cost, the value of the image block identifier is determined to be the second value.
In some embodiments, before S430, the method 400 may further include:
obtaining a reconstructed image of a first original image and a feature image of the first original image;
obtaining, based on the reconstructed image of the first original image and the feature image of the first original image, at least one first training data pair, where the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
filtering, by using the first neural network, the at least one first reconstructed image block based respectively on the at least one first feature image block, to obtain at least one filtered second reconstructed image block;
adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is an annotated feature image of the first original image.
In some embodiments, before S420, the method 400 may further include:
obtaining a reconstructed image of a second original image and an annotated feature image of the second original image;
obtaining, based on the reconstructed image of the second original image and the feature image of the second original image, at least one second training data pair, where the at least one second training data pair includes at least one third reconstructed image block and at least one second feature image block corresponding to the at least one second reconstructed image block;
predicting the at least one third reconstructed image by using the second neural network, to obtain at least one third feature image block;
adjusting the second neural network based on the difference between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
a color feature, a texture feature, a shape feature, a spatial feature.
It should be understood that, for the terms and steps in the filtering method 400, reference may be made to the corresponding terms and steps described in the filtering method 300; for brevity, details are not repeated here.
The solutions of the present application are described below with reference to specific embodiments.
When the encoder performs in-loop filtering, the filters may be processed in the prescribed order; upon entering the neural-network loop filter module (i.e., the filter composed of the above first and second neural networks), in-loop filtering may be performed according to the following steps:
Step a:
According to the value of nnlf_enable_flag, determine whether the neural-network loop filter module may be used for the current reconstructed image sequence. If nnlf_enable_flag is "1", attempt neural-network loop filtering on the current reconstructed image sequence, i.e., go to step b; if nnlf_enable_flag is "0", the current reconstructed image sequence does not use the neural-network loop filter module, i.e., end the neural-network-based in-loop filtering.
Step b:
For the current reconstructed image of the current reconstructed image sequence, traverse all reconstructed image blocks of all color components; for each reconstructed coding block, try the neural-network loop filter and compare the result with the unfiltered reconstructed image block, computing the rate-distortion cost D = D_net − D_rec, i.e., the distortion reduced by the neural-network filtering, where D_net is the post-filtering distortion and D_rec is the pre-filtering distortion. If the post-filtering cost is less than the pre-filtering cost, i.e., D < 0, set patch_nnlf_enable_flag[compIdx][LcuIdx] to 1; if the post-filtering cost is greater than or equal to the pre-filtering cost, i.e., D ≥ 0, set patch_nnlf_enable_flag[compIdx][LcuIdx] to 0. When all image blocks of all color components in the current frame have been traversed, go to step c.
Step c:
For the current reconstructed image of the current reconstructed image sequence, if the values of patch_nnlf_enable_flag[compIdx][LcuIdx] are all 0, then picture_nnlf_enable_flag[compIdx] is 0; if there is a reconstructed image block whose patch_nnlf_enable_flag[compIdx][LcuIdx] is nonzero, then picture_nnlf_enable_flag[compIdx] is 1. When the decision of the neural-network loop filter module has been completed for the current frame, load the next frame for processing and go to step b.
Correspondingly, the decoder obtains and parses the bitstream; when it reaches in-loop filtering, the filters may be processed in the prescribed order. Upon entering the neural-network loop filter module (i.e., the filter composed of the above first and second neural networks), in-loop filtering may be performed according to the following steps:
Step a:
According to the value of nnlf_enable_flag, determine whether the neural-network loop filter module may be used for the current reconstructed image sequence. If nnlf_enable_flag is 1, attempt neural-network loop filtering on the current reconstructed image sequence, i.e., go to step b; if nnlf_enable_flag is 0, the current reconstructed image sequence does not use the neural-network loop filter module, i.e., end the neural-network-based in-loop filtering.
Step b:
For the current reconstructed image of the current reconstructed image sequence, if picture_nnlf_enable_flag[compIdx] is 1, go to step c; if picture_nnlf_enable_flag[compIdx] is 0, go to step d.
Step c:
For the current color component of the current reconstructed image of the current reconstructed image sequence, traverse all reconstructed image blocks; for the current reconstructed image block, if patch_nnlf_enable_flag[compIdx][LcuIdx] is 1, apply neural-network loop filtering to the current reconstructed image block; if patch_nnlf_enable_flag[compIdx][LcuIdx] is 0, do not apply neural-network loop filtering to the current reconstructed image block. When all reconstructed image blocks of all color components in the current reconstructed image have been traversed, go to step d.
Step d:
When the decision of the neural-network loop filter module has been completed for the current reconstructed image, load the next frame for processing and go to step b.
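Steps a to c of the encoder-side decision above can be sketched as follows. The flag logic (D = D_net − D_rec, per-patch flags, then the per-component picture flag) follows the text, while the sum-of-squared-errors distortion measure and the nested-list data layout are illustrative assumptions:

```python
def nnlf_encoder_decision(blocks, filtered, originals, num_comps=3):
    """Per-block and per-component NNLF decision for one picture.

    blocks[c][i], filtered[c][i], originals[c][i] are pixel lists for the
    i-th block of component c (reconstructed, NN-filtered, original).
    """
    def sse(a, b):  # sum of squared errors as the distortion measure
        return sum((x - y) ** 2 for x, y in zip(a, b))

    patch_flags, pic_flags = [], []
    for c in range(num_comps):
        flags = []
        for rec, net, org in zip(blocks[c], filtered[c], originals[c]):
            d = sse(net, org) - sse(rec, org)  # D = D_net - D_rec
            flags.append(1 if d < 0 else 0)    # filter only if distortion drops
        patch_flags.append(flags)
        # picture flag is 1 iff any patch of this component uses the filter
        pic_flags.append(1 if any(flags) else 0)
    return patch_flags, pic_flags
```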
It should be noted that the above embodiments are only examples of the present application and should not be construed as limitations of the present application.
The preferred embodiments of the present application have been described above in detail with reference to the accompanying drawings; however, the present application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present application, various simple modifications may be made to the technical solutions of the present application, and these simple modifications all fall within the protection scope of the present application. For example, the specific technical features described in the above specific embodiments may be combined in any suitable manner where there is no contradiction; to avoid unnecessary repetition, the present application does not separately describe the various possible combinations. For another example, the various embodiments of the present application may also be combined arbitrarily; as long as they do not violate the idea of the present application, they should likewise be regarded as content disclosed by the present application.
It should also be understood that, in the various method embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The method embodiments of the present application have been described in detail above; the apparatus embodiments of the present application are described in detail below with reference to FIG. 10 to FIG. 12.
FIG. 10 is a schematic block diagram of a filtering apparatus 500 according to an embodiment of the present application.
As shown in FIG. 10, the filtering apparatus 500 may include:
a parsing unit 510, configured to parse a bitstream to obtain a current reconstructed image block;
a prediction unit 520, configured to, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
a filtering unit 530, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In some embodiments, before the prediction unit 520 predicts features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is further configured to:
parse the bitstream to obtain the value of a sequence identifier;
where the value of the sequence identifier being a first value indicates that the first neural network is allowed to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs, and the value of the sequence identifier being a second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image sequence;
determine, based on the value of the sequence identifier, whether to use the first neural network to filter the current reconstructed image block.
In some embodiments, the prediction unit 520 is specifically configured to:
parse the bitstream to obtain the value of a component identifier if the value of the sequence identifier is the first value;
where the value of the component identifier being the first value indicates that the first neural network is allowed to filter reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, that have the same component as the current reconstructed image block, and the value of the component identifier being the second value indicates that the first neural network is not allowed to filter reconstructed image blocks in the current reconstructed image that have the same component as the current reconstructed image block;
determine, based on the value of the component identifier, whether to use the first neural network to filter the current reconstructed image block.
In some embodiments, the prediction unit 520 is specifically configured to:
parse the bitstream to obtain the value of an image block identifier if the value of the component identifier is the first value;
where the value of the image block identifier being the first value indicates that the first neural network is used to filter the current reconstructed image block, and the value of the image block identifier being the second value indicates that the first neural network is not used to filter the current reconstructed image block;
determine, based on the value of the image block identifier, whether to use the first neural network to filter the current reconstructed image block.
In some embodiments, before the filtering unit 530 filters the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the filtering unit 530 is further configured to:
obtain a reconstructed image of a first original image and a feature image of the first original image;
obtain, based on the reconstructed image of the first original image and the feature image of the first original image, at least one first training data pair, where the at least one first training data pair includes at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
filter, by using the first neural network, the at least one first reconstructed image block based respectively on the at least one first feature image block, to obtain at least one filtered second reconstructed image block;
adjust the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is an annotated feature image of the first original image.
In some embodiments, before the prediction unit 520 predicts features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is further configured to:
obtain a reconstructed image of a second original image and an annotated feature image of the second original image;
obtain, based on the reconstructed image of the second original image and the feature image of the second original image, at least one second training data pair, where the at least one second training data pair includes at least one third reconstructed image block and at least one second feature image block corresponding to the at least one second reconstructed image block;
predict the at least one third reconstructed image by using the second neural network, to obtain at least one third feature image block;
adjust the second neural network based on the difference between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
a color feature, a texture feature, a shape feature, or a spatial feature.
It should be noted that the foregoing embodiments are merely examples of the present application and should not be construed as limiting the present application.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the filtering apparatus 500 shown in FIG. 10 may correspond to the entity performing the method 300 of the embodiments of the present application; that is, the foregoing and other operations and/or functions of the units in the filtering apparatus 500 are respectively intended to implement the corresponding procedures in the method 300 and the other methods. To avoid repetition, details are not repeated here.
FIG. 11 is a schematic block diagram of a filtering apparatus 600 according to an embodiment of the present application.
As shown in FIG. 11, the filtering apparatus 600 may include:
an obtaining unit 610, configured to obtain a current reconstructed image block;
a prediction unit 620, configured to: when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and
a filtering unit 630, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
In some embodiments, before the prediction unit 620 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is further configured to:
obtain a value of a sequence flag;
where the sequence flag taking a first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks in a current reconstructed image sequence to which the current reconstructed image block belongs, and the sequence flag taking a second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks in the current reconstructed image sequence; and
determine, based on the value of the sequence flag, whether to use the first neural network to filter the current reconstructed image block.
In some embodiments, the prediction unit 620 is specifically configured to:
if the value of the sequence flag is the first value, filter the current reconstructed image block by using the first neural network, to obtain a rate-distortion cost of the current reconstructed image block after filtering;
if the rate-distortion cost of the current reconstructed image block after filtering is less than the rate-distortion cost of the current reconstructed image block before filtering, determine to use the first neural network to filter the current reconstructed image block; and
if the rate-distortion cost of the current reconstructed image block after filtering is greater than or equal to the rate-distortion cost of the current reconstructed image block before filtering, determine not to use the first neural network to filter the current reconstructed image block.
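The per-block decision above reduces to a single comparison of rate-distortion costs, with a lower cost being better. A minimal helper, assuming the two costs have already been computed by some means outside this sketch:

```python
# Encoder-side per-block decision: use the first neural network only when
# filtering strictly lowers the rate-distortion cost (lower cost is better).

def decide_block_filtering(cost_before, cost_after):
    """Return True if the first NN should filter this reconstructed block."""
    return cost_after < cost_before
```

Ties go to "do not filter", which avoids spending a flag bit on a change that brings no rate-distortion benefit.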
In some embodiments, the prediction unit 620 is further configured to:
write the value of the sequence flag into a bitstream obtained by encoding a current residual block derived from the current reconstructed image block.
In some embodiments, the prediction unit 620 is further configured to:
if the value of the sequence flag is the first value, generate a value of a component flag;
where the component flag taking the first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks, of the same component as the current reconstructed image block, in a current reconstructed image to which the current reconstructed image block belongs, and the component flag taking the second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks of the same component as the current reconstructed image block in the current reconstructed image; and
write the value of the component flag into the bitstream obtained by encoding the current residual block derived from the current reconstructed image block.
In some embodiments, the prediction unit 620 is specifically configured to:
if, for every reconstructed image block of the same component as the current reconstructed image block in the current reconstructed image, the rate-distortion cost after filtering is less than or equal to the rate-distortion cost of that reconstructed image block before filtering, determine that the value of the component flag is the first value; and
if the current reconstructed image includes a reconstructed image block whose rate-distortion cost after filtering is greater than its rate-distortion cost before filtering, determine that the value of the component flag is the second value.
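The component-flag rule above compares costs across all same-component blocks of the picture. A minimal sketch, assuming the per-block costs are supplied as two parallel lists and assuming the 1/0 value mapping:

```python
# Component-flag generation: the first value is produced only when no
# same-component block in the picture gets worse under filtering.

def component_flag(costs_before, costs_after, first_value=1, second_value=0):
    """First value if every block's filtered RD cost is no worse than its
    unfiltered RD cost; second value if any block gets worse."""
    if all(after <= before for before, after in zip(costs_before, costs_after)):
        return first_value
    return second_value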
In some embodiments, the prediction unit 620 is further configured to:
if the value of the component flag is the first value, generate a value of an image-block flag;
where the image-block flag taking the first value indicates that the first neural network is used to filter the current reconstructed image block, and the image-block flag taking the second value indicates that the first neural network is not used to filter the current reconstructed image block.
In some embodiments, the prediction unit 620 is specifically configured to:
if the rate-distortion cost of the current reconstructed image block after filtering is less than the rate-distortion cost of the current reconstructed image block before filtering, determine that the value of the image-block flag is the first value; and
if the rate-distortion cost of the current reconstructed image block after filtering is greater than or equal to the rate-distortion cost of the current reconstructed image block before filtering, determine that the value of the image-block flag is the second value.
In some embodiments, before the filtering unit 630 filters the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the filtering unit 630 is further configured to:
obtain a reconstructed image of a first original image and a feature image of the first original image;
obtain at least one first training data pair based on the reconstructed image of the first original image and the feature image of the first original image, the at least one first training data pair including at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
filter, by using the first neural network, the at least one first reconstructed image block based on the at least one first feature image block respectively, to obtain at least one filtered second reconstructed image block; and
adjust the first neural network based on differences between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is a labelled feature image of the first original image.
In some embodiments, before the prediction unit 620 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is further configured to:
obtain a reconstructed image of a second original image and a labelled feature image of the second original image;
obtain at least one second training data pair based on the reconstructed image of the second original image and the feature image of the second original image, the at least one second training data pair including at least one third reconstructed image block and at least one second feature image block corresponding to the at least one third reconstructed image block;
predict the at least one third reconstructed image block by using the second neural network, to obtain at least one third feature image block; and
adjust the second neural network based on differences between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
a color feature, a texture feature, a shape feature, or a spatial feature.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the filtering apparatus 600 shown in FIG. 11 may correspond to the entity performing the method 400 of the embodiments of the present application; that is, the foregoing and other operations and/or functions of the units in the filtering apparatus 600 are respectively intended to implement the corresponding procedures in the method 400 and the other methods. To avoid repetition, details are not repeated here.
It should also be understood that the units of the filtering apparatus 500 or the filtering apparatus 600 according to the embodiments of the present application may be separately or wholly combined into one or several additional units, or one (or some) of the units may be further split into multiple functionally smaller units; the same operations can thereby be implemented without affecting the technical effects of the embodiments of the present application. The foregoing units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the filtering apparatus 500 or the filtering apparatus 600 may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, and may be implemented cooperatively by multiple units. According to another embodiment of the present application, the filtering apparatus 500 or the filtering apparatus 600 according to the embodiments of the present application may be constructed, and the filtering method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device, such as a general-purpose computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded into an electronic device via the computer-readable storage medium, and run therein to implement the corresponding method of the embodiments of the present application.
In other words, the units mentioned above may be implemented in the form of hardware, in the form of software instructions, or in a combination of hardware and software. Specifically, the steps of the method embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software in a decoding processor. Optionally, the software may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing method embodiments in combination with its hardware.
FIG. 12 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present application.
As shown in FIG. 12, the electronic device 700 includes at least a processor 710 and a computer-readable storage medium 720. The processor 710 and the computer-readable storage medium 720 may be connected by a bus or in another manner. The computer-readable storage medium 720 is configured to store a computer program 721, which includes computer instructions; the processor 710 is configured to execute the computer instructions stored in the computer-readable storage medium 720. The processor 710 is the computing core and control core of the electronic device 700; it is adapted to implement one or more computer instructions, and in particular adapted to load and execute one or more computer instructions so as to implement the corresponding method flow or the corresponding function.
As an example, the processor 710 may also be referred to as a central processing unit (CPU). The processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
As an example, the computer-readable storage medium 720 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the processor 710. Specifically, the computer-readable storage medium 720 includes, but is not limited to, a volatile memory and/or a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), and a direct Rambus RAM (DR RAM).
Illustratively, the electronic device 700 may be the encoding side, an encoder, or an encoding framework according to the embodiments of the present application; the computer-readable storage medium 720 stores first computer instructions; the processor 710 loads and executes the first computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps of the filtering method provided by the embodiments of the present application. In other words, the first computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps; to avoid repetition, details are not repeated here.
Illustratively, the electronic device 700 may also be the decoding side, a decoder, or a decoding framework according to the embodiments of the present application; the computer-readable storage medium 720 stores second computer instructions; the processor 710 loads and executes the second computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps of the filtering method provided by the embodiments of the present application. In other words, the second computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps; to avoid repetition, details are not repeated here.
According to another aspect of the present application, an embodiment of the present application further provides a computer-readable storage medium (memory), which is a memory device in the electronic device 700 for storing programs and data, for example, the computer-readable storage medium 720. It may be understood that the computer-readable storage medium 720 here may include both a built-in storage medium of the electronic device 700 and an extended storage medium supported by the electronic device 700. The computer-readable storage medium provides storage space that stores the operating system of the electronic device 700. In addition, one or more computer instructions adapted to be loaded and executed by the processor 710 are also stored in the storage space; these computer instructions may be one or more computer programs 721 (including program code).
According to another aspect of the present application, a computer program product or computer program is provided; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium, for example, the computer program 721. In this case, the electronic device 700 may be a computer; the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, causing the computer to perform the filtering method provided in the various optional manners described above.
In other words, when implemented in software, the embodiments may be implemented wholly or partly in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures of the embodiments of the present application are run, or the functions of the embodiments of the present application are implemented, wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
A person of ordinary skill in the art may appreciate that the units and process steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
Finally, it should be noted that the foregoing is merely the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of variations or replacements within the technical scope disclosed in the present application, and such variations or replacements shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

  1. A filtering method, comprising:
    parsing a bitstream to obtain a current reconstructed image block;
    when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and
    filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
  2. The method according to claim 1, wherein before the predicting features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    parsing the bitstream to obtain a value of a sequence flag;
    wherein the sequence flag taking a first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks in a current reconstructed image sequence to which the current reconstructed image block belongs, and the sequence flag taking a second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks in the current reconstructed image sequence; and
    determining, based on the value of the sequence flag, whether to use the first neural network to filter the current reconstructed image block.
  3. The method according to claim 2, wherein the determining, based on the value of the sequence flag, whether to use the first neural network to filter the current reconstructed image block comprises:
    if the value of the sequence flag is the first value, parsing the bitstream to obtain a value of a component flag;
    wherein the component flag taking the first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks, of the same component as the current reconstructed image block, in a current reconstructed image to which the current reconstructed image block belongs, and the component flag taking the second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks of the same component as the current reconstructed image block in the current reconstructed image; and
    determining, based on the value of the component flag, whether to use the first neural network to filter the current reconstructed image block.
  4. The method according to claim 3, wherein the determining, based on the value of the component flag, whether to use the first neural network to filter the current reconstructed image block comprises:
    if the value of the component flag is the first value, parsing the bitstream to obtain a value of an image-block flag;
    wherein the image-block flag taking the first value indicates that the first neural network is used to filter the current reconstructed image block, and the image-block flag taking the second value indicates that the first neural network is not used to filter the current reconstructed image block; and
    determining, based on the value of the image-block flag, whether to use the first neural network to filter the current reconstructed image block.
  5. The method according to any one of claims 1 to 4, wherein before the filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a first original image and a feature image of the first original image;
    obtaining at least one first training data pair based on the reconstructed image of the first original image and the feature image of the first original image, the at least one first training data pair comprising at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
    filtering, by using the first neural network, the at least one first reconstructed image block based on the at least one first feature image block respectively, to obtain at least one filtered second reconstructed image block; and
    adjusting the first neural network based on differences between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
  6. The method according to claim 5, wherein the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is a labelled feature image of the first original image.
  7. The method according to any one of claims 1 to 6, wherein before the predicting features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a second original image and a labelled feature image of the second original image;
    obtaining at least one second training data pair based on the reconstructed image of the second original image and the feature image of the second original image, the at least one second training data pair comprising at least one third reconstructed image block and at least one second feature image block corresponding to the at least one third reconstructed image block;
    predicting the at least one third reconstructed image block by using the second neural network, to obtain at least one third feature image block; and
    adjusting the second neural network based on differences between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
  8. The method according to any one of claims 1 to 7, wherein the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
    a color feature, a texture feature, a shape feature, or a spatial feature.
  9. A filtering method, comprising:
    obtaining a current reconstructed image block;
    when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and
    filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
  10. The method according to claim 9, wherein before the predicting features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    obtaining a value of a sequence flag;
    wherein the sequence flag taking a first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks in a current reconstructed image sequence to which the current reconstructed image block belongs, and the sequence flag taking a second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks in the current reconstructed image sequence; and
    determining, based on the value of the sequence flag, whether to use the first neural network to filter the current reconstructed image block.
  11. The method according to claim 10, wherein the determining, based on the value of the sequence flag, whether to use the first neural network to filter the current reconstructed image block comprises:
    if the value of the sequence flag is the first value, filtering the current reconstructed image block by using the first neural network, to obtain a rate-distortion cost of the current reconstructed image block after filtering;
    if the rate-distortion cost of the current reconstructed image block after filtering is less than the rate-distortion cost of the current reconstructed image block before filtering, determining to use the first neural network to filter the current reconstructed image block; and
    if the rate-distortion cost of the current reconstructed image block after filtering is greater than or equal to the rate-distortion cost of the current reconstructed image block before filtering, determining not to use the first neural network to filter the current reconstructed image block.
  12. The method according to claim 10 or 11, further comprising:
    writing the value of the sequence flag into a bitstream obtained by encoding a current residual block derived from the current reconstructed image block.
  13. The method according to any one of claims 10 to 12, further comprising:
    if the value of the sequence flag is the first value, generating a value of a component flag;
    wherein the component flag taking the first value indicates that the first neural network is allowed to be used to filter reconstructed image blocks, of the same component as the current reconstructed image block, in a current reconstructed image to which the current reconstructed image block belongs, and the component flag taking the second value indicates that the first neural network is not allowed to be used to filter reconstructed image blocks of the same component as the current reconstructed image block in the current reconstructed image; and
    writing the value of the component flag into the bitstream obtained by encoding the current residual block derived from the current reconstructed image block.
  14. The method according to claim 13, wherein the generating a value of a component flag comprises:
    if, for every reconstructed image block of the same component as the current reconstructed image block in the current reconstructed image, the rate-distortion cost after filtering is less than or equal to the rate-distortion cost of that reconstructed image block before filtering, determining that the value of the component flag is the first value; and
    if the current reconstructed image includes a reconstructed image block whose rate-distortion cost after filtering is greater than its rate-distortion cost before filtering, determining that the value of the component flag is the second value.
  15. The method according to claim 13 or 14, further comprising:
    if the value of the component flag is the first value, generating a value of an image-block flag;
    wherein the image-block flag taking the first value indicates that the first neural network is used to filter the current reconstructed image block, and the image-block flag taking the second value indicates that the first neural network is not used to filter the current reconstructed image block.
  16. The method according to claim 15, wherein the generating a value of an image-block flag comprises:
    if the rate-distortion cost of the current reconstructed image block after filtering is less than the rate-distortion cost of the current reconstructed image block before filtering, determining that the value of the image-block flag is the first value; and
    if the rate-distortion cost of the current reconstructed image block after filtering is greater than or equal to the rate-distortion cost of the current reconstructed image block before filtering, determining that the value of the image-block flag is the second value.
  17. The method according to any one of claims 9 to 16, wherein before the filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a first original image and a feature image of the first original image;
    obtaining at least one first training data pair based on the reconstructed image of the first original image and the feature image of the first original image, the at least one first training data pair comprising at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
    filtering, by using the first neural network, the at least one first reconstructed image block based on the at least one first feature image block respectively, to obtain at least one filtered second reconstructed image block; and
    adjusting the first neural network based on differences between the at least one first reconstructed image block and the at least one second reconstructed image block, to obtain the trained first neural network.
  18. The method according to claim 17, wherein the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is a labelled feature image of the first original image.
  19. The method according to any one of claims 9 to 18, wherein before the predicting features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a second original image and a labelled feature image of the second original image;
    obtaining at least one second training data pair based on the reconstructed image of the second original image and the feature image of the second original image, the at least one second training data pair comprising at least one third reconstructed image block and at least one second feature image block corresponding to the at least one third reconstructed image block;
    predicting the at least one third reconstructed image block by using the second neural network, to obtain at least one third feature image block; and
    adjusting the second neural network based on differences between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
  20. The method according to any one of claims 9 to 19, wherein the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
    a color feature, a texture feature, a shape feature, or a spatial feature.
  21. A filtering apparatus, comprising:
    a parsing unit, configured to parse a bitstream to obtain a current reconstructed image block;
    a prediction unit, configured to: when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and
    a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
  22. A filtering apparatus, comprising:
    an obtaining unit, configured to obtain a current reconstructed image block;
    a prediction unit, configured to: when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block; and
    a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
  23. An electronic device, comprising:
    a processor, adapted to execute a computer program; and
    a computer-readable storage medium having a computer program stored therein, wherein when the computer program is executed by the processor, the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 20 is implemented.
  24. A computer-readable storage medium, configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 20.
  25. A computer program product, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 20.
  26. A bitstream, wherein the bitstream is the bitstream described in the method according to any one of claims 1 to 8, or a bitstream generated by the method according to any one of claims 9 to 20.
PCT/CN2021/143804 2021-12-31 2021-12-31 Filtering method, filtering apparatus and electronic device WO2023123398A1 (zh)




Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21969745

Country of ref document: EP

Kind code of ref document: A1