WO2021249290A1 - Loop filtering method and apparatus (环路滤波方法和装置)

Loop filtering method and apparatus

Info

Publication number
WO2021249290A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
pixel matrix
pixel
corresponding position
image block
Prior art date
Application number
PCT/CN2021/098251
Other languages
English (en)
French (fr)
Inventor
陈旭 (Chen Xu)
杨海涛 (Yang Haitao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021249290A1
Priority to US18/063,955 (published as US12052443B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N 19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
    • H04N 19/10 using adaptive coding
    • H04N 19/102 using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/117 Filters, e.g. for pre-processing or post-processing
    • H04N 19/124 Quantisation
    • H04N 19/169 using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 the unit being an image region, e.g. an object
    • H04N 19/176 the region being a block, e.g. a macroblock
    • H04N 19/182 the unit being a pixel
    • H04N 19/186 the unit being a colour or a chrominance component
    • H04N 19/30 using hierarchical techniques, e.g. scalability
    • H04N 19/85 using pre-processing or post-processing specially adapted for video compression
    • H04N 19/88 using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks

Definitions

  • the embodiments of the present invention relate to the technical field of artificial intelligence (AI)-based video or image compression, and in particular, to a loop filtering method and device.
  • Video coding (video encoding and decoding) is widely used in digital video applications, such as broadcast digital TV, video transmission over the Internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVDs and Blu-ray discs, video content acquisition and editing systems, and camcorder security applications.
  • Video compression equipment usually uses software and/or hardware on the source side to encode video data before transmission or storage, thereby reducing the amount of data required to represent digital video images. Then, the compressed data is received by the video decompression device on the destination side.
  • a loop filter module can be used to remove coding distortions such as blocking artifacts and ringing artifacts in the reconstructed image.
  • a neural network is used to implement the filtering function of the loop filter module: the reconstructed image or reconstructed image block input to the neural network is filtered to obtain a filtered reconstructed image or reconstructed image block.
  • however, this method cannot achieve a good filtering effect for input images or image blocks of different qualities.
  • the present application provides a loop filtering method and device, which can improve the filtering effect for reconstructed images of various qualities.
  • this application relates to a loop filtering method.
  • the method is executed by the loop filter in the encoder or decoder.
  • the method includes:
  • obtaining a first pixel matrix and a second pixel matrix, where the element at the corresponding position in the first pixel matrix corresponds to (for example, represents) the brightness value of the pixel at the corresponding position in a first image block, the element at the corresponding position in the second pixel matrix corresponds to (for example, represents) the quantization step value corresponding to that brightness value, and the first image block is a reconstructed image block or an image block in a reconstructed image; and performing filtering processing on an input pixel matrix through a filter network to obtain an output pixel matrix.
  • the output pixel matrix includes a third pixel matrix, and the element at the corresponding position in the third pixel matrix corresponds to the brightness value, or the brightness residual value, of the pixel at the corresponding position in a second image block.
  • the second image block is an image block obtained by filtering the first image block, and the input pixel matrix is at least related to the first pixel matrix and the second pixel matrix.
  • An image block (for example, the first image block) can be understood as a pixel matrix X, and an element at a corresponding position in the pixel matrix X can be understood as a pixel point (or pixel value) at the corresponding position in the image block, where the pixel value includes the brightness value of the pixel or the chroma value of the pixel.
  • for example, the size of the image block is 64×64, which means that the pixel point distribution of the image block is 64 rows × 64 columns, and x(i,j) represents the pixel point (or pixel value) in the i-th row and j-th column of the image block.
  • correspondingly, the input pixel matrix A includes 64 rows and 64 columns, with a total of 64×64 elements, and A(i,j) represents the element in the i-th row and j-th column of the pixel matrix A.
  • A(i,j) and x(i,j) correspond (for example, A(i,j) represents the value of the pixel point x(i,j)); when the element at the corresponding position in the input pixel matrix A corresponds to (for example, represents) the brightness value of the pixel point at the corresponding position in the image block, the value of the element A(i,j) is the brightness value of the pixel point x(i,j).
  • the element at the corresponding position in the input pixel matrix A may also correspond to (for example, represent) other values of the pixel at the corresponding position in the image block; that is, the value of the element A(i,j) can be another value of the pixel point x(i,j), such as the quantization step value corresponding to the brightness value of x(i,j), the chrominance value of x(i,j), the quantization step value corresponding to the chrominance value of x(i,j), the luminance residual value of x(i,j), or the chrominance residual value of x(i,j), which is not specifically limited in this application.
  • when the element at the corresponding position in the input pixel matrix A represents the brightness value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the aforementioned first pixel matrix; or, when the element at the corresponding position in the input pixel matrix A represents the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the aforementioned second pixel matrix. It should be understood that when the element at the corresponding position in the input pixel matrix A represents the chrominance value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the fifth pixel matrix; or, when the element at the corresponding position in the input pixel matrix A represents the quantization step value corresponding to the chrominance value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the sixth pixel matrix.
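  • As a non-normative sketch of the element/pixel correspondence described above (the array names, the 64×64 size, and the constant quantization step are illustrative assumptions, not part of the claimed method):

```python
import numpy as np

H, W = 64, 64  # image block of 64 rows x 64 columns

# First pixel matrix A: A[i, j] holds the brightness value of pixel x(i, j).
A = np.random.randint(0, 256, size=(H, W), dtype=np.uint8)

# Second pixel matrix Q: Q[i, j] holds the quantization step value corresponding
# to the brightness value of pixel x(i, j); a single step per block is assumed here.
Q = np.full((H, W), 22, dtype=np.int32)

i, j = 10, 20
print(A[i, j], Q[i, j])  # brightness value and quantization step of pixel x(i, j)
```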
  • the output pixel matrix B output by the filter network corresponds to the filtered image block (for example, the second image block); that is, the element B(i,j) in the output pixel matrix corresponds to the pixel y(i,j) in the filtered image block.
  • for example, the value of element B(i,j) can represent the brightness value of pixel y(i,j).
  • the element at the corresponding position in the pixel matrix B may also correspond to (for example, represent) other values of the pixel at the corresponding position in the filtered image block; that is, the value of the element B(i,j) can be another value of pixel y(i,j), such as the luminance residual value of y(i,j), the chrominance value of y(i,j), or the chrominance residual value of y(i,j), which is not specifically limited in this application.
  • when the element at the corresponding position in the output pixel matrix B represents the brightness value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the third pixel matrix; or, when the element at the corresponding position in the output pixel matrix B represents the brightness residual value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is another example of the third pixel matrix. It should be understood that when the element at the corresponding position in the output pixel matrix B represents the chrominance value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the seventh pixel matrix; or, when the element at the corresponding position in the output pixel matrix B represents the chrominance residual value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the eighth pixel matrix.
  • the above-mentioned first image block may be an image block in a reconstructed image reconstructed by an encoder or a decoder, or may be a reconstructed image block reconstructed by an encoder or a decoder.
  • the loop filtering method in the embodiments of this application includes, but is not limited to, filtering reconstructed image blocks; it should be understood that it can also be applied to filtering reconstructed images, with "reconstructed image block" adaptively replaced by "reconstructed image", which will not be repeated here.
  • the first image block and the second image block may also adopt the RGB format.
  • the element at the corresponding position in the first pixel matrix may then correspond to (for example, represent) the R value, G value, or B value of the pixel at the corresponding position in the first image block.
  • correspondingly, the element at the corresponding position in the second pixel matrix may correspond to (for example, represent) the quantization step value corresponding to the R value, G value, or B value of the pixel at the corresponding position in the first image block, or the quantization step value used by all three.
  • This application uses a filter network to implement filtering while introducing the quantization step value of each pixel of the reconstructed image block, so that the pixel matrix corresponding to the image block input to the filter network can be filtered better, improving the filtering effect.
  • the input pixel matrix includes the first pixel matrix and the second pixel matrix; or, the input pixel matrix is a first preprocessing matrix obtained by preprocessing the first pixel matrix and the second pixel matrix; or, the input pixel matrix includes a normalized matrix of the first pixel matrix and a normalized matrix of the second pixel matrix; or, the input pixel matrix is a second preprocessing matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  • the normalized matrix refers to a matrix obtained by normalizing the values of elements at corresponding positions in the corresponding matrix.
  • the input pixel matrix is the processing object of the filter network, and the input pixel matrix may include the acquired first pixel matrix and the second pixel matrix, that is, the two pixel matrices are directly input into the filter network for filtering processing.
  • before the input pixel matrix is input to the filter network, it can also be obtained by preprocessing and/or normalizing one or more pixel matrices, according to the form of the training data used when training the filter network and the processing capabilities of the filter network.
  • the purpose of the normalization processing is to adjust the value of the element to a uniform value interval, such as [0,1] or [-0.5,0.5], so that the calculation efficiency can be improved in the calculation of the filter network.
  • Preprocessing can include matrix addition, matrix multiplication, matrix merging (concat), etc., which can reduce the amount of computation of the filter network.
  • the input pixel matrix can include the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix; that is, the first pixel matrix and the second pixel matrix are respectively normalized, and the normalized matrices are then input into the filter network for processing.
  • the input pixel matrix may be a preprocessing matrix obtained by adding, multiplying or combining the first pixel matrix and the second pixel matrix.
  • the input pixel matrix may be a preprocessing matrix obtained by adding, multiplying, or merging the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  • Matrix addition means adding the values of the elements at the corresponding positions of the two matrices.
  • Matrix multiplication means multiplying the values of the elements at the corresponding positions of the two matrices.
  • Matrix merging means increasing the number of channels of the matrix. For example, if one matrix is a two-dimensional matrix of size m×n and the other matrix is also a two-dimensional matrix of size m×n, combining the two matrices yields a three-dimensional matrix of size m×n×2.
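  • A minimal NumPy sketch of these three preprocessing operations (illustrative only; A and Q stand for two m×n matrices, for example the normalized first and second pixel matrices):

```python
import numpy as np

m, n = 64, 64
A = np.random.rand(m, n).astype(np.float32)  # e.g. normalized first pixel matrix
Q = np.random.rand(m, n).astype(np.float32)  # e.g. normalized second pixel matrix

added = A + Q                        # matrix addition: element-wise sum at corresponding positions
multiplied = A * Q                   # matrix multiplication as defined above: element-wise product
merged = np.stack([A, Q], axis=-1)   # matrix merging (concat): size m x n x 2, channel count grows

print(added.shape, multiplied.shape, merged.shape)  # (64, 64) (64, 64) (64, 64, 2)
```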
  • the element at the corresponding position in the third pixel matrix represents the brightness value of the pixel at the corresponding position in the second image block.
  • the element at the corresponding position in the third pixel matrix represents the luminance residual value of the pixel at the corresponding position in the second image block; the method further includes: adding the values of the elements at the corresponding positions in the first pixel matrix and the third pixel matrix to obtain a fourth pixel matrix, where the element at the corresponding position in the fourth pixel matrix corresponds to (for example, represents) the brightness value of the pixel at the corresponding position in the second image block.
  • in other words, the elements in the third pixel matrix output by the filter network can carry two meanings. One is the brightness value of the pixel at the corresponding position in the second image block; that is, the filter network can directly output the third pixel matrix representing the brightness values of the filtered second image block. The other is the brightness residual value of the pixel at the corresponding position in the second image block; that is, the filter network outputs a third pixel matrix representing the brightness residual values of the filtered second image block, which needs to be further processed to obtain the fourth pixel matrix of the brightness values of the filtered second image block. The processing is to add the first pixel matrix, representing the brightness values of the unfiltered first image block, to the third pixel matrix, representing the brightness residual values of the filtered second image block, to obtain the fourth pixel matrix of the brightness values of the filtered second image block. This is not specifically limited, and there may be more possibilities for the filter network.
  • the method further includes: performing denormalization processing on the value of an element at a corresponding position in the third pixel matrix.
  • adding the values of the elements at the corresponding positions in the first pixel matrix and the third pixel matrix to obtain the fourth pixel matrix includes: adding the values of the elements at the corresponding positions in the first pixel matrix and the denormalized third pixel matrix to obtain the fourth pixel matrix.
  • if a matrix is normalized before it is input to the filter network, the values of all elements in the matrix are normalized into the same interval to improve computational efficiency, but the element values then no longer meaningfully represent the pixels at the corresponding positions in the image block. Therefore, to fit the subsequent image processing, after the filter network outputs a matrix, the matrix needs to be denormalized to restore the meaning represented by its elements.
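  • The normalize/filter/denormalize/residual-add pipeline described above can be sketched as follows (a non-normative illustration assuming 8-bit brightness values normalized to [0, 1]; filter_network is a stub standing in for the trained network):

```python
import numpy as np

def normalize(mat, max_val=255.0):
    # Map element values into a uniform interval, here [0, 1].
    return mat.astype(np.float32) / max_val

def denormalize(mat, max_val=255.0):
    # Restore element values to their original range and meaning.
    return mat * max_val

def filter_network(x):
    # Stub for the trained filter network; here it returns a zero residual.
    return np.zeros_like(x)

first = np.random.randint(0, 256, size=(64, 64))   # first pixel matrix (unfiltered brightness)
third_norm = filter_network(normalize(first))      # network output: normalized brightness residuals
third = denormalize(third_norm)                    # third pixel matrix (brightness residual values)
fourth = first + third                             # fourth pixel matrix (filtered brightness values)
```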
  • the method further includes: obtaining a fifth pixel matrix, where the element at the corresponding position in the fifth pixel matrix corresponds to (for example, represents) the chrominance value of the pixel at the corresponding position in the first image block; accordingly, the input pixel matrix is at least related to the first pixel matrix, the second pixel matrix, and the fifth pixel matrix.
  • the input pixel matrix includes the first pixel matrix, the second pixel matrix, and the fifth pixel matrix; or, the input pixel matrix includes the first preprocessing matrix obtained by preprocessing the first pixel matrix and the second pixel matrix, and the fifth pixel matrix; or, the input pixel matrix includes the normalized matrix of the first pixel matrix, the normalized matrix of the second pixel matrix, and the normalized matrix of the fifth pixel matrix; or, the input pixel matrix includes the second preprocessing matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix, and the normalized matrix of the fifth pixel matrix.
  • the present application can also add the chrominance values of the pixels in the first image block to the filter network as a reference factor for filtering.
  • the luminance value and chrominance value of a pixel in the first image block are correlated; therefore, during filtering, the luminance values and chrominance values of the pixels can assist each other's filtering. Accordingly, when training the filter network, the chroma matrix of the image block is added to the training data, so that the filter network can refer to the chroma information during filtering to improve the filtering effect.
  • correspondingly, the present application can also add, to the input pixel matrix, the fifth pixel matrix that characterizes the chrominance values of the pixels in the first image block.
  • the specific content and acquisition method of the input pixel matrix can be referred to the above description, and will not be repeated here.
  • the method further includes: acquiring a sixth pixel matrix, where the element at the corresponding position in the sixth pixel matrix corresponds to (for example, represents) the quantization step value corresponding to the chrominance value of the pixel at the corresponding position in the first image block; accordingly, the input pixel matrix is at least related to the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix.
  • this application can also add, to the filter network, the quantization step values corresponding to the chrominance values of the pixels in the first image block as a reference factor for filtering.
  • when training the filter network, the quantization step matrix corresponding to the chroma values of the image block is added to the training data, so that during filtering the filter network can further refine the filtering function of the neural network based on the quantization step matrix, achieving a more accurate filtering effect.
  • correspondingly, the present application may also add, to the input pixel matrix, the sixth pixel matrix representing the quantization step values corresponding to the chrominance values of the pixels in the first image block.
  • the specific content and acquisition method of the input pixel matrix can be referred to the above description, and will not be repeated here.
  • the input pixel matrix includes the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix; or, the input pixel matrix includes the first preprocessing matrix obtained by preprocessing the first pixel matrix and the second pixel matrix, together with the fifth pixel matrix and the sixth pixel matrix; or, the input pixel matrix includes the first pixel matrix, the second pixel matrix, and a third preprocessing matrix obtained by preprocessing the fifth pixel matrix and the sixth pixel matrix; or, the input pixel matrix includes the first preprocessing matrix and the third preprocessing matrix; or, the input pixel matrix includes the normalized matrix of the first pixel matrix, the normalized matrix of the second pixel matrix, the normalized matrix of the fifth pixel matrix, and the normalized matrix of the sixth pixel matrix; or, the input pixel matrix includes the second preprocessing matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix, together with the normalized matrices of the fifth and sixth pixel matrices.
  • since the quantization step values corresponding to the chrominance values can be added when training the filter network, this application can also add, to the input pixel matrix, the sixth pixel matrix characterizing the quantization step values corresponding to the chrominance values of the pixels in the first image block.
  • the specific content and acquisition method of the input pixel matrix can again be referred to the above description, and will not be repeated here (a sketch of two such input compositions follows).
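  • Two of the input compositions listed above, sketched with NumPy (illustrative; the matrices are random stand-ins for the normalized first, second, fifth, and sixth pixel matrices, and element-wise addition is used as the preprocessing operation):

```python
import numpy as np

Y  = np.random.rand(64, 64).astype(np.float32)   # normalized first pixel matrix (brightness)
Qy = np.random.rand(64, 64).astype(np.float32)   # normalized second pixel matrix (brightness quant step)
C  = np.random.rand(64, 64).astype(np.float32)   # normalized fifth pixel matrix (chrominance)
Qc = np.random.rand(64, 64).astype(np.float32)   # normalized sixth pixel matrix (chrominance quant step)

# Composition 1: all four matrices merged as separate input channels.
input_4ch = np.stack([Y, Qy, C, Qc], axis=0)     # shape (4, 64, 64)

# Composition 2: first preprocessing matrix (Y + Qy) and third preprocessing
# matrix (C + Qc), merged into a two-channel input.
input_2ch = np.stack([Y + Qy, C + Qc], axis=0)   # shape (2, 64, 64)

print(input_4ch.shape, input_2ch.shape)
```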
  • performing filtering processing on the input pixel matrix through a filter network to obtain an output pixel matrix includes: performing filtering processing on the input pixel matrix through the filter network to obtain a third pixel matrix and a seventh pixel matrix, where the element at the corresponding position in the seventh pixel matrix corresponds to (for example, represents) the chrominance value of the pixel at the corresponding position in the second image block.
  • the filter network can be trained so that its filter output also includes a seventh pixel matrix characterizing the chrominance values of the pixels in the filtered second image block, so that the filter network can filter the luminance component and the chrominance component of the first image block separately, realizing filtering of the first image block in different dimensions.
  • the method further includes: performing denormalization processing on the value of an element at a corresponding position in the seventh pixel matrix.
  • the principle of the denormalization process is similar to the above-mentioned denormalization process of the third pixel matrix, and will not be repeated here.
  • performing filtering processing on the input pixel matrix through the filter network to obtain the third pixel matrix includes: performing filtering processing on the input pixel matrix through the filter network to obtain the third pixel matrix and an eighth pixel matrix, where the element at the corresponding position in the eighth pixel matrix corresponds to (for example, represents) the chrominance residual value of the pixel at the corresponding position in the second image block; and adding the values of the elements at the corresponding positions in the fifth pixel matrix and the eighth pixel matrix to obtain a ninth pixel matrix, where the element at the corresponding position in the ninth pixel matrix corresponds to (for example, represents) the chrominance value of the pixel at the corresponding position in the second image block.
  • the method further includes: performing denormalization processing on the value of an element at a corresponding position in the eighth pixel matrix;
  • adding the values of the elements at the corresponding positions in the fifth pixel matrix and the eighth pixel matrix to obtain the ninth pixel matrix includes: adding the values of the elements at the corresponding positions in the fifth pixel matrix and the denormalized eighth pixel matrix to obtain the ninth pixel matrix.
  • that is, just as the filter network can output the third pixel matrix representing the luminance residual values of the second image block, it can also output the eighth pixel matrix representing the chrominance residual values of the second image block; the subsequent processing of the eighth pixel matrix is similar to that of the third pixel matrix and will not be repeated here.
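  • The chrominance path mirrors the brightness path above; a compact sketch under the same assumptions (stub network output, [0, 1] normalization):

```python
import numpy as np

fifth = np.random.randint(0, 256, size=(64, 64)).astype(np.float32)  # unfiltered chrominance values

# eighth: hypothetical network output holding normalized chrominance residual values.
eighth = np.zeros((64, 64), dtype=np.float32)  # stub output for illustration

ninth = fifth + eighth * 255.0  # denormalize the residual, then add: filtered chrominance values
```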
  • the preprocessing includes: adding the elements at the corresponding positions in the two matrices; or, merging the two matrices in the depth direction; or, multiplying the elements at the corresponding positions in the two matrices.
  • the method further includes: obtaining a training matrix set, where the training matrix set includes pre-filtering brightness matrices (that is, unfiltered brightness matrices), quantization step matrices, and post-filtering brightness matrices (that is, filtered brightness matrices) of a plurality of image blocks; the element at the corresponding position in a pre-filtering brightness matrix corresponds to (for example, represents) the pre-filtering brightness value of the pixel at the corresponding position in the corresponding image block, the element at the corresponding position in a quantization step matrix corresponds to (for example, represents) the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the corresponding image block, and the element at the corresponding position in a post-filtering brightness matrix corresponds to (for example, represents) the filtered brightness value of the pixel at the corresponding position in the corresponding image block. The filter network is obtained by training according to the training matrix set.
  • the training matrix set also includes pre-filtering chrominance matrices (that is, unfiltered chrominance matrices) and post-filtering chrominance matrices (that is, filtered chrominance matrices) of the multiple image blocks, where the element at the corresponding position in a pre-filtering chrominance matrix corresponds to (for example, represents) the pre-filtering chrominance value of the pixel at the corresponding position in the corresponding image block, and the element at the corresponding position in a post-filtering chrominance matrix corresponds to (for example, represents) the filtered chrominance value of the pixel at the corresponding position in the corresponding image block.
  • the required input, realized functions, and available output of the filter network are all related to the training data in the training phase.
  • the training data in this application is the above-mentioned training matrix set.
  • the filter network includes at least a convolutional layer and an activation layer.
  • the depth of the convolution kernel of the convolution layer is 2, 3, 4, 5, 6, 16, 24, 32, 48, 64, or 128;
  • the size of the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
  • for example, the size of a certain convolutional layer is 3×3×2×10, where 3×3 represents the size of the convolution kernels in the layer; 2 represents the depth of each convolution kernel, and the number of data channels input to the layer is consistent with this kernel depth, that is, the number of input data channels is also 2; 10 represents the number of convolution kernels in the layer, and the number of data channels output by the layer equals the number of convolution kernels, that is, the number of output data channels is also 10.
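  • As a hedged illustration of these dimensions (using PyTorch, which the application does not mandate), a convolutional layer of size 3×3×2×10 followed by an activation layer could look like this:

```python
import torch
import torch.nn as nn

# Kernel size 3x3, kernel depth 2 (= input channels), 10 kernels (= output
# channels), matching the 3x3x2x10 example above.
conv = nn.Conv2d(in_channels=2, out_channels=10, kernel_size=3, padding=1)
act = nn.PReLU()  # one possible activation layer; the application does not fix the type

x = torch.randn(1, 2, 64, 64)   # e.g. merged brightness + quant-step input, 2 channels
y = act(conv(x))
print(conv.weight.shape)        # torch.Size([10, 2, 3, 3])
print(y.shape)                  # torch.Size([1, 10, 64, 64])
```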
  • the filter network includes a convolutional neural network CNN, a deep neural network DNN, or a recurrent neural network RNN.
  • the present application provides an encoder, including a processing circuit, configured to execute the method according to any one of the above-mentioned first aspects.
  • the present application provides a decoder, including a processing circuit, configured to execute the method described in any one of the above-mentioned first aspects.
  • the present application provides a computer program product, including program code, which when executed on a computer or a processor, is used to execute the method described in any one of the above-mentioned first aspects.
  • the present application provides an encoder including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program executed by the processors, wherein when the program is executed by the processors, the encoder is caused to execute the method according to any one of the above first aspect.
  • the present application provides a decoder including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program executed by the processors, wherein when the program is executed by the processors, the decoder is caused to execute the method according to any one of the above first aspect.
  • the present application provides a non-transitory computer-readable storage medium, including program code, which when executed by a computer device, is used to execute the method described in any one of the above-mentioned first aspects.
  • the present invention relates to a decoding device, and the beneficial effects can be referred to the description of the first aspect and will not be repeated here.
  • the decoding device has the function of realizing the behavior in the method embodiment of the first aspect described above.
  • the functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the decoding device includes: a reconstruction module, used to obtain a first pixel matrix; a quantization module, used to obtain a second pixel matrix; and a loop filter module, used to implement the method described in any one of the above first aspect. These modules can perform the corresponding functions in the method examples of the first aspect; for details, refer to the detailed description in the method examples, which will not be repeated here.
  • FIG. 1A is a block diagram of an example of a video decoding system for implementing an embodiment of the present invention, in which the system uses a neural network to encode or decode video images;
  • FIG. 1B is a block diagram of another example of a video decoding system for implementing an embodiment of the present invention, in which the video encoder and/or video decoder uses a neural network to encode or decode video images;
  • FIG. 2 is a block diagram of an example of a video encoder used to implement an embodiment of the present invention, where the video encoder 20 uses a neural network to encode video images;
  • FIG. 3 is a block diagram of an example of a video decoder for implementing an embodiment of the present invention, where the video decoder 30 uses a neural network to decode video images;
  • FIG. 4 is a schematic block diagram of a video decoding device for implementing an embodiment of the present invention;
  • FIG. 5 is a schematic block diagram of a video decoding device for implementing an embodiment of the present invention;
  • FIGS. 6a-6c are schematic diagrams of matrices input to the filter network according to an embodiment of the present application;
  • FIGS. 7a-7e are schematic diagrams of trained neural networks introduced into the loop filter module provided by embodiments of the present application;
  • FIG. 8 is a flowchart showing a process 800 of a loop filtering method according to an embodiment of the present application;
  • FIGS. 9a-9l show several examples of the input pixel matrix of the filter network;
  • FIG. 10 is a schematic structural diagram of a decoding device 1000 according to an embodiment of the present application.
  • the embodiments of the present application provide an AI-based video image compression technology, in particular a neural network-based video compression technology, and specifically provide a neural network-based filtering technology to improve the traditional hybrid video coding and decoding system.
  • Video coding generally refers to the processing of a sequence of images that form a video or video sequence.
  • the terms "picture”, “frame” or “image” can be used as synonyms.
  • Video coding (or commonly referred to as coding) includes two parts: video encoding and video decoding.
  • Video encoding is performed on the source side and usually includes processing (eg, compressing) the original video image to reduce the amount of data required to represent the video image (and thus more efficient storage and/or transmission).
  • Video decoding is performed on the destination side and usually involves inverse processing relative to the encoder to reconstruct the video image.
  • the "encoding" of a video image (or generally referred to as an image) involved in the embodiment should be understood as the “encoding” or “decoding” of the video image or video sequence.
  • the encoding part and the decoding part are also collectively referred to as the codec (coding and decoding, CODEC).
  • the original video image can be reconstructed, that is, the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission).
  • further compression is performed through quantization, etc., to reduce the amount of data required to represent the video image, and the decoder side cannot completely reconstruct the video image; that is, the quality of the reconstructed video image is lower or worse than that of the original video image.
  • Video coding standards belong to "lossy hybrid video codecs" (that is, combining spatial and temporal prediction in the pixel domain with 2D transform coding for applying quantization in the transform domain).
  • Each image in a video sequence is usually divided into a set of non-overlapping blocks, which are usually coded at the block level.
  • the encoder usually processes and encodes video at the block (video block) level: for example, prediction blocks are generated through spatial (intra) prediction and temporal (inter) prediction; the prediction block is subtracted from the current block (the currently processed/to-be-processed block) to obtain a residual block; the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compressed); and the decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for presentation.
  • in addition, the encoder needs to repeat the processing steps of the decoder, so that the encoder and the decoder generate the same predictions (for example, intra-frame prediction and inter-frame prediction) and/or reconstructed pixels for processing, that is, encoding, subsequent blocks.
  • FIG. 1A is a schematic block diagram of an exemplary decoding system 10, such as a video decoding system 10 (or simply referred to as a decoding system 10) that can utilize the technology of the present application.
  • the video encoder 20 (or simply the encoder 20) and the video decoder 30 (or simply the decoder 30) in the video coding system 10 represent devices that can be used to perform techniques according to the various examples described in this application.
  • the decoding system 10 includes a source device 12 configured to provide encoded image data 21 such as an encoded image to a destination device 14 for decoding the encoded image data 21.
  • the source device 12 includes an encoder 20 and, optionally, may include an image source 16, a preprocessor (or preprocessing unit) 18 such as an image preprocessor, and a communication interface (or communication unit) 22.
  • the image source 16 may include or be any type of image capture device for capturing real-world images, and/or any type of image generating device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (for example, screen content, virtual reality (VR) images, and/or any combination thereof, such as augmented reality (AR) images).
  • the image source may be any type of memory or storage that stores any of the above-mentioned images.
  • the image (or image data) 17 may also be referred to as the original image (or original image data) 17.
  • the preprocessor 18 is used to receive (original) image data 17 and preprocess the image data 17 to obtain a preprocessed image (preprocessed image data) 19.
  • the preprocessing performed by the preprocessor 18 may include trimming, color format conversion (for example, conversion from RGB to YCbCr), toning, or denoising. It can be understood that the pre-processing unit 18 may be an optional component.
  • the video encoder (or encoder) 20 is used to receive preprocessed image data 19 and provide encoded image data 21 (which will be further described below according to FIG. 2 and the like).
  • the communication interface 22 in the source device 12 can be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version) through the communication channel 13 to another device, such as the destination device 14 or any other device, for storage or direct reconstruction.
  • the destination device 14 includes a decoder 30 and, optionally, may include a communication interface (or communication unit) 28, a post-processor (or post-processing unit) 32, and a display device 34.
  • the communication interface 28 in the destination device 14 is used to receive the encoded image data 21 (or any other processed version) directly from the source device 12, or from any other source device such as a storage device, for example an encoded-image-data storage device, and to provide the encoded image data 21 to the decoder 30.
  • the communication interface 22 and the communication interface 28 can be used to send or receive the encoded image data (or encoded data) 21 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, any type of private or public network, or any combination thereof.
  • the communication interface 22 can be used to encapsulate the encoded image data 21 into a suitable format such as packets, and/or process the encoded image data using any type of transmission encoding or processing, for transmission over a communication link or communication network.
  • the communication interface 28 corresponds to the communication interface 22, for example, can be used to receive transmission data, and use any type of corresponding transmission decoding or processing and/or decapsulation to process the transmission data to obtain the encoded image data 21.
  • Both the communication interface 22 and the communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow in FIG. 1A pointing from the source device 12 to the communication channel 13 of the destination device 14, or as two-way communication interfaces, and can be used, for example, to send and receive messages to establish a connection, and to confirm and exchange any other information related to the communication link and/or the data transmission, such as the transmission of encoded image data.
  • the video decoder (or decoder) 30 is used to receive the encoded image data 21 and provide a decoded image (or decoded image data) 31 (which will be further described below according to FIG. 3 and the like).
  • the post-processor 32 is used to perform post-processing on decoded image data 31 (also referred to as reconstructed image data) such as a decoded image to obtain post-processed image data 33 such as a post-processed image.
  • the post-processing performed by the post-processing unit 32 may include, for example, color format conversion (for example, conversion from YCbCr to RGB), toning, trimming, or resampling, or any other processing for preparing the decoded image data 31 for display by the display device 34.
  • the display device 34 is used to receive the post-processing image data 33 to display the image to the user or viewer.
  • the display device 34 may be or include any type of display for representing the reconstructed image, for example, an integrated or external display screen or display.
  • the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display screen.
  • the decoding system 10 also includes a training engine 25.
  • the training engine 25 is used to train the encoder 20 (especially the loop filter 220 in the encoder 20) or the decoder 30 (especially the loop filter 320 in the decoder 30) to filter the reconstructed image.
  • the training data in the embodiment of this application includes a training matrix set, the training matrix set including the pre-filtering brightness matrix, the quantization step matrix, and the post-filtering brightness matrix of an image block, where the element at the corresponding position in the pre-filtering brightness matrix corresponds to the pre-filtering brightness value of the pixel at the corresponding position in the image block, the element at the corresponding position in the quantization step matrix corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the corresponding image block, and the element at the corresponding position in the post-filtering brightness matrix corresponds to the filtered brightness value of the pixel at the corresponding position in the corresponding image block.
  • Multiple matrices in the training matrix set may be input to the training engine 25 in the manner shown in FIGS. 6a to 6c, for example.
  • multiple matrices in the training matrix set are directly input to the training engine 25, and the multiple matrices are all two-dimensional matrices.
  • part or all of the multiple matrices in the training matrix set are selected for merging to obtain a multi-dimensional matrix, and then the multi-dimensional matrix is input to the training engine 25.
  • part or all of the multiple matrices in the training matrix set are selected for addition (or multiplication) to obtain a two-dimensional matrix, and then the two-dimensional matrix is input to the training engine 25.
  • the above-mentioned training data can be stored in a database (not shown), and the training engine 25 trains to obtain a target model based on the training data (for example, it can be a neural network used for loop filtering, etc.). It should be noted that the embodiment of the present application does not limit the source of the training data. For example, the training data may be obtained from the cloud or other places for model training.
  • the process of training the target model by the training engine 25 makes the filtered pixel values approximate the original pixel values.
  • Each training run can use a mini-batch size of 64 images and an initial learning rate of 1e-4, with a step size of 10 for learning-rate decay.
  • the training data may be data generated by the encoder under different quantization parameter (QP) settings.
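  • A minimal training-loop sketch consistent with these hyperparameters (PyTorch; FilterNet and the random mini-batch are placeholders, and reading the "step size of 10" as a learning-rate decay every 10 epochs is an assumption):

```python
import torch
import torch.nn as nn

class FilterNet(nn.Module):
    # Placeholder filter network: 2 input channels (brightness + quant step) -> 1 output channel.
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 10, 3, padding=1), nn.PReLU(),
            nn.Conv2d(10, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

model = FilterNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)             # initial learning rate 1e-4
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)  # step size of 10
loss_fn = nn.MSELoss()  # drives the filtered pixels toward the original pixel values

for epoch in range(30):
    # One hypothetical mini-batch of 64 samples: inputs are pre-filtering brightness
    # and quant-step matrices; targets are the original brightness matrices.
    inputs = torch.randn(64, 2, 64, 64)
    targets = torch.randn(64, 1, 64, 64)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()
```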
  • the target model can be used to implement the loop filtering method provided in the embodiments of the present application, that is, the reconstructed image or image block is input into the target model after relevant preprocessing, and the filtered image or image block can be obtained.
  • the target model in the embodiment of the present application may specifically be a filter network. The target model will be described in detail below in conjunction with FIGS. 7a-7e.
  • the target model trained by the training engine 25 may be applied to the decoding system 10, for example, to the source device 12 (for example, the encoder 20) or the destination device 14 (for example, the decoder 30) shown in FIG. 1A.
  • the training engine 25 can train in the cloud to obtain the target model, and the decoding system 10 can then download the target model from the cloud and use it; or, the training engine 25 can train in the cloud to obtain the target model and use the target model there, and the decoding system 10 obtains the processing result directly from the cloud.
  • the training engine 25 trains to obtain a target model with filtering function, the decoding system 10 downloads the target model from the cloud, and then the loop filter 220 in the encoder 20 or the loop filter 320 in the decoder 30 can be based on the The target model performs filtering processing on the input reconstructed image or image block to obtain a filtered image or image block.
  • alternatively, the training engine 25 trains to obtain a target model with the filtering function, the decoding system 10 does not need to download the target model from the cloud, and the encoder 20 or the decoder 30 transmits the reconstructed image or image block to the cloud; the cloud performs filtering processing on the reconstructed image or image block through the target model to obtain the filtered image or image block, and transmits it to the encoder 20 or the decoder 30.
  • although FIG. 1A shows the source device 12 and the destination device 14 as independent devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functions of both, that is, include the source device 12 or its corresponding function and the destination device 14 or its corresponding function at the same time.
  • the source device 12 or its corresponding function and the destination device 14 or its corresponding function may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
  • the existence and (exact) division of different units or functions in the source device 12 and/or the destination device 14 shown in FIG. 1A may vary depending on the actual device and application, as will be obvious to the skilled person.
  • The encoder 20 (for example, the video encoder 20), the decoder 30 (for example, the video decoder 30), or both may be implemented by processing circuitry as shown in FIG. 1B, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video coding processors, or any combination thereof.
  • Encoder 20 may be implemented by processing circuit 46 to include the various modules discussed with encoder 20 in FIG. 2 and/or any other encoder systems or subsystems described herein.
  • the decoder 30 may be implemented by the processing circuit 46 to include the various modules discussed with the decoder 30 in FIG. 3 and/or any other decoder systems or subsystems described herein.
  • the processing circuit 46 can be used to perform the various operations discussed below. As shown in Figure 5, if part of the technology is implemented in software, a device can store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, thereby performing the technology of the present invention.
  • Either of the video encoder 20 and the video decoder 30 may be integrated into a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 1B.
  • the source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or fixed device, for example, a notebook or laptop computer, mobile phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (for example, a content service server or content distribution server), broadcast receiving device, or broadcast transmitting device, and may use no operating system or any type of operating system.
  • the source device 12 and the destination device 14 may be equipped with components for wireless communication. Therefore, the source device 12 and the destination device 14 may be wireless communication devices.
  • the video decoding system 10 shown in FIG. 1A is merely exemplary, and the techniques provided in this application can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
  • the data is retrieved from local storage, sent over the network, and so on.
  • the video encoding device can encode data and store the data in the memory, and/or the video decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but only encode data to memory and/or retrieve and decode data from memory.
  • FIG. 1B is an explanatory diagram of an example of a video coding system 40 including the video encoder 20 of FIG. 2 and/or the video decoder 30 of FIG. 3 according to an exemplary embodiment.
  • the video decoding system 40 may include an imaging device 41, the video encoder 20, the video decoder 30 (and/or a video encoder/decoder implemented by the processing circuit 46), an antenna 42, one or more processors 43, one or more memory storages 44, and/or a display device 45.
  • the imaging device 41, the antenna 42, the processing circuit 46, the video encoder 20, the video decoder 30, the processor 43, the memory storage 44, and/or the display device 45 can communicate with each other.
  • the video coding system 40 may only include the video encoder 20 or only the video decoder 30.
  • antenna 42 may be used to transmit or receive an encoded bitstream of video data.
  • the display device 45 may be used to present video data.
  • the processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the video decoding system 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the memory storage 44 may be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.).
  • the memory storage 44 may be implemented by a cache memory.
  • the processing circuit 46 may include a memory (e.g., cache, etc.) for implementing image buffers and the like.
  • the video encoder 20 implemented by logic circuits may include an image buffer (e.g., implemented by the processing circuit 46 or memory storage 44) and a graphics processing unit (e.g., implemented by the processing circuit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include the video encoder 20 implemented by the processing circuit 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder systems or subsystems described herein.
  • Logic circuits can be used to perform the various operations discussed herein.
  • the video decoder 30 may be implemented by the processing circuit 46 in a similar manner to implement the various types discussed with reference to the video decoder 30 of FIG. 3 and/or any other decoder systems or subsystems described herein.
  • the video decoder 30 implemented by logic circuits may include an image buffer (implemented by the processing circuit 46 or the memory storage 44) and a graphics processing unit (implemented by the processing circuit 46, for example).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include the video decoder 30 implemented by the processing circuit 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder systems or subsystems described herein.
  • antenna 42 may be used to receive an encoded bitstream of video data.
  • the encoded bitstream may include data, indicators, index values, mode selection data, etc. related to the encoded video frame discussed herein, such as data related to coded partitions (for example, transform coefficients or quantized transform coefficients), (as discussed) optional indicators, and/or data defining the coded partitions.
  • the video coding system 40 may also include a video decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream.
  • the display device 45 is used to present video frames.
  • the video decoder 30 may be used to perform the reverse process.
  • the video decoder 30 can be used to receive and parse such syntax elements, and decode related video data accordingly.
  • video encoder 20 may entropy encode the syntax elements into an encoded video bitstream.
  • video decoder 30 may parse such syntax elements and decode the related video data accordingly.
  • VVC Versatile Video Coding
  • VCEG ITU-T Video Coding Experts Group
  • MPEG ISO/IEC Motion Picture Experts Group
  • HEVC High-Efficiency Video Coding
  • FIG. 2 is a schematic block diagram of an example of a video encoder 20 for implementing the technology of the present application.
  • the video encoder 20 includes an input terminal (or input interface) 201, a residual calculation unit 204, a transformation processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transformation processing unit 212, a reconstruction unit 214, a loop filter 220, a decoded picture buffer (DPB) 230, a mode selection unit 260, an entropy encoding unit 270, and an output terminal (or output interface) 272.
  • the mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a division unit 262.
  • the inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown).
  • the video encoder 20 shown in FIG. 2 may also be referred to as a hybrid video encoder or a video encoder based on a hybrid video codec.
  • the loop filter module is a trained target model (also called a neural network), and the neural network is used to process an input image or image area or image block to obtain a filtered image or image area or image block.
  • a neural network for loop filtering is used to receive input images or image regions or image blocks, such as the input image data illustrated in FIGS. 6a to 6c, and generate filtered images or image regions or image blocks.
  • the neural network used for loop filtering will be described in detail below in conjunction with FIGS. 7a-7e.
  • the residual calculation unit 204, the transformation processing unit 206, the quantization unit 208, and the mode selection unit 260 constitute the forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transformation processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 constitute the backward signal path of the encoder, where the backward signal path of the encoder 20 corresponds to the signal path of the decoder (see decoder 30 in FIG. 3).
  • DPB decoded picture buffer
  • the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded image buffer 230, the inter prediction unit 244, and the intra prediction unit 254 also constitute the "built-in decoder" of the video encoder 20 .
  • the quantization unit 208 is configured to quantize the transform coefficient 207 by, for example, scalar quantization or vector quantization to obtain a quantized transform coefficient 209.
  • the quantized transform coefficient 209 may also be referred to as a quantized residual coefficient 209.
  • the quantization process can reduce the bit depth associated with some or all of the transform coefficients 207. For example, n-bit transform coefficients can be rounded down to m-bit transform coefficients during quantization, where n is greater than m.
  • the degree of quantization can be modified by adjusting the quantization parameter (QP). For example, for scalar quantization, different levels of scale can be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to a finer quantization, and a larger quantization step size corresponds to a coarser quantization.
  • the appropriate quantization step size can be indicated by a quantization parameter (QP).
  • the quantization parameter may be an index of a predefined set of suitable quantization step sizes.
  • a smaller quantization parameter can correspond to fine quantization (smaller quantization step size), and a larger quantization parameter can correspond to coarse quantization (larger quantization step size), and vice versa.
  • the quantization may include division by the quantization step size, and the corresponding or inverse dequantization performed by the inverse quantization unit 210 or the like may include multiplication by the quantization step size.
  • In embodiments according to some standards such as HEVC, the quantization parameter may be used to determine the quantization step size.
  • the quantization step size can be calculated using a fixed-point approximation of an equation including division according to the quantization parameter.
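  • As an illustration only, the following Python sketch shows scalar quantization and inverse quantization with a QP-derived step size; the mapping Qstep ≈ 2^((QP−4)/6) follows the HEVC-style convention mentioned above, while the rounding mode and fixed-point scaling of a real codec are omitted (function names are hypothetical):

```python
import numpy as np

def qp_to_step(qp: int) -> float:
    # Assumed HEVC-style mapping: the quantization step size roughly
    # doubles every time QP increases by 6 (Qstep ~ 2^((QP-4)/6)).
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    # Scalar quantization: divide by the step size and round.
    step = qp_to_step(qp)
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    # Inverse quantization: multiply by the same step size. The result
    # differs from the original coefficients by the rounding loss.
    return levels * qp_to_step(qp)

coeffs = np.array([100.0, -37.5, 12.0, 3.2])
levels = quantize(coeffs, qp=27)
print(levels, dequantize(levels, qp=27))
```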
  • the video encoder 20 (correspondingly, the quantization unit 208) can be used to output quantization parameters (QP), for example, directly or after being encoded or compressed by the entropy encoding unit 270, so that, for example, the video decoder 30 may receive and use the quantization parameters for decoding.
  • the inverse quantization unit 210 is configured to perform, on the quantized coefficients, the inverse of the quantization performed by the quantization unit 208 to obtain the dequantized coefficients 211, for example, performing the inverse of the quantization scheme performed by the quantization unit 208 according to or using the same quantization step size as the quantization unit 208.
  • the dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to the transform coefficients 207, although, due to the loss caused by quantization, the dequantized coefficients 211 are usually not exactly the same as the transform coefficients 207.
  • the reconstruction unit 214 (for example, the summer 214) is used to add the transform block 213 (that is, the reconstructed residual block 213) to the prediction block 265 to obtain the reconstruction block 215 in the pixel domain, for example, by adding the pixel values of the reconstructed residual block 213 and the pixel values of the prediction block 265.
  • the loop filter unit 220 (or "loop filter" 220 for short) is used to filter the reconstruction block 215 to obtain the filter block 221, or in general to filter reconstructed pixels to obtain filtered pixel values.
  • the loop filter unit is used to smooth pixel transitions or improve video quality.
  • the loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF).
  • the loop filter unit 220 may include a deblocking filter, a SAO filter, and an ALF filter.
  • the order of the filtering process can be deblocking filter, SAO filter and ALF filter.
  • LMCS luma mapping with chroma scaling
  • The LMCS process is executed before deblocking.
  • the deblocking filtering process can also be applied to internal sub-block edges, such as affine sub-block edges, ATMVP sub-block edges, sub-block transform (SBT) edges, and intra sub-partition (ISP) edges.
  • Although the loop filter unit 220 is shown as an in-loop filter in FIG. 2, in other configurations the loop filter unit 220 may be implemented as a post-loop filter.
  • the filtering block 221 may also be referred to as a filtering reconstruction block 221.
  • the video encoder 20 may be used to output loop filter parameters (such as SAO filter parameters, ALF filter parameters, or LMCS parameters), for example, directly or after entropy encoding by the entropy encoding unit 270, so that, for example, the decoder 30 can receive and use the same or different loop filter parameters for decoding.
  • FIG. 3 shows an exemplary video decoder 30 for implementing the technology of the present application.
  • the video decoder 30 is configured to receive, for example, the encoded image data 21 (for example, an encoded bit stream 21) encoded by the encoder 20 to obtain a decoded image 331.
  • the coded image data or bitstream includes information used to decode the coded image data, such as data representing image blocks of a coded video slice (and/or coded block group or coded block) and related syntax elements.
  • the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (such as a summer 314), a loop filter 320, a decoded image buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354.
  • the inter prediction unit 344 may be or include a motion compensation unit.
  • the video decoder 30 may perform a decoding process that is substantially the reverse of the encoding process described with reference to the video encoder 20 of FIG. 2.
  • the loop filter module is a trained target model (also called a neural network), which is used to process an input image or image area or image block to generate a filtered image or image area or image block.
  • a neural network for loop filtering is used to receive input images or image regions or image blocks, such as the input image data illustrated in FIGS. 6a to 6c, and generate filtered images or image regions or image blocks.
  • the neural network used for loop filtering will be described in detail below in conjunction with FIGS. 7a-7e.
  • the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded image buffer DPB 230, the inter prediction unit 344, and the intra prediction unit 354 also constitute the "built-in decoder" of the video encoder 20.
  • the inverse quantization unit 310 can be functionally the same as the inverse quantization unit 210.
  • the inverse transformation processing unit 312 can be functionally the same as the inverse transformation processing unit 212.
  • the reconstruction unit 314 can be functionally the same as the reconstruction unit 214.
  • the loop filter 320 may be functionally the same as the loop filter 220.
  • the decoded image buffer 330 may be functionally the same as the decoded image buffer 230. Therefore, the explanations of the corresponding units and functions of the video encoder 20 apply to the corresponding units and functions of the video decoder 30 accordingly.
  • the dequantization unit 310 may be used to receive quantization parameters (QP) (or, in general, information related to dequantization) and quantized coefficients from the encoded image data 21 (for example, parsed and/or decoded by the entropy decoding unit 304), and to perform, based on the quantization parameters, inverse quantization on the decoded quantized coefficients 309 to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311.
  • the inverse quantization process may include using the quantization parameter calculated by the video encoder 20 for each video block in the video slice to determine the degree of quantization, and also determine the degree of inverse quantization that needs to be performed.
  • the reconstruction unit 314 (for example, the summer 314) is used to add the reconstructed residual block 313 to the prediction block 365 to obtain the reconstruction block 315 in the pixel domain, for example, by adding the pixel values of the reconstructed residual block 313 and the pixel values of the prediction block 365.
  • the loop filter unit 320 (in or after the coding loop) is used to filter the reconstruction block 315 to obtain the filter block 321, so as to smooth pixel transitions or improve video quality.
  • the loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF).
  • the loop filter unit 320 may include a deblocking filter, a SAO filter, and an ALF filter. The order of the filtering process can be: deblocking filter, SAO filter, and ALF filter.
  • LMCS luma mapping with chroma scaling
  • SBT sub-block transform
  • ISP intra sub-partition
  • the decoder 30 is configured to output the decoded image 331, for example through the output terminal 332, for display to the user or for the user to view.
  • the embodiments of the decoding system 10, the encoder 20, and the decoder 30, as well as the other embodiments described herein, can also be used for still image processing or coding, that is, the processing or coding of a single image independent of any preceding or consecutive image in video coding. Generally, if image processing is limited to a single image 17, the inter prediction unit 244 (encoder) and the inter prediction unit 344 (decoder) may not be available.
  • All other functions (also called tools or techniques) of the video encoder 20 and the video decoder 30 can equally be used for still image processing, such as residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partition 262/362, intra prediction 254/354 and/or loop filtering 220/320, entropy encoding 270, and entropy decoding 304.
  • FIG. 4 is a schematic diagram of a video decoding device 400 provided by an embodiment of the present invention.
  • the video coding device 400 is suitable for implementing the disclosed embodiments described herein.
  • the video decoding device 400 may be a decoder, such as the video decoder 30 in FIG. 1A, or an encoder, such as the video encoder 20 in FIG. 1A.
  • the video decoding device 400 includes: an ingress port 410 (or input port 410) and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data (for example, the processor 430 here may be a neural network processor 430); a transmitter unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460 for storing data.
  • the video decoding device 400 may further include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450, for the egress or ingress of optical or electrical signals.
  • OE optical-to-electrical
  • EO electrical-to-optical
  • the processor 430 is implemented by hardware and software.
  • the processor 430 may be implemented as one or more processor chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 communicates with the ingress port 410, the receiving unit 420, the transmitting unit 440, the egress port 450, and the memory 460.
  • the processor 430 includes a decoding module 470 (for example, a neural network (NN)-based decoding module 470).
  • the decoding module 470 implements the embodiments disclosed above. For example, the decoding module 470 performs, processes, prepares, or provides various encoding operations.
  • the decoding module 470 substantially improves the functions of the video decoding device 400 and affects the switching of the video decoding device 400 to a different state.
  • the decoding module 470 is implemented by instructions stored in the memory 460 and executed by the processor 430.
  • the memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data read during program execution.
  • the memory 460 may be volatile and/or non-volatile, and may be a read-only memory (ROM), a random access memory (RAM), a ternary content-addressable memory (TCAM), and/or a static random-access memory (SRAM).
  • FIG. 5 is a simplified block diagram of an apparatus 500 provided by an exemplary embodiment.
  • the apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1A.
  • the processor 502 in the device 500 may be a central processing unit.
  • the processor 502 may be any other type of device, or multiple devices, capable of manipulating or processing information, existing now or developed in the future.
  • Although the disclosed implementations can be implemented with a single processor such as the processor 502 shown in the figure, using more than one processor can achieve higher speed and efficiency.
  • the memory 504 in the apparatus 500 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can be used as the memory 504.
  • the memory 504 may include code and data 506 that the processor 502 accesses through the bus 512.
  • the memory 504 may further include an operating system 508 and an application program 510, and the application program 510 includes at least one program that allows the processor 502 to execute the method described herein.
  • the application program 510 may include applications 1 to N, and also include a video coding application that performs the method described herein.
  • the apparatus 500 may also include one or more output devices, such as a display 518.
  • the display 518 may be a touch-sensitive display that combines a display with a touch-sensitive element that can be used to sense touch input.
  • the display 518 may be coupled to the processor 502 through the bus 512.
  • Although the bus 512 in the device 500 is described herein as a single bus, the bus 512 may include multiple buses.
  • the auxiliary storage may be directly coupled to other components of the device 500 or accessed through a network, and may include a single integrated unit such as a memory card or multiple units such as multiple memory cards. Therefore, the device 500 may have various configurations.
  • A neural network is a machine learning model.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes xs and intercept 1 as inputs.
  • the output of the arithmetic unit can be: $h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • s = 1, 2, ..., n, where n is a natural number greater than 1
  • Ws is the weight of xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
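  • As a minimal sketch of the neural unit described above, assuming the sigmoid as the activation function f (all names and values below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(xs, ws, b):
    # Output of a single neural unit: f(sum_s ws[s] * xs[s] + b),
    # with the sigmoid as the activation function f.
    return sigmoid(np.dot(ws, xs) + b)

xs = np.array([0.5, -1.2, 3.0])   # inputs x_s
ws = np.array([0.8, 0.1, -0.4])   # weights W_s
print(neural_unit(xs, ws, b=0.2))
```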
  • Deep neural network also known as multi-layer neural network
  • a DNN can be understood as a neural network with many hidden layers; there is no special metric for "many" here. Dividing the DNN according to the positions of different layers, the neural network inside the DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated.
  • the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W_{jk}^{L}$. It should be noted that there is no W parameter in the input layer.
  • W the coefficient from the kth neuron in the L-1th layer to the jth neuron in the Lth layer.
  • more hidden layers make the network more capable of portraying complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
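  • A minimal sketch of a forward pass through a fully connected deep neural network, assuming tanh activations and randomly initialized weight matrices (all names and sizes are illustrative):

```python
import numpy as np

def forward(x, weights, biases):
    # One fully connected pass per layer: a_L = f(W_L a_{L-1} + b_L).
    # Every neuron in layer L-1 feeds every neuron in layer L.
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)
    return a

# 3 -> 4 -> 2 network: two weight matrices; there is no W in the input layer.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward(np.array([1.0, 0.5, -0.3]), weights, biases))
```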
  • Convolutional neural network (convolutional neural network, CNN) is a deep neural network with a convolutional structure. It is a deep learning architecture.
  • the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the input image.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a pooling layer. The feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • the convolution layer can include many convolution operators.
  • the convolution operator is also called the kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, and this weight matrix is usually predefined. In the process of performing convolution on an image, the weight matrix is usually slid along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image.
  • the depth dimension of the weight matrix and the depth dimension of the input image are the same.
  • during the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same type, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image, where the dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image.
  • one weight matrix is used to extract edge information of the image
  • another weight matrix is used to extract specific colors of the image
  • another weight matrix is used to eliminate unwanted noise in the image.
  • the multiple weight matrices have the same size (rows × columns), so the feature maps extracted by the multiple weight matrices of the same size also have the same size, and the multiple extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
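  • For illustration only, the following sketch slides a single weight matrix (kernel) over an image with a configurable stride; the edge-extracting kernel is just one example of a predefined weight matrix:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide the weight matrix (kernel) over the image one pixel at a
    # time (or several pixels at a time, depending on the stride) and
    # take the weighted sum of the covered patch at each position.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])  # a weight matrix that extracts vertical edges
image = np.random.default_rng(0).random((8, 8))
print(conv2d(image, edge_kernel).shape)  # (6, 6)
```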
  • the initial convolutional layer often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network deepens.
  • the features extracted by the later convolutional layers become more and more complex, such as features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • the convolutional layers can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
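  • A minimal sketch of average and maximum pooling over non-overlapping sub-regions (a simplification; real pooling layers may use other window sizes and strides):

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    # Downsample the image: each output pixel is the maximum or the
    # average of the corresponding size x size sub-region of the input.
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            region = image[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(image, mode="max"))   # 2x2 output: maxima of each sub-region
print(pool2d(image, mode="mean"))  # 2x2 output: averages of each sub-region
```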
  • After processing by the convolutional layers/pooling layers, the convolutional neural network is not yet able to output the required output information, because, as mentioned earlier, the convolutional layers/pooling layers only extract features and reduce the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network needs to use the neural network layer to generate the output of one required class or of a group of required classes. Therefore, the neural network layer can include multiple hidden layers, and the parameters contained in these hidden layers can be pre-trained according to the relevant training data of the specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • after the multiple hidden layers in the neural network layer, the output layer of the entire convolutional neural network is also included.
  • the output layer has a loss function similar to the classification cross-entropy, which is specifically used to calculate the prediction error.
  • the back propagation starts to update the weight values and biases of the aforementioned layers so as to reduce the loss of the convolutional neural network, that is, the error between the result output by the convolutional neural network through the output layer and the ideal result.
  • Recurrent Neural Networks are used to process sequence data.
  • In an ordinary neural network, the layers are fully connected from the input layer to the hidden layer to the output layer, while the nodes within each layer are unconnected.
  • Although this ordinary neural network has solved many problems, it is still powerless for many other problems. For example, to predict the next word of a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous output.
  • the specific form of expression is that the network memorizes the previous information and applies it to the calculation of the current output, that is, the nodes between the hidden layers are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the error back-propagation algorithm is also used, but with a difference: if the RNN is unfolded, the parameters, such as W, are shared across time steps, whereas this is not the case with a traditional neural network such as the above example.
  • the output of each step depends not only on the network of the current step, but also on the states of the network in the previous steps. This learning algorithm is called back propagation through time (BPTT).
  • BPTT Back Propagation Through Time
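  • A minimal sketch of an RNN forward pass, illustrating that the hidden state carries the previous information and that the same weight matrices are shared across all time steps (the property exploited by BPTT); all shapes and names are illustrative:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    # The hidden state h_t depends on the current input x_t and on the
    # previous hidden state h_{t-1}; W_xh, W_hh, and W_hy are the same
    # matrices at every time step.
    h, ys = h0, []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        ys.append(W_hy @ h)
    return ys, h

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((5, 3))
W_hh = rng.standard_normal((5, 5))
W_hy = rng.standard_normal((2, 5))
xs = [rng.standard_normal(3) for _ in range(4)]   # a length-4 input sequence
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy, np.zeros(5))
print(len(ys), ys[-1])
```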
  • Taking the loss function as an example, the higher the output value (loss) of the loss function is, the greater the difference between the prediction and the target; the training of the deep neural network then becomes a process of reducing this loss as much as possible.
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the values of the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forwarding the input signal until the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation motion dominated by error loss, and aims to obtain the optimal super-resolution model parameters, such as a weight matrix.
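  • As a toy illustration of a backpropagation motion dominated by error loss, the following sketch performs gradient descent on a one-parameter model (not the super-resolution model itself; all values are illustrative):

```python
# One-parameter model y = w * x with squared-error loss L = (y - t)^2:
# forward the input, measure the loss, backpropagate dL/dw, and move w
# against the gradient so that the error loss converges.
w, lr = 0.1, 0.05
x, t = 2.0, 1.0                 # input and ideal (target) output
for step in range(20):
    y = w * x                   # forward pass
    loss = (y - t) ** 2         # error loss
    grad_w = 2 * (y - t) * x    # backpropagated gradient dL/dw
    w -= lr * grad_w            # update toward smaller loss
print(w, loss)                  # w approaches t / x = 0.5
```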
  • A generative adversarial network (GAN) is a deep learning model.
  • the model includes at least two modules: one module is a generative model, and the other is a discriminative model; the two modules learn from each other through a game to produce better output.
  • Both the generative model and the discriminant model can be a neural network, specifically a deep neural network, or a convolutional neural network.
  • GAN Generative Adversarial Networks
  • D Discriminator
  • G a network that generates pictures
  • G(z) the picture generated by the network G from a random noise z
  • D a discriminant network used to discriminate whether a picture is "real"
  • Its input parameter is x
  • x represents a picture
  • the output D(x) represents the probability that x is a real picture: if it is 1, the picture is 100% real; if it is 0, the picture cannot be real.
  • the goal of the generative network G is to generate pictures as real as possible to deceive the discriminant network D, while the goal of the discriminant network D is to distinguish the pictures generated by G from the real pictures as far as possible.
  • G and D constitute a dynamic "game" process, that is, the "confrontation" in the "generative adversarial network".
  • an excellent generative model G is obtained, which can be used to generate pictures.
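  • A minimal sketch of one training step of a generative adversarial network, assuming PyTorch and toy fully connected models for G and D (sizes and names are illustrative, not the architecture of any embodiment):

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: G maps 16-dim noise z to a 64-dim "picture";
# D outputs the probability that its input is a real picture.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64))
D = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(32, 64)                 # stand-in for real pictures
# D's goal: tell real pictures (label 1) from generated ones (label 0).
fake = G(torch.randn(32, 16)).detach()
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# G's goal: make D believe its pictures are real (label 1).
fake = G(torch.randn(32, 16))
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```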
  • FIGS. 7a-7e show an exemplary architecture of a target model (for example, a neural network for filtering, abbreviated as a filter network).
  • a target model for example, a neural network for filtering, abbreviated as a filter network.
  • the first pixel matrix (the value of the element at the corresponding position in the first pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the reconstructed first image block) and the second pixel matrix (the element at the corresponding position in the second pixel matrix corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block) are input to the filter network.
  • the filter network processes the first pixel matrix through a 3×3 convolutional layer (3×3 Conv) and an activation layer (ReLU), and processes the second pixel matrix through another 3×3 convolutional layer and another activation layer, then merges (concat) the two matrices obtained after the above processing, and then passes through block processing layers (Res-Block), ..., a block processing layer, a 3×3 convolutional layer, an activation layer, and a 3×3 convolutional layer to obtain a residual matrix, where the element at the corresponding position in the residual matrix corresponds to the brightness residual value of the pixel at the corresponding position in the filtered second image block.
  • the third pixel matrix is obtained by adding the values at the corresponding positions of the first pixel matrix and the residual matrix, and the element at the corresponding position in the third pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the second image block.
  • the above-mentioned block processing layer may include a 3×3 convolutional layer, an activation layer, and a 3×3 convolutional layer; after the input matrix is processed by these three layers, the processed matrix and the values at the corresponding positions of the initial input matrix are added to obtain the final output matrix. As shown in Figure 7c, the above-mentioned block processing layer may include a 3×3 convolutional layer, an activation layer, a 3×3 convolutional layer, and an activation layer; after the input matrix is processed by the 3×3 convolutional layer, the activation layer, and the 3×3 convolutional layer, the processed matrix and the values at the corresponding positions of the initial input matrix are added, and the result finally passes through an activation layer to obtain the final output matrix.
  • the first pixel matrix (the value of the element at the corresponding position in the first pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the reconstructed first image block) and the second pixel matrix (the element at the corresponding position in the second pixel matrix corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block) are added or multiplied element-wise at corresponding positions to obtain the input pixel matrix, which is then input into the filter network.
  • the filter network processes the input pixel matrix through a 3×3 convolutional layer, an activation layer, a block processing layer, ..., a block processing layer, a 3×3 convolutional layer, an activation layer, and a 3×3 convolutional layer to obtain a third pixel matrix, and the element at the corresponding position in the third pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the second image block.
  • the first pixel matrix (the value of the element at the corresponding position in the first pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the reconstructed first image block) is input to the filter network, and the pixel matrix obtained by multiplying, at corresponding positions, the values of the first pixel matrix and the second pixel matrix (the element at the corresponding position in the second pixel matrix corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block) is also input to the filter network.
  • the filter network processes one of the inputs through a 3×3 convolutional layer and an activation layer, and processes the other input through another 3×3 convolutional layer and another activation layer, then merges (concat) the two resulting matrices, and then passes through block processing layers, ..., a block processing layer, a 3×3 convolutional layer, an activation layer, and a 3×3 convolutional layer to obtain the residual matrix.
  • the element at the corresponding position in the residual matrix corresponds to the brightness residual value of the pixel at the corresponding position in the filtered second image block.
  • the third pixel matrix is obtained by adding the pixel values at the corresponding positions of the first pixel matrix and the residual matrix, and the pixel points at the corresponding positions in the third pixel matrix correspond to the brightness values of the pixels at the corresponding positions in the second image block.
  • the convolutional neural networks shown in Figures 7a-7e are only a few examples of convolutional neural networks; in specific applications, convolutional neural networks can also exist in the form of other network models, which is not specifically limited in this application.
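  • For illustration only, the following PyTorch sketch approximates the two-branch structure described above (one convolutional branch per input, concat, block processing layers, and a residual addition); the channel count, the number of block processing layers, and the class names are assumptions, not the trained target model itself:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Block processing layer: 3x3 conv, ReLU, 3x3 conv, then add the
    # block input back in (the structure described for Figure 7b).
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class FilterNet(nn.Module):
    def __init__(self, ch=32, n_blocks=4):   # assumed sizes
        super().__init__()
        self.branch_rec = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.branch_qp = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.trunk = nn.Sequential(
            *[ResBlock(2 * ch) for _ in range(n_blocks)],
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))
    def forward(self, first_pixel_matrix, second_pixel_matrix):
        a = self.branch_rec(first_pixel_matrix)    # reconstructed-luma branch
        b = self.branch_qp(second_pixel_matrix)    # quantization-step branch
        residual = self.trunk(torch.cat([a, b], dim=1))  # concat, res blocks, convs
        return first_pixel_matrix + residual       # third pixel matrix

net = FilterNet()
rec = torch.rand(1, 1, 64, 64)      # first pixel matrix (luma values)
qstep = torch.rand(1, 1, 64, 64)    # second pixel matrix (quantization steps)
print(net(rec, qstep).shape)        # torch.Size([1, 1, 64, 64])
```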
  • FIG. 8 is a flowchart showing a process 800 of a loop filtering method according to an embodiment of the present application.
  • the process 800 may be executed by the video encoder 20 or the video decoder 30, and specifically, may be executed by the loop filters 220 and 320 of the video encoder 20 or the video decoder 30.
  • the process 800 is described as a series of steps or operations. It should be understood that the steps of the process 800 may be executed in various orders and/or occur simultaneously, and are not limited to the execution order shown in FIG. 8. Assuming that a video data stream with multiple video frames is being processed by a video encoder or a video decoder, the process 800 including the following steps is executed to filter the reconstructed image or image block.
  • the process 800 may include:
  • Step 801: Obtain a first pixel matrix, where the element at the corresponding position in the first pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the first image block, and the first image block is a reconstructed image block or an image block in a reconstructed image.
  • In the encoder, a certain image block (for example, the first image block) can be an image block in a reconstructed image obtained by dequantizing and reconstructing the encoding result of an image, or can be a reconstructed image block obtained by dequantizing and reconstructing the encoding result of an image block.
  • An image block (for example, the first image block) can be understood as a pixel matrix X, and an element at a corresponding position in the pixel matrix X can be understood as a pixel point (or pixel value, where the pixel value includes the brightness value of the pixel or the chroma value of the pixel) at the corresponding position in the image block.
  • For example, the size of the image block is 64×64, which means that the pixel point distribution of the image block is 64 rows × 64 columns, and x(i,j) represents the pixel point (or pixel value) in the i-th row and j-th column of the image block.
  • the input pixel matrix A includes 64 rows and 64 columns, with a total of 64×64 elements, and A(i,j) represents the element in the i-th row and j-th column of the pixel matrix A.
  • A(i,j) and x(i,j) correspond to each other (for example, A(i,j) represents the value of the pixel point x(i,j)), and the element at the corresponding position in the input pixel matrix A corresponds to (for example, represents) the brightness value of the pixel point at the corresponding position in the image block, that is, the value of the element A(i,j) is the brightness value of the pixel point x(i,j).
  • the element at the corresponding position in the input pixel matrix A may also correspond to (for example, represent) another value of the pixel at the corresponding position in the image block, that is, the value of the element A(i,j) can be another value of the pixel point x(i,j), such as the quantization step value corresponding to the brightness value of the pixel point x(i,j) (as described in step 802), or, for another example, the chroma value of the pixel point x(i,j).
  • when the element at the corresponding position in the input pixel matrix A represents the brightness value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the aforementioned first pixel matrix; or, when the element at the corresponding position in the input pixel matrix A represents the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the second pixel matrix below; it should be understood that when the element at the corresponding position in the input pixel matrix A represents the chromaticity value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the fifth pixel matrix below; or, when the element at the corresponding position in the input pixel matrix A represents the quantization step value corresponding to the chromaticity value of the pixel at the corresponding position in the image block, the input pixel matrix A is an example of the sixth pixel matrix below.
  • the above-mentioned first image block may be an image block in a reconstructed image reconstructed by an encoder or a decoder, or may be a reconstructed image block reconstructed by an encoder or a decoder.
  • the loop filtering method in the embodiments of this application includes, but is not limited to, filtering a reconstructed image block; it should be understood that the method can also be applied to filtering a reconstructed image, in which case the reconstructed image block is adaptively replaced by the reconstructed image, and details are not repeated here.
  • the first image block and the second image block in step 803 may also be in RGB format; in this case, the element at the corresponding position in the first pixel matrix may correspond to (for example, represent) the R value, G value, or B value of the pixel at the corresponding position in the first image block, and the element at the corresponding position in the second pixel matrix may correspond to (for example, represent) the quantization step value corresponding to the R value, G value, or B value of the pixel at the corresponding position in the first image block.
  • Step 802: Obtain a second pixel matrix, where the element at the corresponding position in the second pixel matrix corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block, and the size of the second pixel matrix is equal to the size of the first pixel matrix.
  • the encoding process of the first image block by the encoder includes a quantization operation on the residual information.
  • This process involves a quantization step value, and each pixel in the first image block corresponds to a quantization step value.
  • the elements in the second pixel matrix are used to characterize the aforementioned quantization step value.
  • This application uses a filter network to implement filtering while introducing the quantization step value of each pixel of the reconstructed image block, so that the pixel matrix corresponding to the image block input to the filter network can be better filtered, improving the filtering effect.
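  • As an illustration, the second pixel matrix can be built by writing, at each position, the quantization step actually used for that pixel; the sketch below assumes a hypothetical 64×64 block whose two halves were coded with different QPs, and an HEVC-style QP-to-step mapping:

```python
import numpy as np

def qp_to_step(qp):
    # Assumed HEVC-style mapping (Qstep ~ 2^((QP-4)/6)).
    return 2.0 ** ((qp - 4) / 6.0)

# Each element carries the quantization step of the corresponding pixel.
second_pixel_matrix = np.empty((64, 64))
second_pixel_matrix[:, :32] = qp_to_step(27)  # left half coded with QP 27
second_pixel_matrix[:, 32:] = qp_to_step(32)  # right half coded with QP 32
print(second_pixel_matrix.shape)  # equal in size to the first pixel matrix
```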
  • Step 803: Perform filtering processing on the input pixel matrix through a filter network to obtain an output pixel matrix.
  • the filter network is a neural network with filtering function obtained through training.
  • the output pixel matrix includes a third pixel matrix, where the element at the corresponding position in the third pixel matrix corresponds to the brightness value of the pixel at the corresponding position in the second image block or the brightness residual value of that pixel, and the second image block is the image block obtained by filtering the first image block, wherein the input pixel matrix is related to at least the first pixel matrix and the second pixel matrix.
  • the input pixel matrix includes the first pixel matrix and the second pixel matrix; or, the input pixel matrix is a first preprocessing matrix obtained by preprocessing the first pixel matrix and the second pixel matrix; or, the input pixel matrix includes a normalized matrix of the first pixel matrix and a normalized matrix of the second pixel matrix; or, the input pixel matrix is a second preprocessing matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  • the normalized matrix refers to a matrix obtained by normalizing the values of elements at corresponding positions in the corresponding matrix.
  • the input pixel matrix is the processing object of the filter network; the input pixel matrix may include the acquired first pixel matrix and second pixel matrix, that is, the two pixel matrices are directly input into the filter network for filtering.
  • Before the input pixel matrix is input to the filter network, it can also be obtained by preprocessing and/or normalizing one or more pixel matrices, according to the form of the training data used when the filter network was trained and the processing capability of the filter network.
  • the purpose of the normalization processing is to adjust the value of the element to a uniform value interval, such as [0,1] or [-0.5,0.5], so that the calculation efficiency can be improved in the calculation of the filter network.
  • Preprocessing can include matrix addition, matrix multiplication, matrix merging (concat), etc., which can reduce the calculation amount of the filter network.
  • the input pixel matrix can include the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix, that is, the first pixel matrix and the second pixel matrix are respectively normalized, and the normalized matrices are then input into the filter network for processing.
  • the input pixel matrix may be a preprocessing matrix obtained by adding, multiplying or combining the first pixel matrix and the second pixel matrix.
  • the input pixel matrix may be a preprocessing matrix obtained by adding, multiplying, or merging the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  • Matrix addition means adding the values of the elements at the corresponding positions of the two matrices.
  • Matrix multiplication means multiplying the values of the elements at the corresponding positions of the two matrices.
  • Matrix merging means increasing the number of channels of the matrix. For example, if one matrix is a two-dimensional matrix with a size of m×n and the other matrix is also a two-dimensional matrix with a size of m×n, the two matrices are combined to obtain a three-dimensional matrix whose size is m×n×2.
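  • The three preprocessing operations can be illustrated as follows (element-wise addition, element-wise multiplication of corresponding elements as defined above, and merging along a new channel dimension; values are illustrative):

```python
import numpy as np

m, n = 4, 4
first = np.random.default_rng(0).random((m, n))    # e.g. luma values
second = np.full((m, n), 0.25)                     # e.g. quantization steps

added = first + second              # matrix addition: sum of corresponding elements
multiplied = first * second         # matrix multiplication here means the
                                    # element-wise product of corresponding elements
merged = np.stack([first, second], axis=-1)  # matrix merging: adds a channel
print(added.shape, multiplied.shape, merged.shape)  # (4, 4) (4, 4) (4, 4, 2)
```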
  • the output pixel matrix B output by the filter network corresponds to the filtered image block (for example, the second image block), that is, the element B(i,j) in the output pixel matrix corresponds to the pixel y(i,j) in the filtered image block; in an example, the value of the element B(i,j) can represent the brightness value of the pixel y(i,j).
  • the element at the corresponding position in the output pixel matrix B may also correspond to (for example, represent) another value of the pixel at the corresponding position in the filtered image block, that is, the value of the element B(i,j) can be another value of the pixel y(i,j), such as the brightness residual value of the pixel y(i,j), the chromaticity value of the pixel y(i,j), or, for another example, the chromaticity residual value of the pixel y(i,j), which is not specifically limited in this application.
  • when the element at the corresponding position in the output pixel matrix B represents the brightness value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the third pixel matrix; or, when the element at the corresponding position in the output pixel matrix B represents the brightness residual value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is another example of the third pixel matrix; it should be understood that when the element at the corresponding position in the output pixel matrix B represents the chromaticity value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the seventh pixel matrix; or, when the element at the corresponding position in the output pixel matrix B represents the chrominance residual value of the pixel at the corresponding position in the filtered image block, the output pixel matrix B is an example of the eighth pixel matrix.
  • the input pixel matrix may include the first pixel matrix and the second pixel matrix, as shown in FIG. 9a; or, the input pixel matrix may be a matrix obtained based on at least the first pixel matrix and the second pixel matrix, as shown in FIG. 9b, 9c, or 9d.
  • Before being input to the filter network, the first pixel matrix and the second pixel matrix may be normalized; for example, if the value range of each element in the first pixel matrix is 0 to 255, these values can be normalized to 0 to 1 or -0.5 to 0.5, that is, the input pixel matrix is a normalized matrix. Correspondingly, the third pixel matrix needs to be denormalized, for example, the values of the elements in the third pixel matrix are denormalized to be between 0 and 255.
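  • A minimal sketch of the normalization and denormalization described above, assuming an element value range of 0 to 255 mapped to [0, 1]:

```python
import numpy as np

def normalize(matrix, lo=0.0, hi=255.0):
    # Map element values from [0, 255] into the uniform interval [0, 1].
    return (matrix - lo) / (hi - lo)

def denormalize(matrix, lo=0.0, hi=255.0):
    # Inverse mapping, applied to the third pixel matrix after filtering.
    return matrix * (hi - lo) + lo

first = np.array([[0.0, 128.0], [255.0, 64.0]])
x = normalize(first)
print(denormalize(x))   # recovers the original 0-255 values
```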
  • the input pixel matrix may include a first pixel matrix, a second pixel matrix, and a fifth pixel matrix, as shown in FIG. 9e; or, the input pixel matrix may include a first preprocessing matrix obtained by preprocessing the first pixel matrix and the second pixel matrix, together with the fifth pixel matrix, as shown in FIG. 9g.
  • the input pixel matrix may include a first pixel matrix, a second pixel matrix, a fifth pixel matrix, and a sixth pixel matrix, as shown in FIG. 9f; or, the input pixel matrix may include a first preprocessing matrix and a third preprocessing matrix obtained by preprocessing the fifth pixel matrix and the sixth pixel matrix, as shown in FIG.; or, the input pixel matrix may include the first preprocessing matrix, the fifth pixel matrix, and the sixth pixel matrix; or, the input pixel matrix may include the first pixel matrix, the second pixel matrix, and the third preprocessing matrix.
  • Before being input to the filter network, the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix may be normalized; for example, if the value range of each element in the first pixel matrix is 0 to 255, these values can be normalized to 0 to 1 or -0.5 to 0.5, that is, the input pixel matrix is a normalized matrix. Correspondingly, the output pixel matrix is denormalized, for example, the values of the elements in the seventh pixel matrix are denormalized to be between 0 and 255.
  • Figures 9a-9l show several examples of the input pixel matrix of the filter network.
  • the first pixel matrix and the second pixel matrix are directly input to the filter network.
  • the first pixel matrix is N×M, and the element at the corresponding position corresponds to the brightness value of the pixel at the corresponding position in the first image block; the second pixel matrix is N×M, and the element at the corresponding position corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block.
  • the input pixel matrix includes a first pixel matrix and a second pixel matrix.
  • the first pixel matrix and the second pixel matrix are added together and then input to the filter network.
  • the first pixel matrix is N×M, and the element at the corresponding position corresponds to the brightness value of the pixel at the corresponding position in the first image block; the second pixel matrix is N×M, and the element at the corresponding position corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block.
  • the values of the elements at the corresponding positions in the first pixel matrix and the second pixel matrix are added to obtain an N×M first preprocessing matrix.
  • the input pixel matrix is the first preprocessing matrix.
  • the input pixel matrix is input to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at the corresponding position in the third pixel matrix may correspond to the brightness value or the brightness residual value of the pixel at the corresponding position in the second image block.
  • the first pixel matrix and the second pixel matrix are combined and then input to the filter network.
  • the first pixel matrix is N×M, and the element at the corresponding position corresponds to the brightness value of the pixel at the corresponding position in the first image block; the second pixel matrix is N×M, and the element at the corresponding position corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block.
  • the first pixel matrix and the second pixel matrix are combined to obtain a first preprocessing matrix of N×M×2.
  • the input pixel matrix is the first preprocessing matrix.
  • the input pixel matrix is input to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at the corresponding position in the third pixel matrix may correspond to the brightness value or the brightness residual value of the pixel at the corresponding position in the second image block.
  • the first pixel matrix and the second pixel matrix are multiplied and then input to the filter network.
  • the first pixel matrix is N×M, and the element at the corresponding position corresponds to the brightness value of the pixel at the corresponding position in the first image block; the second pixel matrix is N×M, and the element at the corresponding position corresponds to the quantization step value corresponding to the brightness value of the pixel at the corresponding position in the first image block.
  • the values of the elements at the corresponding positions in the first pixel matrix and the second pixel matrix are multiplied to obtain an N×M first preprocessing matrix, and the input pixel matrix is the first preprocessing matrix.
  • the input pixel matrix is input to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at the corresponding position in the third pixel matrix may correspond to the brightness value or the brightness residual value of the pixel at the corresponding position in the second image block.
  • As shown in Figure 9e, the first pixel matrix, the second pixel matrix, and the fifth pixel matrix are directly input to the filter network. The first and second pixel matrices are defined as in Figure 9a; the fifth pixel matrix is N×M, and the element at each position corresponds to the chrominance value of the pixel at the corresponding position in the first image block. In this case, the input pixel matrix includes the first, second, and fifth pixel matrices. The three matrices are fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position corresponds to the luminance value or the luminance residual value of the pixel at the corresponding position in the second image block. Optionally, a seventh or an eighth pixel matrix may also be output, where the element at each position in the seventh pixel matrix corresponds to the chrominance value, and the element at each position in the eighth pixel matrix corresponds to the chrominance residual value, of the pixel at the corresponding position in the second image block.
  • As shown in Figure 9f, the first pixel matrix and the second pixel matrix are concatenated and then input to the filter network, while the fifth pixel matrix is directly input to the filter network. Concatenating the first and second pixel matrices yields an N×M×2 first preprocessing matrix, so the input pixel matrix includes the first preprocessing matrix and the fifth pixel matrix. The input pixel matrix is fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position may correspond to the luminance value or the luminance residual value of the pixel at the corresponding position in the second image block. Optionally, the seventh or the eighth pixel matrix may also be output, defined as in Figure 9e.
  • As shown in Figure 9g, the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix are directly input to the filter network. The first, second, and fifth pixel matrices are defined as above; the sixth pixel matrix is N×M, and the element at each position corresponds to the quantization step value corresponding to the chrominance value of the pixel at the corresponding position in the first image block. In this case, the input pixel matrix includes the first, second, fifth, and sixth pixel matrices. The four matrices are fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position corresponds to the luminance value or the luminance residual value of the pixel at the corresponding position in the second image block. Optionally, the seventh or the eighth pixel matrix may also be output, defined as in Figure 9e.
  • As shown in Figure 9h, the first pixel matrix and the second pixel matrix are concatenated and then input to the filter network, and the fifth pixel matrix and the sixth pixel matrix are likewise concatenated and then input to the filter network, with the four matrices defined as in Figure 9g. Concatenating the first and second pixel matrices yields an N×M×2 first preprocessing matrix, and concatenating the fifth and sixth pixel matrices yields an N×M×2 third preprocessing matrix, so the input pixel matrix includes the first preprocessing matrix and the third preprocessing matrix. The input pixel matrix is fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position may correspond to the luminance value or the luminance residual value of the pixel at the corresponding position in the second image block. Optionally, the seventh or the eighth pixel matrix may also be output, defined as in Figure 9e.
  • As shown in Figure 9i, the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix are directly input to the filter network. The normalized matrix of the first pixel matrix is N×M, and the element at each position corresponds to the normalized value of the luminance value of the pixel at the corresponding position in the first image block; the normalized matrix of the second pixel matrix is N×M, and the element at each position corresponds to the normalized value of the quantization step value corresponding to that luminance value. In this case, the input pixel matrix includes the two normalized matrices.
  • As shown in Figure 9j, the normalized matrices of the first and second pixel matrices, defined as in Figure 9i, are added together and then input to the filter network. Adding the values of the elements at corresponding positions in the two normalized matrices yields an N×M second preprocessing matrix, which serves as the input pixel matrix. The input pixel matrix is fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position may correspond to the normalized value of the luminance value or of the luminance residual value of the pixel at the corresponding position in the second image block; denormalization is required before these elements again correspond to the luminance value or luminance residual value itself. A sketch of this normalization round trip follows below.
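A minimal sketch of the normalization round trip described in Figures 9i-9j, assuming 8-bit samples mapped to the unified range [0, 1]; the scale factor and the clipping are assumptions, since the patent also permits ranges such as [-0.5, 0.5].

```python
import numpy as np

def normalize(m, max_val=255.0):
    # Map raw sample values into the unified range [0, 1] before filtering.
    return m / max_val

def denormalize(m, max_val=255.0):
    # Invert the mapping on the network output so that elements again
    # represent luminance (or luminance residual) values.
    return np.clip(m * max_val, 0.0, max_val)

first = np.random.randint(0, 256, (4, 4)).astype(np.float32)
restored = denormalize(normalize(first))   # the round trip recovers the input
```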
  • As shown in Figure 9k, the normalized matrices of the first, second, fifth, and sixth pixel matrices are directly input to the filter network. Each normalized matrix is N×M, and the element at each position corresponds, respectively, to the normalized value of the luminance value, of the quantization step value for the luminance value, of the chrominance value, and of the quantization step value for the chrominance value, of the pixel at the corresponding position in the first image block. The input pixel matrix includes these four normalized matrices. They are fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position corresponds to the normalized value of the luminance value or luminance residual value of the pixel at the corresponding position in the second image block and must be denormalized before it corresponds to the luminance value or luminance residual value itself. Optionally, the seventh or the eighth pixel matrix may also be output, where the elements correspond to the normalized values of the chrominance values or of the chrominance residual values of the pixels at the corresponding positions in the second image block.
  • As shown in Figure 9l, the normalized matrices of the first and second pixel matrices are concatenated and then input to the filter network, and the normalized matrices of the fifth and sixth pixel matrices are likewise concatenated and then input to the filter network, with the four normalized matrices defined as in Figure 9k. Concatenating the normalized matrices of the first and second pixel matrices yields an N×M×2 second preprocessing matrix, and concatenating the normalized matrices of the fifth and sixth pixel matrices yields an N×M×2 fourth preprocessing matrix, so the input pixel matrix includes the second preprocessing matrix and the fourth preprocessing matrix. The input pixel matrix is fed to the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where the element at each position may correspond to the normalized value of the luminance value or luminance residual value of the pixel at the corresponding position in the second image block and must be denormalized, as in Figure 9k. Optionally, the seventh or the eighth pixel matrix may also be output, defined as in Figure 9k.
  • Figures 9a-9l show examples of the data input to the filter network. This application does not specifically limit the data input to the filter network, nor the preprocessing performed before the data enters the filter network; the preprocessing of two pixel matrices may also include element-wise addition and multiplication, among others. In addition, the preprocessing applied to the pixel matrices may itself be implemented by the filter network.
  • This application uses a filter network to perform filtering while introducing the quantization step value of each pixel of the reconstructed image block, so that the filtering of the pixel matrices input to the filter network can be better guided, and the filtering effect can be improved for reconstructed images of various qualities.
  • FIG. 10 is a schematic structural diagram of a decoding device 1000 according to an embodiment of the present application. The decoding device 1000 may correspond to the video encoder 20 or the video decoder 30 and includes a reconstruction module 1001, a quantization/dequantization module 1002, and a loop filter module 1003. The reconstruction module 1001 is used to obtain the first pixel matrix, the quantization/dequantization module 1002 is used to obtain the second pixel matrix, and the loop filter module 1003 is used to implement the method embodiment shown in FIG. 8. In an example, the reconstruction module 1001 may correspond to the reconstruction unit 214 in FIG. 2 or the reconstruction unit 314 in FIG. 3; the quantization/dequantization module 1002 may correspond to the quantization unit 208 in FIG. 2 or the dequantization unit 310 in FIG. 3; and the loop filter module 1003 may correspond to the loop filter module 220 in FIG. 2 or the loop filter module 320 in FIG. 3.
  • A computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory, tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
  • By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then coaxial cable, fiber-optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described in the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques could be fully implemented in one or more circuits or logic elements.
  • The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application provides a loop filtering method and apparatus, relating to the field of artificial intelligence (AI) based video or image compression, and in particular to the field of neural network based video compression. The method includes: obtaining a first pixel matrix, where the value of the element at each position in the first pixel matrix corresponds to the luminance value of the pixel at the corresponding position in a first image block; obtaining a second pixel matrix, where the element at each position in the second pixel matrix corresponds to the quantization step value corresponding to the luminance value of the pixel at the corresponding position in the first image block; and performing filtering on an input pixel matrix through a filter network to obtain an output pixel matrix, where the filter network is a trained neural network having a filtering function, the output pixel matrix includes a third pixel matrix, and the input pixel matrix is related at least to the first pixel matrix and the second pixel matrix. This application can improve the filtering effect for reconstructed images of various qualities.

Description

环路滤波方法和装置
本申请要求于2020年6月10日提交中国专利局、申请号为202010525274.8、申请名称为“环路滤波方法和装置”,以及于2020年9月27日提交中国专利局、申请号为202011036512.5、申请名称为“环路滤波方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明实施例涉及基于人工智能(AI)的视频或图像压缩技术领域,尤其涉及一种环路滤波方法及装置。
背景技术
视频编码(视频编码和解码)广泛用于数字视频应用,例如广播数字电视、互联网和移动网络上的视频传输、视频聊天和视频会议等实时会话应用、DVD和蓝光光盘、视频内容采集和编辑系统以及可携式摄像机的安全应用。
即使在影片较短的情况下也需要对大量的视频数据进行描述,当数据要在带宽容量受限的网络中发送或以其它方式传输时,这样可能会造成困难。因此,视频数据通常要先压缩然后在现代电信网络中传输。由于内存资源可能有限,当在存储设备上存储视频时,视频的大小也可能成为问题。视频压缩设备通常在信源侧使用软件和/或硬件,以在传输或存储之前对视频数据进行编码,从而减少用来表示数字视频图像所需的数据量。然后,压缩的数据在目的地侧由视频解压缩设备接收。在有限的网络资源以及对更高视频质量的需求不断增长的情况下,需要改进压缩和解压缩技术,这些改进的技术能够提高压缩率而几乎不影响图像质量。
近年来,将深度学习应用于在图像和视频编解码领域逐渐成为一种趋势。采用混合架构的视频编码器和视频解码器中,可以通过环路滤波模块去除重建图像中的块效应、振铃效应等编码失真。相关技术中借由神经网络实现环路滤波模块的滤波功能,进而对输入至神经网络的重建图像或重建图像块信息进行滤波处理,得到滤波后的重建图像或重建图像块。但是该方法对不同质量的输入图像或图像块无法均达到良好的滤波效果。
发明内容
本申请提供一种环路滤波方法及装置,能够针对各种质量的重构图像提升滤波效果。
上述和其它目标通过独立权利要求的主题实现。其它实现方式在从属权利要求、具体实施方式和附图中显而易见。
具体实施例在所附独立权利要求中概述,其它实施例在从属权利要求中概述。
第一方面,本申请涉及环路滤波方法。所述方法由编码器或解码器中的环路滤波器执行。所述方法包括:
获取第一像素矩阵,所述第一像素矩阵中的对应位置的元素对应于(例如表示)第一图像块中的对应位置的像素的亮度值,所述第一图像块为重建图像块或者重建图像中的图像块;获取第二像素矩阵,所述第二像素矩阵中的对应位置的元素对应于(例如表示)所 述第一图像块中的对应位置的像素的亮度值所对应的量化步长值,所述第二像素矩阵的尺寸和所述第一像素矩阵的尺寸相等;通过滤波网络对输入像素矩阵进行滤波处理得到输出像素矩阵,所述滤波网络为经训练得到的具有滤波功能的神经网络,所述输出像素矩阵包括第三像素矩阵,所述第三像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的亮度值或者像素的亮度残差值,所述第二图像块为所述第一图像块经滤波后得到的图像块,其中所述输入像素矩阵至少与所述第一像素矩阵和所述第二像素矩阵相关。
某个图像块(例如第一图像块)可以理解为像素矩阵X,该像素矩阵X中的对应位置的元素可以理解为该图像块中的对应位置的像素点(或者像素值,例如像素值包括像素的亮度值或像素的色度值)。示例性的,该图像块的尺寸为64×64,表示该图像块的像素点分布为64行×64列,x(i,j)表示该图像块中的第i行、第j列的像素点(或者像素值)。与之对应,输入像素矩阵A包括64行和64列,共有64×64个元素,A(i,j)表示该像素矩阵A中的第i行、第j列的元素。A(i,j)和x(i,j)对应(例如A(i,j)表示像素点x(i,j)的值),输入像素矩阵A中的对应位置的元素对应于(例如表示)该图像块中的对应位置的像素点的亮度值,即表示元素A(i,j)的取值是像素点x(i,j)的亮度值。可选的,在另一种示例下,输入像素矩阵A中的对应位置的元素也可以对应于(例如表示)该图像块中的对应位置的像素的其他值,即元素A(i,j)的取值可以是像素点x(i,j)的其他值,例如像素点x(i,j)的亮度值对应的量化步长值,又例如像素点x(i,j)的色度值,又例如像素点x(i,j)的色度值对应的量化步长值,又例如像素点x(i,j)的亮度残差值,又例如像素点x(i,j)的色度残差值等,对此本申请不做具体限定。应当理解的是,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的亮度值,输入像素矩阵A就是前述第一像素矩阵的示例;或者,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的亮度值对应的量化步长值,输入像素矩阵A就是前述第二像素矩阵的示例;应当理解的是,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的色度值,输入像素矩阵A就是第五像素矩阵的示例;或者,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的色度值对应的量化步长值,输入像素矩阵A就是第六像素矩阵的示例。
同理,滤波网络输出的输出像素矩阵B和经过滤波后的图像块(例如第二图像块)对应,即输出像素矩阵中的元素B(i,j)和经过滤波后的图像块中的像素y(i,j)对应,在一种示例下,元素B(i,j)的值可以表示像素y(i,j)的亮度值。可选的,在另一种示例下,像素矩阵B中的对应位置的元素也可以对应于(例如表示)该经过滤波后的图像块中的对应位置的像素的其他值,即元素B(i,j)的取值可以是像素点y(i,j)的其他值,例如像素点y(i,j)的亮度残差值,又例如像素点y(i,j)的色度值,又例如像素点y(i,j)的色度残差值等,对此本申请不做具体限定。应当理解的是,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的亮度值,输出像素矩阵B是第三像素矩阵的示例;或者,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的亮度残差值,输出像素矩阵B是第三像素矩阵的另一种示例;应当理解的是,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的色度值,输出像素矩阵B是第七像素矩阵的示例;或者,当输出像素矩阵B中的对应位置的元素表示该经过滤波后的图像块中的对应位置的像素的色度残差值,输出像素矩阵B是第八像素矩阵 的示例。
上述第一图像块可以是编码器或解码器重建得到的重建图像中的一个图像块,也可以是编码器或解码器重建得到的重建图像块。本申请实施例的环路滤波方法包括但不限于对重建图像块进行滤波处理,应当理解的是,也可以适用于对重建图像进行滤波处理,即将本申请实施例的方法中的“重建图像块”适应性替换为重建图像,这里不再赘述。
需要说明的是,第一图像块和第二图像块还可以采用RGB格式,此时第一像素矩阵中的对应位置的元素可以对应于(例如表示)第一图像块中的对应位置的像素的R值、G值或者B值,第二像素矩阵中的对应位置的元素可以对应于(例如表示)第一图像块中的对应位置的像素的R值、G值或者B值对应的量化步长值,或者三者共同采用的量化步长值。
本申请采用滤波网络实现滤波的同时引入了重建图像块的每个像素的量化步长值,从而能够更好的对输入滤波网络的对应于图像块的像素矩阵进行滤波处理,提升滤波效果。
在一种可能的实现方式中,所述输入像素矩阵包括所述第一像素矩阵和所述第二像素矩阵;或者,所述输入像素矩阵为对所述第一像素矩阵和所述第二像素矩阵进行预处理得到的第一预处理矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵;或者,所述输入像素矩阵为对所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵进行预处理得到的第二预处理矩阵。所述归一化矩阵是指将对应的矩阵中的对应位置的元素的取值进行归一化处理得到的矩阵。
输入像素矩阵是滤波网络的处理对象,输入像素矩阵可以包括获取到的上述第一像素矩阵和第二像素矩阵,即直接将这两个像素矩阵输入滤波网络进行滤波处理。
输入像素矩阵在被输入滤波网络之前,也可以是根据滤波网络在训练时的训练数据形式,以及滤波网络的处理能力,对一个或多个像素矩阵进行预处理和/或归一化处理得到的像素矩阵。其中,归一化处理的目的是为了使元素的值调整为一个统一的取值区间,例如[0,1]或者[-0.5,0.5],这样在滤波网络的计算中可以提高运算效率。预处理可以包括矩阵相加、矩阵相乘、矩阵合并(contact)等,这样可以减少滤波网络的计算量。因此输入像素矩阵可以包括第一像素矩阵的归一化矩阵和第二像素矩阵的归一化矩阵,即将第一像素矩阵和第二像素矩阵分别进行归一化处理,再将其归一化矩阵输入滤波网络进行处理。或者,输入像素矩阵可以是对第一像素矩阵和第二像素矩阵进行相加、相乘或合并后得到的预处理矩阵。又或者,输入像素矩阵可以是对第一像素矩阵的归一化矩阵和第二像素矩阵的归一化矩阵进行相加、相乘或合并后得到的预处理矩阵。
矩阵相加表示将两个矩阵对应位置的元素的值相加。矩阵相乘表示将两个矩阵对应位置的元素的值相乘。矩阵合并(contact)表示将矩阵的通道数增加,例如,一个矩阵是二维矩阵,其尺寸为m×n,另一个矩阵也是二维矩阵,其尺寸也是m×n,这两个矩阵合并得到的是三维矩阵,其尺寸为m×n×2。
在一种可能的实现方式中,所述第三像素矩阵中的对应位置的元素表示所述第二图像块中的对应位置的像素的亮度值。
在一种可能的实现方式中,所述第三像素矩阵中的对应位置的元素表示所述第二图像块中的对应位置的像素的亮度残差值;所述方法还包括:将所述第一像素矩阵和所述第三像素矩阵中的对应位置的元素的值相加得到第四像素矩阵,所述第四像素矩阵中的对应位 置的元素对应于(例如表示)所述第二图像块中的对应位置的像素的亮度值。
根据滤波网络在训练时的训练数据形式,以及滤波网络的处理能力,输出滤波网络的第三像素矩阵中的元素可以表示两个含义,其中一个是对应于第二图像块中对应位置的像素的亮度值,即滤波网络可以直接输出表征滤波后的第二图像块的亮度值的第三像素矩阵;其中另一个是第二图像块中对应位置的像素的亮度残差值,即滤波网络输出表征滤波后的第二图像块的亮度的残差值的第三像素矩阵,此时需要对该第三像素矩阵进行进一步处理才能得到滤波后的第二图像块的亮度值的第四像素矩阵,该处理是将未经滤波的第一图像块的表征亮度值的第一像素矩阵和表征滤波后的第二图像块的亮度残差值的第三像素矩阵相加,得到表征滤波后的第二图像块的亮度值的第四像素矩阵。这样不对滤波网络做具体限定,可以是滤波网络存在更多的可能性。
在一种可能的实现方式中,当所述输入像素矩阵为归一化矩阵时,所述方法还包括:对所述第三像素矩阵中的对应位置的元素的值进行反归一化处理。
在一种可能的实现方式中,当所述输入像素矩阵经过归一化处理时,所述将所述第一像素矩阵和所述第三像素矩阵中的对应位置的元素的值相加得到第四像素矩阵,包括:将所述第一像素矩阵和经过反归一化处理的第三像素矩阵中的对应位置的元素的值相加得到所述第四像素矩阵。
如上所述,如果在将矩阵输入滤波网络之前,对其进行了归一化处理,将矩阵中的所有元素的值规范至同一个区间内,这是为了提高运算效率,但元素的值已经不能表征图像块中的对应位置的像素的具有含义的值。因此为了适应后续的图像处理过程,在滤波网络输出矩阵后,需要对该矩阵进行反归一化处理,恢复矩阵中的元素所表征的含义。
在一种可能的实现方式中,所述方法还包括:获取第五像素矩阵,所述第五像素矩阵中的对应位置的元素对应于(例如表示)所述第一图像块中的对应位置的像素的色度值;相应地,所述输入像素矩阵至少与所述第一像素矩阵、所述第二像素矩阵和所述第五像素矩阵相关。在一种可能的实现方式中,所述输入像素矩阵包括所述第一像素矩阵、所述第二像素矩阵和所述第五像素矩阵;或者,所述输入像素矩阵包括对所述第一像素矩阵和所述第二像素矩阵进行预处理得到的第一预处理矩阵和所述第五像素矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵的归一化矩阵、所述第二像素矩阵的归一化矩阵和所述所第五像素矩阵的归一化矩阵;或者,所述输入像素矩阵包括对所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵进行预处理得到的第二预处理矩阵和所述第五像素矩阵的归一化矩阵。
除了表征第一图像块中的像素的亮度值和亮度值对应的量化步长值的两个像素矩阵外,本申请还可以在滤波网络中加入表征第一图像块中的像素的色度值作为滤波的参考因素。第一图像块中的像素的亮度值和色度值具有相关性,因此在滤波时,像素的亮度值和色度值可以对彼此起到辅助滤波的效果。因此训练滤波网络时,训练数据中加上图像块的色度矩阵,这样滤波网络可以在滤波时再参考色度信息,提高滤波效果。因此本申请还可以在输入像素矩阵中加上表征第一图像块中的像素的色度值的第五像素矩阵的因素。而输入像素矩阵的具体内容和获取方式可参见上述描述,此处不再赘述。
在一种可能的实现方式中,所述方法还包括:获取第六像素矩阵,所述第六像素矩阵中的对应位置的元素对应于(例如表示)所述第一图像块中的对应位置的像素的色度值对 应的量化步长值;相应地,所述输入像素矩阵至少与所述第一像素矩阵、所述第二像素矩阵、所述第五像素矩阵和所述第六像素矩阵相关。
除了表征第一图像块中的像素的亮度值、亮度值对应的量化步长值以及色度值的三个像素矩阵外,本申请还可以在滤波网络中加入表征第一图像块中的像素的色度值对应的量化步长作为滤波的参考因素。训练滤波网络时,训练数据中再加上图像块的色度值对应的量化步长矩阵,这样滤波网络可以在滤波时基于该量化步长矩阵进一步细化神经网络的滤波功能,达到更精确的滤波效果。因此本申请还可以在输入像素矩阵中加上表征第一图像块中的像素的色度值对应的量化步长值的第六像素矩阵的因素。而输入像素矩阵的具体内容和获取方式可参见上述描述,此处不再赘述。
在一种可能的实现方式中,所述输入像素矩阵包括所述第一像素矩阵、所述第二像素矩阵、所述第五像素矩阵和所述第六像素矩阵;或者,所述输入像素矩阵包括对所述第一像素矩阵和所述第二像素矩阵进行预处理得到的第一预处理矩阵、所述第五像素矩阵和所述第六像素矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵、所述第二像素矩阵和对所述第五像素矩阵和所述第六像素矩阵进行预处理得到的第三预处理矩阵;或者,所述输入像素矩阵包括所述第一预处理矩阵和所述第三预处理矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵的归一化矩阵、所述第二像素矩阵的归一化矩阵、所述第五像素矩阵的归一化矩阵和所述第六像素矩阵的归一化矩阵;或者,所述输入像素矩阵包括对所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵进行预处理得到的第二预处理矩阵、所述第五像素矩阵的归一化矩阵和所述第六像素矩阵的归一化矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵的归一化矩阵、所述第二像素矩阵的归一化矩阵和对所述第五像素矩阵的归一化矩阵和所述第六像素矩阵的归一化矩阵进行预处理得到的第四预处理矩阵;或者,所述输入像素矩阵包括所述第二预处理矩阵和所述第四预处理矩阵。
同理,有了色度信息,训练滤波网络时还可以加上色度值对应的量化步长值,因此本申请还可以在输入像素矩阵中加上表征第一图像块中的像素的色度值对应的量化步长值的第六像素矩阵的因素。而输入像素矩阵的具体内容和获取方式也可参见上述描述,此处不再赘述。
在一种可能的实现方式中,所述通过滤波网络对输入像素矩阵进行滤波处理得到输出像素矩阵包括:通过所述滤波网络对所述输入像素矩阵进行滤波处理得到第三像素矩阵和第七像素矩阵,所述第七像素矩阵中的对应位置的元素对应于(例如表示)所述第二图像块中的对应位置的像素的色度值。
有了表征第一图像块中的像素的色度值的第五像素矩阵,甚至有了表征第一图像块中的像素的色度值对应的量化步长值的第六像素矩阵,滤波网络可以被训练得能够具有滤波输出用于表征滤波后的第二图像块中的像素的色度值的第七像素矩阵,这样滤波网络可以分别对第一图像块的亮度分量和色度分量进行滤波处理,实现第一图像块在不同维度上的滤波处理。
在一种可能的实现方式中,当所述输入像素矩阵为归一化矩阵时,所述方法还包括:对所述第七像素矩阵中的对应位置的元素的值进行反归一化处理。
反归一化处理的原理和上述第三像素矩阵的反归一化处理类似,此处不再赘述。
在一种可能的实现方式中,所述通过滤波网络对输入像素矩阵进行滤波处理得到第三像素矩阵包括:通过所述滤波网络对所述输入像素矩阵进行滤波处理得到第三像素矩阵和第八像素矩阵,所述第八像素矩阵中的对应位置的元素对应于(例如表示)所述第二图像块中的对应位置的像素的色度残差值;将所述第五像素矩阵和所述第八像素矩阵中的对应位置的元素的值相加得到第九像素矩阵,所述第九像素矩阵中的对应位置的元素对应于(例如表示)所述第二图像块中的对应位置的像素的色度值。
在一种可能的实现方式中,当所述输入像素矩阵为归一化矩阵时,所述方法还包括:对所述第八像素矩阵中的对应位置的元素的值进行反归一化处理;相应地,所述将所述第五像素矩阵和所述第八像素矩阵中的对应位置的元素的值相加得到第九像素矩阵,包括:将所述第五像素矩阵和经过反归一化处理的第八像素矩阵中的对应位置的元素的值相加得到所述第九像素矩阵。
同理,滤波网络可以输出表征第二图像块的亮度残差值的第三像素矩阵,也可以输出表征第二图像块的色度残差值的第八像素矩阵,而输出该第八像素矩阵的后续处理和第三像素矩阵的后续处理类似,此处不再赘述。
在一种可能的实现方式中,所述预处理包括:将两个矩阵中的对应位置上的元素相加;或者,将两个矩阵在深度方向上合并;或者,将两个矩阵中的对应位置上的元素相乘。
在一种可能的实现方式中,所述方法还包括:获取训练矩阵集合,其中所述训练矩阵集合包括多个图像块的滤波前亮度矩阵(即未经过滤波的亮度矩阵)、量化步长矩阵和滤波后亮度矩阵(即经过滤波的亮度矩阵),所述滤波前亮度矩阵中的对应位置的元素对应于(例如表示)对应图像块中的对应位置的像素的滤波前的亮度值,所述量化步长矩阵中的对应位置的元素对应于(例如表示)对应图像块中的对应位置的像素的亮度值对应的量化步长值,所述滤波后亮度矩阵中的对应位置的元素对应于(例如表示)对应图像块中的对应位置的像素的滤波后的亮度值;根据所述训练矩阵集合训练得到所述滤波网络。
在一种可能的实现方式中,所述训练矩阵集合还包括所述多个图像块的滤波前色度矩阵(即未经过滤波的色度矩阵)和滤波后色度矩阵(即经过滤波的色度矩阵),所述滤波前色度矩阵中的对应位置的元素对应于(例如表示)对应图像块中的对应位置的像素的滤波前的色度值,所述滤波后色度矩阵中的对应位置的元素对应于(例如表示)对应图像块中的对应位置的像素的滤波后的色度值。
如上所述,滤波网络作为神经网络,其所需要的输入、实现的功能以及可以得到的输出,均与训练阶段的训练数据相关,本申请中训练数据即为上述训练矩阵集合。
在一种可能的实现方式中,所述滤波网络至少包括卷积层和激活层。
在一种可能的实现方式中,所述卷积层的卷积核的深度为2、3、4、5、6、16、24、32、48、64或者128;所述卷积层中的卷积核的尺寸为1×1、3×3、5×5或者7×7。例如,某一卷积层的尺寸为3×3×2×10,其中,3×3表示该卷积层中的卷积核的尺寸;2表示卷积层中包含的卷积核的深度,输入该卷积层的数据通道数和卷积层中包含的卷积核的深度一致,即输入该卷积层的数据通道数也是2;10表示卷积层中包含的卷积核的个数,输出该卷积层的数据通道数和卷积层中包含的卷积核的个数一致,即输出该卷积层的数据通道数也是10。
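The 3×3×2×10 example in the preceding paragraph maps directly onto a standard convolution layer with 2 input channels, 10 output channels, and 3×3 kernels. A PyTorch sketch (the padding choice and input size are assumptions made only to show the shapes):

```python
import torch
import torch.nn as nn

# Kernel size 3x3, kernel depth (in_channels) 2, kernel count (out_channels) 10.
conv = nn.Conv2d(in_channels=2, out_channels=10, kernel_size=3, padding=1)

x = torch.randn(1, 2, 64, 64)   # e.g. a luma plane plus a Qstep plane
print(conv(x).shape)            # torch.Size([1, 10, 64, 64]): 10 output channels
```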
在一种可能的实现方式中,所述滤波网络包括卷积神经网络CNN、深度神经网络DNN 或者循环神经网络RNN。
第二方面,本申请提供一种编码器,包括处理电路,用于执行根据上述第一方面任一项所述的方法。
第三方面,本申请提供一种解码器,包括处理电路,用于执行上述第一方面任一项所述的方法。
第四方面,本申请提供一种计算机程序产品,包括程序代码,当其在计算机或处理器上执行时,用于执行上述第一方面任一项所述的方法。
第五方面,本申请提供一种编码器,包括:一个或多个处理器;非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器执行的程序,其中所述程序在由所述处理器执行时,使得所述解码器执行上述第一方面任一项所述的方法。
第六方面,本申请提供一种解码器,包括:一个或多个处理器;非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器执行的程序,其中所述程序在由所述处理器执行时,使得所述编码器执行上述第一方面任一项所述的方法。
第七方面,本申请提供一种非瞬时性计算机可读存储介质,包括程序代码,当其由计算机设备执行时,用于执行上述第一方面任一项所述的方法。
第八方面,本发明涉及译码装置,有益效果可以参见第一方面的描述此处不再赘述。所述译码装置具有实现上述第一方面的方法实施例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述译码装置包括:重建模块,用于获取第一像素矩阵;量化模块,用于获取第二像素矩阵;环路滤波模块,用于实现上述第一方面任一项所述的方法。这些模块可以执行上述第一方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
附图及以下说明中将详细描述一个或多个实施例。其它特征、目的和优点在说明、附图以及权利要求中是显而易见的。
附图说明
下面对本申请实施例用到的附图进行介绍。
图1A为用于实现本发明实施例的视频译码系统示例的框图,其中该系统利用神经网络来编码或解码视频图像;
图1B为用于实现本发明实施例的视频译码系统另一示例的框图,其中该视频编码器和/或视频解码器使用神经网络来编码或解码视频图像;
图2为用于实现本发明实施例的视频编码器实例示例的框图,其中该视频编码器20使用神经网络来编码视频图像;
图3为用于实现本发明实施例的视频解码器实例示例的框图,其中该视频解码器30使用神经网络来解码视频图像;
图4为用于实现本发明实施例的视频译码装置的示意性框图;
图5为用于实现本发明实施例的视频译码装置的示意性框图；
图6a-6c是本申请实施例的输入滤波网络的矩阵的示意图;
图7a-7e是本申请实施例提供的引入到环路滤波模块中的经训练的神经网络的示意图;
图8是示出根据本申请一种实施例的环路滤波方法的过程800的流程图;
图9a-9l示出了滤波网络的输入像素矩阵的几种示例性的示例;
图10是示出根据本申请一种实施例的译码装置1000的结构示意图。
具体实施方式
本申请实施例提供一种基于AI的视频图像压缩技术,尤其是提供一种基于神经网络的视频压缩技术,具体提供一种基于神经网络的滤波技术,以改进传统的混合视频编解码系统。
视频编码通常是指处理形成视频或视频序列的图像序列。在视频编码领域,术语“图像(picture)”、“帧(frame)”或“图片(image)”可以用作同义词。视频编码(或通常称为编码)包括视频编码和视频解码两部分。视频编码在源侧执行,通常包括处理(例如,压缩)原始视频图像以减少表示该视频图像所需的数据量(从而更高效存储和/或传输)。视频解码在目的地侧执行,通常包括相对于编码器作逆处理,以重建视频图像。实施例涉及的视频图像(或通常称为图像)的“编码”应理解为视频图像或视频序列的“编码”或“解码”。编码部分和解码部分也合称为编解码(编码和解码,CODEC)。
在无损视频编码情况下,可以重建原始视频图像,即重建的视频图像与原始视频图像具有相同的质量(假设存储或传输期间没有传输损耗或其它数据丢失)。在有损视频编码情况下,通过量化等执行进一步压缩,来减少表示视频图像所需的数据量,而解码器侧无法完全重建视频图像,即重建的视频图像的质量比原始视频图像的质量较低或较差。
几个视频编码标准属于“有损混合型视频编解码”(即,将像素域中的空间和时间预测与变换域中用于应用量化的2D变换编码结合)。视频序列中的每个图像通常分割成不重叠的块集合,通常在块级上进行编码。换句话说,编码器通常在块(视频块)级处理及编码视频,例如,通过空间(帧内)预测和时间(帧间)预测来产生预测块;从当前块(当前处理/待处理的块)中减去预测块,得到残差块;在变换域中变换残差块并量化残差块,以减少待传输(压缩)的数据量,而解码器侧将相对于编码器的逆处理部分应用于编码或压缩的块,以重建用于表示的当前块。另外,编码器需要重复解码器的处理步骤,使得编码器和解码器生成相同的预测(例如,帧内预测和帧间预测)和/或重建像素,用于处理,即编码后续块。
在以下译码系统10的实施例中,编码器20和解码器30根据图1A至图3进行描述。
图1A为示例性译码系统10的示意性框图,例如可以利用本申请技术的视频译码系统10(或简称为译码系统10)。视频译码系统10中的视频编码器20(或简称为编码器20)和视频解码器30(或简称为解码器30)代表可用于根据本申请中描述的各种示例执行各技术的设备等。
如图1A所示,译码系统10包括源设备12,源设备12用于将编码图像等编码图像数据21提供给用于对编码图像数据21进行解码的目的设备14。
源设备12包括编码器20,另外即可选地,可包括图像源16、图像预处理器等预处理器(或预处理单元)18、通信接口(或通信单元)22。
图像源16可包括或可以为任意类型的用于捕获现实世界图像等的图像捕获设备,和/或任意类型的图像生成设备,例如用于生成计算机动画图像的计算机图形处理器或任意类型的用于获取和/或提供现实世界图像、计算机生成图像(例如,屏幕内容、虚拟现实(virtual reality,VR)图像和/或其任意组合(例如增强现实(augmented reality,AR)图像)的设备。所述图像源可以为存储上述图像中的任意图像的任意类型的内存或存储器。
为了区分预处理器(或预处理单元)18执行的处理,图像(或图像数据)17也可称为原始图像(或原始图像数据)17。
预处理器18用于接收(原始)图像数据17,并对图像数据17进行预处理,得到预处理图像(预处理图像数据)19。例如,预处理器18执行的预处理可包括修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色或去噪。可以理解的是,预处理单元18可以为可选组件。
视频编码器(或编码器)20用于接收预处理图像数据19并提供编码图像数据21(下面将根据图2等进一步描述)。
源设备12中的通信接口22可用于:接收编码图像数据21并通过通信信道13向目的设备14等另一设备或任何其它设备发送编码图像数据21(或其它任意处理后的版本),以便存储或直接重建。
目的设备14包括解码器30,另外即可选地,可包括通信接口(或通信单元)28、后处理器(或后处理单元)32和显示设备34。
目的设备14中的通信接口28用于直接从源设备12或从存储设备等任意其它源设备接收编码图像数据21(或其它任意处理后的版本),例如,存储设备为编码图像数据存储设备,并将编码图像数据21提供给解码器30。
通信接口22和通信接口28可用于通过源设备12与目的设备14之间的直连通信链路,例如直接有线或无线连接等,或者通过任意类型的网络,例如有线网络、无线网络或其任意组合、任意类型的私网和公网或其任意类型的组合,发送或接收编码图像数据(或编码数据)21。
例如,通信接口22可用于将编码图像数据21封装为报文等合适的格式,和/或使用任意类型的传输编码或处理来处理所述编码后的图像数据,以便在通信链路或通信网络上进行传输。
通信接口28与通信接口22对应,例如,可用于接收传输数据,并使用任意类型的对应传输解码或处理和/或解封装对传输数据进行处理,得到编码图像数据21。
通信接口22和通信接口28均可配置为如图1A中从源设备12指向目的设备14的对应通信信道13的箭头所指示的单向通信接口,或双向通信接口,并且可用于发送和接收消息等,以建立连接,确认并交换与通信链路和/或例如编码后的图像数据传输等数据传输相关的任何其它信息,等等。
视频解码器(或解码器)30用于接收编码图像数据21并提供解码图像(或解码图像数据)31(下面将根据图3等进一步描述)。
后处理器32用于对解码后的图像等解码图像数据31(也称为重建后的图像数据)进行后处理,得到后处理后的图像等后处理图像数据33。后处理单元32执行的后处理可以包括例如颜色格式转换(例如从YCbCr转换为RGB)、调色、修剪或重采样,或者用于产生供显示设备34等显示的解码图像数据31等任何其它处理。
显示设备34用于接收后处理图像数据33,以向用户或观看者等显示图像。显示设备34可以为或包括任意类型的用于表示重建后图像的显示器,例如,集成或外部显示屏或显示器。例如,显示屏可包括液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light emitting diode,OLED)显示器、等离子显示器、投影仪、微型LED显示器、硅基液晶显示器(liquid crystal on silicon,LCoS)、数字光处理器(digital light processor,DLP)或任意类型的其它显示屏。
译码系统10还包括训练引擎25,训练引擎25用于训练编码器20(尤其是编码器20中的环路滤波器220)或解码器30(尤其是解码器30中的环路滤波器320)以对重构图像进行滤波处理。
本申请实施例中训练数据包括:训练矩阵集合,该训练矩阵集合包括图像块的滤波前亮度矩阵、量化步长矩阵和滤波后亮度矩阵,其中滤波前亮度矩阵中的对应位置的像素点对应于对应图像块中的对应位置的像素的滤波前的亮度值,量化步长矩阵中的对应位置的像素点对应于对应图像块中的对应位置的像素的亮度值对应的量化步长值,滤波后亮度矩阵中的对应位置的像素点对应于对应图像块中的对应位置的像素的滤波后的亮度值。
训练矩阵集合中的多个矩阵例如可以以图6a至6c所示的方式输入训练引擎25。如图6a所示,将训练矩阵集合中的多个矩阵直接输入训练引擎25,该多个矩阵均是二维矩阵。如图6b所示,选取训练矩阵集合中的多个矩阵的部分或全部做合并处理得到多维矩阵,再将该多维矩阵输入训练引擎25。如图6c所示,选取训练矩阵集合中的多个矩阵的部分或全部做相加(或相乘)处理得到二维矩阵,再将该二维矩阵输入训练引擎25。
上述训练数据可以存入数据库(未示意)中,训练引擎25基于训练数据训练得到目标模型(例如:可以是用于环路滤波的神经网络等)。需要说明的是,本申请实施例对于训练数据的来源不做限定,例如可以是从云端或其他地方获取训练数据进行模型训练。
训练引擎25训练目标模型的过程使得滤波后像素逼近原始像素值。每个训练过程可以使用64个图像的小批量大小和1e-4的初始学习率，学习率按步长大小10进行调整。训练数据可以是通过编码器在不同QP量化参数设置下生成的数据。目标模型能够用于实现本申请实施例提供的环路滤波方法，即，将重构得到的图像或图像块通过相关预处理后输入该目标模型，可以得到滤波后的图像或图像块。本申请实施例中的目标模型具体可以为滤波网络，下文将结合图7a-7e详细说明目标模型。
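A hedged sketch of such a training procedure, under the stated settings (mini-batch of 64, initial learning rate 1e-4, learning-rate step size 10). The optimizer choice, the MSE objective, and the FilterNet/loader names are illustrative assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=30):
    # `loader` is assumed to yield mini-batches of 64 triples:
    # (reconstructed block, Qstep matrix, original block).
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for recon, qstep, original in loader:
            # Drive the filtered pixels toward the original pixels.
            loss = mse(model(recon, qstep), original)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
```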
训练引擎25训练得到的目标模型可以应用于译码系统10中,例如,应用于图1A所示的源设备12(例如编码器20)或目的设备14(例如解码器30)。训练引擎25可以在云端训练得到目标模型,然后译码系统10从云端下载并使用该目标模型;或者,训练引擎25可以在云端训练得到目标模型并使用该目标模型,译码系统10从云端直接获取处理结果。例如,训练引擎25训练得到具备滤波功能的目标模型,译码系统10从云端下载该目标模型,然后编码器20中的环路滤波器220或解码器30中的环路滤波器320可以根据 该目标模型对输入的重建的图像或图像块进行滤波处理,得到滤波后的图像或图像块。又例如,训练引擎25训练得到具备滤波功能的目标模型,译码系统10无需从云端下载该目标模型,编码器20或解码器30将重建的图像或图像块传输给云端,由云端通过目标模型对该重建的图像或图像块进行滤波处理,得到滤波后的图像或图像块并传输给编码器20或解码器30。
尽管图1A示出了源设备12和目的设备14作为独立的设备,但设备实施例也可以同时包括源设备12和目的设备14或同时包括源设备12和目的设备14的功能,即同时包括源设备12或对应功能和目的设备14或对应功能。在这些实施例中,源设备12或对应功能和目的设备14或对应功能可以使用相同硬件和/或软件或通过单独的硬件和/或软件或其任意组合来实现。
根据描述,图1A所示的源设备12和/或目的设备14中的不同单元或功能的存在和(准确)划分可能根据实际设备和应用而有所不同,这对技术人员来说是显而易见的。
编码器20(例如视频编码器20)或解码器30(例如视频解码器30)或两者都可通过如图1B所示的处理电路实现,例如一个或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)、离散逻辑、硬件、视频编码专用处理器或其任意组合。编码器20可以通过处理电路46实现,以包含参照图2编码器20论述的各种模块和/或本文描述的任何其它编码器系统或子系统。解码器30可以通过处理电路46实现,以包含参照图3解码器30论述的各种模块和/或本文描述的任何其它解码器系统或子系统。所述处理电路46可用于执行下文论述的各种操作。如图5所示,如果部分技术在软件中实施,则设备可以将软件的指令存储在合适的非瞬时性计算机可读存储介质中,并且使用一个或多个处理器在硬件中执行指令,从而执行本发明技术。视频编码器20和视频解码器30中的其中一个可作为组合编解码器(encoder/decoder,CODEC)的一部分集成在单个设备中,如图1B所示。
源设备12和目的设备14可包括各种设备中的任一种,包括任意类型的手持设备或固定设备,例如,笔记本电脑或膝上型电脑、手机、智能手机、平板或平板电脑、相机、台式计算机、机顶盒、电视机、显示设备、数字媒体播放器、视频游戏控制台、视频流设备(例如,内容业务服务器或内容分发服务器)、广播接收设备、广播发射设备,等等,并可以不使用或使用任意类型的操作系统。在一些情况下,源设备12和目的设备14可配备用于无线通信的组件。因此,源设备12和目的设备14可以是无线通信设备。
在一些情况下,图1A所示的视频译码系统10仅仅是示例性的,本申请提供的技术可适用于视频编码设置(例如,视频编码或视频解码),这些设置不一定包括编码设备与解码设备之间的任何数据通信。在其它示例中,数据从本地存储器中检索,通过网络发送,等等。视频编码设备可以对数据进行编码并将数据存储到存储器中,和/或视频解码设备可以从存储器中检索数据并对数据进行解码。在一些示例中,编码和解码由相互不通信而只是编码数据到存储器和/或从存储器中检索并解码数据的设备来执行。
图1B是根据一示例性实施例的包含图2的视频编码器20和/或图3的视频解码器30的视频译码系统40的实例的说明图。视频译码系统40可以包含成像设备41、视频编码器20、视频解码器30(和/或藉由处理电路46实施的视频编/解码器)、天线42、一个或多个处理器43、一个或多个内存存储器44和/或显示设备45。
如图1B所示,成像设备41、天线42、处理电路46、视频编码器20、视频解码器30、处理器43、内存存储器44和/或显示设备45能够互相通信。在不同实例中,视频译码系统40可以只包含视频编码器20或只包含视频解码器30。
在一些实例中,天线42可以用于传输或接收视频数据的经编码比特流。另外,在一些实例中,显示设备45可以用于呈现视频数据。处理电路46可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。视频译码系统40也可以包含可选的处理器43,该可选处理器43类似地可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。另外,内存存储器44可以是任何类型的存储器,例如易失性存储器(例如,静态随机存取存储器(static random access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)等)或非易失性存储器(例如,闪存等)等。在非限制性实例中,内存存储器44可以由超速缓存内存实施。在其它实例中,处理电路46可以包含存储器(例如,缓存等)用于实施图像缓冲器等。
在一些实例中,通过逻辑电路实施的视频编码器20可以包含(例如,通过处理电路46或内存存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路46实施的视频编码器20,以实施参照图2和/或本文中所描述的任何其它编码器系统或子系统所论述的各种模块。逻辑电路可以用于执行本文所论述的各种操作。
在一些实例中,视频解码器30可以以类似方式通过处理电路46实施,以实施参照图3的视频解码器30和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。在一些实例中,逻辑电路实施的视频解码器30可以包含(通过处理电路46或内存存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路46实施的视频解码器30,以实施参照图3和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。
在一些实例中,天线42可以用于接收视频数据的经编码比特流。如所论述,经编码比特流可以包含本文所论述的与编码视频帧相关的数据、指示符、索引值、模式选择数据等,例如与编码分割相关的数据(例如,变换系数或经量化变换系数,(如所论述的)可选指示符,和/或定义编码分割的数据)。视频译码系统40还可包含耦合至天线42并用于解码经编码比特流的视频解码器30。显示设备45用于呈现视频帧。
应理解,本申请实施例中对于参考视频编码器20所描述的实例,视频解码器30可以用于执行相反过程。关于信令语法元素,视频解码器30可以用于接收并解析这种语法元素,相应地解码相关视频数据。在一些例子中,视频编码器20可以将语法元素熵编码成经编码视频比特流。在此类实例中,视频解码器30可以解析这种语法元素,并相应地解 码相关视频数据。
为便于描述,参考通用视频编码(Versatile video coding,VVC)参考软件或由ITU-T视频编码专家组(Video Coding Experts Group,VCEG)和ISO/IEC运动图像专家组(Motion Picture Experts Group,MPEG)的视频编码联合工作组(Joint Collaboration Team on Video Coding,JCT-VC)开发的高性能视频编码(High-Efficiency Video Coding,HEVC)描述本发明实施例。本领域普通技术人员理解本发明实施例不限于HEVC或VVC。
编码器和编码方法
图2为用于实现本申请技术的视频编码器20的示例的示意性框图。在图2的示例中,视频编码器20包括输入端(或输入接口)201、残差计算单元204、变换处理单元206、量化单元208、反量化单元210、逆变换处理单元212、重建单元214、环路滤波器220、解码图像缓冲器(decoded picture buffer,DPB)230、模式选择单元260、熵编码单元270和输出端(或输出接口)272。模式选择单元260可包括帧间预测单元244、帧内预测单元254和分割单元262。帧间预测单元244可包括运动估计单元和运动补偿单元(未示出)。图2所示的视频编码器20也可称为混合型视频编码器或基于混合型视频编解码器的视频编码器。
参见图2,环路滤波模块为经过训练的目标模型(亦称为神经网络),该神经网络用于处理输入图像或图像区域或图像块,以得到滤波后的图像或图像区域或图像块。例如,用于环路滤波的神经网络用于接收输入的图像或图像区域或图像块,例如,图6a至6c所图示的输入图像数据,并且生成滤波后的图像或图像区域或图像块。下面将结合图7a-7e详细地描述用于环路滤波的神经网络。
残差计算单元204、变换处理单元206、量化单元208和模式选择单元260组成编码器20的前向信号路径,而反量化单元210、逆变换处理单元212、重建单元214、缓冲器216、环路滤波器220、解码图像缓冲器(decoded picture buffer,DPB)230、帧间预测单元244和帧内预测单元254组成编码器的后向信号路径,其中编码器20的后向信号路径对应于解码器的信号路径(参见图3中的解码器30)。反量化单元210、逆变换处理单元212、重建单元214、环路滤波器220、解码图像缓冲器230、帧间预测单元244和帧内预测单元254还组成视频编码器20的“内置解码器”。
量化
量化单元208用于通过例如标量量化或矢量量化对变换系数207进行量化,得到量化变换系数209。量化变换系数209也可称为量化残差系数209。
量化过程可减少与部分或全部变换系数207有关的位深度。例如,可在量化期间将n位变换系数向下舍入到m位变换系数,其中n大于m。可通过调整量化参数(quantization parameter,QP)修改量化程度。例如,对于标量量化,可以应用不同程度的比例来实现较细或较粗的量化。较小量化步长对应较细量化,而较大量化步长对应较粗量化。可通过量化参数(quantization parameter,QP)指示合适的量化步长。例如,量化参数可以为合适的量化步长的预定义集合的索引。例如,较小的量化参数可对应精细量化(较小量化步 长),较大的量化参数可对应粗糙量化(较大量化步长),反之亦然。量化可包括除以量化步长,而反量化单元210等执行的对应或逆解量化可包括乘以量化步长。根据例如HEVC一些标准的实施例可用于使用量化参数来确定量化步长。一般而言,可以根据量化参数使用包含除法的等式的定点近似来计算量化步长。可以引入其它比例缩放因子来进行量化和解量化,以恢复可能由于在用于量化步长和量化参数的等式的定点近似中使用的比例而修改的残差块的范数。在一种示例性实现方式中,可以合并逆变换和解量化的比例。或者,可以使用自定义量化表并在比特流中等将其从编码器向解码器指示。量化是有损操作,其中量化步长越大,损耗越大。
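The paragraph above notes that the quantization step can be derived from the quantization parameter via a fixed-point approximation of an equation involving division. Purely as an illustration, using the well-known HEVC-style relation Qstep ≈ 2^((QP-4)/6) (which this patent itself does not mandate), the per-pixel quantization-step matrix (the second pixel matrix) could be derived like this:

```python
import numpy as np

def qstep_from_qp(qp):
    # HEVC-style mapping: the quantization step doubles every 6 QP values.
    return 2.0 ** ((qp - 4) / 6.0)

qp_map = np.full((4, 4), 32)                 # hypothetical per-pixel QP map
second_pixel_matrix = qstep_from_qp(qp_map)  # per-pixel quantization steps
```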
在一个实施例中,视频编码器20(对应地,量化单元208)可用于输出量化参数(quantization parameter,QP),例如,直接输出或由熵编码单元270进行编码或压缩后输出,例如使得视频解码器30可接收并使用量化参数进行解码。
反量化
反量化单元210用于对量化系数执行量化单元208的反量化,得到解量化系数211,例如,根据或使用与量化单元208相同的量化步长执行与量化单元208所执行的量化方案的反量化方案。解量化系数211也可称为解量化残差系数211,对应于变换系数207,但是由于量化造成损耗,反量化系数211通常与变换系数不完全相同。
重建
重建单元214(例如,求和器214)用于将变换块213(即重建残差块213)添加到预测块265,以在像素域中得到重建块215,例如,将重建残差块213的像素点值和预测块265的像素点值相加。
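A one-line NumPy sketch of this reconstruction step (element-wise addition of the residual and prediction samples; the 8-bit clipping range is an assumption):

```python
import numpy as np

def reconstruct(residual, prediction):
    # Reconstructed block 215 = residual block 213 + prediction block 265.
    return np.clip(residual + prediction, 0, 255)
```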
滤波
环路滤波器单元220(或简称“环路滤波器”220)用于对重建块215进行滤波,得到滤波块221,或通常用于对重建像素点进行滤波以得到滤波像素点值。例如,环路滤波器单元用于顺利进行像素转变或提高视频质量。环路滤波器单元220可包括一个或多个环路滤波器,例如去块滤波器、像素点自适应偏移(sample-adaptive offset,SAO)滤波器或一个或多个其它滤波器,例如自适应环路滤波器(adaptive loop filter,ALF)、噪声抑制滤波器(noise suppression filter,NSF)或任意组合。例如,环路滤波器单元220可以包括去块滤波器、SAO滤波器和ALF滤波器。滤波过程的顺序可以是去块滤波器、SAO滤波器和ALF滤波器。再例如,增加一个称为具有色度缩放的亮度映射(luma mapping with chroma scaling,LMCS)(即自适应环内整形器)的过程。该过程在去块之前执行。再例如,去块滤波过程也可以应用于内部子块边缘,例如仿射子块边缘、ATMVP子块边缘、子块变换(sub-block transform,SBT)边缘和内子部分(intra sub-partition,ISP)边缘。尽管环路滤波器单元220在图2中示为环路滤波器,但在其它配置中,环路滤波器单元220可以实现为环后滤波器。滤波块221也可称为滤波重建块221。
在一个实施例中,视频编码器20(对应地,环路滤波器单元220)可用于输出环路滤波器参数(例如SAO滤波参数、ALF滤波参数或LMCS参数),例如,直接输出或由熵 编码单元270进行熵编码后输出,例如使得解码器30可接收并使用相同或不同的环路滤波器参数进行解码。
解码器和解码方法
图3示出了用于实现本申请技术的示例性视频解码器30。视频解码器30用于接收例如由编码器20编码的编码图像数据21(例如编码比特流21),得到解码图像331。编码图像数据或比特流包括用于解码所述编码图像数据的信息,例如表示编码视频片(和/或编码区块组或编码区块)的图像块的数据和相关的语法元素。
在图3的示例中，解码器30包括熵解码单元304、反量化单元310、逆变换处理单元312、重建单元314（例如求和器314）、环路滤波器320、解码图像缓冲器（DPB）330、模式应用单元360、帧间预测单元344和帧内预测单元354。帧间预测单元344可以为或包括运动补偿单元。在一些示例中，视频解码器30可执行大体上与参照图2的视频编码器20描述的编码过程相反的解码过程。
参见图3,环路滤波模块为经过训练的目标模型(亦称为神经网络),该神经网络用于处理输入图像或图像区域或图像块,以生成滤波后的图像或图像区域或图像块。例如,用于环路滤波的神经网络用于接收输入的图像或图像区域或图像块,例如,图6a至6c所图示的输入图像数据,并且生成滤波后的图像或图像区域或图像块。下面将结合图7a-7e详细地描述用于环路滤波的神经网络。
如编码器20所述，反量化单元210、逆变换处理单元212、重建单元214、环路滤波器220、解码图像缓冲器DPB230、帧间预测单元244和帧内预测单元254还组成视频编码器20的“内置解码器”。相应地，反量化单元310在功能上可与反量化单元210相同，逆变换处理单元312在功能上可与逆变换处理单元212相同，重建单元314在功能上可与重建单元214相同，环路滤波器320在功能上可与环路滤波器220相同，解码图像缓冲器330在功能上可与解码图像缓冲器230相同。因此，视频编码器20的相应单元和功能的解释相应地适用于视频解码器30的相应单元和功能。
反量化
反量化单元310可用于从编码图像数据21(例如通过熵解码单元304解析和/或解码)接收量化参数(quantization parameter,QP)(或一般为与反量化相关的信息)和量化系数,并基于所述量化参数对所述解码的量化系数309进行反量化以获得反量化系数311,所述反量化系数311也可以称为变换系数311。反量化过程可包括使用视频编码器20为视频片中的每个视频块计算的量化参数来确定量化程度,同样也确定需要执行的反量化的程度。
重建
重建单元314(例如,求和器314)用于将重建残差块313添加到预测块365,以在像素域中得到重建块315,例如,将重建残差块313的像素点值和预测块365的像素点值 相加。
滤波
环路滤波器单元320(在编码环路中或之后)用于对重建块315进行滤波,得到滤波块321,从而顺利进行像素转变或提高视频质量等。环路滤波器单元320可包括一个或多个环路滤波器,例如去块滤波器、像素点自适应偏移(sample-adaptive offset,SAO)滤波器或一个或多个其它滤波器,例如自适应环路滤波器(adaptive loop filter,ALF)、噪声抑制滤波器(noise suppression filter,NSF)或任意组合。例如,环路滤波器单元220可以包括去块滤波器、SAO滤波器和ALF滤波器。滤波过程的顺序可以是去块滤波器、SAO滤波器和ALF滤波器。再例如,增加一个称为具有色度缩放的亮度映射(luma mapping with chroma scaling,LMCS)(即自适应环内整形器)的过程。该过程在去块之前执行。再例如,去块滤波过程也可以应用于内部子块边缘,例如仿射子块边缘、ATMVP子块边缘、子块变换(sub-block transform,SBT)边缘和内子部分(intra sub-partition,ISP)边缘。尽管环路滤波器单元320在图3中示为环路滤波器,但在其它配置中,环路滤波器单元320可以实现为环后滤波器。
解码器30用于通过输出端332等输出解码图像331，向用户显示或供用户查看。
尽管上述实施例主要描述了视频编解码,但应注意的是,译码系统10、编码器20和解码器30的实施例以及本文描述的其它实施例也可以用于静止图像处理或编解码,即视频编解码中独立于任何先前或连续图像的单个图像的处理或编解码。一般情况下,如果图像处理仅限于单个图像17,帧间预测单元244(编码器)和帧间预测单元344(解码器)可能不可用。视频编码器20和视频解码器30的所有其它功能(也称为工具或技术)同样可用于静态图像处理,例如残差计算204/304、变换206、量化208、反量化210/310、(逆)变换212/312、分割262/362、帧内预测254/354和/或环路滤波220/320、熵编码270和熵解码304。
图4为本发明实施例提供的视频译码设备400的示意图。视频译码设备400适用于实现本文描述的公开实施例。在一个实施例中,视频译码设备400可以是解码器,例如图1A中的视频解码器30,也可以是编码器,例如图1A中的视频编码器20。
视频译码设备400包括:用于接收数据的入端口410(或输入端口410)和接收单元(receiver unit,Rx)420;用于处理数据的处理器、逻辑单元或中央处理器(central processing unit,CPU)430;例如,这里的处理器430可以是神经网络处理器430;用于传输数据的发送单元(transmitter unit,Tx)440和出端口450(或输出端口450);用于存储数据的存储器460。视频译码设备400还可包括耦合到入端口410、接收单元420、发送单元440和出端口450的光电(optical-to-electrical,OE)组件和电光(electrical-to-optical,EO)组件,用于光信号或电信号的出口或入口。
处理器430通过硬件和软件实现。处理器430可实现为一个或多个处理器芯片、核(例如,多核处理器)、FPGA、ASIC和DSP。处理器430与入端口410、接收单元420、发 送单元440、出端口450和存储器460通信。处理器430包括译码模块470(例如,基于神经网络(neural networks,NN)的译码模块470)。译码模块470实施上文所公开的实施例。例如,译码模块470执行、处理、准备或提供各种编码操作。因此,通过译码模块470为视频译码设备400的功能提供了实质性的改进,并且影响了视频译码设备400到不同状态的切换。或者,以存储在存储器460中并由处理器430执行的指令来实现译码模块470。
存储器460包括一个或多个磁盘、磁带机和固态硬盘,可以用作溢出数据存储设备,用于在选择执行程序时存储此类程序,并且存储在程序执行过程中读取的指令和数据。存储器460可以是易失性和/或非易失性的,可以是只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、三态内容寻址存储器(ternary content-addressable memory,TCAM)和/或静态随机存取存储器(static random-access memory,SRAM)。
图5为示例性实施例提供的装置500的简化框图,装置500可用作图1A中的源设备12和目的设备14中的任一个或两个。
装置500中的处理器502可以是中央处理器。或者,处理器502可以是现有的或今后将研发出的能够操控或处理信息的任何其它类型设备或多个设备。虽然可以使用如图所示的处理器502等单个处理器来实施已公开的实现方式,但使用一个以上的处理器速度更快和效率更高。
在一种实现方式中,装置500中的存储器504可以是只读存储器(ROM)设备或随机存取存储器(RAM)设备。任何其它合适类型的存储设备都可以用作存储器504。存储器504可以包括处理器502通过总线512访问的代码和数据506。存储器504还可包括操作系统508和应用程序510,应用程序510包括允许处理器502执行本文所述方法的至少一个程序。例如,应用程序510可以包括应用1至N,还包括执行本文所述方法的视频译码应用。
装置500还可以包括一个或多个输出设备,例如显示器518。在一个示例中,显示器518可以是将显示器与可用于感测触摸输入的触敏元件组合的触敏显示器。显示器518可以通过总线512耦合到处理器502。
虽然装置500中的总线512在本文中描述为单个总线,但是总线512可以包括多个总线。此外,辅助储存器可以直接耦合到装置500的其它组件或通过网络访问,并且可以包括存储卡等单个集成单元或多个存储卡等多个单元。因此,装置500可以具有各种各样的配置。
由于本申请实施例涉及神经网络的应用,为了便于理解,下面先对本申请实施例所使用到的一些名词或术语进行解释说明,该名词或术语也作为发明内容的一部分。
(1)神经网络
神经网络(neural network,NN)是机器学习模型,神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以为:
$h_{W,b}(x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
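A minimal sketch of the single-unit computation above, with the sigmoid activation mentioned in the text:

```python
import numpy as np

def neuron(xs, ws, b):
    # f(sum_s W_s * x_s + b), with f chosen here as the sigmoid function.
    return 1.0 / (1.0 + np.exp(-(np.dot(ws, xs) + b)))
```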
(2)深度神经网络
深度神经网络（deep neural network，DNN），也称多层神经网络，可以理解为具有很多层隐含层的神经网络，这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分，DNN内部的神经网络可以分为三类：输入层，隐含层，输出层。一般来说第一层是输入层，最后一层是输出层，中间的层数都是隐含层。层与层之间是全连接的，也就是说，第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂，但是就每一层的工作来说，其实并不复杂，简单来说就是如下线性关系表达式：$\vec{y}=\alpha(W\vec{x}+\vec{b})$，其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，$W$是权重矩阵（也称系数），$\alpha(\cdot)$是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多，则系数$W$和偏移向量$\vec{b}$的数量也就很多了。这些参数在DNN中的定义如下所述：以系数$W$为例，假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W_{24}^{3}$，上标3代表系数$W$所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是：第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W_{jk}^{L}$。需要注意的是，输入层是没有$W$参数的。在深度神经网络中，更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言，参数越多的模型复杂度越高，“容量”也就越大，也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程，其最终目的是得到训练好的深度神经网络的所有层的权重矩阵（由很多层的向量$W$形成的权重矩阵）。
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。卷积神经网络包含了一个由卷积层和池化层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。
卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。卷积层可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要 注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络进行正确的预测。当卷积神经网络有多个卷积层的时候,初始的卷积层往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络深度的加深,越往后的卷积层提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
在经过卷积层/池化层的处理后,卷积神经网络还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络需要利用神经网络层来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层中可以包括多层隐含层,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
可选的,在神经网络层中的多层隐含层之后,还包括整个卷积神经网络的输出层,该输出层具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络的前向传播完成,反向传播就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络的损失,及卷积神经网络通过输出层输出的结果和理想结果之间的误差。
(4)循环神经网络
循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题却无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面 的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数,如W,是共享的;而如上举例上述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。该学习算法称为基于时间的反向传播算法(Back propagation Through Time,BPTT)。
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
(5)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
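For the filtering task of this application, a natural instance of such a loss function is the mean squared difference between the filtered block and the original block; this concrete choice is an illustration, not the only loss the text permits:

```python
import numpy as np

def mse_loss(predicted, target):
    # The smaller the value, the closer the prediction is to the target,
    # so training minimizes this quantity.
    return np.mean((predicted - target) ** 2)
```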
(6)反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
(7)生成式对抗网络
生成式对抗网络(generative adversarial networks,GAN)是一种深度学习模型。该模型中至少包括两个模块:一个模块是生成模型(Generative Model),另一个模块是判别模型(Discriminative Model),通过这两个模块互相博弈学习,从而产生更好的输出。生成模型和判别模型都可以是神经网络,具体可以是深度神经网络,或者卷积神经网络。GAN的基本原理如下:以生成图片的GAN为例,假设有两个网络,G(Generator)和D (Discriminator),其中G是一个生成图片的网络,它接收一个随机的噪声z,通过这个噪声生成图片,记做G(z);D是一个判别网络,用于判别一张图片是不是“真实的”。它的输入参数是x,x代表一张图片,输出D(x)代表x为真实图片的概率,如果为1,就代表100%是真实的图片,如果为0,就代表不可能是真实的图片。在对该生成式对抗网络进行训练的过程中,生成网络G的目标就是尽可能生成真实的图片去欺骗判别网络D,而判别网络D的目标就是尽量把G生成的图片和真实的图片区分开来。这样,G和D就构成了一个动态的“博弈”过程,也即“生成式对抗网络”中的“对抗”。最后博弈的结果,在理想的状态下,G可以生成足以“以假乱真”的图片G(z),而D难以判定G生成的图片究竟是不是真实的,即D(G(z))=0.5。这样就得到了一个优异的生成模型G,它可以用来生成图片。
下面将结合图7a-7e详细地描述用于环路滤波的目标模型(亦称为神经网络)。图7a-7e示出目标模型(例如用于滤波的神经网络,简称滤波网络)的示例性架构。
如图7a所示，将第一像素矩阵（第一像素矩阵中的对应位置的像素点的值对应于重建得到第一图像块中的对应位置的像素的亮度值）和第二像素矩阵（第二像素矩阵中的对应位置的像素点对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值）输入滤波网络。该滤波网络通过一个3×3卷积层（3×3Conv）和一个激活层（Relu）对第一像素矩阵进行处理，通过另一个3×3卷积层和另一个激活层对第二像素矩阵进行处理，然后将上述处理后得到的两个矩阵合并（concat），再经过块处理层（Res-Block）、…、块处理层、3×3卷积层、激活层、3×3卷积层得到残差矩阵，该残差矩阵中的对应位置的像素点对应于滤波后的第二图像块中的对应位置的像素的亮度残差值。将第一像素矩阵和残差矩阵对应位置的像素值相加后得到第三像素矩阵，第三像素矩阵中的对应位置的像素点对应于第二图像块中的对应位置的像素的亮度值。
如图7b所示,上述块处理层可以包括一个3×3卷积层、一个激活层和一个3×3卷积层,将输入矩阵经这三层处理后,再将处理后得到的矩阵和初始输入矩阵对应位置的像素值相加得到最终输出矩阵。如图7c所示,上述块处理层可以包括一个3×3卷积层、一个激活层、一个3×3卷积层和一个激活层,将输入矩阵经3×3卷积层、激活层和3×3卷积层处理后,再将处理后得到的矩阵和初始输入矩阵对应位置的像素值相加,最后经过一个激活层得到最终输出矩阵。
如图7d所示,将矩阵输入滤波网络之前,先对第一像素矩阵(第一像素矩阵中的对应位置的像素点的值对应于重建得到第一图像块中的对应位置的像素的亮度值)和第二像素矩阵(第二像素矩阵中的对应位置的像素点对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值)中对应位置上的像素值相加或相乘得到输入像素矩阵。然后将输入像素矩阵输入滤波网络。该滤波网络通过一个3×3卷积层、一个激活层、一个块处理层、…、一个块处理层、一个3×3卷积层、一个激活层和一个3×3卷积层对输入像素矩阵进行处理得到第三像素矩阵,该第三像素矩阵中的对应位置的像素点对应于第二图像块中的对应位置的像素的亮度值。
如图7e所示，将第一像素矩阵（第一像素矩阵中的对应位置的像素点的值对应于重建得到第一图像块中的对应位置的像素的亮度值）输入滤波网络，将第一像素矩阵和第二像素矩阵（第二像素矩阵中的对应位置的像素点对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值）中的对应位置的像素值相乘后得到的像素矩阵也输入滤波网络。该滤波网络通过一个3×3卷积层和一个激活层对其中一个输入进行处理，通过另一个3×3卷积层和另一个激活层对另一个输入进行处理，然后将上述处理后得到的两个矩阵合并（concat），再经过块处理层、…、块处理层、3×3卷积层、激活层和3×3卷积层得到残差矩阵，该残差矩阵中的对应位置的像素点对应于滤波后的第二图像块中的对应位置的像素的亮度残差值。将第一像素矩阵和残差矩阵对应位置的像素值相加后得到第三像素矩阵，第三像素矩阵中的对应位置的像素点对应于第二图像块中的对应位置的像素的亮度值。
需要说明的是,如图7a-7e所示的卷积神经网络仅作为卷积神经网络的几种示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,本申请对此不做具体限定。
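The following PyTorch sketch mirrors the Figure 7a topology described above: one 3×3 convolution plus ReLU branch per input matrix, channel concatenation, a chain of Figure 7b-style residual blocks, two trailing 3×3 convolutions around an activation, and a final skip connection that adds the predicted luminance residual back onto the first pixel matrix. The channel width (64) and the number of residual blocks (8) are assumptions; the patent leaves these open.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Figure 7b style: 3x3 conv -> ReLU -> 3x3 conv, plus an identity skip.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FilterNet(nn.Module):
    def __init__(self, ch=64, n_blocks=8):  # widths/depths are assumptions
        super().__init__()
        self.luma_branch = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.qstep_branch = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.trunk = nn.Sequential(*[ResBlock(2 * ch) for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, luma, qstep):
        # Process the two input matrices in separate branches, then concatenate.
        feats = torch.cat([self.luma_branch(luma), self.qstep_branch(qstep)], dim=1)
        residual = self.tail(self.trunk(feats))  # predicted luma residual
        return luma + residual                   # skip connection as in Figure 7a

net = FilterNet()
out = net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```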
图8是示出根据本申请一种实施例的环路滤波方法的过程800的流程图。过程800可由视频编码器20或视频解码器30执行,具体的,可以由视频编码器20或视频解码器30的环路滤波器220、320来执行。过程800描述为一系列的步骤或操作,应当理解的是,过程800可以以各种顺序执行和/或同时发生,不限于图8所示的执行顺序。假设具有多个视频帧的视频数据流正在使用视频编码器或者视频解码器,执行包括如下步骤的过程800来对重建的图像或图像块进行滤波处理。过程800可以包括:
步骤801、获取第一像素矩阵,第一像素矩阵中的对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第一图像块为重建图像块或者重建图像中的图像块。
某个图像块(例如第一图像块)可以是编码器中对图像的编码结果进行反量化、重建后得到的重建图像,也可以是重建图像中的一个图像块,还可以是对图像块的编码结果进行反量化、重建后得到的重建图像块。
某个图像块(例如第一图像块)可以理解为像素矩阵X,该像素矩阵X中的对应位置的元素可以理解为该图像块中的对应位置的像素点(或者像素值,例如像素值包括像素的亮度值或像素的色度值)。示例性的,该图像块的尺寸为64×64,表示该图像块的像素点分布为64行×64列,x(i,j)表示该图像块中的第i行、第j列的像素点(或者像素值)。与之对应,输入像素矩阵A包括64行和64列,共有64×64个元素,A(i,j)表示该像素矩阵A中的第i行、第j列的元素。A(i,j)和x(i,j)对应(例如A(i,j)表示像素点x(i,j)的值),输入像素矩阵A中的对应位置的元素对应于(例如表示)该图像块中的对应位置的像素点的亮度值,即表示元素A(i,j)的取值是像素点x(i,j)的亮度值。可选的,在另一种示例下,输入像素矩阵A中的对应位置的元素也可以对应于(例如表示)该图像块中的对应位置的像素的其他值,即元素A(i,j)的取值可以是像素点x(i,j)的其他值,例如像素点x(i,j)的亮度值对应的量化步长值(如步骤802所述),又例如像素点x(i,j)的色度值(如下文所述),又例如像素点x(i,j)的色度值对应的量化步长值(如下文所述),又例如像素点x(i,j)的亮度残差值(如下文所述),又例如像素点x(i,j)的色度残差值(如下文所述)等,对此本申请不做具体限定。应当理解的是,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的亮度值,输入像素矩阵A就是前述第一像素矩阵的示例;或者,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的亮度值对应的量化 步长值,输入像素矩阵A就是下文第二像素矩阵的示例;应当理解的是,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的色度值,输入像素矩阵A就是下文第五像素矩阵的示例;或者,当输入像素矩阵A中的对应位置的元素表示该图像块中的对应位置的像素的色度值对应的量化步长值,输入像素矩阵A就是下文第六像素矩阵的示例。
上述第一图像块可以是编码器或解码器重建得到的重建图像中的一个图像块,也可以是编码器或解码器重建得到的重建图像块。本申请实施例的环路滤波方法包括但不限于对重建图像块进行滤波处理,应当理解的是,也可以适用于对重建图像进行滤波处理,即将本申请实施例的方法中的“重建图像块”适应性替换为重建图像,这里不再赘述。
需要说明的是,第一图像块和步骤803中的第二图像块还可以采用RGB格式,此时第一像素矩阵中的对应位置的元素可以对应于(例如表示)第一图像块中的对应位置的像素的R值、G值或者B值,第二像素矩阵中的对应位置的元素可以对应于(例如表示)第一图像块中的对应位置的像素的R值、G值或者B值对应的量化步长值,或者三者共同采用的量化步长值。
步骤802、获取第二像素矩阵,第二像素矩阵中的对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值,第二像素矩阵的尺寸和第一像素矩阵的尺寸相等。
编码器对第一图像块编码的过程中,包括对残差信息的量化操作,该过程涉及到量化步长值,第一图像块中的每个像素点对应一个量化步长值。第二像素矩阵中的元素用于表征前述量化步长值。本申请采用滤波网络实现滤波的同时引入了重建图像块的每个像素的量化步长值,从而能够更好的对输入滤波网络的对应于图像块的像素矩阵进行滤波处理,提升滤波效果。
步骤803、通过滤波网络对输入像素矩阵进行滤波处理得到输出像素矩阵,滤波网络为经训练得到的具有滤波功能的神经网络,输出像素矩阵包括第三像素矩阵,第三像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的亮度值或者像素的亮度残差值,第二图像块为第一图像块经滤波后得到的图像块,其中输入像素矩阵至少与第一像素矩阵和第二像素矩阵相关。
在一种可能的实现方式中,所述输入像素矩阵包括所述第一像素矩阵和所述第二像素矩阵;或者,所述输入像素矩阵为对所述第一像素矩阵和所述第二像素矩阵进行预处理得到的第一预处理矩阵;或者,所述输入像素矩阵包括所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵;或者,所述输入像素矩阵为对所述第一像素矩阵的归一化矩阵和所述第二像素矩阵的归一化矩阵进行预处理得到的第二预处理矩阵。所述归一化矩阵是指将对应的矩阵中的对应位置的元素的取值进行归一化处理得到的矩阵。
输入像素矩阵是滤波网络的处理对象,输入像素矩阵可以包括获取到的上述第一像素矩阵和第二像素矩阵,即直接将这两个像素矩阵输入滤波网络进行滤波处理。
输入像素矩阵在被输入滤波网络之前,也可以是根据滤波网络在训练时的训练数据形式,以及滤波网络的处理能力,对一个或多个像素矩阵进行预处理和/或归一化处理得到的像素矩阵。其中,归一化处理的目的是为了使元素的值调整为一个统一的取值区间,例如[0,1]或者[-0.5,0.5],这样在滤波网络的计算中可以提高运算效率。预处理可以包括矩阵 相加、矩阵相乘、矩阵合并(contact)等,这样可以减少滤波网络的计算量。因此输入像素矩阵可以包括第一像素矩阵的归一化矩阵和第二像素矩阵的归一化矩阵,即将第一像素矩阵和第二像素矩阵分别进行归一化处理,再将其归一化矩阵输入滤波网络进行处理。或者,输入像素矩阵可以是对第一像素矩阵和第二像素矩阵进行相加、相乘或合并后得到的预处理矩阵。又或者,输入像素矩阵可以是对第一像素矩阵的归一化矩阵和第二像素矩阵的归一化矩阵进行相加、相乘或合并后得到的预处理矩阵。
矩阵相加表示将两个矩阵对应位置的元素的值相加。矩阵相乘表示将两个矩阵对应位置的元素的值相乘。矩阵合并(contact)表示将矩阵的通道数增加,例如,一个矩阵是二维矩阵,其尺寸为m×n,另一个矩阵也是二维矩阵,其尺寸也是m×n,这两个矩阵合并得到的是三维矩阵,其尺寸为m×n×2。
如步骤801所述,滤波网络输出的输出像素矩阵B和经过滤波后的图像块(例如第二图像块)对应,即输出像素矩阵中的元素B(i,j)和经过滤波后的图像块中的像素y(i,j)对应,在一种示例下,元素B(i,j)的值可以表示像素y(i,j)的亮度值。可选的,在另一种示例下,像素矩阵B中的对应位置的元素也可以对应于(例如表示)该经过滤波后的图像块中的对应位置的像素的其他值,即元素B(i,j)的取值可以是像素点y(i,j)的其他值,例如像素点y(i,j)的亮度残差值,又例如像素点y(i,j)的色度值,又例如像素点y(i,j)的色度残差值等,对此本申请不做具体限定。应当理解的是,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的亮度值,输出像素矩阵B是第三像素矩阵的示例;或者,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的亮度残差值,输出像素矩阵B是第三像素矩阵的另一种示例;应当理解的是,当输出像素矩阵B中的对应位置的元素表示经过滤波后的图像块中的对应位置的像素的色度值,输出像素矩阵B是第七像素矩阵的示例;或者,当输出像素矩阵B中的对应位置的元素表示该经过滤波后的图像块中的对应位置的像素的色度残差值,输出像素矩阵B是第八像素矩阵的示例。
当获取第一像素矩阵和第二像素矩阵时,输入像素矩阵可以包括第一像素矩阵和第二像素矩阵,如图9a所示;或者,输入像素矩阵可以为至少根据第一像素矩阵和第二像素矩阵得到的矩阵,如图9b、9c或9d所示。
在一种可能的实现方式中,如果在滤波网络的输入阶段,对第一像素矩阵和第二像素矩阵进行了归一化处理,例如,第一像素矩阵中的各个像素的值的范围为0-255,可以将这些值均归一化为0~1或者-0.5~0.5之间,即输入像素矩阵为归一化矩阵。那么在滤波网络的输出阶段,需要对第三像素矩阵进行反归一化处理,例如将第三像素矩阵中的元素的值反归一化为0~255之间。
在上述第一像素矩阵和第二像素矩阵的基础上，还可以获取第五像素矩阵（第五像素矩阵中的对应位置的元素对应于第一图像块中的对应位置的像素的色度值），此时输入像素矩阵可以包括第一像素矩阵、第二像素矩阵和第五像素矩阵，如图9e所示；或者，输入像素矩阵可以包括对第一像素矩阵和第二像素矩阵进行预处理得到的第一预处理矩阵和第五像素矩阵，如图9f所示。
在上述第一像素矩阵、第二像素矩阵和第五像素矩阵的基础上，还可以获取第六像素矩阵（第六像素矩阵中的对应位置的元素对应于第一图像块中的对应位置的像素的色度值对应的量化步长值），此时输入像素矩阵可以包括第一像素矩阵、第二像素矩阵、第五像素矩阵和第六像素矩阵，如图9g所示；或者，输入像素矩阵可以包括第一预处理矩阵和对第五像素矩阵和第六像素矩阵进行预处理得到的第三预处理矩阵，如图9h所示；或者，输入矩阵可以包括第一预处理矩阵、第五像素矩阵以及第六像素矩阵；或者，输入像素矩阵可以包括第一像素矩阵、第二像素矩阵和第三预处理矩阵。
同理,如果在滤波网络的输入阶段,对第一像素矩阵、第二像素矩阵、第五像素矩阵、第六像素矩阵进行了归一化处理,例如,第一像素矩阵中的各个元素的值的范围为0~255,可以将这些值均归一化为0~1或者-0.5~0.5之间,即输入像素矩阵为归一化矩阵,那么在滤波网络的输出阶段,需要对输出的像素矩阵进行反归一化处理,例如将第七像素矩阵中的元素的值反归一化为0~255之间。
图9a-9l示出了滤波网络的输入像素矩阵的几种示例性的示例。
如图9a所示,将第一像素矩阵和第二像素矩阵直接输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值。此时输入像素矩阵包括第一像素矩阵和第二像素矩阵。
如图9b所示,将第一像素矩阵和第二像素矩阵相加后输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值。将第一像素矩阵和第二像素矩阵中的对应位置的元素的值相加,得到N×M的第一预处理矩阵。此时输入像素矩阵为该第一预处理矩阵。将输入像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素可以对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。
如图9c所示,将第一像素矩阵和第二像素矩阵合并后输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值。将第一像素矩阵和第二像素矩阵合并,得到N×M×2的第一预处理矩阵。此时输入像素矩阵为该第一预处理矩阵。将输入像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素可以对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。
如图9d所示,将第一像素矩阵和第二像素矩阵相乘后输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值。将第一像素矩阵和第二像素矩阵中的对应位置的像素值相乘,得到N×M的第一预处理矩阵。此时输入像素矩阵为该第一预处理矩阵。将输入像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素可以对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。
如图9e所示,将第一像素矩阵、第二像素矩阵以及第五像素矩阵直接输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素 的亮度值对应的量化步长值,第五像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的色度值。此时输入像素矩阵包括第一像素矩阵、第二像素矩阵以及第五像素矩阵。将第一像素矩阵、第二像素矩阵和第五像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。可选的,还可以输出第七像素矩阵或第八像素矩阵,该第七像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度值,该第八像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度残差值。
如图9f所示,将第一像素矩阵和第二像素矩阵合并后输入滤波网络,将第五像素矩阵直接输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值。将第一像素矩阵和第二像素矩阵合并,得到N×M×2的第一预处理矩阵。此时输入像素矩阵包括该第一预处理矩阵和第五像素矩阵。将输入像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素可以对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。可选的,还可以输出第七像素矩阵或第八像素矩阵,该第七像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度值,该第八像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度残差值。
如图9g所示，将第一像素矩阵、第二像素矩阵、第五像素矩阵和第六像素矩阵直接输入滤波网络。例如，第一像素矩阵为N×M，对应位置的元素对应于第一图像块中的对应位置的像素的亮度值，第二像素矩阵为N×M，对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值，第五像素矩阵为N×M，对应位置的元素对应于第一图像块中的对应位置的像素的色度值，第六像素矩阵为N×M，对应位置的元素对应于第一图像块中的对应位置的像素的色度值对应的量化步长值。此时输入像素矩阵包括第一像素矩阵、第二像素矩阵、第五像素矩阵和第六像素矩阵。将第一像素矩阵、第二像素矩阵、第五像素矩阵和第六像素矩阵输入滤波网络执行滤波操作，输出第三像素矩阵为N×M，该第三像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。可选的，还可以输出第七像素矩阵或第八像素矩阵，该第七像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度值，该第八像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度残差值。
如图9h所示,将第一像素矩阵和第二像素矩阵合并后输入滤波网络,将第五像素矩阵和第六像素矩阵合并后输入滤波网络。例如,第一像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值,第二像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的亮度值对应的量化步长值,第五像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的色度值,第六像素矩阵为N×M,对应位置的元素对应于第一图像块中的对应位置的像素的色度值对应的量化步长值。 将第一像素矩阵和第二像素矩阵合并,得到N×M×2的第一预处理矩阵,将第五像素矩阵和第六像素矩阵合并,得到N×M×2的第三预处理矩阵。此时输入像素矩阵包括该第一预处理矩阵和该第三预处理矩阵。将输入像素矩阵输入滤波网络执行滤波操作,输出第三像素矩阵为N×M,该第三像素矩阵中的对应位置的元素可以对应于第二图像块中的对应位置的像素的亮度值或者亮度残差值。可选的,还可以输出第七像素矩阵或第八像素矩阵,该第七像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度值,该第八像素矩阵中的对应位置的元素对应于第二图像块中的对应位置的像素的色度残差值。
As shown in FIG. 9i, the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix are input into the filter network directly. For example, the normalized matrix of the first pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized luma value of the pixel at the corresponding position in the first image block, and the normalized matrix of the second pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the luma value of the pixel at the corresponding position in the first image block. In this case, the input pixel matrix includes the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
As shown in FIG. 9j, the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix are added and then input into the filter network. For example, the normalized matrix of the first pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized luma value of the pixel at the corresponding position in the first image block, and the normalized matrix of the second pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the luma value of the pixel at the corresponding position in the first image block. The values of the elements at corresponding positions in the two normalized matrices are added to obtain an N×M second preprocessed matrix. In this case, the input pixel matrix is this second preprocessed matrix. The input pixel matrix is fed into the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where an element at a corresponding position in the third pixel matrix may correspond to the normalized luma value or luma residual value of the pixel at the corresponding position in the second image block, and denormalization is required before it corresponds to the luma value or luma residual value of the pixel at the corresponding position in the second image block.
As shown in FIG. 9k, the normalized matrices of the first, second, fifth, and sixth pixel matrices are input into the filter network directly. For example, the normalized matrix of the first pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized luma value of the pixel at the corresponding position in the first image block; the normalized matrix of the second pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the luma value of the pixel at the corresponding position in the first image block; the normalized matrix of the fifth pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized chroma value of the pixel at the corresponding position in the first image block; and the normalized matrix of the sixth pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the chroma value of the pixel at the corresponding position in the first image block. In this case, the input pixel matrix includes the normalized matrices of the first, second, fifth, and sixth pixel matrices. The four normalized matrices are fed into the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where an element at a corresponding position in the third pixel matrix corresponds to the normalized luma value or luma residual value of the pixel at the corresponding position in the second image block, and denormalization is required before it corresponds to the luma value or luma residual value of the pixel at the corresponding position in the second image block. Optionally, a seventh pixel matrix or an eighth pixel matrix may also be output, where an element at a corresponding position in the seventh pixel matrix corresponds to the normalized chroma value of the pixel at the corresponding position in the second image block, and an element at a corresponding position in the eighth pixel matrix corresponds to the normalized chroma residual value of the pixel at the corresponding position in the second image block.
As shown in FIG. 9l, the normalized matrices of the first and second pixel matrices are concatenated and then input into the filter network, and the normalized matrices of the fifth and sixth pixel matrices are concatenated and then input into the filter network. For example, the normalized matrix of the first pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized luma value of the pixel at the corresponding position in the first image block; the normalized matrix of the second pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the luma value of the pixel at the corresponding position in the first image block; the normalized matrix of the fifth pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized chroma value of the pixel at the corresponding position in the first image block; and the normalized matrix of the sixth pixel matrix is N×M, where an element at a corresponding position corresponds to the normalized quantization step value corresponding to the chroma value of the pixel at the corresponding position in the first image block. The normalized matrices of the first and second pixel matrices are concatenated to obtain an N×M×2 second preprocessed matrix, and the normalized matrices of the fifth and sixth pixel matrices are concatenated to obtain an N×M×2 fourth preprocessed matrix. In this case, the input pixel matrix includes the second preprocessed matrix and the fourth preprocessed matrix. The input pixel matrix is fed into the filter network to perform the filtering operation, and the output third pixel matrix is N×M, where an element at a corresponding position in the third pixel matrix may correspond to the normalized luma value or luma residual value of the pixel at the corresponding position in the second image block, and denormalization is required before it corresponds to the luma value or luma residual value of the pixel at the corresponding position in the second image block. Optionally, a seventh pixel matrix or an eighth pixel matrix may also be output, where an element at a corresponding position in the seventh pixel matrix corresponds to the normalized chroma value of the pixel at the corresponding position in the second image block, and an element at a corresponding position in the eighth pixel matrix corresponds to the normalized chroma residual value of the pixel at the corresponding position in the second image block.
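Combining the pieces, the normalized two-branch flow of FIG. 9l could be wired up roughly as below. This is a sketch only: `filter_network` stands in for the trained network and is not defined here, the `normalize`/`denormalize` helpers are the illustrative ones sketched earlier, and normalizing the quantization-step matrices by the sample maximum is an assumption (their actual value range may differ):

```python
# Sketch of the FIG. 9l flow; all names are illustrative placeholders.
x_luma = np.stack([normalize(recon_luma), normalize(qstep_luma)], axis=-1)        # second preprocessed matrix, N×M×2
x_chroma = np.stack([normalize(recon_chroma), normalize(qstep_chroma)], axis=-1)  # fourth preprocessed matrix, N×M×2
y_luma_norm, y_chroma_norm = filter_network(x_luma, x_chroma)  # normalized outputs
filtered_luma = denormalize(y_luma_norm)      # third pixel matrix, back to 0..255
filtered_chroma = denormalize(y_chroma_norm)  # seventh pixel matrix, back to 0..255
```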
It should be noted that FIGS. 9a-9l merely show examples of the data input into the filter network. This application does not specifically limit the data input into the filter network, nor the preprocessing performed before entering the filter network; the preprocessing of two pixel matrices further includes operations such as addition and multiplication. In addition, the above preprocessing applied to the pixel matrices may also be implemented by the filter network itself.
While using a filter network to implement filtering, this application introduces the quantization step value of each pixel of the reconstructed image block, which can better guide the filtering of the pixel matrices input into the filter network and can improve the filtering effect for reconstructed images of various qualities.
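As one concrete, non-limiting way to realize such a quantization-step-guided filter network — a few convolution and activation layers over the concatenated N×M×2 input of FIG. 9c, producing a luma residual that is added back to the reconstruction as in the residual variant — consider the PyTorch sketch below; the class name, depth, channel width, and kernel size are illustrative choices consistent with, but not mandated by, the disclosure:

```python
import torch
import torch.nn as nn

class LoopFilterNet(nn.Module):
    """Illustrative filter network: convolutional and activation layers that map
    a (reconstructed luma, quantization step) pair to a filtered luma block."""
    def __init__(self, in_channels: int = 2, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, kernel_size=3, padding=1),  # luma residual plane
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 2, N, M) — channel 0 is reconstructed luma, channel 1 the step plane
        residual = self.body(x)     # third pixel matrix, as a luma residual
        return x[:, :1] + residual  # fourth pixel matrix: filtered luma
```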
FIG. 10 is a schematic structural diagram of a coding apparatus 1000 according to an embodiment of this application. The coding apparatus 1000 may correspond to the video encoder 20 or the video decoder 30. The coding apparatus 1000 includes a reconstruction module 1001, a quantization/dequantization module 1002, and a loop filter module 1003, where the reconstruction module 1001 is configured to obtain the first pixel matrix, the quantization/dequantization module 1002 is configured to obtain the second pixel matrix, and the loop filter module 1003 is configured to implement the method embodiment shown in FIG. 8. In an example, the reconstruction module 1001 may correspond to the reconstruction unit 214 in FIG. 2 or the reconstruction unit 314 in FIG. 3; in an example, the quantization/dequantization module 1002 may correspond to the quantization unit 208 in FIG. 2 or the dequantization unit 310 in FIG. 3; in an example, the loop filter module 1003 may correspond to the loop filter module 220 in FIG. 2 or the loop filter module 320 in FIG. 3.
Those skilled in the art can appreciate that the functions described with reference to the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible media such as data storage media, or communication media including any media that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chip set). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperative hardware units, including one or more processors as described above.
The foregoing descriptions are merely exemplary specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (25)

  1. A loop filtering method, comprising:
    obtaining a first pixel matrix, wherein an element at a corresponding position in the first pixel matrix corresponds to a luma value of a pixel at the corresponding position in a first image block, and the first image block is a reconstructed image block or an image block in a reconstructed image;
    obtaining a second pixel matrix, wherein an element at a corresponding position in the second pixel matrix corresponds to a quantization step value corresponding to the luma value of the pixel at the corresponding position in the first image block, and a size of the second pixel matrix is equal to a size of the first pixel matrix; and
    performing, by a filter network, filtering processing on an input pixel matrix to obtain an output pixel matrix, wherein the filter network is a trained neural network having a filtering function, the output pixel matrix comprises a third pixel matrix, an element at a corresponding position in the third pixel matrix corresponds to a luma value or a luma residual value of a pixel at the corresponding position in a second image block, the second image block is an image block obtained by filtering the first image block, and the input pixel matrix is related to at least the first pixel matrix and the second pixel matrix.
  2. The method according to claim 1, wherein the input pixel matrix comprises the first pixel matrix and the second pixel matrix; or
    the input pixel matrix is a first preprocessed matrix obtained by preprocessing the first pixel matrix and the second pixel matrix; or
    the input pixel matrix comprises a normalized matrix of the first pixel matrix and a normalized matrix of the second pixel matrix; or
    the input pixel matrix is a second preprocessed matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  3. The method according to claim 1 or 2, wherein when the element at the corresponding position in the third pixel matrix corresponds to the luma residual value of the pixel at the corresponding position in the second image block, the method further comprises:
    adding values of elements at corresponding positions in the first pixel matrix and the third pixel matrix to obtain a fourth pixel matrix, wherein an element at a corresponding position in the fourth pixel matrix corresponds to the luma value of the pixel at the corresponding position in the second image block.
  4. The method according to any one of claims 1 to 3, wherein when the input pixel matrix is a normalized matrix, the method further comprises:
    denormalizing the values of the elements at the corresponding positions in the third pixel matrix.
  5. The method according to claim 3, wherein when the input pixel matrix has been normalized, the adding values of elements at corresponding positions in the first pixel matrix and the third pixel matrix to obtain a fourth pixel matrix comprises:
    adding values of elements at corresponding positions in the first pixel matrix and the denormalized third pixel matrix to obtain the fourth pixel matrix.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    obtaining a fifth pixel matrix, wherein an element at a corresponding position in the fifth pixel matrix corresponds to a chroma value of the pixel at the corresponding position in the first image block; and
    correspondingly, the input pixel matrix is related to at least the first pixel matrix, the second pixel matrix, and the fifth pixel matrix.
  7. The method according to claim 6, wherein the input pixel matrix comprises the first pixel matrix, the second pixel matrix, and the fifth pixel matrix; or
    the input pixel matrix comprises the fifth pixel matrix and a first preprocessed matrix obtained by preprocessing the first pixel matrix and the second pixel matrix; or
    the input pixel matrix comprises the normalized matrix of the first pixel matrix, the normalized matrix of the second pixel matrix, and a normalized matrix of the fifth pixel matrix; or
    the input pixel matrix comprises the normalized matrix of the fifth pixel matrix and a second preprocessed matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix.
  8. The method according to claim 6 or 7, wherein the method further comprises:
    obtaining a sixth pixel matrix, wherein an element at a corresponding position in the sixth pixel matrix corresponds to a quantization step value corresponding to the chroma value of the pixel at the corresponding position in the first image block; and
    correspondingly, the input pixel matrix is related to at least the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix.
  9. The method according to claim 8, wherein the input pixel matrix comprises the first pixel matrix, the second pixel matrix, the fifth pixel matrix, and the sixth pixel matrix; or
    the input pixel matrix comprises the fifth pixel matrix, the sixth pixel matrix, and a first preprocessed matrix obtained by preprocessing the first pixel matrix and the second pixel matrix; or
    the input pixel matrix comprises the first pixel matrix, the second pixel matrix, and a third preprocessed matrix obtained by preprocessing the fifth pixel matrix and the sixth pixel matrix; or
    the input pixel matrix comprises the first preprocessed matrix and the third preprocessed matrix; or
    the input pixel matrix comprises the normalized matrix of the first pixel matrix, the normalized matrix of the second pixel matrix, a normalized matrix of the fifth pixel matrix, and a normalized matrix of the sixth pixel matrix; or
    the input pixel matrix comprises the normalized matrix of the fifth pixel matrix, the normalized matrix of the sixth pixel matrix, and a second preprocessed matrix obtained by preprocessing the normalized matrix of the first pixel matrix and the normalized matrix of the second pixel matrix; or
    the input pixel matrix comprises the normalized matrix of the first pixel matrix, the normalized matrix of the second pixel matrix, and a fourth preprocessed matrix obtained by preprocessing the normalized matrix of the fifth pixel matrix and the normalized matrix of the sixth pixel matrix; or
    the input pixel matrix comprises the second preprocessed matrix and the fourth preprocessed matrix.
  10. The method according to any one of claims 6 to 9, wherein the performing, by a filter network, filtering processing on an input pixel matrix to obtain an output pixel matrix comprises:
    performing, by the filter network, filtering processing on the input pixel matrix to obtain the output pixel matrix, wherein the output pixel matrix comprises the third pixel matrix and a seventh pixel matrix, and an element at a corresponding position in the seventh pixel matrix corresponds to a chroma value of the pixel at the corresponding position in the second image block.
  11. The method according to claim 10, wherein when the input pixel matrix is a normalized matrix, the method further comprises:
    denormalizing the values of the elements at the corresponding positions in the seventh pixel matrix.
  12. The method according to any one of claims 6 to 9, wherein the performing, by a filter network, filtering processing on an input pixel matrix to obtain an output pixel matrix comprises:
    performing, by the filter network, filtering processing on the input pixel matrix to obtain the output pixel matrix, wherein the output pixel matrix comprises the third pixel matrix and an eighth pixel matrix, and an element at a corresponding position in the eighth pixel matrix corresponds to a chroma residual value of the pixel at the corresponding position in the second image block; and
    adding values of elements at corresponding positions in the fifth pixel matrix and the eighth pixel matrix to obtain a ninth pixel matrix, wherein an element at a corresponding position in the ninth pixel matrix corresponds to the chroma value of the pixel at the corresponding position in the second image block.
  13. The method according to claim 12, wherein when the input pixel matrix is a normalized matrix, the method further comprises:
    denormalizing the values of the elements at the corresponding positions in the eighth pixel matrix; and
    correspondingly, the adding values of elements at corresponding positions in the fifth pixel matrix and the eighth pixel matrix to obtain a ninth pixel matrix comprises:
    adding values of elements at corresponding positions in the fifth pixel matrix and the denormalized eighth pixel matrix to obtain the ninth pixel matrix.
  14. The method according to claim 2, 7, or 9, wherein the preprocessing comprises: adding elements at corresponding positions in two matrices; or concatenating two matrices; or multiplying elements at corresponding positions in two matrices.
  15. The method according to any one of claims 1 to 14, wherein the method further comprises:
    obtaining a training matrix set, wherein the training matrix set comprises pre-filtering luma matrices, quantization step matrices, and post-filtering luma matrices of a plurality of image blocks, an element at a corresponding position in a pre-filtering luma matrix corresponds to a pre-filtering luma value of a pixel at the corresponding position in a corresponding image block, an element at a corresponding position in a quantization step matrix corresponds to a quantization step value corresponding to the luma value of the pixel at the corresponding position in the corresponding image block, and an element at a corresponding position in a post-filtering luma matrix corresponds to a post-filtering luma value of the pixel at the corresponding position in the corresponding image block; and
    training according to the training matrix set to obtain the filter network.
  16. The method according to claim 15, wherein the training matrix set further comprises pre-filtering chroma matrices and post-filtering chroma matrices of the plurality of image blocks, an element at a corresponding position in a pre-filtering chroma matrix corresponds to a pre-filtering chroma value of a pixel at the corresponding position in a corresponding image block, and an element at a corresponding position in a post-filtering chroma matrix corresponds to a post-filtering chroma value of the pixel at the corresponding position in the corresponding image block.
  17. The method according to any one of claims 1 to 16, wherein the filter network comprises at least a convolutional layer and an activation layer.
  18. The method according to claim 17, wherein a depth of a convolution kernel of the convolutional layer is 2, 3, 4, 5, 6, 16, 24, 32, 48, 64, or 128, and a size of the convolution kernel in the convolutional layer is 1×1, 3×3, 5×5, or 7×7.
  19. The method according to any one of claims 1 to 18, wherein the filter network comprises a convolutional neural network (CNN), a deep neural network (DNN), or a recurrent neural network (RNN).
  20. An encoder, comprising processing circuitry configured to perform the method according to any one of claims 1 to 19.
  21. A decoder, comprising processing circuitry configured to perform the method according to any one of claims 1 to 19.
  22. A computer program product, comprising program code, wherein when the program code is executed on a computer or a processor, the method according to any one of claims 1 to 19 is performed.
  23. An encoder, comprising:
    one or more processors; and
    a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to perform the method according to any one of claims 1 to 19.
  24. A decoder, comprising:
    one or more processors; and
    a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to any one of claims 1 to 19.
  25. A non-transitory computer-readable storage medium, comprising program code, wherein when the program code is executed by a computer device, the method according to any one of claims 1 to 19 is performed.
PCT/CN2021/098251 2020-06-10 2021-06-04 环路滤波方法和装置 WO2021249290A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/063,955 US12052443B2 (en) 2020-06-10 2022-12-09 Loop filtering method and apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010525274.8 2020-06-10
CN202010525274 2020-06-10
CN202011036512.5A CN113784146A (zh) 2020-06-10 2020-09-27 环路滤波方法和装置
CN202011036512.5 2020-09-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/063,955 Continuation US12052443B2 (en) 2020-06-10 2022-12-09 Loop filtering method and apparatus

Publications (1)

Publication Number Publication Date
WO2021249290A1 true WO2021249290A1 (zh) 2021-12-16

Family

ID=78835082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098251 WO2021249290A1 (zh) 2020-06-10 2021-06-04 环路滤波方法和装置

Country Status (3)

Country Link
US (1) US12052443B2 (zh)
CN (2) CN113784146A (zh)
WO (1) WO2021249290A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541876B (zh) * 2020-12-15 2023-08-04 北京百度网讯科技有限公司 卫星图像处理方法、网络训练方法、相关装置及电子设备
WO2024082899A1 (en) * 2022-10-18 2024-04-25 Mediatek Inc. Method and apparatus of adaptive loop filter selection for positional taps in video coding

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120087411A1 (en) * 2010-10-12 2012-04-12 Apple Inc. Internal bit depth increase in deblocking filters and ordered dither
EP3342164B1 (en) * 2015-09-03 2020-04-15 MediaTek Inc. Method and apparatus of neural network based processing in video coding
KR102481552B1 (ko) * 2017-03-08 2022-12-27 소니 세미컨덕터 솔루션즈 가부시키가이샤 아날로그-디지털 변환기, 고체 촬상 소자, 및, 전자 기기
EP3685577A4 (en) * 2017-10-12 2021-07-28 MediaTek Inc. METHOD AND DEVICE OF A NEURAL NETWORK FOR VIDEO ENCODING
JP7350082B2 (ja) * 2019-03-07 2023-09-25 オッポ広東移動通信有限公司 ループフィルタリング方法、装置およびコンピュータ記憶媒体

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111052740A (zh) * 2017-07-06 2020-04-21 三星电子株式会社 用于编码或解码图像的方法和装置
CN111194555A (zh) * 2017-08-28 2020-05-22 交互数字Vc控股公司 用模式感知深度学习进行滤波的方法和装置
WO2020069655A1 (zh) * 2018-10-06 2020-04-09 华为技术有限公司 插值滤波器的训练方法、装置及视频图像编解码方法、编解码器
CN110798690A (zh) * 2019-08-23 2020-02-14 腾讯科技(深圳)有限公司 视频解码方法、环路滤波模型的训练方法、装置和设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIA CHUANMIN; WANG SHIQI; ZHANG XINFENG; WANG SHANSHE; LIU JIAYING; PU SHILIANG; MA SIWEI: "Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 28, no. 7, 1 July 2019 (2019-07-01), USA, pages 3343 - 3356, XP011726694, ISSN: 1057-7149, DOI: 10.1109/TIP.2019.2896489 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077576A1 (zh) * 2022-10-13 2024-04-18 Oppo广东移动通信有限公司 基于神经网络的环路滤波、视频编解码方法、装置和系统

Also Published As

Publication number Publication date
US20230209096A1 (en) 2023-06-29
US12052443B2 (en) 2024-07-30
CN118678104A (zh) 2024-09-20
CN113784146A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2021249290A1 (zh) 环路滤波方法和装置
WO2022068716A1 (zh) 熵编/解码方法及装置
WO2022194137A1 (zh) 视频图像的编解码方法及相关设备
WO2022253249A1 (zh) 特征数据编解码方法和装置
US11070808B2 (en) Spatially adaptive quantization-aware deblocking filter
WO2023279961A1 (zh) 视频图像的编解码方法及装置
JP2024513693A (ja) ピクチャデータ処理ニューラルネットワークに入力される補助情報の構成可能な位置
WO2022156688A1 (zh) 分层编解码的方法及装置
CN114125446A (zh) 图像编码方法、解码方法和装置
WO2022063267A1 (zh) 帧内预测方法及装置
US20230396810A1 (en) Hierarchical audio/video or picture compression method and apparatus
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
JP2024511587A (ja) ニューラルネットワークベースのピクチャ処理における補助情報の独立した配置
WO2023165487A1 (zh) 特征域光流确定方法及相关设备
WO2023279968A1 (zh) 视频图像的编解码方法及装置
US20240296594A1 (en) Generalized Difference Coder for Residual Coding in Video Compression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822168

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21822168

Country of ref document: EP

Kind code of ref document: A1