CN118103850A - Filtering method, filtering device and electronic equipment


Info

Publication number: CN118103850A
Application number: CN202180103357.0A
Authority: CN (China)
Prior art keywords: image block, reconstructed image, neural network, current, value
Legal status: Pending
Other languages: Chinese (zh)
Inventor: Dai Zhenyu (戴震宇)
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application provide a filtering method, a filtering device and an electronic device. The filtering method includes: parsing a bitstream to obtain a current reconstructed image block; when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block using a second neural network, to obtain a feature image block of the current reconstructed image block; and filtering the current reconstructed image block based on the feature image block of the current reconstructed image block using the first neural network, to obtain a filtered reconstructed image block. The filtering method provided by the application can improve decoding performance.

Description

Filtering method, filtering device and electronic equipment

Technical Field
Embodiments of the present application relate to the technical field of image encoding and decoding, and in particular to a filtering method, a filtering device and an electronic device.
Background
Digital video compression technology compresses massive digital video data to facilitate transmission, storage and the like. With the proliferation of Internet video and ever-increasing demands on video definition, existing digital video compression standards can decompress video, but better digital video compression and decompression techniques are still being pursued to improve decoding performance.
Disclosure of Invention
The application provides a filtering method, a filtering device and electronic equipment, which can improve decoding performance.
In a first aspect, the present application provides a filtering method, including:
parsing a bitstream to obtain a current reconstructed image block;
when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block using a second neural network, to obtain a feature image block of the current reconstructed image block; and
filtering the current reconstructed image block based on the feature image block of the current reconstructed image block using the first neural network, to obtain a filtered reconstructed image block.
In a second aspect, the present application provides a filtering method, including:
acquiring a current reconstructed image block;
when it is determined that a first neural network is to be used to filter the current reconstructed image block, predicting features of an original image block of the current reconstructed image block using a second neural network, to obtain a feature image block of the current reconstructed image block; and
filtering the current reconstructed image block based on the feature image block of the current reconstructed image block using the first neural network, to obtain a filtered reconstructed image block.
In a third aspect, the present application provides a filtering apparatus, including:
a parsing unit, configured to parse a bitstream to obtain a current reconstructed image block;
a prediction unit, configured to, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block using a second neural network, to obtain a feature image block of the current reconstructed image block; and
a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block using the first neural network, to obtain a filtered reconstructed image block.
In a fourth aspect, the present application provides a filtering apparatus, including:
an acquisition unit, configured to acquire a current reconstructed image block;
a prediction unit, configured to, when it is determined that a first neural network is to be used to filter the current reconstructed image block, predict features of an original image block of the current reconstructed image block using a second neural network, to obtain a feature image block of the current reconstructed image block; and
a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block using the first neural network, to obtain a filtered reconstructed image block.
In a fifth aspect, the present application provides an electronic device, including:
a processor adapted to implement computer instructions; and
a computer-readable storage medium storing computer instructions adapted to be loaded by the processor to perform the filtering method of any one of the first to second aspects or implementations thereof.
In one implementation, there are one or more processors and one or more memories.
In one implementation, the computer-readable storage medium may be integrated with the processor or located separately from the processor.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium storing computer instructions that, when read and executed by a processor of a computer device, cause the computer device to perform the filtering method of any one of the first to second aspects or implementations thereof.
In a seventh aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method of any one of the first to second aspects or implementations thereof.
Based on the above technical solutions, on the one hand, a second neural network is introduced and designed to predict the features of the original image block of the current reconstructed image block, yielding a feature image block of the current reconstructed image block; on the other hand, a first neural network is introduced and designed to filter the current reconstructed image block based on that feature image block. Neural-network-based filtering can thereby be realized, and the information used to filter the current reconstructed image block is kept as close to the original image block as possible, so the image quality of the current reconstructed image block and the decoding performance can both be improved.
In addition, by introducing the second neural network, the encoding end and the decoding end can be guaranteed to have a consistent understanding of the feature image block of the current reconstructed image block, further improving decoding performance.
In other words, the application observes that the purpose of filtering is to make the current reconstructed image block more similar to the original image block; the introduced first neural network therefore takes the extracted feature image block of the original image block as an input when filtering the current reconstructed image block, improving the quality of the current reconstructed image and the decoding performance. Moreover, regarding the feature image block of the original image block, the application considers that although the encoding end can obtain it by analyzing the original image block, the decoding end cannot access the original image block; by introducing the second neural network as a feature extractor, the decoding end is also guaranteed to obtain the feature image block of the original image block, further improving decoding performance. That is, the present application proposes a neural network, and a filtering method, that filter the current reconstructed image block using a feature image block of the original image, which can improve the image quality of the current reconstructed image block and the decoding performance.
Drawings
Fig. 1 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
Fig. 2 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
Fig. 3 is a schematic flow chart of a filtering method provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of a connection relationship between a first neural network and a second neural network according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a filtering unit including a first neural network and a second neural network according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a first neural network provided in an embodiment of the present application.
Fig. 8 is a schematic block diagram of a residual block provided by an embodiment of the present application.
Fig. 9 is another schematic flow chart of a filtering method provided by an embodiment of the present application.
Fig. 10 is a schematic block diagram of a filtering apparatus of an embodiment of the present application.
Fig. 11 is another schematic block diagram of a filtering apparatus of an embodiment of the present application.
Fig. 12 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
The solutions provided by the embodiments of the present application can be applied to the technical field of digital video coding, for example image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated-circuit video encoding and decoding, and real-time video encoding and decoding. The solutions may be combined with the Audio Video coding Standard (AVS), the second-generation AVS standard (AVS2) or the third-generation AVS standard (AVS3), and with standards including but not limited to the H.264/Audio Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the H.266/Versatile Video Coding (VVC) standard. The solutions may be used for lossy compression of an image, or for lossless compression, where the lossless compression may be visually lossless compression or mathematically lossless compression.
In digital video coding, for original video sequences of different color formats, the encoder reads different numbers of luminance-component pixels and chrominance-component pixels; that is, the encoder reads a black-and-white image or a color image and then encodes it. A black-and-white image may comprise pixels of the luminance component, while a color image comprises pixels of the chrominance components and, optionally, pixels of the luminance component. The color format of the original video sequence may be a luminance-chrominance (YCbCr, YUV) format, a Red-Green-Blue (RGB) format, or the like. After reading a black-and-white or color image, the encoder partitions it into block data and encodes the block data. The block data may be Coding Tree Units (CTUs) or Coding Units (CUs); one coding tree unit may be further divided into several CUs, and a CU may be a rectangular or square block. That is, the encoder may encode on the basis of CTUs or CUs. Today's encoders typically follow a hybrid coding mode, generally comprising intra and inter prediction, transform and quantization, inverse transform and inverse quantization, loop filtering, and entropy coding. Intra prediction refers only to information of the same frame to predict the pixel information inside the current partition block, eliminating spatial redundancy; inter prediction may refer to image information of different frames and uses motion estimation to search for the motion vector information that best matches the current partition block, eliminating temporal redundancy; the transform converts the predicted image blocks into the frequency domain and redistributes their energy, and, combined with quantization, removes information insensitive to the human eye, eliminating visual redundancy; entropy coding eliminates character redundancy based on the current context model and the probability information of the binary bitstream. Loop filtering mainly processes the pixels after inverse transform and inverse quantization, compensating for distortion and providing better references for subsequently coded pixels.
For ease of understanding, the coding framework provided by the present application will be briefly described.
Fig. 1 is a schematic block diagram of an encoding framework 100 provided by an embodiment of the present application.
As shown in fig. 1, the encoding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy encoding unit 130, an inverse transform and inverse quantization unit 140, and a loop filtering unit 150. Optionally, the encoding framework 100 may further include a decoded image buffer unit 160. The coding framework 100 may also be referred to as a hybrid frame coding mode.
In the encoding framework 100, the intra prediction unit 180 or the inter prediction unit 170 may predict an image block to be encoded and output a prediction block. The residual unit 110 may calculate a residual block, i.e., the difference between the prediction block and the image block to be encoded, based on the two. Through transformation and quantization in the transform and quantization unit 120, information insensitive to the human eye may be removed from the residual block, eliminating visual redundancy. Optionally, the residual block before transformation and quantization by the transform and quantization unit 120 may be called a time-domain residual block, and the residual block after transformation and quantization may be called a frequency residual block or frequency-domain residual block. After receiving the transformed and quantized coefficients output by the transform and quantization unit 120, the entropy encoding unit 130 may output a bitstream based on those coefficients. For example, the entropy encoding unit 130 may eliminate character redundancy according to the target context model and the probability information of the binary bitstream; for example, it may perform context-based adaptive binary arithmetic coding (CABAC). The entropy encoding unit 130 may also be called a header-information encoding unit. Optionally, in the present application, the image block to be encoded may also be called an original image block or target image block; the prediction block may also be called a predicted image block or image prediction block, or a prediction signal or prediction information; and the reconstruction block may also be called a reconstructed image block or image reconstruction block, or a reconstruction signal or reconstruction information. Furthermore, at the encoding end the image block to be encoded may also be called an encoding block or encoded image block, and at the decoding end it may be called a decoding block or decoded image block. The image block to be encoded may be a CTU or a CU.
In short, the encoding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain a residual block, and transmits the residual block to the decoding end through processes such as transformation and quantization. After receiving and parsing the bitstream, the decoding end obtains the residual block through steps such as inverse transformation and inverse quantization, and superimposes the prediction block obtained by its own prediction onto the residual block to obtain the reconstruction block.
It should be noted that the inverse transform and inverse quantization unit 140, the loop filtering unit 150 and the decoded image buffer unit 160 in the encoding framework 100 may be used to form a decoder. Accordingly, the intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded based on existing reconstruction blocks, ensuring that the encoding end and the decoding end have a consistent understanding of the reference frames. In other words, the encoder replicates the processing loop of the decoder and can therefore generate the same predictions as the decoding end. Specifically, the quantized transform coefficients are inverse-transformed and inverse-quantized by the inverse transform and inverse quantization unit 140 to reproduce, at the encoding end, the approximate residual block of the decoding end. The approximate residual block, plus the prediction block, may be passed through the loop filtering unit 150 to smooth out blocking artifacts and other distortions caused by block-based processing and quantization. The image block output by the loop filtering unit 150 may be stored in the decoded image buffer unit 160 for use in predicting subsequent images.
The intra prediction unit 180 may be used for intra prediction, which refers only to information of the same frame to predict the pixel information within the image block to be encoded, eliminating spatial redundancy; the frames used for intra prediction may be I frames. For example, following the left-to-right, top-to-bottom coding order, the image block to be encoded may use the upper-left image block, the upper image block and the left image block as reference information, and it in turn serves as reference information for the next image block, so that the whole image can be predicted. If the input digital video is in a color format such as YUV 4:2:0, every 4 pixels of each image frame consist of 4 Y components and 2 UV components, and the encoding framework 100 may encode the Y component (i.e., luminance blocks) and the UV components (i.e., chrominance blocks) separately. Similarly, the decoding end may decode correspondingly according to the format. The inter prediction unit 170 may be used for inter prediction, which may refer to image information of different frames and uses motion estimation to search for the motion vector information that best matches the image block to be encoded, removing temporal redundancy; the frames used for inter prediction may be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames.
For the intra prediction process, intra prediction can predict the image block to be encoded using angular and non-angular prediction modes to obtain a prediction block; the optimal prediction mode for the image block to be encoded is selected according to rate-distortion information computed from the prediction block and the image block to be encoded, and that mode is transmitted to the decoding end in the bitstream. The decoding end parses the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time-domain residual block transmitted in the bitstream to obtain the reconstruction block. Across generations of digital video coding standards, the non-angular modes have remained relatively stable, comprising the mean (DC) mode and the Planar mode, while the number of angular modes keeps increasing as the standards evolve. Taking the international H-series standards as an example: H.264/AVC has only 8 angular prediction modes and 1 non-angular prediction mode; H.265/HEVC extends these to 33 angular prediction modes and 2 non-angular prediction modes. In H.266/VVC, intra prediction modes are extended further: for luminance blocks there are 67 conventional prediction modes plus the non-conventional Matrix-weighted Intra Prediction (MIP) mode, where the conventional prediction modes comprise the Planar mode (mode number 0), the DC mode (mode number 1), and the angular prediction modes (mode numbers 2 to 66).
It should be understood that fig. 1 is only an example of the present application and should not be construed as limiting the present application.
For example, the loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF), a sample adaptive offset filter (SAO), and an adaptive loop filter (ALF). The DBF removes blocking artifacts, and the SAO mitigates ringing artifacts. In other embodiments of the present application, the encoding framework 100 may employ a neural-network-based loop-filtering algorithm to improve the compression efficiency of video. Optionally, the encoding framework 100 may be a hybrid video coding framework based on deep-learning neural networks. In one implementation, the result of neural-network-computed pixel filtering may be applied on top of the deblocking filter and sample adaptive offset filtering. The network structure of the loop filtering unit 150 may be the same or different for the luminance and chrominance components. Considering that the luminance component contains more visual information, the luminance component may also be used to guide the filtering of the chrominance components to improve the reconstruction quality of the chrominance components.
Fig. 2 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
As shown in fig. 2, the decoding framework 200 may include an entropy decoding unit 210, an inverse transform inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer unit 270.
After the entropy decoding unit 210 receives and parses the bitstream to obtain a prediction block and a frequency-domain residual block, the inverse transform and inverse quantization unit 220 inverse-transforms and inverse-quantizes the frequency-domain residual block to obtain a time-domain residual block, and the residual unit 230 superimposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 onto that time-domain residual block to obtain a reconstruction block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the bitstream.
Digital video compression technology compresses massive digital video data to facilitate transmission, storage and the like. With the proliferation of Internet video and ever-increasing demands on video definition, existing digital video compression standards can decompress video, but better techniques are still being pursued to improve decoding performance. In addition, in video coding standards such as AVS3 and VVC, the conventional loop-filtering module mainly comprises the deblocking filter (DBF), sample adaptive offset (SAO) and adaptive loop filter (ALF) tools. With the development of deep-learning technology, decoding performance can also be improved by introducing a neural-network-based filter.
In view of the above, the present application provides a filtering method, a filtering device and an electronic device, which can improve decoding performance.
Fig. 3 is a schematic flow chart of a filtering method 300 of an embodiment of the application. The method 300 may be performed by a decoding framework including a neural-network-based filtering unit. In one implementation, the neural-network-based filtering unit may be added to the decoding framework described in fig. 2 to perform the filtering method 300.
As shown in fig. 3, the filtering method 300 may include:
s310, analyzing the code stream to obtain a current reconstructed image block;
s320, predicting the characteristics of an original image block of the current reconstructed image block by using a second neural network when determining to filter the current reconstructed image block by using the first neural network to obtain a characteristic image block of the current reconstructed image block;
s330, filtering the current reconstruction image block based on the characteristic image block of the current reconstruction image block by using the first neural network to obtain a filtered reconstruction image block.
In this embodiment, on the one hand, a second neural network is introduced and designed to predict the features of the original image block of the current reconstructed image block, yielding a feature image block of the current reconstructed image block; on the other hand, a first neural network is introduced and designed to filter the current reconstructed image block based on that feature image block. Neural-network-based filtering can thereby be realized, and the information used to filter the current reconstructed image block is kept as close to the original image block as possible, so the image quality of the current reconstructed image block and the decoding performance can both be improved.
In addition, by introducing the second neural network, the encoding end and the decoding end can be guaranteed to have a consistent understanding of the feature image block of the current reconstructed image block, further improving decoding performance.
In other words, the application observes that the purpose of filtering is to make the current reconstructed image block more similar to the original image block; the introduced first neural network therefore takes the extracted feature image block of the original image block as an input when filtering the current reconstructed image block, improving the quality of the current reconstructed image and the decoding performance. Moreover, regarding the feature image block of the original image block, the application considers that although the encoding end can obtain it by analyzing the original image block, the decoding end cannot access the original image block; by introducing the second neural network as a feature extractor, the decoding end is also guaranteed to obtain the feature image block of the original image block, further improving decoding performance. That is, the present application proposes a neural network, and a filtering method, that filter the current reconstructed image block using a feature image block of the original image, which can improve the image quality of the current reconstructed image block and the decoding performance.
It should be noted that the present application may filter the current reconstructed image block based only on the feature image block of the current reconstructed image block, or based on the feature image block combined with other information; this is not specifically limited.
For example, the information of the current reconstructed image block and the feature image block of the current reconstructed image block may together be used as the input of the first neural network when filtering the current reconstructed image block, to improve decoding performance. Optionally, the information of the current reconstructed image block includes, but is not limited to: pixel values of the color components (Y/U/V), block-partition information, prediction information, deblocking boundary-strength information, quantization-parameter (QP) information, and the like. For example, the luminance component may be introduced as an input to guide the filtering of the chrominance components. It should be noted that the feature image block of the current reconstructed image block characterizes the original image block of the current reconstructed image block, not the current reconstructed image block itself.
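To make the multi-input idea concrete, the following is a minimal sketch of how such channels could be stacked for the first neural network; the channel ordering, the QP normalisation constant and the helper name `build_filter_input` are illustrative assumptions, not anything fixed by the application.

```python
import torch

def build_filter_input(recon_y, qp, partition_map, feature_block):
    # recon_y, partition_map, feature_block: (H, W) tensors scaled to [0, 1].
    # Broadcast the scalar QP to a constant plane; dividing by 63 assumes a
    # 0..63 QP range and is only one possible normalisation.
    qp_plane = torch.full_like(recon_y, qp / 63.0)
    # Stack everything channel-wise: shape (4, H, W), one channel per input.
    return torch.stack([recon_y, qp_plane, partition_map, feature_block], dim=0)
```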
In addition, the network structure of the first neural network is not particularly limited in the present application.
The first neural network may be a deep learning based loop filter, for example.
The first neural network may be, for example, a loop filter (CNNLF) based on a residual neural network.
Illustratively, the first neural network may include a network structure for the luminance component and a network structure for the chrominance components. Optionally, the network structure for the luminance component or for the chrominance components may consist of convolutional layers, activation layers, residual blocks, skip connections, and the like; the network structure of a residual block consists of convolutional layers, an activation layer and a skip connection. Furthermore, a global skip connection from input to output may be included, so that the network structure concentrates on learning the residual and its convergence is accelerated. Optionally, the network structure for the chrominance components may introduce the luminance component as an input to guide the filtering of the chrominance components.
The first neural network may be a loop filter based on a deep convolutional neural network, for example.
Illustratively, the first neural network may employ a multi-layer residual network. Optionally, information of multiple modes can be introduced as input to filter the current reconstructed image block, and the optimal model is selected for the filtering process by computing the rate-distortion cost of each model.
In addition, the present application does not limit the specific positions of the first neural network and the second neural network within the codec framework or the filtering unit. The network structure formed by the first and second neural networks may also be referred to as a neural-network-based loop filter (Neural Network based Loop Filter, NNLF), for example.
The connection relationship and network structure of the first neural network and the second neural network are exemplarily described below with reference to fig. 4 to 8.
Fig. 4 is a schematic diagram of a connection relationship between a first neural network and a second neural network according to an embodiment of the present application.
As shown in fig. 4, considering that the original image block of the current reconstructed image block cannot be obtained at the decoding end, the feature image block of the original image block theoretically cannot be used directly at the encoding end if the encoder and decoder are to match. The application therefore predicts the feature image block of the original image block with the second neural network, and then feeds the predicted feature image block into the first neural network to filter the current reconstructed image block. That is, the neural-network loop filter used to filter the current reconstructed image block is composed of two neural networks: the second neural network and the first neural network.
Fig. 5 is a schematic diagram of a filtering unit including a first neural network and a second neural network according to an embodiment of the present application.
As shown in fig. 5, the network structure formed by the first neural network and the second neural network may be located between SAO and ALF. Optionally, this network structure is used independently of the DBF, SAO and ALF switches, but is positioned between SAO and ALF.
Fig. 6 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
As shown in fig. 6, the second neural network is composed of k convolutional layers, where each convolutional layer except the last is followed by a nonlinear activation function (PReLU) layer. The input of the second neural network is the current reconstructed image block, and the output is the predicted feature image block of the current reconstructed image block.
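A minimal PyTorch sketch of this structure follows; the values of k, the channel width and the 3x3 kernel size are assumptions, since fig. 6 only fixes the "k convolutions, PReLU after all but the last" pattern.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Second neural network of fig. 6: k conv layers, PReLU after each
    except the last. Takes the reconstructed block, outputs the predicted
    feature block of its original image block."""
    def __init__(self, k=4, width=64, in_ch=1, out_ch=1):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(k):
            last = (i == k - 1)
            layers.append(nn.Conv2d(ch, out_ch if last else width,
                                    kernel_size=3, padding=1))
            if not last:
                layers.append(nn.PReLU())
            ch = width
        self.net = nn.Sequential(*layers)

    def forward(self, recon_block):
        return self.net(recon_block)
```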
Fig. 7 is a schematic structural diagram of a first neural network provided in an embodiment of the present application.
As shown in fig. 7, the first neural network first applies a convolution to the feature image block of the current reconstructed image block output by the second neural network, then merges the result channel-wise with the input current reconstructed image block, and feeds the merged result into the next layer of the network. The second layer and the last layer of the first neural network are convolutional layers, and a skip connection runs from the input to the output, so that the first neural network concentrates on learning the residual and the convergence of the network structure is accelerated. n residual blocks may be cascaded at an intermediate position of the first neural network. The inputs of the first neural network are the feature image block of the current reconstructed image block output by the second neural network and the current reconstructed image block; its output is the filtered reconstructed image block.
Fig. 8 is a schematic block diagram of a residual block provided by an embodiment of the present application.
As shown in fig. 8, a residual block may consist of two convolutional layers, with a nonlinear activation function (PReLU) layer after the first convolutional layer, and a skip connection from the input to the output.
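The two structures of figs. 7 and 8 can be sketched together as follows; channel widths, kernel sizes and the default n are assumptions, and only the layer pattern (feature convolution, channel concatenation, n residual blocks, global skip) is taken from the description above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fig. 8: two conv layers, PReLU after the first, skip from input to output."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class FilterNet(nn.Module):
    """Fig. 7: convolve the predicted feature block, concatenate it
    channel-wise with the reconstructed block, run n cascaded residual
    blocks, and add a global skip so the network learns the residual."""
    def __init__(self, n=8, ch=64):
        super().__init__()
        self.feat_conv = nn.Conv2d(1, ch, kernel_size=3, padding=1)        # on the feature block
        self.merge_conv = nn.Conv2d(ch + 1, ch, kernel_size=3, padding=1)  # "second layer"
        self.body = nn.Sequential(*[ResidualBlock(ch) for _ in range(n)])
        self.tail = nn.Conv2d(ch, 1, kernel_size=3, padding=1)             # "last layer"

    def forward(self, recon, feat):
        x = torch.cat([self.feat_conv(feat), recon], dim=1)  # inter-channel merge
        x = self.body(self.merge_conv(x))
        return recon + self.tail(x)                          # global skip connection
```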
Of course, fig. 3-8 are merely examples of the present application and should not be construed as limiting the application.
For example, the structure of the neural network, including the number of convolutional layers, the number of residual blocks, and the specific implementation manner of the nonlinear activation function, is not limited in the embodiments of the present application.
In some embodiments, prior to the S320, the method 300 may further include:
analyzing the code stream to obtain the value of the sequence identifier;
The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
Illustratively, the sequence identifier may be carried in a sequence header of the bitstream.
Illustratively, the format of the sequence header is described below in connection with Table 1.
Table 1
As shown in Table 1, sequence_header may represent a sequence header, and nnlf_enable_flag may be used to represent the sequence identifier. For example, when the value of nnlf_enable_flag is 1, it indicates that the first neural network is allowed to be used to filter reconstructed image blocks in the current reconstructed image sequence to which the current reconstructed image block belongs; when the value of nnlf_enable_flag is 0, it indicates that the first neural network is not allowed to be used to filter reconstructed image blocks in the current reconstructed image sequence.
In this embodiment, if the value of the sequence identifier is the second value, the decoding end can determine at once, for all reconstructed image blocks in the current reconstructed image sequence, that the first neural network is not used for filtering. This avoids the decoding end having to decide, by traversing each reconstructed image block in the current reconstructed image sequence, whether to filter it using the first neural network, and can improve decoding efficiency.
It should be understood that the specific values of the first and second values are not particularly limited by the present application. For example, in one implementation, the first value is 1 and the second value is 0, and in another implementation, the first value is 0 and the second value is 1.
In some embodiments, if the value of the sequence identifier is the first value, the bitstream is parsed to obtain the value of a component identifier, where when the value of the component identifier is the first value, the first neural network is allowed to be used to filter the reconstructed image blocks, in the current reconstructed image to which the current reconstructed image block belongs, of the same component as the current reconstructed image block, and when the value of the component identifier is the second value, the first neural network is not allowed to be used to filter those reconstructed image blocks; and based on the value of the component identifier, it is determined whether to filter the current reconstructed image block using the first neural network.
Illustratively, the component identifier may be carried in the picture header.
Illustratively, the format of the picture header is described below in connection with Table 2.
Table 2
As shown in Table 2, picture_header may be used to represent the picture header, nnlf_enable_flag is the sequence identifier described above, and compIdx may be used to index a component of the current reconstructed image block: a compIdx of 0 denotes the luminance component, 1 denotes the Cb component, and 2 denotes the Cr component. picture_nnlf_enable_flag[compIdx] represents the component identifier of the compIdx-th component. For example, when the value of picture_nnlf_enable_flag[compIdx] is 1, it indicates that the first neural network is allowed to be used to filter reconstructed image blocks of the compIdx-th component in the current reconstructed image to which the current reconstructed image block belongs; when the value is 0, it indicates that this is not allowed.
In this embodiment, if the value of the component identifier is the second value, the decoding end can determine at once, for all reconstructed image blocks of the same component as the current reconstructed image block in the current reconstructed image, that the first neural network is not used for filtering. This avoids the decoding end having to decide, by traversing each reconstructed image block of that component, whether to filter it using the first neural network, and can improve decoding efficiency.
In some embodiments, if the value of the component identifier is the first value, the bitstream is parsed to obtain the value of an image block identifier, where when the value of the image block identifier is the first value, the first neural network is used to filter the current reconstructed image block, and when the value is the second value, the first neural network is not used to filter the current reconstructed image block; and based on the value of the image block identifier, it is determined whether to filter the current reconstructed image block using the first neural network.
Illustratively, the image block identifier may be carried in a patch.
Illustratively, the format of the patch is described below in connection with Table 3.
Table 3
As shown in Table 3, picture_header represents the picture header and picture_nnlf_enable_flag[compIdx] the component identifier, as above; compIdx indexes a component of the current reconstructed image block (0 for luminance, 1 for Cb, 2 for Cr), and LcuIdx indexes an image block. patch_nnlf_enable_flag[compIdx][LcuIdx] represents the image block identifier of the LcuIdx-th image block of the compIdx-th component in the current reconstructed image to which the current reconstructed image block belongs. For example, when the value of patch_nnlf_enable_flag[compIdx][LcuIdx] is 1, it indicates that the first neural network is allowed to be used to filter the LcuIdx-th image block of the compIdx-th component in the current reconstructed image; when the value is 0, it indicates that this is not allowed.
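Putting the three syntax levels together, the decoder-side decision can be sketched as below; the function and argument names are hypothetical, and the flags are assumed to have already been parsed from the sequence header, the picture header and the patch respectively.

```python
def nnlf_enabled(seq_flag, pic_flags, patch_flags, comp_idx, lcu_idx):
    # Sequence level: nnlf_enable_flag == 0 disables the whole sequence.
    if seq_flag == 0:
        return False
    # Picture level: picture_nnlf_enable_flag[compIdx] == 0 disables this component.
    if pic_flags[comp_idx] == 0:
        return False
    # Patch level: patch_nnlf_enable_flag[compIdx][LcuIdx] decides this block.
    return patch_flags[comp_idx][lcu_idx] == 1
```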
In some embodiments, prior to S330, the method 300 may further include:
acquiring a reconstructed image of a first original image and a feature image of the first original image;
obtaining at least one first training data pair based on the reconstructed image of the first original image and the feature image of the first original image, where the at least one first training data pair comprises at least one first reconstructed image block and at least one first feature image block respectively corresponding to the at least one first reconstructed image block;
filtering the at least one first reconstructed image block based on the at least one first feature image block, respectively, using the first neural network, to obtain at least one filtered second reconstructed image block; and
adjusting the first neural network based on the difference between the at least one second reconstructed image block and the corresponding image blocks of the first original image, to obtain the trained first neural network.
Illustratively, for the first neural network, the goal is to train a set of network parameters such that the image output by the first neural network is closer to the target image. For example, the DIV2K and BVI-DVC data sets recommended by the VVC video-coding exploration experiment group are used as the training set: PNG-format images or MP4-format videos are first converted into YUV420-format videos to be compressed, yielding the original video information of each color component. A VTM reference-software test platform is then used for encoding to obtain the reconstructed videos; the reconstructed video and the original video form training data pairs (data-label) as the training set of the first neural network, with luminance-component image blocks of size 128x128 and chrominance-component image blocks of size 64x64.
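A training-loop sketch for the first neural network under the data-label pairing just described might look as follows; the L1 loss, the Adam optimiser and the use of a frozen, pre-trained second network as the feature extractor are assumptions, since the application leaves the training parameters open.

```python
import torch
import torch.nn as nn

def train_filter_net(filter_net, feature_net, loader, epochs=100, lr=1e-4):
    opt = torch.optim.Adam(filter_net.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    feature_net.eval()  # the second network is frozen here (an assumption)
    for _ in range(epochs):
        for recon, orig in loader:  # (data, label) pairs, e.g. 128x128 luma blocks
            with torch.no_grad():
                feat = feature_net(recon)       # predicted feature block
            filtered = filter_net(recon, feat)  # filtered reconstructed block
            loss = loss_fn(filtered, orig)      # drive output towards the original
            opt.zero_grad()
            loss.backward()
            opt.step()
```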
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image with the second neural network, or the feature image of the first original image is an annotated feature image of the first original image.
In some embodiments, prior to S320, the method 300 may further include:
obtaining a reconstructed image of a second original image and an annotated feature image of the second original image;
obtaining at least one second training data pair based on the reconstructed image of the second original image and the feature image of the second original image, where the at least one second training data pair comprises at least one third reconstructed image block and at least one second feature image block respectively corresponding to the at least one third reconstructed image block;
predicting the at least one third reconstructed image block using the second neural network to obtain at least one third feature image block; and
adjusting the second neural network based on the difference between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
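The corresponding training sketch for the second neural network, with the annotated feature blocks as labels, follows the same pattern; the MSE loss and optimiser are again assumptions.

```python
import torch

def train_feature_net(feature_net, loader, epochs=100, lr=1e-4):
    opt = torch.optim.Adam(feature_net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for recon, feat_label in loader:  # (third recon block, annotated feature block)
            loss = loss_fn(feature_net(recon), feat_label)
            opt.zero_grad()
            loss.backward()
            opt.step()
```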
For example, for the second neural network, the goal is to train a set of network parameters so that the image output by the second neural network is closer to the feature image of the target image; the feature image of the target image therefore needs to be acquired. An image carries rich feature information, such as color features, texture features, shape features and spatial-relationship features. Color and texture features mainly describe surface properties of the scene corresponding to the image or image region; shape features mainly describe the contours and regions of the image; and spatial-relationship features mainly describe the mutual spatial positions or relative directions among multiple targets segmented from the image. Illustratively, the spatial features may include image saliency. In the machine-vision field, assigning a label to every pixel in the image lets pixels with the same label share certain features. Image saliency is an important visual feature of an image, reflecting how much importance the visual system attaches to each region, and it is widely applied to compression coding, edge enhancement, salient-object segmentation, feature extraction, and the like.
Taking image saliency as an example, the saliency-detection task is a popular research direction in machine vision and detects visual spatial-domain saliency information by mathematical methods. For example, the minimum barrier distance (MBD) method computes the distance between each pixel in the image and a set of background candidate pixels (typically the boundary pixels of the image); the Binary method builds on MBD and obtains a binary saliency map according to a set threshold; the Robust Background Detection (RBD) method uses boundary connectivity to improve the robustness of the background prior, divides the image into several regions with a segmentation algorithm, computes the relevance between each region and the boundary, and determines the final salient regions; and the FT algorithm starts from the frequency domain and designs a band-pass filter whose low low-pass cut-off frequency highlights the whole salient region and whose high high-pass cut-off frequency preserves clear boundaries while cutting off high-frequency noise information. Deep-learning methods are also used in spatial-temporal saliency-detection tasks; for example, the SALICON model detects salient regions through the high-level semantic information learned by a deep neural network, with good detection performance.
Taking the FT algorithm as an example to obtain the image saliency S, the specific calculation formula is as follows:
S(x, y) = \lVert I_{\mu} - I_{\omega hc}(x, y) \rVert

where $I_{\mu}$ denotes the arithmetic mean pixel value of the image, $I_{\omega hc}$ denotes the Gaussian-blurred pixel value of the image (the blurring eliminates fine texture, noise and coding artifacts), and $\lVert \cdot \rVert$ denotes the Euclidean distance.
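For a single-channel (e.g. luminance) image the formula reduces to an absolute difference, which can be sketched as below; the Gaussian sigma is an assumption, and the original FT algorithm actually operates per channel in the Lab color space.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ft_saliency(img, sigma=3.0):
    img = img.astype(np.float64)
    i_mu = img.mean()                          # arithmetic mean pixel value
    i_whc = gaussian_filter(img, sigma=sigma)  # blurred image: removes texture/noise
    return np.abs(i_mu - i_whc)                # Euclidean distance in one dimension
```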
It should be noted that the solutions provided by the present application are not limited to the features illustrated in the foregoing examples; other feature information may also be used, and the present application is not limited in this respect.
Illustratively, the DIV2K and BVI-DVC data sets recommended by the VVC video-coding exploration experiment group may be used as the training set. Taking image saliency as an example, the salient-feature image of the original image is computed with the FT algorithm and converted into YUV420 format. A VTM reference-software test platform is then used for encoding and decoding to obtain the reconstructed video; the reconstructed video and the feature images form training data pairs (data-label) as the training set of the second neural network, with luminance-component image blocks of size 128x128 and chrominance-component image blocks of size 64x64.
In the present application, the filter for filtering the current reconstructed image block comprises the first neural network and the second neural network. The second neural network and the first neural network should have different training objectives; otherwise the presence of the second neural network would be meaningless. The second neural network and the first neural network may therefore be trained independently, or the first neural network may be trained based on the trained second neural network; the present application is not specifically limited in this respect. Furthermore, the specific parameters used to train the first neural network and/or the second neural network are not limited by the present application.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
Color features, texture features, shape features, spatial features.
The filtering method according to the embodiments of the present application has been described above in detail from the perspective of the decoding end with reference to figs. 3 to 8; the filtering method according to the embodiments of the present application is described below from the perspective of the encoding end with reference to fig. 9.
Fig. 9 is a schematic flow chart of a filtering method 400 provided by an embodiment of the present application. The method 400 may be implemented by an encoding framework including a neural network based filtering unit. In one implementation, the neural network-based filtering unit may be extended into the coding framework described in fig. 1 to perform the filtering method 400.
As shown in fig. 9, the filtering method 400 may include:
S410, acquiring a current reconstructed image block;
S420, when determining to filter the current reconstructed image block by using the first neural network, predicting the features of an original image block of the current reconstructed image block by using a second neural network to obtain a feature image block of the current reconstructed image block;
S430, filtering the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain a filtered reconstructed image block.
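Expressed as code, S420 and S430 amount to one conditional pass through the two networks. The following PyTorch sketch assumes the predicted feature block is fused with the reconstructed block by channel-wise concatenation; the actual fusion used by the first neural network is not limited by the present application, and the function names are illustrative.

```python
import torch

def nn_loop_filter(recon_block: torch.Tensor,
                   feature_net: torch.nn.Module,
                   filter_net: torch.nn.Module) -> torch.Tensor:
    """S420/S430 in one call: the second network predicts a feature image
    block of the original block; the first network filters the reconstructed
    block conditioned on that feature block."""
    with torch.no_grad():
        feat = feature_net(recon_block)                               # S420
        filtered = filter_net(torch.cat([recon_block, feat], dim=1))  # S430
    return filtered
```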
In some embodiments, prior to S420, the method 400 may further include:
Acquiring the value of the sequence identifier;
The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
The sequence identity may be an identity in a sequence parameter set, for example.
Table 4
As shown in Table 4, the sequence parameter set (seq_parameter_set_rbsp) may include a sequence identifier (sps_aip_enabled_flag), which is a sequence-level control switch controlling whether loop filtering based on the first neural network is allowed to be turned on for the current sequence: if the flag is 1, loop filtering based on the first neural network is allowed to be turned on for the current sequence; if the flag is 0, it is not allowed to be turned on. In a specific implementation, whether the sequence identifier is enabled in the sequence parameter set can be controlled through user settings; when it is enabled, the encoder can obtain the specific value of the sequence identifier, i.e., whether the flag is 1 or 0, by querying the configuration file configured by the user.
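A minimal sketch of the encoder-side handling of this sequence-level switch is shown below; the configuration key NnlfEnable and the data structure are hypothetical stand-ins for the user-configured file and the sequence parameter set.

```python
from dataclasses import dataclass

@dataclass
class SequenceParameterSet:
    sps_aip_enabled_flag: int = 0  # 1: NN loop filter allowed for the sequence

def read_sequence_flag(config: dict) -> SequenceParameterSet:
    """Encoder side: take the sequence-level switch from the user's
    configuration file (parsed here into a dict) and put it in the SPS."""
    sps = SequenceParameterSet()
    sps.sps_aip_enabled_flag = 1 if config.get("NnlfEnable", 0) else 0
    return sps
```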
In some embodiments, if the value of the sequence identifier is the first numerical value, the current reconstructed image block is filtered by using the first neural network to obtain the rate-distortion cost of the current reconstructed image block after filtering; if the rate-distortion cost after filtering of the current reconstructed image block is less than the rate-distortion cost before filtering of the current reconstructed image block, it is determined that the current reconstructed image block is filtered by using the first neural network; and if the rate-distortion cost after filtering of the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering of the current reconstructed image block, it is determined that the first neural network is not used to filter the current reconstructed image block.
In some embodiments, the method 400 may further comprise:
and writing the value of the sequence identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
In some embodiments, the method 400 may further comprise:
if the value of the sequence identifier is the first numerical value, generating a value of a component identifier;
The method comprises the steps that when a value of a component identifier is a first value, the first neural network is allowed to filter a reconstructed image block which is the same as a component of a current reconstructed image block in the current reconstructed image to which the current reconstructed image block belongs, and when the value of the component identifier is a second value, the first neural network is not allowed to filter the reconstructed image block which is the same as the component of the current reconstructed image block in the current reconstructed image;
and writing the value of the component identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
In some embodiments, if the rate-distortion cost after filtering of each reconstructed image block in the current reconstructed image that is the same as the component of the current reconstructed image block is less than or equal to its rate-distortion cost before filtering, the value of the component identifier is determined as the first numerical value; and if the current reconstructed image comprises a reconstructed image block whose rate-distortion cost after filtering is greater than its rate-distortion cost before filtering, the value of the component identifier is determined as the second numerical value.
In some embodiments, the method 400 may further comprise:
if the value of the component identifier is the first numerical value, generating the value of the image block identifier;
wherein, when the value of the image block identifier is the first value, the first neural network is used for filtering the current reconstructed image block, and when the value of the image block identifier is the second value, the first neural network is not used for filtering the current reconstructed image block.
In some embodiments, if the rate-distortion cost after filtering the current reconstructed image block is less than the rate-distortion cost before filtering the current reconstructed image block, the value of the image block identifier is determined as the first numerical value; and if the rate-distortion cost after filtering the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering the current reconstructed image block, the value of the image block identifier is determined as the second numerical value.
In some embodiments, prior to S430, the method 400 may further include:
acquiring a reconstructed image of a first original image and a characteristic image of the first original image;
Obtaining at least one first training data pair based on the reconstructed image of the first original image and the characteristic image of the first original image, wherein the at least one first training data pair comprises at least one first reconstructed image block and at least one first characteristic image block respectively corresponding to the at least one first reconstructed image block;
Filtering the at least one first reconstructed image block based on the at least one first feature image block by using the first neural network, respectively, to obtain at least one filtered second reconstructed image block;
And adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block to obtain the trained first neural network.
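The training loop for the first neural network may be sketched in PyTorch as follows. This is a minimal sketch under assumptions: the feature block is fused by channel-wise concatenation, the loss is mean-squared error, and the regression target is taken to be the block of the first original image co-located with the first reconstructed image block, which is one conventional reading of the adjustment step above; all three choices are illustrative rather than fixed by the present application.

```python
import torch
import torch.nn.functional as F

def train_filter_net(filter_net: torch.nn.Module, batches, epochs: int = 1):
    """Train the first (filtering) network on first training data pairs.
    Each batch is (recon, feat, target): a first reconstructed image block,
    its first feature image block, and the assumed target block."""
    opt = torch.optim.Adam(filter_net.parameters(), lr=1e-4)
    for _ in range(epochs):
        for recon, feat, target in batches:
            filtered = filter_net(torch.cat([recon, feat], dim=1))
            loss = F.mse_loss(filtered, target)  # assumed MSE objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return filter_net
```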
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image by using the second neural network, or the feature image of the first original image is a feature image of the first original image that has been marked.
In some embodiments, prior to S420, the method 400 may further include:
obtaining a reconstructed image of a second original image and a characteristic image of the noted second original image;
Obtaining at least one second training data pair based on the reconstructed image of the second original image and the characteristic image of the second original image, wherein the at least one second training data pair comprises at least one third reconstructed image block and at least one second characteristic image block respectively corresponding to the at least one third reconstructed image block;
Predicting the at least one third reconstructed image block by using the second neural network to obtain at least one third characteristic image block;
And adjusting the second neural network based on the difference between the at least one second characteristic image block and the at least one third characteristic image block to obtain the trained second neural network.
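The second neural network is trained analogously, except that its output is compared directly against the labeled feature image blocks, matching the adjustment described above. A minimal sketch, again assuming a mean-squared-error loss as an illustrative choice:

```python
import torch
import torch.nn.functional as F

def train_feature_net(feature_net: torch.nn.Module, batches, epochs: int = 1):
    """Each batch is (recon, feat_label): a third reconstructed image block
    and its labeled second feature image block."""
    opt = torch.optim.Adam(feature_net.parameters(), lr=1e-4)
    for _ in range(epochs):
        for recon, feat_label in batches:
            feat_pred = feature_net(recon)            # third feature image block
            loss = F.mse_loss(feat_pred, feat_label)  # difference of feature blocks
            opt.zero_grad()
            loss.backward()
            opt.step()
    return feature_net
```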
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
Color features, texture features, shape features, spatial features.
It should be understood that the terms and steps in the filtering method 400 may refer to the corresponding terms and corresponding steps described in the filtering method 300, and are not described herein for brevity.
The following describes aspects of the application in connection with specific embodiments.
When the encoding end performs loop filtering, the filters can be processed in a prescribed order; when the neural network loop filter module (i.e., the filter composed of the first neural network and the second neural network) is reached, loop filtering can be performed according to the following steps:
step a:
Judge whether the neural network loop filter module can be used for the current reconstructed image sequence according to the value of nnlf_enable_flag. If nnlf_enable_flag is 1, attempt neural network loop filter module processing on the current reconstructed image sequence, i.e., jump to step b; if nnlf_enable_flag is 0, the current reconstructed image sequence does not use the neural network loop filter module, i.e., the neural-network-based loop filtering process ends.
Step b:
For the current reconstructed image of the current reconstructed image sequence, traverse all reconstructed image blocks of all color components, attempt the neural network loop filter on each reconstructed coding block, compare the result with the reconstructed image block before filtering, and calculate the rate-distortion cost D = D_net − D_rec, i.e., the distortion reduction achieved by the neural network filtering, where D_net is the distortion after filtering and D_rec is the distortion before filtering. If the cost after filtering is smaller than the cost before filtering, i.e., D < 0, set the value of patch_nnlf_enable_flag[compIdx][LcuIdx] to 1; if the cost after filtering is not reduced, i.e., D ≥ 0, set patch_nnlf_enable_flag[compIdx][LcuIdx] to 0. When all image blocks of all color components in the current frame have been traversed, jump to step c.
Step c:
For the current reconstructed image of the current reconstructed image sequence, if the values of patch_nnlf_enable_flag[compIdx][LcuIdx] are all 0, picture_nnlf_enable_flag[compIdx] is set to 0; if there is a reconstructed image block whose patch_nnlf_enable_flag[compIdx][LcuIdx] is not 0, picture_nnlf_enable_flag[compIdx] is set to 1. Once the current frame has completed the decision of the neural network loop filter module, load the next frame for processing and jump to step b.
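Putting steps a to c together, the encoder-side decision logic can be sketched as follows; frame.components, patch_flag, picture_flag and rd_cost are hypothetical stand-ins for the encoder's internal structures and its rate-distortion evaluation.

```python
def encode_sequence_nnlf(frames, nnlf_enable_flag: int, rd_cost):
    """Steps a-c: derive block-level and picture-level flags for the
    neural network loop filter module. rd_cost(block) -> (D_net, D_rec)."""
    if nnlf_enable_flag != 1:  # step a
        return
    for frame in frames:  # step c loads the next frame and returns to step b
        for comp_idx, blocks in enumerate(frame.components):  # step b
            for lcu_idx, block in enumerate(blocks):
                d_net, d_rec = rd_cost(block)
                # D = D_net - D_rec < 0 means the filter reduced the cost
                frame.patch_flag[comp_idx][lcu_idx] = 1 if d_net - d_rec < 0 else 0
            # step c: picture-level flag is 1 if any block flag is nonzero
            frame.picture_flag[comp_idx] = int(any(frame.patch_flag[comp_idx]))
```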
Accordingly, the decoder acquires and parses the code stream. When the parsing reaches the loop filtering stage, the filters are processed in the prescribed order; when the neural network loop filter module (i.e., the filter composed of the first neural network and the second neural network) is reached, loop filtering is performed according to the following steps:
step a:
Judge whether the neural network loop filter module can be used for the current reconstructed image sequence according to the value of nnlf_enable_flag. If nnlf_enable_flag is 1, attempt neural network loop filter module processing on the current reconstructed image sequence, i.e., jump to step b; if nnlf_enable_flag is 0, the current reconstructed image sequence does not use the neural network loop filter module, i.e., the neural-network-based loop filtering process ends.
Step b:
For the current reconstructed image of the current reconstructed image sequence, if picture_nnlf_enable_flag[compIdx] is 1, jump to step c; if picture_nnlf_enable_flag[compIdx] is 0, jump to step d.
Step c:
Traverse all reconstructed image blocks of the current color component of the current reconstructed image. For the current reconstructed image block, if patch_nnlf_enable_flag[compIdx][LcuIdx] is 1, perform neural network loop filtering on the current reconstructed image block; if patch_nnlf_enable_flag[compIdx][LcuIdx] is 0, do not perform neural network loop filtering on the current reconstructed image block. When all reconstructed image blocks of all color components in the current reconstructed image have been traversed, jump to step d.
Step d:
If the current reconstructed image has completed the processing of the neural network loop filter module, load the next frame for processing and jump to step b.
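Correspondingly, the decoder-side steps a to d reduce to applying the neural network loop filter only where the parsed flags allow it; as in the encoder sketch above, the attribute names are hypothetical stand-ins for the decoder's parsed syntax elements.

```python
def decode_sequence_nnlf(frames, nnlf_enable_flag: int, nn_loop_filter):
    """Steps a-d: apply the NN loop filter according to the parsed flags."""
    if nnlf_enable_flag != 1:  # step a
        return
    for frame in frames:  # step d loads the next frame and returns to step b
        for comp_idx, blocks in enumerate(frame.components):  # step b
            if frame.picture_flag[comp_idx] != 1:
                continue  # this component of the picture is not filtered
            for lcu_idx, block in enumerate(blocks):  # step c
                if frame.patch_flag[comp_idx][lcu_idx] == 1:
                    blocks[lcu_idx] = nn_loop_filter(block)
```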
It should be noted that the above-mentioned embodiments are merely examples of the present application, and should not be construed as limiting the present application.
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be regarded as the disclosure of the present application.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The method embodiments of the present application are described above in detail, and the apparatus embodiments of the present application are described below in detail with reference to fig. 10 to 12.
Fig. 10 is a schematic block diagram of a filtering apparatus 500 of an embodiment of the present application.
As shown in fig. 10, the filtering apparatus 500 may include:
the parsing unit 510 is configured to parse the code stream to obtain a current reconstructed image block;
the prediction unit 520 is configured to predict, when determining that the current reconstructed image block is filtered by using the first neural network, features of an original image block of the current reconstructed image block by using the second neural network, so as to obtain a feature image block of the current reconstructed image block;
And a filtering unit 530, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, so as to obtain a filtered reconstructed image block.
In some embodiments, before the prediction unit 520 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is further configured to:
analyzing the code stream to obtain the value of the sequence identifier;
The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
In some embodiments, the prediction unit 520 is specifically configured to:
If the value of the sequence identifier is the first numerical value, analyzing the code stream to obtain the value of the component identifier;
The method comprises the steps that when a value of a component identifier is a first value, the first neural network is allowed to filter a reconstructed image block which is the same as a component of a current reconstructed image block in the current reconstructed image to which the current reconstructed image block belongs, and when the value of the component identifier is a second value, the first neural network is not allowed to filter the reconstructed image block which is the same as the component of the current reconstructed image block in the current reconstructed image;
And determining whether to filter the current reconstructed image block by using the first neural network based on the value of the component identifier.
In some embodiments, the prediction unit 520 is specifically configured to:
If the value of the component identifier is the first numerical value, the value of the image block identifier is obtained by analyzing the code stream;
the method comprises the steps that when the value of an image block identifier is the first value, the first neural network is used for filtering the current reconstructed image block, and when the value of the image block identifier is the second value, the first neural network is not used for filtering the current reconstructed image block;
and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the image block identifier.
In some embodiments, before the filtering unit 530 filters the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the filtering unit 530 is further configured to:
acquiring a reconstructed image of a first original image and a characteristic image of the first original image;
Obtaining at least one first training data pair based on the reconstructed image of the first original image and the characteristic image of the first original image, wherein the at least one first training data pair comprises at least one first reconstructed image block and at least one first characteristic image block respectively corresponding to the at least one first reconstructed image block;
Filtering the at least one first reconstructed image block based on the at least one first feature image block by using the first neural network, respectively, to obtain at least one filtered second reconstructed image block;
And adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block to obtain the trained first neural network.
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image by using the second neural network, or the feature image of the first original image is a feature image of the first original image that has been marked.
In some embodiments, before the prediction unit 520 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 520 is further configured to:
obtaining a reconstructed image of a second original image and a characteristic image of the noted second original image;
Obtaining at least one second training data pair based on the reconstructed image of the second original image and the characteristic image of the second original image, wherein the at least one second training data pair comprises at least one third reconstructed image block and at least one second characteristic image block respectively corresponding to the at least one third reconstructed image block;
Predicting the at least one third reconstructed image block by using the second neural network to obtain at least one third characteristic image block;
And adjusting the second neural network based on the difference between the at least one second characteristic image block and the at least one third characteristic image block to obtain the trained second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
Color features, texture features, shape features, spatial features.
It should be noted that the above-mentioned embodiments are merely examples of the present application, and should not be construed as limiting the present application.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the filtering apparatus 500 shown in fig. 10 may correspond to the body that executes the method 300 of the embodiment of the present application; that is, the foregoing and other operations and/or functions of each unit in the filtering apparatus 500 are respectively intended to implement the corresponding flows of the method 300 and other methods, and are not repeated here to avoid repetition.
Fig. 11 is a schematic block diagram of a filtering apparatus 600 according to an embodiment of the present application.
As shown in fig. 11, the filtering apparatus 600 may include:
an obtaining unit 610, configured to obtain a current reconstructed image block;
A prediction unit 620, configured to determine that, when the current reconstructed image block is filtered by using the first neural network, predict, by using the second neural network, a feature of an original image block of the current reconstructed image block, so as to obtain a feature image block of the current reconstructed image block;
And a filtering unit 630, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, so as to obtain a filtered reconstructed image block.
In some embodiments, before the prediction unit 620 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is further configured to:
Acquiring the value of the sequence identifier;
The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
In some embodiments, the prediction unit 620 is specifically configured to:
If the value of the sequence identifier is the first value, filter the current reconstructed image block by using the first neural network to obtain the rate-distortion cost of the current reconstructed image block after filtering;
if the rate-distortion cost after filtering of the current reconstructed image block is less than the rate-distortion cost before filtering of the current reconstructed image block, determine to filter the current reconstructed image block by using the first neural network;
and if the rate-distortion cost after filtering of the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering of the current reconstructed image block, determine not to use the first neural network to filter the current reconstructed image block.
In some embodiments, the prediction unit 620 is further configured to:
and writing the value of the sequence identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
In some embodiments, the prediction unit 620 is further configured to:
if the value of the sequence identifier is the first numerical value, generating a value of a component identifier;
The method comprises the steps that when a value of a component identifier is a first value, the first neural network is allowed to filter a reconstructed image block which is the same as a component of a current reconstructed image block in the current reconstructed image to which the current reconstructed image block belongs, and when the value of the component identifier is a second value, the first neural network is not allowed to filter the reconstructed image block which is the same as the component of the current reconstructed image block in the current reconstructed image;
and writing the value of the component identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
In some embodiments, the prediction unit 620 is specifically configured to:
If the rate-distortion cost after filtering of each reconstructed image block in the current reconstructed image that is the same as the component of the current reconstructed image block is less than or equal to its rate-distortion cost before filtering, determine the value of the component identifier as the first numerical value;
and if the current reconstructed image comprises a reconstructed image block with the filtered rate distortion cost being greater than the pre-filtered rate distortion cost, determining the value of the component identifier as the second numerical value.
In some embodiments, the prediction unit 620 is further configured to:
if the value of the component identifier is the first numerical value, generating the value of the image block identifier;
wherein, when the value of the image block identifier is the first value, the first neural network is used for filtering the current reconstructed image block, and when the value of the image block identifier is the second value, the first neural network is not used for filtering the current reconstructed image block.
In some embodiments, the prediction unit 620 is specifically configured to:
If the rate-distortion cost after filtering of the current reconstructed image block is less than the rate-distortion cost before filtering of the current reconstructed image block, determine the value of the image block identifier as the first numerical value;
and if the rate-distortion cost after filtering of the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering of the current reconstructed image block, determine the value of the image block identifier as the second numerical value.
In some embodiments, before the filtering unit 630 filters the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the filtering unit 630 is further configured to:
acquiring a reconstructed image of a first original image and a characteristic image of the first original image;
Obtaining at least one first training data pair based on the reconstructed image of the first original image and the characteristic image of the first original image, wherein the at least one first training data pair comprises at least one first reconstructed image block and at least one first characteristic image block respectively corresponding to the at least one first reconstructed image block;
Filtering the at least one first reconstructed image block based on the at least one first feature image block by using the first neural network, respectively, to obtain at least one filtered second reconstructed image block;
And adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block to obtain the trained first neural network.
In some embodiments, the feature image of the first original image is a feature image obtained by predicting the first original image by using the second neural network, or the feature image of the first original image is a feature image of the first original image that has been marked.
In some embodiments, before the prediction unit 620 predicts the features of the original image block of the current reconstructed image block by using the second neural network to obtain the feature image block of the current reconstructed image block, the prediction unit 620 is further configured to:
obtaining a reconstructed image of a second original image and a characteristic image of the noted second original image;
Obtaining at least one second training data pair based on the reconstructed image of the second original image and the characteristic image of the second original image, wherein the at least one second training data pair comprises at least one third reconstructed image block and at least one second characteristic image block respectively corresponding to the at least one third reconstructed image block;
Predicting the at least one third reconstructed image block by using the second neural network to obtain at least one third characteristic image block;
And adjusting the second neural network based on the difference between the at least one second characteristic image block and the at least one third characteristic image block to obtain the trained second neural network.
In some embodiments, the feature image block of the current reconstructed image block is used to characterize at least one of the following features of the original image block of the current reconstructed image block:
Color features, texture features, shape features, spatial features.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the filtering apparatus 600 shown in fig. 11 may correspond to the body that executes the method 400 of the embodiment of the present application; that is, the foregoing and other operations and/or functions of each unit in the filtering apparatus 600 are respectively intended to implement the corresponding flows of the method 400 and other methods, and are not repeated here to avoid repetition.
It should be further understood that each unit of the filtering apparatus 500 or the filtering apparatus 600 according to the embodiments of the present application may be separately or wholly combined into one or several additional units, or one (or some) of the units may be further split into multiple functionally smaller units, which can achieve the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the filtering apparatus 500 or the filtering apparatus 600 may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and by the cooperation of multiple units. According to another embodiment of the present application, the filtering apparatus 500 or the filtering apparatus 600 may be constructed, and the filtering method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding methods on a general-purpose computing device, such as a general-purpose computer comprising processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on a computer-readable storage medium, loaded into an electronic device, and executed therein to implement the corresponding methods of the embodiments of the present application.
In other words, the units referred to above may be implemented in hardware, by instructions in software, or by a combination of hardware and software. Specifically, each step of the method embodiments in the embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or instructions in the form of software, and the steps of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor or by a combination of hardware and software in a decoding processor. Alternatively, the software may reside in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
Fig. 12 is a schematic structural diagram of an electronic device 700 provided in an embodiment of the present application.
As shown in fig. 12, the electronic device 700 includes at least a processor 710 and a computer readable storage medium 720. Wherein the processor 710 and the computer-readable storage medium 720 may be connected by a bus or other means. The computer readable storage medium 720 is for storing a computer program 721, the computer program 721 comprising computer instructions, and the processor 710 is for executing the computer instructions stored by the computer readable storage medium 720. Processor 710 is a computing core and a control core of electronic device 700 that are adapted to implement one or more computer instructions, in particular to load and execute one or more computer instructions to implement a corresponding method flow or a corresponding function.
By way of example, the processor 710 may also be referred to as a central processing unit (CPU). The processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and the like.
By way of example, the computer-readable storage medium 720 may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; alternatively, it may be at least one computer-readable storage medium located remotely from the aforementioned processor 710. In particular, the computer-readable storage medium 720 includes, but is not limited to: volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
Illustratively, the electronic device 700 may be an encoding end, encoder, or encoding framework to which embodiments of the present application relate; the computer readable storage medium 720 has stored therein first computer instructions; the processor 710 loads and executes the first computer instructions stored in the computer readable storage medium 720 to implement the corresponding steps in the filtering method provided by the embodiment of the present application; in other words, the first computer instructions in the computer readable storage medium 720 are loaded by the processor 710 and execute the corresponding steps, and are not described herein for avoiding repetition.
Illustratively, the electronic device 700 may be a decoding side, a decoder, or a decoding framework according to an embodiment of the present application; the computer readable storage medium 720 has stored therein second computer instructions; the processor 710 loads and executes the second computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps in the filtering method provided by the embodiment of the present application; in other words, the second computer instructions in the computer readable storage medium 720 are loaded by the processor 710 and execute the corresponding steps, and are not described herein for avoiding repetition.
According to another aspect of the present application, the embodiment of the present application further provides a computer-readable storage medium (Memory), which is a Memory device in the electronic device 700, for storing programs and data. Such as computer readable storage medium 720. It is understood that the computer readable storage medium 720 herein may include a built-in storage medium in the electronic device 700, and may include an extended storage medium supported by the electronic device 700. The computer-readable storage medium provides storage space that stores an operating system of the electronic device 700. Also stored in this memory space are one or more computer instructions, which may be one or more computer programs 721 (including program code), adapted to be loaded and executed by the processor 710.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium, such as the computer program 721. In this case, the electronic device 700 may be a computer; the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, so that the computer performs the filtering methods provided in the various alternatives described above.
In other words, when implemented in software, the above may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions of the embodiments of the present application are run in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
Those of ordinary skill in the art will appreciate that the elements and process steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Finally, it should be noted that the above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about the changes or substitutions within the technical scope of the present application, and the changes or substitutions are all covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

  1. A method of filtering comprising:
    analyzing the code stream to obtain a current reconstructed image block;
    when the first neural network is used for filtering the current reconstruction image block, predicting the characteristics of an original image block of the current reconstruction image block by using a second neural network to obtain a characteristic image block of the current reconstruction image block;
    And filtering the current reconstructed image block based on the characteristic image block of the current reconstructed image block by using the first neural network to obtain a filtered reconstructed image block.
  2. The method of claim 1, wherein before the features of an original image block of the current reconstructed image block are predicted by using a second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    analyzing the code stream to obtain the value of the sequence identifier;
    The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
    and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
  3. The method of claim 2, wherein determining whether to filter the current reconstructed image block using the first neural network based on the value of the sequence identity comprises:
    If the value of the sequence identifier is the first numerical value, analyzing the code stream to obtain the value of the component identifier;
    The method comprises the steps that when a value of a component identifier is a first value, the first neural network is allowed to filter a reconstructed image block which is the same as a component of a current reconstructed image block in the current reconstructed image to which the current reconstructed image block belongs, and when the value of the component identifier is a second value, the first neural network is not allowed to filter the reconstructed image block which is the same as the component of the current reconstructed image block in the current reconstructed image;
    And determining whether to filter the current reconstructed image block by using the first neural network based on the value of the component identifier.
  4. The method of claim 3, wherein determining whether to filter the current reconstructed image block using the first neural network based on the value of the component identification comprises:
    If the value of the component identifier is the first numerical value, the value of the image block identifier is obtained by analyzing the code stream;
    the method comprises the steps that when the value of an image block identifier is the first value, the first neural network is used for filtering the current reconstructed image block, and when the value of the image block identifier is the second value, the first neural network is not used for filtering the current reconstructed image block;
    and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the image block identifier.
  5. The method according to any one of claims 1 to 4, wherein before the current reconstructed image block is filtered based on the characteristic image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the method further comprises:
    acquiring a reconstructed image of a first original image and a characteristic image of the first original image;
    Obtaining at least one first training data pair based on the reconstructed image of the first original image and the characteristic image of the first original image, wherein the at least one first training data pair comprises at least one first reconstructed image block and at least one first characteristic image block respectively corresponding to the at least one first reconstructed image block;
    Filtering the at least one first reconstructed image block based on the at least one first feature image block by using the first neural network, respectively, to obtain at least one filtered second reconstructed image block;
    And adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block to obtain the trained first neural network.
  6. The method of claim 5, wherein the feature image of the first original image is a feature image obtained by predicting the first original image using the second neural network, or the feature image of the first original image is a feature image of the first original image that has been labeled.
  7. The method according to any one of claims 1 to 6, wherein before the features of the original image block of the current reconstructed image block are predicted by using a second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a second original image and a characteristic image of the noted second original image;
    Obtaining at least one second training data pair based on the reconstructed image of the second original image and the characteristic image of the second original image, wherein the at least one second training data pair comprises at least one third reconstructed image block and at least one second characteristic image block respectively corresponding to the at least one third reconstructed image block;
    Predicting the at least one third reconstructed image block by using the second neural network to obtain at least one third characteristic image block;
    And adjusting the second neural network based on the difference between the at least one second characteristic image block and the at least one third characteristic image block to obtain the trained second neural network.
  8. The method according to any one of claims 1 to 7, wherein the feature image block of the current reconstructed image block is used to characterize at least one of the following features of an original image block of the current reconstructed image block:
    Color features, texture features, shape features, spatial features.
  9. A method of filtering comprising:
    Acquiring a current reconstructed image block;
    when the first neural network is used for filtering the current reconstruction image block, predicting the characteristics of an original image block of the current reconstruction image block by using a second neural network to obtain a characteristic image block of the current reconstruction image block;
    And filtering the current reconstructed image block based on the characteristic image block of the current reconstructed image block by using the first neural network to obtain a filtered reconstructed image block.
  10. The method of claim 9, wherein before the features of an original image block of the current reconstructed image block are predicted by using a second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    Acquiring the value of the sequence identifier;
    The method comprises the steps that when the value of a sequence identifier is a first numerical value, the first neural network is allowed to filter a reconstructed image block in a current reconstructed image sequence to which the current reconstructed image block belongs, and when the value of the sequence identifier is a second numerical value, the first neural network is not allowed to filter the reconstructed image block in the current reconstructed image sequence;
    and determining whether to filter the current reconstructed image block by using the first neural network based on the value of the sequence identifier.
  11. The method of claim 10, wherein determining whether to filter the current reconstructed image block using the first neural network based on the value of the sequence identity comprises:
    If the value of the sequence identifier is the first value, filtering the current reconstructed image block by using the first neural network to obtain the rate-distortion cost of the current reconstructed image block after filtering;
    if the rate-distortion cost after filtering of the current reconstructed image block is less than the rate-distortion cost before filtering of the current reconstructed image block, determining to filter the current reconstructed image block by using the first neural network;
    and if the rate-distortion cost after filtering of the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering of the current reconstructed image block, determining not to use the first neural network to filter the current reconstructed image block.
  12. The method according to claim 10 or 11, characterized in that the method further comprises:
    and writing the value of the sequence identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
  13. The method according to any one of claims 10 to 12, further comprising:
    if the value of the sequence identifier is the first numerical value, generating a value of a component identifier;
    The method comprises the steps that when a value of a component identifier is a first value, the first neural network is allowed to filter a reconstructed image block which is the same as a component of a current reconstructed image block in the current reconstructed image to which the current reconstructed image block belongs, and when the value of the component identifier is a second value, the first neural network is not allowed to filter the reconstructed image block which is the same as the component of the current reconstructed image block in the current reconstructed image;
    and writing the value of the component identifier into a code stream obtained by encoding the current residual block obtained based on the current reconstructed image block.
  14. The method of claim 13, wherein generating the value of the component identifier comprises:
    If the rate-distortion cost after filtering of each reconstructed image block in the current reconstructed image that is the same as the component of the current reconstructed image block is less than or equal to its rate-distortion cost before filtering, determining the value of the component identifier as the first numerical value;
    and if the current reconstructed image comprises a reconstructed image block with the filtered rate distortion cost being greater than the pre-filtered rate distortion cost, determining the value of the component identifier as the second numerical value.
  15. The method according to claim 13 or 14, characterized in that the method further comprises:
    if the value of the component identifier is the first numerical value, generating the value of the image block identifier;
    wherein, when the value of the image block identifier is the first value, the first neural network is used for filtering the current reconstructed image block, and when the value of the image block identifier is the second value, the first neural network is not used for filtering the current reconstructed image block.
  16. The method of claim 15, wherein generating the value of the image block identifier comprises:
    If the rate-distortion cost after filtering of the current reconstructed image block is less than the rate-distortion cost before filtering of the current reconstructed image block, determining the value of the image block identifier as the first numerical value;
    and if the rate-distortion cost after filtering of the current reconstructed image block is greater than or equal to the rate-distortion cost before filtering of the current reconstructed image block, determining the value of the image block identifier as the second numerical value.
  17. The method according to any one of claims 9 to 16, wherein before the current reconstructed image block is filtered based on the characteristic image block of the current reconstructed image block by using the first neural network to obtain the filtered reconstructed image block, the method further comprises:
    acquiring a reconstructed image of a first original image and a characteristic image of the first original image;
    Obtaining at least one first training data pair based on the reconstructed image of the first original image and the characteristic image of the first original image, wherein the at least one first training data pair comprises at least one first reconstructed image block and at least one first characteristic image block respectively corresponding to the at least one first reconstructed image block;
    Filtering the at least one first reconstructed image block based on the at least one first feature image block by using the first neural network, respectively, to obtain at least one filtered second reconstructed image block;
    And adjusting the first neural network based on the difference between the at least one first reconstructed image block and the at least one second reconstructed image block to obtain the trained first neural network.
  18. The method of claim 17, wherein the feature image of the first original image is a feature image predicted from the first original image using the second neural network, or the feature image of the first original image is a feature image of the first original image that has been labeled.
  19. The method according to any one of claims 9 to 18, wherein before the features of the original image block of the current reconstructed image block are predicted by using the second neural network to obtain the feature image block of the current reconstructed image block, the method further comprises:
    obtaining a reconstructed image of a second original image and a labeled feature image of the second original image;
    obtaining at least one second training data pair based on the reconstructed image of the second original image and the feature image of the second original image, wherein the at least one second training data pair comprises at least one third reconstructed image block and at least one second feature image block respectively corresponding to the at least one third reconstructed image block;
    performing prediction on the at least one third reconstructed image block by using the second neural network, to obtain at least one third feature image block;
    and adjusting the second neural network based on the difference between the at least one second feature image block and the at least one third feature image block, to obtain the trained second neural network.
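    The training of the second neural network admits the same kind of sketch; FeatureNet is again an assumed architecture, with the labeled second feature image blocks serving as regression targets per claim 19:

        import torch
        import torch.nn as nn

        class FeatureNet(nn.Module):
            # Assumed feature-prediction network: maps a reconstructed block to a
            # predicted feature block of its original image block.
            def __init__(self, channels=3, feat_channels=8):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(channels, 64, 3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(64, feat_channels, 3, padding=1),
                )

            def forward(self, recon_block):
                return self.body(recon_block)  # a third feature image block

        def train_feature_net(net, pairs, lr=1e-4):
            # `pairs` yields (third_recon_block, second_feat_block); the network is
            # adjusted on the difference between its prediction and the label.
            opt = torch.optim.Adam(net.parameters(), lr=lr)
            loss_fn = nn.MSELoss()
            for recon, feat_label in pairs:
                loss = loss_fn(net(recon), feat_label)
                opt.zero_grad()
                loss.backward()
                opt.step()
            return net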
  20. The method according to any one of claims 9 to 19, wherein the feature image block of the current reconstructed image block is used to characterize at least one of the following features of an original image block of the current reconstructed image block:
    color features, texture features, shape features, and spatial features.
  21. A filtering apparatus, comprising:
    a parsing unit, configured to parse a code stream to obtain a current reconstructed image block;
    a prediction unit, configured to, when it is determined that the current reconstructed image block is to be filtered by using a first neural network, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
    and a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
  22. A filtering apparatus, comprising:
    an acquisition unit, configured to acquire a current reconstructed image block;
    a prediction unit, configured to, when it is determined that the current reconstructed image block is to be filtered by using a first neural network, predict features of an original image block of the current reconstructed image block by using a second neural network, to obtain a feature image block of the current reconstructed image block;
    and a filtering unit, configured to filter the current reconstructed image block based on the feature image block of the current reconstructed image block by using the first neural network, to obtain a filtered reconstructed image block.
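    The flow through these units can be summarized in a short Python sketch; the unit boundaries follow claims 21 and 22, while the function and parameter names are illustrative only:

        def filter_current_block(recon_block, use_filter, feature_net, filter_net):
            # recon_block and use_filter are assumed to come from the parsing unit
            # (claim 21) or the acquisition unit (claim 22), with use_filter derived
            # from the identifiers of claims 13 to 16.
            if not use_filter:
                return recon_block
            feat_block = feature_net(recon_block)       # prediction unit (second neural network)
            return filter_net(recon_block, feat_block)  # filtering unit (first neural network)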
  23. An electronic device, comprising:
    A processor adapted to execute a computer program;
    a computer readable storage medium having a computer program stored therein which, when executed by the processor, implements the method of any one of claims 1 to 8 or the method of any one of claims 9 to 20.
  24. A computer readable storage medium storing a computer program for causing a computer to perform the method of any one of claims 1 to 8 or the method of any one of claims 9 to 20.
  25. A computer program product, comprising computer programs/instructions which, when executed by a processor, implement the method of any one of claims 1 to 8 or the method of any one of claims 9 to 20.
  26. A code stream, wherein the code stream is a code stream as described in any one of claims 1 to 8, or a code stream generated by the method of any one of claims 9 to 20.
CN202180103357.0A 2021-12-31 2021-12-31 Filtering method, filtering device and electronic equipment Pending CN118103850A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/143804 WO2023123398A1 (en) 2021-12-31 2021-12-31 Filtering method, filtering apparatus, and electronic device

Publications (1)

Publication Number Publication Date
CN118103850A true CN118103850A (en) 2024-05-28

Family

ID=86997261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180103357.0A Pending CN118103850A (en) 2021-12-31 2021-12-31 Filtering method, filtering device and electronic equipment

Country Status (2)

Country Link
CN (1) CN118103850A (en)
WO (1) WO2023123398A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110971915B (en) * 2018-09-28 2022-07-01 杭州海康威视数字技术股份有限公司 Filtering method and device
US11902561B2 (en) * 2020-04-18 2024-02-13 Alibaba Group Holding Limited Convolutional-neutral-network based filter for video coding
CN112422989B (en) * 2020-11-17 2023-06-09 杭州师范大学 Video coding method

Also Published As

Publication number Publication date
WO2023123398A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
KR101621330B1 (en) Method and apparatus of transform unit partition with reduced complexity
WO2021004152A1 (en) Image component prediction method, encoder, decoder, and storage medium
KR101912769B1 (en) Method and apparatus for decoding/encoding video signal using transform derived from graph template
CN113766248B (en) Method and device for loop filtering
CN114902670B (en) Method and apparatus for signaling sub-image division information
CN116235496A (en) Encoding method, decoding method, encoder, decoder, and encoding system
JP2023507270A (en) Method and apparatus for block partitioning at picture boundaries
KR20240038959A (en) Method and system for performing combined inter and intra prediction
CN116438796A (en) Image prediction method, encoder, decoder, and computer storage medium
CN114868386B (en) Encoding method, decoding method, encoder, decoder, and electronic device
CN116686288A (en) Encoding method, decoding method, encoder, decoder, and electronic device
CN117296312A (en) Method and system for cross-component sample adaptive offset
JP2023507935A (en) Method and apparatus for encoding video data in palette mode
KR20240072191A (en) Methods and systems for performing combined inter and intra prediction
CN118103850A (en) Filtering method, filtering device and electronic equipment
CN116567232A (en) Image block dividing method, video coding method, device and equipment
CN117480777A (en) Encoding method, decoding method, encoder, decoder, and decoding system
CN116803078A (en) Encoding/decoding method, code stream, encoder, decoder, and storage medium
CN115428455A (en) Palette prediction method
KR102170570B1 (en) Method and Apparatus for Synthesizing Video Including Key Frame in Compression Domain
US20210112283A1 (en) Encoding apparatus, image interpolating apparatus and encoding program
CN118160306A (en) Intra-frame prediction method, decoder, encoder and encoding/decoding system
WO2023193254A1 (en) Decoding method, encoding method, decoder, and encoder
WO2023193253A1 (en) Decoding method, coding method, decoder and encoder
WO2023197181A1 (en) Decoding method, encoding method, decoders and encoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination