CN111064958A - Low-complexity neural network filtering algorithm for B frame and P frame - Google Patents

Low-complexity neural network filtering algorithm for B frame and P frame

Info

Publication number
CN111064958A
CN111064958A (application CN201911382700.0A)
Authority
CN
China
Prior art keywords
filtering
neural network
frame
frames
syntax element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911382700.0A
Other languages
Chinese (zh)
Other versions
CN111064958B (en)
Inventor
Yibo Fan (范益波)
Chao Liu (刘超)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201911382700.0A
Publication of CN111064958A
Application granted
Publication of CN111064958B
Legal status: Active (current)
Anticipated expiration

Classifications

    • Section H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N 19/117: Filters, e.g. for pre-processing or post-processing
    • H04N 19/61: Transform coding in combination with predictive coding
    • H04N 19/70: Syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/82: Filtering operations specially adapted for video compression, involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the technical field of video coding, and specifically relates to a low-complexity neural network filtering algorithm for B frames and P frames. The method tests fusions, in different proportions, of the reconstructed pixels output by a neural network filter and those produced by the standard video encoder, and encodes the optimal filtering strength into the code stream to achieve the best filtering effect. A new syntax element is designed, called a frame-level syntax element: one n-bit syntax element exists for each component of each frame and characterizes the degree of fusion, in the current frame, between the neural network output and the reconstructed pixels of the video encoder. This syntax element provides an adaptive filtering strength and effectively solves the over-blurring and over-smoothing caused by applying the filter directly. Compared with earlier CTU-level syntax elements, filtering directly at the frame level introduces no extra artificial boundaries, yielding an algorithm with excellent performance and low complexity.

Description

Low-complexity neural network filtering algorithm for B frame and P frame
Technical Field
The invention belongs to the technical field of video coding, and specifically relates to a low-complexity neural network filtering algorithm for B frames and P frames.
Background
In the field of video coding, filtering techniques based on convolutional neural networks have been widely adopted. Neural network filtering achieves a better filtering effect than the traditional DB (deblocking), SAO (sample adaptive offset) and ALF (adaptive loop filter) filters, but its high complexity limits practical application. In B frames and P frames in particular, repeated application of the neural network filter causes an over-blurring problem: a block region is smoothed repeatedly, and the filtering destroys the details and high-frequency information of the current block.
In recent years many researchers have proposed solutions, mainly based on CTU-level filtering. B frames and P frames in video coding are predicted and transformed on a block basis, so some blocks carry large residuals or motion vectors while others are almost identical to the already-filtered reference frames. CTU-level filtering makes a filtering decision for every CTU, so almost every block can be given its best filtering result. Of course, an extra bit must be signaled to indicate whether each CTU uses the filter, which consumes a comparatively large part of the bitstream; some scholars have therefore proposed adding an extra classifier to help decide whether the current CU should use neural network filtering. Others have iteratively retrained the neural network to reduce this over-smoothing, so that a globally optimal filtering effect is finally reached.
In fact, the drawback of CTU-level filtering is also obvious: for convolutional neural networks, CTU-level filtering requires extra padding, either with zeros or with reconstructed pixels. If zeros are padded, the error they introduce clearly degrades filtering performance; if reconstructed pixels are padded, the computational complexity of the neural network increases significantly. We therefore propose frame-level filtering.
Disclosure of Invention
The invention aims to provide a low-complexity neural network filtering algorithm for B frames and P frames.
Unlike CTU-level filtering, the low-complexity neural network filtering algorithm for B frames and P frames filters the whole frame, giving lower complexity and a better filtering effect. It fuses the reconstructed pixels output by the video encoder with the neural network filtering result, achieving optimal filtering and solving the over-smoothing caused by repeated use of a neural network filter.
The invention provides a low-complexity neural network filtering algorithm for B frames and P frames, which comprises the following specific steps:
(1) at the encoding end, close the DB and SAO options in the HM configuration file and encode the target video to obtain the reconstructed pixels X. After a frame is coded, a conventional encoder filters the whole-frame reconstructed pixels R with filters such as DB and SAO; since DB and SAO are disabled in this step, R is left unfiltered by the traditional filters;
(2) filter the unprocessed reconstructed pixels R with a neural network filter to obtain the filtered pixels Y; the structure of the neural network filter follows [Chao Liu, Heming Sun, Junan Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, and Yibo Fan, "Dual learning based video coding with admission pitch blocks," arXiv preprint arXiv:1911.09857, 2019];
(3) unlike in an I frame, where Y can directly replace R, filtering a B/P frame directly with Y leads to over-blurring. The present invention therefore replaces the original reconstructed pixels R with a combination X̂ of the filtered pixels Y and R, thereby realizing the filtering of the B/P frame. X̂ is computed by equation (1), which depends on a new syntax element λ; λ determines the strength and mixing ratio of the filter: λ_i = 0 means no filtering at all, while λ_i = 1 means filtering entirely with the neural network filter;

X̂_i = λ_i · Y + (1 − λ_i) · R   (1)
(4) traverse different values of the hyper-parameter λ_i in the encoder, computed by equation (2) and illustrated in FIG. 2. λ_i can be understood as dividing the interval [0, 1] evenly into 2^n − 1 steps, so each time i increases by 1, λ_i increases by 1/(2^n − 1), giving a filtering strength that increases uniformly over the interval;

λ_i = i / (2^n − 1),  i = 0, 1, …, 2^n − 1   (2)
(5) different λ_i yield different fused values X̂_i; for each X̂_i, compute the mean squared error MSE_i between it and the original pixels O:

MSE_i = (1/N) Σ_p (X̂_i(p) − O(p))²   (3)

where O denotes the original (uncompressed) pixels and N the number of pixels in the frame;
(6) find the minimum MSE_i, record the corresponding λ_i, and encode the binary form of the corresponding i into the code stream; send the filtering result X̂_i to the frame buffer (an encoder-side code sketch follows this step list);
(7) the decoder does not need to traverse λ_i over different i; it decodes the filtering strength directly from i in the code stream, thereby realizing the optimal filtering effect.
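The encoder-side search of steps (3) to (6) can be summarized in a short sketch. The following Python/NumPy code is illustrative only: the argument names R (unfiltered reconstruction), Y (neural network output) and O (original frame), and the function name itself, are our assumptions rather than identifiers from the HM implementation; only the blend, the λ_i grid and the MSE minimization follow equations (1) to (3).

```python
import numpy as np

def best_filter_strength(R, Y, O, n_bits):
    """Frame-level search of equations (1)-(3): try every lambda_i and
    keep the fusion closest to the original frame O.

    R, Y, O are float arrays of identical shape. Returns the index i to
    signal (n_bits binary digits) and the fused frame for the frame
    buffer. Interface and names are illustrative assumptions.
    """
    levels = 2 ** n_bits                        # i ranges over 0 .. 2^n - 1
    best_i, best_mse, best_frame = 0, float("inf"), R
    for i in range(levels):
        lam = i / (levels - 1)                  # equation (2)
        fused = lam * Y + (1.0 - lam) * R       # equation (1)
        mse = float(np.mean((fused - O) ** 2))  # equation (3)
        if mse < best_mse:
            best_i, best_mse, best_frame = i, mse, fused
    return best_i, best_frame
```

At the encoder, best_i would then be written to the bitstream as n bypass-coded bits and best_frame placed in the frame buffer as the reference for subsequent frames.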
Specifically, at the decoding end, i is decoded from the code stream by the corresponding entropy decoder and λ_i is calculated from it; the result Y output by the neural network and the original reconstructed pixels R are then combined accordingly to obtain X̂_i, the desired filtering result. As at the encoder, X̂_i is sent to the frame buffer so that subsequently coded frames can reference it, realizing a complete match between the encoding and decoding ends.
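The decoder side needs no search. A minimal sketch under the same illustrative naming assumptions, with the entropy decoding of i abstracted into an argument:

```python
def apply_filter_strength(R, Y, i, n_bits):
    """Decoder-side fusion: rebuild the fused frame from the signaled i.

    Mirrors equations (1) and (2); R and Y must match the encoder's
    unfiltered reconstruction and neural network output exactly, so
    both ends place the same reference frame in the frame buffer.
    """
    lam = i / (2 ** n_bits - 1)       # equation (2)
    return lam * Y + (1.0 - lam) * R  # equation (1)
```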
In the invention, a new syntax element is designed, called a frame-level syntax element: one syntax element exists for each component of each frame, consists of n bits, and characterizes the degree of fusion between the neural network output and the video encoder's reconstructed pixels in the current frame. The larger its value, the more the neural network filtering result is used as the final output; the smaller its value, the more the video encoder tends to keep its own original reconstructed pixels as the final output. Through this syntax element the invention achieves an adaptive filtering strength, effectively solving the over-blurring and over-smoothing caused by applying the filter directly. Compared with the CTU-level syntax elements designed in previous methods, filtering directly at the frame level introduces no extra artificial boundaries; the algorithm combines excellent performance with low complexity.
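For example, with n = 2 the frame-level syntax element can signal four fusion levels, λ ∈ {0, 1/3, 2/3, 1}, for each component of a frame at a cost of only two bits.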
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
FIG. 2 shows the relationship between i and λ_i.
Detailed Description
The present invention is further described below, taking an HEVC video encoder (HM) as an example.
First, the DB and SAO options in the HM configuration file are turned off and the target video is encoded; after each frame is encoded it is placed into the frame buffer for reference by the next frame, yielding the reconstructed pixels X (denoted R in the steps above). X is then filtered with the neural network filter to obtain the filtered picture Y.
A parameter λ_i is set whose magnitude depends on the value of i, as shown in FIG. 2; the λ_i corresponding to each i is calculated according to equation (2). For every λ_i, equation (1) is evaluated to obtain an intermediate result X̂_i representing a temporary filtering result. Each temporary result is then compared with the original pixels through the mean squared error of equation (3), and the i with the minimum MSE_i, together with its X̂_i, is found. Since i ranges from 0 to 2^n − 1, it can be represented exactly in n-bit binary and coded into the code stream as the new syntax element; the entropy coding model can use the bypass coding mode, with the MPS probability set to 0.5 (a sketch of this signaling follows). The value of X̂_i is sent to the frame buffer as the output filtering result, serving as the reference for subsequent frames.
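As a sketch of this signaling, assuming a hypothetical one-bin write_bit/read_bit interface rather than HM's actual CABAC API, the n-bit index can be written and read as fixed-length bypass bins:

```python
def write_frame_filter_index(write_bit, i, n_bits):
    """Write the frame-level syntax element i as n fixed-length bypass
    bins, most significant bit first. write_bit is assumed to emit one
    equiprobable (p = 0.5) bin; HM's real entropy-coder API differs."""
    for k in range(n_bits - 1, -1, -1):
        write_bit((i >> k) & 1)

def read_frame_filter_index(read_bit, n_bits):
    """Decoder mirror: read n bypass bins back into the index i."""
    i = 0
    for _ in range(n_bits):
        i = (i << 1) | read_bit()
    return i
```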
In fact, this process is a rate-distortion optimization (RDO) process, i.e. the process of finding the smallest J, where J is calculated as follows:

J = D + k·R   (4)
where D represents distortion, R represents the bit rate, and k is a hyper-parameter weighing bit rate against distortion. This loop-filtering problem does not affect R: every frame spends the same number of extra bits regardless of which i is chosen, so the optimization only needs to minimize D to realize the RDO process. Thus the minimum mean squared error MSE_i and its corresponding i and X̂_i are the values to retain.
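Restated in symbols (a direct consequence of equation (4), with R_i denoting the signaling cost of candidate i, constant at n bits):

```latex
J_i = D_i + k R_i, \qquad R_i \equiv n
\;\Longrightarrow\;
\arg\min_i J_i = \arg\min_i D_i = \arg\min_i \mathrm{MSE}_i
```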
For this frame-level filtering syntax element, its position can be placed after the original DB filtering syntax element, since DB and the proposed algorithm share the same frame-level filtering concept. After reading the DB bit, the frame-level neural network filtering syntax element i can be read and λ_i calculated from it to control the final filtering strength.
At the decoding end, i is decoded from the code stream by the corresponding entropy decoder, λ_i is calculated according to i, and the result Y output by the neural network is combined with the original reconstructed pixels R accordingly to obtain X̂_i, the desired filtering result. X̂_i is likewise sent to the frame buffer so that subsequently coded frames can reference it, realizing a complete match between the encoding and decoding ends.

Claims (2)

1. A low-complexity neural network filtering algorithm for B frames and P frames is characterized by comprising the following specific steps:
(1) at the encoding end, closing DB and SAO options in the configuration file of the HM, and encoding the target video to obtain a reconstructed pixel X;
(2) filtering the unprocessed reconstructed pixel R by using a neural network filter to obtain a filtered pixel Y;
(3) using a combination X̂ of the filtered pixels Y and R to replace the original reconstructed pixels R, thereby realizing the filtering of the B/P frame; X̂ is given by formula (1), which introduces a new syntax element λ that determines the filtering strength: λ_i = 0 means no filtering at all, and λ_i = 1 means filtering entirely with the neural network filter;

X̂_i = λ_i · Y + (1 − λ_i) · R   (1)
(4) traversing different values of the hyper-parameter λ_i in the encoder, given by formula (2); λ_i divides the interval [0, 1] evenly into 2^n − 1 steps, so each time i increases by 1, λ_i increases by 1/(2^n − 1), realizing a filtering strength that increases uniformly over the interval;

λ_i = i / (2^n − 1),  i = 0, 1, …, 2^n − 1   (2)
(5) different λ_i correspond to different values X̂_i; for each X̂_i, computing the mean squared error MSE_i between it and the original pixels O:

MSE_i = (1/N) Σ_p (X̂_i(p) − O(p))²   (3)

where O denotes the original (uncompressed) pixels and N the number of pixels in the frame;
(6) finding the minimum MSE_i, recording the corresponding λ_i, and coding the binary form of the corresponding i into the code stream; and sending the filtering result X̂_i to the frame buffer;
(7) at the decoding end, not traversing λ_i under different i, but directly decoding the filtering strength according to i in the code stream, thereby realizing the optimal filtering effect.
2. The low-complexity neural network filtering algorithm for B frames and P frames according to claim 1, wherein at the decoding end, i is decoded from the code stream by a corresponding entropy decoder, λ_i is calculated according to i, and the result Y output by the neural network is combined with the original reconstructed pixels R accordingly to obtain X̂_i, from which the desired filtering result is calculated.
CN201911382700.0A 2019-12-28 2019-12-28 Low-complexity neural network filtering algorithm for B frame and P frame Active CN111064958B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911382700.0A CN111064958B (en) 2019-12-28 2019-12-28 Low-complexity neural network filtering algorithm for B frame and P frame


Publications (2)

Publication Number Publication Date
CN111064958A (en) 2020-04-24
CN111064958B CN111064958B (en) 2021-03-30

Family

ID=70304317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911382700.0A Active CN111064958B (en) 2019-12-28 2019-12-28 Low-complexity neural network filtering algorithm for B frame and P frame

Country Status (1)

Country Link
CN (1) CN111064958B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1558680A (en) * 2004-01-16 2004-12-29 Beijing University of Technology A simplified loop filtering method for video coding
CN110199521A (en) * 2016-12-23 2019-09-03 Huawei Technologies Co., Ltd. Low-complexity hybrid-domain collaborative in-loop filter for lossy video coding
CN109151475A (en) * 2017-06-27 2019-01-04 Hangzhou Hikvision Digital Technology Co., Ltd. Video encoding method, decoding method, apparatus and electronic device
CN108174225A (en) * 2018-01-11 2018-06-15 Shanghai Jiao Tong University In-loop filter implementation method and system for video coding and decoding based on generative adversarial networks
CN110062246A (en) * 2018-01-19 2019-07-26 Hangzhou Hikvision Digital Technology Co., Ltd. Method and apparatus for processing video frame data
CN110300301A (en) * 2018-03-22 2019-10-01 Huawei Technologies Co., Ltd. Image coding and decoding method and apparatus
JP2019201256A (en) * 2018-05-14 2019-11-21 Sharp Corporation Image filter device
CN110619607A (en) * 2018-06-20 2019-12-27 Zhejiang University Image denoising method and apparatus based on neural networks, and image coding and decoding method and apparatus based on neural network image denoising
CN110519606A (en) * 2019-08-22 2019-11-29 Tianjin University Intelligent intra-frame coding method for depth video

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022052533A1 (en) * 2020-09-10 2022-03-17 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding method, decoding method, encoder, decoder, and encoding system
WO2022116165A1 (en) * 2020-12-04 2022-06-09 SZ DJI Technology Co., Ltd. Video encoding method, video decoding method, encoder, decoder, and AI accelerator
US11979591B2 (en) 2021-04-06 2024-05-07 Lemon Inc. Unified neural network in-loop filter
WO2022218385A1 (en) * 2021-04-14 2022-10-20 Beijing Bytedance Network Technology Co., Ltd. Unified neural network filter model
US11949918B2 (en) 2021-04-15 2024-04-02 Lemon Inc. Unified neural network in-loop filter signaling
CN113422966A (en) * 2021-05-27 2021-09-21 Shaoxing Beida Information Technology Innovation Center Multi-model CNN loop filtering method
CN113422966B (en) * 2021-05-27 2024-05-24 Shaoxing Beida Information Technology Innovation Center Multi-model CNN loop filtering method
WO2023130226A1 (en) * 2022-01-04 2023-07-13 Oppo广东移动通信有限公司 Filtering method, decoder, encoder and computer-readable storage medium

Also Published As

Publication number Publication date
CN111064958B (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN111064958B (en) Low-complexity neural network filtering algorithm for B frame and P frame
CN108184129B (en) Video coding and decoding method and device and neural network for image filtering
JP5854439B2 (en) Video coding system and method using adaptive segmentation
CN109889852B (en) HEVC intra-frame coding optimization method based on adjacent values
WO2021203394A1 (en) Loop filtering method and apparatus
CN112544081B (en) Loop filtering method and device
CN113766247B (en) Loop filtering method and device
WO2013067949A1 (en) Matrix encoding method and device thereof, and matrix decoding method and device thereof
CN105306957A (en) Adaptive loop filtering method and device
CN113068028A (en) Method and apparatus for predicting video image component, and computer storage medium
CN110944179B (en) Video data decoding method and device
CN113422966A (en) Multi-model CNN loop filtering method
CN113810715B (en) Video compression reference image generation method based on cavity convolutional neural network
CN115914654A (en) Neural network loop filtering method and device for video coding
US10764577B2 (en) Non-MPM mode coding for intra prediction in video coding
CN114793282A (en) Neural network based video compression with bit allocation
CN113709459B (en) Intra-frame prediction method, device and computer storage medium
WO2023197230A1 (en) Filtering method, encoder, decoder and storage medium
WO2024016156A1 (en) Filtering method, encoder, decoder, code stream and storage medium
WO2023245544A1 (en) Encoding and decoding method, bitstream, encoder, decoder, and storage medium
CN117459737B (en) Training method of image preprocessing network and image preprocessing method
WO2024077576A1 (en) Neural network based loop filter methods, video coding method and apparatus, video decoding method and apparatus, and system
WO2024007157A1 (en) Multi-reference line index list sorting method and device, video coding method and device, video decoding method and device, and system
CN116347108A (en) Loop filtering decision method of deep neural network based on coding information
Sheng et al. Prediction and Reference Quality Adaptation for Learned Video Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant