CN108520505B - Loop filtering implementation method based on multi-network combined construction and self-adaptive selection - Google Patents

Loop filtering implementation method based on multi-network combined construction and self-adaptive selection

Info

Publication number
CN108520505B
Authority
CN
China
Prior art keywords
network
video frame
video
filter
classification
Prior art date
Legal status
Active
Application number
CN201810341067.XA
Other languages
Chinese (zh)
Other versions
CN108520505A (en)
Inventor
Lin Weiyao (林巍峣)
He Xiaoyi (何晓艺)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810341067.XA
Publication of CN108520505A
Application granted
Publication of CN108520505B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/10 Image enhancement or restoration using non-spatial domain filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method for implementing loop filtering based on multi-network combined construction and self-adaptive selection includes the following steps: jointly constructing a convolutional neural network containing a multi-classification network and multiple filter networks, iteratively training the convolutional neural network using video frames of compressed video as training data, and performing adaptively selected loop filtering in the video compression process.

Description

Loop filtering implementation method based on multi-network combined construction and self-adaptive selection
Technical Field
The invention relates to a technology in the field of digital image processing, in particular to a video compression coding loop filtering implementation method based on multi-network joint construction and self-adaptive selection.
Background
Existing video compression algorithms all adopt lossy compression, meaning that a certain amount of distortion exists between the frames of the compressed video and those of the original video. When the compression rate is high, this distortion becomes more severe. Loop filtering of the compressed video is therefore important for improving image quality while maintaining a high compression rate. Loop filters based on conventional methods, such as the deblocking filter and sample adaptive offset (SAO) in HEVC (High Efficiency Video Coding), already exist in current video coding standards. Loop filters based on convolutional neural networks have also been proposed and achieve better results than the traditional filters. However, the existing convolutional-neural-network-based loop filtering methods all rely on a single convolutional neural network, whose robustness is insufficient when the coding conditions and image distortion are more complex.
Disclosure of Invention
To address the defects in the prior art, the invention provides a loop filtering implementation method based on multi-network joint construction and self-adaptive selection. The method uses multiple convolutional neural networks to perform loop filtering within a video compression coding algorithm, offers stronger robustness and extensibility, further improves the performance of existing convolutional-neural-network-based compressed-video loop filters, and increases the coding efficiency of the video compression algorithm.
The invention is realized by the following technical scheme:
the invention relates to a loop filtering implementation method based on multi-network combined construction and self-adaptive selection.
The convolutional neural network comprises: a classification network and a plurality of filter networks, wherein: the classification network adopts, but is not limited to, the VGG-16 network described by K. Simonyan et al. in Very Deep Convolutional Networks for Large-Scale Image Recognition or the ResNet classification network proposed by K. He et al. in Deep Residual Learning for Image Recognition; the filter network adopts, but is not limited to, the VRCNN network proposed by Y. Dai et al. in A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding or the QECNN network proposed by R. Yang et al. in Enhancing Quality for HEVC Compressed Videos.
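As a concrete point of reference for the filter-network component, the following is a minimal TensorFlow/Keras sketch of a VRCNN-style filter network with variable filter sizes and residual learning. The kernel sizes and channel counts are illustrative assumptions loosely following Dai et al.; they are not a configuration mandated by the invention, and a QECNN-style or other restoration network could be substituted.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_vrcnn_filter():
    """VRCNN-style restoration network: variable filter sizes + residual learning.
    Channel counts are illustrative assumptions, not values fixed by the patent."""
    x_in = layers.Input(shape=(None, None, 1))                      # Y channel only
    c1 = layers.Conv2D(64, 5, padding="same", activation="relu")(x_in)
    # Second stage: two parallel filter sizes, concatenated along channels.
    c2a = layers.Conv2D(16, 5, padding="same", activation="relu")(c1)
    c2b = layers.Conv2D(32, 3, padding="same", activation="relu")(c1)
    c2 = layers.Concatenate()([c2a, c2b])
    # Third stage: another pair of parallel filter sizes.
    c3a = layers.Conv2D(16, 3, padding="same", activation="relu")(c2)
    c3b = layers.Conv2D(32, 1, padding="same", activation="relu")(c2)
    c3 = layers.Concatenate()([c3a, c3b])
    residue = layers.Conv2D(1, 3, padding="same")(c3)
    # Residual learning: the network predicts a correction added to its input.
    return Model(x_in, layers.Add()([x_in, residue]), name="vrcnn_filter")
```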
The number of classification classes of the multi-classification network matches the number of filter networks and is preferably a power of 2.
In each iteration of the iterative training, a video frame from the training data is first input into the multi-classification network to predict its category i. The video frame is then input into all N filter networks, the outputs are compared, and the index j of the filter network with the best filtering effect is recorded as the category label of the video frame and used to update the parameters of the classification network. The parameters of the i-th filter network are then updated using the video frame and its corresponding uncompressed original video frame.
The filtering effect adopts, but is not limited to, the peak signal-to-noise ratio (PSNR) as the evaluation index of image quality.
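As a minimal illustration of this evaluation step, the snippet below computes the PSNR of a filtered Y-channel frame against the uncompressed original with TensorFlow's built-in tf.image.psnr; the 8-bit peak value of 255 is an assumption about the pixel representation, not something fixed by the invention.

```python
import tensorflow as tf

def frame_psnr(filtered, original, max_val=255.0):
    """PSNR of a filtered Y-channel frame against the uncompressed original.
    Both inputs are (H, W, 1) arrays or tensors with pixel values in [0, max_val]."""
    filtered = tf.cast(filtered, tf.float32)
    original = tf.cast(original, tf.float32)
    return float(tf.image.psnr(filtered, original, max_val=max_val))
```

During training, the filter network whose output gives the highest PSNR supplies the class label j for the multi-classification network.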
The loop filtering in the video compression process is implemented in either of the following ways:
1) In the coding and decoding loop of video compression, the compressed video frame is first input into the trained N-class classification network to obtain the predicted class i; the video frame is then input into the i-th trained filter network, whose output is the final filtered video frame;
2) In the coding and decoding loop of video compression, the compressed video frame is input into the N trained filter networks; the filtered video frames output by the N filter networks are compared according to the image-quality evaluation index, the output of the j-th filter network with the best quality is selected as the final filtered video frame, and j is expressed in binary and written into the coded bitstream (a sketch of this mode is given below).
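The sketch below illustrates mode 2): each compressed frame is passed through all N trained filter networks, the output with the highest PSNR is kept, and the winning index j is returned as a binary string of log2(N) bits. How those bits are actually written into the code stream is left to the codec's own bitstream and entropy-coding interface, which is not modelled here.

```python
import tensorflow as tf

def adaptive_filter_mode2(compressed_frame, original_frame, filter_nets):
    """Mode 2: run every trained filter network, keep the output with the
    highest PSNR, and return the index j as a binary string for the bitstream."""
    x = tf.cast(compressed_frame, tf.float32)[tf.newaxis, ...]   # (1, H, W, 1)
    y = tf.cast(original_frame, tf.float32)
    best_j, best_psnr, best_frame = 0, float("-inf"), None
    for j, net in enumerate(filter_nets):
        candidate = net(x)[0]
        psnr = float(tf.image.psnr(candidate, y, max_val=255.0))
        if psnr > best_psnr:
            best_j, best_psnr, best_frame = j, psnr, candidate
    # With N a power of 2, j fits exactly in log2(N) bits (2 bits when N = 4).
    n_bits = max(1, (len(filter_nets) - 1).bit_length())
    return best_frame, format(best_j, f"0{n_bits}b")
```

Mode 1) replaces the exhaustive comparison with a single forward pass through the classification network, so no index needs to be signalled, at the cost of relying on the classifier's prediction.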
Technical effects
Compared with the prior art, the method performs loop filtering of compressed video with multiple jointly constructed convolutional neural networks and achieves better robustness and enhancement than conventional approaches based on a single neural network. A loop filter built on a single neural network cannot efficiently learn the complex image distortions of varying degrees present in compressed video, whereas the multiple jointly trained models of the invention capture the complex distortion introduced by the compression algorithm more effectively and therefore achieve a better loop filtering effect.
Drawings
FIG. 1 is a block diagram of the multi-network joint construction module in an embodiment;
FIGS. 2a and 2b are schematic diagrams of two loop filtering embodiments, respectively;
FIG. 3 is a schematic diagram of a system according to an embodiment.
Detailed Description
As shown in fig. 3, the system for implementing loop filtering in this embodiment comprises: a multi-network joint construction module and an adaptively selecting loop filtering module connected to it, wherein the multi-network joint construction module outputs the trained network models to the adaptively selecting loop filtering module, and the adaptively selecting loop filtering module performs loop filtering on compressed video frames in the video compression coding algorithm according to these network models.
The multi-network joint construction module comprises: a network generation unit, which constructs the convolutional neural network comprising a classification network and a plurality of filter networks, and a network joint training unit connected to the network generation unit.
The loop filtering module comprises a filtering selection unit, which selects the filtering mode and is implemented with the convolutional neural network constructed and trained by the multi-network joint construction module.
The loop filter module is preferably embedded in a video coding algorithm.
The specific implementation steps of the embodiment include:
and step 1.1) performing compression coding on the videos in the data set by using video coding and decoding software HM-16.0 to finally obtain a plurality of decoded compressed videos. For each compressed video, its video frame and its corresponding video frame before compression are used as training data, and only the Y channel of the image is used.
Step 1.2) The neural networks are built with the TensorFlow open-source software: the N = 4 classification network adopts the VGG-16 network and the 4 filter networks all adopt the VRCNN network; supervised training is then carried out on these networks with the training data to complete the optimization of the network parameters.
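The sketch below mirrors this construction step: an N = 4 classification network and four filter networks are instantiated together. The classifier shown here is a reduced VGG-style stack adapted to single-channel Y input, which is an assumption made for brevity (the embodiment itself uses the full VGG-16); build_vrcnn_filter refers to the VRCNN-style builder sketched earlier in this description.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N = 4   # number of filter networks == number of classification classes

def build_classifier(num_classes=N):
    """Reduced VGG-style classifier over the Y channel (layer sizes are illustrative)."""
    x_in = layers.Input(shape=(None, None, 1))
    x = x_in
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)        # handles arbitrary frame sizes
    return Model(x_in, layers.Dense(num_classes, activation="softmax")(x))

# Joint construction: one multi-classification network plus N filter networks.
classifier = build_classifier()
filter_nets = [build_vrcnn_filter() for _ in range(N)]   # builder from the earlier sketch
```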
The training is an iterative loop, and specifically comprises the following steps:
i) initializing network parameters randomly;
ii) in each iteration of the training stage, the class i of the input video frame of the training data is predicted by the 4-class classification network; the video frame is then input into the 4 filter networks, the PSNR of each output video frame is calculated, and the network with the highest PSNR gain is recorded as the j-th network;
iii) the parameters of the classification network are updated using j as the class label of the video frame, and the parameters of the i-th filter network are then updated using the video frame and the corresponding uncompressed original video frame.
For the parameter updates, the cost function (loss) of each network is computed from a data pair: for the classification network, its prediction i and the label class j; for the filter network, the filtered video frame it outputs and the uncompressed original video frame. After the cost function is computed, the gradients are back-propagated as in ordinary neural network training and the network parameters are updated.
The cost function of the classification network adopts, but is not limited to, Softmax Loss, specifically:
$$L_{\mathrm{Softmax}} = -\sum_{i=1}^{N} y_i \log p_i$$
wherein: y_i = 1 if i = j (the label category), otherwise y_i = 0; p_i denotes the network's predicted probability for class i; N is the number of classification classes.
The cost function of the filter network adopts, but is not limited to, the mean squared error (MSE), specifically:
$$L_{\mathrm{MSE}} = \frac{1}{M} \sum_{i=1}^{M} (X_i - Y_i)^2$$
wherein: X_i and Y_i denote the i-th pixel values of the output filtered video frame and the original frame, respectively, and M is the total number of pixels in the video frame, which depends on the frame size.
Step 1.3) The trained networks are deployed into the video coding algorithm; in this embodiment, loop filtering is performed in either of two ways, as shown in fig. 2a and fig. 2b:
as shown in fig. 1, in the encoding algorithm, a compressed video frame is input into a 4-class network VGG-16 obtained by training, so as to obtain a prediction class i; and then inputting the video frame into the filter network VRCNN obtained by the i x training to obtain a filtered video frame.
As shown in fig. 2b, in the coding algorithm the compressed video frame is input into the N trained filter networks; the filtered video frames output by the N filter networks are compared according to the image-quality evaluation index, the output of the j-th filter network with the best quality is selected as the final filtered video frame, and j is then expressed in binary and written into the coded bitstream.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (1)

1. A loop filtering implementation system based on multi-network joint construction and adaptive selection, characterized by comprising: a multi-network joint construction module and an adaptively selecting loop filtering module connected with the multi-network joint construction module, wherein the multi-network joint construction module outputs a network model to the adaptively selecting loop filtering module, and the adaptively selecting loop filtering module carries out loop filtering on compressed video frames in a video compression coding algorithm according to the network model;
the loop filtering means that: a convolutional neural network comprising a multi-classification network and a plurality of filter networks is first constructed jointly, the convolutional neural network is then trained iteratively using video frames of compressed video as training data, and adaptively selected loop filtering is finally performed in the video compression process;
the training data is obtained by compression-coding the videos in a data set with the video codec software HM-16.0 to obtain a number of decoded compressed videos; for each compressed video, its video frames and the corresponding pre-compression video frames are used as training data, and only the Y channel of each image is used;
the number of classification classes of the multi-classification network matches the number of filter networks and is a power of 2;
the multi-classification network adopts a VGG-16 network or a ResNet classification network; the filter network adopts a VRCNN network or a QECNN network;
in each iteration of the iterative training, a video frame from the training data is first input into the multi-classification network to predict its category i; the video frame is then input into the N filter networks, the outputs are compared, and the index j of the filter network with the best filtering effect is recorded as the category label of the video frame and used to update the parameters of the classification network; the parameters of the i-th filter network are then updated using the video frame and its corresponding uncompressed original video frame;
the filtering effect adopts the peak signal-to-noise ratio as an evaluation index of the image quality;
the parameter updating computes the cost function of each network from a data pair, the data pairs comprising: for the classification network, its prediction class i and the label class j; for the filter network, the filtered video frame it outputs and the uncompressed original video frame; after the cost function is computed, the gradients are back-propagated as in ordinary neural network training and the network parameters are updated;
the cost function of the classification network adopts Softmax Loss, specifically:
$$L_{\mathrm{Softmax}} = -\sum_{i=1}^{N} y_i \log p_i$$
wherein: y_i = 1 if the prediction class i equals the label class j, otherwise y_i = 0; p_i denotes the network's predicted probability for class i; N is the number of classification classes; the cost function adopted by the filter network is the mean squared error, specifically:
$$L_{\mathrm{MSE}} = \frac{1}{M} \sum_{i=1}^{M} (X_i - Y_i)^2$$
wherein: X_i and Y_i denote the i-th pixel values of the output filtered video frame and the original frame, respectively, and M is the total number of pixels in the video frame, which depends on the frame size;
the loop filtering in the video compression process is implemented in either of the following ways:
1) in the coding and decoding loop of video compression, the compressed video frame is first input into the trained N-class classification network to obtain the predicted class i, and the video frame is then input into the i-th trained filter network, whose output is the final filtered video frame;
2) in the coding and decoding loop of video compression, the compressed video frame is input into the N trained filter networks, the filtered video frames output by the N filter networks are compared according to the image-quality evaluation index, the output of the j-th filter network with the best quality is selected as the final filtered video frame, and j is expressed in binary and written into the coded bitstream.
CN201810341067.XA 2018-04-17 2018-04-17 Loop filtering implementation method based on multi-network combined construction and self-adaptive selection Active CN108520505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810341067.XA CN108520505B (en) 2018-04-17 2018-04-17 Loop filtering implementation method based on multi-network combined construction and self-adaptive selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810341067.XA CN108520505B (en) 2018-04-17 2018-04-17 Loop filtering implementation method based on multi-network combined construction and self-adaptive selection

Publications (2)

Publication Number Publication Date
CN108520505A CN108520505A (en) 2018-09-11
CN108520505B true CN108520505B (en) 2021-12-03

Family

ID=63428705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810341067.XA Active CN108520505B (en) 2018-04-17 2018-04-17 Loop filtering implementation method based on multi-network combined construction and self-adaptive selection

Country Status (1)

Country Link
CN (1) CN108520505B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110351568A (en) * 2019-06-13 2019-10-18 天津大学 A kind of filtering video loop device based on depth convolutional network
CN112422993B (en) * 2019-08-21 2021-12-03 四川大学 HEVC video quality enhancement method combined with convolutional neural network
WO2021051369A1 (en) * 2019-09-20 2021-03-25 Intel Corporation Convolutional neural network loop filter based on classifier
EP4049236A1 (en) * 2019-11-14 2022-08-31 Huawei Technologies Co., Ltd. Spatially adaptive image filtering
EP4107947A4 (en) * 2020-02-21 2024-03-06 Nokia Technologies Oy A method, an apparatus and a computer program product for video encoding and video decoding
WO2022257049A1 (en) * 2021-06-09 2022-12-15 Oppo广东移动通信有限公司 Encoding method, decoding method, code stream, encoder, decoder and storage medium
WO2022257130A1 (en) * 2021-06-11 2022-12-15 Oppo广东移动通信有限公司 Encoding method, decoding method, code stream, encoder, decoder, system and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102413323A (en) * 2010-01-13 2012-04-11 中国移动通信集团广东有限公司中山分公司 H.264-based video compression method
CN103141094A (en) * 2010-10-05 2013-06-05 联发科技股份有限公司 Method and apparatus of adaptive loop filtering
CN103096060A (en) * 2011-11-08 2013-05-08 乐金电子(中国)研究开发中心有限公司 Intra-frame image prediction coding and decoding self-adaption loop filtering method and device
US20130272624A1 (en) * 2012-04-11 2013-10-17 Texas Instruments Incorporated Virtual Boundary Processing Simplification for Adaptive Loop Filtering (ALF) in Video Coding
CN106101711A (en) * 2016-08-26 2016-11-09 成都杰华科技有限公司 A kind of quickly real-time video codec compression algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding; Yuanying Dai et al.; arXiv:1608.06690v2 [cs.MM]; 2016-10-29; pp. 1-12 *
CNN-based in-loop filtering for coding efficiency improvement; Woon-Sung Park et al.; 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP); 2016-08-04; pp. 1-5 *
Multi-modal/multi-scale convolutional neural network based in-loop filter design for next generation video codec; Jihong Kang et al.; 2017 IEEE International Conference on Image Processing (ICIP); 2018-02-22; pp. 26-30 *
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression; Jian-Hao Luo et al.; arXiv:1707.06342v1 [cs.CV]; 2017-07-20; pp. 1-9 *
Analysis of loop filtering techniques in the video coding standard HEVC; Tang Huamin et al.; Video Engineering (电视技术); 2014-12-31; Vol. 38, No. 11; pp. 1-4 *

Also Published As

Publication number Publication date
CN108520505A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN108520505B (en) Loop filtering implementation method based on multi-network combined construction and self-adaptive selection
CN108174225B (en) Video coding and decoding in-loop filtering implementation method and system based on countermeasure generation network
CN109120937B (en) Video encoding method, decoding method, device and electronic equipment
US20230062752A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
CN108134932B (en) Method and system for realizing video coding and decoding loop internal filtering based on convolutional neural network
US11062210B2 (en) Method and apparatus for training a neural network used for denoising
US20230291909A1 (en) Coding video frame key points to enable reconstruction of video frame
CN113822147B (en) Deep compression method for semantic tasks of collaborative machine
Lu et al. Learning a deep vector quantization network for image compression
WO2020183059A1 (en) An apparatus, a method and a computer program for training a neural network
WO2020008104A1 (en) A method, an apparatus and a computer program product for image compression
Klopp et al. How to exploit the transferability of learned image compression to conventional codecs
CN113379858A (en) Image compression method and device based on deep learning
Canh et al. Rate-distortion optimized quantization: A deep learning approach
Fujihashi et al. Wireless 3D point cloud delivery using deep graph neural networks
CN110351558B (en) Video image coding compression efficiency improving method based on reinforcement learning
WO2022266578A1 (en) Content-adaptive online training method and apparatus for deblocking in block- wise image compression
WO2022251828A1 (en) Content-adaptive online training method and apparatus for post-filtering
US20230110503A1 (en) Method, an apparatus and a computer program product for video encoding and video decoding
CN116935292B (en) Short video scene classification method and system based on self-attention model
KR102245682B1 (en) Apparatus for compressing image, learning apparatus and method thereof
Shen et al. Dec-adapter: Exploring efficient decoder-side adapter for bridging screen content and natural image compression
CN115278249B (en) Video block-level rate distortion optimization method and system based on visual self-attention network
CN113766250B (en) Compressed image quality improving method based on sampling reconstruction and feature enhancement
US11683515B2 (en) Video compression with adaptive iterative intra-prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant