CN107197260B - Video coding post-filter method based on convolutional neural networks - Google Patents

Video coding post-filter method based on convolutional neural networks

Info

Publication number
CN107197260B
CN107197260B (application CN201710439132.8A / CN201710439132A)
Authority
CN
China
Prior art keywords
frame
video
neural networks
convolutional neural
post
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710439132.8A
Other languages
Chinese (zh)
Other versions
CN107197260A (en)
Inventor
张永兵
林荣群
王兴政
王好谦
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Weilai Media Technology Research Institute
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Weilai Media Technology Research Institute
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Weilai Media Technology Research Institute, Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Weilai Media Technology Research Institute
Priority to CN201710439132.8A priority Critical patent/CN107197260B/en
Publication of CN107197260A publication Critical patent/CN107197260A/en
Application granted granted Critical
Publication of CN107197260B publication Critical patent/CN107197260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Abstract

A convolutional-neural-network-based post-filtering method for video coding, comprising a model training step and a filtering step. The training step includes: setting the quantization parameter of video compression to values from 20 to 51 and encoding the original video to obtain compressed video; extracting frames from all videos to obtain multiple compressed-video-frame/original-video-frame pairs; dividing the extracted frame pairs into multiple groups according to frame type and quantization parameter; building a convolutional neural network framework, initializing the network parameters, and training the network separately on each of the groups to obtain multiple neural network models corresponding to the different quantization parameters and frame types. The filtering step includes: embedding the obtained neural network models into the post-filtering stage of a video encoder; applying the above encoding compression and frame extraction to the original video to be processed to obtain frame pairs to be processed, and selecting the neural network model matching the quantization parameter and frame type of each frame pair to perform filtering.

Description

Video coding post-filter method based on convolutional neural networks
Technical field
The present invention relates to the fields of computer vision and video coding, and in particular to a convolutional-neural-network-based post-filtering method for video coding.
Background technique
With the development of science and technology and the proliferation of video display devices, video has become an indispensable part of daily life and plays a highly important role in every field. The past decades have witnessed great advances in video resolution and display screens, but ultra-high-resolution video generates an enormous amount of data, placing a heavy burden on network bandwidth. Efficient video coding and transmission technologies are therefore needed to guarantee the user's viewing experience while reducing the data volume of video as much as possible and lightening the load on the network. To this end, researchers have continuously studied efficient video coding methods for decades. Video coding techniques mainly reduce the data volume of video by removing its internal redundancy, so that large amounts of video data can be stored and transmitted effectively; the aim is to compress video at a lower bit rate while preserving the original video quality as far as possible.
However, current video coding standards are mainly based on block-based hybrid video coding. Under this framework, block-based intra/inter prediction, transformation and coarse quantization all degrade video quality, especially at low bit rates. Reducing the distortion introduced by video coding has therefore become one of the research hotspots in the field. Although current video coding standards adopt some algorithms to reduce blocking artifacts and improve subjective quality, their effect is not ideal: the processed video still exhibits obvious blocking artifacts and edge blur, and the loss of detail remains serious.
Summary of the invention
The main object of the present invention is to exploit the strong fitting capability of convolutional neural networks for nonlinear transformations and to propose a convolutional-neural-network-based post-filtering method for video coding. The method establishes a mapping from lossy video frames to lossless video frames, thereby approximating the inverse of the distortion process in video coding and achieving the purpose of reducing distortion.
To achieve the above object, the present invention provides the following technical solution:
A convolutional-neural-network-based post-filtering method for video coding, comprising a training step for the convolutional neural network models and a post-filtering processing step, wherein:
The training step comprises S1 to S4:
S1: setting the quantization parameter of video compression to 20 to 51, encoding and compressing the original video to obtain compressed video;
S2: extracting frames from the compressed video and the original video to obtain multiple frame pairs, each frame pair comprising one compressed video frame and one original video frame;
S3: dividing the frame pairs extracted in step S2 into multiple groups according to frame type and quantization parameter;
S4: building a convolutional neural network framework and initializing the network parameters, then training the network separately on each group divided in step S3 to obtain multiple neural network models corresponding to the different quantization parameters and frame types;
The post-filtering processing step comprises S5 and S6:
S5: embedding the multiple neural network models obtained in step S4 into the post-filtering stage of a video encoder;
S6: performing steps S1 and S2 on the original video to be processed to obtain frame pairs to be processed, and selecting the neural network model corresponding to the quantization parameter and frame type of each frame pair to perform filtering.
Differences in quantization parameter and frame type lead to different distortion characteristics. In both training and actual processing, the present invention extracts frame pairs composed of original-video and compressed-video frames, thereby establishing a mapping between lossy and lossless video frames; the neural network models trained for the different quantization parameters and frame types are embedded into the encoder's post-filtering stage. The method thus fully accounts for the influence of quantization parameter and frame type on the degree of distortion: it first determines the quantization parameter and frame type of a frame, then selects the neural network model corresponding to both for filtering, so that distortion can be suppressed effectively.
Detailed description of the invention
Fig. 1 is a flowchart of the convolutional-neural-network-based video coding post-filtering method provided by the present invention;
Fig. 2 is a framework diagram of the convolutional neural network;
Fig. 3 is a schematic diagram of the training process of the convolutional neural network.
Specific embodiment
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Exploiting the powerful nonlinear fitting ability of convolutional neural networks in combination with the characteristics of video coding, the present invention fully analyzes the causes of distortion in video coding and compression, and proposes a convolutional-neural-network-based post-filtering method that depends on the quantization parameter and the frame type. A mapping between lossy and lossless video frames is established, and separate convolutional neural network models are trained for different quantization parameters (QP for short) and different frame types. In the encoder's post-filtering stage, frames of different quantization parameters and frame types are filtered with the corresponding model, which effectively suppresses distortion.
In the hybrid video coding framework, the main frame types are I frames, P frames and B frames, wherein:
An I frame, also called an intra-coded frame (intra picture), uses intra prediction. An I frame is usually the start frame of a group of pictures (GOP); its information does not depend on the preceding or following frames, so it can serve as an entry point for random access during playback. At the decoder, intra prediction predicts the current block from the already-decoded neighboring blocks to its left and above, and then reconstructs the original image information by adding the residual data, the residual being the difference between the source block and its prediction. The encoder must use the same intra prediction; because the predicted values are similar to the original image data, the energy of the residual data to be transmitted is very low, enabling compression of the bitstream.
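The prediction-plus-residual arithmetic described above can be illustrated with a minimal sketch; the block values and the flat DC-style prediction are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Hypothetical 4x4 source block and an intra prediction for it (e.g. a flat
# DC prediction derived from already-decoded neighbours). Values are made up.
source = np.array([[52, 55, 61, 59],
                   [62, 59, 55, 104],
                   [63, 65, 66, 113],
                   [68, 70, 70, 126]])
prediction = np.full((4, 4), 70)

residual = source - prediction          # formed at the encoder
reconstructed = prediction + residual   # formed at the decoder

# Because the prediction tracks the source, the residual has low energy.
assert np.array_equal(reconstructed, source)
```

The better the prediction, the closer the residual is to zero, which is what makes the bitstream compressible.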
P frames and B frames use inter prediction. A P frame, i.e. a forward-predicted frame (predictive frame), references a preceding I frame or P frame to remove temporal redundancy and achieve efficient compression. A B frame, i.e. a bi-directionally predicted interpolated frame (bi-directional interpolated prediction frame), references video frames both before and after it, using the similarity between those frames and the B frame to remove redundancy. Inter prediction relies on the high similarity between adjacent frames, reconstructing the current frame from the information of a reference frame; the common approach is motion estimation and compensation. The method of finding the best-matching macroblock, i.e. motion estimation, first finds for the current macroblock the most similar macroblock (the matching block) in the previous frame and subtracts the two; the difference contains many zeros and, with a coding process similar to the intra case, a considerable number of bits can be saved.
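The motion-estimation step described above (finding the most similar macroblock in the previous frame so that the difference is mostly zeros) can be sketched as an exhaustive SAD search; the function name, search window and block size are illustrative assumptions, not part of the patent:

```python
import numpy as np

def best_match(prev_frame, cur_block, top, left, search=4):
    """Exhaustive block matching: find the offset (dy, dx) whose block in
    prev_frame minimises the sum of absolute differences (SAD) to cur_block."""
    h, w = cur_block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > prev_frame.shape[0] or x + w > prev_frame.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = prev_frame[y:y + h, x:x + w]
            sad = np.abs(cand.astype(int) - cur_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

For a block that simply moved by (1, 2) between frames, the search recovers that motion vector with zero residual energy.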
In the hybrid video coding framework, video frames of different frame types are encoded differently, and for precisely this reason their distortion characteristics also differ. Since an I frame is relatively independent, its distortion mainly stems from quantization and the DCT (discrete cosine transform). Quantization essentially maps many values to a few values; conversely, dequantization restores many values from a few, which inevitably introduces errors, so the video frame exhibits high-frequency noise caused by quantization loss. P frames and B frames are mainly produced by inter prediction, whose source traces back to I frames, so the distortion carried by the I frame also propagates to the P and B frames; in addition, motion estimation and compensation themselves introduce distortion. For example, in the motion compensation stage of inter prediction, two adjacent prediction blocks of the current frame may be predicted not from two adjacent blocks of the same reference frame but from blocks in different frames; the edges of the two referenced blocks then lack continuity, which causes a blocking artifact and hence distortion.
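The many-to-few mapping of quantization and its lossy inverse, identified above as the main source of I-frame distortion, can be illustrated with scalar quantization; the coefficient values and step size are illustrative assumptions:

```python
import numpy as np

def quantize(coeffs, step):
    """Map many values to few: each coefficient collapses to an integer level."""
    return np.round(coeffs / step).astype(int)

def dequantize(levels, step):
    """Restore an approximation of the original values from the few levels."""
    return levels * step

coeffs = np.array([13.7, -4.2, 0.9, 27.1])  # hypothetical transform coefficients
step = 8                                    # a coarse step, as at high QP
levels = quantize(coeffs, step)
restored = dequantize(levels, step)
error = coeffs - restored  # the irrecoverable quantization error
```

The error is bounded by half the step size, so a larger QP (coarser step) means more high-frequency noise in the reconstructed frame.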
The basic principle of video coding is to compress redundancy by exploiting the correlation within a video sequence, achieving efficient transmission of the sequence. This correlation is mainly spatial and temporal: spatial correlation is reflected in the correlation between neighboring pixels within the same video frame, while temporal correlation is reflected in the high similarity between temporally adjacent frames. In the hybrid video coding framework, each video frame is therefore produced either by intra prediction or by inter prediction. Since different frame types are generated by different mechanisms, a convolutional-neural-network-based video-quality enhancement algorithm must also take the frame type into account to obtain better results.
Based on the above principles, a specific embodiment of the present invention proposes a convolutional-neural-network-based post-filtering method for video coding, comprising a training step for the convolutional neural network models and a post-filtering processing step, as shown in Fig. 1:
The training step comprises S1 to S4:
S1: setting the quantization parameter of video compression to 20 to 51, encoding and compressing the original video to obtain compressed video;
S2: extracting frames from the compressed video and the original video to obtain multiple frame pairs, each frame pair comprising one compressed video frame and one original video frame;
S3: dividing the frame pairs extracted in step S2 into multiple groups according to frame type and quantization parameter;
S4: building a convolutional neural network framework and initializing the network parameters, then training the network separately on each group divided in step S3 to obtain multiple neural network models corresponding to the different quantization parameters and frame types;
The post-filtering processing step comprises S5 and S6:
S5: embedding the multiple neural network models obtained in step S4 into the post-filtering stage of a video encoder;
S6: performing steps S1 and S2 on the original video to be processed to obtain frame pairs to be processed, and selecting the neural network model corresponding to the quantization parameter and frame type of each frame pair to perform filtering.
The method of the invention first trains the neural networks and then uses the trained models for filtering. The convolutional-neural-network-based post-filtering method for video coding provided by the invention is illustrated below with a specific embodiment.
Original videos are chosen for training and encoded with the quantization parameter QP set to 20 through 51 (i.e. the consecutive integers 20, 21, 22, 23, 24, ..., 50, 51), yielding compressed videos under the different QPs. Preferably, the original videos can be encoded with the standard JM10 and/or HM12 video coding reference software. Frames are extracted from the original videos and the corresponding compressed videos, yielding many frame pairs; a frame pair can be written "original video frame-compressed video frame", i.e. each pair contains an original video frame and the corresponding compressed video frame. All obtained frame pairs are divided into multiple groups according to frame type and quantization parameter QP. A concrete partition proceeds, for example, as follows: all frame pairs are first divided by QP (32 values), and the pairs under each QP are then divided by frame type into I frames, P frames and B frames, yielding the multiple groups (in this example, 96 groups); the frame pairs within each group share the same frame type and quantization parameter QP.
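The partition just described (32 QP values × 3 frame types) can be sketched as a dictionary grouping; the frame-pair representation below is an illustrative assumption, not the patent's data format:

```python
from collections import defaultdict

def group_frame_pairs(frame_pairs):
    """Group frame pairs by (quantization parameter, frame type).

    Each frame pair is assumed to be a dict with 'qp' (20..51),
    'frame_type' ('I', 'P' or 'B'), and the two frames themselves.
    """
    groups = defaultdict(list)
    for pair in frame_pairs:
        groups[(pair["qp"], pair["frame_type"])].append(pair)
    return groups

# One pair per (QP, type) combination yields the 96 groups of the example.
pairs = [{"qp": qp, "frame_type": t, "original": None, "compressed": None}
         for qp in range(20, 52) for t in ("I", "P", "B")]
groups = group_frame_pairs(pairs)
```

Each group then trains its own network, so every model sees only one distortion regime.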
In one specific embodiment, the framework of the neural network to be trained comprises convolutional layers and ReLU layers. The framework can refer to Fig. 2, comprising three convolutional layers (Convolution layer1, Convolution layer2, Convolution layer3 in the figure) and two ReLU layers (ReLU layer1 and ReLU layer2 in the figure), with convolutional and ReLU layers connected alternately in turn. For example, the input layer is followed by convolutional layer 1 (Convolution layer1); its output feeds ReLU layer 1 (ReLU layer1); the output of ReLU layer 1 feeds convolutional layer 2 (Convolution layer2); the output of convolutional layer 2 feeds ReLU layer 2 (ReLU layer2); the output of ReLU layer 2 feeds convolutional layer 3 (Convolution layer3), whose output is the output layer. It should be understood that the convolutional neural network shown in Fig. 2 is only exemplary and does not limit the invention. In the network of Fig. 2, the filter (i.e. convolution kernel) sizes of convolutional layers 1, 2 and 3 are set to 9, 1 and 5 respectively, and their numbers of neurons to 64, 32 and 1 respectively; the number of neurons of each ReLU layer matches that of the convolutional layer preceding it.
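A forward pass through this three-layer, padding-free architecture (kernel sizes 9, 1, 5; 64, 32, 1 filters) can be sketched in NumPy as follows; the random weight initialization is an illustrative assumption standing in for trained parameters:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, w, b):
    """No-padding ('valid') convolution.
    x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)."""
    k = w.shape[-1]
    # windows: (C_in, H-k+1, W-k+1, k, k)
    win = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum('chwij,ocij->ohw', win, w) + b[:, None, None]

def relu(x):
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(0, 0.01, (64, 1, 9, 9)), np.zeros(64)
w2, b2 = rng.normal(0, 0.01, (32, 64, 1, 1)), np.zeros(32)
w3, b3 = rng.normal(0, 0.01, (1, 32, 5, 5)), np.zeros(1)

def forward(block):
    """block: (1, 33, 33) grayscale patch; returns the filtered patch."""
    h = relu(conv2d_valid(block, w1, b1))   # -> (64, 25, 25)
    h = relu(conv2d_valid(h, w2, b2))       # -> (32, 25, 25)
    return conv2d_valid(h, w3, b3)          # -> (1, 21, 21)
```

Because no layer pads, a 33 × 33 input shrinks to 21 × 21 at the output, which is why the training loss below compares against a cropped original block.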
Then the multiple groups of frame pairs obtained above are used to train, separately, the network built as shown for example in Fig. 2. First, the frame pairs in each group are divided into a training set, a validation set and a test set according to resolution and scene; for example, all frame pairs with QP 20 and frame type I are divided into a training set, a validation set and a test set, and all frame pairs with QP 20 and frame type P are likewise divided, where the frame pairs in the test set cover diverse resolutions and scenes. Second, image blocks are cropped from all frame pairs at a fixed pixel stride, producing "original-frame image block-compressed-frame image block" pairs that serve as the network input during training.
The training procedure is illustrated with the group of frame pairs whose QP is 20 and type is I. For example, image blocks are cropped from all frame pairs in the group at a stride of 28 pixels, each block measuring 33 × 33. During training, none of the convolutional layers uses padding, in order to avoid boundary effects (padding amounts to fabricating pixels at the image boundary to enlarge the image, so the boundary information is inaccurate). To compute the loss between the output image block and the original image block, the invention also crops the original block to the same size as the output block. All cropped blocks are then randomly divided into multiple batches, and the blocks of one batch are fed into the network in turn; completing one batch constitutes one iteration, so there are as many iterations as batches. In each iteration, back-propagation with stochastic gradient descent updates the network parameters so as to continually minimize the loss function. Every fixed number of iterations, the validation set is used to check the trained parameters so as to prevent overfitting.
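The patch-extraction step above (33 × 33 blocks at a 28-pixel stride, with the original block trimmed to the network's output size) can be sketched as follows; the 6-pixel crop margin follows from the padding-free kernel sizes 9, 1 and 5 ((9−1)/2 + (5−1)/2 = 6 per side), and the function itself is an illustrative assumption:

```python
import numpy as np

def extract_patch_pairs(original, compressed, size=33, stride=28, crop=6):
    """Cut aligned (original, compressed) blocks at a fixed pixel stride.

    The compressed block keeps the full 33x33 size (network input); the
    original block is trimmed by `crop` pixels per side to 21x21 so that
    it matches the padding-free network's output for the loss.
    """
    pairs = []
    h, w = compressed.shape
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            comp = compressed[top:top + size, left:left + size]
            orig = original[top + crop:top + size - crop,
                            left + crop:left + size - crop]
            pairs.append((orig, comp))
    return pairs
```

On an 89 × 89 frame this yields a 3 × 3 grid of overlapping blocks, since a 28-pixel stride with 33-pixel blocks overlaps neighbours by 5 pixels.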
The specific training process is shown in Fig. 3. The operation of a convolutional-layer neuron on the input image blocks is expressed as y = Σ_{i=1}^{n} (w_i * x_i) + b, where n denotes the number of input image blocks, w_i the i-th convolution kernel, x_i the i-th input image block, b the bias coefficient, and * the convolution operation; image blocks are represented as pixel matrices. The operation of a ReLU-layer neuron on an image block is expressed as M = max(N, 0), where N and M denote the pixel values of the input and output image blocks respectively.
When the parameters converge, training is complete and the neural network model corresponding to the group with QP 20 and type I is obtained; it is embedded in the encoder's post-filtering stage for use as a filter. Similarly, the group of frame pairs with QP 20 and type P is used to train a neural network by the foregoing method, yielding the neural network model corresponding to that group; the frame pairs under the other QP values likewise train neural networks by the same method.
Assuming that the compressed videos used in training were produced by both JM10 and HM12, the videos compressed by the two kinds of software should be handled and trained separately. With the above training method, the following neural network models then become available:
32 different I-frame models for QPs 20 to 51 under JM10 encoding, 32 different P-frame models for QPs 20 to 51 under JM10 encoding, 32 different B-frame models for QPs 20 to 51 under JM10 encoding, 32 different I-frame models for QPs 20 to 51 under HM12 encoding, 32 different P-frame models for QPs 20 to 51 under HM12 encoding, and 32 different B-frame models for QPs 20 to 51 under HM12 encoding.
In actual use, after the original video to be processed has been encoded, compressed and its frames extracted, the quantization parameter and frame type of each resulting frame pair are determined. If, for example, the current frame pair is found to have QP 23 and type B, the neural network model trained on image blocks with QP 23 and type B is selected for post-filtering. In a preferred embodiment, the encoding software is also taken into account, i.e. training is additionally separated by encoding software (as described in the preceding paragraph), so that in actual use one first determines which encoding software produced the compressed video, and then considers the quantization parameter and frame type to select the corresponding neural network model for filtering.
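The model-selection logic just described (look up the model by encoding software, frame type and QP) can be sketched as a dictionary lookup; the placeholder model names and the table layout are illustrative assumptions:

```python
def build_model_table():
    """One entry per (software, frame type, QP): 2 x 3 x 32 = 192 models.

    A model is represented here by a placeholder string; in practice it
    would be the trained network's parameters loaded into the encoder.
    """
    return {(sw, ft, qp): f"{sw}-{ft}-QP{qp}-model"
            for sw in ("JM10", "HM12")
            for ft in ("I", "P", "B")
            for qp in range(20, 52)}

def select_model(table, software, frame_type, qp):
    """Pick the post-filter model matching the frame pair's attributes."""
    return table[(software, frame_type, qp)]

table = build_model_table()
# The example from the text: a JM10-compressed B frame at QP 23.
model = select_model(table, "JM10", "B", 23)
```

A missing key would indicate a frame outside the trained QP range, which the encoder configuration of S1 rules out.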
The above is a further detailed description of the invention in conjunction with specific preferred embodiments, but the specific implementation of the invention is not limited to these descriptions. For those of ordinary skill in the art to which the invention belongs, several equivalent substitutions or obvious modifications with identical performance or use can be made without departing from the inventive concept, and all of them should be regarded as falling within the protection scope of the invention.

Claims (6)

1. A convolutional-neural-network-based post-filtering method for video coding, comprising a training step for convolutional neural network models and a post-filtering processing step, wherein:
The training step comprises S1 to S4:
S1: setting the quantization parameter of video compression to 20 to 51, encoding and compressing an original video to obtain a compressed video;
S2: extracting frames from the compressed video and the original video to obtain multiple frame pairs, each frame pair comprising one compressed video frame and one original video frame;
S3: dividing the frame pairs extracted in step S2 into multiple groups according to frame type and quantization parameter;
S4: building a convolutional neural network framework and initializing network parameters, and training the neural network separately with each group divided in step S3 to obtain multiple neural network models corresponding to different quantization parameters and frame types; wherein image blocks are cropped from all the frame pairs at a fixed pixel stride to obtain original-frame-image-block/compressed-frame-image-block pairs serving as the input of the neural network during training; the convolutional neural network framework built comprises convolutional layers and ReLU layers; the operation of a neuron of a convolutional layer on the input image blocks is expressed as y = Σ_{i=1}^{n} (w_i * x_i) + b, where n denotes the number of input image blocks, w_i the i-th convolution kernel, x_i the i-th input image block, b a bias coefficient, and * the convolution operation, an image block being represented by a pixel matrix; the operation of a neuron of a ReLU layer on an image block is expressed as M = max(N, 0), where N and M denote the pixel values of the input and output image blocks respectively;
The post-filtering processing step comprises S5 and S6:
S5: embedding the multiple neural network models obtained in step S4 into a post-filtering stage of a video encoder;
S6: performing steps S1 and S2 on an original video to be processed to obtain frame pairs to be processed, and selecting, according to the quantization parameter and frame type of each frame pair to be processed, the corresponding neural network model to perform filtering.
2. The convolutional-neural-network-based post-filtering method for video coding according to claim 1, wherein in step S3 the frame pairs are divided into I frames, P frames and B frames according to frame type.
3. The convolutional-neural-network-based post-filtering method for video coding according to claim 1, wherein step S3 further comprises dividing the frame pairs in each group into a training set, a validation set and a test set according to resolution and scene; wherein the validation set is used during training to check the trained parameters every predetermined number of iterations, so as to prevent overfitting.
4. The convolutional-neural-network-based post-filtering method for video coding according to claim 3, wherein the test set comprises frame pairs of multiple resolutions and several scenes.
5. The convolutional-neural-network-based post-filtering method for video coding according to claim 1, wherein the convolutional layers and the ReLU layers are connected alternately in turn.
6. The convolutional-neural-network-based post-filtering method for video coding according to claim 1, wherein during neural network training the input original image block is cropped to the same size as the output image block.
CN201710439132.8A 2017-06-12 2017-06-12 Video coding post-filter method based on convolutional neural networks Active CN107197260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710439132.8A CN107197260B (en) 2017-06-12 2017-06-12 Video coding post-filter method based on convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710439132.8A CN107197260B (en) 2017-06-12 2017-06-12 Video coding post-filter method based on convolutional neural networks

Publications (2)

Publication Number Publication Date
CN107197260A CN107197260A (en) 2017-09-22
CN107197260B true CN107197260B (en) 2019-09-13

Family

ID=59876636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710439132.8A Active CN107197260B (en) 2017-06-12 2017-06-12 Video coding post-filter method based on convolutional neural networks

Country Status (1)

Country Link
CN (1) CN107197260B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4231644A4 (en) * 2020-11-13 2024-03-20 Huawei Tech Co Ltd Video frame compression method and apparatus, and video frame decompression method and apparatus

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107743235B (en) * 2017-10-27 2019-09-27 厦门美图之家科技有限公司 Image processing method, device and electronic equipment
CN108289224B (en) * 2017-12-12 2019-10-29 北京大学 A kind of video frame prediction technique, device and neural network is compensated automatically
US20200344474A1 (en) * 2017-12-14 2020-10-29 Interdigital Vc Holdings, Inc. Deep learning based image partitioning for video compression
CN111567056B (en) * 2018-01-04 2022-10-14 三星电子株式会社 Video playing device and control method thereof
CN108134932B (en) * 2018-01-11 2021-03-30 上海交通大学 Method and system for realizing video coding and decoding loop internal filtering based on convolutional neural network
CN108174225B (en) * 2018-01-11 2021-03-26 上海交通大学 Video coding and decoding in-loop filtering implementation method and system based on countermeasure generation network
CN110062226B (en) * 2018-01-18 2021-06-11 杭州海康威视数字技术股份有限公司 Video coding method, video decoding method, device, system and medium
CN110062225B (en) * 2018-01-18 2021-06-11 杭州海康威视数字技术股份有限公司 Picture filtering method and device
CN110062246B (en) * 2018-01-19 2021-01-05 杭州海康威视数字技术股份有限公司 Method and device for processing video frame data
GB2611192B (en) * 2018-01-26 2023-06-14 Mediatek Inc Method and apparatus of neural networks with grouping for video coding
CN108307193B (en) * 2018-02-08 2018-12-18 北京航空航天大学 Multi-frame quality enhancement method and device for lossy compressed video
US11470356B2 (en) * 2018-04-17 2022-10-11 Mediatek Inc. Method and apparatus of neural network for video coding
CN115115720A (en) * 2018-04-25 2022-09-27 杭州海康威视数字技术股份有限公司 Image decoding and encoding method, device and equipment
CN112470472B (en) * 2018-06-11 2023-03-24 无锡安科迪智能技术有限公司 Blind compression sampling method and device and imaging system
CN110971915B (en) * 2018-09-28 2022-07-01 杭州海康威视数字技术股份有限公司 Filtering method and device
CN111107357B (en) * 2018-10-25 2022-05-31 杭州海康威视数字技术股份有限公司 Image processing method, device, system and storage medium
CN109451308B (en) 2018-11-29 2021-03-09 北京市商汤科技开发有限公司 Video compression processing method and device, electronic equipment and storage medium
JP7318314B2 (en) * 2019-05-30 2023-08-01 富士通株式会社 Encoding program, decoding program, encoding device, decoding device, encoding method and decoding method
CN110351568A (en) * 2019-06-13 2019-10-18 天津大学 In-loop video filtering device based on deep convolutional network
US11341688B2 (en) 2019-10-02 2022-05-24 Nokia Technologies Oy Guiding decoder-side optimization of neural network filter
CN110740319B (en) * 2019-10-30 2024-04-05 腾讯科技(深圳)有限公司 Video encoding and decoding method and device, electronic equipment and storage medium
CN112929703A (en) * 2019-12-06 2021-06-08 上海海思技术有限公司 Method and device for processing code stream data
CN111083482A (en) * 2019-12-31 2020-04-28 合肥图鸭信息科技有限公司 Video compression network training method and device and terminal equipment
CN111212288B (en) * 2020-01-09 2022-10-04 广州虎牙科技有限公司 Video data encoding and decoding method and device, computer equipment and storage medium
CN113259671B (en) * 2020-02-10 2022-07-15 腾讯科技(深圳)有限公司 Loop filtering method, device, equipment and storage medium in video coding and decoding
KR20210129583A (en) * 2020-04-20 2021-10-28 사운드하운드, 인코포레이티드 Content filtering in media playing devices
CN111556337B (en) * 2020-05-15 2021-09-21 腾讯科技(深圳)有限公司 Media content implantation method, model training method and related device
CN111726613B (en) * 2020-06-30 2021-07-27 福州大学 Video coding optimization method based on just noticeable difference
CN114125449A (en) * 2021-10-26 2022-03-01 阿里巴巴新加坡控股有限公司 Video processing method, system and computer readable medium based on neural network
CN116320410A (en) * 2021-12-21 2023-06-23 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6075884A (en) * 1996-03-29 2000-06-13 Sarnoff Corporation Method and apparatus for training a neural network to learn and use fidelity metric as a control mechanism
US7822698B1 (en) * 2007-03-23 2010-10-26 Hrl Laboratories, Llc Spike domain and pulse domain non-linear processors
CN105160678A (en) * 2015-09-02 2015-12-16 山东大学 No-reference three-dimensional image quality evaluation method based on convolutional neural networks
CN105894046A (en) * 2016-06-16 2016-08-24 北京市商汤科技开发有限公司 Convolutional neural network training and image processing method and system and computer equipment
CN106485661A (en) * 2016-11-15 2017-03-08 杭州当虹科技有限公司 High-quality image magnification method
CN106682697A (en) * 2016-12-29 2017-05-17 华中科技大学 End-to-end object detection method based on convolutional neural network

Also Published As

Publication number Publication date
CN107197260A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN107197260B (en) Video coding post-filter method based on convolutional neural networks
CN107018422B (en) Still image compression method based on deep convolutional neural networks
CN104054344B (en) Deblocking of chroma data for video coding
CN105791877B (en) Adaptive loop filtering method in video coding and decoding
CN110087092A (en) Low bit-rate video decoding method based on image reconstruction convolutional neural networks
CN103782598A (en) Fast encoding method for lossless coding
CN102577377A (en) Apparatus and method for deblocking filtering image data and video decoding apparatus and method using the same
KR101038531B1 (en) Apparatus and method for encoding image capable of parallel processing in decoding and apparatus and method for decoding image capable of parallel processing
CN111031315B (en) Compressed video quality enhancement method based on attention mechanism and time dependence
CN111247797A (en) Method and apparatus for image encoding and decoding
CN109903351A (en) Image compression method combining convolutional neural networks and traditional coding
CN103347185A (en) Unmanned aerial vehicle scouting image synthesis compressed encoding method based on selective block transformation
WO2011150805A1 (en) Method, apparatus and system for processing image residual block
CN106060567A (en) Wavelet domain distributed multi-view video coding based on layered WZ frame
CN113055674B (en) Compressed video quality enhancement method based on two-stage multi-frame cooperation
KR100697516B1 (en) Moving picture coding method based on 3D wavelet transformation
CN110677644B (en) Video coding and decoding method and video coding intra-frame predictor
CN102333223A (en) Video data coding method, decoding method, coding system and decoding system
CN112001854A (en) Method for repairing coded image and related system and device
CN108353192B (en) Hierarchical deblocking filtering in video processing systems and methods
CN115002482B (en) End-to-end video compression method and system using structure-preserving motion estimation
CN103313064B (en) Temporal error concealment method based on inter-frame mode and motion repair
KR101647344B1 (en) Method and Device for Encoding Depth Image
CN114222127A (en) Video coding method, video decoding method and device
CN103957413A (en) Real-time error-code concealment method and device for mobile network video communication application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant