CN110177282A - An inter-frame prediction method based on SRCNN - Google Patents

An inter-frame prediction method based on SRCNN

Info

Publication number
CN110177282A
Authority
CN
China
Prior art keywords
image
frame
parameter
super resolution
Prior art date
Legal status
Granted
Application number
CN201910388829.6A
Other languages
Chinese (zh)
Other versions
CN110177282B (en)
Inventor
颜成钢
黄智坤
李志胜
孙垚棋
张继勇
张勇东
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201910388829.6A
Publication of CN110177282A
Application granted
Publication of CN110177282B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Abstract

The invention discloses an inter-frame prediction method based on SRCNN, characterised in that inter-frame prediction is performed on image sequences using a super-resolution convolutional neural network. After motion estimation and motion compensation operations are applied to an image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in the model, super-resolution reconstruction is performed on an image while motion estimation and motion compensation are applied to it, yielding an image consistent with the next frame of the current image. The invention applies deep learning to inter-frame prediction in video coding: a convolutional neural network extracts features from the motion-estimated and motion-compensated image sequence and learns from them through training. At the same time, because a super-resolution neural network is used, the image quality can be enhanced during image reconstruction.

Description

An inter-frame prediction method based on SRCNN
Technical field
The invention belongs to the field of inter-frame prediction in video coding and is mainly aimed at improving video transmission efficiency; in particular, it relates to an inter-frame prediction method based on SRCNN.
Background technique
Super-resolution (Super-Resolution) refers to converting a low-resolution (Low Resolution) image into a high-resolution (High Resolution) image, which generally improves image quality and clarity. The Super-Resolution Convolutional Neural Network (SRCNN) is a convolutional neural network for image super-resolution reconstruction: it extracts features from image patches, applies a non-linear mapping to the features, and then reconstructs a high-resolution image. Since it was proposed, this network has been widely used, and its accuracy and reliability have been well verified.
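For illustration, the following is a minimal PyTorch sketch of such a three-layer SRCNN, assuming the commonly used 9-1-5 kernel configuration with 64 and 32 feature maps; the exact layer sizes used by the invention are not specified in this document.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer super-resolution CNN: patch extraction,
    non-linear mapping, and reconstruction."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # Layer 1: patch extraction and representation (9x9 kernel, 64 feature maps)
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        # Layer 2: non-linear mapping (1x1 kernel, 32 feature maps)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        # Layer 3: reconstruction (5x5 kernel, back to the input channel count)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)
```

The network operates on an image that has already been upscaled (e.g. by bicubic interpolation), so its input and output share the same spatial size.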
In today's information age, research and statistics show that roughly 75% of the information humans obtain from the outside world comes through the eyes; the visual system converts this information into images and transmits them to the brain. As living standards rise, people demand ever higher image and video quality, and the continually increasing resolution of images and videos poses a huge challenge for information transmission. Clearer images and videos mean larger data volumes and require higher transmission rates. To keep viewing comfortable, the frame rate of films and similar videos is generally above 24 frames per second; if every frame were stored and played back frame by frame, not only would the required storage capacity be extremely large, but the transmission and display rates of playback devices would also be severely challenged. Played back this way, high-definition formats such as 2K and 4K video would be impossible because of transmission-rate limits. Video coding technology removes most of the redundancy between the images of a sequence, greatly reducing the data volume of video; together with existing hardware technology, it has brought ultra-high-definition video into people's lives and largely satisfied their perceptual demands.
Inter-frame prediction is one of the most important parts of video coding. It exploits the correlation between video frames, i.e. temporal correlation, to achieve image compression, and is widely used in the compression coding of broadcast television, video conferencing, videophony and high-definition television. In image transmission, moving pictures, and television pictures in particular, are the main object of interest. A moving picture is a temporal image sequence formed by successive frames spaced one frame interval apart; it has stronger correlation in time than in space. For most television pictures the detail changes between adjacent frames are very small, i.e. there is strong inter-frame correlation, so inter-frame coding that exploits this correlation can achieve a much higher compression ratio than intra-frame coding.
In inter-frame predictive coding, the scenery in adjacent frames of a moving picture is correlated to a certain extent. The picture can therefore be divided into blocks or macroblocks, and an attempt is made to find the position of each block or macroblock in the adjacent frame and to derive the relative spatial displacement between the two. This relative displacement is what is usually called the motion vector, and the process of obtaining it is called motion estimation. The motion vector and the prediction error obtained after motion matching are sent to the decoder together; the decoder finds the corresponding block or macroblock in the already decoded neighbouring reference frame at the position indicated by the motion vector and adds the prediction error to obtain the position of the block or macroblock in the current frame. Motion estimation removes inter-frame redundancy and greatly reduces the number of bits needed for video transmission, so it is an important component of a video compression system. Starting from conventional motion estimation methods, three key problems are discussed: parameterising the motion field, defining the optimal matching function, and finding the optimal match (a block-matching sketch is given below).
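A minimal NumPy sketch of full-search block matching and motion compensation follows; the block size of 16, the search range of ±8 pixels and the SAD matching criterion are illustrative assumptions, not values fixed by this document.

```python
import numpy as np

def block_matching(prev: np.ndarray, curr: np.ndarray,
                   block: int = 16, search: int = 8) -> np.ndarray:
    """Full search: for each block of the current frame, find the displacement
    into the previous frame that minimises the sum of absolute differences."""
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(np.int32)
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    cand = prev[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(target - cand).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            vectors[by // block, bx // block] = best_mv
    return vectors

def motion_compensate(prev: np.ndarray, vectors: np.ndarray,
                      block: int = 16) -> np.ndarray:
    """Build the motion-compensated prediction of the current frame."""
    pred = np.zeros_like(prev)
    for i in range(vectors.shape[0]):
        for j in range(vectors.shape[1]):
            dy, dx = vectors[i, j]
            by, bx = i * block, j * block
            pred[by:by + block, bx:bx + block] = prev[
                by + dy:by + dy + block, bx + dx:bx + dx + block]
    return pred
```

The prediction error (residual) mentioned above is simply the difference between the current frame and this motion-compensated prediction.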
Summary of the invention
The purpose of the present invention is to depart from the mainstream HEVC video coding approach and propose an inter-frame prediction method based on SRCNN. The invention uses a super-resolution convolutional neural network to perform inter-frame prediction on image sequences. After motion estimation and motion compensation are applied to an image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in the model, super-resolution reconstruction can be performed on an image while motion estimation and motion compensation are applied to it, yielding an image almost identical to the next frame of the current image.
The technical solution adopted by the present invention to solve this problem includes the following steps:
Step 1: collect video files of a large number of different scenes and compress the videos with different quantization parameters (QP);
Step 2: extract image sequences from the videos; when extracting an image sequence, set the time interval between two consecutive frames to t, with t < 0.1 second;
Step 3: divide part of the image sequences into a validation set. Read the remaining images frame by frame; except for the first frame of each sequence, every image uses the current frame and the previous frame, and the residual between the two frames is calculated. Combine the previous frame with this residual and apply motion compensation to it to obtain the prediction frame of the previous frame. Save the resulting prediction frame sequence and divide it into a training set and a test set at a ratio of 4:1.
Step 4: input the training set and test set, set suitable hyper-parameters, and train a parameter model with the super-resolution convolutional neural network (SRCNN);
Step 5: for each image sequence in the validation set, calculate the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame, denoted PSNR1; read the parameters in the parameter model and process the i-th frame of the acquired image sequence to obtain the reconstructed image I; calculate the PSNR between the reconstructed image I and the i-th frame of the image sequence in the validation set, denoted PSNR2.
Compare the two calculated PSNR values: if PSNR2 >= PSNR1, the model is considered effective.
If PSNR2 < PSNR1, the model is considered ineffective. Let ERR = PSNR1 - PSNR2. If ERR < 5, the training hyper-parameter settings are considered problematic: return to step 4, adjust the learning-rate hyper-parameter and retrain the parameter model. If ERR >= 5, the dataset partitioning strategy is considered problematic: return to step 3, expand the dataset so that it contains more scenes, re-partition the training set and test set, and train and validate again.
If the two images differ greatly, i.e. the PSNR value falls below the minimum preset threshold, adjust the training set and test set.
If the two images differ only slightly, i.e. the PSNR value lies between the optimal preset threshold and the minimum preset threshold, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model (a sketch of this decision rule follows the steps).
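The decision rule of step 5 can be summarised in the following Python sketch; the function name and the return strings are illustrative, while the threshold of 5 follows the ERR comparison above.

```python
def evaluate_model(psnr1: float, psnr2: float, err_threshold: float = 5.0) -> str:
    """Step-5 decision rule: compare the PSNR of the true next frame (PSNR1)
    with the PSNR obtained from the reconstructed frame (PSNR2)."""
    if psnr2 >= psnr1:
        return "model accepted"
    err = psnr1 - psnr2
    if err < err_threshold:
        # Small gap: tune the training hyper-parameters (e.g. learning rate)
        # and retrain the parameter model (back to step 4).
        return "adjust hyper-parameters and retrain"
    # Large gap: expand the dataset with more scenes and re-partition the
    # training and test sets (back to step 3).
    return "expand dataset and re-partition"
```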
The reconstruction of an image with the parameter model is implemented as follows (a Python sketch follows these steps):
1. Convert the input low-resolution image to the YCbCr colour space and take the luminance (grayscale) plane as the input i of the image-reconstruction operation. Downsample the image i with a stride of k to obtain a low-dimensional image;
2. Apply bicubic interpolation to the low-dimensional image to enlarge it to the target size, i.e. the size of the input low-resolution image;
3. Read the parameters in the parameter model, including the weight and bias of each network node. Apply a non-linear mapping to the interpolated image through the three-layer convolutional network; the reconstructed result is image I;
4. Convert image I back to an RGB colour image to obtain the reconstructed high-resolution image.
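A minimal sketch of steps 1-4, assuming OpenCV for colour conversion and bicubic interpolation and a single-channel SRCNN such as the one sketched earlier; the function name and the default downsampling stride are illustrative.

```python
import cv2
import numpy as np
import torch

def reconstruct(frame_bgr: np.ndarray, model: torch.nn.Module, k: int = 2) -> np.ndarray:
    """Take the luminance channel, downsample by stride k, upscale back with
    bicubic interpolation, map it through the trained three-layer network,
    and merge the result with the original chroma."""
    # 1. Convert to YCbCr (OpenCV's YCrCb ordering) and take the luminance plane i.
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    y = ycrcb[:, :, 0].astype(np.float32) / 255.0
    h, w = y.shape
    low = y[::k, ::k]                                   # downsample with stride k
    # 2. Bicubic interpolation back to the target (original) size.
    up = cv2.resize(low, (w, h), interpolation=cv2.INTER_CUBIC)
    # 3. Non-linear mapping through the three-layer network; the weights and
    #    biases come from the trained parameter model.
    model.eval()
    with torch.no_grad():
        inp = torch.from_numpy(up).unsqueeze(0).unsqueeze(0)   # shape 1x1xHxW
        out = model(inp).clamp(0.0, 1.0).squeeze().numpy()
    # 4. Put the reconstructed luminance back and convert to an RGB/BGR image.
    ycrcb[:, :, 0] = (out * 255.0).astype(np.uint8)
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```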
The beneficial effects of the present invention are as follows:
The novelty of the present invention lies in applying deep learning to inter-frame prediction in video coding: a convolutional neural network extracts features from the motion-estimated and motion-compensated image sequence and learns from them through training. At the same time, because a super-resolution neural network is used, the image quality can be enhanced during image reconstruction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the super-resolution convolutional neural network SRCNN;
Fig. 2 is the implementation flow chart of the present invention.
Specific embodiment
The present invention mainly makes an algorithmic innovation for the inter-frame prediction method in video coding, and the training process of the whole model is described in detail. The specific implementation steps of the invention are elaborated below with reference to the drawings; the purpose and effect of the invention will become apparent.
Fig. 1 is a schematic diagram of the super-resolution convolutional neural network SRCNN. It is clear from the figure that the structure of this convolutional network is simple: through non-linear mapping and image reconstruction it can enhance the image quality. With this network, the resolution of the images can be improved while inter-frame prediction is performed on the image sequence.
Fig. 2 is the implementation flow chart of the invention; the specific operations include:
1. Collect a large number of video files in YUV format containing a variety of different scenes.
2. Compress the video files with different quantization parameters; the higher the quantization parameter, the higher the degree of compression. Quantization parameters between 28 and 42 are of main interest.
3. Extract image sequences from the video files; depending on the duration of each video, a different number of images is extracted so that the sampling interval of the image sequences remains consistent. To ensure that consecutive frames differ only slightly, the time interval for extracting images is set very small, according to the length of the video (see the frame-extraction sketch after this list).
4. Apply motion estimation and motion compensation to each extracted image. Specifically, the current frame and the next frame are input, and motion estimation and motion compensation are applied to the current frame by comparing the two frames.
5. Organise the processed image sequences into a training set and a test set. The validation set needed to verify the model uses image sequences to which no motion estimation or motion compensation has been applied.
6. Input the training set and test set, set suitable parameters, and train the model with the super-resolution convolutional neural network SRCNN.
7. Verify whether the trained model is effective by comparing the originally extracted next frame with the image reconstructed by the model. If the two images differ only slightly, the model is considered effective. If they differ noticeably, adjustments are made according to the situation: if the difference is very large, the dataset is adjusted and the model retrained; if the difference is not large but the visual quality still needs improvement, the network parameters are adjusted and the model retrained until it meets the requirements.
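A minimal sketch of the frame extraction in item 3, assuming OpenCV; the default interval of 0.04 s is an illustrative value consistent with the requirement t < 0.1 s.

```python
import cv2

def extract_frames(video_path: str, interval_s: float = 0.04):
    """Pull frames from a video at a fixed time interval so that consecutive
    frames in the resulting sequence differ only slightly."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(interval_s * fps)))   # frames to skip between samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```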
When comparing the generated next-frame image with the original image, subjective visual judgement must be combined with objective numerical analysis. Subjectively, the two frames are observed with the naked eye; if they show little difference, the model can be considered effective. However, because the original consecutive frames also differ very little, the two images must additionally be compared with a mathematical tool. The peak signal-to-noise ratio (PSNR) can be used to objectively evaluate the reconstruction quality; PSNR is a common objective standard for evaluating images, with the formula PSNR = 10·log10(MAX_I² / MSE),
where MAX_I is the maximum pixel value of the image (255 for 8-bit images) and MSE is the mean squared error. Compute the PSNR between the original image and its next frame, and between the original image and the reconstructed image. If the two values are close, the model works well and the reconstructed image is essentially the same as the next frame of the original image. If the latter PSNR is higher, it can be considered that the scheme not only performs inter-frame prediction on the image but also improves the image quality.
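The comparison can be computed directly from this formula; below is a small NumPy sketch, assuming 8-bit images (MAX_I = 255). The function name and the usage comment are illustrative.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of the same size."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Per step 5: PSNR1 = psnr(frame_i, frame_i_plus_1)
#             PSNR2 = psnr(frame_i, reconstructed_I)
```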
By means of PSNR, the accuracy of the model can be verified objectively, which reduces the workload and ensures that the scheme is implemented effectively.

Claims (3)

1. An inter-frame prediction method based on SRCNN, characterised in that inter-frame prediction is performed on an image sequence using a super-resolution convolutional neural network; after motion estimation and motion compensation operations are applied to the image sequence, a feature model is trained in combination with the super-resolution convolutional neural network; super-resolution reconstruction is performed on an image using the parameters in the model while motion estimation and motion compensation are applied to the image, so as to obtain an image consistent with the next frame of the current image.
2. The inter-frame prediction method based on SRCNN according to claim 1, characterised in that the specific implementation includes the following steps:
Step 1: collect video files of a large number of different scenes and compress the videos with different quantization parameters;
Step 2: extract image sequences from the videos; when extracting an image sequence, the time interval between two consecutive frames is set to t, with t < 0.1 second;
Step 3: divide part of the image sequences into a validation set; read the remaining image sequences frame by frame; except for the first frame of each sequence, every image uses the current frame and the previous frame, and the residual between the two frames is calculated; combine the previous frame with this residual and apply motion compensation to it to obtain the prediction frame of the previous frame; save the resulting prediction frame image sequence and divide it into a training set and a test set at a ratio of 4:1;
Step 4: input the training set and test set, set the hyper-parameters, and train a parameter model with the super-resolution convolutional neural network;
Step 5: for each image sequence in the validation set, calculate the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame, denoted PSNR1; read the parameters in the parameter model and process the i-th frame of the acquired image sequence to obtain the reconstructed image I; calculate the PSNR between the reconstructed image I and the i-th frame of the image sequence in the validation set, denoted PSNR2;
Compare the two calculated PSNR values: if PSNR2 >= PSNR1, the model is considered effective;
If PSNR2 < PSNR1, the model is considered ineffective; let ERR = PSNR1 - PSNR2; if ERR < 5, the training hyper-parameter settings are considered problematic: return to step 4, adjust the learning-rate hyper-parameter and retrain the parameter model; if ERR >= 5, the dataset partitioning strategy is considered problematic: return to step 3, expand the dataset so that it contains more scenes, re-partition the training set and test set, and train and validate again;
If the two images differ greatly, i.e. the PSNR value falls below the minimum preset threshold, adjust the training set and test set;
If the two images differ only slightly, i.e. the PSNR value lies between the optimal preset threshold and the minimum preset threshold, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model.
3. The inter-frame prediction method based on SRCNN according to claim 2, characterised in that the reconstruction of an image with the parameter model is implemented as follows:
1. Convert the input low-resolution image to the YCbCr colour space and take the luminance (grayscale) plane as the input image i of the image-reconstruction operation; downsample the input image i with a stride of k to obtain a low-dimensional image;
2. Apply bicubic interpolation to the low-dimensional image to enlarge it to the target size, i.e. the size of the input low-resolution image;
3. Read the parameters in the parameter model, including the weight and bias of each network node; apply a non-linear mapping to the interpolated image through the three-layer convolutional network to obtain the reconstructed image I;
4. Convert image I back to an RGB colour image to obtain the reconstructed high-resolution image.
CN201910388829.6A 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN Active CN110177282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Publications (2)

Publication Number Publication Date
CN110177282A 2019-08-27
CN110177282B CN110177282B (en) 2021-06-04

Family

ID=67690836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388829.6A Active CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Country Status (1)

Country Link
CN (1) CN110177282B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
US20190139205A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method of a convolutional neural network for fractional-pixel interpolation in video coding
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 Video super-resolution generation method based on deep convolutional generative adversarial networks

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112155511A (en) * 2020-09-30 2021-01-01 广东唯仁医疗科技有限公司 Method for compensating human eye shake in OCT (optical coherence tomography) acquisition process based on deep learning
CN112601095A (en) * 2020-11-19 2021-04-02 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN112601095B (en) * 2020-11-19 2023-01-10 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN113191945A (en) * 2020-12-03 2021-07-30 陕西师范大学 High-energy-efficiency image super-resolution system and method for heterogeneous platform
CN113191945B (en) * 2020-12-03 2023-10-27 陕西师范大学 Heterogeneous platform-oriented high-energy-efficiency image super-resolution system and method thereof
CN113592719A (en) * 2021-08-14 2021-11-02 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment
CN113592719B (en) * 2021-08-14 2023-11-28 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment

Also Published As

Publication number Publication date
CN110177282B (en) 2021-06-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant