CN110177282A - An inter-frame prediction method based on SRCNN - Google Patents

An inter-frame prediction method based on SRCNN

Info

Publication number
CN110177282A
Authority
CN
China
Prior art keywords
image
frame
parameter
super resolution
Prior art date
Legal status
Granted
Application number
CN201910388829.6A
Other languages
Chinese (zh)
Other versions
CN110177282B (en)
Inventor
颜成钢
黄智坤
李志胜
孙垚棋
张继勇
张勇东
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201910388829.6A
Publication of CN110177282A
Application granted
Publication of CN110177282B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Abstract

The invention discloses an inter-frame prediction method based on SRCNN, characterised in that inter-frame prediction is performed on image sequences using a super-resolution convolutional neural network. After motion estimation and motion compensation operations are applied to an image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in the model, super-resolution reconstruction is performed on an image while motion estimation and motion compensation are applied to it, yielding an image consistent with the next frame of the current image. The invention applies deep learning to inter-frame prediction in video coding: a convolutional neural network extracts features from the motion-estimated and motion-compensated image sequence and learns from them through training. At the same time, because a super-resolution neural network is used, the image quality can be enhanced during image reconstruction.

Description

An inter-frame prediction method based on SRCNN
Technical field
The invention belongs to the field of inter-frame prediction in video coding and is mainly aimed at improving video transmission efficiency; in particular, it relates to an inter-frame prediction method based on SRCNN.
Background technique
Super-resolution (Super-Resolution) refers to converting a low-resolution (Low Resolution) image into a high-resolution (High Resolution) image, which generally improves image quality and clarity. The Super-Resolution Convolutional Neural Network (SRCNN) is a convolutional neural network for image super-resolution reconstruction: it extracts features from image patches, applies a non-linear mapping to the features, and then reconstructs a high-resolution image. Since it was proposed, this network has been widely used, and its accuracy and reliability have been well verified.
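For illustration, the following is a minimal PyTorch sketch of such a three-layer SRCNN, assuming the commonly used 9-1-5 kernel configuration with 64 and 32 feature maps; the exact layer sizes used by the invention are not specified in this document.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer super-resolution CNN: patch extraction,
    non-linear mapping, and reconstruction."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # Layer 1: patch extraction and representation (9x9 kernel, 64 feature maps)
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        # Layer 2: non-linear mapping (1x1 kernel, 32 feature maps)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        # Layer 3: reconstruction (5x5 kernel, back to the input channel count)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)
```

The network operates on an image that has already been upscaled (e.g. by bicubic interpolation), so its input and output share the same spatial size.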
In today's information age, research and statistics show that roughly 75% of the information humans obtain from the outside world comes through the eyes; the visual system converts this information into images and transmits them to the brain. As living standards rise, people demand ever higher image and video quality, and the continually increasing resolution of images and videos poses a huge challenge for information transmission. Clearer images and videos mean larger data volumes and require higher transmission rates. To keep viewing comfortable, the frame rate of films and similar videos is generally above 24 frames per second; if every frame were stored and played back frame by frame, not only would the required storage capacity be extremely large, but the transmission and display rates of playback devices would also be severely challenged. Played back this way, high-definition formats such as 2K and 4K video would be impossible because of transmission-rate limits. Video coding technology removes most of the redundancy between the images of a sequence, greatly reducing the data volume of video; together with existing hardware technology, it has brought ultra-high-definition video into people's lives and largely satisfied their perceptual demands.
Inter-frame prediction is one of the most important parts of video coding. It exploits the correlation between video frames, i.e. temporal correlation, to achieve image compression, and is widely used in the compression coding of broadcast television, video conferencing, videophony and high-definition television. In image transmission, moving pictures, and television pictures in particular, are the main object of interest. A moving picture is a temporal image sequence formed by successive frames spaced one frame interval apart; it has stronger correlation in time than in space. For most television pictures the detail changes between adjacent frames are very small, i.e. there is strong inter-frame correlation, so inter-frame coding that exploits this correlation can achieve a much higher compression ratio than intra-frame coding.
In inter-frame predictive coding, the scenery in adjacent frames of a moving picture is correlated to a certain extent. The picture can therefore be divided into blocks or macroblocks, and an attempt is made to find the position of each block or macroblock in the adjacent frame and to derive the relative spatial displacement between the two. This relative displacement is what is usually called the motion vector, and the process of obtaining it is called motion estimation. The motion vector and the prediction error obtained after motion matching are sent to the decoder together; the decoder finds the corresponding block or macroblock in the already decoded neighbouring reference frame at the position indicated by the motion vector and adds the prediction error to obtain the position of the block or macroblock in the current frame. Motion estimation removes inter-frame redundancy and greatly reduces the number of bits needed for video transmission, so it is an important component of a video compression system. Starting from conventional motion estimation methods, three key problems are discussed: parameterising the motion field, defining the optimal matching function, and finding the optimal match (a block-matching sketch is given below).
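A minimal NumPy sketch of full-search block matching and motion compensation follows; the block size of 16, the search range of ±8 pixels and the SAD matching criterion are illustrative assumptions, not values fixed by this document.

```python
import numpy as np

def block_matching(prev: np.ndarray, curr: np.ndarray,
                   block: int = 16, search: int = 8) -> np.ndarray:
    """Full search: for each block of the current frame, find the displacement
    into the previous frame that minimises the sum of absolute differences."""
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(np.int32)
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    cand = prev[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(target - cand).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            vectors[by // block, bx // block] = best_mv
    return vectors

def motion_compensate(prev: np.ndarray, vectors: np.ndarray,
                      block: int = 16) -> np.ndarray:
    """Build the motion-compensated prediction of the current frame."""
    pred = np.zeros_like(prev)
    for i in range(vectors.shape[0]):
        for j in range(vectors.shape[1]):
            dy, dx = vectors[i, j]
            by, bx = i * block, j * block
            pred[by:by + block, bx:bx + block] = prev[
                by + dy:by + dy + block, bx + dx:bx + dx + block]
    return pred
```

The prediction error (residual) mentioned above is simply the difference between the current frame and this motion-compensated prediction.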
Summary of the invention
The purpose of the present invention is to depart from the mainstream HEVC video coding approach and propose an inter-frame prediction method based on SRCNN. The invention uses a super-resolution convolutional neural network to perform inter-frame prediction on image sequences. After motion estimation and motion compensation are applied to an image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in the model, super-resolution reconstruction can be performed on an image while motion estimation and motion compensation are applied to it, yielding an image almost identical to the next frame of the current image.
The technical solution adopted by the present invention to solve this problem includes the following steps:
Step 1: collect video files of a large number of different scenes and compress the videos with different quantization parameters (QP);
Step 2: extract image sequences from the videos; when extracting an image sequence, set the time interval between two consecutive frames to t, with t < 0.1 second;
Step 3: divide part of the image sequences into a validation set. Read the remaining images frame by frame; except for the first frame of each sequence, every image uses the current frame and the previous frame, and the residual between the two frames is calculated. Combine the previous frame with this residual and apply motion compensation to it to obtain the prediction frame of the previous frame. Save the resulting prediction frame sequence and divide it into a training set and a test set at a ratio of 4:1.
Step 4: input the training set and test set, set suitable hyper-parameters, and train a parameter model with the super-resolution convolutional neural network (SRCNN);
Step 5: for each image sequence in the validation set, calculate the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame, denoted PSNR1; read the parameters in the parameter model and process the i-th frame of the acquired image sequence to obtain the reconstructed image I; calculate the PSNR between the reconstructed image I and the i-th frame of the image sequence in the validation set, denoted PSNR2.
Compare the two calculated PSNR values: if PSNR2 >= PSNR1, the model is considered effective.
If PSNR2 < PSNR1, the model is considered ineffective. Let ERR = PSNR1 - PSNR2. If ERR < 5, the training hyper-parameter settings are considered problematic: return to step 4, adjust the learning-rate hyper-parameter and retrain the parameter model. If ERR >= 5, the dataset partitioning strategy is considered problematic: return to step 3, expand the dataset so that it contains more scenes, re-partition the training set and test set, and train and validate again.
If the two images differ greatly, i.e. the PSNR value falls below the minimum preset threshold, adjust the training set and test set.
If the two images differ only slightly, i.e. the PSNR value lies between the optimal preset threshold and the minimum preset threshold, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model (a sketch of this decision rule follows the steps).
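The decision rule of step 5 can be summarised in the following Python sketch; the function name and the return strings are illustrative, while the threshold of 5 follows the ERR comparison above.

```python
def evaluate_model(psnr1: float, psnr2: float, err_threshold: float = 5.0) -> str:
    """Step-5 decision rule: compare the PSNR of the true next frame (PSNR1)
    with the PSNR obtained from the reconstructed frame (PSNR2)."""
    if psnr2 >= psnr1:
        return "model accepted"
    err = psnr1 - psnr2
    if err < err_threshold:
        # Small gap: tune the training hyper-parameters (e.g. learning rate)
        # and retrain the parameter model (back to step 4).
        return "adjust hyper-parameters and retrain"
    # Large gap: expand the dataset with more scenes and re-partition the
    # training and test sets (back to step 3).
    return "expand dataset and re-partition"
```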
The reconstruction of an image with the parameter model is implemented as follows (a Python sketch follows these steps):
1. Convert the input low-resolution image to the YCbCr colour space and take the luminance (grayscale) plane as the input i of the image-reconstruction operation. Downsample the image i with a stride of k to obtain a low-dimensional image;
2. Apply bicubic interpolation to the low-dimensional image to enlarge it to the target size, i.e. the size of the input low-resolution image;
3. Read the parameters in the parameter model, including the weight and bias of each network node. Apply a non-linear mapping to the interpolated image through the three-layer convolutional network; the reconstructed result is image I;
4. Convert image I back to an RGB colour image to obtain the reconstructed high-resolution image.
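A minimal sketch of steps 1-4, assuming OpenCV for colour conversion and bicubic interpolation and a single-channel SRCNN such as the one sketched earlier; the function name and the default downsampling stride are illustrative.

```python
import cv2
import numpy as np
import torch

def reconstruct(frame_bgr: np.ndarray, model: torch.nn.Module, k: int = 2) -> np.ndarray:
    """Take the luminance channel, downsample by stride k, upscale back with
    bicubic interpolation, map it through the trained three-layer network,
    and merge the result with the original chroma."""
    # 1. Convert to YCbCr (OpenCV's YCrCb ordering) and take the luminance plane i.
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    y = ycrcb[:, :, 0].astype(np.float32) / 255.0
    h, w = y.shape
    low = y[::k, ::k]                                   # downsample with stride k
    # 2. Bicubic interpolation back to the target (original) size.
    up = cv2.resize(low, (w, h), interpolation=cv2.INTER_CUBIC)
    # 3. Non-linear mapping through the three-layer network; the weights and
    #    biases come from the trained parameter model.
    model.eval()
    with torch.no_grad():
        inp = torch.from_numpy(up).unsqueeze(0).unsqueeze(0)   # shape 1x1xHxW
        out = model(inp).clamp(0.0, 1.0).squeeze().numpy()
    # 4. Put the reconstructed luminance back and convert to an RGB/BGR image.
    ycrcb[:, :, 0] = (out * 255.0).astype(np.uint8)
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```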
The beneficial effects of the present invention are as follows:
The novelty of the present invention lies in applying deep learning to inter-frame prediction in video coding: a convolutional neural network extracts features from the motion-estimated and motion-compensated image sequence and learns from them through training. At the same time, because a super-resolution neural network is used, the image quality can be enhanced during image reconstruction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the super-resolution convolutional neural network SRCNN;
Fig. 2 is the implementation flow chart of the present invention.
Specific embodiment
The present invention mainly makes an algorithmic innovation for the inter-frame prediction method in video coding, and the training process of the whole model is described in detail. The specific implementation steps of the invention are elaborated below with reference to the drawings; the purpose and effect of the invention will become apparent.
Fig. 1 is a schematic diagram of the super-resolution convolutional neural network SRCNN. It is clear from the figure that the structure of this convolutional network is simple: through non-linear mapping and image reconstruction it can enhance the image quality. With this network, the resolution of the images can be improved while inter-frame prediction is performed on the image sequence.
Fig. 2 is the implementation flow chart of the invention; the specific operations include:
1. Collect a large number of video files in YUV format containing a variety of different scenes.
2. Compress the video files with different quantization parameters; the higher the quantization parameter, the higher the degree of compression. Quantization parameters between 28 and 42 are of main interest.
3. Extract image sequences from the video files; depending on the duration of each video, a different number of images is extracted so that the sampling interval of the image sequences remains consistent. To ensure that consecutive frames differ only slightly, the time interval for extracting images is set very small, according to the length of the video (see the frame-extraction sketch after this list).
4. Apply motion estimation and motion compensation to each extracted image. Specifically, the current frame and the next frame are input, and motion estimation and motion compensation are applied to the current frame by comparing the two frames.
5. Organise the processed image sequences into a training set and a test set. The validation set needed to verify the model uses image sequences to which no motion estimation or motion compensation has been applied.
6. Input the training set and test set, set suitable parameters, and train the model with the super-resolution convolutional neural network SRCNN.
7. Verify whether the trained model is effective by comparing the originally extracted next frame with the image reconstructed by the model. If the two images differ only slightly, the model is considered effective. If they differ noticeably, adjustments are made according to the situation: if the difference is very large, the dataset is adjusted and the model retrained; if the difference is not large but the visual quality still needs improvement, the network parameters are adjusted and the model retrained until it meets the requirements.
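A minimal sketch of the frame extraction in item 3, assuming OpenCV; the default interval of 0.04 s is an illustrative value consistent with the requirement t < 0.1 s.

```python
import cv2

def extract_frames(video_path: str, interval_s: float = 0.04):
    """Pull frames from a video at a fixed time interval so that consecutive
    frames in the resulting sequence differ only slightly."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(interval_s * fps)))   # frames to skip between samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```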
When comparing the generated next-frame image with the original image, subjective visual judgement must be combined with objective numerical analysis. Subjectively, the two frames are observed with the naked eye; if they show little difference, the model can be considered effective. However, because the original consecutive frames also differ very little, the two images must additionally be compared with a mathematical tool. The peak signal-to-noise ratio (PSNR) can be used to objectively evaluate the reconstruction quality; PSNR is a common objective standard for evaluating images, with the formula PSNR = 10·log10(MAX_I² / MSE),
where MAX_I is the maximum pixel value of the image (255 for 8-bit images) and MSE is the mean squared error. Compute the PSNR between the original image and its next frame, and between the original image and the reconstructed image. If the two values are close, the model works well and the reconstructed image is essentially the same as the next frame of the original image. If the latter PSNR is higher, it can be considered that the scheme not only performs inter-frame prediction on the image but also improves the image quality.
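The comparison can be computed directly from this formula; below is a small NumPy sketch, assuming 8-bit images (MAX_I = 255). The function name and the usage comment are illustrative.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of the same size."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Per step 5: PSNR1 = psnr(frame_i, frame_i_plus_1)
#             PSNR2 = psnr(frame_i, reconstructed_I)
```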
By means of PSNR, the accuracy of the model can be verified objectively, which reduces the workload and ensures that the scheme is implemented effectively.

Claims (3)

1. An inter-frame prediction method based on SRCNN, characterised in that inter-frame prediction is performed on an image sequence using a super-resolution convolutional neural network; after motion estimation and motion compensation operations are applied to the image sequence, a feature model is trained in combination with the super-resolution convolutional neural network; super-resolution reconstruction is performed on an image using the parameters in the model while motion estimation and motion compensation are applied to the image, so as to obtain an image consistent with the next frame of the current image.
2. The inter-frame prediction method based on SRCNN according to claim 1, characterised in that the specific implementation includes the following steps:
Step 1: collect video files of a large number of different scenes and compress the videos with different quantization parameters;
Step 2: extract image sequences from the videos; when extracting an image sequence, the time interval between two consecutive frames is set to t, with t < 0.1 second;
Step 3: divide part of the image sequences into a validation set; read the remaining image sequences frame by frame; except for the first frame of each sequence, every image uses the current frame and the previous frame, and the residual between the two frames is calculated; combine the previous frame with this residual and apply motion compensation to it to obtain the prediction frame of the previous frame; save the resulting prediction frame image sequence and divide it into a training set and a test set at a ratio of 4:1;
Step 4: input the training set and test set, set the hyper-parameters, and train a parameter model with the super-resolution convolutional neural network;
Step 5: for each image sequence in the validation set, calculate the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame, denoted PSNR1; read the parameters in the parameter model and process the i-th frame of the acquired image sequence to obtain the reconstructed image I; calculate the PSNR between the reconstructed image I and the i-th frame of the image sequence in the validation set, denoted PSNR2;
Compare the two calculated PSNR values: if PSNR2 >= PSNR1, the model is considered effective;
If PSNR2 < PSNR1, the model is considered ineffective; let ERR = PSNR1 - PSNR2; if ERR < 5, the training hyper-parameter settings are considered problematic: return to step 4, adjust the learning-rate hyper-parameter and retrain the parameter model; if ERR >= 5, the dataset partitioning strategy is considered problematic: return to step 3, expand the dataset so that it contains more scenes, re-partition the training set and test set, and train and validate again;
If the two images differ greatly, i.e. the PSNR value falls below the minimum preset threshold, adjust the training set and test set;
If the two images differ only slightly, i.e. the PSNR value lies between the optimal preset threshold and the minimum preset threshold, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model.
3. The inter-frame prediction method based on SRCNN according to claim 2, characterised in that the reconstruction of an image with the parameter model is implemented as follows:
1. Convert the input low-resolution image to the YCbCr colour space and take the luminance (grayscale) plane as the input image i of the image-reconstruction operation; downsample the input image i with a stride of k to obtain a low-dimensional image;
2. Apply bicubic interpolation to the low-dimensional image to enlarge it to the target size, i.e. the size of the input low-resolution image;
3. Read the parameters in the parameter model, including the weight and bias of each network node; apply a non-linear mapping to the interpolated image through the three-layer convolutional network to obtain the reconstructed image I;
4. Convert image I back to an RGB colour image to obtain the reconstructed high-resolution image.
CN201910388829.6A 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN Active CN110177282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Publications (2)

Publication Number Publication Date
CN110177282A 2019-08-27
CN110177282B CN110177282B (en) 2021-06-04

Family

ID=67690836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388829.6A Active CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Country Status (1)

Country Link
CN (1) CN110177282B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
US20190139205A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method of a convolutional neural network for fractional-pixel interpolation in video coding
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 Video super-resolution generation method based on deep convolutional generative adversarial networks

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112155511A (en) * 2020-09-30 2021-01-01 广东唯仁医疗科技有限公司 Method for compensating human eye shake in OCT (optical coherence tomography) acquisition process based on deep learning
CN112601095A (en) * 2020-11-19 2021-04-02 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN112601095B (en) * 2020-11-19 2023-01-10 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN113191945A (en) * 2020-12-03 2021-07-30 陕西师范大学 High-energy-efficiency image super-resolution system and method for heterogeneous platform
CN113191945B (en) * 2020-12-03 2023-10-27 陕西师范大学 Heterogeneous platform-oriented high-energy-efficiency image super-resolution system and method thereof
CN113592719A (en) * 2021-08-14 2021-11-02 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment
CN113592719B (en) * 2021-08-14 2023-11-28 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment

Also Published As

Publication number Publication date
CN110177282B (en) 2021-06-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant