CN107481188A

CN107481188A - An image super-resolution reconstruction method

Info

Publication number: CN107481188A
Application number: CN201710488265.4A
Authority: CN
Inventors: 袁虹; 潘磊; 翟懿奎; 刘健; 商丽娟
Original assignee: Zhuhai Special Economic Zone Ehong Technology Co Ltd
Current assignee: Zhuhai Special Economic Zone Ehong Technology Co Ltd
Priority date: 2017-06-23
Filing date: 2017-06-23
Publication date: 2017-12-15

Abstract

The invention discloses an image super-resolution reconstruction method, comprising: (1) jointly and alternately training the RPN neural network and the Fast-RCNN neural network within a Faster-RCNN neural network, using the trained Faster-RCNN neural network model to detect face, license plate, and object targets in an input picture, and outputting the position coordinates and corresponding labels of the faces, license plates, and objects detected in the input picture; according to the position coordinates and label information, cropping the face, license plate, and object target pictures from the input picture and rescaling the cropped pictures to obtain low-resolution pictures of the three classes (face, license plate, object) that meet the input requirements; (2) training a super-resolution reconstruction neural network separately on the face, license plate, and object image training sets to obtain super-resolution reconstruction neural network models; and inputting the low-resolution picture into the super-resolution reconstruction network model to obtain the high-resolution picture corresponding to the low-resolution picture.

Description

An image super-resolution reconstruction method

Technical field

The present invention relates to the technical fields of computer vision recognition and image processing, and more specifically to an image super-resolution reconstruction method.

Background art

Image information is an important source of information through which humans perceive the world. Image resolution, as a principal index for assessing image quality, measures the amount of information an image contains, so research on processing image information is significant. In criminal investigation, obtaining key information such as license plate and face information from images is extremely important and can often greatly accelerate an investigation. In actual imaging, however, factors such as insufficient native resolution of the hardware, ambient light during shooting, weather changes, shake, and transmission bandwidth limits cause varying degrees of image distortion, blurring, loss of detail, and added noise, which lower the final image quality. The images obtained during criminal investigation are often low-quality, low-resolution images from which useful information is hard to extract. Restoring low-resolution images is therefore an important and difficult task.

Before image super-resolution, specific targets in the image must be detected and classified. In traditional image target detection, the most common method is sliding-window search, in which sliding windows of different sizes are moved across the image. Its main drawback is that it is time-consuming and cannot detect and classify targets in an image in real time. In recent years, with the development of deep learning, deep learning methods have also been applied to target detection and classification. The RCNN method was the first to apply neural networks to image target detection and classification and achieved good results; building on RCNN, Fast-RCNN and Faster-RCNN were proposed in succession.

There are three main approaches to super-resolution reconstruction of images: interpolation-based methods, reconstruction-based methods, and learning-based methods. Interpolation-based methods are usually single-frame interpolation techniques. They generally only enlarge the image while adding little or no extra high-frequency information, and the enlarged image shows discontinuous edges, ringing, or overall over-smoothing. Although such methods are fast and simple, they cannot introduce extra high-frequency information and can hardly restore sharpness in the image. The main idea of reconstruction-based methods is that image super-resolution reconstruction inverts the LR imaging process using prior knowledge of the degradation. Because all useful information in these methods comes from the input images, with no additional background knowledge, the number of input image samples required rises sharply as the required magnification factor increases, and once the upper limit of the magnification factor is reached, the reconstruction no longer improves as more input images are added. Learning-based methods have been the focus of recent super-resolution research. They can overcome the limited prior knowledge of conventional reconstruction methods and raise image resolution at high magnification with high quality, especially for a single frame.

Deep learning theory has attracted attention across research fields. Deep learning models can produce more abstract descriptions of image data, and using them to reconstruct and super-resolve very-low-resolution images is a new approach to the present problem. Previous research on image super-resolution reconstruction with deep learning methods often has the following shortcomings: (1) over-reliance on small image regions and their contextual information; (2) slow convergence of deep network training because of high task complexity; (3) support for only a single super-resolution scale.

Summary of the invention

In view of the shortcomings of the prior art, the object of the invention is to provide an image super-resolution reconstruction method that overcomes the deficiencies of existing super-resolution reconstruction techniques for low-quality images in criminal investigation.

To achieve this object, the technical scheme of the invention is: (1) jointly and alternately train the RPN neural network and the Fast-RCNN neural network within a Faster-RCNN neural network, use the trained Faster-RCNN neural network model to detect face, license plate, and object targets in an input picture, and output the position coordinates and corresponding labels of the faces, license plates, and objects detected in the input picture; according to the position coordinates and label information, crop the face, license plate, and object target pictures from the input picture and rescale the cropped pictures to obtain low-resolution pictures of the three classes (face, license plate, object) that meet the input requirements; (2) train a deep convolutional neural network separately on the face, license plate, and object image training sets to obtain super-resolution reconstruction neural network models for faces, license plates, and objects; input the low-resolution picture into the super-resolution reconstruction network model to obtain the high-resolution picture corresponding to the low-resolution picture.

Person-vehicle-object detection based on the Faster-RCNN neural network includes jointly and alternately training the RPN neural network and the Fast-RCNN neural network within the Faster-RCNN neural network, using the trained Faster-RCNN neural network model to detect the person, vehicle, and object targets in an input picture, and outputting the position coordinates and corresponding labels of the persons, vehicles, and objects detected in the input picture. Specifically:

Annotated and labeled picture libraries are built for the three target classes (face, license plate, and object) and used as the training and test datasets. Using the Faster-RCNN algorithm, an RPN convolutional neural network based on multiple pre-selection boxes and a Fast-RCNN convolutional neural network are trained for faces, license plates, and objects. In the RPN and Fast-RCNN neural networks, the parameters of the first 5 convolutional layers are set to be identical. Both networks are trained on a training dataset containing annotated persons, vehicles, and objects. The trained RPN neural network is then used to process the training set pictures, yielding multiple pre-selection boxes for the three target classes. Next, the training set pictures and the pre-selection boxes for faces, license plates, and objects are fed together into the trained Fast-RCNN convolutional neural network. Based on the scores that the Fast-RCNN network outputs for the different pre-selection boxes, it is judged whether a pre-selection box for a face, license plate, or object is the best region; if so, the target in the pre-selection box is assigned to the class with the highest score, giving the final regions of faces, license plates, and objects in the picture.

For example, suppose an input picture contains a face, a license plate, and an object. Passing the picture through the RPN convolutional neural network yields pre-selection boxes for the three classes of object in the picture. The picture and the pre-selection boxes are then input together into the Fast-RCNN convolutional neural network, which fine-tunes the input boxes to obtain accurate box positions and outputs, for each box, a score for each of the three person-vehicle-object classes. The class with the highest score is taken as the final class of that box, and the network finally outputs the labels of the three target classes detected in the input picture and their coordinate positions in the picture.

In particular, the annotation refers to marking the persons, vehicles, and objects in a training set picture, specifically the top-left and bottom-right coordinates of the region that each person, vehicle, or object occupies in the picture. The label refers to classifying each annotation, that is, stating which of the three classes (person, vehicle, or object) the annotation belongs to: annotations belonging to a person are labeled class 0, annotations belonging to a vehicle are labeled class 1, and annotations belonging to an object are labeled class 2.
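The annotation and label described above can be pictured as a small record. The following is only an illustrative sketch: the class name `Annotation`, its field names, and the example coordinates are hypothetical and do not come from the patent; only the corner-coordinate layout and the 0/1/2 class codes follow the text.

```python
from dataclasses import dataclass

# Class codes as stated in the text: 0 = person, 1 = vehicle, 2 = object.
LABEL_NAMES = {0: "person", 1: "vehicle", 2: "object"}


@dataclass(frozen=True)
class Annotation:
    """One annotated target: corner coordinates plus its class label.

    Field names are hypothetical; the patent only specifies that the
    top-left and bottom-right corners and a class label are recorded.
    """
    x_min: int  # top-left corner, x
    y_min: int  # top-left corner, y
    x_max: int  # bottom-right corner, x
    y_max: int  # bottom-right corner, y
    label: int  # 0, 1 or 2

    def __post_init__(self):
        if self.label not in LABEL_NAMES:
            raise ValueError(f"unknown label {self.label}")
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("bottom-right corner must be below and right of top-left")

    @property
    def class_name(self) -> str:
        return LABEL_NAMES[self.label]


# Example values are made up for illustration.
plate = Annotation(x_min=120, y_min=340, x_max=260, y_max=390, label=1)
```

Validating the record at construction time keeps malformed boxes out of the training set before any network sees them.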

Further, the specific steps for training the RPN convolutional neural network and the Fast-RCNN convolutional neural network are as follows:

A1: The RPN convolutional neural network and the Fast-RCNN convolutional neural network are designed according to the task of detecting faces, license plates, and objects; the two networks belong to one network structure;

A2: The RPN convolutional neural network is initialized: the parameters of the RPN neural network are initialized with Gaussian random values with mean 0 and variance 0.1;

A3: For each point on an input training picture, candidate boxes of multiple scales and aspect ratios are set, and the reference boxes in the training set picture are input. The output of the initialized convolutional neural network for the input training picture is compared with the original annotation of the training picture, and the network parameters are adjusted using the back-propagation (BP) algorithm so that the loss function is finally minimized;

A4: The convolutional neural network is trained on all training set pictures, yielding coarse candidate boxes for the three classes (face, license plate, object) in the input pictures, and a class label is attached to each class of picture;

A5: The Fast-RCNN convolutional neural network is likewise initialized with Gaussian random values with mean 0 and variance 0.1. Its input is the coarse candidate boxes obtained in step A4; combined with the annotations and labels on the training set pictures, the Fast-RCNN convolutional neural network is trained, giving the trained convolutional neural network model;

A6: The RPN convolutional neural network is retrained. The learning rate of a specific number of its leading convolutional layers is set to 0 (0 denotes the case of never having been trained or of being retrained, in which there is no learning rate; the learning rate rises continuously during training), and their parameters are taken from the same number of leading convolutional layers of the Fast-RCNN convolutional neural network trained in step A5. Training yields a new RPN convolutional neural network model;

A7: The newly trained RPN convolutional neural network is applied to the training set pictures to obtain again the coarse candidate boxes of faces, license plates, and objects in the training set samples;

A8: The Fast-RCNN convolutional neural network is retrained. The learning rate of a specific number of its leading convolutional layers is set to 0, and their parameters are taken from the same number of leading convolutional layers of the RPN convolutional neural network trained in step A6. With the training set samples and the coarse candidate boxes from A7 as input, training yields a new Fast-RCNN convolutional neural network.
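Step A3 sets candidate boxes of several scales and aspect ratios at each point. A minimal sketch of how such boxes could be enumerated is given below. The function name `make_anchors` and the particular scales and ratios are assumptions chosen only so that 4 scales times 3 ratios gives the 12 reference boxes per pixel mentioned in the embodiment; the patent does not state the actual values.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (x_min, y_min, x_max, y_max) boxes centred on (cx, cy).

    Each box has area scale**2 and a width/height ratio of `ratio`.
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Hypothetical values: 4 scales x 3 aspect ratios = 12 boxes per point.
SCALES = (64, 128, 256, 512)
RATIOS = (0.5, 1.0, 2.0)

anchors = make_anchors(500, 500, SCALES, RATIOS)
```

Keeping the area fixed per scale while varying the ratio means tall, square, and wide targets of similar size are all covered by one scale setting.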

After the coordinate positions and label information of the three classes (person, vehicle, object) in the input picture are obtained, the target pictures of the three classes are cropped from the input picture and the cropped pictures are rescaled, so that the rescaled pictures meet the input requirements of the super-resolution reconstruction network for each of the three target classes.
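The crop-then-rescale step can be sketched as follows. This is an assumption-laden illustration: images are plain row-major lists, the bottom-right corner is treated as exclusive, and nearest-neighbour sampling stands in for whatever scale transform the method actually uses (the patent does not name one here). The function names `crop` and `resize_nearest` are hypothetical.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to box = (x_min, y_min, x_max, y_max).

    The max corner is treated as exclusive (an assumption of this sketch).
    """
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def resize_nearest(image, out_h, out_w):
    """Rescale to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
patch = crop(image, (1, 1, 3, 3))      # 2x2 region
scaled = resize_nearest(patch, 4, 4)   # rescaled to a hypothetical 4x4 input size
```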

Training and reconstruction based on the deep convolutional neural network model (Very Deep network for Super-Resolution, VDSR) include: first building training and test datasets for each of the three person-vehicle-object classes, then training the super-resolution reconstruction neural network separately on each image training set to obtain super-resolution reconstruction neural network models for faces, license plates, and objects. After an input picture has passed through the Faster-RCNN network model and image rescaling, low-resolution pictures of the three classes that meet the input requirements are obtained. The low-resolution picture is input into the super-resolution reconstruction network model to obtain the corresponding high-resolution picture. The network model converges well and has very good real-time performance.

Further, the parameter adjustment method for training VDSR is as follows:

1: The low-resolution image obtained by enlarging the input picture with bicubic interpolation is split into non-overlapping image blocks that are fed to the network. The network is trained by optimizing the regression objective with mini-batch gradient descent based on back-propagation, adjusting the network parameters.

2: As is well known, the loss function is minimized in order to optimize the model parameters. Let x denote the low-resolution image obtained by bicubic interpolation and y the high-resolution image. Given a training set $\{x^{(i)}, y^{(i)}\}_{i=1}^{N}$, the usual goal is to learn a model f that predicts y' = f(x), and the objective function is expressed as

$\frac{1}{2}\lVert y - f(x) \rVert^{2}$ (1)

3: Such a network must preserve all input detail, because the output is generated solely from the learned features. Since the invention uses a deep network model, directly predicting the high-resolution image end to end in this way would require the network to have long-term memory, and network convergence would become very slow. The invention therefore solves this problem with residual learning.

The residual image is defined as

$r = y - x$ (2)

Most pixel values in r are likely to be 0 or very small.

The objective function is expressed as

$\frac{1}{2}\lVert r - f(x) \rVert^{2}$ (3)

where r is the target residual image, f is the network prediction, and x is the low-resolution image obtained by bicubic interpolation. The final super-resolution result is expressed as f(x) + x.

4: Raising the learning rate strengthens training and speeds up convergence, but it can introduce exploding or vanishing gradients. Gradient clipping is a very effective way to address this, keeping weight updates within a suitable range.

5: Training set pictures are enlarged by interpolation at the specified scales and stored in one large dataset; during training, the network parameters are shared.

6: The convolution kernel convolves the image, inferring the center pixel from the surrounding pixels. For edge pixels, a sufficiently large surrounding region cannot be obtained, and the size of the final cropped image cannot meet visual requirements. Therefore, before convolution, the image is padded with zeros so that the feature map obtained after convolution keeps the same size.

7: The ReLU nonlinear activation function is used; SGD converges much faster with it than with Sigmoid/Tanh. Compared with Sigmoid/Tanh, ReLU needs only a threshold to obtain the activation value, without costly computation.
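The residual formulation in step 3 can be illustrated numerically. The sketch below works on flat lists of pixel values and uses the residual r = y - x and the final result f(x) + x from the text. The half-squared-error form of the loss and the function names are assumptions of this sketch, chosen to agree with the Euclidean-distance loss described for the loss layer.

```python
def residual_target(y, x):
    """Target residual r = y - x, element by element."""
    return [yi - xi for yi, xi in zip(y, x)]


def residual_loss(r, fx):
    """Half squared Euclidean distance between target residual and prediction.

    The 1/2 * ||r - f(x)||^2 form is an assumption of this sketch.
    """
    return 0.5 * sum((ri - fi) ** 2 for ri, fi in zip(r, fx))


def reconstruct(fx, x):
    """Final super-resolution output f(x) + x."""
    return [fi + xi for fi, xi in zip(fx, x)]


# Toy values: x is the interpolated low-resolution input, y the high-resolution target.
x = [0.50, 0.50, 0.25]
y = [0.50, 0.75, 0.25]

r = residual_target(y, x)              # mostly zeros, as the text notes
loss_if_zero = residual_loss(r, [0.0, 0.0, 0.0])
perfect = reconstruct(r, x)            # a perfect residual prediction recovers y
```

Because most entries of r are zero or small, a network that starts near zero output is already close to the target, which is one way to read the faster convergence claimed for residual learning.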

Compared with the prior art, the invention has stronger object classification ability and robustness, classifies low-quality persons, vehicles, and objects effectively and accurately, and meets the requirements of real-time detection. A deep neural network learns an end-to-end mapping from low-resolution image features to high-resolution image features, so that a single low-resolution frame can be reconstructed into a high-resolution image. The quality of the reconstructed image is considerably improved and the image is sharpened. The method is also highly general and runs in real time, so images can be reconstructed quickly.

Brief description of the drawings

The structure of the invention and its beneficial technical effects are described in detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart of the face, license plate, and object detection of the invention.

Fig. 2 is a structure diagram of the RPN convolutional neural network of the invention.

Fig. 3 is a structure diagram of the Fast-RCNN convolutional neural network of the invention.

Fig. 4 is a flowchart of the face, license plate, and object super-resolution reconstruction of the invention.

Fig. 5 is a structure diagram of the deep convolutional network of the invention.

Embodiment

To make the object, technical scheme, and beneficial technical effects of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described in this specification serve only to explain the invention and are not intended to limit it.

To better describe the invention, some technical terms used in it are explained as follows:

Faster-RCNN convolutional neural network: The Faster-RCNN convolutional neural network structure contains one RPN convolutional neural network and one Fast-RCNN convolutional neural network. The RPN convolutional neural network addresses a regression problem; in this invention, it solves the problem of obtaining coarse boxes for the three different targets (person, vehicle, and object). The Fast-RCNN convolutional neural network addresses a classification problem; in this invention, it solves the problem of further screening the coarse boxes obtained from the RPN convolutional neural network for the three different targets, so as to obtain the refined boxes of the three detection targets and their coordinate positions.

Residual learning: A neural network becomes harder to train as it gets deeper. The residual learning framework adopted by the invention makes it easier to train networks deeper than those used previously. Whereas earlier networks learn unreferenced functions, here the layers learn residual functions with reference to the layer inputs, which gives a significantly improved network structure. Network depth is very important, but it also raises a question: can a better network be learned simply by stacking more layers? One obstacle to answering this question is the so-called vanishing (exploding) gradient problem, which fundamentally hampers convergence. This problem has been widely discussed and largely addressed by methods such as normalized initialization and intermediate normalization layers, which allow networks with up to tens of layers to converge under stochastic gradient descent (SGD) with back-propagation. Once these deeper networks converge, however, the so-called degradation problem is exposed: as network depth increases, accuracy saturates quickly and then drops rapidly. This degradation is not caused by over-fitting; moreover, adding more layers to a suitably deep network model leads to higher training error. Experiments show that residual networks learn faster and give better results than non-residual networks.

Gradient clipping: Speeding up convergence can introduce a new problem, namely exploding or vanishing gradients. Gradient clipping is a very effective way to address it. During back-propagation the network updates its parameters using gradients. In the invention, however, the weights are not updated directly with each computed weight gradient as before. Instead, the sum of squares of the weight gradients is computed first. If this sum exceeds the gradient clipping threshold, a scaling factor is obtained as the ratio of the threshold to the sum of squares, and the product of every weight gradient and the scaling factor is used as the actual gradient for the weight update. This ensures that, within one iteration, the sum of squares of all weight gradients stays within a set range.
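The clipping rule just described can be written out in a few lines. This sketch follows the wording above literally, scaling by the ratio of the threshold to the sum of squared gradients; many frameworks instead scale by the ratio of the threshold to the L2 norm (the square root of that sum). The function name `clip_gradients` and the example numbers are assumptions.

```python
def clip_gradients(grads, threshold):
    """Scale all gradients by threshold / sum_of_squares when the sum exceeds the threshold.

    Follows the description in the text; gradients are returned unchanged
    when the sum of squares is already within the threshold.
    """
    sumsq = sum(g * g for g in grads)
    if sumsq <= threshold:
        return list(grads)
    factor = threshold / sumsq
    return [g * factor for g in grads]


# Toy example: sum of squares is 25, above the hypothetical threshold of 5.
clipped = clip_gradients([3.0, 4.0], threshold=5.0)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its magnitude is reduced.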

The image super-resolution reconstruction method of this embodiment is implemented with Caffe, an open-source deep learning framework.

The first part of the image super-resolution reconstruction method of this embodiment includes: using the RPN convolutional neural network and the Faster-RCNN convolutional neural network to build a classifier for the three target classes (persons, vehicles, and other objects) and to locate and classify the three classes of object in the target image.

Referring to Fig. 1, the invention uses a Faster-RCNN convolutional neural network, which includes an RPN convolutional neural network and a Fast-RCNN convolutional neural network. First, a database suitable for classifying the three person-vehicle-object classes is built; the database consists of annotated and labeled pictures forming a training sample set and a test set. One RPN convolutional neural network and one Fast-RCNN convolutional neural network are then trained on the database, and the trained RPN convolutional neural network is used to obtain coarse boxes for the three classes in the input picture. Finally, the coarse boxes and the annotated image are fed together into the Fast-RCNN convolutional neural network for discrimination, and the final output vector of the Fast-RCNN convolutional neural network is used to judge whether each coarse box for the three classes is the best region.

In particular, the annotation represents the top-left and bottom-right coordinates of the image region of each of the three target classes in the image, and the label is the class label corresponding to each of the three classes in the input image. In the invention, the labels corresponding to face, license plate, and object are 0, 1, and 2 respectively.

Further, in a specific implementation, a web crawler is first used to search for and download 50000 images containing faces, license plates, and objects, and the images are normalized to 1000*1000. The RPN convolutional neural network and the Fast-RCNN convolutional neural network are then built. Both are multilayer neural networks composed of layers of different types; each layer consists of multiple two-dimensional planes, and each plane consists of multiple neurons. In this embodiment, the RPN convolutional neural network consists of 8 convolutional layers and 1 Softmax layer, and the Fast-RCNN network consists of 5 convolutional layers, 1 ROIpooling layer, 4 fully connected layers, and 1 Softmax layer.

Further, in the RPN convolutional neural network, the first 6 convolutional layers are cascaded in sequence, and the 7th and 8th convolutional layers are both connected to the 6th convolutional layer. Among the convolutional layers of the RPN network, the first 5 serve as feature extraction layers and the 6th as a feature mapping layer; the 7th outputs the regression confidence of the candidate boxes corresponding to faces, license plates, and objects, and the 8th outputs the position parameters of the regression boxes. The Softmax layer normalizes the confidence parameters.

Further, in the Fast-RCNN convolutional neural network structure, the first 5 convolutional layers, the ROIpooling layer, the first fully connected layer, and the second fully connected layer are cascaded in sequence; the third and fourth fully connected layers are each connected to the second fully connected layer, and the Softmax layer of the Fast-RCNN convolutional neural network is connected to the third fully connected layer. The parameters of the first 5 convolutional layers are shared between the RPN convolutional neural network and the Fast-RCNN convolutional neural network. When training set pictures are processed, the first two fully connected layers of the Fast-RCNN convolutional neural network apply nonlinear transformations to the features, the third fully connected layer outputs the classification confidence, and the fourth fully connected layer corrects the position of the coarse box.

The concrete structures of the RPN convolutional neural network and the Fast-RCNN convolutional neural network are shown in Fig. 2 and Fig. 3. After the two networks have been built, they must be trained to adjust their parameters. The training steps used in the invention are as follows:

1. Initialize the RPN convolutional neural network. Specifically, the network parameters are initialized from a Gaussian random distribution with mean 0 and variance 0.1.

2. Input the 50000 annotated and labeled training sample pictures into the RPN convolutional neural network. First set 12 reference boxes for each pixel in the sample picture, then run forward propagation to compute the network output, namely the Softmax loss between the predicted confidence and the label and the SmoothL1 loss between the predicted reference box and the annotated reference box. Then adjust the network parameters with the BP algorithm so that the Softmax and SmoothL1 values are minimized.

3. Apply the trained RPN network to the training set samples to obtain coarse boxes for the three classes of detection target (person, vehicle, object) in the training pictures.

4. Initialize the Fast-RCNN convolutional neural network from a Gaussian random distribution with mean 0 and variance 0.1. Input the coarse boxes obtained in the previous step together with the training sample pictures into the Fast-RCNN convolutional neural network, and compute the Softmax loss between the predicted class and the label and the SmoothL1 value between the predicted pre-selection box and the annotated box, giving the trained Fast-RCNN convolutional neural network model.

5. Retrain the RPN convolutional neural network model: set the learning rate of the parameters of the first 5 convolutional layers of the RPN convolutional neural network to 0, and set those parameters to the parameters of the first 5 convolutional layers of the Fast-RCNN network trained in step 4, giving a new RPN model.

6. Input the training set samples into the new RPN model to obtain new pre-selection boxes.

7. Input the new pre-selection boxes and the training set pictures into the Fast-RCNN network again: set the learning rate of the parameters of the first 5 convolutional layers of the Fast-RCNN network to 0, set those parameters to the parameters of the first 5 convolutional layers of the new RPN model obtained in step 5, and retrain to obtain a new Fast-RCNN model.
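Steps 2 and 4 above refer to a Softmax loss for classification and a SmoothL1 loss for box regression. The sketch below shows the commonly used definitions of these two losses on plain Python lists. It is an illustration under assumptions (function names, unweighted sum over box coordinates, natural logarithm), not the Caffe layers used in the embodiment.

```python
import math


def smooth_l1(pred, target):
    """Sum of smooth-L1 terms: 0.5*d^2 if |d| < 1, otherwise |d| - 0.5."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_loss(scores, label):
    """Negative log-probability assigned to the true class label."""
    return -math.log(softmax(scores)[label])


# Toy values: one box offset within the quadratic zone, one in the linear zone.
box_loss = smooth_l1([0.5, 3.0], [0.0, 0.0])
# Uniform scores over the three classes (0 = face, 1 = license plate, 2 = object).
cls_loss = softmax_loss([0.0, 0.0, 0.0], label=1)
```

The quadratic zone near zero keeps gradients small for nearly correct boxes, while the linear zone limits the influence of badly misplaced ones.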

After the input picture passes through the Faster-RCNN neural network model, the output gives the position coordinates of the three target classes (person, vehicle, object) in the input picture and the corresponding target class labels. Once the position coordinates and their class labels are obtained, each target picture is cropped according to its coordinate position in the input picture, and the cropped picture is then rescaled according to the label of the target picture so that the rescaled picture meets the input requirements of the super-resolution model.

The second part of the image super-resolution reconstruction method of this embodiment includes: using a deep convolutional neural network to perform super-resolution reconstruction of the different targets (face, license plate, and object).

The flow of super-resolution reconstruction is shown in Fig. 4. After person-vehicle-object detection, the input picture yields the corresponding images of the three classes and their labels. In this embodiment the labels corresponding to face, license plate, and object are 0, 1, and 2 respectively. According to the label number, the cropped and rescaled pictures are then sent to the super-resolution reconstruction model corresponding to face, license plate, or object.
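The label-based routing described here amounts to a lookup from label number to reconstruction model. The sketch below uses placeholder functions in place of the three trained models; the names `face_sr`, `plate_sr`, `object_sr`, and `reconstruct_all` are hypothetical, and the placeholders only tag their input so the routing can be observed.

```python
# Placeholder stand-ins for the three trained super-resolution models.
def face_sr(picture):
    return ("face", picture)


def plate_sr(picture):
    return ("license plate", picture)


def object_sr(picture):
    return ("object", picture)


# Label numbers as given in the embodiment: 0 = face, 1 = license plate, 2 = object.
SR_MODELS = {0: face_sr, 1: plate_sr, 2: object_sr}


def reconstruct_all(detections):
    """Send each (label, picture) pair to the model registered for its label."""
    results = []
    for label, picture in detections:
        if label not in SR_MODELS:
            raise KeyError(f"no super-resolution model for label {label}")
        results.append(SR_MODELS[label](picture))
    return results


# Toy detections; the string payloads stand in for cropped, rescaled pictures.
outputs = reconstruct_all([(1, "crop_a"), (0, "crop_b"), (2, "crop_c")])
```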

The network architecture is shown in Fig. 5. In the invention the network is very deep. While this enlarges the receptive field, it also brings problems such as slow network convergence and gradient degradation. Residual learning solves these problems well, not only speeding up network convergence but also improving network performance. Pictures of different scales are trained on this network with shared parameters, and the network can learn the transformations between predefined scales. The super-resolution scale is therefore not limited to integers and can even be a fractional value.

Further, in the specific operation, the low-resolution image obtained by first downscaling a picture and then enlarging it by interpolation is used as the input. For the face super-resolution reconstruction database, the training data are first aligned and normalized to face pictures of size 144 × 144; after alignment the training set contains 421704 face images. For the license plate super-resolution reconstruction database, the training data are obtained from a license plate recognition demo, 100000 license plates in total. For the object super-resolution reconstruction database, the training data consist of the 91 natural images of Yang et al. and 200 images from BSD, 291 natural images in total. These 291 images are augmented by noise perturbation, contrast change, flipping, mirroring, color change and the like, yielding 50000 images.
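A minimal sketch of some of the augmentations named above (flipping, mirroring, contrast change and noise perturbation) is given below for a grayscale image with pixel values in [0, 1]. The function names, the contrast factor 1.2 and the noise level 0.02 are illustrative assumptions; the description does not state the parameters actually used to reach 50000 images.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def _clamp(v: float) -> float:
    return min(1.0, max(0.0, v))


def flip(img: Image) -> Image:
    """Turn the image upside down."""
    return [list(row) for row in img[::-1]]


def mirror(img: Image) -> Image:
    """Reflect the image left to right."""
    return [row[::-1] for row in img]


def change_contrast(img: Image, factor: float) -> Image:
    """Stretch (factor > 1) or compress pixel values around mid-grey."""
    return [[_clamp(0.5 + factor * (v - 0.5)) for v in row] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[_clamp(v + rng.gauss(0.0, sigma)) for v in row] for row in img]


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return several augmented variants of one image (parameters assumed)."""
    return [flip(img), mirror(img),
            change_contrast(img, 1.2), add_noise(img, 0.02, rng)]
```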

In particular, experiments show that data augmentation is essential for obtaining a better super-resolution result.

Further, for the VDSR convolutional neural network, apart from the first layer and the last layer, the network model is formed by cascading 20 weight layers. The parameter settings of all convolutional layers are identical: each convolutional layer has 64 channels and a convolution kernel of size 3 × 3. Residual learning is implemented by a skip connection, and each convolutional layer uses ReLU as its nonlinear activation function. The first layer operates on the input data and the last layer performs image reconstruction, using a single filter of size 3 × 3 × 64.

Further, the loss layer has three inputs: the residual estimate, the network input (the low-resolution image obtained by bicubic interpolation) and the ground-truth high-resolution image. The loss is computed as the Euclidean distance between the reconstructed image and the ground-truth high-resolution image.
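The three-input loss layer may be illustrated as follows. The sketch assumes that the reconstructed image is the sum of the network input and the residual estimate, and that the Euclidean loss takes the squared form with a factor of 1/2; the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def reconstruct(network_input: Image, residual_estimate: Image) -> Image:
    """Reconstructed image = interpolated low-resolution input + predicted residual."""
    return [[x + r for x, r in zip(row_x, row_r)]
            for row_x, row_r in zip(network_input, residual_estimate)]


def euclidean_loss(residual_estimate: Image,
                   network_input: Image,
                   ground_truth: Image) -> float:
    """Half the squared Euclidean distance between reconstruction and ground truth."""
    recon = reconstruct(network_input, residual_estimate)
    return 0.5 * sum((g - p) ** 2
                     for row_g, row_p in zip(ground_truth, recon)
                     for g, p in zip(row_g, row_p))
```

Because the ground truth minus the network input is exactly the target residual, this value equals half the squared distance between the target residual and the residual estimate.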

After the network architecture has been completed, the network parameters need to be set. The parameter adjustment in the present invention is described as follows:

1. The VDSR convolutional neural network is initialized. Network initialization is a very important matter. With the traditional Gaussian initialization of fixed variance, however, the model becomes difficult to converge as the network grows deeper. The present invention therefore uses the 'msra' initialization.
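As an illustration, the sketch below assumes that 'msra' refers to the initialization in which weights are drawn from a zero-mean Gaussian with variance 2 / fan_in, where fan_in is the number of inputs feeding one output unit. The function names are hypothetical.

```python
import math
import random
from typing import List


def msra_std(kernel_h: int, kernel_w: int, in_channels: int) -> float:
    """Standard deviation sqrt(2 / fan_in) for one convolution filter."""
    fan_in = kernel_h * kernel_w * in_channels
    return math.sqrt(2.0 / fan_in)


def msra_filter(kernel_h: int, kernel_w: int, in_channels: int,
                rng: random.Random) -> List[List[List[float]]]:
    """Draw one filter of shape [in_channels][kernel_h][kernel_w]."""
    std = msra_std(kernel_h, kernel_w, in_channels)
    return [[[rng.gauss(0.0, std) for _ in range(kernel_w)]
             for _ in range(kernel_h)]
            for _ in range(in_channels)]
```

For a 3 × 3 × 64 filter, fan_in is 576, so the standard deviation is about 0.0589. Unlike a fixed-variance Gaussian, this value adapts to the layer width, which is what keeps signal magnitudes stable through many ReLU layers.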

2. Image blocks are extracted without overlap from the low-resolution image obtained by bicubic interpolation of the input picture, and are fed into the network for training. The convolutional neural network is trained on the input pictures together with their labels; to minimize the loss function, training optimizes the regression objective by mini-batch gradient descent based on back-propagation, thereby adjusting the network parameters.
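Non-overlapping block extraction can be sketched as below. The block size is a free parameter in this sketch, since the description does not specify it, and incomplete blocks at the right and bottom borders are simply dropped, which is an assumption.

```python
from typing import List

Image = List[List[float]]


def extract_patches(img: Image, size: int) -> List[Image]:
    """Split an image into non-overlapping size x size blocks.

    The stride equals the block size, so no pixel belongs to two blocks.
    Partial blocks at the right and bottom borders are discarded.
    """
    h, w = len(img), len(img[0])
    patches = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            patches.append([row[left:left + size]
                            for row in img[top:top + size]])
    return patches


def minibatches(patches: List[Image], batch_size: int) -> List[List[Image]]:
    """Group the blocks into mini-batches for gradient descent."""
    return [patches[i:i + batch_size]
            for i in range(0, len(patches), batch_size)]
```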

3. A network that directly predicts the high-resolution image end to end must retain all input details, because the output is generated solely from the learned features. Since the present invention uses a deep network model, such direct prediction requires the network to have long-term memory, and network convergence becomes very slow. The present invention therefore solves this problem by residual learning.

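The residual image is defined as

$$r = y - x \tag{4}$$

where x is the low-resolution image obtained by bicubic interpolation and y is the high-resolution image. Most pixel values in r are likely to be 0 or very small.

The objective function is expressed as

$$\min \frac{1}{2}\lVert r - f(x)\rVert^2$$

where f is the network prediction, so that the final super-resolution result is f(x) + x.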

4. Training can be strengthened by raising the learning rate, which accelerates convergence. A high learning rate, however, can introduce exploding or vanishing gradients. Gradient clipping is a very effective remedy: it keeps the weight updates within a suitable range.
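A minimal sketch of gradient clipping follows. It assumes the simplest variant, in which each gradient component is clipped to a fixed interval [-theta, theta] before the update; the threshold, learning rate and function names are hypothetical, and the description does not state which clipping variant is used.

```python
from typing import List


def clip_gradients(grads: List[float], theta: float) -> List[float]:
    """Clip every gradient component to the interval [-theta, theta]."""
    return [max(-theta, min(theta, g)) for g in grads]


def sgd_step(weights: List[float], grads: List[float],
             lr: float, theta: float) -> List[float]:
    """One gradient-descent update using clipped gradients."""
    clipped = clip_gradients(grads, theta)
    return [w - lr * g for w, g in zip(weights, clipped)]
```

With clipping, the magnitude of any single weight update is bounded by `lr * theta`, regardless of how large the raw gradient becomes.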

5. To realize multi-scale super-resolution reconstruction, the training set pictures are enlarged by interpolation at the specified scales and stored in one large data set. During training, the network parameters are shared.
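The construction of a pooled multi-scale training set may be sketched as follows. Bilinear interpolation is used here as a dependency-free stand-in for the bicubic interpolation of the description, and the scale values in the usage note and all function names are assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    """Bilinear resize (stand-in for bicubic interpolation)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(out_h):
        # Map the output pixel centre back into input coordinates.
        y = min(max((r + 0.5) * h / out_h - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = min(max((c + 0.5) * w / out_w - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def make_lr_input(hr: Image, scale: float) -> Image:
    """Downscale by `scale`, then enlarge back to the original size."""
    h, w = len(hr), len(hr[0])
    small = resize_bilinear(hr, max(1, round(h / scale)), max(1, round(w / scale)))
    return resize_bilinear(small, h, w)


def build_multiscale_set(hr_images: List[Image],
                         scales: List[float]) -> List[Tuple[Image, Image, float]]:
    """Pool (low-resolution input, high-resolution target, scale) for all scales."""
    return [(make_lr_input(hr, s), hr, s) for hr in hr_images for s in scales]
```

For example, `build_multiscale_set(images, [2, 3, 1.5])` yields three training pairs per image in a single list; because every input has already been enlarged to the target size, one network with one set of parameters can be trained on all of them.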

6. Convolution is performed with 64 convolution kernels of size 3 × 3, the center pixel being inferred from its surrounding pixels. For edge pixels the surrounding region cannot be obtained effectively, and the final image left after cropping would be too small to meet visual requirements. Therefore, before convolution, the image is padded with zeros so that the feature map keeps the same size after convolution.
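The effect of zero padding on the feature-map size can be checked with the single-channel sketch below. The function names are hypothetical, and the operation is the unflipped sliding-window product commonly used in convolutional networks.

```python
from typing import List

Image = List[List[float]]


def zero_pad(img: Image, pad: int) -> Image:
    """Surround the image with `pad` rows and columns of zeros."""
    w = len(img[0])
    blank = [0.0] * (w + 2 * pad)
    rows = [[0.0] * pad + list(row) + [0.0] * pad for row in img]
    return ([blank[:] for _ in range(pad)] + rows
            + [blank[:] for _ in range(pad)])


def conv2d_valid(img: Image, kernel: Image) -> Image:
    """Slide a k x k kernel over the image without padding (output shrinks)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)]
            for r in range(h - k + 1)]


def conv2d_same(img: Image, kernel: Image) -> Image:
    """Zero-pad by k // 2 first, so the output has the same size as the input."""
    return conv2d_valid(zero_pad(img, len(kernel) // 2), kernel)
```

Without padding, each 3 × 3 layer removes one pixel from every border, so a deep stack of such layers would shrink the image considerably; padding by one pixel per layer avoids this.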

7. The ReLU nonlinear activation function is used, with which SGD converges much faster than with sigmoid or tanh. Compared with sigmoid or tanh, ReLU needs only a threshold to obtain the activation value and does not require any costly computation.

Further, in the VDSR convolutional neural network, apart from the first layer and the last layer, the network model of the present invention is formed by cascading 20 weight layers. The parameter settings of all convolutional layers are identical: each convolutional layer has 64 channels and a convolution kernel of size 3 × 3. Residual learning is implemented by a skip connection, and each convolutional layer uses ReLU as its nonlinear activation function. The first layer operates on the input data; compared with SRCNN, the reconstructed image is larger. The last layer performs image reconstruction, using a single filter of size 3 × 3 × 64.

Further, the loss layer has three inputs: the residual estimate, the network input (the low-resolution image obtained by bicubic interpolation) and the ground-truth high-resolution image. The loss is computed as the Euclidean distance between the reconstructed image and the ground-truth high-resolution image. Assuming that the network depth is D, the receptive field has size (2D+1) × (2D+1); a deeper network therefore has access to more contextual information. Moreover, the deeper the network, the better the result. In this embodiment, D is 20.
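The receptive-field formula can be verified with a short calculation, assuming that all D layers are 3 × 3 convolutions with stride 1: each such layer widens the receptive field by two pixels. The function name is hypothetical.

```python
def receptive_field(depth: int, kernel: int = 3) -> int:
    """Side length of the receptive field of `depth` stacked stride-1 convolutions."""
    size = 1  # a single output pixel sees itself before any convolution
    for _ in range(depth):
        size += kernel - 1  # each layer adds (kernel - 1) pixels in total
    return size
```

For D = 20 this gives a side length of 2 × 20 + 1 = 41, that is, a 41 × 41 receptive field.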

Based on the disclosure and teaching of the above description, those skilled in the art to which the invention pertains may also make appropriate changes and modifications to the above embodiments. The invention is therefore not limited to the specific embodiments disclosed and described above, and modifications and changes to the invention should also fall within the scope of protection of the claims of the invention. In addition, although some specific terms are used in this specification, these terms are for convenience of description only and do not limit the invention in any way.

Claims

1. An image super-resolution reconstruction method, characterized by comprising:

(1) jointly and alternately training the RPN neural network and the Fast-RCNN neural network of a Faster-RCNN neural network, and using the trained Faster-RCNN neural network model to detect face, license plate and object targets in an input picture, outputting the position coordinates and corresponding labels of the faces, license plates and objects detected in the input picture; according to the position coordinates and label information, cropping the three classes of target pictures (face, license plate and object) from the input picture and rescaling the cropped pictures, to obtain three classes of low-resolution pictures (face, license plate and object) that meet the input requirements;

(2) training deep convolutional neural networks on face, license plate and object image training sets respectively, to obtain super-resolution reconstruction neural network models for faces, license plates and objects; inputting the low-resolution picture into the super-resolution reconstruction network model to obtain the high-resolution picture corresponding to the low-resolution picture.
2. The image super-resolution reconstruction method according to claim 1, characterized in that in step (1) the construction of the Faster-RCNN neural network model comprises:

(101a) building annotated and labeled picture libraries for the three different targets (face, license plate and object) as the training data set and the test data set; using the Faster-RCNN algorithm, training for faces, license plates and objects an RPN convolutional neural network based on multiple preselected boxes and a Fast-RCNN convolutional neural network, wherein the parameters of the first 5 convolutional layers of the RPN neural network and of the Fast-RCNN neural network are set to be identical;

(101b) training the RPN convolutional neural network and the Fast-RCNN convolutional neural network on the annotated training data set containing persons, vehicles and objects, and then processing the training set pictures with the trained RPN neural network to obtain multiple preselected boxes for the three different targets, namely face, license plate and object;

(101c) feeding the training set pictures together with the multiple preselected boxes for faces, license plates and objects into the trained Fast-RCNN convolutional neural network, scoring the different preselected boxes according to the output of the Fast-RCNN neural network, and judging whether a preselected box for a face, license plate or object is the optimal region; if so, assigning the preselected box to the target class with the highest score, thereby obtaining the final regions of the faces, license plates and objects in the picture.
3. The image super-resolution reconstruction method according to claim 1, characterized in that in step (1) the training of the RPN convolutional neural network and the Fast-RCNN convolutional neural network comprises the following steps:

(102a) designing the RPN convolutional neural network and the Fast-RCNN convolutional neural network according to the task of detecting faces, license plates and objects, the RPN convolutional neural network and the Fast-RCNN convolutional neural network belonging to one network structure;

(102b) initializing the RPN convolutional neural network, the parameters of the RPN neural network being initialized with Gaussian random values of mean 0 and variance 0.1;

(102c) setting candidate boxes of multiple scales and different aspect ratios for each point of an input training picture, inputting the reference boxes of the training set pictures, comparing the output of the initialized convolutional neural network for the input training pictures with the original annotations of the training pictures, and adjusting the neural network parameters by the back-propagation algorithm so that the loss function is finally minimized;

(102d) training the convolutional neural network on all training set pictures to obtain coarse candidate boxes for the three classes of targets (face, license plate and object) in the input picture set, and attaching a class label to each class of picture;

(102e) likewise initializing the Fast-RCNN convolutional neural network with Gaussian random values of mean 0 and variance 0.1, the input of the Fast-RCNN convolutional neural network being the coarse candidate boxes obtained in step (102d); training the Fast-RCNN convolutional neural network in combination with the annotations and labels of the training set pictures, to obtain a trained convolutional neural network model;

(102f) retraining the RPN convolutional neural network, wherein the learning rate of a specific number of leading convolutional layers of the network is set to 0 and their parameters are taken from the same number of leading convolutional layers of the Fast-RCNN convolutional neural network trained in step (102e), a new RPN convolutional neural network model being obtained by training;

(102g) using the newly trained RPN convolutional neural network on the training set pictures to retrieve again the coarse candidate boxes of faces, license plates and objects in the training set samples;

(102h) retraining the Fast-RCNN convolutional neural network, wherein the learning rate of a specific number of leading convolutional layers of the Fast-RCNN convolutional neural network is set to 0 and their parameters are taken from the same number of leading convolutional layers of the RPN convolutional neural network trained in step (102f); inputting the training set samples and the coarse candidate boxes from step (102g), and training to obtain a new Fast-RCNN convolutional neural network.
4. The image super-resolution reconstruction method according to claim 3, characterized in that the RPN convolutional neural network and the Fast-RCNN convolutional neural network are two multilayer neural networks, each composed of multiple layers of different types, every layer consisting of multiple two-dimensional planes and each plane consisting of multiple neurons.
5. The image super-resolution reconstruction method according to claim 4, characterized in that the RPN convolutional neural network consists of 8 convolutional layers and 1 Softmax layer, and the Fast-RCNN network consists of 5 convolutional layers, 1 ROI pooling layer, 4 fully connected layers and 1 Softmax layer.
6. The image super-resolution reconstruction method according to claim 5, characterized in that in the RPN convolutional neural network the first 6 convolutional layers are cascaded in sequence, and the 7th convolutional layer and the 8th convolutional layer are both connected to the 6th convolutional layer; among the convolutional layers of the RPN network, the first 5 convolutional layers serve as feature extraction layers, the 6th convolutional layer serves as a feature mapping layer, the 7th convolutional layer outputs the regression confidence corresponding to the face, license plate and object candidate boxes, the 8th convolutional layer outputs the position parameters of the regression boxes, and the Softmax layer normalizes the confidence parameters.
7. The image super-resolution reconstruction method according to claim 5, characterized in that in the structure of the Fast-RCNN convolutional neural network, the first 5 convolutional layers, the ROI pooling layer, the first fully connected layer and the second fully connected layer are cascaded in sequence, the third fully connected layer and the fourth fully connected layer are each connected to the second fully connected layer, and the Softmax layer of the Fast-RCNN convolutional neural network is connected to the third fully connected layer.
8. The image super-resolution reconstruction method according to claim 5, characterized in that the RPN convolutional neural network and the Fast-RCNN convolutional neural network share the parameters of their first 5 convolutional layers; when training set pictures are processed, the first two fully connected layers of the Fast-RCNN convolutional neural network apply nonlinear transformations to the features, the third fully connected layer outputs the confidence of the class decision, and the fourth fully connected layer refines the position of the coarse candidate box.
9. The image super-resolution reconstruction method according to claim 1, characterized in that in step (2) the parameter adjustment for training the deep convolutional neural network comprises:

(201) extracting non-overlapping image blocks from the low-resolution image obtained by bicubic interpolation of the input picture and feeding them into the network, training by optimizing the regression objective with mini-batch gradient descent based on back-propagation, and adjusting the network parameters;

(202) given a training set in which x denotes the low-resolution image obtained by bicubic interpolation and y denotes the high-resolution image, the objective function is expressed as

$$\min \frac{1}{2}\lVert y - f(x)\rVert^2 \tag{1}$$

(203) the residual image is defined as

$$r = y - x \tag{2}$$

Most pixel values in r are likely to be 0 or very small.

The objective function is expressed as

$$\min \frac{1}{2}\lVert r - f(x)\rVert^2 \tag{3}$$

where r is the target residual image, f is the network prediction and x is the low-resolution image obtained by bicubic interpolation; the final super-resolution result is expressed as f(x) + x;

(204) keeping the weight updates within a suitable range by gradient clipping;

(205) enlarging the training set pictures by interpolation at the specified scales and storing them in one large data set, the network parameters being shared during training;

(206) convolving the image with convolution kernels, the center pixel being inferred from its surrounding pixels; before convolution, padding the image with zeros so that the feature map keeps the same size after convolution;

(207) using the ReLU nonlinear activation function to speed up the convergence of SGD.