CN108764250A - Method for extracting intrinsic images with a convolutional neural network - Google Patents

Method for extracting intrinsic images with a convolutional neural network

Info

Publication number
CN108764250A
CN108764250A (application number CN201810407424.8A)
Authority
CN
China
Prior art keywords
image
network
branch
neural networks
proprietary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810407424.8A
Other languages
Chinese (zh)
Other versions
CN108764250B (en)
Inventor
蒋晓悦
冯晓毅
李会方
吴俊�
何贵青
谢红梅
夏召强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201810407424.8A priority Critical patent/CN108764250B/en
Publication of CN108764250A publication Critical patent/CN108764250A/en
Application granted granted Critical
Publication of CN108764250B publication Critical patent/CN108764250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a method for extracting intrinsic images with a convolutional neural network. First, a two-stream, image-to-image convolutional network with a parallel structure is built. The network is then trained on a purpose-built training data set to optimize its parameters, so that multi-layer features invariant to the environment are extracted and the intrinsic images (the reflectance map and the illumination map) are reconstructed directly. Because the two-stream convolutional network is built on deep-learning theory, it has strong feature-extraction capability and can separate the reflectance map and the illumination map directly from the original image. At the same time, the model is a fully convolutional, image-to-image network containing two branch streams, used respectively to generate the illumination map and the reflectance map. In addition, the network combines higher-layer convolution results with the results of deconvolution operations, which reduces the reconstruction error of the illumination and reflectance maps to a certain extent and improves the network's capacity for feature reconstruction.

Description

Method for extracting intrinsic images with a convolutional neural network
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a method for extracting intrinsic images with a convolutional neural network.
Background technology
Understanding and analyzing images is one of the most basic tasks of image processing. Because the imaging process is jointly affected by many factors, such as the intrinsic properties of the target object, the shooting environment, and the acquisition equipment, image processing must fully account for interference from these factors, for example shadows, color discontinuities, and changes in object pose. These changing factors pose a considerable challenge to image processing algorithms, and the performance of existing image analysis algorithms degrades markedly in complex environments. How to improve the robustness of image analysis algorithms in complex environments has therefore become a research hotspot in recent years. In fact, if the essential characteristics in an image can be analyzed on the basis of the available observed image, the problems encountered in image analysis described above can be solved well. Essential characteristics are the intrinsic features of the target object that are independent of the surrounding environment, namely the reflectance properties of the object (including color, texture, and similar information) and the shape of the object. Although these two inherent features of the target object do not themselves change with the environment, the observed image collected of the object is affected by the environment. Therefore, if the essential characteristics of an object can be analyzed directly from the observed image, its intrinsic shape, color, texture, and related information are obtained and the influence of environmental change on the image is eliminated, enabling a more accurate understanding of the image and providing a more reliable information basis for further realizing robust image analysis in complex environments.
Existing algorithms can be divided into three classes according to how they extract the essential characteristics of the target. The first class consists of implicit essential-characteristic analysis algorithms, which use pattern recognition to learn from multiple modalities of the target object (its appearances under different illumination conditions and different poses). During learning, such algorithms do not specifically consider the inner links between the various appearances; instead they perform pattern analysis directly on the various observations, attempting to obtain the distribution function of the target in feature space. The serious problem these algorithms encounter is the generalization of the target description function: the distribution of the training samples strongly influences the distribution function finally learned. If the learning samples are images of the target object under a single illumination condition or pose, the trained result is difficult to generalize to images of the object under different illumination or new poses. When the samples are incomplete, such algorithms therefore generalize poorly to target objects in various complex environments. The second class consists of explicit essential-characteristic analysis algorithms, which analyze the inner links between the appearances of the object under different conditions. Compared with the implicit algorithms of the first class, these algorithms analyze the reflectance characteristics and shape of the object directly, according to the physical imaging principle and prior knowledge of reflectance and shape, so that the image of the object in a new state can be computed directly from these inherent features. The results obtained by explicit analysis are therefore more accurate and also generalize better. However, such algorithms typically rely on constraints such as structure, texture, and color to estimate the essential characteristics; that is, they follow the theoretical framework of Retinex, convert the decomposition problem into an energy optimization problem, and complete the computation and analysis at a single scale. The analysis accuracy thus depends to a large extent on the performance of the optimization algorithm. Moreover, because the convexity of the function to be optimized cannot be guaranteed, the solution process often falls into local minima and cannot reach the optimal solution, or requires the initialization to be as close to the optimal solution as possible. All of this limits the performance of such algorithms. The third class extracts intrinsic images with neural networks based on deep learning: a convolutional neural network is trained to predict the intrinsic images directly from a single RGB image. However, the network structures of existing algorithms of this kind are fairly simple, and their training sets consist of synthetic images produced with computer graphics software, so the extracted intrinsic images are not very clear, especially when such methods are applied to natural images.
Summary of the invention
To overcome the insufficient feature-extraction capability of existing implicit and explicit essential-characteristic analysis algorithms, and the problem that existing deep-learning-based neural network algorithms are aimed mainly at synthetic images, the present invention provides a method for extracting intrinsic images with a convolutional neural network. First, a two-stream, image-to-image convolutional network with a parallel structure is built; the network is then trained to optimize its parameters, so that multi-layer features invariant to the environment are extracted and the intrinsic images (the reflectance map and the illumination map) are reconstructed directly. The multi-stream structure on the one hand separates the tasks, so that different features are extracted in different streams; on the other hand, the two streams act as constraints on each other, which improves the accuracy of the algorithm.
A method for extracting intrinsic images with a convolutional neural network, characterized by the following steps:
Step 1: Build a two-stream convolutional neural network model with a parallel structure. The model is divided into one shared branch and two private branches.
The shared branch consists of 5 convolutional layers, each followed by a pooling layer. The convolution kernels are 3 × 3 and each layer outputs a feature image: the first convolutional layer outputs a feature image of dimension 64, the second of dimension 128, the third of dimension 256, and the fourth and fifth of dimension 512. The pooling layers use 2 × 2 average pooling.
The two private branches have identical structures, each containing 3 deconvolution layers with 4 × 4 kernels; one branch reconstructs the illumination image and the other reconstructs the reflectance image. The output dimension of all deconvolution layers is 256.
The feature image output by the third convolutional layer of the shared branch, together with the output of the second deconvolution layer of a private branch, serves as the input of that branch's third deconvolution layer; the feature image output by the fourth convolutional layer of the shared branch, together with the output of the first deconvolution layer of a private branch, serves as the input of that branch's second deconvolution layer.
Step 2: Build the training data set. From the middle of each image in the BOLD data set created by Jiang et al., crop an image of size 1280 × 1280 and divide it into five equal parts along both rows and columns, so that each image in the original data set yields 25 images of size 256 × 256. Randomly select 53,720 groups of these images to form the test set; the remaining images form the training set.
Step 3: Train the two-stream convolutional neural network built in step 1 with the training set obtained in step 2. First initialize the weights of each layer of the network randomly, then train the network with supervised error back-propagation to obtain the trained network. The base learning rate of the network is 10^-13 with a fixed learning-rate policy, the batch size of the network is 5, and the loss function is SoftmaxWithLoss; the convergence condition is that the difference between the loss values of two consecutive iterations lies within ±5% of the current value.
Step 4: Process the test set obtained in step 2 with the trained network to extract the intrinsic images, namely the illumination map and the reflectance map.
The method has also been tested on the standard intrinsic-image data set MIT Intrinsic Images dataset; the results show that the method remains effective.
The beneficial effects of the invention are as follows. Because the technical route of intrinsic image extraction is based on deep-learning theory, the strong feature-extraction capability of a neural network built on that theory allows the reflectance map and the illumination map to be separated directly from the original image. Furthermore, the proposed two-stream convolutional neural network is a fully convolutional, image-to-image network model containing two branch streams, used respectively to generate the illumination map and the reflectance map. Moreover, the network combines higher-layer convolution results with the results of deconvolution operations, which enhances the detail of the feature maps after deconvolution, reduces the reconstruction error of the illumination and reflectance maps to a certain extent, and improves the network's capacity for feature reconstruction.
Description of the drawings
Fig. 1 is a flowchart of the method for extracting intrinsic images with a convolutional neural network according to the present invention
Fig. 2 is a structural diagram of the two-stream convolutional neural network built by the present invention
Fig. 3 shows example images from the data set built by the present invention
Detailed description of the embodiments
The present invention will be further described below with reference to the accompanying drawings and embodiments; the present invention includes, but is not limited to, the following embodiments.
The present invention provides a method for extracting intrinsic images with a convolutional neural network. As shown in Fig. 1, the main process is as follows:
1. Build the two-stream convolutional neural network model with a parallel structure
Reconstructing an image actually means assigning different weights to the features extracted from the image and combining features of the same type, so as to accomplish the goal of reconstructing the illumination map and the reflectance map from the original image. In other words, all the required features are present in the same original image. The feature-extraction part can therefore be shared, while the reconstruction of the two different types of intrinsic image must be completed separately. The network built by the present invention is thus divided into two parts, a shared branch and the private branches. After the convolution operations of the shared branch, the feature maps output by each layer shrink gradually. To keep the input image and the output image the same size in spatial structure, three deconvolution layers are designed in each of the two private branches, so that the spatial size of the feature maps is gradually restored to the original size. Inspired by the residual network structure, the present invention found during experiments that combining the last two layers of the shared branch with the last two layers of the private branches lets the network parameters reach a better optimization result. For the above reasons, the present invention constructs the two-stream convolutional neural network with a parallel structure shown in Fig. 2. The network model is divided into one shared branch and two private branches.
The shared branch consists of 5 convolutional layers, each followed by a pooling layer. The convolution kernels are 3 × 3 and each layer outputs a feature image: the first convolutional layer outputs a feature image of dimension 64, the second of dimension 128, the third of dimension 256, and the fourth and fifth of dimension 512. The pooling layers use 2 × 2 average pooling. The two private branches have identical structures, each containing 3 deconvolution layers with 4 × 4 kernels; one branch reconstructs the illumination image and the other reconstructs the reflectance image, and the output dimension of all deconvolution layers is 256. In addition, the feature image output by the third convolutional layer of the shared branch, together with the output of the second deconvolution layer of a private branch, serves as the input of that branch's third deconvolution layer; the feature image output by the fourth convolutional layer of the shared branch, together with the output of the first deconvolution layer of a private branch, serves as the input of that branch's second deconvolution layer.
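As an illustration of the sizes involved, the following Python sketch traces the feature-map shapes through the shared branch. It rests on assumptions the patent does not state explicitly (stride-1 convolutions with padding 1, stride-2 pooling, stride-2 deconvolutions); the function names are illustrative, not from the patent.

```python
# Sketch of the feature-map sizes in the shared branch described
# above: five 3x3 conv layers (assumed padding 1, stride 1, so the
# spatial size is preserved), each followed by 2x2 average pooling
# with stride 2. Channel widths come from the patent text:
# 64, 128, 256, 512, 512.

def conv_out(size, kernel=3, pad=1, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after pooling."""
    return (size - kernel) // stride + 1

def shared_branch_shapes(input_size=256):
    """Return (channels, spatial size) after each conv+pool stage."""
    channels = [64, 128, 256, 512, 512]
    shapes, s = [], input_size
    for c in channels:
        s = pool_out(conv_out(s))  # conv keeps size, pool halves it
        shapes.append((c, s))
    return shapes

shapes = shared_branch_shapes(256)
for i, (c, s) in enumerate(shapes, 1):
    print(f"conv{i}+pool: {c} channels, {s}x{s}")
```

Under these assumptions conv3 yields 32 × 32 and conv4 yields 16 × 16, which line up with the outputs of the private branch's second and first deconvolution layers (each 4 × 4 deconvolution assumed to double the spatial size starting from the 8 × 8 conv5 output), making the skip connections size-consistent.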
2. Build the data set
The network structure proposed by the invention is relatively complex and has many parameters to train. To let the network reach its best performance, the present invention constructed a data set for studying intrinsic image extraction algorithms on the basis of the BOLD data set created by Jiang et al. (Jiang X Y, Schofield A J, Wyatt J L. Correlation-Based Intrinsic Image Extraction from a Single Image [C]. European Conference on Computer Vision, 2010: 58-71). The data set contains 268,600 groups of pictures; each group contains an original image, an illumination map, and a reflectance map. 53,720 groups are extracted at random to form the test set, used to test the performance of intrinsic image extraction algorithms; the remaining 214,880 groups form the training set, used to train the deep-learning neural network. The BOLD database contains a large number of high-resolution image groups, all objects shot under carefully adjusted lighting conditions, mainly various complex decorative patterns, faces, and outdoor scenes; Fig. 3 shows example images from the data set. Jiang et al. built the database to provide a test platform for image processing algorithms, in particular intrinsic image extraction algorithms, illumination removal algorithms, and light-source estimation algorithms. To this end, they provide the illumination-condition map and the object-surface map, that is, the illumination map and the reflectance map, all as standard RGB color-space pictures with linear luminance characteristics. Considering the number of pictures, their quality, and the complexity of the scenes, the present invention finally selected the image groups whose subjects have intricate details to build the data set for this research. The original images have 1280 pixels in the horizontal dimension and 1350 pixels in the vertical dimension; for an ordinary computer this data volume is excessively large and easily causes problems during learning, which is very unfavorable for training a deep-learning neural network. The image category chosen by the present invention has one obvious feature: the key information is concentrated in the middle of the image. Therefore, the present invention places a 1280 × 1280 frame in the middle of the picture, crops the original image, and then divides the cropped image into five equal parts along both rows and columns. In this way an original image is divided into 25 smaller images of 256 × 256. Cutting the original image like this retains the key information of the original image and maximizes the use of the data, while also bringing several conveniences to this research: the data volume of each group of pictures is moderate and places no excessive demands on computer performance; the relatively reasonable image size makes the design of the convolutional neural network more convenient; and the cropped images contain both positive and negative samples, which can avoid over-fitting to a certain extent.
3. Train the network
This embodiment trains the constructed network under the Caffe framework, using the training set from the data set built on BOLD. Compared with other frameworks, Caffe is not only simple to install and supported on all operating systems, but also has good interface support for Python and Matlab. Because the constructed network structure is relatively complex, the amount of data to be learned is large and the network needs comparatively many iterations; also, to prevent the network from learning too fast and missing the optimal solution, the present invention determined through repeated experiments to set the base learning rate to 10^-13 during training, with the learning-rate policy set to "fixed", that is, a fixed learning rate. Considering computer performance, and also to avoid the network converging too fast, the batch size of the network is set to 5 and the loss function is SoftmaxWithLoss.
The loss function computes the difference between the output result and the true label; as the number of iterations increases, the network loss becomes smaller and smaller, that is, the estimated result comes closer and closer to the true label. In a single dimension, the loss function can be written as

J(θ) = -(1/m) Σ_{i=1}^{m} Σ_{j=0}^{255} 1{y_i = j} log( exp(θ_j^T x_i) / Σ_{l=0}^{255} exp(θ_l^T x_i) )

where {(x_1, y_1), ..., (x_m, y_m)} denotes the m groups of labeled training data, x_i denotes the input, y_i denotes the corresponding label with y_i ∈ [0, 255], 1{F} is the indicator function whose value is 1 when F is true and 0 when F is false, and θ denotes the parameters of the convolutional neural network. During training, the error between the estimated image and the true label is back-propagated into the neural network to optimize its parameters, so that the error is gradually reduced. For an RGB image, the loss function is the sum of the above loss over the three dimensions R, G, and B of the image.
After about 210,000 iterations, the difference between consecutive loss values floats within a ±5% range and the network gradually converges; that is, under the existing network structure the network parameters tend to the optimum and the network's ability to extract intrinsic images tends to its best, yielding the trained network. Although the two private branches look identical in structure, the true labels supplied to them differ, so during training they learn different network parameters. When the network is used to extract intrinsic images, their operations on the data therefore also differ, so that the different branches can extract the different types of intrinsic image.
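The stopping rule above can be expressed as a small check; the function name, and the reading of "within ±5% of its value" as relative to the current loss, are our assumptions.

```python
# Sketch of the convergence condition described above: training is
# considered converged once the loss change between two consecutive
# iterations stays within +/-5% of the current loss value.

def has_converged(prev_loss, curr_loss, tol=0.05):
    """True when |curr - prev| is within tol * |curr|."""
    return abs(curr_loss - prev_loss) <= tol * abs(curr_loss)

print(has_converged(1.00, 0.98))  # small change: converged
print(has_converged(1.00, 0.50))  # large drop: keep training
```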
4. Extract intrinsic images with the trained network
The test set contained in the data set established in step 2 is processed with the trained network: the RGB pictures it contains are converted into three-dimensional matrices and used as the input of the network, and after the multi-layer operations of the network the extracted intrinsic images, namely the illumination map and the reflectance map, are obtained. The method has also been tested on the standard intrinsic-image data set MIT Intrinsic Images dataset; the results show that the method remains effective.
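The conversion of an RGB picture into a three-dimensional matrix for the network input can be sketched as below; the channels-first layout (the convention Caffe uses) is an assumption, since the patent does not state the ordering, and the function name is illustrative.

```python
import numpy as np

# Sketch of the inference-time preprocessing described above: an RGB
# picture becomes a 3-D array that the network consumes.

def to_network_input(rgb_image):
    """Convert an HxWx3 uint8 image to a float32 3xHxW array."""
    arr = np.asarray(rgb_image, dtype=np.float32)
    return arr.transpose(2, 0, 1)  # HWC -> CHW (channels first)

img = np.zeros((256, 256, 3), dtype=np.uint8)
x = to_network_input(img)
print(x.shape, x.dtype)  # (3, 256, 256) float32
```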

Claims (1)

1. A method for extracting intrinsic images with a convolutional neural network, characterized by the following steps:
Step 1: build a two-stream convolutional neural network model with a parallel structure, the model being divided into one shared branch and two private branches;
wherein the shared branch consists of 5 convolutional layers, each followed by a pooling layer; the convolution kernels are 3 × 3 and each layer outputs a feature image, the first convolutional layer outputting a feature image of dimension 64, the second of dimension 128, the third of dimension 256, and the fourth and fifth of dimension 512; the pooling layers use 2 × 2 average pooling;
the two private branches have identical structures, each containing 3 deconvolution layers with 4 × 4 kernels, one branch reconstructing the illumination image and the other reconstructing the reflectance image, the output dimension of all deconvolution layers being 256;
the feature image output by the third convolutional layer of the shared branch, together with the output of the second deconvolution layer of a private branch, serves as the input of that branch's third deconvolution layer; the feature image output by the fourth convolutional layer of the shared branch, together with the output of the first deconvolution layer of a private branch, serves as the input of that branch's second deconvolution layer;
Step 2: build the training data set by cropping, from the middle of each image of the BOLD data set created by Jiang et al., an image of size 1280 × 1280 and dividing it into five equal parts along both rows and columns, so that each image in the original data set yields 25 images of size 256 × 256; randomly select 53,720 groups of these images to form the test set, the remaining images forming the training set;
Step 3: train the two-stream convolutional neural network built in step 1 with the training set obtained in step 2, first randomly initializing the weights of each layer of the network and then training the network with supervised error back-propagation to obtain the trained network; wherein the base learning rate of the network is 10^-13 with a fixed learning-rate policy, the batch size of the network is 5, the loss function is SoftmaxWithLoss, and the convergence condition is that the difference between the loss values of two consecutive iterations lies within ±5% of the current value;
Step 4: process the test set obtained in step 2 with the trained network to extract the intrinsic images, namely the illumination map and the reflectance map.
CN201810407424.8A 2018-05-02 2018-05-02 Method for extracting essential image by using convolutional neural network Active CN108764250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810407424.8A CN108764250B (en) 2018-05-02 2018-05-02 Method for extracting essential image by using convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810407424.8A CN108764250B (en) 2018-05-02 2018-05-02 Method for extracting essential image by using convolutional neural network

Publications (2)

Publication Number Publication Date
CN108764250A true CN108764250A (en) 2018-11-06
CN108764250B CN108764250B (en) 2021-09-17

Family

ID=64008978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810407424.8A Active CN108764250B (en) 2018-05-02 2018-05-02 Method for extracting essential image by using convolutional neural network

Country Status (1)

Country Link
CN (1) CN108764250B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658330A (en) * 2018-12-10 2019-04-19 广州市久邦数码科技有限公司 Color development adjusting method and device
CN110659023A (en) * 2019-09-11 2020-01-07 腾讯科技(深圳)有限公司 Method for generating programming content and related device
CN111179196A (en) * 2019-12-28 2020-05-19 杭州电子科技大学 Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN111325221A (en) * 2020-02-25 2020-06-23 青岛海洋科学与技术国家实验室发展中心 Image feature extraction method based on image depth information
CN111489321A (en) * 2020-03-09 2020-08-04 淮阴工学院 Depth network image enhancement method and system based on derivative graph and Retinex
CN113034353A (en) * 2021-04-09 2021-06-25 西安建筑科技大学 Essential image decomposition method and system based on cross convolution neural network
US20210272236A1 (en) * 2019-02-28 2021-09-02 Tencent Technology (Shenzhen) Company Limited Image enhancement method and apparatus, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104756152A (en) * 2012-10-26 2015-07-01 SK Telecom Co., Ltd. Image correction device for accelerating image correction and method for same
CN106778583A (en) * 2016-12-07 2017-05-31 北京理工大学 Vehicle attribute recognition methods and device based on convolutional neural networks
CN107145857A (en) * 2017-04-29 2017-09-08 深圳市深网视界科技有限公司 Face character recognition methods, device and method for establishing model
CN107330451A (en) * 2017-06-16 2017-11-07 西交利物浦大学 Clothes attribute retrieval method based on depth convolutional neural networks
CN107403197A (en) * 2017-07-31 2017-11-28 武汉大学 A kind of crack identification method based on deep learning
CN107633272A (en) * 2017-10-09 2018-01-26 东华大学 A kind of DCNN textural defect recognition methods based on compressed sensing under small sample
JP2018018422A (en) * 2016-07-29 2018-02-01 株式会社デンソーアイティーラボラトリ Prediction device, prediction method and prediction program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAO HUANG et al.: "Densely Connected Convolutional Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
JIANG X Y et al.: "Correlation-Based Intrinsic Image Extraction from a Single Image", European Conference on Computer Vision *
WANG Jun et al.: "Infrared and visible light image fusion method based on non-subsampled Contourlet transform and sparse representation", Acta Armamentarii *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658330A (en) * 2018-12-10 2019-04-19 Guangzhou Jiubang Digital Technology Co., Ltd. Color development adjusting method and device
CN109658330B (en) * 2018-12-10 2023-12-26 Guangzhou Jiubang Digital Technology Co., Ltd. Color development adjusting method and device
US20210272236A1 (en) * 2019-02-28 2021-09-02 Tencent Technology (Shenzhen) Company Limited Image enhancement method and apparatus, and storage medium
US11790497B2 (en) * 2019-02-28 2023-10-17 Tencent Technology (Shenzhen) Company Limited Image enhancement method and apparatus, and storage medium
CN110659023A (en) * 2019-09-11 2020-01-07 Tencent Technology (Shenzhen) Company Limited Method for generating programming content and related device
CN111179196A (en) * 2019-12-28 2020-05-19 Hangzhou Dianzi University Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN111179196B (en) * 2019-12-28 2023-04-18 Hangzhou Dianzi University Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN111325221A (en) * 2020-02-25 2020-06-23 Qingdao National Laboratory for Marine Science and Technology Development Center Image feature extraction method based on image depth information
CN111325221B (en) * 2020-02-25 2023-06-23 Qingdao Marine Science and Technology Center Image feature extraction method based on image depth information
CN111489321A (en) * 2020-03-09 2020-08-04 Huaiyin Institute of Technology Depth network image enhancement method and system based on derivative graph and Retinex
CN113034353A (en) * 2021-04-09 2021-06-25 Xi'an University of Architecture and Technology Essential image decomposition method and system based on cross convolution neural network

Also Published As

Publication number Publication date
CN108764250B (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN108764250A (en) A method of extracting essential image with convolutional neural networks
CN109815893B (en) Color face image illumination domain normalization method based on cyclic generation countermeasure network
CN105069746B (en) Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN111046964B (en) Convolutional neural network-based human and vehicle infrared thermal image identification method
WO2022001236A1 (en) Three-dimensional model generation method and apparatus, and computer device and storage medium
CN112699847A (en) Face characteristic point detection method based on deep learning
CN111445582A (en) Single-image human face three-dimensional reconstruction method based on illumination prior
CN111861906B (en) Pavement crack image virtual augmentation model establishment and image virtual augmentation method
CN111489301B (en) Image defogging method based on image depth information guidance for transfer learning
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
Cao et al. Ancient mural restoration based on a modified generative adversarial network
CN102073995B (en) Color constancy method based on texture pyramid and regularized local regression
CN110555908A (en) three-dimensional reconstruction method based on indoor moving target background restoration
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
Yan et al. Monocular depth estimation with guidance of surface normal map
CN109657567A (en) Weakly supervised feature analysis method and system based on 3D fingerprint images
CN113256494B (en) Text image super-resolution method
CN109740539A (en) 3D object recognition method based on extreme learning machine and fused convolutional network
CN111402403B (en) High-precision three-dimensional face reconstruction method
CN113570658A (en) Monocular video depth estimation method based on deep convolutional network
CN108388901B (en) Collaborative significant target detection method based on space-semantic channel
CN110363218A (en) Non-invasive embryo assessment method and device
CN110503113A (en) Salient object detection method based on low-rank matrix recovery
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN110727817B (en) Three-dimensional model retrieval method based on t-CNN, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant