CN106339753A

CN106339753A - Method for effectively enhancing robustness of convolutional neural network

Info

Publication number: CN106339753A
Application number: CN201610682828.9A
Authority: CN
Inventors: 田新梅; 沈旭; 何岸峰; 孙韶言; 陶大程
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2016-08-17
Filing date: 2016-08-17
Publication date: 2017-01-18

Abstract

The invention discloses a method for effectively enhancing the robustness of a convolutional neural network. The method comprises the steps that random two-dimensional transformation is performed on an input characteristic graph in forward transmission and then convolutional operation is performed, and finally the prediction value of the classification result of the input characteristic graph is obtained; backward transmission is performed on the error between the obtained prediction value and a real value in backward transmission, the corresponding gradient of the parameters of each layer is calculated and then the parameters are updated along the reverse direction of the gradient; the forward transmission process and the backward transmission process are repeated for the preset number of times and then a model of the optimal parameters is obtained; and the model of the optimal parameters is tested in testing so that the robustness of the convolutional neural network can be enhanced. When the method is used for image classification and retrieval, layer-by-layer random two-dimensional transformation is introduced in the training process with no requirement for introducing a new characteristic extraction module, new parameters or performing extra processing on the input image so that the robustness of the convolutional neural network can be effectively enhanced and the complexity of image processing can be reduced.

Description

A kind of method of effective lifting convolutional neural networks robustness

Technical field

The present invention relates to image classification, image retrieval technologies field, more particularly, to a kind of effectively lifting convolutional neural networks The method of robustness.

Background technology

In today of the Internet high speed development, the especially popularization of image/video so that we be all the time required for into Row image recognition and retrieval.Depth learning technology achieved breakthrough progress in the association area of image recognition in the last few years, Greatly surmount the performance of traditional algorithm, the accuracy of identification is greatly improved.Carry out figure in deep learning The model adopting as identification is main is convolutional neural networks, and this model mainly contains convolution and two operations of pondization, by this A deep neural network being successively superimposed is built in two kinds of operations, realizes from local to the overall situation, specific to abstract successively semanteme Extract, the higher level of abstraction semantic feature finally giving is highly useful for the image recognition inter-related task such as image classification and retrieval.

Existing convolutional neural networks structure is not especially sane for the conversion of image, and such as, image is through rotating, putting down After the primarily two-dimensional conversion such as moving, scale, then carry out feature representation by convolutional neural networks, we can find that it extracts High-level characteristic difference can be very big, directly result in recognition accuracy and robustness drastically decline.

In order to improve the robustness that convolutional neural networks convert for image, particularly than large scale, overall situation change Change, existing algorithm mainly has three kinds.The first is in the training process, artificially image is carried out multiple different conversion with Produce more training samples, then the sample after artificial conversion is inputted convolutional neural networks together plus original sample It is trained, so, increased the multiformity of sample so that convolutional neural networks have more fully for the conversion of image Study, natural energy brings the lifting of robustness.Second method is the output of each layer of convolutional layer for convolutional neural networks (we term it characteristic pattern) carries out the scaling of multiple yardsticks or the rotation of multi-angle, the result then producing these conversion Carry out synthesis, be then further continued for passing to next layer.The third method is inputting an image into before convolutional neural networks go, Learn the reasonable conversion of image first with the special neutral net of another one, image is first carried out according to the conversion acquired inverse Map function, be allowed to be in one more normal, be more prone on the change of scale that distinguishes, so also can obtain carrying of effect Rise.Above-mentioned three kinds of algorithms, during learning model, in order to lift robustness, or introducing new characteristic extracting module, Increase new learning parameter, or needing to do extra process to the image of input, this allows at large-scale image In reason, complexity is especially high, and when the model training is applied in new problem, its generalization ability also can be subject to Impact.

Content of the invention

It is an object of the invention to provide a kind of method of effective lifting convolutional neural networks robustness, can effectively lift volume Long-pending neutral net robustness, and reduce the complexity of image procossing.

The purpose of the present invention is achieved through the following technical solutions:

A kind of method of effective lifting convolutional neural networks robustness, comprising:

In forward direction transmittance process, random two-dimensional transform is carried out to input feature vector figure and carries out convolution operation again, finally give The predictive value of the classification results of input feature vector figure；

During back transfer, the error between the predictive value obtaining and actual value is carried out back transfer, calculates The corresponding gradient of parameter of each layer, then parameter is updated along the opposite direction of gradient；

Repeat above-mentioned forward direction transmittance process with back transfer process until after pre-determined number, obtaining the model of optimal parameter；

In test process, the model of optimal parameter is tested, thus lifting convolutional neural networks robustness.

Described input feature vector figure carried out with random two-dimensional transform carry out convolution operation again including:

If random two-dimensional transform isThen random two-dimensional transform is carried out to the characteristic pattern x of inputIt is expressed as:

Carry out convolution operation again, obtain characteristic pattern y, its expression formula is as follows:

y = f ((w * \hat{x}) + b);

Wherein, * represents convolution operation, and w represents convolution kernel, and b represents the bias variable of characteristic pattern y.

The predictive value of the described classification results finally giving input feature vector figure includes:

If comprising five convolutional layer c1～c5 being sequentially connected in convolutional neural networks, and three full articulamentum fc6～ fc8；Then in each convolutional layer, input feature vector figure is carried out with random two-dimensional transform and carries out convolution operation, wherein previous convolution again The characteristic pattern of layer output is as the input feature vector figure of next convolutional layer；After the characteristic pattern of last convolutional layer output is transformed It is drawn into a characteristic vector, and is input to full articulamentum fc6, finally exported the vector of a p dimension by full articulamentum fc8, often The size of one-dimensional vector represents the probability size that input feature vector figure belongs to respective classes；Take the maximum classification of probit as input The predictive value of characteristic pattern.

Described random two-dimensional transform is 3 × 3 transformation matrix, and its expression formula is as follows:

Wherein, d_x、d_yIt is respectively the quantity of the pixel that x, y side offset up, s_x、s_yIt is respectively the scaling on x, y direction Ratio, θ is the anglec of rotation；θ、d_x、d_y、s_x、s_yObedience average is μ, and variance is σ²Gauss distribution.

Using x^l-1Represent the vector of a length of m that the output characteristic figure of l-1 layer launches, if through two dimension at random ConversionAfterwards, output is changed into the vector of n dimension, then be equivalent to x^l-1Coefficient matrix tx after one interpolation of premultiplication^l-1, pass through Convolution kernel w is launched into the form of toeplitz matrix, then obtains convolution operation and be expressed as:

z^l=toep (w) (tx^l-1)+b^l；

x^l=f (z^l)

Wherein, b^lFor the bias variable of l layer, t is to convert corresponding coefficient matrix, and w is convolution kernel, z^lFor the input of l layer Linear combination result, f be nonlinear mapping function, x^lOutput characteristic figure for l layer；

After obtaining the output characteristic figure of l layer, by the error delta between l+1 layer predictive value and actual value^l+1Pass to The δ of l layer^l, calculate the corresponding gradient of parameter of each layer, then parameter be updated along the opposite direction of gradient, its table It is shown as:

δ^l=t^ttoep(w)^tf'(z^l)δ^l+1.

As seen from the above technical solution provided by the invention, when the program is used for image classification and retrieval, Jin Jinxu To introduce random two-dimensional transform successively in the training process, new characteristic extracting module, new parameter or right need not be introduced Input picture does extra process, thus effectively lifting convolutional neural networks robustness, and reduces the complexity of image procossing.

Brief description

In order to be illustrated more clearly that the technical scheme of the embodiment of the present invention, below will be to required use in embodiment description Accompanying drawing be briefly described it should be apparent that, drawings in the following description are only some embodiments of the present invention, for this For the those of ordinary skill in field, on the premise of not paying creative work, other can also be obtained according to these accompanying drawings Accompanying drawing.

Fig. 1 is the convolutional neural networks structural representation being applied to image classification provided in an embodiment of the present invention；

Fig. 2 is the schematic diagram of traditional convolution operation provided in an embodiment of the present invention and convolution operation of the present invention；

Fig. 3 is the experimental result on mnist data set provided in an embodiment of the present invention；

Fig. 4 is the experimental result on ilsvrc-2012 data set provided in an embodiment of the present invention；

Fig. 5 is the experimental result on uk-bench data set provided in an embodiment of the present invention.

Specific embodiment

With reference to the accompanying drawing in the embodiment of the present invention, the technical scheme in the embodiment of the present invention is carried out clear, complete Ground description is it is clear that described embodiment is only a part of embodiment of the present invention, rather than whole embodiments.Based on this Bright embodiment, the every other enforcement that those of ordinary skill in the art are obtained under the premise of not making creative work Example, broadly falls into protection scope of the present invention.

The embodiment of the present invention provides a kind of method of effective lifting convolutional neural networks robustness, and the method is mainly in volume The characteristic pattern in the learning process of long-pending neutral net, each layer of convolution being obtained enters row stochastic two-dimensional transform, main inclusion rotation Turn, translate, scaling etc..

Convolutional neural networks (cnn) are a kind of deep neural network structures of multilamellar, and each layer of convolutional layer passes through feature Extract operator to be applied on the characteristic pattern that last layer obtains, and learn to various various level character representations；As Fig. 1 institute Show, c_iThe output representing obtained by i-th layer of convolution is characteristic pattern.fc_iRepresent the output of i-th layer of full articulamentum, last layer of fc8 Can be used for the input sample classification task of 1000 classes for 1000 dimensions.

Usually, each layer of convolutional layer is carried out according to the weight of convolution to the fritter of each of input feature vector figure local Then linear combination obtain one of output characteristic point through nonlinear mapping again.And the output of last layer is (in Fig. 1 Fc8 grader) will be imported into or return the device corresponding object function of calculating, then use gradient descent method again to network Parameter is adjusted.

The core operation of convolutional layer is each of the output atom atom adjacent with local one fritter of last layer It is connected, we term it local receptor field mechanism.The weight connecting, we term it convolution kernel.With a convolution kernel to defeated Enter to carry out output (2d) that convolution obtains we term it characteristic pattern.In the characteristic pattern of same input, different local fritter is in volume In long-pending operation, corresponding weight is identical, we term it weights are shared.Convolution operation calculating process can represent As follows:

x_{j}^{l} = f ((w_{j}^{l} * x^{l - 1}) + b_{j}^{l})

Wherein, * represents convolution operation, x^l-1It is input feature vector figure,Represent j-th convolution kernel,Represent j-th input The bias variable of characteristic pattern, both need to be learnt by gradient descent method afterwards.

In the embodiment of the present invention, in order to make convolutional layer more sane to the conversion of input, in learning process, calculate every time All the input feature vector figure of convolutional layer is entered with row stochastic conversion (main inclusion translation, scaling, rotation) first, then will become at random The result changed carries out convolution and is exported.That compares novelty with traditional convolutional layer introduces random operation, such random Operation make in each iterative learning procedure of model, can not specifically know the corresponding yardstick of input feature vector figure, skew, The situation of rotation, thus its feature of learning out is also just lower to the dependency of Input transformation accordingly, from obtained from defeated Go out characteristic pattern also just more sane to the conversion inputting.More intuitively compare and ask for an interview Fig. 2, Fig. 2 a is traditional convolutional layer, Fig. 2 b is The convolutional layer that the scheme that the present invention is implemented proposes, before being input to next layer network, is carried out to output characteristic figure first at random Map function.

In the embodiment of the present invention, the described characteristic pattern that each layer of convolution is obtained enters row stochastic two-dimensional transform mainly to be had Three types: translation, rotation and scaling.If before and after conversion in image the coordinate of pixel be expressed as (x, y) and (x', y').

Then the computing formula of translation transformation is:

(x^{'}, y^{'}, 1) = (x, y, 1) (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ d_{x} & d_{y} & 1 \end{matrix});

Scaling computing formula is:

(x^{'}, y^{'}, 1) = (x, y, 1) (\begin{matrix} s_{x} & 0 & 0 \\ 0 & s_{y} & 0 \\ 0 & 0 & 1 \end{matrix});

The computing formula of rotation is:

(x^{'}, y^{'}, 1) = (x, y, 1) (\begin{matrix} c o s θ & s i n θ & 0 \\ - s i n θ & c o s θ & 0 \\ 0 & 0 & 1 \end{matrix});

Wherein, d_x、d_yIt is respectively the quantity of the pixel that x, y side offset up, s_x、s_yIt is respectively the scaling on x, y direction Ratio, θ is the anglec of rotation.

Above-mentioned translation, scaling and rotation are applied to image simultaneously, then can be calculated in the following way:

(x^{'}, y^{'}, 1) = (x, y, 1) (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ d_{x} & d_{y} & 1 \end{matrix}) (\begin{matrix} s_{x} & 0 & 0 \\ 0 & s_{y} & 0 \\ 0 & 0 & 1 \end{matrix}) (\begin{matrix} c o s θ & s i n θ & 0 \\ - s i n θ & c o s θ & 0 \\ 0 & 0 & 1 \end{matrix}) .

In the embodiment of the present invention, to input signal transmission and reverse error propagation process before mainly comprising in training process, Three aspects of test below for forward direction transmission, back transfer and model are introduced.

1st, in forward direction transmittance process, input feature vector figure is carried out with random two-dimensional transform and carries out convolution operation again, final Predictive value to the classification results of input feature vector figure.

Assume that random two-dimensional transform isThe characteristic pattern x then inputting carries out random two-dimensional transformIt is expressed as:

y = f ((w * \hat{x}) + b);

In formula, d_x、d_yIt is respectively the quantity of the pixel that x, y side offset up, s_x、s_yIt is respectively the scaling on x, y direction Ratio, θ is the anglec of rotation.

Wherein,Its implication is θ, d_x、d_y、s_x、 s_yAll obeying average is μ, and variance is σ²Gauss distribution；D during forward direction computing each time_x、d_y、s_x、s_y, θ all passes through Above-mentioned Gaussian process randomly generates.

In the embodiment of the present invention, if comprising as shown in Figure 1 in convolutional neural networks, five being sequentially connected convolutional layer c1 ～c5, and three full articulamentum fc6～fc8；Then in each convolutional layer, random two-dimensional transform is carried out again to input feature vector figure Carry out convolution operation, the characteristic pattern y that wherein previous convolutional layer exports is as the input feature vector figure of next convolutional layer；By The transformed post-tensioning of input feature vector figure after five layers of convolutional layer is stretched into a characteristic vector and then inputs full articulamentum fc6, Quan Lian Meet a layer fc7, full articulamentum fc8.The output result of fc8 is the vector of 1 p dimension (such as 1000 dimension), the size of every one-dimensional vector Representing input images belong to the probability size of the category, take the maximum classification of probit as the predictive value of input picture.

2nd, during back transfer, the error between the predictive value obtaining and actual value is carried out back transfer, calculate Go out the corresponding gradient of parameter of each layer, then parameter is updated along the opposite direction of gradient.

Using x^l-1Represent that (m=h × w × c, h are special for the vector of a length of m that the output characteristic figure of l-1 layer launches Levy the height of figure, w is the width of characteristic pattern, c is the number of characteristic pattern).

If through random two-dimensional transformAfterwards, output is changed into the vector of n dimension, then be equivalent to x^l-1Premultiplication one is inserted Size after value is m × n coefficient matrix tx^l-1, particularly, if if bilinear interpolation, every a line of t both corresponds to 4 The interpolation coefficient of individual non-zero.

By convolution kernel w being launched into the form of toeplitz matrix, then obtain convolution operation and be expressed as:

z^l=toep (w) (tx^l-1)+b^l；

x^l=f (z^l)；

After obtaining the output characteristic figure of l layer, can be by the error delta between l+1 layer predictive value and actual value^l+1Pass Pass the δ of l layer^l, calculate the corresponding gradient of parameter of each layer, then parameter be updated along the opposite direction of gradient, It is expressed as:

δ^l=t^ttoep(w)^tf'(z^l)δ^l+1.

3rd, above-mentioned forward direction transmittance process and back transfer process are repeated until after pre-determined number, obtaining the mould of optimal parameter Type.

4th, in test process, the model of optimal parameter is tested, thus lifting convolutional neural networks robustness.

The scheme of the embodiment of the present invention, in the training process according to carrying out step on the basis of traditional convolutional neural networks Forward and backward transmittance process described in 1～2 is modified accordingly, and do not need in test process to existing structure and Corresponding operating makes any modification it is only necessary to convolution nuclear parameter in convolutional Neural networking is updated to the convolution of new stochastic transformation The parameter of neural network model can (i.e. convolution kernel w and bias variable b) in optimal parameter model.That is, due to Two steps above do not introduce new structure and parameter, only need to the parameter of conventional model during test uses Replace with the new model parameter that obtains of training, without carrying out any other operation.

In embodiment of the present invention such scheme, by the conversion being likely to occur in same characteristic pattern in training convolutional nerve net It is applied at random in the input of each convolutional layer of neutral net during network, so that neutral net has for input More insensitive characteristic is converted present in image.

The such scheme of the embodiment of the present invention, for image classification and retrieval when it is thus only necessary to introduce in the training process Random two-dimensional transform successively, need not introduce new characteristic extracting module, new parameter or input picture is done with extra place Reason, thus effectively lifting convolutional neural networks robustness, and reduces the complexity of image procossing.

When such scheme is used for image classification and retrieval, mainly there are two big steps: training and prediction.

Training process is aforesaid step 1～4: inputs an image into first in network, by front to transmittance process, obtains For the predictive value of input picture classification results, then corresponding error relatively calculated according to predictive value and actual value, then This error is carried out back transfer, calculates the corresponding gradient of parameter of each layer, then by parameter along gradient negative side To being updated, thus reaching the purpose reducing forecast error.So circularly carry out forward and backward transmittance process, arrive After reaching pre-determined number, the error of training data can tend towards stability, and the model now obtaining is the model with optimal parameter, then enters Complete training process after row test, can be used for classification and the retrieval of image.

Prediction process: given any one image, convolutional Neural net can be obtained to computing before carrying out according to model parameter The corresponding output characteristic figure of each convolutional layer in network, and the prediction output of last layer, the prediction output of last layer can be used In image classification, the characteristic pattern of output can be used for image retrieval.

On the other hand, the convolutional network sane enough in order to verify the conversion to input picture of embodiment of the present invention proposition Effectiveness.Test, experimental results will be compared on three data sets below.And and traditional convolutional neural networks and Other networks sane to conversion contrast.Traditional convolutional neural networks include the cnn being trained in artwork, Cnnwith data augmentation, two kinds of other cnns sane to input picture conversion include scale invariant Convolutional neural network (si-cnn) and spatial transformer networks (st-cnn).This A little methods are all used negative log likelihood (nll) as loss function.

Three data sets are mnist, ilsvrc-2012 and uk-bench respectively.Mnist contains 60000 and is used for instructing The hand-written character picture practiced and 10000 hand-written character training pictures being used for test.Every pictures are sizes is 28 × 28 Cromogram, picture is always divided into 10 classes.Using error in classification as measurement index on this data set.Ilsvrc-2012 data set A total of 1.3m opens training picture, 50,000 checking pictures, and 100,000 test pictures.Picture is altogether according to wherein object Classification is divided into 1000 classes, and each class picture is probably at 1000 about.Evaluation index is classification accuracy.Uk-bench is to use In the data set of retrieval, it comprises 2550 groups of pictures, every group of picture has 4, they are all to close a scene or object Similar pictures.The inquiry picture all as retrieval for this 10200 pictures all, the then similarity sequence according to feature, from this 4 most like pictures are returned as retrieval result in a little pictures.Accuracy according to returning picture can calculate retrieval knot The accuracy score of fruit.

Mnist and ilsvrc-2012 enters to image according to the image category of last layer of output of convolutional neural networks Row classification, then calculates the error rate of classification；In uk-bench, carried out according to the feature that convolutional neural networks intermediate layer is extracted The comparison of similarity between image, then returns the result most like with retrieval image according to sequencing of similarity, finally calculates and return Return the similarity score of result.As in Figure 3-5, wherein, r (rotation) refers to Random-Rotation to experimental result, and s (scale) refers to Random scaling, t (translation) refers to random translation, and rts refers to random rotation+scaling+translation.In Fig. 3, fcn refers to entirely connect Connect neutral net (fully connected neural network), its middle two-layer is full articulamentum.Fig. 4 a is test Collection image passes through the class test result after conversion without the class test result of any conversion, Fig. 4 b for test set image, For testing the robustness that convolutional neural networks convert for image.Uk-bench compares cnn and our utilizations of classics The volume that Random-Rotation (r), random scaling (s), random translation (t), Random-Rotation+random scaling+random translation (rts) train The result of long-pending neutral net.The method (ti-cnn) of lifting convolutional neural networks robustness provided in an embodiment of the present invention, with more Plus simply mode achieves more effectively result.

The above, the only present invention preferably specific embodiment, but protection scope of the present invention is not limited thereto, Any those familiar with the art in the technical scope of present disclosure, the change or replacement that can readily occur in, All should be included within the scope of the present invention.Therefore, protection scope of the present invention should be with the protection model of claims Enclose and be defined.

Claims

1. a kind of method of effective lifting convolutional neural networks robustness is it is characterised in that include:

In forward direction transmittance process, random two-dimensional transform is carried out to input feature vector figure and carries out convolution operation again, finally give input The predictive value of the classification results of characteristic pattern；

During back transfer, the error between the predictive value obtaining and actual value is carried out back transfer, calculates each The corresponding gradient of parameter of layer, then parameter is updated along the opposite direction of gradient；

2. a kind of method of effective lifting convolutional neural networks robustness according to claim 1 is it is characterised in that described Input feature vector figure carried out with random two-dimensional transform carry out convolution operation again including:

y = f ((w * \hat{x}) + b);

3. according to claim 1 and 2 a kind of effective lifting convolutional neural networks robustness method it is characterised in that The predictive value of the described classification results finally giving input feature vector figure includes:

If comprising five convolutional layer c1～c5 being sequentially connected in convolutional neural networks, and three full articulamentum fc6～fc8； Then in each convolutional layer, input feature vector figure is carried out with random two-dimensional transform and carries out convolution operation again, wherein previous convolutional layer is defeated The characteristic pattern going out is as the input feature vector figure of next convolutional layer；The transformed after-drawing of characteristic pattern of last convolutional layer output Become a characteristic vector, and be input to full articulamentum fc6, finally exported the vector of a p dimension by full articulamentum fc8, often one-dimensional The size of vector represents the probability size that input feature vector figure belongs to respective classes；Take the maximum classification of probit as input feature vector The predictive value of figure.

4. according to claim 1 and 2 a kind of effective lifting convolutional neural networks robustness method it is characterised in that Described random two-dimensional transform is 3 × 3 transformation matrix, and its expression formula is as follows:

Wherein, d_x、d_yIt is respectively the quantity of the pixel that x, y side offset up, s_x、s_yIt is respectively the scaling on x, y direction, θ is the anglec of rotation；θ、d_x、d_y、s_x、s_yObedience average is μ, and variance is σ²Gauss distribution.

5. according to claim 1 a kind of effective lifting convolutional neural networks robustness method it is characterised in that

Using x^l-1Represent the vector of a length of m that the output characteristic figure of l-1 layer launches, if through random two-dimensional transform Afterwards, output is changed into the vector of n dimension, then be equivalent to x^l-1Coefficient matrix tx after one interpolation of premultiplication^l-1, by by convolution Core w is launched into the form of toeplitz matrix, then obtain convolution operation and be expressed as:

z^l=toep (w) (tx^l-1)+b^l；

x^l=f (z^l)

Wherein, b^lFor the bias variable of l layer, t is to convert corresponding coefficient matrix, and w is convolution kernel, z^lLine for the input of l layer Property combined result, f be nonlinear mapping function, x^lOutput characteristic figure for l layer；

After obtaining the output characteristic figure of l layer, by the error delta between l+1 layer predictive value and actual value^l+1Pass to l layer δ^l, calculate the corresponding gradient of parameter of each layer, then parameter be updated along the opposite direction of gradient, it is expressed as:

δ^l=t^ttoep(w)^tf'(z^l)δ^l+1.