CN105550747A - Sample training method for novel convolutional neural network - Google Patents


Info

Publication number
CN105550747A
Authority
CN
China
Prior art keywords
training
sample
learning rate
convolutional neural
weighted value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510903228.6A
Other languages
Chinese (zh)
Inventor
游萌 (You Meng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN201510903228.6A
Publication of CN105550747A
Legal status: Pending

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention relates to a neural network algorithm and aims to solve the problem that sample collection and training consume a large amount of time in the overall structure design and computation of existing neural networks. The invention provides a sample training method for a novel convolutional neural network. The method comprises the following steps: determining a certain number of sample sets as the benchmark data set for training, applying moderate distortion to the training weights, and setting the initial and final learning rates of training; then training the sample sets with a second-order back-propagation learning algorithm starting from the initial learning rate, and ending training when the learning rate reaches the final learning rate. The sample training method of the invention is applicable to the sample training of neural network models.

Description

A sample training method for a novel convolutional neural network
Technical field
The present invention relates to neural network algorithms, and in particular to a sample training method for a novel convolutional neural network.
Background technology
Convolutional neural networks are an important research field in computer vision and pattern recognition: information processing systems in which a computer, taking inspiration from the thinking of the biological brain, handles particular objects in a human-like way. They are widely applied, and fast, accurate object detection and recognition is an important component of modern information processing. Because the amount of information has grown sharply in recent years, there is an urgent need for suitable detection and recognition technology that lets people find the information they need within a mass of data. Image retrieval and text recognition both fall into this category, and detection and recognition of text is a precondition for information retrieval. Detection and recognition technology is an important component of computer vision and human-computer interaction.
Convolutional neural networks are an algorithm model that has recently been widely applied in fields such as pattern recognition and computer vision, and as more algorithms are tested against real data, higher demands are placed on the generalization ability of applications. With respect to generalization in particular, a neural network consumes a large amount of time on sample collection and training over the course of a whole project, and in practical applications the effect of training depends heavily on the sample set. In a practical neural network training model, the adjustment of the connection weights follows external feedback from the output nodes: the weights are adjusted so that the actual outputs of the output nodes agree with the externally desired outputs. Compared with the overall framework of a convolutional neural network, its activation functions and its topology, the collection of samples and the choice of training regime are easily neglected; yet almost every trainable neural network has a training set that contains erroneous samples, and such factually wrong samples are imported into the training process as if they were correct values, without being corrected or distinguished by the network. As far as training-set samples are concerned, the real problem is how to make the neural network perform classification and recognition in the real world, rather than merely validate against a portion of an experimental data set.
Summary of the invention
The object of the present invention is to provide a fast sample training method for neural networks, so as to solve the problem that existing neural networks consume a large amount of time on sample collection and training over the course of a whole project.
To achieve the above object, the present invention provides a sample training method for a novel convolutional neural network, comprising the following steps:
determining a certain number of sample sets as the benchmark data set for training, applying moderate distortion to the training weights, and setting the initial and final learning rates of training;
training the sample sets with a second-order back-propagation learning algorithm starting from the initial learning rate, and ending training when the learning rate reaches the final learning rate.
Specifically, errors arise during the second-order back-propagation algorithm; the partial derivative of the error equals the difference between the actual output value of the following network layer and its target output value, and this difference is used to update the weight values.
Specifically, the method of updating a weight value is: the new weight value equals the weight value before the update minus the product of the learning rate and said difference.
Specifically, the initial learning rate is 0.001 and the final learning rate is 0.00005.
Optionally, the methods of distorting the training weights include scaling, reversal, and elastic deformation.
Specifically, the weight values of the network's neuron nodes are checked; when a weight value is much larger or much smaller than the range covered by the learning-rate decline, that weight value is discarded.
The beneficial effects of the invention are as follows: the method of the present invention improves the training effect of a convolutional neural network to the greatest possible extent without occupying large computational resources, by tuning parameters such as the learning rate and the number of iterations in the training algorithm and by keeping a sufficient number of training and testing samples, so that subsequent experiments and simulations can carry out classification and recognition of handwritten digits. The convolutional neural network model optimized on distorted character-set samples achieves high-accuracy recognition under incomplete sampling or in high-noise environments with disturbed samples. With generalization markedly improved, the method also broadens the applicable range of pattern recognition and computer vision for target detection and object recognition. The optimization of network learning, training and sample collection based on this novel convolutional neural network raises the basic engineering level of intelligent household appliance products, improving their intelligence and generalization in visual interaction so as to obtain a better user experience in actual product use.
Embodiment
The technical scheme of the present invention is described in further detail below.
The present invention solves the problem that existing neural networks consume a large amount of time on sample collection and training over the course of a whole project, and provides a sample training method for a novel convolutional neural network, comprising the following steps:
determining a certain number of sample sets as the benchmark data set for training, applying moderate distortion to the training weights, and setting the initial and final learning rates of training;
training the sample sets with a second-order back-propagation learning algorithm starting from the initial learning rate, and ending training when the learning rate reaches the final learning rate.
Errors arise during the second-order back-propagation algorithm; the partial derivative of the error equals the difference between the actual output value of the following network layer and its target output value, and this difference is used to update the weight values. The method of updating a weight value is: the new weight value equals the weight value before the update minus the product of the learning rate and said difference. The methods of distorting the training weights include scaling, reversal, and elastic deformation.
The initial learning rate is 0.001 and the final learning rate is 0.00005. The weight values of the network's neuron nodes are checked; when a weight value is much larger or much smaller than the range covered by the learning-rate decline, that weight value is discarded.
The technical solution adopted by the present invention is as follows:
First, a benchmark data set of 10,000 patterns is used for training. The back-propagation algorithm processes them continuously in a random sequence, and each epoch over the 10,000 patterns is simply called one iteration. The aim of training is to improve the generalization performance of the neural network. In the scheme set forth in this patent, the training algorithm lets the weights jump within a larger neighbourhood, avoiding to the greatest extent the weight characteristics contributed by erroneous samples. For the input training-set patterns in particular, no matter how carefully the data are collected and how much preliminary work is redone at the cost of time and manpower, the samples will never match the attribute characteristics of the real world exactly. The single most important technique for improving generalization is therefore to change the training weights moderately: each training pattern is slightly distorted before the network's back-propagation pass, and every iteration of the network recomputes the weights. Seen from another angle, this amplifies the training set by distortion: through repeated iterative feedback, new training patterns are created artificially from existing ones, and each suitably distorted pattern is then back-propagated again at random. Three distortion types are applied: scaling, reversal, and elastic deformation.
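In handwriting-recognition practice, distortions of this kind are applied to the input patterns before each back-propagation pass, which matches the description above of creating new training patterns artificially from existing ones. Below is a minimal sketch under that reading; the function name, the NumPy/SciPy usage, and all parameter values are illustrative assumptions, not taken from the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def distort_pattern(img, max_scale=0.15, flip_prob=0.5,
                    alpha=8.0, sigma=4.0, rng=None):
    """Apply the three distortion types named in the patent (scaling,
    reversal, elastic deformation) to one training pattern.
    All parameter values here are illustrative assumptions."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape

    # Scaling: resample the image on a grid zoomed by a random factor.
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ys = (ys - h / 2) / scale + h / 2
    xs = (xs - w / 2) / scale + w / 2

    # Elastic deformation (Simard-style): smooth random displacement
    # fields; alpha controls strength, sigma controls smoothness.
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    out = map_coordinates(img, [ys + dy, xs + dx], order=1, mode='constant')

    # Reversal: random horizontal flip.
    if rng.random() < flip_prob:
        out = out[:, ::-1]
    return out
```

Each call yields a new artificial variant of a pattern, so repeated iterations effectively enlarge the 10,000-pattern benchmark set as described above.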
Second, after suitably changing the training weights, other parameters must be set to further support the training results; the core item is the setting of the learning rate. The training process must specify an initial learning rate and a final learning rate. In general the initial value should be on the larger side, because a large learning rate produces large changes in the repeated feedback computations. During training the learning rate descends slowly along the gradient, and owing to the learning mechanism of the neural network this makes the weights converge to their final values. The learning rate must never be allowed to fall too far, however: below a certain value the network stops training. That is precisely why both an initial and a final learning rate are specified. For the initial value of the learning rate, this patent suggests using no more than about 0.001: a larger value changes the weights too quickly and in practice causes them to diverge rather than converge. Most neural network training treats the learning rate as a function of the iteration count. Our program takes the current learning rate as the first factor, letting it decline continuously during computation, and takes a limit on the number of iterations allowed within the available time as the second. Concretely, our training system starts with a learning rate of 0.001, which is comparatively large; the learning rate then decays to 79% of its value after every two iterations, and the final learning rate is set to 0.00005, so at this rate of decline the target value is reached, and training ends, within about 25-27 iterations.
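The schedule just described pins down the iteration count by simple arithmetic: starting at 0.001 and multiplying by 0.79 after every two iterations reaches 0.00005 after 13 decay steps, i.e. about 26 iterations, consistent with the 25-27 stated above. A small sketch of that computation (the variable names are ours, not the patent's):

```python
import math

initial_lr, final_lr, decay, period = 1e-3, 5e-5, 0.79, 2

# Number of decay steps k needed so that initial_lr * decay**k <= final_lr:
k = math.ceil(math.log(final_lr / initial_lr) / math.log(decay))
print(k, k * period)          # -> 13 decay steps, i.e. 26 iterations

# Per-iteration schedule that a training loop would follow:
lr, schedule = initial_lr, []
for it in range(1, k * period + 1):
    schedule.append(lr)
    if it % period == 0:
        lr = max(lr * decay, final_lr)   # decay to 79% every two iterations
```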
Third, to make the training process run faster, a second-order back-propagation learning algorithm is used for training. The second-order method matters most for accelerating the convergence of the neural network, because it greatly reduces the number of passes required for the weights to converge; second-order back-propagation therefore reduces the total time needed for training.
In the back-propagation process, the partial derivative of the error equals the actual output value of the following network layer minus its target output value; this result is the quantity used to update the weight change, and the error signal of a hidden-layer neuron is determined recursively from the error signals of all hidden-layer neurons directly connected to it. To speed convergence, the following computation formula is proposed:

$$\frac{\partial E_n^p}{\partial w_n^{ij}} = x_{n-1}^j \cdot \frac{\partial E_n^p}{\partial y_n^i} \quad \text{(Formula 1)}$$
In the formula, n denotes a node of the n-th network layer. The tanh activation function is involved here, and tanh was selected precisely because its derivative is easy to obtain. This yields the change values of the weights at the current network layer. For the error of the preceding layer, the following formula is then used:

$$\frac{\partial E_{n-1}^p}{\partial x_{n-1}^k} = \sum_i w_n^{ik} \cdot \frac{\partial E_n^p}{\partial y_n^i} \quad \text{(Formula 2)}$$
The value obtained from Formula 1 enters Formula 2, whose value serves as the initial error value when computing the preceding layer; at the same time it tells us how to compute the weight change sought at the current layer, taking the error propagated through the whole network and the learning rate as the basis of the computation. Concretely, the weight values are updated with the formula below:

$$(w_n^{ij})_{\text{new}} = (w_n^{ij})_{\text{old}} - \eta \cdot \frac{\partial E_n^p}{\partial w_n^{ij}} \quad \text{(Formula 3)}$$

where η is the learning rate, normally a very small number that decreases gradually during training, with the final learning rate set to 0.00005. It is Formula 3 that changes the values of the weights. With the weights held constant, the learning-rate setting can be tuned by trial and error; briefly, the learning rate η can be scaled by a constant factor.
Finally, to further speed up convergence, we adopt a strategy of discarding a small number of weights; Formula 2 can then be solved directly. If verification of the network's neuron nodes finds that the weight a given thread is using is an erroneous value, typically much larger or much smaller than the parameter range of the learning-rate decline, the situation can reasonably be ignored: propagation of the currently wrong weight is skipped, which also avoids repeated computation and time overhead. The smaller the error at each layer, the more reasonable it is to ignore such invalid values, as one strategy for improving the training results of the neural network. For a given sample set, some training networks exhibit only very small errors; for example, the digit and character sample transformations used in this patent carry only very small acquisition-phase errors, and with a data collection of 60,000 test samples a few mistakes are insignificant and make no difference to the final result of the whole training process. With few erroneous samples the error is very small, and the weights will not change much in response. According to actual tests, in a single-threaded neural computation our training algorithm processes 12-15 sample patterns (varying slightly with the performance of the specific computer hardware); because the well-performing algorithm's running time is comparatively short, only 12-18 hours are needed for a computation volume of nearly 1,000,000 samples, so the error back-propagation of a few mistakes has an extremely limited effect on the weights. With the small-weight discard strategy added, the effect can be regarded as negligible. Thus, without affecting the fast convergence of the network at all, the training precision of the algorithm is further guaranteed.
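One reading of this discard strategy is a guard inside the Formula 3 update that skips any weight whose pending step is wildly out of scale with the learning-rate decline. A minimal sketch under that assumption; the threshold is illustrative, not a value given in the patent:

```python
import numpy as np

def guarded_update(w, dE_dw, eta, tol=100.0):
    """Formula 3 with the discard strategy: an update step wildly out of
    scale with the learning-rate decline is treated as coming from an
    erroneous weight and skipped. `tol` is an illustrative threshold,
    not a value from the patent."""
    step = eta * dE_dw
    ok = np.abs(step) < tol * eta      # within a plausible decline range
    return np.where(ok, w - step, w)   # skipped weights stay unchanged
```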
Regarding the configuration of the training parameters and the optimal performance of the training sample set under test, this section briefly discloses the research experience gained while constructing the neural network of this patent. The evaluation standard is the squared error over the training set. The purpose of the first experiment was to determine whether the training process needs distortion at all: many network models iterate with no weight distortion or other artificial intervention, whereas this patent proposes, and seeks protection for, such a method. With an actual test-set size of 10,000 samples, about 140 were misidentified, an error rate of 1.40%. Although the error rate is low, room for optimization remains, which indicates the need for distorted weights; without them the convolutional neural network cannot generalize its learning behaviour effectively. Once the need for distortion is established, the network's training and generalization ability must be improved, and the focus of testing turns to optimizing the configuration of the distortion applied during training. Too much distortion prevents the network from reaching a stable point for its weights: with large weight distortion, the squared error on the training set is still high at the end of each iteration and no suitable stationary value is reached. On the other hand, too little weight distortion prevents the neural network from generalizing to the test set. In short, the distortion parameters are a trade-off between performance and stability.
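The evaluation criterion used here, squared error on the training set together with the error rate on the test set, is straightforward to reproduce when comparing distortion configurations; a small sketch with illustrative names (e.g. 140 wrong out of 10,000 gives the 1.40% quoted above):

```python
import numpy as np

def evaluate(train_outputs, train_targets, test_predictions, test_labels):
    """Score one distortion configuration the way this section describes:
    squared error on the training set, error rate on the test set."""
    train_mse = float(np.mean((train_outputs - train_targets) ** 2))
    test_error = float(np.mean(test_predictions != test_labels))
    return train_mse, test_error   # e.g. 140/10000 wrong -> 0.0140
```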

Claims (6)

1. A sample training method for a novel convolutional neural network, characterized by comprising the following steps:
determining a certain number of sample sets as the benchmark data set for training, applying moderate distortion to the training weights, and setting the initial and final learning rates of training;
training the sample sets with a second-order back-propagation learning algorithm starting from the initial learning rate, and ending training when the learning rate reaches the final learning rate.
2. The sample training method for a novel convolutional neural network according to claim 1, characterized in that errors arise during the second-order back-propagation algorithm; the partial derivative of the error equals the difference between the actual output value of the following network layer and its target output value, and this difference is used to update the weight values.
3. The sample training method for a novel convolutional neural network according to claim 2, characterized in that the method of updating a weight value is: the new weight value equals the weight value before the update minus the product of the learning rate and said difference.
4. The sample training method for a novel convolutional neural network according to claim 1, 2 or 3, characterized in that the initial learning rate is 0.001 and the final learning rate is 0.00005.
5. The sample training method for a novel convolutional neural network according to claim 4, characterized in that the methods of distorting the training weights include scaling, reversal, and elastic deformation.
6. The sample training method for a novel convolutional neural network according to claim 5, characterized in that the weight values of the network's neuron nodes are checked, and when a weight value is much larger or much smaller than the range covered by the learning-rate decline, that weight value is discarded.
CN201510903228.6A 2015-12-09 2015-12-09 Sample training method for novel convolutional neural network Pending CN105550747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510903228.6A CN105550747A (en) 2015-12-09 2015-12-09 Sample training method for novel convolutional neural network


Publications (1)

Publication Number Publication Date
CN105550747A (en) 2016-05-04

Family

ID=55829928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510903228.6A Pending CN105550747A (en) 2015-12-09 2015-12-09 Sample training method for novel convolutional neural network

Country Status (1)

Country Link
CN (1) CN105550747A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014049118A (en) * 2012-08-31 2014-03-17 Fujitsu Ltd Convolution neural network classifier system, training method for the same, classifying method, and usage
CN102968663A (en) * 2012-11-29 2013-03-13 河海大学 Unmarked sample-based neutral network constructing method and device
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
CN104077595A (en) * 2014-06-15 2014-10-01 北京工业大学 Deep belief network image recognition method based on Bayesian regularization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
康明 (Kang Ming): "Research on Handwritten Digit Recognition Technology" [手写体数字识别技术研究], China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754078A (en) * 2017-11-03 2019-05-14 三星电子株式会社 Method for optimization neural network
CN108154239A (en) * 2017-12-27 2018-06-12 郑州云海信息技术有限公司 A kind of machine learning method and its device
CN108764489A (en) * 2018-06-05 2018-11-06 北京百度网讯科技有限公司 Model training method based on virtual sample and equipment
CN109670577A (en) * 2018-12-14 2019-04-23 北京字节跳动网络技术有限公司 Model generating method and device
CN109919931A (en) * 2019-03-08 2019-06-21 数坤(北京)网络科技有限公司 Coronary stenosis degree evaluation model training method and evaluation system
CN112115974A (en) * 2020-08-18 2020-12-22 郑州睿如信息技术有限公司 Intelligent visual detection method for classification treatment of municipal waste
CN112115974B (en) * 2020-08-18 2024-04-09 郑州睿如信息技术有限公司 Intelligent visual detection method for urban garbage classification treatment

Similar Documents

Publication Publication Date Title
CN105550747A (en) Sample training method for novel convolutional neural network
JP6275868B2 (en) Neural watchdog
JP6092477B2 (en) An automated method for correcting neural dynamics
US9558442B2 (en) Monitoring neural networks with shadow networks
JP2017509951A (en) Construct a sparse neural network
KR20160123309A (en) Event-based inference and learning for stochastic spiking bayesian networks
JP2017509982A (en) In-situ neural network coprocessing
CN104636801A (en) Transmission line audible noise prediction method based on BP neural network optimization
CN104636985A (en) Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network
KR20160084401A (en) Implementing synaptic learning using replay in spiking neural networks
CN106796667B (en) Dynamic spatial target selection
KR20160076531A (en) Evaluation of a system including separable sub-systems over a multidimensional range
JP2017519268A (en) Modulating plasticity by global scalar values in spiking neural networks
CN108460462A (en) A kind of Interval neural networks learning method based on interval parameter optimization
JP6219509B2 (en) Assigning and examining synaptic delays dynamically
KR101825937B1 (en) Plastic synapse management
Zhao Research and application on BP neural network algorithm
KR101825933B1 (en) Phase-coding for coordinate transformation
KR20160123312A (en) Auditory source separation in a spiking neural network
JP2017513110A (en) Contextual real-time feedback for neuromorphic model development
US9342782B2 (en) Stochastic delay plasticity
Zhang et al. A deep reinforcement learning based human behavior prediction approach in smart home environments
JP2017509956A (en) Method for converting values to spikes
De Aguiar et al. Using reservoir computing for forecasting of wind power generated by a wind farm
JP2017509979A (en) Unbalanced crossing suppression mechanism for spatial target selection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160504

RJ01 Rejection of invention patent application after publication