CN109146000A - Method and device for improving a convolutional neural network based on freezing weights - Google Patents

Method and device for improving a convolutional neural network based on freezing weights

Info

Publication number
CN109146000A
Authority
CN
China
Prior art keywords
hidden layer
weight
convolutional neural
layer node
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811044605.5A
Other languages
Chinese (zh)
Other versions
CN109146000B (en)
Inventor
韩宇铭
朱立东
冉普航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201811044605.5A priority Critical patent/CN109146000B/en
Publication of CN109146000A publication Critical patent/CN109146000A/en
Application granted granted Critical
Publication of CN109146000B publication Critical patent/CN109146000B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for improving a convolutional neural network based on freezing weights; the method improves the traditional BP convolutional neural network through the theory of freezing weights. The method first acquires sufficient training and test sample data, preprocesses them, and establishes a convolutional neural network model. The convolutional neural network is then optimized according to the freezing-weight theory: the hidden layer nodes of the convolutional neural network are analyzed by introducing the entropy weight method, and the hidden layer nodes that contribute little to the network output are frozen during training. This reasonably optimizes the update process of the hidden layer nodes, effectively reduces the computational complexity of the convolutional neural network, and shortens its training time. The method has the advantages of being efficient, reliable, and comparatively accurate.

Description

Method and device for improving convolutional neural network based on freezing weight
Technical Field
The invention belongs to the technical field of neural network algorithms, and particularly relates to a method and a device for improving a convolutional neural network based on a freezing weight.
Background
A Convolutional Neural Network (CNN) is a type of neural network applied to deep learning in recent years, widely used in image recognition and related fields owing to its high performance in image processing. The concept of the convolutional neural network was first proposed by LeCun. Building on the neocognitron (Neocognitron) proposed by Fukushima, LeCun et al. successfully designed the classical character recognition system, the LeNet-5 model, by alternately stacking simple cell layers (S-layers) and complex cell layers (C-layers) and training with the BP (Back Propagation) algorithm. As the first convolutional neural network, this system has strong robustness, and it shortens training time by amplifying the spatial characteristics of the data and exploiting the spatial correlations among the data.
However, the convolutional neural network also has many problems, most of which stem from the traditional Back Propagation (BP) algorithm it uses. Because the error function in the traditional BP neural network is minimized via first derivatives with respect to the weight of each hidden layer node in the network, unlimited minimization cannot be realized in practice, and the problem of step-size selection arises. Secondly, the objective function to be optimized by the traditional BP neural network is very complex, so a "saw-tooth phenomenon" inevitably occurs and the network runs inefficiently. Moreover, because the optimized objective function is complex, flat regions inevitably appear where the output of a neuron is close to 0 or 1; in these regions the weight changes of the nodes are small, the weight error changes little, and the network training process almost stalls. All of these problems lead to long training times, which, for a neural network system with a complicated structure, makes a bad situation worse.
Chinese patent CN103077267B, "A parametric sound source modeling method based on an improved BP neural network", discloses an improved BP neural network that optimizes the structure and parameters of a convolutional neural network model based on a genetic algorithm, searching for the number of hidden layers of the convolutional neural network, the initial weights between the hidden layer nodes, and the thresholds. This method optimizes the convolutional neural network to a certain extent, but introducing a genetic algorithm increases the training time, so the training time of the convolutional neural network is not really shortened.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a method for improving the traditional BP convolutional neural network by introducing the freezing-weight theory and the entropy weight data processing method. The method can effectively improve the network operation efficiency and reduce the training time while keeping the convolutional neural network strongly robust.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method for improving a convolutional neural network based on a freezing weight comprises the following steps:
Step 101, preprocessing images to obtain a plurality of batches of training image data;
Step 102, constructing a convolutional neural network, and initializing the hidden layer node weight values in the convolutional neural network;
Step 103, inputting a batch of training image data, computing the convolutional neural network through the forward conduction algorithm to obtain an activation weight for each hidden layer node in the convolutional neural network, and calculating the weight difference of each hidden layer node from the activation weight;
Step 104, analyzing the weight differences with the entropy weight method to obtain the entropy weight of each hidden layer node and thereby determine its evaluation value, screening the hidden layer nodes by evaluation value, freezing those whose evaluation value falls below an evaluation-value threshold so that their weight amounts are held at the current values, and propagating back the hidden layer nodes that are not frozen;
Step 105, updating the propagated weight amounts of the unfrozen hidden layer nodes through the reverse conduction algorithm to obtain their updated weight amounts, thereby obtaining the current weight amount of every hidden layer node;
Step 106, judging the error from the current weight amounts of the hidden layer nodes, and if the error is not within a first preset range, repeating steps 103-105 until the error is within the first preset range, then stopping the training of this batch;
Step 107, training the convolutional neural network with the plurality of batches of training image data so that the error index of the network falls within a second preset range (a minimal code sketch of steps 101-107 follows this list).
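To make the flow of steps 101-107 concrete, here is a minimal Python sketch of the per-batch loop, assuming NumPy; forward_pass, backward_update, entropy_weight_scores, and batch_error are hypothetical helpers (sketched further in the embodiments below), and net.weights is assumed to be an (m, n) array holding n weights for each of the m trainable hidden layer nodes:

```python
import numpy as np

def train_batch(net, images, targets, err_threshold=0.05, freeze_frac=0.20):
    """One pass of freezing-weight training on a single batch (steps 103-106)."""
    while True:
        act_w = forward_pass(net, images)            # step 103: activation weights
        diff = np.abs(act_w - net.weights)           # weight differences (residuals)
        scores = entropy_weight_scores(diff)         # step 104: evaluation values X_i
        cutoff = np.quantile(scores, freeze_frac)    # threshold at the 20% dividing point
        frozen = scores <= cutoff                    # freeze least-contributing nodes
        backward_update(net, images, targets, mask=~frozen)  # step 105: unfrozen only
        if batch_error(net, images, targets) <= err_threshold:
            break                                    # step 106: first preset range met
```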
Further, in the method for improving the convolutional neural network, the preprocessed images are a quantity of image data obtained from a handwriting data recognition library and normalized, with all image data scaled to a uniform size.
Further, in the method for improving the convolutional neural network, the convolutional neural network comprises an input layer, at least two convolutional layers, at least two sampling layers, and an output layer, wherein the output layer is merged with the fully connected layer.
Further, in the method for improving a convolutional neural network, the weight difference value is an absolute value of a difference between an activation weight of the hidden layer node and a current weight of the hidden layer node.
Further, in the method for improving a convolutional neural network, analyzing the weight differences with the entropy weight method in step 104 to obtain the entropy weight of each hidden layer node and determine its evaluation value specifically comprises:
establishing a data matrix from the number of hidden layer nodes and their weight difference values; normalizing the data matrix to obtain a normalized matrix; calculating the entropy of the weight differences of the hidden layer nodes from the normalized matrix; calculating the difference coefficients from the obtained entropies; calculating the entropy weights of the hidden layer nodes from the obtained difference coefficients; and calculating the evaluation value of each hidden layer node from the obtained entropy weights.
Further, the data matrix is:

$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

where m is the number of hidden layer nodes and n is the number of weight difference values of each hidden layer node.
Further, in the method for improving a convolutional neural network, the current weight amount of each hidden layer node comprises the maintained weight amounts of the frozen hidden layer nodes and the updated weight amounts of the hidden layer nodes that are not frozen.
Further, the error determination formula is:

$$E(w,v)=\frac{1}{2}\sum_{p=1}^{k}\sum_{i=1}^{C}\left(t_{pi}-s_{pi}\right)^{2}$$

where E(w, v) is the current error value of the convolutional neural network, k is the number of patterns, C is the number of hidden layer nodes, t_pi is the output target value of hidden layer node p, and s_pi is the actual output value of hidden layer node p in the convolutional neural network.
Further, step 107 of the method for improving the convolutional neural network comprises inputting the plurality of batches of training image data into the convolutional neural network in turn and repeating steps 103-106, so that the network is trained on all batches of training image data.
Preferably, an apparatus for improving a convolutional neural network based on freezing weights comprises at least one processor, and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
Compared with the prior art, the invention has the beneficial effects that:
1. The structure of the convolutional neural network is optimized: the fully connected layer and the output layer of the convolutional neural network are merged, improving the operating efficiency of the network.
2. The freezing-weight theory is introduced to improve the original BP convolutional neural network model. In each batch of training of the BP convolutional neural network, the hidden layer nodes poorly correlated with the output result are frozen, and the weights of the frozen hidden layer nodes are not updated in the subsequent reverse conduction. This reduces the complexity of the convolutional neural network training process, improves the network training efficiency, simplifies the operation of updating the hidden layer node weights in each reverse conduction, and shortens the network training time.
3. The entropy weight data processing method is introduced to screen out the target nodes to be frozen in each batch of training, using the entropy weights of the node weight differences in each batch as the evaluation reference; this provides a better solution to node selection than the relatively vague original freezing theory.
4. Experimental results show that the improved convolutional neural network effectively reduces the complexity of the training process, shortens the network training time, and avoids the step-size problem and overlong training times in network training.
Description of the drawings:
FIG. 1 is a flow chart of a convolutional neural network improvement method according to an exemplary embodiment of the present invention;
FIG. 2 is a diagram of a LeNet-5 convolutional neural network architecture;
FIG. 3 is a diagram of a convolutional neural network structure optimized based on LeNet-5 in accordance with an exemplary embodiment of the present invention;
FIG. 4 is a flow chart of an entropy weighting method according to an exemplary embodiment of the present invention;
FIG. 5 is a diagram of an apparatus for the convolutional neural network improvement method according to an exemplary embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
Example 1
A method for improving a convolutional neural network based on a freezing weight comprises the following steps:
Step 101, preprocessing images to obtain a plurality of batches of training image data;
specifically, 70000 pieces of image data, namely 60000 pieces of training image data and 10000 pieces of test image data, are acquired from the MNIST handwriting data recognition library. All image data (including training image data and test image data) are normalized, all image data are processed into an input matrix of 28 × 28 pixel points, and the sample size of training image data of each batch is set to 50, so that 1200 training batches of training image data are obtained.
Step 102, constructing a convolutional neural network, and initializing the hidden layer node weight values in the convolutional neural network;
Specifically, in this embodiment a convolutional neural network is constructed based on the character recognition system LeNet-5; the network structure of LeNet-5 is optimized by merging the fully connected layer and the output layer of the network to reduce the operational complexity of the convolutional neural network. As shown in FIG. 2, the basic LeNet-5 architecture consists of 1 input layer, 3 convolutional layers, 2 pooling layers, 1 fully connected layer, and 1 output layer. The convolutional neural network optimized in the invention, shown in FIG. 3, comprises 1 input layer, 2 convolutional layers, 2 sampling layers (also called pooling layers), and 1 output layer, connected in the following order: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, and output layer (the output layer being merged with the fully connected layer).
Further, the input layer of the convolutional neural network constructed in this example is designed around the training and test image data: the number of hidden layer nodes in the input layer is set to 50, each processing one 28 × 28 image of the 50-image batch, so the structure of the input layer is 28 × 28 × 50. The first convolutional layer after the input layer has 6 convolution kernels of 5 × 5; it processes the image data coming from the input layer into 24 × 24 data matrices, and an offset (bias) is added after each convolution kernel. The second convolutional layer has 12 convolution kernels of 5 × 5, processing the images into 12 feature maps of 8 × 8, with 12 offsets added likewise.
The design of the first and second sampling layers matches the architecture of the preceding convolutional layer: the first sampling layer produces 6 feature maps of 12 × 12 with an offset of 0, and the second sampling layer produces 12 feature maps of 4 × 4, likewise with an offset of 0. Each sampling layer down-samples the data structure of the previous convolutional layer, halving the result of the convolution in each dimension. The final output layer is at the same time the fully connected layer.
In this example, we place 150 hidden layer nodes on the first convolutional layer and 1800 hidden layer nodes on the second convolutional layer of the convolutional neural network. The weight values of all hidden layer nodes in the convolutional neural network are then initialized, where "all hidden layer nodes" means the hidden layer nodes of the input layer, the convolutional layers, the pooling layers, and the output layer (including the fully connected layer). The assignment here is a random assignment within a certain range (typically 0-100) for every hidden layer node. The settings of the hidden layer nodes in the input and output layers do not change after the initial assignment, and the hidden layer nodes in the pooling layers are not targets of the algorithm's training, so in this example only the 1950 hidden layer nodes in the convolutional layers are trained by the improved algorithm of the invention.
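As a quick consistency check (not part of the claimed method), the layer sizes quoted above follow from valid 5 × 5 convolutions with stride 1 and non-overlapping 2 × 2 sampling, and the stated node counts match the kernel-weight counts if each second-layer kernel spans all 6 incoming feature maps (6 × 5 × 5 = 150 and 12 × 6 × 5 × 5 = 1800):

```python
def conv_out(size, kernel):
    return size - kernel + 1            # valid convolution, stride 1

def pool_out(size, window):
    return size // window               # non-overlapping sampling window

s = conv_out(28, 5)    # first conv layer:   28 -> 24 (6 feature maps)
s = pool_out(s, 2)     # first sampling:     24 -> 12
s = conv_out(s, 5)     # second conv layer:  12 -> 8  (12 feature maps)
s = pool_out(s, 2)     # second sampling:     8 -> 4
assert s == 4          # 12 maps of 4 x 4 feed the merged output/fully connected layer
```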
Step 103, inputting a batch of training image data, computing the convolutional neural network through the forward conduction algorithm to obtain the activation weight of each hidden layer node, and calculating the weight difference information of each hidden layer node from the activation weight;
specifically, a batch of training image data (50 sheets) is input into the convolutional neural network, and the convolutional neural network is calculated according to a forward conduction training method in the BP convolutional neural network, so that an activation weight of each hidden layer node in the calculation can be obtained (that is, the hidden layer node is activated, so that the hidden layer node obtains a weight amount required by an optimal working state). Then, the obtained activation weight of each hidden layer node and the initial weight of each hidden layer node are subtracted, and the absolute value of the obtained difference is the weight difference value information (namely the residual error in the conventional definition) of the hidden layer node
The principle of the traditional BP convolutional neural network is as follows: given a convolutional neural network, a batch of training image data is input and a "forward conduction" computation is performed first, calculating the activation value of each hidden layer node from the second layer (the first layer being the input layer) to the output layer. The residual $\delta_i^{(l)}$ of each hidden layer node i is then calculated; the residual indicates how much that node contributes to the final output of the network. The residual information of the hidden layer nodes is then propagated backwards from the output layer, and the weight amounts of all hidden layer nodes are updated by the reverse conduction update algorithm. Over the training of successive batches, the weight amounts of all hidden layer nodes are updated continuously, the output of the network approaches the desired result, and the stability and accuracy of the network increase accordingly; at the same time, however, the complexity and training time of the algorithm also grow markedly.
A large body of research shows that, in the training of the conventional BP convolutional neural network, some hidden layer nodes are not sensitive to a given batch of training image data: the residual values these nodes produce in the forward conduction training are small, and their influence on the network output is correspondingly small. This means the weights of these nodes change little as training proceeds. Therefore, if the weight values of all hidden layer nodes are updated by the reverse conduction algorithm directly after the forward conduction computation, continually updating the nodes with small residuals accomplishes little. Updating these nodes is wasted work that visibly increases the complexity and training time of the algorithm.
The invention improves the traditional BP convolutional neural network on this basis. After the network computes the residual of each hidden layer node through forward conduction training, the entropy of each node's residual is calculated, following the idea of Shannon's information entropy, to evaluate the residual and decide whether the node should take part in the subsequent reverse conduction weight update; entropy is an ideal scale for evaluating the index weights of an index system. An evaluation standard is determined based on the entropy weight method, and the hidden layer nodes whose continued updating is meaningless are screened out and frozen according to this standard (that is, their weights are not updated). The subsequent reverse conduction computation then updates only the weights of the unfrozen nodes, which greatly reduces the operational complexity of the network and saves a large amount of training time.
Step 104, analyzing the weight differences with the entropy weight method to obtain the entropy weight of each hidden layer node and thereby determine its evaluation value, screening the hidden layer nodes by evaluation value, freezing those whose evaluation value falls below an evaluation-value threshold so that their weight amounts are held at the current values, and propagating back the hidden layer nodes that are not frozen;
Specifically, analyzing the weight differences with the entropy weight method to obtain the entropy weight of each hidden layer node and determine its evaluation value comprises the following:
First, an index evaluation system is established, with n evaluation indexes (the n weight difference values of the hidden layer nodes) and m evaluated objects (the hidden layer nodes); the original data matrix of the corresponding indexes of the evaluated objects is:

$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
Next, the original data matrix is normalized; the normalized matrix is denoted $S=(s_{ij})_{m\times n}$, where each entry is normalized as

$$s_{ij}=\frac{x_{ij}-\min_{i}x_{ij}}{\max_{i}x_{ij}-\min_{i}x_{ij}}$$

so that all values of S obtained in this way lie within the [0, 1] interval.
The entropy of the j-th weight difference is then

$$H_j=-\frac{1}{\ln m}\sum_{i=1}^{m}f_{ij}\ln f_{ij}$$

where

$$f_{ij}=\frac{s_{ij}}{\sum_{i=1}^{m}s_{ij}}$$

(with $f_{ij}\ln f_{ij}$ taken to be 0 when $f_{ij}=0$).
The difference coefficient of the j-th weight difference is:

$$\alpha_j=1-H_j\quad(j=1,2,\cdots,n)$$

and the entropy weight of the j-th weight difference is:

$$w_j=\frac{\alpha_j}{\sum_{j=1}^{n}\alpha_j}$$
Under this definition of the entropy weight, a judgment standard is obtained: the larger the entropy of a hidden layer node, the smaller its entropy weight, and the smaller its contribution to the current batch of training. When the entropy reaches its maximum value of 1 and the entropy weight is 0, the hidden layer node can be considered to make no effective contribution to this training.
As shown in FIG. 4, the entropy weight analysis method can be summarized as follows:
step 401, establishing a data matrix according to the number of nodes of the hidden layer and the weight difference value thereof;
step 402, normalizing the data matrix to obtain a normalized matrix;
step 403, calculating entropy of the weight difference of the nodes of the hidden layer according to the normalized matrix;
step 404, calculating a difference coefficient according to the obtained entropy of the hidden layer node;
step 405, calculating the entropy weight of the hidden node according to the obtained difference coefficient;
Step 406, calculating the evaluation value of each hidden layer node from the obtained entropy weights by the following formula:

$$X_i=\sum_{j=1}^{n}w_j s_{ij}\quad(i=1,2,\cdots,m)$$
The evaluation values of all hidden layer nodes are thus obtained; the smaller the value of X_i, the smaller that node's contribution to the weight update. The evaluation values are analyzed, and the twenty percent of hidden layer nodes with the smallest evaluation values are screened out and frozen; the evaluation-value threshold is the evaluation value at the twenty-percent dividing point, changes with every batch of training, and is generally a number between 0 and 0.05. The hidden layer nodes with evaluation values below the threshold are frozen so that their weight amounts are maintained at the current values, i.e., the weight amounts of the frozen hidden layer nodes are not updated in the subsequent algorithm. The unfrozen hidden layer nodes are passed back from the output layer of the convolutional neural network to the hidden layers preceding it ("hidden layer" here is a general term for the convolutional and pooling layers before the output layer).
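Under the formulas as reconstructed above, steps 401-406 could be implemented as the entropy_weight_scores helper used in the earlier sketch; diff is an (m, n) NumPy array of the m hidden layer nodes' n weight differences, and eps is a small constant guarding against division by zero and log(0):

```python
import numpy as np

def entropy_weight_scores(diff, eps=1e-12):
    """Evaluation value X_i of each hidden layer node (steps 401-406)."""
    m = diff.shape[0]
    # step 402: column-wise min-max normalization into [0, 1]
    lo, hi = diff.min(axis=0), diff.max(axis=0)
    s = (diff - lo) / np.maximum(hi - lo, eps)
    # step 403: entropy H_j of the j-th weight-difference index
    f = s / np.maximum(s.sum(axis=0), eps)
    h = -(f * np.log(f + eps)).sum(axis=0) / np.log(m)
    # step 404: difference coefficients alpha_j = 1 - H_j
    alpha = 1.0 - h
    # step 405: entropy weights w_j = alpha_j / sum(alpha)
    w = alpha / np.maximum(alpha.sum(), eps)
    # step 406: evaluation value X_i = sum_j w_j * s_ij
    return s @ w
```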
Step 105, updating the propagated weight amounts of the unfrozen hidden layer nodes through the reverse conduction algorithm to obtain their updated weight amounts, thereby obtaining the current weight amount of every hidden layer node;
Specifically, the weights of the unfrozen hidden layer nodes are updated using the reverse conduction formula, while the frozen hidden layer nodes keep their current weights; the updated weights of the unfrozen nodes together with the maintained weights of the frozen nodes constitute the current weight amount of every hidden layer node.
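A minimal sketch of this masked update, assuming a hypothetical bp_gradients helper that returns the standard BP gradients with the same (m, n) shape as net.weights and a hypothetical learning rate lr; only the rows selected by the mask change:

```python
def backward_update(net, images, targets, mask, lr=0.1):
    """Reverse conduction update restricted to the unfrozen hidden layer nodes."""
    # bp_gradients is hypothetical: standard BP gradients, same shape as net.weights
    grad = bp_gradients(net, images, targets)
    net.weights[mask] -= lr * grad[mask]   # frozen rows (mask False) keep current values
```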
Step 106, judging the error from the current weight amounts of the hidden layer nodes, and if the error is not within a first preset range, repeating steps 103-105 until the error is within the first preset range, then stopping the training of this batch;
Specifically, from the current weight amount of each hidden layer node, the network error E(W) of the current training is calculated by the following formula:

$$E(w,v)=\frac{1}{2}\sum_{p=1}^{k}\sum_{i=1}^{C}\left(t_{pi}-s_{pi}\right)^{2}$$

where k is the number of patterns, C is the number of hidden layer nodes, t_pi is the output target value of hidden layer node p, and s_pi is the actual output value of hidden layer node p in the convolutional neural network; h is the number of hidden layer nodes in the network, x_i is an n-dimensional input pattern (i = 1, 2, ..., k), and v_m is a C-dimensional vector of the weights of the arcs connecting the m-th hidden layer node and the output layer. The activation function of the output layer is the sigmoid function $\sigma(y)=1/(1+e^{-y})$, and the activation function of the hidden layer is the hyperbolic tangent function:

$$\delta(y)=\frac{e^{y}-e^{-y}}{e^{y}+e^{-y}}$$
If the obtained error value is within the first preset range (generally 0 to 0.05), there is almost no deviation between the output target value t_pi of hidden layer node p and the actual output value s_pi of that node in the convolutional neural network; the training target of the batch has been met, and the training of this batch can end (steps 103-105 being the whole process of one pass of the batch training). Otherwise, if the calculated error value is not within the acceptable range, the training target of the batch has not been reached; steps 103-105 are then repeated, i.e., the batch of training image data is input into the convolutional neural network again, the hidden layer nodes undergo forward computation training, part of the hidden layer nodes are frozen based on the entropy weight method, and the weight values of the unfrozen hidden layer nodes are updated in the reverse pass, until the error value computed from the weight values of all hidden layer nodes falls within the first preset range and the batch training ends.
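A sketch of the error computation under the formula above, with targets the (k, C) array of output target values for the batch and net_outputs a hypothetical helper returning the network's actual output values s_pi:

```python
import numpy as np

def batch_error(net, images, targets):
    """E(w, v) = 1/2 * sum_p sum_i (t_pi - s_pi)^2 over the batch."""
    outputs = net_outputs(net, images)   # hypothetical: actual outputs, shape (k, C)
    return 0.5 * np.sum((targets - outputs) ** 2)
```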
Step 107, training the convolutional neural network with the plurality of batches of training data so that the error index of the network falls within a second preset range.
Specifically, after the training of one batch is finished, the 2nd, 3rd, 4th, ..., 1200th batches of training image data are input in turn, repeating steps 103-106 to complete the training of the remaining batches, so that the error index of the network falls within the second preset range. Each image in this example is a handwritten digit obtained from the MNIST handwriting data recognition library. Since different people write differently, the problem to be solved is how the convolutional neural network can recognize the same digit across large differences in handwriting habits; that is, the trained network must correctly identify the same digit. Therefore, after training the network with the 60000 training images until its performance stabilizes, the 10000 test images are used to test the network's performance (i.e., whether it can correctly identify the same digit across large handwriting differences), and the error rate of the network is calculated from the test results. The second preset error range is a limit on the error rate of the overall operation of the network, generally requiring the network error rate to be within fifteen percent.
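A sketch of step 107 under the same assumptions, where evaluate() is a hypothetical helper returning the classification accuracy on the test images:

```python
for images, targets in zip(batches, labels):   # the 1200 batches from step 101
    train_batch(net, images, targets)          # steps 103-106 on each batch

# evaluate() is hypothetical: classification accuracy on the 10000 test images
error_rate = 1.0 - evaluate(net, test_x, test_y)
print("test error rate:", error_rate)          # required to be within fifteen percent
```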
Further, we compare the operation results (training time and error rate) of the original BP convolutional neural network and the improved BP convolutional neural network, obtaining the comparison shown in Table 1.
TABLE 1: comparison of training time and error rate between the original and improved BP convolutional neural networks
As can be seen from the table, the BP convolutional neural network improved by the invention reduces the average training time of each run by 25 seconds, greatly shortening the training time of the convolutional neural network. The error rate of the improved BP convolutional neural network rises by about 2 percentage points, which is within the acceptable range, while the total training time falls by 13.9 percent. The improved convolutional neural network therefore greatly increases the operating speed of the network at the cost of an acceptable part of its accuracy, and the network training complexity is reduced by more than 5 percent. Moreover, the improved network works normally, neither falling into an endless loop nor producing an excessive error rate; the method effectively avoids the step-size and local-minimum problems that easily arise in the traditional BP convolutional neural network, and has strong robustness.
Example 2
FIG. 5 shows an apparatus for improving a convolutional neural network according to an embodiment of the invention, namely an electronic device 310 (e.g., a computer server with program execution capability) comprising at least one processor 311, a power supply 314, and a memory 312 and an input/output interface 313 communicatively connected to the at least one processor 311. The memory 312 stores instructions executable by the at least one processor 311; the instructions are executed by the at least one processor 311 to enable it to perform the method disclosed in any one of the embodiments. The input/output interface 313 may include a display, a keyboard, a mouse, and a USB interface for inputting and outputting data; the power supply 314 provides power to the electronic device 310.
Those skilled in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
When the integrated unit of the present invention is implemented in the form of a software functional unit and sold or used as a separate product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.

Claims (10)

1. A method for improving a convolutional neural network based on freezing weights, characterized by comprising the following steps:
Step 101, preprocessing images to obtain a plurality of batches of training image data;
Step 102, constructing a convolutional neural network, and initializing the hidden layer node weight values in the convolutional neural network;
Step 103, inputting a batch of training image data, computing the convolutional neural network through the forward conduction algorithm to obtain an activation weight for each hidden layer node in the convolutional neural network, and calculating the weight difference of each hidden layer node from the activation weight;
Step 104, analyzing the weight differences with the entropy weight method to obtain the entropy weight of each hidden layer node and thereby determine its evaluation value, screening the hidden layer nodes by evaluation value, freezing those whose evaluation value falls below an evaluation-value threshold so that their weight amounts are held at the current values, and propagating back the hidden layer nodes that are not frozen;
Step 105, updating the propagated weight amounts of the unfrozen hidden layer nodes through the reverse conduction algorithm to obtain their updated weight amounts, thereby obtaining the current weight amount of every hidden layer node;
Step 106, judging the error from the current weight amounts of the hidden layer nodes, and if the error is not within a first preset range, repeating steps 103-105 until the error is within the first preset range, then stopping the training of this batch;
Step 107, training the convolutional neural network with the plurality of batches of training image data so that the error index of the network falls within a second preset range.
2. The method of claim 1, wherein preprocessing the images comprises normalizing a quantity of image data retrieved from a handwriting data recognition library, with all image data scaled to a uniform size.
3. The method of claim 1, wherein the convolutional neural network comprises an input layer, at least two convolutional layers, at least two sampling layers, and an output layer, wherein the output layer incorporates the fully connected layer.
4. The method of claim 1, wherein the weight difference value is an absolute value of a difference between the obtained activation weight of the hidden layer node and a current weight of the hidden layer node.
5. The method according to claim 1, wherein the step of analyzing the weight difference values based on an entropy weight method to obtain an entropy weight of each hidden layer node in step 104, and determining the evaluation value of the hidden layer node accordingly specifically comprises:
establishing a data matrix from the number of hidden layer nodes and their weight difference values; normalizing the data matrix to obtain a normalized matrix; calculating the entropy of the weight differences of the hidden layer nodes from the normalized matrix; calculating the difference coefficients from the obtained entropies; calculating the entropy weights of the hidden layer nodes from the obtained difference coefficients; and calculating the evaluation value of each hidden layer node from the obtained entropy weights.
6. The method of claim 4, wherein the data matrix is:

$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

where m is the number of hidden layer nodes and n is the number of weight difference values of each hidden layer node.
7. The method of claim 1, wherein the current weight amount for each hidden layer node comprises an initial weight amount for a frozen hidden layer node and an updated weight amount for a hidden layer node that is not frozen.
8. The method of claim 1, wherein the error determination formula is:

$$E(w,v)=\frac{1}{2}\sum_{p=1}^{k}\sum_{i=1}^{C}\left(t_{pi}-s_{pi}\right)^{2}$$

where E(w, v) is the current error value of the convolutional neural network, k is the number of patterns, C is the number of hidden layer nodes, t_pi is the output target value of hidden layer node p, and s_pi is the actual output value of hidden layer node p in the convolutional neural network.
9. The method of claim 1, wherein step 107 comprises inputting a plurality of batches of training image data into the convolutional neural network, respectively, and repeating steps 103-106 to perform training of the network by the plurality of batches of training image data.
10. An apparatus for improving a convolutional neural network based on a freezing weight, comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
CN201811044605.5A 2018-09-07 2018-09-07 Method and device for improving convolutional neural network based on freezing weight Active CN109146000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811044605.5A CN109146000B (en) 2018-09-07 2018-09-07 Method and device for improving convolutional neural network based on freezing weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811044605.5A CN109146000B (en) 2018-09-07 2018-09-07 Method and device for improving convolutional neural network based on freezing weight

Publications (2)

Publication Number Publication Date
CN109146000A true CN109146000A (en) 2019-01-04
CN109146000B CN109146000B (en) 2022-03-08

Family

ID=64823890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811044605.5A Active CN109146000B (en) 2018-09-07 2018-09-07 Method and device for improving convolutional neural network based on freezing weight

Country Status (1)

Country Link
CN (1) CN109146000B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5845051A (en) * 1995-09-15 1998-12-01 Electronics And Telecommunications Research Institute Learning method for multilayer perceptron neural network with N-bit data representation
US20130061114A1 (en) * 2011-09-02 2013-03-07 Samsung Electronics Co., Ltd. Freezing-based ldpc decoder and method
US20170109243A1 (en) * 2015-10-16 2017-04-20 Business Objects Software, Ltd. Model-Based System and Method for Undoing Actions in an Application
CN105654729A (en) * 2016-03-28 2016-06-08 南京邮电大学 Short-term traffic flow prediction method based on convolutional neural network
CN107194376A (en) * 2017-06-21 2017-09-22 北京市威富安防科技有限公司 Mask fraud convolutional neural networks training method and human face in-vivo detection method
CN107301396A (en) * 2017-06-21 2017-10-27 北京市威富安防科技有限公司 Video fraud convolutional neural networks training method and human face in-vivo detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HINTON G E et al.: "Improving neural networks by preventing co-adaptation of feature detectors", 《COMPUTER SCIENCE》 *
RUI ZHANG et al.: "Extreme Learning Machine with Adaptive Growth of Hidden Nodes and Incremental Updating of Output Weights", 《AIS 2011: AUTONOMOUS AND INTELLIGENT SYSTEMS》 *
高大文 et al.: "Optimization of hidden layer nodes and training times in artificial neural networks", 《哈尔滨工业大学学报》 (Journal of Harbin Institute of Technology) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263925A (en) * 2019-06-04 2019-09-20 电子科技大学 A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA
CN110263925B (en) * 2019-06-04 2022-03-15 电子科技大学 Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA
CN111222465A (en) * 2019-11-07 2020-06-02 深圳云天励飞技术有限公司 Image analysis method based on convolutional neural network and related equipment
TWI731511B (en) * 2019-12-12 2021-06-21 國立中興大學 Finger vein or palm vein identification processing and neural network training method
CN113570054A (en) * 2020-04-29 2021-10-29 上海商汤智能科技有限公司 Neural network model training method, device, equipment and storage medium
CN111624964A (en) * 2020-05-27 2020-09-04 甬矽电子(宁波)股份有限公司 Dynamic authority management and control method, device, server and readable storage medium
CN116989510A (en) * 2023-09-28 2023-11-03 广州冰泉制冷设备有限责任公司 Intelligent refrigeration method combining frosting detection and hot gas defrosting
CN118133929A (en) * 2024-05-06 2024-06-04 浙江大学 Method and device for accelerating neural network training based on node freezing
CN118133929B (en) * 2024-05-06 2024-08-02 浙江大学 Method and device for accelerating neural network training based on node freezing

Also Published As

Publication number Publication date
CN109146000B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN109146000B (en) Method and device for improving convolutional neural network based on freezing weight
Kumar et al. Pruning filters with L1-norm and capped L1-norm for CNN compression
Liu et al. Channel pruning based on mean gradient for accelerating convolutional neural networks
Ashiquzzaman et al. Handwritten Arabic numeral recognition using deep learning neural networks
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN111914728B (en) Hyperspectral remote sensing image semi-supervised classification method and device and storage medium
CN109961093B (en) Image classification method based on crowd-sourcing integrated learning
CN112613581A (en) Image recognition method, system, computer equipment and storage medium
US10853738B1 (en) Inference circuit for improving online learning
Elkerdawy et al. To filter prune, or to layer prune, that is the question
US20220300823A1 (en) Methods and systems for cross-domain few-shot classification
CN111105017A (en) Neural network quantization method and device and electronic equipment
CN114926680B (en) Malicious software classification method and system based on AlexNet network model
CN112861718A (en) Lightweight feature fusion crowd counting method and system
Kundu et al. Towards low-latency energy-efficient deep snns via attention-guided compression
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN112734025B (en) Neural network parameter sparsification method based on fixed base regularization
CN117421667A (en) Attention-CNN-LSTM industrial process fault diagnosis method based on improved gray wolf algorithm optimization
Su et al. High-Similarity-Pass Attention for Single Image Super-Resolution
CN110222817A (en) Convolutional neural networks compression method, system and medium based on learning automaton
CN116611576B (en) Carbon discharge prediction method and device
CN112862094A (en) DRBM (distributed resource management protocol) fast adaptation method based on meta-learning
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
Pei et al. Neural network pruning by recurrent weights for finance market
Minarno et al. Leaf based plant species classification using deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant