CN108491927A

CN108491927A - A kind of data processing method and device based on neural network

Info

Publication number: CN108491927A
Application number: CN201810219503.6A
Authority: CN
Inventors: 许震; 赵晓萌; 张如高
Original assignee: New Wisdom Cognition Marketing Data Services Ltd
Current assignee: New Wisdom Cognition Marketing Data Services Ltd
Priority date: 2018-03-16
Filing date: 2018-03-16
Publication date: 2018-09-04

Abstract

The present invention provides a kind of data processing method and device based on neural network, this method includes：Training sample data are input to neural network；Obtain the input quantity and weight of each layer convolution of the training sample data in the neural network；The weight of each layer of convolution of the training sample data is subjected to regularization, so that the weight of each layer of convolution is all distributed in specified region；Calculate separately the binaryzation data of the weight for each layer of convolution for adding regular terms；The corresponding neural network model of the training sample data is established according to the input quantity of the binaryzation data and each layer of convolution.The program to each layer convolution weight of the training sample data in neural network by adding regular terms, to force original floating-point weight in optimization process dynamically close to+1 or 1, to reduce the diversity factor before and after binaryzation between data model, data volume is reduced, data processing speed and computational accuracy are improved.

Description

Data processing method and device based on neural network

Technical Field

The invention relates to the technical field of data processing, in particular to a data processing method and device based on a neural network.

Background

With the development of the convolutional neural network, the convolutional neural network is widely applied in the field of intelligent monitoring and becomes an indispensable tool, such as face recognition, vehicle detection, object recognition and the like. But followWith the number of layers of modern convolutional neural networks increasing, the complexity of the networks becomes larger and larger, for example, for a convolutional neural network, the number of convolutional layers can exceed 10 layers, and moreover, the calculation amount of all convolutional layers occupies almost 80% of the calculation amount of the whole network. This results in the inability of similar convolutional neural networks to run on embedded devices such as surveillance cameras. In order to reduce the computational complexity of convolutional layers, the prior art directly binarizes the floating point Weights and floating point activations of a Neural network, for example, in the paper "binary Neural Networks with Weights and activations of convolutional Neural Networks," the author of which binarizes a given convolutional Neural network Ω. The specific mode is as follows: the floating point input vector of the i-th layer is x_lThe floating point weight of the i-th layer is w_lThen the output of the ith layer (i.e., the input of the (i + 1) th layer) isThen the floating point weight w for the ith layer_lThe binarization mode of (1) is shown as the following formula:

original floating point weight w after quantization by the above formula_lBecomes a binary weightThereby changing the operation which originally needs floating-point multiplication into floating-point addition operation.

But as shown in fig. 1, the weights w for each layer of the convolutional neural network_lThe distribution of (A) is generally about 0 and is close to Gaussian distribution, so if the weight is directly and forcibly subjected to binarization operation, the weight after binarization can be causedAnd an original weight w_lThe phase difference is large, resulting in the use of a random ladderWhen the degree reduction algorithm is used for optimizing the convolutional neural network, oscillation is generated, so that convergence is slowed down, and the precision is not high. Similarly, the output activation value of each layer of the convolutional neural network is also approximate to gaussian distribution, and if the binarization quantization is performed forcibly, the difference between the values before and after the quantization is large.

A good quantization method is needed to keep the accuracy from degrading after the quantization of the network weights is completed. This requires an accurate estimate of the weight distribution of each layer of the network during quantization and then binarization based on the estimated values. If the estimation is not accurate, the convolutional neural network model after quantization deviates from the optimal target in the optimization space.

Therefore, how to improve the processing speed and accuracy of the data processing algorithm based on the neural network becomes a technical problem to be solved urgently.

Disclosure of Invention

Therefore, the technical problems to be solved by the invention are that the data processing algorithm based on the neural network in the prior art has overlarge calculation amount, a complex calculation process and lower calculation result precision.

In view of this, a first aspect of the embodiments of the present invention provides a data processing method based on a neural network, including: inputting training sample data to a neural network; acquiring the input quantity and the weight of each layer of convolution of the training sample data in the neural network; regularizing the weight of each layer of convolution of the training sample data to enable the weight of each layer of convolution to be distributed in a specified area; respectively calculating the binary data of the weight of each layer of convolution added with the regular term; and establishing a neural network model corresponding to the training sample data according to the binary data and the input quantity of each layer of convolution.

Preferably, the regularizing the weight of each layer of convolution of the training sample data so that the weight of each layer of convolution is distributed in a specified area includes: adding a regularization term to the weights of each layer of convolution to make all weights in the neural network tend to be +1or-1 in weight.

Preferably, the adding a regularization term to the weight of each layer of convolution is such that all weights in the neural network tend to weigh +1or-1, including: calculating a forward propagation loss function of the training sample data in the neural network by adopting the following formula:

wherein L (W) is the forward propagation loss function, W is a weight set of training sample data in the neural network, N is the number of input samples in the neural network, L (W) is the forward propagation loss function, L (W) is the weight set of training sample data in the neural network, N is the number of input samples in the neural network, L (W) is the forward propagation loss function_iIs the loss of each of the input samples, H is the total number of convolutional layers in the neural network, H is the current number of convolutional layers, W_hIs a set of weights for the h-th convolutional layer in the neural network,is W_hα is the regular term coefficient used to balance the add power of the regular term.

Preferably, the binarizing data for separately calculating the weight of each layer convolution added with the regularization term includes: calculating the binary data by adopting the following formula:

wherein,is the binarized data of the h-th convolutional layer, W_hIs a set of weights for the h-th convolutional layer in the neural network.

Preferably, the establishing a neural network model corresponding to the training sample data according to the binarized data and the input quantity of each layer of convolution includes: solving an objective function of the neural network by adopting a random gradient descent algorithm; and establishing the neural network model according to the objective function.

A second aspect of an embodiment of the present invention provides a data processing apparatus based on a neural network, including: the input module is used for inputting training sample data to the neural network; the acquisition module is used for acquiring the input quantity and the weight of each layer of convolution of the training sample data in the neural network; the regularization module is used for regularizing the weight of each layer of convolution of the training sample data so as to enable the weight of each layer of convolution to be distributed in a specified area; the calculation module is used for respectively calculating the binary data of the weight of each layer of convolution added with the regular term; and the establishing module is used for establishing a neural network model corresponding to the training sample data according to the binary data and the input quantity of each layer of convolution.

wherein L (W) is the forward propagation loss function, W is a set of weights of training sample data in the neural network, N is the neural networkNumber of input samples in the network, L_iIs the loss of each of the input samples, H is the total number of convolutional layers in the neural network, H is the current number of convolutional layers, W_hIs a set of weights for the h-th convolutional layer in the neural network,is W_hα is the regular term coefficient used to balance the add power of the regular term.

Preferably, the calculation module comprises: a binarization unit for calculating the binarized data by adopting the following formula:

Preferably, the establishing module comprises: the solving unit is used for solving an objective function of the neural network by adopting a stochastic gradient descent algorithm; and the establishing unit is used for establishing the neural network model according to the objective function.

The technical scheme of the invention has the following advantages:

according to the data processing method and device based on the neural network, the regular terms are added to each layer of convolution weight of training sample data in the neural network, all the weights are distributed in a specified range, such as tend to be +1or-1, the original floating point weight is forced to be dynamically close to +1or-1 in the optimization process, then the regularized weights are subjected to binarization processing, and therefore the difference degree between data models before and after binarization is reduced, the data size of the whole data processing based on the neural network can be greatly reduced, the data processing speed in the process of training the neural network model is improved, the calculation result of a weight set added with the regular terms in the binarization process is more accurate, and the precision of the calculation result is improved. The neural network model can be used in various occasions such as image processing, data classification and the like, and can greatly improve the processing speed and precision.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a diagram illustrating the distribution of weights in a conventional convolution weight binarization method in the prior art;

FIG. 2 is a flowchart of a data processing method based on neural network according to embodiment 1 of the present invention;

fig. 3a is a schematic diagram of the distribution of the weights when the regular term coefficient α is 0.00010 in embodiment 1 of the present invention;

fig. 3b is a schematic diagram of the distribution of the weights when the regular term coefficient α is 0.00025 in embodiment 1 of the present invention;

fig. 3c is a schematic diagram of the distribution of the weights when the regular term coefficient α is 0.00050 in embodiment 1 of the present invention;

fig. 3d is a schematic diagram of the distribution of weights when the regular term coefficient α is 0.00090 according to embodiment 1 of the present invention;

FIG. 4 is a comparative graphical representation of the loss magnitude change in a comparative test of example 1 of the present invention;

FIG. 5 is a comparative graphical representation of the variation in accuracy in the comparative test of example 1 of the present invention;

fig. 6 is a block diagram of a data processing apparatus based on a neural network according to embodiment 2 of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Example 1

The present embodiment provides a data processing method based on a neural network, which is used for establishing a neural network model under the conditions of image processing, data classification, and the like, as shown in fig. 2, and includes the following steps:

s21: training sample data is input to the neural network. The training sample data may be sample picture data to be processed or audio/video data, and the like, and in this embodiment, the picture data is taken as a training sample, and is input to the neural network as the training sample for training.

S22: acquiring the input quantity and weight of each layer of convolution of training sample data in a neural network; here, the number of convolution layers in the neural network may be determined according to actual needs, for example, to meet DSP (Digital signal processing) optimization requirements, a simple convolutional neural network may be designed, and a ReLU function is used as its activation function, for example, a neural network as shown in table one may be constructed:

in table one, the type represents the type of each layer of convolution of the neural network; the weight tensor and the input tensor represent the size of the parameters and the size of the input in a certain layer of convolution, respectively. In the calculation, in order to reduce the data processing amount, the input amount and the weight of each layer of convolution corresponding to the training sample data are processed, wherein the input amount can adopt the activation value of each layer of convolution.

S23: the weights of each layer of convolution of the training sample data are regularized so that the weights of each layer of convolution are distributed within a specified region. Regularization is a common method for solving the ill-posed problem, i.e. a solution of a group of ill-posed problems which are adjacent to the original ill-posed problem is used for approximating the solution of the original problem, and the regularization processing is carried out on the weight of each layer of convolution, so that the original weight set which is distributed unevenly can be distributed in a designated area, thereby facilitating the binarization processing of the subsequent steps and improving the accuracy of the binarization.

As a preferable scheme, the step S23 may specifically include: adding a regularization term to the weight of each layer of convolution so that all weights in the neural network tend to be +1 or-1; specifically, the following formula can be adopted to calculate the forward propagation loss function of the training sample data in the neural network:

wherein L (W) is a forward propagation loss function, and W is training sample data in the neural networkN is the number of input samples in the neural network, L_iIs the loss of each input sample, H is the total number of convolutional layers in the neural network, H is the current number of convolutional layers, W_hIs the weight set of the h-th convolutional layer in the neural network,is W_hThe rule of selecting the regular term coefficient α is to ensure that the weights tend to be distributed to +1or-1, and different values of α bring different regularization results, as shown in fig. 3a to fig. 3d, when α ═ 0.00010, α ═ 0.00025, α ═ 0.00050, and α ═ 0.00090 are respectively shown, the distribution diagram of the weights on the basis of the neural network structure as shown in table one is shown, and it can be seen that, in this embodiment, when α ═ 0.00090, the distribution of the weights can be ensured to be closer to +1or-1, and therefore, when α ═ 637 is selected to ensure that the final result is calculated as 0.00090, the final result is easier to be calculated as compared with the prior art (i.e., when the distribution diagram 5391 is compared with the prior art, i.e., when the distribution of the weights is processed as shown by comparing 0.00090 with the prior art).

S24: respectively calculating binary data of the weight of each layer of convolution added with the regular term; the Binarized data with Weights can be obtained by using the binarizing method in the paper binary Networks with Weights and similarities and differences Constrained to +1or-1, and of course, other applicable binarizing methods can be adopted, as long as the binarizing method can realize the technical scheme of the present invention.

As a preferable scheme, the binarized data may be calculated by using the following formula:

wherein,is the binarized data of the h-th convolutional layer, W_hIs a set of weights for the h-th convolutional layer in the neural network. The weight of each layer of convolution is respectively substituted into the formula for binarization, the binarization data has stronger adaptability to embedded equipment, and when solving the input quantity of the next layer of convolution, the output quantity and the weight of the previous layer of convolution are converted into the change of the element sign of the output quantity per se from the original dot product, so that the data quantity is compressed, and the data processing speed is improved.

S25: and establishing a neural network model corresponding to the training sample data according to the binary data and the input quantity of each layer of convolution. The binary data obtained through the steps not only compress the data volume, but also improve the data precision, so that the built neural network model has higher identification precision, the detection precision of the neural network model in a specific application scene can be improved, and a more accurate reference tool is provided for the fields of face identification, vehicle detection, object identification and the like.

As a preferable scheme, the step S25 may specifically include: solving an objective function of the neural network by adopting a random gradient descent algorithm; and establishing a neural network model according to the objective function. For example, the regularization term in the forward propagation function formula is additionally usedThe regularization term can be obtained according to a random gradient descent algorithmThe back propagation formula:

after the objective function is solved, the neural network model corresponding to the training sample data can be established according to the objective function.

In order to better illustrate the beneficial effects of the scheme of the present embodiment, the present embodiment takes the neural network structure shown in table one as an example, and the detailed description is performed in a manner of comparison test, specifically, as shown in fig. 4, curve OurRegularization is a change curve of the Loss size corresponding to the scheme of performing binarization after adding a regular term to the weight of each layer of convolution in the present embodiment along with the increase of the number of iterations, and L2 is a change curve of the Loss size corresponding to the conventional binarization scheme in the prior art along with the increase of the number of iterations, it can be seen that the technical scheme of the present embodiment has better stability in the process of performing binarization on the weight, and a Loss curve is smoother, so that an effect of fast convergence is achieved; as shown in fig. 5, when the binarization scheme in the present embodiment and the binarization scheme in the prior art are respectively adopted, the curves a1 and a2 are variation curves of accuracy along with the increase of the number of iterations in the calculation process, and it can be seen that the scheme in the present embodiment not only can improve the accuracy of the calculation result, but also can prevent the oscillation phenomenon from occurring at the initial stage of optimization; the final calculation result precision comparing the floating point data, L2 (conventional binary data) and the present embodiment is shown in table two:

as shown in table two, compared with floating point data, the final calculation result of the scheme of the present embodiment has a lower precision, but the data calculation amount can be greatly reduced, the data processing speed is improved, and the embedded device has stronger adaptability; compared with the scheme of L2, the embodiment improves the calculation accuracy on the basis of reducing the data calculation amount, so that the neural network model has better identification capability.

In the data processing method based on the neural network provided by this embodiment, the regularization term is added to each layer of convolution weight of training sample data in the neural network, so that all weights are distributed in a specified range, for example, tend to +1or-1, so as to force the original floating point weight to dynamically approach +1or-1 in the optimization process, and then the regularized weights are subjected to binarization processing, so as to reduce the difference degree between the data models before and after binarization, thereby not only greatly reducing the data amount of the whole data processing based on the neural network, and improving the data processing speed in the process of training the neural network model, but also the calculation result of the weight set to which the regularization term is added in the binarization process is more accurate, so as to improve the precision of the calculation result.

In addition, the present embodiment further provides a data classification method, which uses the data processing method based on the neural network to establish a neural network model, and includes the following steps:

firstly, obtaining classified training samples;

secondly, inputting the training samples into a neural network model, and training the neural network model by adopting the data processing method based on the neural network in the embodiment to obtain the neural network model for classification;

inputting the target classification data into the trained neural network model for classification;

and fourthly, processing the target classification data by the neural network model to obtain a classification result.

The neural network model established by the scheme has higher training speed and better identification precision.

Example 2

The present embodiment provides a data processing apparatus based on a neural network, as shown in fig. 6, including: the input module 61, the acquisition module 62, the regularization module 63, the calculation module 64 and the establishment module 65, each module functions as follows:

an input module 61, configured to input training sample data to a neural network; see in particular the detailed description of step S21 in example 1.

An obtaining module 62, configured to obtain an input amount and a weight of each layer of convolution of training sample data in the neural network; see in particular the detailed description of step S22 in example 1.

A regularization module 63, configured to regularize the weight of each layer of convolution of the training sample data, so that the weight of each layer of convolution is distributed in the designated area; see in particular the detailed description of step S23 in example 1.

A calculating module 64, configured to calculate binary data of the weight of each layer of convolution to which the regularization term is added; see in particular the detailed description of step S24 in example 1.

And the establishing module 65 is configured to establish a neural network model corresponding to the training sample data according to the binarized data and the input quantity of each layer of convolution. See in particular the detailed description of step S25 in example 1.

As a preferable scheme, regularizing the weight of each layer convolution of the training sample data so that the weight of each layer convolution is distributed in a specified area includes: a regularization term is added to the weights of each layer of convolution so that all weights in the neural network tend to be +1or-1 in weight. See in particular the detailed description of the preferred embodiment of step S23 in example 1.

As a preferred approach, a regularization term is added to the weights of each layer of convolution such that all weights in the neural network tend to weigh +1or-1 including: calculating a forward propagation loss function of the training sample data in the neural network by adopting the following formula:

wherein, L (W) is a forward propagation loss function, W is a weight set of training sample data in the neural network, N is the number of input samples in the neural network, L_iIs the loss of each input sample, H is the total number of convolutional layers in the neural network, H is the current number of convolutional layers, W_hIs the weight set of the h-th convolutional layer in the neural network,is W_hα is a regular term coefficient for balancing the adding strength of the regular term, see in particular the detailed description of the preferred embodiment of step S23 in embodiment 1.

As a preferred solution, the calculation module 64 includes: a binarization unit for calculating the binarized data by adopting the following formula:

wherein,is the binarized data of the h-th convolutional layer, W_hIs a set of weights for the h-th convolutional layer in the neural network. See in particular the relevant detailed description of the preferred version of step S24 in example 1.

As a preferable scheme, the establishing module 65 includes: the solving unit is used for solving an objective function of the neural network by adopting a random gradient descent algorithm; and the establishing unit is used for establishing the neural network model according to the target function. See in particular the relevant detailed description of the preferred version of step S25 in example 1.

In the data processing apparatus based on the neural network provided by this embodiment, the regularization term is added to each layer of convolution weight of training sample data in the neural network, so that all weights are distributed in a specified range, for example, tend to +1or-1, to force the original floating point weight to dynamically approach +1or-1 in the optimization process, and then the regularized weights are subjected to binarization processing, so as to reduce the difference between the data models before and after binarization, thereby not only greatly reducing the data amount of the whole data processing based on the neural network, and improving the data processing speed in the process of training the neural network model, but also improving the accuracy of the calculation result because the weight set to which the regularization term is added is more accurate in the process of binarization.

It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims

1. A data processing method based on a neural network is characterized by comprising the following steps:

inputting training sample data to a neural network;

acquiring the input quantity and the weight of each layer of convolution of the training sample data in the neural network;

regularizing the weight of each layer of convolution of the training sample data to enable the weight of each layer of convolution to be distributed in a specified area;

respectively calculating the binary data of the weight of each layer of convolution added with the regular term;

and establishing a neural network model corresponding to the training sample data according to the binary data and the input quantity of each layer of convolution.

2. The method of claim 1, wherein the regularizing the weights of each layer of convolution of the training sample data such that the weights of each layer of convolution are distributed within a specified region comprises:

adding a regularization term to the weights of each layer of convolution to make all weights in the neural network tend to be +1or-1 in weight.

3. The neural network-based data processing method of claim 2, wherein the adding a regularization term to the weights of each layer of convolution such that all weights in the neural network tend to be +1or-1 comprises:

calculating a forward propagation loss function of the training sample data in the neural network by adopting the following formula:

4. The neural network-based data processing method according to any one of claims 1 to 3, wherein the separately calculating binarization data of the weights of the convolutions of each layer to which a regularization term is added includes:

calculating the binary data by adopting the following formula:

5. The method according to claim 1, wherein the building a neural network model corresponding to the training sample data according to the binarized data and the input amount of each layer of convolution comprises:

solving an objective function of the neural network by adopting a random gradient descent algorithm;

and establishing the neural network model according to the objective function.

6. A data processing apparatus based on a neural network, comprising:

the input module is used for inputting training sample data to the neural network;

the acquisition module is used for acquiring the input quantity and the weight of each layer of convolution of the training sample data in the neural network;

the regularization module is used for regularizing the weight of each layer of convolution of the training sample data so as to enable the weight of each layer of convolution to be distributed in a specified area;

the calculation module is used for respectively calculating the binary data of the weight of each layer of convolution added with the regular term;

a building module for building the convolution of each layer according to the input quantity of the binary data and each layer

And establishing a neural network model corresponding to the training sample data.

7. The neural network-based data processing apparatus of claim 6, wherein the regularizing the weights of each layer of convolution of the training sample data such that the weights of each layer of convolution are distributed within a specified region comprises:

8. The apparatus according to claim 7, wherein the adding a regularization term to the weights of each layer convolution such that all weights in the neural network tend to be +1or-1 comprises:

9. The neural network-based data processing apparatus of any one of claims 6 to 8, wherein the computation module comprises:

a binarization unit for calculating the binarized data by adopting the following formula:

10. The neural network-based data processing apparatus of claim 6, wherein the establishing means comprises:

the solving unit is used for solving an objective function of the neural network by adopting a stochastic gradient descent algorithm;

and the establishing unit is used for establishing the neural network model according to the objective function.