CN110837887A - Compression and acceleration method of deep convolutional neural network, neural network model and application thereof - Google Patents

Compression and acceleration method of deep convolutional neural network, neural network model and application thereof

Info

Publication number
CN110837887A
Authority
CN
China
Prior art keywords
neural network
training
binary
weight
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911103074.7A
Other languages
Chinese (zh)
Inventor
张菊莉
贺占庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute
Priority to CN201911103074.7A
Publication of CN110837887A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a compression and acceleration method of a deep convolutional neural network, a neural network model and an application thereof, and belongs to the field of deep convolutional neural networks. The method comprises the following steps: 1) converting the deep convolutional neural network into a wide and shallow neural network structure; 2) inputting normalized sample data into the wide and shallow neural network for training to obtain floating-point weights; binarizing the floating-point weights and the activation function of the neural network structure to obtain a binarized neural network; taking the binarized sample data as training data, inputting it into the binarized neural network, and updating the parameters until the error between the predicted value and the ground truth reaches a preset error, thereby finishing the training; wherein, during the training of the binarized neural network, the convolution layers carry out only addition and subtraction operations. The invention solves the problem that the existing deep convolutional neural network cannot be applied to an embedded computing platform.

Description

Compression and acceleration method of deep convolutional neural network, neural network model and application thereof
Technical Field
The invention belongs to the field of deep convolutional neural networks, and particularly relates to a compression and acceleration method of a deep convolutional neural network, a neural network model and application of the neural network model.
Background
In-orbit target recognition requires a satellite to complete, in orbit and in real time, a series of actions such as feature extraction, classification and recognition of a target, while maintaining high accuracy and speed. Traditional target recognition methods generally extract global and local features manually, segment the extracted features and model the global information of the target, and then output the recognition information of the target. This approach has the following disadvantages: manual feature extraction requires professional image-processing knowledge, a method with good performance and robustness must be selected according to the characteristics of the image, and the process is complex and somewhat subjective; manual feature extraction is often a fusion of one or more methods, and this process consumes considerable time for feature extraction and fusion; a manual extraction method usually focuses on only one aspect of an image and cannot extract the characteristics of the image comprehensively, so the final target recognition has certain limitations; and target recognition depends strongly on the extracted image features.
In view of the trade-off between efficiency and performance and the urgent needs arising from the development of various intelligent information-processing systems, deep learning has rapidly become a research hotspot in the field of computer vision by virtue of its strong modeling and data-characterization capabilities, and has made breakthrough progress in image recognition and speech recognition. At present, high-performance earth-observation satellites are developing towards intellectualization, and intelligent in-orbit satellite information processing is a key technology that urgently needs to be broken through. A satellite in-orbit information system is a typical embedded system with very strict limitations on storage, memory, computing capacity, power consumption and the like, so running a deep neural network directly on a satellite information-processing platform is hardly feasible. Because a large amount of computation and memory is consumed, deep convolutional neural networks can currently only run on platforms equipped with a graphics processing unit (GPU) and cannot be directly applied to embedded computing platforms with limited memory, computation and power consumption. As a result, satellite in-orbit processing systems, which must rely on embedded computing platforms, can only adopt traditional algorithms rather than higher-performance deep-learning algorithms to improve their in-orbit processing capability. This computational bottleneck greatly limits the speed of satellite in-orbit information-processing systems.
Disclosure of Invention
The invention aims to solve the problem that the existing deep convolutional neural network cannot be directly applied to a general computing platform without a graphics processing unit (GPU), and provides a compression and acceleration method of a deep convolutional neural network, a neural network model and an application thereof.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a compression and acceleration method of a deep convolutional neural network comprises the following steps:
1) converting the deep convolutional neural network into a wide and shallow neural network structure;
2) inputting normalized sample data into the wide and shallow neural network for training to obtain floating-point weights;
binarizing the floating-point weights and the activation function of the neural network structure to obtain a binarized neural network;
taking the binarized sample data as training data, inputting it into the binarized neural network, and updating the parameters until the error between the predicted value and the ground truth reaches a preset error, thereby finishing the training;
wherein, during the training of the binarized neural network, the convolution operations are converted into addition and subtraction operations.
Further, the transformation process in step 1) is specifically as follows:
cutting and cascading the basic convolution units in the deep convolutional neural network structure to change it into the wide and shallow neural network.
Further, the step 2) specifically comprises:
201) standardizing the training samples to obtain normalized training samples;
202) inputting the normalized training samples into the widened and shallowed neural network for training to obtain floating-point weights;
203) binarizing the floating-point weights obtained in the training process to +1 and -1 and storing them as binary weights;
binarizing the activation function of the neural network into a binary activation function to obtain a binarized neural network;
204) inputting the binarized training samples into the binarized neural network for training, and outputting the predicted value of the current round of training;
wherein the convolution operations of the binarized neural network are converted into addition and subtraction operations during training;
205) calculating the back-propagation gradients with the binary weights and updating the parameters;
206) calculating the error between the predicted value output by the current round of training and the ground truth; if the error reaches the preset error, proceeding to step 207); otherwise, repeating steps 204)-206);
207) finishing the training.
Further, the specific process of step 203) includes:
a stage of defining the binary neural network;
and a stage of binarizing the weights and the activation function of the binary neural network.
Further, defining the binary neural network specifically includes:
representing each convolution structure as ⟨I, W, *⟩;
wherein I is a set of tensors, and each element I_l is the input tensor of the l-th layer of the convolutional neural network, l = 1, ..., L, where L is the number of layers of the convolutional neural network;
W is the corresponding set of weight tensors, and each element W_lk represents the k-th weight filter of the l-th layer of the convolutional neural network, k = 1, ..., K_l, where K_l is the number of weight filters of the l-th layer of the CNN;
* represents the convolution operation of I and W, with I ∈ R^(c×w_in×h_in), where c represents the number of channels, w_in represents the width and h_in represents the height;
W ∈ R^(c×w×h), where w ≤ w_in and h ≤ h_in.
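For concreteness, the tensor shapes just defined can be written out as below; this is a minimal NumPy sketch, and the dimension values are illustrative only, not taken from the patent:

```python
import numpy as np

c, w_in, h_in = 3, 32, 32   # number of channels, input width, input height
w, h = 3, 3                 # filter width and height, satisfying w <= w_in and h <= h_in

I_l = np.zeros((c, w_in, h_in), dtype=np.float32)   # input tensor of the l-th layer
W_lk = np.zeros((c, w, h), dtype=np.float32)        # k-th weight filter of the l-th layer
```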
Further, the weights and the activation function of the binary neural network are specifically obtained as follows:
the operation with binary weights is expressed as I * W ≈ (I ⊕ B)α, wherein ⊕ represents a convolution operation carried out only by additions and subtractions; B represents a binary filter, B ∈ {+1, -1}^(c×w×h); α ∈ R+ represents a scale factor, and W ≈ αB;
the binary weights are obtained by the following optimization function:
J(B, α) = ‖W - αB‖²,  α*, B* = argmin(α,B) J(B, α)
as can be seen by expanding and analyzing the above formula, the binarization filter B can be obtained through the following maximization constraint optimization term:
B* = argmax(B) {W^T B},  B ∈ {+1, -1}^n
if W_i is greater than or equal to 0, then B_i is +1, otherwise B_i is -1, therefore
B* = sign(W) (7)
by taking the partial derivative of J(B, α) with respect to α, α* = (W^T B*)/n is obtained, where n, the number of elements of W, is a constant; substituting sign(W) for B* yields
α* = (W^T sign(W))/n = (Σ|W_i|)/n = (1/n)‖W‖ℓ1
wherein W^T represents the transpose of the weight W, sign is the sign activation function, and ‖W‖ℓ1 represents the L1 norm of W.
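As an illustration of the binarization rule derived above (B* = sign(W), α* = (1/n)‖W‖ℓ1), a minimal NumPy sketch follows; the function name binarize_filter and the example shapes are illustrative only and are not part of the patent:

```python
import numpy as np

def binarize_filter(W):
    """Binarize one real-valued weight filter W of shape (c, w, h):
    B* = sign(W) (with B_i = +1 when W_i >= 0) and alpha* = ||W||_L1 / n, n = c*w*h."""
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = np.abs(W).sum() / W.size
    return B, alpha

# the floating-point filter W is then approximated by alpha * B
W = np.random.randn(3, 3, 3).astype(np.float32)
B, alpha = binarize_filter(W)
W_approx = alpha * B
```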
The neural network model is obtained by the compression and acceleration method of the deep convolutional neural network.
The neural network model is applied to a satellite embedded computing platform for target recognition.
Compared with the prior art, the invention has the following beneficial effects:
the compression and acceleration method of the deep convolutional neural network simplifies and accelerates the convolutional neural network model, and removes the dependence of the algorithm based on the deep convolutional neural network on the hardware structure and the corresponding algorithm (the deep algorithm is operated on a GPU and the acceleration algorithm aiming at the GPU is required) through simplified compression; by accelerating, the occupied memory of the neural network compression model obtained by training is reduced by about one 32 times compared with the original floating point weight theoretically, when the binarization weight is trained by adopting binarization input, the relative lifting speed is obviously improved in the GPU environment under the same condition and the CPU under the same condition, and the binarization weight is obviously superior to the calculation speed of the floating point weight on the CPU under the same condition. The target identification accuracy is reduced by about 10% -15% relative to a standard convolutional neural network, when input data is not binarized and only a binarization weight value is adopted for prediction inference, the identification accuracy is reduced by about 8% -10%, the accuracy is reduced to a certain extent, but the required storage space is obviously reduced, the calculation efficiency is obviously improved, and the method can be applied to mobile equipment with limited storage and limited calculation resources in an embedded mode.
The neural network model obtained by the compression and acceleration method of the deep convolutional neural network reduces the requirements of the neural network on computing resources and storage resources, and can be transplanted to a satellite embedded computing platform with limited computing resources, storage resources and energy consumption resources for target identification.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a block diagram of the present invention for transforming a conventional convolutional neural network structure into a wide and shallow neural network structure;
FIG. 3 is a block diagram of transforming the widened and shallowed convolutional neural network structure into a binarized neural network structure according to the present invention;
FIG. 4 illustrates the weight binarization method of the binarized neural network according to the present invention;
FIG. 5 is a flow chart of neural network training according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
the invention provides a compression and acceleration method of a deep convolutional neural network, which can directly apply the improved neural network to a satellite embedded computing platform. Referring to fig. 1, fig. 1 is a flowchart of an embodiment of the present invention, which includes the following steps:
s1, converting a deep-layer deep convolutional neural network into a wide-shallow neural network structure, specifically:
the deep neural network is deformed structurally: cutting and cascading basic convolution units in the deep convolution neural network structure to change the basic convolution units into a wide and shallow neural network; namely, the original series basic convolution units are selectively changed into a multi-stage cascade form, and the structural characteristics of the deep wide and shallow neural network can be continuously expanded to the wide and shallow neural network.
S2, binary weight training is carried out until a preset condition is reached to obtain the trained neural network model: after the input data are binarized, the weights of the wide and shallow network are trained by the binary training method, and the final weights obtained by this training are used for testing. Specifically:
201) standardizing the training samples to obtain normalized training samples;
202) inputting the normalized training samples into the widened and shallowed neural network for training to obtain floating-point weights;
203) binarizing the floating-point weights obtained in the training process to +1 and -1 and storing them as binary weights;
binarizing the activation function of the neural network into a binary activation function to obtain a binarized neural network;
204) inputting the binarized training samples into the binarized neural network for training, and outputting the predicted value of the current round of training;
wherein the convolution layers of the binarized neural network carry out only addition and subtraction operations during training;
205) calculating the back-propagation gradients with the binary weights and updating the parameters;
206) calculating the error between the predicted value output by the current round of training and the ground truth; if the error reaches the preset error, proceeding to step 207); otherwise, repeating steps 204)-206);
207) finishing the training.
S3, applying the trained neural network model to an embedded computing platform for target recognition. The invention first changes the deep convolutional neural network structure into a wide and shallow one, then binarizes the input images and the network weights, trains with the specific training method, and applies the obtained trained model to the inference process of the network model.
The convolutional neural network structure YOLO, which currently offers good accuracy and real-time performance in image classification and recognition, is taken as the acceleration example. The data set is a self-built ship satellite remote-sensing image data set. The specific implementation steps are as follows:
1) The total number of layers of the new network structure is 10 and the number of parallel units per layer is 2. The odd layers of the original 31-layer network structure are retained, the even layers are cut out so that the odd layers become directly connected, and each cut-out even layer is cascaded with its corresponding basic layer; the network is thus changed from 31 layers to 15 layers. Cutting and cascading again in the same way, the network can be changed into a 10-layer or even more simplified structure, and the structure of each layer changes accordingly. This process is called widening and shallowing. The basic network structure unit adopted is shown in FIG. 2, which illustrates the process of converting a common convolutional neural network structure into a wide and shallow one: the deep convolutional neural network is divided into modules, all the small modules in one block are taken as the basic unit of a layer, the basic units are then cut and cascaded, redundant neurons are removed with a discarding (dropout) strategy to form the basic unit of the new network structure, and these basic units are stacked in series to form the new network structure. After this transformation the number of network layers is reduced and the network width is increased, hence the name widening and shallowing. A structural sketch of such a widened basic unit is given below.
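To make the cutting-and-cascading idea concrete, the following PyTorch-style sketch shows one possible basic unit of the widened-and-shallowed network: two formerly serial convolution units receive the same input and their outputs are concatenated along the channel axis. The module name, layer sizes, activation and channel counts are assumptions for illustration and are not the actual YOLO configuration of the embodiment.

```python
import torch
import torch.nn as nn

class WideShallowUnit(nn.Module):
    """Illustrative basic unit of the widened-and-shallowed network: two former
    serial convolution units run in parallel on the same input and are cascaded
    (concatenated) along the channel axis, so depth is traded for width."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def conv_block():
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                 nn.BatchNorm2d(out_ch),
                                 nn.LeakyReLU(0.1))
        self.unit_a = conv_block()
        self.unit_b = conv_block()

    def forward(self, x):
        # widening: both former serial units see the same input; outputs are concatenated
        return torch.cat([self.unit_a(x), self.unit_b(x)], dim=1)

# a shallow stack of 10 such units (channel counts are placeholders)
net = nn.Sequential(WideShallowUnit(3, 16),
                    *[WideShallowUnit(32, 16) for _ in range(9)])
```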
2) Defining the binary neural network
Each convolution structure is represented as ⟨I, W, *⟩, wherein I is a set of tensors, and each element I_l is the input tensor of the l-th layer of the convolutional neural network, l = 1, ..., L, with L the number of layers of the convolutional neural network; W is the corresponding set of weight tensors, and each element W_lk represents the k-th weight filter of the l-th layer, k = 1, ..., K_l, with K_l the number of weight filters of the l-th layer of the CNN; * represents the convolution operation of I and W, with I ∈ R^(c×w_in×h_in), where c represents the number of channels and w_in and h_in respectively represent the width and the height, and W ∈ R^(c×w×h), where w ≤ w_in and h ≤ h_in. Referring to FIG. 3, FIG. 3 shows the transformation of the widened and shallowed convolutional neural network structure into a binarized neural network structure: on the left are the basic convolution units after widening and shallowing, and on the right are the binarized basic convolution units. The convolution operation of every layer needs to be binarized.
3) Weights and activation function of the binary neural network
Referring to FIG. 4, FIG. 4 illustrates the weight binarization method of the binarized neural network: the cuboid on the left of FIG. 4 represents a standard floating-point weight W, and the binarized weight on the right is obtained after computing the scale factor α and the binarization filter solved from W according to formula (1).
The operation with binary weights is expressed as I * W ≈ (I ⊕ B)α, wherein ⊕ represents a convolution without multiplications, carried out only by additions and subtractions; B represents a binary filter, B ∈ {+1, -1}^(c×w×h); α ∈ R+ represents the scale factor, and W ≈ αB. The binary weights are obtained by the following optimization function:
J(B, α) = ‖W - αB‖²,  α*, B* = argmin(α,B) J(B, α)
As can be seen by expanding and analyzing the above formula, the binarization filter B can be obtained through the following maximization constraint optimization term:
B* = argmax(B) {W^T B},  B ∈ {+1, -1}^n
If W_i is greater than or equal to 0, then B_i is +1, otherwise B_i is -1, therefore
B* = sign(W) (10)
Taking the partial derivative of J(B, α) with respect to α gives α* = (W^T B*)/n, where n, the number of elements of W, is a constant; substituting sign(W) for B* yields
α* = (W^T sign(W))/n = (Σ|W_i|)/n = (1/n)‖W‖ℓ1
Thus the weight binarization is obtained by the above optimization: B* is realized by the sign function, and the scale factor is the mean of the absolute values of the weights;
a binarized neural network model is thus obtained. A minimal numerical illustration of the resulting addition/subtraction-only convolution is given below.
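Since every entry of B is +1 or -1, the multiply-accumulate of an ordinary convolution collapses to additions and subtractions followed by a single multiplication by α. The following minimal NumPy check of this equivalence for a single input patch is illustrative only; the function and variable names are not from the patent:

```python
import numpy as np

def binary_conv_patch(patch, B, alpha):
    """One output value of the binarized convolution: add the inputs where B = +1,
    subtract them where B = -1, then scale once by alpha -- no per-weight multiplies."""
    return alpha * (patch[B > 0].sum() - patch[B < 0].sum())

patch = np.random.randn(3, 3, 3)      # input patch, shape (c, w, h)
W = np.random.randn(3, 3, 3)          # floating-point filter
B = np.where(W >= 0, 1.0, -1.0)       # B* = sign(W)
alpha = np.abs(W).mean()              # alpha* = ||W||_L1 / n

# the result matches convolving the patch with the approximated filter alpha * B
assert np.isclose(binary_conv_patch(patch, B, alpha), (patch * (alpha * B)).sum())
```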
4) carrying out binarization on each image sample in the data set;
5) Inputting the binarized sample data into the binarized neural network model for binary weight training. Referring to FIG. 5, FIG. 5 is the neural network training flow chart of the present invention; the training process is divided into two phases: the forward propagation phase, and the back propagation and parameter update phase.
A minibatch of inputs and targets (I, Y) is given, the loss function is denoted C(Y, Ŷ), the current network weight is W^t, the current learning rate is η^t, and the total number of layers of the neural network is L. Each iteration proceeds as follows:
(1) From layer l = 1 to layer L, each weight of layer l is binarized; for example, the k-th filter of layer l is binarized as B_lk = sign(W_lk) with scale factor α_lk = (1/n)‖W_lk‖ℓ1, according to the formulas above;
wherein the weights of each layer are the floating-point weights obtained by inputting the normalized training samples into the simplified neural network;
(2) The final prediction Ŷ is computed by forward propagation; except that the convolution operations are calculated with the formula I * W ≈ (I ⊕ B)α, the rest is standard forward propagation;
(3) In the back-propagation process, the partial derivatives ∂C/∂W̃ are calculated with respect to the binarized weights W̃, i.e. the binarized weights are used for computing the partial derivatives instead of the floating-point weights W^t;
(4) The parameters are updated with a gradient descent algorithm using the gradients from step (3), W^(t+1) = UpdateParameters(W^t, ∂C/∂W̃, η^t);
(5) The learning rate parameter is updated, η^(t+1) = UpdateLearningRate(η^t, t);
(6) The error between the predicted value and the ground truth is computed; training is stopped when the preset error is reached; if the preset error has not been reached, return to step (1) and repeat the training.
6) The trained model is used for testing, i.e. for prediction with the neural network. A code sketch of one training iteration of step 5) is given below.
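For illustration, one iteration of the training procedure of steps (1)-(6) can be sketched as follows in PyTorch-style code. This is a minimal sketch under the assumption of a standard convolutional model and optimizer; helper and argument names are illustrative, the learning-rate update of step (5) is assumed to be handled by an external scheduler, and the error check of step (6) is left to the caller.

```python
import torch

def train_iteration(net, conv_layers, x, y, loss_fn, optimizer):
    """One iteration of steps (1)-(6): binarize the convolution weights, run the
    forward and backward pass with the binarized weights, then update the kept
    floating-point weights by gradient descent."""
    saved = []
    for conv in conv_layers:                                   # (1) binarize each filter
        w = conv.weight.data
        saved.append(w.clone())
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)      # alpha* = ||W||_L1 / n, per filter
        b = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))  # B* = sign(W)
        conv.weight.data = alpha * b
    y_hat = net(x)                                             # (2) forward with binary weights
    loss = loss_fn(y_hat, y)
    optimizer.zero_grad()
    loss.backward()                                            # (3) gradients w.r.t. binarized weights
    for conv, w in zip(conv_layers, saved):                    # restore floating-point weights
        conv.weight.data = w
    optimizer.step()                                           # (4) gradient-descent parameter update
    return loss.item()                                         # (6) caller compares this against the preset error
```

Here conv_layers would be the convolution modules of net whose weights are binarized, and optimizer would be the gradient-descent optimizer over the kept floating-point weights.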
The acceleration mainly comprises two steps: the first is simplifying the network structure, i.e. the widening-and-shallowing process, and the second is training the simplified network structure into a binary network. Simplifying the network structure mainly reduces the number of network layers, facilitates network training, prevents the gradient-explosion problem, reduces the neural network parameters and reduces the occupied memory; the binarization process mainly compresses the weights and accelerates the testing process. Combining the two parts greatly reduces the parameters and accelerates the testing and inference process. The network-weight compression process is a parameter-reduction process; its benefit is that the trained weights become smaller and testing and inference are accelerated, and its cost is that the accuracy decreases. The more the number of layers is reduced and the smaller the weights become, the more the accuracy drops. Therefore, with an acceptable decrease in accuracy, appropriately simplifying and binarizing the network is favorable for transplanting the deep neural network to an embedded platform with only a CPU.
See Table 1, which shows the conditions of the embodiment and the experimental results. Taking image detection as an example, the data set adopts a satellite remote-sensing image data set; each image is 11.9 MB in size and there are 242 images in total. The experimental process includes the training and testing processes. Finally, the change of the network weights, the change of the classification accuracy and the acceleration after applying the method are observed from the experimental results.
The 31-layer YOLO-Darknet neural network, whose last layer is a detection layer, is used as the original network structure; it is simplified into a 10-layer architecture through the process described above.
The minibatch size during training is 128, the initial learning rate is 0.01, the momentum is 0.8, and the learning rate is subsequently reduced to 0.001. The programming language is Lua.
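For illustration, the stated hyperparameters can be expressed in PyTorch-style code (the original embodiment was implemented in Lua; the stand-in module and the unspecified schedule point are assumptions, only the numerical values come from the embodiment):

```python
import torch
import torch.nn as nn

net = nn.Conv2d(3, 16, 3)   # stand-in for the 10-layer wide-and-shallow network above
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.8)  # initial learning rate and momentum
batch_size = 128                                                      # minibatch size during training
# the learning rate is subsequently reduced to 0.001 (the exact schedule is not specified in the text)
```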
Table 1 conditions of examples and experimental results
[Table 1 is provided as an image in the original publication; its contents are not recoverable from the text.]
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (8)

1. A compression and acceleration method of a deep convolutional neural network is characterized by comprising the following steps:
1) converting the deep convolutional neural network into a wide and shallow neural network structure;
2) inputting normalized sample data into the wide and shallow neural network for training to obtain floating-point weights;
binarizing the floating-point weights and the activation function of the neural network structure to obtain a binarized neural network;
taking the binarized sample data as training data, inputting it into the binarized neural network, and updating the parameters until the error between the predicted value and the ground truth reaches a preset error, thereby finishing the training;
wherein, during the training of the binarized neural network, the convolution operations are converted into addition and subtraction operations.
2. The method for compressing and accelerating a deep convolutional neural network as claimed in claim 1, wherein the transformation process in step 1) is specifically:
cutting and cascading the basic convolution units in the deep convolutional neural network structure to change it into the wide and shallow neural network.
3. The method for compressing and accelerating a deep convolutional neural network as claimed in claim 1, wherein the step 2) specifically comprises:
201) standardizing the training samples to obtain normalized training samples;
202) inputting the normalized training samples into the widened and shallowed neural network for training to obtain floating-point weights;
203) binarizing the floating-point weights obtained in the training process to +1 and -1 and storing them as binary weights;
binarizing the activation function of the neural network into a binary activation function to obtain a binarized neural network;
204) inputting the binarized training samples into the binarized neural network for training, and outputting the predicted value of the current round of training;
wherein the convolution operations of the binarized neural network are converted into addition and subtraction operations during training;
205) calculating the back-propagation gradients with the binary weights and updating the parameters;
206) calculating the error between the predicted value output by the current round of training and the ground truth; if the error reaches the preset error, proceeding to step 207); otherwise, repeating steps 204)-206);
207) finishing the training.
4. The method for compressing and accelerating a deep convolutional neural network as claimed in claim 3, wherein the specific process of step 203) comprises:
a stage of defining the binary neural network;
and a stage of binarizing the weights and the activation function of the binary neural network.
5. The method for compressing and accelerating a deep convolutional neural network as claimed in claim 4, wherein defining the binary neural network specifically comprises:
representing each convolution structure as ⟨I, W, *⟩;
wherein I is a set of tensors, and each element I_l is the input tensor of the l-th layer of the convolutional neural network, l = 1, ..., L, where L is the number of layers of the convolutional neural network;
W is the corresponding set of weight tensors, and each element W_lk represents the k-th weight filter of the l-th layer of the convolutional neural network, k = 1, ..., K_l, where K_l is the number of weight filters of the l-th layer of the CNN;
* represents the convolution operation of I and W, with I ∈ R^(c×w_in×h_in), wherein c represents the number of channels, w_in represents the width and h_in represents the height;
and W ∈ R^(c×w×h), wherein w ≤ w_in and h ≤ h_in.
6. The method as claimed in claim 4, wherein the weights and the activation function of the binary neural network are specifically obtained as follows:
the operation with binary weights is expressed as I * W ≈ (I ⊕ B)α, wherein ⊕ represents a convolution operation carried out only by additions and subtractions; B represents a binary filter, B ∈ {+1, -1}^(c×w×h); α ∈ R+ represents a scale factor, and W ≈ αB;
the binary weights are obtained by the following optimization function:
J(B, α) = ‖W - αB‖²,  α*, B* = argmin(α,B) J(B, α)
as can be seen by expanding and analyzing the above formula, the binarization filter B can be obtained through the following maximization constraint optimization term:
B* = argmax(B) {W^T B},  B ∈ {+1, -1}^n
if W_i is greater than or equal to 0, then B_i is +1, otherwise B_i is -1, therefore
B* = sign(W) (3)
by taking the partial derivative of J(B, α) with respect to α, α* = (W^T B*)/n is obtained, where n is a constant; substituting sign(W) for B* yields
α* = (W^T sign(W))/n = (Σ|W_i|)/n = (1/n)‖W‖ℓ1
wherein W^T represents the transpose of the weight W, sign is the sign activation function, and ‖W‖ℓ1 represents the L1 norm of W.
7. A neural network model obtained by the compression and acceleration method of the deep convolutional neural network as set forth in any one of claims 1 to 6.
8. An application of the neural network model according to claim 7, wherein the neural network model is applied to a satellite embedded computing platform for target recognition.
CN201911103074.7A 2019-11-12 2019-11-12 Compression and acceleration method of deep convolutional neural network, neural network model and application thereof Pending CN110837887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911103074.7A CN110837887A (en) 2019-11-12 2019-11-12 Compression and acceleration method of deep convolutional neural network, neural network model and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911103074.7A CN110837887A (en) 2019-11-12 2019-11-12 Compression and acceleration method of deep convolutional neural network, neural network model and application thereof

Publications (1)

Publication Number Publication Date
CN110837887A true CN110837887A (en) 2020-02-25

Family

ID=69576270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911103074.7A Pending CN110837887A (en) 2019-11-12 2019-11-12 Compression and acceleration method of deep convolutional neural network, neural network model and application thereof

Country Status (1)

Country Link
CN (1) CN110837887A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738403A (en) * 2020-04-26 2020-10-02 华为技术有限公司 Neural network optimization method and related equipment
CN112150497A (en) * 2020-10-14 2020-12-29 浙江大学 Local activation method and system based on binary neural network
CN112244853A (en) * 2020-10-26 2021-01-22 生物岛实验室 Edge computing node manufacturing method and edge computing node
US20210150313A1 (en) * 2019-11-15 2021-05-20 Samsung Electronics Co., Ltd. Electronic device and method for inference binary and ternary neural networks
CN112950464A (en) * 2021-01-25 2021-06-11 西安电子科技大学 Binary super-resolution reconstruction method without regularization layer
CN113128614A (en) * 2021-04-29 2021-07-16 西安微电子技术研究所 Convolution method based on image gradient, neural network based on directional convolution and classification method
CN113159296A (en) * 2021-04-27 2021-07-23 广东工业大学 Construction method of binary neural network
CN113221908A (en) * 2021-06-04 2021-08-06 深圳龙岗智能视听研究院 Digital identification method and equipment based on deep convolutional neural network
WO2022001364A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Method for extracting data features, and related apparatus
WO2024092896A1 (en) * 2022-11-01 2024-05-10 鹏城实验室 Neural network training and reasoning method and device, terminal and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148078A1 (en) * 2014-11-20 2016-05-26 Adobe Systems Incorporated Convolutional Neural Network Using a Binarized Convolution Layer
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN108765506A (en) * 2018-05-21 2018-11-06 上海交通大学 Compression method based on successively network binaryzation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148078A1 (en) * 2014-11-20 2016-05-26 Adobe Systems Incorporated Convolutional Neural Network Using a Binarized Convolution Layer
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN108765506A (en) * 2018-05-21 2018-11-06 上海交通大学 Compression method based on successively network binaryzation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MOHAMMAD RASTEGARI et al.: "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks", 《EUROPEAN CONFERENCE ON COMPUTER VISION》 *
张涛 et al.: "Improved Design Method for Convolutional Neural Network Models", 《计算机工程与设计》 *
胡骏飞 et al.: "Research on a Gesture Classification Method Based on Binarized Convolutional Neural Networks", 《湖南工业大学学报》 *
谢佳砼: "Binary-Based Network Acceleration", 《电子制作》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150313A1 (en) * 2019-11-15 2021-05-20 Samsung Electronics Co., Ltd. Electronic device and method for inference binary and ternary neural networks
US12039430B2 (en) * 2019-11-15 2024-07-16 Samsung Electronics Co., Ltd. Electronic device and method for inference binary and ternary neural networks
CN111738403A (en) * 2020-04-26 2020-10-02 华为技术有限公司 Neural network optimization method and related equipment
CN111738403B (en) * 2020-04-26 2024-06-07 华为技术有限公司 Neural network optimization method and related equipment
WO2022001364A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Method for extracting data features, and related apparatus
CN112150497A (en) * 2020-10-14 2020-12-29 浙江大学 Local activation method and system based on binary neural network
CN112244853A (en) * 2020-10-26 2021-01-22 生物岛实验室 Edge computing node manufacturing method and edge computing node
CN112244853B (en) * 2020-10-26 2022-05-13 生物岛实验室 Edge computing node manufacturing method and edge computing node
CN112950464A (en) * 2021-01-25 2021-06-11 西安电子科技大学 Binary super-resolution reconstruction method without regularization layer
CN112950464B (en) * 2021-01-25 2023-09-01 西安电子科技大学 Binary super-resolution reconstruction method without regularization layer
CN113159296B (en) * 2021-04-27 2024-01-16 广东工业大学 Construction method of binary neural network
CN113159296A (en) * 2021-04-27 2021-07-23 广东工业大学 Construction method of binary neural network
CN113128614B (en) * 2021-04-29 2023-06-16 西安微电子技术研究所 Convolution method based on image gradient, neural network based on direction convolution and classification method
CN113128614A (en) * 2021-04-29 2021-07-16 西安微电子技术研究所 Convolution method based on image gradient, neural network based on directional convolution and classification method
CN113221908A (en) * 2021-06-04 2021-08-06 深圳龙岗智能视听研究院 Digital identification method and equipment based on deep convolutional neural network
CN113221908B (en) * 2021-06-04 2024-04-16 深圳龙岗智能视听研究院 Digital identification method and device based on deep convolutional neural network
WO2024092896A1 (en) * 2022-11-01 2024-05-10 鹏城实验室 Neural network training and reasoning method and device, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN110837887A (en) Compression and acceleration method of deep convolutional neural network, neural network model and application thereof
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
CN109598269A (en) A kind of semantic segmentation method based on multiresolution input with pyramid expansion convolution
CN107665364B (en) Neural network method and apparatus
EP3340129B1 (en) Artificial neural network class-based pruning
CN110287969A (en) Mole text image binaryzation system based on figure residual error attention network
Ablavatski et al. Enriched deep recurrent visual attention model for multiple object recognition
CN109753664A (en) A kind of concept extraction method, terminal device and the storage medium of domain-oriented
CN112446888B (en) Image segmentation model processing method and processing device
CN109284761B (en) Image feature extraction method, device and equipment and readable storage medium
CN110119449A (en) A kind of criminal case charge prediction technique based on sequence enhancing capsule net network
CN114283495A (en) Human body posture estimation method based on binarization neural network
CN104036482B (en) Facial image super-resolution method based on dictionary asymptotic updating
Zhao et al. Exploring structural sparsity in CNN via selective penalty
Liu et al. Image retrieval using CNN and low-level feature fusion for crime scene investigation image database
CN114781499A (en) Method for constructing ViT model-based intensive prediction task adapter
CN110610140A (en) Training method, device and equipment of face recognition model and readable storage medium
CN113807366A (en) Point cloud key point extraction method based on deep learning
Liu et al. SuperPruner: automatic neural network pruning via super network
CN109558819B (en) Depth network lightweight method for remote sensing image target detection
Wang et al. Identification of weather phenomena based on lightweight convolutional neural networks
CN115795334A (en) Sequence recommendation data enhancement method based on graph contrast learning
Nie et al. A novel framework using gated recurrent unit for fault diagnosis of rotary machinery with noisy labels
CN114881162A (en) Method, apparatus, device and medium for predicting failure of metering automation master station
CN115546474A (en) Few-sample semantic segmentation method based on learner integration strategy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200225

RJ01 Rejection of invention patent application after publication