CN112288085A - Convolutional neural network acceleration method and system - Google Patents
Convolutional neural network acceleration method and system
- Publication number
- CN112288085A (application number CN202011147836.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- result
- convolution
- convolutional neural
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a convolutional neural network acceleration method and system, comprising: feeding an image to be analyzed for features into a convolutional neural network as the input activation; decomposing the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter; performing a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performing a convolution operation on a compensation factor and the input activation vector to obtain a second convolution result, and adding the first and second convolution results to obtain a prediction result; and, when the convolutional neural network executes the convolution computation, skipping zero-value-related operations according to the prediction result to obtain the convolution result. The invention predicts the sparsity of the output activations in order to guide the original neural network to skip zero-value-related operations, thereby reducing the computation of the original network, saving computing resources, lowering power consumption and improving performance.
Description
Technical Field
The invention relates to computer architecture, and in particular to a convolutional neural network acceleration method and system that predict sparse activation data based on weight signs.
Background
Neural networks deliver state-of-the-art performance in image detection, speech recognition and natural language processing. As applications grow more complex, neural network models grow with them, posing many challenges for traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as CPU-oriented sparse BLAS libraries and the GPU-oriented cuSPARSE library, which accelerate sparse-network execution to some extent; dedicated accelerators show advanced capability in performance, power consumption and other respects.
In most deep neural networks (DNNs), the rectified linear unit (ReLU) is widely used at the output of network layers and forces negative activation data to 0. Meanwhile, exploiting the redundancy of weight data, methods such as pruning set some of the weights to 0. These methods produce a large number of zero-valued output activations and weights, so sparse networks exhibit both weight sparsity and activation sparsity; modern DNN models are roughly 50% sparse. Neural network computation consists mainly of multiply-add operations, and the product of zero-valued data with any value is 0, so these operations can be regarded as invalid. Executing them occupies computing resources, wastes computation and power, lengthens network execution time, and degrades network performance.
Disclosure of Invention
Addressing the large amount of sparse data present in neural networks, the invention discloses a prediction device for sparse activation data, which predicts the sparsity of the activation data in advance at small prediction overhead in order to guide the operation of the original neural network. Execution of the neural network is thus divided into two phases: a prediction phase and an execution phase. In the prediction phase, the weight signs and the input activation data are used to execute the network operation, while a compensation factor is added to reduce the loss of inference accuracy, producing a prediction of the output activation data. In the execution phase, using the predicted output activations, only the neural network operations whose predicted output activation is positive are executed, and the operations whose predicted activation is negative are eliminated. As a result, the computation of sparse network operation is reduced, power consumption falls, and execution performance improves.
Specifically, in order to overcome the defects of the prior art, the present invention provides a convolutional neural network acceleration method and system, wherein the convolutional neural network acceleration method comprises:
step 1, feeding an image to be analyzed for features into a convolutional neural network as the input activation, and decomposing the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter;
step 2, performing a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performing a convolution operation on the compensation factor and the input activation vector to obtain a second convolution result, and adding the first convolution result and the second convolution result to obtain a prediction result;
step 3, when the convolutional neural network executes the convolution computation, skipping zero-value-related operations according to the prediction result to obtain the convolution result.
In the convolutional neural network acceleration method and system, the step of skipping zero-value-related operations comprises: determining whether the prediction result contains values less than or equal to 0; if so, obtaining the vector positions of those values in the prediction result, skipping the computation related to those vector positions when performing the convolution computation to obtain an activation output result, and setting the values at those vector positions in the activation output result to zero to obtain the convolution result.
In the convolutional neural network acceleration method and system, step 1 comprises: taking the high-order bits (sign bits) of the weights in the filter of the convolutional neural network as the sign vector.
In the convolutional neural network acceleration method and system, the compensation factor takes a value greater than 0 and less than 1.
In the convolutional neural network acceleration method and system, the convolution computation is given by the following formula:
O=∑I*W
where W is the filter weight, I is the input activation, and O is the convolution result.
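As an informal aside (not part of the patent text), the formula above — each output O is the sum of elementwise products of an input window I with the filter W — can be sketched in plain Python; the function name and layout below are our own:

```python
def conv2d_valid(ifmap, weight):
    """'Valid' 2-D convolution: O = sum over the window of I * W."""
    kh, kw = len(weight), len(weight[0])
    oh = len(ifmap) - kh + 1          # output height
    ow = len(ifmap[0]) - kw + 1       # output width
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            # slide the filter over the input feature map
            for i in range(kh):
                for j in range(kw):
                    out[r][c] += ifmap[r + i][c + j] * weight[i][j]
    return out

# Example: a 2x2 input with a 2x2 identity-like filter gives one output, 1 + 4 = 5.
print(conv2d_valid([[1, 2], [3, 4]], [[1, 0], [0, 1]]))  # [[5.0]]
```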
The invention also provides a convolutional neural network acceleration system, comprising:
a module 1, which feeds an image to be analyzed for features into a convolutional neural network as the input activation, and decomposes the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter;
a module 2, which performs a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performs a convolution operation on the compensation factor and the input activation vector to obtain a second convolution result, and adds the first convolution result and the second convolution result to obtain a prediction result;
a module 3, which, when the convolutional neural network executes the convolution computation, skips zero-value-related operations according to the prediction result to obtain the convolution result.
In the convolutional neural network acceleration system, the skipping of zero-value-related operations comprises: determining whether the prediction result contains values less than or equal to 0; if so, obtaining the vector positions of those values in the prediction result, skipping the computation related to those vector positions when performing the convolution computation to obtain an activation output result, and setting the values at those vector positions in the activation output result to zero to obtain the convolution result.
In the convolutional neural network acceleration system, module 1 takes the high-order bits (sign bits) of the weights in the filter of the convolutional neural network as the sign vector.
In the convolutional neural network acceleration system, the compensation factor takes a value greater than 0 and less than 1.
In the convolutional neural network acceleration system, the convolution computation is given by the following formula:
O=∑I*W
where W is the filter weight, I is the input activation, and O is the convolution result.
The technical contribution of the invention is a prediction method and system that predict the sparsity of the output activations so as to guide the original neural network to skip zero-value-related operations, thereby reducing the computation of the original network, saving computing resources, lowering power consumption and improving performance.
Drawings
FIG. 1 is a framework diagram of the weight-sign-based predictor and executor;
FIG. 2 is a detailed structural diagram of the weight-sign-based predictor and executor;
FIG. 3 is a flow chart of the prediction phase;
FIG. 4 is a flow chart of the execution phase.
Detailed Description
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The prediction method comprises the following steps:
the convolutional layer calculation formula of the neural network is shown as the following formula:
O = ∑ I*W
= ∑ I*(Wmsb<<m + Wlsb)
= ∑ I*(Wmsb*2^m + Wlsb)
= ∑ I*2^m*(Wmsb + Wlsb*2^-m)
= ∑ 2^m*(I*Wmsb + I*Wlsb*2^-m)
≈ ∑ 2^m*(I*Wmsb + I*W1*a)
The convolutional-layer filter weights (W) are mapped onto the input feature map (I) to extract input feature information. Since a filter weight can be decomposed into high bits (Wmsb) and low bits (Wlsb), the convolution operation is divided into two parts: input with high-bit filter, and input with low-bit filter. W1 is an all-ones matrix of the same size as the input activation, and a is the compensation factor. O is the convolution result. In the neural network, the filter weights are known parameters; the input image is also called the input activation, and the output feature map obtained after the first convolutional layer is also called the output activation. The output feature map of one layer is the input of the next convolutional layer: the network executes one layer at a time, and the computation result of the previous layer is the input of the next layer.
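The bit-split identity used above, W = Wmsb*2^m + Wlsb, can be checked numerically. A minimal sketch, assuming non-negative integer weights and a split point of m = 4 (both are our illustrative choices; the patent does not fix these values):

```python
def split_weight(w, m=4):
    """Split a non-negative integer weight into high and low bits so that
    w == w_msb * 2**m + w_lsb (the decomposition used in the formula above)."""
    w_msb = w >> m               # high-order bits (Wmsb)
    w_lsb = w & ((1 << m) - 1)   # low-order bits (Wlsb)
    return w_msb, w_lsb

w = 0b10110101                   # 181
w_msb, w_lsb = split_weight(w)
print(w_msb, w_lsb)              # 11 5
assert w == w_msb * 2**4 + w_lsb # 11 * 16 + 5 == 181
```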
Weight-sign-based prediction uses only the most significant bit (the sign bit) of each weight to perform the convolution and determine whether the final output is zero or non-zero, while a compensation factor a compensates for the precision loss of the final result. The compensation factor is determined by running the neural network with different values in the range (0, 1); different values affect the result precision differently, and the best compensation factor is the one with the least impact on precision. Assuming a is 0.5, the prediction computes I*Wmsb and I*W1*0.5; if their sum is negative, the activation value becomes 0 after the output passes through the activation function (ReLU), otherwise it is positive. Based on the prediction result, only the convolution operations whose predicted result is positive are selected in the original convolution computation, and the convolution operations whose predicted result is negative are skipped.
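The prediction rule just described can be sketched for a single output position as follows (an informal sketch, not the patent's implementation; the names are ours, a = 0.5 follows the example in the text, and the 0/-1 sign convention is the one used below for FIG. 2):

```python
def predict_sign(window, weight, a=0.5):
    """Predict whether one ReLU output will be non-zero.

    window and weight are flat lists of the same length.
    Sign convention from the text: 0 for w >= 0, -1 for w < 0.
    Returns the predicted value; > 0 means predicted non-zero output.
    """
    signs = [0 if w >= 0 else -1 for w in weight]
    sign_part = sum(x * s for x, s in zip(window, signs))  # I * Wmsb (sign convolution)
    comp_part = a * sum(window)                            # I * W1 * a (compensation)
    return sign_part + comp_part

# All-positive weights give a sign part of 0, so a positive input window
# is always predicted non-zero: 0 + 0.5 * (1+2+3+4) = 5.0.
assert predict_sign([1.0, 2.0, 3.0, 4.0], [0.3, 0.1, 0.5, 0.2]) == 5.0
```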
The convolution is thus completed in two phases: a prediction phase and an execution phase. In the prediction phase, the prediction device uses the sign bits of the weights together with the compensation factor to predict the output activations; in the execution phase, only the operations of the non-zero output activations are executed, according to the prediction result. The prediction device is shown in FIG. 1.
the detailed prediction and execution device based on weight symbolic prediction is shown in fig. 2. Convolution operation is performed with the input activations using the sign of the weight in the filter, wherein if the weight is positive, the sign is 0, the weight is negative, the sign is-1, and since 0 is multiplied by any number to be 0, only the operation with the sign of the weight of-1 is performed through the weight index, and the associated input activation is indexed by the weight sign index. At the same time, the input activates the operation of the execution and compensation factor a. And adding the results of the two to obtain a predicted output symbol, wherein if the predicted output symbol is a negative value, the output activation value becomes 0 after the value passes through the activation function, and if the predicted output symbol is a positive value, the output activation value remains unchanged after the value passes through the activation function. Based on this, the Index constraint unit calculates the correlation Index of the non-0 output activation according to the sign of the predicted value, weight Index, input act Index. According to the index information, the execution stage executes the operation of non-0 output activation, and directly outputs the 0 value for 0 output activation.
The process of predicting the output activations from the weight signs is explained in detail below in connection with an example convolution: the filter size is 2×2, the input activation (Ifmap) size is 4×4, and the compensation factor is assumed to be 0.5.
Step one: obtain the sign of each weight from the filter weights, where the sign is 0 when the weight is greater than 0 and -1 when the weight is less than 0;
step two: the sign of the weight and the input activation (Ifmap) perform convolution operations, wherein only the weight of-1 performs a multiply-add operation with the corresponding Ifmap, as shown in fig. 3, and the result of the convolution is-0.7, -1, -0.8, -0.3. Meanwhile, the convolution operation is also executed by the compensation factor and the input activation (Ifmap), and the convolution result is 0.8, 0.8, 0.85 and 0.95;
step three: the results obtained from the respective steps are added to obtain the predicted output data, which are shown in FIG. 3 as 0.1, -0.2, 0.05, 0.65.
Step four: in the execution phase, skip the operations whose predicted output activation is negative, according to the prediction result. In FIG. 3 the result -0.2 is negative, so the convolution operations associated with this value can be skipped in the execution phase; as shown in FIG. 4, the executor only needs to perform the convolutions of the 3 remaining output activations, which are 0.045, 0.145 and 0.475 respectively. The output activation whose predicted value is -0.2 directly outputs 0.
As shown in FIG. 4, it is determined whether the prediction result contains values less than or equal to 0. If so, the vector positions of those values in the prediction result are obtained and the computation related to those positions is skipped during the convolution computation; that is, the computations drawn with white fill in the figure are skipped and only the gray-filled convolutions are computed to obtain the activation output result. The values at those vector positions in the activation output result are then set to zero to obtain the convolution result.
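The predict-then-skip flow of FIGS. 3 and 4 can be sketched end to end as follows (illustrative code with made-up data, not the patent's actual example values; all names are ours, and a = 0.5 follows the example above):

```python
def relu_conv_with_skip(ifmap, weight, a=0.5):
    """Two-phase convolution: predict each output with the weight signs plus
    a compensation term, then run the exact multiply-adds only where the
    prediction is positive; predicted-zero positions directly output 0."""
    kh, kw = len(weight), len(weight[0])
    oh, ow = len(ifmap) - kh + 1, len(ifmap[0]) - kw + 1
    signs = [[0 if w >= 0 else -1 for w in row] for row in weight]
    out = [[0.0] * ow for _ in range(oh)]
    skipped = 0
    for r in range(oh):
        for c in range(ow):
            # prediction phase: sign convolution + a * (sum of the input window)
            pred = sum(ifmap[r + i][c + j] * signs[i][j]
                       for i in range(kh) for j in range(kw))
            pred += a * sum(ifmap[r + i][c + j]
                            for i in range(kh) for j in range(kw))
            if pred <= 0:           # predicted zero after ReLU: skip the real work
                skipped += 1
                continue
            # execution phase: exact convolution followed by ReLU
            acc = sum(ifmap[r + i][c + j] * weight[i][j]
                      for i in range(kh) for j in range(kw))
            out[r][c] = max(0.0, acc)
    return out, skipped

# A filter whose positive and negative halves cancel is predicted zero everywhere
# on an all-ones input, so all four exact convolutions are skipped.
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(relu_conv_with_skip(ones, [[1, -1], [1, -1]]))  # ([[0.0, 0.0], [0.0, 0.0]], 4)
```

Note that skipping is only worthwhile when the prediction is much cheaper than the exact convolution; in hardware the sign convolution needs no multipliers, which is the point of the scheme.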
The following is a system embodiment corresponding to the above method embodiment, and the two can be implemented in cooperation. The technical details mentioned in the above embodiments remain valid for this system and are not repeated here to reduce duplication; likewise, the technical details mentioned for this system also apply to the above embodiments.
The invention also provides a convolutional neural network acceleration system, comprising:
a module 1, which feeds an image to be analyzed for features into a convolutional neural network as the input activation, and decomposes the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter;
a module 2, which performs a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performs a convolution operation on the compensation factor and the input activation vector to obtain a second convolution result, and adds the first convolution result and the second convolution result to obtain a prediction result;
a module 3, which, when the convolutional neural network executes the convolution computation, skips zero-value-related operations according to the prediction result to obtain the convolution result.
In the convolutional neural network acceleration system, the skipping of zero-value-related operations comprises: determining whether the prediction result contains values less than or equal to 0; if so, obtaining the vector positions of those values in the prediction result, skipping the computation related to those vector positions when performing the convolution computation to obtain an activation output result, and setting the values at those vector positions in the activation output result to zero to obtain the convolution result.
In the convolutional neural network acceleration system, module 1 takes the high-order bits (sign bits) of the weights in the filter of the convolutional neural network as the sign vector.
In the convolutional neural network acceleration system, the compensation factor takes a value greater than 0 and less than 1.
In the convolutional neural network acceleration system, the convolution computation is given by the following formula:
O=∑I*W
where W is the filter weight, I is the input activation, and O is the convolution result.
Claims (10)
1. A convolutional neural network acceleration method, characterized by comprising the following steps:
step 1, feeding an image to be analyzed for features into a convolutional neural network as the input activation, and decomposing the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter;
step 2, performing a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performing a convolution operation on the compensation factor and the input activation vector to obtain a second convolution result, and adding the first convolution result and the second convolution result to obtain a prediction result;
step 3, when the convolutional neural network executes the convolution computation, skipping zero-value-related operations according to the prediction result to obtain the convolution result.
2. The convolutional neural network acceleration method as claimed in claim 1, wherein step 3 comprises: determining whether the prediction result contains values less than or equal to 0; if so, obtaining the vector positions of those values in the prediction result, skipping the computation related to those vector positions when performing the convolution computation to obtain an activation output result, and setting the values at those vector positions in the activation output result to zero to obtain the convolution result.
3. The convolutional neural network acceleration method as claimed in claim 1, wherein step 1 comprises: taking the high-order bits (sign bits) of the weights in the filter of the convolutional neural network as the sign vector.
4. The convolutional neural network acceleration method as claimed in claim 1, wherein the compensation factor takes a value greater than 0 and less than 1.
5. The convolutional neural network acceleration method as claimed in claim 1, wherein the convolution computation is given by the following formula:
O=∑I*W
where W is the filter weight, I is the input activation, and O is the convolution result.
6. A convolutional neural network acceleration system, characterized by comprising:
a module 1, which feeds an image to be analyzed for features into a convolutional neural network as the input activation, and decomposes the weight vector of a filter in the convolutional neural network to obtain a sign vector corresponding to the weights in the filter;
a module 2, which performs a convolution operation on the sign vector and the input activation vector to obtain a first convolution result, performs a convolution operation on the compensation factor and the input activation vector to obtain a second convolution result, and adds the first convolution result and the second convolution result to obtain a prediction result;
a module 3, which, when the convolutional neural network executes the convolution computation, skips zero-value-related operations according to the prediction result to obtain the convolution result.
7. The convolutional neural network acceleration system as claimed in claim 6, wherein module 3 determines whether the prediction result contains values less than or equal to 0; if so, it obtains the vector positions of those values in the prediction result, skips the computation related to those vector positions when performing the convolution computation to obtain an activation output result, and sets the values at those vector positions in the activation output result to zero to obtain the convolution result.
8. The convolutional neural network acceleration system as claimed in claim 6, wherein module 1 takes the high-order bits (sign bits) of the weights in the filter of the convolutional neural network as the sign vector.
9. The convolutional neural network acceleration system as claimed in claim 6, wherein the compensation factor takes a value greater than 0 and less than 1.
10. The convolutional neural network acceleration system as claimed in claim 6, wherein the convolution computation is given by the following formula:
O=∑I*W
where W is the filter weight, I is the input activation, and O is the convolution result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011147836.6A CN112288085B (en) | 2020-10-23 | 2020-10-23 | Image detection method and system based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011147836.6A CN112288085B (en) | 2020-10-23 | 2020-10-23 | Image detection method and system based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112288085A true CN112288085A (en) | 2021-01-29 |
CN112288085B CN112288085B (en) | 2024-04-09 |
Family
ID=74423769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011147836.6A Active CN112288085B (en) | 2020-10-23 | 2020-10-23 | Image detection method and system based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112288085B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203617A (en) * | 2016-06-27 | 2016-12-07 | Harbin Institute of Technology Shenzhen Graduate School | A kind of acceleration processing unit based on convolutional neural networks and array structure
CN107506828A (en) * | 2016-01-20 | 2017-12-22 | Nanjing Aixi Information Technology Co., Ltd. | Computing device and method
US20180157969A1 (en) * | 2016-12-05 | 2018-06-07 | Beijing Deephi Technology Co., Ltd. | Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN110991631A (en) * | 2019-11-28 | 2020-04-10 | Fuzhou University | Neural network acceleration system based on FPGA
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | Institute of Interdisciplinary Information Core Technology (Xi'an) Co., Ltd. | Convolutional neural network pruning method based on patterns and pattern perception accelerator
CN111738435A (en) * | 2020-06-22 | 2020-10-02 | Shanghai Jiao Tong University | Online sparse training method and system based on mobile equipment
-
2020
- 2020-10-23 CN CN202011147836.6A patent/CN112288085B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506828A (en) * | 2016-01-20 | 2017-12-22 | Nanjing Aixi Information Technology Co., Ltd. | Computing device and method
CN106203617A (en) * | 2016-06-27 | 2016-12-07 | Harbin Institute of Technology Shenzhen Graduate School | A kind of acceleration processing unit based on convolutional neural networks and array structure
US20180157969A1 (en) * | 2016-12-05 | 2018-06-07 | Beijing Deephi Technology Co., Ltd. | Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN110991631A (en) * | 2019-11-28 | 2020-04-10 | Fuzhou University | Neural network acceleration system based on FPGA
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | Institute of Interdisciplinary Information Core Technology (Xi'an) Co., Ltd. | Convolutional neural network pruning method based on patterns and pattern perception accelerator
CN111738435A (en) * | 2020-06-22 | 2020-10-02 | Shanghai Jiao Tong University | Online sparse training method and system based on mobile equipment
Non-Patent Citations (1)
Title |
---|
WEIZHI XU et al.: "Blocking and sparsity for optimization of convolution calculation algorithm on GPUs", arXiv *
Also Published As
Publication number | Publication date |
---|---|
CN112288085B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740731B (en) | Design method of self-adaptive convolution layer hardware accelerator | |
JP2018109947A (en) | Device and method for increasing processing speed of neural network, and application of the same | |
US8019594B2 (en) | Method and apparatus for progressively selecting features from a large feature space in statistical modeling | |
CN111695671A (en) | Method and device for training neural network and electronic equipment | |
US8019593B2 (en) | Method and apparatus for generating features through logical and functional operations | |
US8533653B2 (en) | Support apparatus and method for simplifying design parameters during a simulation process | |
CN102724506B (en) | JPEG (joint photographic experts group)_LS (laser system) general coding hardware implementation method | |
Chen et al. | Approximate softmax functions for energy-efficient deep neural networks | |
CN115936248A (en) | Attention network-based power load prediction method, device and system | |
CN112288085B (en) | Image detection method and system based on convolutional neural network | |
CN114830137A (en) | Method and system for generating a predictive model | |
Nehmeh et al. | Integer word-length optimization for fixed-point systems | |
CN112215349B (en) | Sparse convolutional neural network acceleration method and device based on data flow architecture | |
CN113031952B (en) | Method, device and storage medium for determining execution code of deep learning model | |
CN115496181A (en) | Chip adaptation method, device, chip and medium of deep learning model | |
Lin et al. | A design framework for hardware approximation of deep neural networks | |
CN112395832B (en) | Text quantitative analysis and generation method and system based on sequence-to-sequence | |
JP7235171B2 (en) | Verification system and determination system for fixed-point arithmetic bit width | |
CN108345938A (en) | A kind of neural network processor and its method including bits switch device | |
US20230205957A1 (en) | Information processing circuit and method for designing information processing circuit | |
Ling et al. | A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs | |
US20220413806A1 (en) | Information processing circuit and method of designing information processing circuit | |
CN112015472B (en) | Sparse convolutional neural network acceleration method and system based on data flow architecture | |
CN117454948B (en) | FP32 model conversion method suitable for domestic hardware | |
US20230385600A1 (en) | Optimizing method and computing apparatus for deep learning network and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |