CN112215349B - Sparse convolutional neural network acceleration method and device based on data flow architecture

Info

Publication number
CN112215349B
CN112215349B
Authority
CN
China
Prior art keywords
instruction
matrix
activation
neural network
output activation
Prior art date
Legal status
Active
Application number
CN202010972552.4A
Other languages
Chinese (zh)
Other versions
CN112215349A (en)
Inventor
吴欣欣
范志华
欧焱
李文明
叶笑春
范东睿
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202010972552.4A
Publication of CN112215349A
Application granted
Publication of CN112215349B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a sparse convolutional neural network acceleration method based on a data flow architecture, comprising the following steps: computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations; marking the instructions related to each output activation as valid or invalid according to that sign information, obtaining instruction marking information; selecting the instructions marked valid according to the instruction marking information; and skipping the instructions marked invalid, executing only those marked valid.

Description

Sparse convolutional neural network acceleration method and device based on data flow architecture
Technical Field
The invention relates to the field of computer architecture, and in particular to a sparse convolutional neural network acceleration method and device based on a data flow architecture.
Background
Neural networks deliver state-of-the-art performance in image detection, speech recognition and natural language processing. As applications grow more complex, so do the network models, which places many demands on conventional hardware; to relieve this pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the Sparse BLAS library for CPUs and the cuSPARSE library for GPUs, which accelerate the execution of sparse networks to some extent, and dedicated accelerators show strong results in performance and power consumption. The data flow architecture is widely used in big-data processing, scientific computing and similar domains, and its decoupling of algorithm from structure gives it good generality and flexibility; its natural parallelism also matches the parallelism of neural network algorithms well. However, CPUs, GPUs and accelerators designed for dense networks cannot accelerate sparse networks effectively, while dedicated sparse-network accelerators, whose algorithms are strongly coupled to their structure, lack architectural flexibility and generality and leave little room for algorithmic innovation.
In the data flow architecture, the neural network algorithm is mapped, in the form of a data flow graph, onto an architecture composed of a computing array (PE array). The graph contains multiple nodes, each node contains multiple instructions, and the directed edges of the graph represent the dependencies between nodes. The PE array executes the mapped instructions to carry out the neural network computation.
In most deep neural networks (DNNs), the network layers make wide use of the rectified linear unit (ReLU), which forces negative activation values to 0. On the weight side, pruning and similar methods exploit the redundancy of the weight data to set some weights to 0. Together these methods produce large numbers of zero-valued output activations and weights, so a sparse network exhibits both weight sparsity and activation sparsity; modern DNN models commonly show sparsity of around 50%. Neural network computation consists mainly of multiply-accumulate operations, and since multiplying any value by 0 yields 0, such operations are effectively invalid. Executing them occupies computing resources, wasting computation and power, lengthening the execution time of the network and lowering its performance.
To remove these invalid computations in a data flow architecture, an effective approach is to generate marking information for the corresponding instructions in the data flow graph according to the characteristics of the data. Before executing, the PE array consults this marking information, executes only the valid instructions and skips the invalid ones, thereby saving computing resources, reducing power consumption and improving performance.
However, weight data and activation data have different characteristics. Weight data is static: it does not change as the neural network runs. Activation data is dynamic: the sparsity of a layer's output activations is not known until that layer has been computed.
The static nature of the weights can be exploited at compile time to generate instruction marking information, so that the PE array skips instructions related to zero-valued weights; this approach does not extend to the dynamic activation data. First, a neural network executes layer by layer: the input activations of the current layer come from the output activations of the previous layer, so activation information is unavailable at compile time. Second, which weights and input activations relate to invalid output activations is likewise unknown until the current layer has been computed. Compile-time instruction marking therefore lets the PE array exploit only weight sparsity, not activation sparsity, so not all zero-related operations can be removed; computing resources are wasted, power consumption rises, and the performance of the neural network suffers.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a sparse convolutional neural network acceleration method based on a data flow architecture, comprising the following steps (a schematic sketch follows the list):
step 1, computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
step 3, selecting the instructions marked valid according to the instruction marking information;
step 4, skipping the instructions marked invalid, and executing only the instructions marked valid.
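As a rough illustration only, the four steps can be sketched in Python/NumPy as follows. The function name, the instruction-to-activation mapping and the callable instruction blocks are assumptions made for this sketch, not part of the patent; the rank-r factors U' and V' are those of the SVD approximation described in the detailed description below.

```python
import numpy as np

def accelerate_layer(I, U_prime, V_prime, inst_blocks, inst_to_act):
    # Step 1: cheap low-rank forward pass; the sign of each predicted
    # output activation becomes its positive/negative marking.
    pred = I @ U_prime @ V_prime
    out_act_flag = (pred.ravel() > 0).astype(np.int8)

    # Step 2: each instruction inherits the flag of the output
    # activation it contributes to (1 = valid, 0 = invalid).
    inst_flag = out_act_flag[inst_to_act]

    # Step 3: select the instructions marked valid.
    valid_ids = np.flatnonzero(inst_flag == 1)

    # Step 4: execute only the valid instructions; the invalid ones
    # are never issued to the PE array.
    for i in valid_ids:
        inst_blocks[i]()   # placeholder for issuing instruction block i
```

In the actual device these roles are played by the instruction execution, prediction marking and instruction selection units described next.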
The invention also provides a sparse convolutional neural network acceleration device based on the data flow architecture, comprising:
an instruction execution unit, used to compute with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations, and to execute the instructions marked valid;
a prediction marking unit, used to mark the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
and an instruction selection unit, used to select the instructions marked valid according to the instruction marking information.
Because the sparsity of activation data arises dynamically and cannot be known in advance in a data flow architecture, the invention provides a device that predicts the sparsity of the output activation data in a short computation carried out before the network layer itself runs. The predicted activations are used to generate the marking information for the corresponding instructions of the network computation, so that the computing array skips the instructions marked invalid; operations related to invalid output activations are thus removed, accelerating the sparse neural network.
Drawings
FIG. 1 is a schematic diagram of the execution of a convolution.
FIG. 2 is a schematic diagram of the structure of the output activation prediction device.
FIG. 3 is a schematic diagram of the instructions that must be executed to calculate m+1 output activations.
FIG. 4 is a schematic diagram of the prediction process of output activations according to an embodiment of the present invention.
Detailed Description
In order to make the above features and effects of the present invention clearer, specific examples are given below with reference to the accompanying drawings.
In the data flow architecture, the PE array executes the instructions that the compiler maps onto it for the neural network computation, and the generated output data passes through a ReLU activation unit to produce the output activation data. Since the ReLU unit turns negative output values into 0, a large number of 0 values appear in the output activations, and the sparsity of each layer's output activations is known only after that layer has executed. Invalid computations therefore occur when a sparse neural network executes on the data flow architecture, and they occupy computing resources and hold back performance.
FIG. 1 shows the execution of a convolution: the input activation (Ifmap, size 4×4) and the filter (Filter, size 3×3) are convolved, producing the output data (Output, size 2×2) after four convolution operations, Conv1-Conv4; negative output values are then forced to 0 by the ReLU unit. Because Conv3 and Conv4 produce negative results, their output activations are 0 after the ReLU activation function, so Conv3 and Conv4 performed invalid work. A numeric sketch of this effect follows.
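The following minimal sketch uses invented values, not those of FIG. 1: a 4×4 input and a 3×3 filter yield a 2×2 output, and the windows whose pre-activation sums are negative turn out to be wasted work (here the second and fourth windows).

```python
import numpy as np

ifmap = np.array([[5., 2., 1., 4.],
                  [6., 3., 0., 7.],
                  [8., 1., 2., 9.],
                  [4., 0., 3., 6.]])   # 4x4 input activation (invented)
filt = np.array([[1., 0., -1.],
                 [1., 0., -1.],
                 [1., 0., -1.]])       # 3x3 filter (invented)

out = np.empty((2, 2))
for i in range(2):                     # the four windows correspond to
    for j in range(2):                 # Conv1..Conv4 in FIG. 1
        out[i, j] = np.sum(ifmap[i:i+3, j:j+3] * filt)

relu_out = np.maximum(out, 0.0)        # ReLU forces negatives to 0
# out      = [[ 16., -14.], [ 13., -18.]]
# relu_out = [[ 16.,   0.], [ 13.,   0.]]  -> two convolutions were wasted
```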
Consequently, to exploit the sparsity of the activation data, the sparsity of the output activations must be predicted in advance. The invalid operations can then be removed, reducing the computation of the neural network's forward inference, improving performance and lowering power consumption.
For sparse neural networks, the invention designs a prediction device for the output activation data based on singular value decomposition (SVD) of the weights. Before the forward computation executes, the device predicts whether each output activation will be a 0 value and generates marking information for the instructions related to each output activation according to the prediction, indicating whether those instructions are valid; the computing unit skips the invalid instructions according to that information, ultimately saving computing resources, improving performance and reducing power consumption.
The invention comprises the following key points:
key point 1, a method and device for predicting the output activation data;
key point 2, a process for marking the invalid instructions related to the sparse activation data;
key point 3, the PE array decides whether to execute an instruction according to the instruction's marking information.
(1) Output activation data prediction method and device
Assume the input activation of layer l is I_l and its weight matrix is W_l (of size h×w); the input activation of layer l+1 is then I_{l+1} = ReLU(I_l W_l). Applying SVD to the weight matrix gives W_l = UΣV^T, where U is the left singular matrix of W_l, Σ is the diagonal matrix of singular values, and V is the right singular matrix of W_l.
W_l is approximated by its rank-r truncation: W_l ≈ U_r Σ_r V_r^T, where U_r denotes the first r columns of the left singular matrix, Σ_r the first r singular values of the diagonal matrix, and V_r the first r columns of the right singular matrix.
Since r ≪ h and r ≪ w, the computational complexity after the low-rank approximation, O(r(h+w)), is much smaller than the complexity O(hw) of the original forward computation.
The low-rank approximation of W_l is applied to the network as W' = U'V', where U' = U_r and V' = Σ_r V_r^T.
If an element of the result of I_l W' (computed in convolution fashion) is smaller than 0, i.e. its sign is negative, it will be output as 0 after the ReLU activation unit. Based on this prediction, those operations can be skipped to reduce the computation of the network and speed up its execution; the sketch below illustrates the truncation.
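The truncation can be reproduced in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the patent's implementation: the sizes h, w, r, the near-low-rank construction of W, and all variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, r = 64, 48, 8
# Construct a weight matrix that is close to rank r, so the truncation
# below is a faithful approximation (illustrative setup only).
W = rng.standard_normal((h, r)) @ rng.standard_normal((r, w)) \
    + 0.01 * rng.standard_normal((h, w))

U, s, Vh = np.linalg.svd(W, full_matrices=False)  # W = U @ diag(s) @ Vh
U_prime = U[:, :r]                                # U' = U_r           (h x r)
V_prime = np.diag(s[:r]) @ Vh[:r, :]              # V' = Sigma_r V_r^T (r x w)

I_l = rng.standard_normal((1, h))                 # one input activation row
exact = I_l @ W                                   # cost O(h*w) per row
approx = (I_l @ U_prime) @ V_prime                # cost O(r*(h+w)) per row

# Entries predicted negative would be zeroed by ReLU, so the instructions
# that compute them can be skipped. The prediction is approximate: the
# rank-r product only approximates W.
agreement = np.mean(np.sign(exact) == np.sign(approx))
print(agreement)
```

Note that the agreement is empirical; the accuracy of the sign prediction is exactly what the complexity trade-off O(r(h+w)) versus O(hw) buys.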
FIG. 2 is a schematic diagram of the SVD-based output activation prediction device. As shown in FIG. 2, the prediction device consists of an output activation index (out_act_index) with its marking information (out_act_flag), and an instruction index (inst_index) with its marking information (inst_flag). out_act_index records the position of each output activation; out_act_flag stores the sign of each output activation, where 0 indicates a negative output activation value and 1 a positive one. inst_index records the position of each instruction; inst_flag stores whether each instruction is valid or invalid, e.g. 0 indicates the instruction is invalid and 1 that it is valid.
After the instruction execution unit (inst execution unit) completes the I_l U'V' computation, the predicted sign marking information of the output activations is obtained and stored in the predictor's out_act_flag, and the related instructions are marked valid or invalid according to the output activation index out_act_index and the marking information out_act_flag. The instruction selection unit (inst selection unit) then filters out the valid instructions (valid inst) according to the instructions' marking information inst_flag, so that the instruction execution unit skips invalid instructions (inst_flag of 0) and executes only valid ones (inst_flag of 1).
Predicting the output activations removes the execution of invalid instructions related to 0 values, reduces the number of instructions executed, shortens the execution time of the network, improves its performance and lowers energy consumption.
Specifically, as shown in FIG. 4, the execution of the convolution within a PE array proceeds as follows. Assume that m+1 output activation values (out0-outm) are to be calculated; the instructions required are shown in FIG. 3.
Step one: performing Singular Value Decomposition (SVD) on a weight matrix W (a filter of a convolution layer or a weight matrix of a full connection layer, wherein the information of the weight is known in the parameters of a neural network) offline to obtain U and V of the matrix;
Step two: perform the low-rank approximation of the weight matrix, i.e. keep its first r singular values, approximating the weight matrix as U'V';
Step three: the instruction execution unit in the PE array executes instructions that compute the input activation I against the approximate matrix U'V', producing m+1 predicted output activation values;
Step four: store the predicted output activation marking information (out_act_flag) in the predictor, one flag per output activation index (out_act_index). Here the out_act_flag value for out_act_index 0 is 0, indicating that this output activation is negative, while the out_act_flag values for out_act_index 1, 2, ..., m are 1, indicating that those output activations are positive;
Step five: mark the instructions related to each output activation according to the activation's marking information. The instruction block related to out_act_index 0 is block0; since its output activation is marked 0, the instructions in that block are marked invalid. The instruction blocks related to out_act_index 1, 2, ..., m are block1, block2, ..., blockm; since their output activations are marked 1, the instructions in those blocks are marked valid.
Step six: the instruction selection unit (inst selection unit) selects the valid instruction blocks block1, block2, ..., blockm according to the instructions' marking information;
Step seven: once the valid instructions have been selected, the instruction execution unit (inst execution unit) executes only the valid instructions, thereby skipping the invalid ones. A worked illustration of steps four to six follows.
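A small worked example of steps four to six, with invented predicted values and, as an assumption for the sketch, one instruction block per output activation (m = 3):

```python
import numpy as np

pred = np.array([-0.7, 2.1, 0.4, 1.3])     # step three's predictions (invented)
out_act_flag = (pred > 0).astype(np.int8)  # step four: [0, 1, 1, 1]

blocks = ["block0", "block1", "block2", "block3"]
inst_flag = out_act_flag                   # step five: block i inherits the
                                           # flag of out_act_index i
valid_blocks = [b for b, f in zip(blocks, inst_flag) if f == 1]
# step six: ["block1", "block2", "block3"]; block0 is skipped in step seven
```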
Of course, the present invention is capable of various other embodiments, and those skilled in the art can make corresponding modifications and variations in light of the invention without departing from its spirit and essence; all such modifications and variations shall fall within the protection scope of the claims appended hereto.

Claims (8)

1. A sparse convolutional neural network acceleration method based on a data flow architecture, characterized by comprising the following steps:
step 1, computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
step 3, selecting the instructions marked valid according to the instruction marking information;
step 4, skipping the instructions marked invalid, and executing only the instructions marked valid;
the step 1 specifically includes:
singular value decomposition is carried out on the weight matrix, and low-rank approximation is carried out, so that an approximation matrix is obtained; and obtaining positive and negative value marking information of the output activation through calculation of the input activation and the approximation matrix;
will W l Is applied to the network by low rank approximation of (2) with Wherein U' =u r ,/>If I l The result after the operation of W' is smaller than 0, namely the sign is negative, and the result is output as 0 after passing through a Relu activating unit;
the input of the first layer is activated as I l The weight matrix is W l For weight matrix W l Using SVD decomposition to make W l =UΣV T WhereinU represents W l Is represented by a diagonal matrix, and V represents W l Right singular matrix of (a);
W l is approximated by:wherein->U r Represents W l Front r columns of left singular matrix, Σ r Representing the first r columns of the diagonal matrix, V r Represents W l The first r columns of the right singular matrix.
2. The sparse convolutional neural network acceleration method based on a data flow architecture according to claim 1, wherein the weight matrix comprises a filter of a convolutional layer or a weight matrix of a fully connected layer.
3. The method of claim 1, wherein the positive/negative sign marking information of the output activations uses 1 and 0 to indicate positive and negative output activation values, respectively.
4. The sparse convolutional neural network acceleration method based on a data flow architecture according to claim 1 or 3, wherein the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
5. A sparse convolutional neural network acceleration device based on a data flow architecture, comprising:
an instruction execution unit, used to compute with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations, and to execute the instructions marked valid;
a prediction marking unit, used to mark the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
an instruction selection unit, used to select the instructions marked valid according to the instruction marking information;
the instruction execution unit specifically includes:
the method comprises the steps of performing singular value decomposition on the weight matrix and performing low-rank approximation to obtain an approximation matrix; and obtaining positive and negative value marking information of the output activation through calculation of the input activation and the approximation matrix;
will W l Is applied to the network by low rank approximation of (2) with Wherein U' =u r ,/>If I l The result after the operation of W' is smaller than 0, namely the sign is negative, and the result is output as 0 after passing through a Relu activating unit;
the input of the first layer is activated as I l The weight matrix is W l For weight matrix W l Using SVD decomposition to make W l =UΣV T WhereinU represents W l Is represented by a diagonal matrix, and V represents W l Right singular matrix of (a);
W l is approximated by:wherein->U r Represents W l Front r columns of left singular matrix, Σ r Representing the first r columns of the diagonal matrix, V r Represents W l The first r columns of the right singular matrix.
6. The sparse convolutional neural network acceleration device of claim 5, wherein the weight matrix comprises a filter of a convolutional layer or a weight matrix of a fully connected layer.
7. The sparse convolutional neural network acceleration device of claim 5, wherein the positive/negative sign marking information of the output activations uses 1 and 0 to indicate positive and negative output activation values, respectively.
8. The sparse convolutional neural network acceleration device of claim 5 or 7, wherein the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
CN202010972552.4A 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture Active CN112215349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Publications (2)

Publication Number Publication Date
CN112215349A CN112215349A (en) 2021-01-12
CN112215349B (en) 2024-01-12

Family

ID=74049599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010972552.4A Active CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Country Status (1)

Country Link
CN (1) CN112215349B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505383A (en) * 2021-07-02 2021-10-15 中国科学院计算技术研究所 ECDSA algorithm execution system and method


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106297778A (en) * 2015-05-21 2017-01-04 中国科学院声学研究所 Data-driven neural network acoustic model pruning method based on singular value decomposition
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A method of optimizing an artificial neural network
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A neural network acceleration system based on block-circulant sparse matrices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning both Weights and Connections for Efficient Neural Networks; Song Han et al.; arXiv.org; 2015-10-30; full text *

Also Published As

Publication number Publication date
CN112215349A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
US10402725B2 (en) Apparatus and method for compression coding for artificial neural network
US10691996B2 (en) Hardware accelerator for compressed LSTM
Lu et al. SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs
US11568258B2 (en) Operation method
US11055063B2 (en) Systems and methods for deep learning processor
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
KR20200002607A (en) Deep neural network architecture using piecewise linear approximation
CN111783974A (en) Model construction and image processing method and device, hardware platform and storage medium
US9424032B2 (en) List vector processing apparatus, list vector processing method, storage medium, compiler, and information processing apparatus
CN112580793A (en) Neural network accelerator based on time domain memory computing and acceleration method
CN112215349B (en) Sparse convolutional neural network acceleration method and device based on data flow architecture
Wu et al. Exploring deep reuse in winograd CNN inference
US11551087B2 (en) Information processor, information processing method, and storage medium
Reddy et al. Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications
Gao et al. FPGA-based accelerator for independently recurrent neural network
Gou et al. Re-training and parameter sharing with the Hash trick for compressing convolutional neural networks
CN113722668A (en) Processing unit, correlation device, and tensor operation method
KR20220077709A (en) Neural network operation method, apparatus and keyword spotting methd using the same neural network operation
Parashar et al. Processor pipelining method for efficient deep neural network inference on embedded devices
CN112015472B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
Pietras et al. FPGA implementation of logarithmic versions of Baum-Welch and Viterbi algorithms for reduced precision hidden Markov models
Shipton et al. Implementing WaveNet Using Intel® Stratix® 10 NX FPGA for Real-Time Speech Synthesis
JP2806262B2 (en) Process allocation method for multiprocessor system
US20230196124A1 (en) Runtime predictors for neural network computation reduction
Zhang et al. Dynamic Runtime Feature Map Pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant