CN112215349A - Sparse convolution neural network acceleration method and device based on data flow architecture - Google Patents

Sparse convolution neural network acceleration method and device based on data flow architecture

Info

Publication number
CN112215349A
Authority
CN
China
Prior art keywords
instruction
neural network
activation
positive
output activation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010972552.4A
Other languages
Chinese (zh)
Other versions
CN112215349B (en)
Inventor
吴欣欣
范志华
欧焱
李文明
叶笑春
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202010972552.4A
Publication of CN112215349A
Application granted
Publication of CN112215349B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a sparse convolution neural network acceleration method based on a data flow architecture, which comprises the following steps: obtaining positive and negative value flag information for the output activations by computing the operation of the input activation and the weight matrix; marking the instructions related to each output activation as valid or invalid according to that flag information, obtaining instruction flag information; screening out the instructions marked valid according to the instruction flag information; and skipping the instructions marked invalid, executing only the instructions marked valid.

Description

Sparse convolution neural network acceleration method and device based on data flow architecture
Technical Field
The invention relates to the technical field of computer architecture, and in particular to a sparse convolution neural network acceleration method and device based on a data flow architecture.
Background
Neural networks deliver leading performance in image detection, speech recognition, and natural language processing. As applications grow more complex, so do neural network models, which poses many challenges for traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage, and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the Sparse BLAS library for CPUs and the cuSPARSE library for GPUs, which accelerate sparse-network execution to some extent, while dedicated accelerators show leading results in performance and power consumption. The data flow architecture is widely used in big data processing, scientific computing, and similar fields, and because its algorithm is decoupled from its structure it has good generality and flexibility; its natural parallelism matches the parallel character of neural network algorithms well. However, CPUs, GPUs, and accelerators aimed at dense networks cannot accelerate sparse networks, while dedicated sparse-network accelerators strongly couple algorithm and structure, so they lack architectural flexibility and generality and leave no room for algorithmic innovation.
In a data flow architecture, the neural network algorithm is mapped onto a computing (PE) array in the form of a dataflow graph. The graph contains multiple nodes, each node contains multiple instructions, and the directed edges of the graph represent the dependencies between nodes. The PE array executes the mapped instructions to carry out the operations of the neural network.
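For illustration only, the node-and-edge structure just described can be sketched as a small data type (the names DataflowNode, instructions, and successors are assumptions of this sketch, not terms from the patent):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataflowNode:
    node_id: int
    instructions: List[str]  # the instructions mapped onto one PE
    successors: List[int] = field(default_factory=list)  # directed edges: nodes that depend on this one

# A node fires once all of its predecessor nodes have produced their data;
# the PE array then executes the fired node's instructions.
```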
In most deep neural networks (DNNs), the rectified linear unit (ReLU) is widely used at the output of a network layer and forces negative activation values to 0. Meanwhile, for the network weights, methods such as pruning exploit the redundancy of the weight data to set some weights to 0. These methods produce a large number of 0-valued output activations and weights, so sparse networks exhibit both weight sparsity and activation sparsity, and modern DNN models are roughly 50% sparse. Neural network computation consists mainly of multiply and add operations, and multiplying 0 by any value yields 0, so such operations can be regarded as invalid. Executing them occupies computing resources, wastes compute and power, lengthens the network's execution time, and reduces its performance.
To remove these invalid calculations in a data flow architecture, an effective method is to generate flag information for the corresponding instructions in the dataflow graph according to the characteristics of the data; before execution, the PE array consults the flags, executes only the valid instructions, and skips the invalid ones, thereby saving computing resources, reducing power consumption, and improving performance.
However, weights and activations behave differently: weight data is static and does not change as the neural network runs, whereas activation data is dynamic, and the sparsity of a layer's output activations is unknown until that layer has been computed.
Because weights are static, instruction flag information for them can be generated at compile time, letting the PE array skip instructions related to 0-valued weights according to the flags; this approach, however, does not suit dynamic activation data. First, a neural network executes layer by layer: the input activations of the current layer come from the output activations of the previous layer, so activation information is unavailable at compile time. Second, before the current layer is computed, it is unknown which weights and input activations relate to the invalid output activations. Compile-time instruction flagging can therefore exploit only weight sparsity, not activation sparsity, so not all operations related to 0 values can be removed; this wastes computing resources, increases power consumption, and reduces the performance of the neural network.
Disclosure of Invention
To address the defects of the prior art, the invention provides a sparse convolution neural network acceleration method based on a data flow architecture, which comprises the following steps:
step 1, computing the operation of the input activation and the weight matrix to obtain positive and negative value flag information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to the positive and negative value flag information of the output activations, obtaining instruction flag information;
step 3, screening out the instructions marked valid according to the instruction flag information;
and step 4, skipping the instructions marked invalid and executing only the instructions marked valid, as sketched below.
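A minimal Python sketch of this four-step flow follows; the names accelerate_layer and execute_block, the low-rank weight factors U_approx/V_approx (described in the detailed description below), and the one-block-per-output-activation layout are all assumptions of this sketch, not from the patent:

```python
import numpy as np

def execute_block(block):
    """Stand-in for the PE array executing one instruction block."""
    print("executing", block)

def accelerate_layer(I, U_approx, V_approx, inst_blocks):
    # Step 1: cheap prediction pass using low-rank weight factors; the sign of
    # each predicted output activation becomes its flag (1 = positive).
    pred = (I @ U_approx) @ V_approx
    out_act_flag = (pred.ravel() > 0).astype(np.int8)

    # Step 2: each instruction block inherits the flag of the output
    # activation it produces (one block per output activation).
    inst_flag = out_act_flag[:len(inst_blocks)]

    # Step 3: screen out the blocks marked valid.
    valid_blocks = [b for b, f in zip(inst_blocks, inst_flag) if f == 1]

    # Step 4: execute only the valid blocks; the invalid ones are skipped.
    for block in valid_blocks:
        execute_block(block)
```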
The invention also provides a sparse convolution neural network acceleration device based on a data flow architecture, which comprises:
an instruction execution unit for computing the operation of the input activation and the weight matrix, obtaining the positive and negative value flag information of the output activations, and executing the instructions marked valid;
a prediction marking unit for marking the instructions related to each output activation as valid or invalid according to the positive and negative value flag information of the output activations, obtaining instruction flag information;
and an instruction selection unit for screening out the instructions marked valid according to the instruction flag information.
Because the sparsity of dynamically generated activation data cannot otherwise be exploited in a data flow architecture, the invention provides a prediction device for sparse activation data. At small time cost before a network layer is computed, it predicts the sparsity of the output activations and uses the predicted activation data to generate flag information for the corresponding network instructions, so that the computing array skips the instructions marked invalid. This removes the operations related to invalid output activations and accelerates the sparse neural network.
Drawings
Fig. 1 is a schematic diagram of the convolution implementation.
Fig. 2 is a schematic diagram of the prediction device for output activations.
FIG. 3 is a diagram illustrating the instructions that need to be executed to compute m+1 output activations.
FIG. 4 is a diagram illustrating a prediction process of output activation according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
In the data flow architecture, the PE array executes the neural-network instructions mapped onto it by the compiler; the generated output data passes through a ReLU activation unit to produce output activation data, and because the ReLU unit turns negative outputs into 0, the output activations contain many 0 values. The sparsity of each layer's output activations is known only after that layer has executed. Consequently, executing a sparse neural network on a data flow architecture involves invalid calculations, and these invalid calculations occupy computing resources and hinder performance improvement.
Fig. 1 shows the convolution process: the input activation (Ifmap, size 4×4) is convolved with the filter (Filter, size 3×3), and after 4 convolution operations, Conv1-Conv4, the output data (Output, size 2×2) is produced; negative outputs are forced to 0 by the ReLU unit. Since the results of Conv3 and Conv4 are negative and both become 0 after the ReLU activation function, the work performed by Conv3 and Conv4 is invalid.
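The effect in Fig. 1 can be reproduced with a short sketch; the concrete Ifmap and filter values below are invented for illustration, and only the shapes (4×4, 3×3, 2×2) match the figure:

```python
import numpy as np

# Values invented for illustration; only the shapes match Fig. 1.
ifmap = np.array([[1., 2., 0., 1.],
                  [0., 1., 3., 2.],
                  [2., 0., 1., 0.],
                  [1., 3., 0., 2.]])      # 4x4 input activation
filt = np.array([[1., 0., -1.],
                 [0., 1., 0.],
                 [-1., 0., 1.]])          # 3x3 filter

out = np.empty((2, 2))
for i in range(2):                        # the 4 sliding positions = Conv1..Conv4
    for j in range(2):
        out[i, j] = np.sum(ifmap[i:i+3, j:j+3] * filt)

relu_out = np.maximum(out, 0.0)           # ReLU forces negative outputs to 0
print(out)        # [[ 1.  4.] [-4. -1.]] -> the last two convolutions are negative
print(relu_out)   # [[ 1.  4.] [ 0.  0.]] -> their work was invalid
```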
Therefore, to exploit the sparsity of activation data, the sparsity of the output activations must be predicted in advance, so that invalid operations can be removed, the computation of neural network forward inference reduced, and performance and power consumption improved.
For sparse neural networks, the invention designs a prediction device for output activation data based on singular value decomposition (SVD) of the weights. It predicts whether each output activation will be 0 before the forward operation executes, generates flag information for the instructions related to that output activation to indicate whether they are valid, and the computing unit skips invalid instructions according to the flags, ultimately saving computing resources, improving performance, and reducing power consumption.
The invention comprises the following key points:
key point 1, a prediction method and device for output activation data;
key point 2, the marking process for invalid instructions related to sparse activation data;
key point 3, the PE array deciding whether to execute an instruction according to the instruction's flag information.
(1) Prediction method and device for output activation data
Assume the input activation of the l-th layer is I_l and the weight matrix is W_l, so that the activation of the (l+1)-th layer is I_{l+1} = ReLU(I_l W_l). Apply singular value decomposition to the weight matrix W_l, so that
W_l = U Σ V^T,
where, for W_l ∈ R^{h×w}, U ∈ R^{h×h} is the left singular matrix of W_l, Σ ∈ R^{h×w} is the diagonal matrix of singular values, and V ∈ R^{w×w} is the right singular matrix of W_l.
The low-rank approximation of W_l is
W_l ≈ U_r Σ_r V_r^T,
where U_r denotes the first r columns of the left singular matrix of W_l, Σ_r the diagonal matrix of the first r singular values, and V_r the first r columns of the right singular matrix of W_l.
Since r ≪ h and r ≪ w, the computational complexity O(r(h + w)) of the forward operation after low-rank approximation is much less than the complexity O(hw) of the original.
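For a sense of scale (dimensions chosen for illustration, not taken from the patent): with h = w = 512 and r = 16, predicting one input row costs about r(h + w) = 16 × 1024 = 16,384 multiply-accumulates, versus hw = 512 × 512 = 262,144 for the exact product, a 16× reduction.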
Applying the low-rank approximation of W_l to the network gives
I_{l+1} ≈ ReLU(I_l U′ V′) = ReLU(I_l W′),
where U′ = U_r Σ_r and V′ = V_r^T, so that W′ = U′ V′ approximates W_l.
If the result of the I_l · W′ operation (performed in convolution fashion) is less than 0, i.e., its sign is negative, the output becomes 0 after the ReLU activation unit. Based on such predictions, these operations can be skipped to reduce the computational load of the network and speed up its execution.
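A minimal NumPy sketch of this prediction step, assuming the split U′ = U_r Σ_r and V′ = V_r^T given above (function names and the example shapes are illustrative):

```python
import numpy as np

def build_predictor(W, r):
    """Offline: SVD of the weight matrix W (h x w), keeping rank r."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_prime = U[:, :r] * s[:r]   # U' = U_r @ Sigma_r (assumed split of W')
    V_prime = Vt[:r, :]          # V' = V_r^T
    return U_prime, V_prime

def predict_output_flags(I, U_prime, V_prime):
    """Online: two thin products, cost O(r(h+w)) per row instead of O(hw)."""
    approx = (I @ U_prime) @ V_prime        # approximates I @ W
    return (approx > 0).astype(np.int8)     # out_act_flag: 1 = positive, 0 = negative

# Example (shapes assumed): one 1x64 input row, a 64x32 weight matrix, rank 4.
W = np.random.randn(64, 32)
I = np.random.randn(1, 64)
Up, Vp = build_predictor(W, 4)
flags = predict_output_flags(I, Up, Vp)     # predicted signs of the 32 output activations
```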
Fig. 2 is a schematic diagram of the output activation predictor based on singular value decomposition (SVD). As shown in fig. 2, the prediction device consists of an output activation index (out_act_index) with its flag information (out_act_flag), and an instruction index (inst_index) with its flag information (inst_flag). out_act_flag stores the sign of each output activation, where 0 means the output activation is negative and 1 means it is positive; inst_index records the location of each instruction, and inst_flag stores whether each instruction is valid, e.g., 0 means invalid and 1 means valid.
After the instruction execution unit (inst execution unit) finishes the I_l U′V′ operation, the predicted positive and negative value flags of the output activations are obtained and stored in the out_act_flag field of the prediction device, and the related instructions are marked valid or invalid according to the output activation index out_act_index and the flag information out_act_flag. The instruction selection unit (inst selection unit) screens out the valid instructions (valid inst) according to the instruction flag information inst_flag, so that the instruction execution unit skips invalid instructions (inst_flag = 0) and executes only valid instructions (inst_flag = 1).
By predicting the output activations, invalid instruction executions related to 0 values are removed, reducing the number of executed instructions, shortening network execution time, improving network performance, and lowering energy consumption.
Specifically, as shown in fig. 4, the process is described in more detail for one PE array in combination with the convolution execution flow, assuming m+1 output activation data (out0-outm) are to be computed; the instructions required are shown in fig. 3.
Step one: perform singular value decomposition (SVD) offline on the weight matrix W (the weight matrix of a convolutional layer's filter or of a fully connected layer; among the parameters of a neural network, the weight information is known) to obtain the U and V matrices;
Step two: apply low-rank approximation to the weight matrix, i.e., take its first r ranks and approximate it as U′V′;
Step three: the instruction execution unit in the PE array executes instructions to compute the operation of the input activation I with the approximate matrix U′V′, generating predicted values for the m+1 output activations;
Step four: store the predicted output activation flag information (out_act_flag) in the predictor, indexed by the output activation indices (out_act_index); for example, out_act_index 0 has out_act_flag 0, indicating a negative output activation, while out_act_index 1, 2, ..., m have out_act_flag 1, indicating positive output activations;
Step five: mark the flag information of the instructions related to each output activation according to that activation's flag; the instructions related to out_act_index 0 form block0, and since that activation's flag is 0, all instructions in block0 are marked invalid; the instructions related to out_act_index 1, 2, ..., m form block1, block2, ..., blockm, and since those activations' flags are 1, all instructions in those blocks are marked valid.
Step six: the instruction selection unit (inst selection unit) screens out the valid instructions of block1, block2, ..., blockm according to the instruction flag information;
Step seven: after the valid instructions are screened out, the instruction execution unit (inst execution unit) executes only the valid instructions, thereby skipping the invalid ones; a compact sketch of steps four through seven follows.
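Putting steps four through seven together, here is a minimal sketch of the predictor tables and the selection step; the array layout is an assumption of this sketch, while the example flag values (out0 negative, out1-outm positive) mirror the walkthrough above:

```python
import numpy as np

m = 3                                                 # m+1 = 4 output activations (example size)
out_act_index = np.arange(m + 1)                      # indices of the output activations
out_act_flag = np.array([0, 1, 1, 1], dtype=np.int8)  # step four: out0 negative, out1..outm positive

inst_index = out_act_index                            # one instruction block per output activation
inst_flag = out_act_flag.copy()                       # step five: blocks inherit their activation's flag

valid_inst = inst_index[inst_flag == 1]               # step six: inst selection unit keeps flag == 1
print(valid_inst)                                     # step seven skips block0 -> [1 2 3]
```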
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A sparse convolutional neural network acceleration method based on a data flow architecture, characterized by comprising the following steps:
step 1, computing the operation of the input activation and the weight matrix to obtain positive and negative value flag information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to the positive and negative value flag information of the output activations, obtaining instruction flag information;
step 3, screening out the instructions marked valid according to the instruction flag information;
and step 4, skipping the instructions marked invalid and executing only the instructions marked valid.
2. The data flow architecture-based sparse convolutional neural network acceleration method of claim 1, wherein step 1 specifically comprises:
performing singular value decomposition on the weight matrix and applying low-rank approximation to obtain an approximate matrix; and computing the operation of the input activation with the approximate matrix to obtain the positive and negative value flag information of the output activations.
3. The method according to claim 1 or 2, wherein the weight matrix comprises the weight matrix of a convolutional layer's filter or of a fully connected layer.
4. The data flow architecture-based sparse convolutional neural network acceleration method of claim 1, wherein the positive and negative value flag information of the output activations uses 1 and 0 to indicate that an output activation is positive or negative, respectively.
5. The data flow architecture-based sparse convolutional neural network acceleration method of claim 1 or 4, wherein the instruction flag information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
6. A sparse convolutional neural network acceleration device based on a data flow architecture, comprising:
an instruction execution unit for computing the operation of the input activation and the weight matrix, obtaining the positive and negative value flag information of the output activations, and executing the instructions marked valid;
a prediction marking unit for marking the instructions related to each output activation as valid or invalid according to the positive and negative value flag information of the output activations, obtaining instruction flag information;
and an instruction selection unit for screening out the instructions marked valid according to the instruction flag information.
7. The data flow architecture-based sparse convolutional neural network acceleration device of claim 6, wherein the instruction execution unit is specifically configured to:
perform singular value decomposition and low-rank approximation on the weight matrix to obtain an approximate matrix; and compute the operation of the input activation with the approximate matrix to obtain the positive and negative value flag information of the output activations.
8. The data flow architecture-based sparse convolutional neural network acceleration device of claim 6 or 7, wherein the weight matrix comprises the weight matrix of a convolutional layer's filter or of a fully connected layer.
9. The data flow architecture-based sparse convolutional neural network acceleration device of claim 6, wherein the positive and negative value flag information of the output activations uses 1 and 0 to indicate that an output activation is positive or negative, respectively.
10. The data flow architecture-based sparse convolutional neural network acceleration device of claim 6 or 9, wherein the instruction flag information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
CN202010972552.4A 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture Active CN112215349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Publications (2)

Publication Number Publication Date
CN112215349A true CN112215349A (en) 2021-01-12
CN112215349B CN112215349B (en) 2024-01-12

Family

ID=74049599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010972552.4A Active CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Country Status (1)

Country Link
CN (1) CN112215349B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505383A (en) * 2021-07-02 2021-10-15 中国科学院计算技术研究所 ECDSA algorithm execution system and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106297778A (en) * 2015-05-21 2017-01-04 中国科学院声学研究所 The neutral net acoustic model method of cutting out based on singular value decomposition of data-driven
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A kind of method of optimized artificial neural network
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Song Han et al.: "Learning both Weights and Connections for Efficient Neural Networks", arXiv.org *

Also Published As

Publication number Publication date
CN112215349B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
Lu et al. SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs
US10691996B2 (en) Hardware accelerator for compressed LSTM
US10402725B2 (en) Apparatus and method for compression coding for artificial neural network
Zhang et al. Frequency domain acceleration of convolutional neural networks on CPU-FPGA shared memory system
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
Heo et al. Real-time object detection system with multi-path neural networks
Li et al. Dynamic dataflow scheduling and computation mapping techniques for efficient depthwise separable convolution acceleration
Shi et al. E-LSTM: Efficient inference of sparse LSTM on embedded heterogeneous system
CN111368988A (en) Deep learning training hardware accelerator utilizing sparsity
Yang et al. S 2 Engine: A novel systolic architecture for sparse convolutional neural networks
Mirzaeian et al. NESTA: Hamming weight compression-based neural processing engine
CN112580793A (en) Neural network accelerator based on time domain memory computing and acceleration method
Kim et al. Nlp-fast: a fast, scalable, and flexible system to accelerate large-scale heterogeneous nlp models
CN112215349A (en) Sparse convolution neural network acceleration method and device based on data flow architecture
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
Shivapakash et al. A power efficient multi-bit accelerator for memory prohibitive deep neural networks
US11551087B2 (en) Information processor, information processing method, and storage medium
Li et al. Mapping yolov4-tiny on fpga-based dnn accelerator by using dynamic fixed-point method
CN117151178A (en) FPGA-oriented CNN customized network quantification acceleration method
Turner et al. mlGeNN: accelerating SNN inference using GPU-enabled neural networks
Wang et al. A none-sparse inference accelerator that distills and reuses the computation redundancy in CNNs
CN115222028A (en) One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method
Gao et al. FPGA-based accelerator for independently recurrent neural network
CN115130672A (en) Method and device for calculating convolution neural network by software and hardware collaborative optimization
Parashar et al. Processor pipelining method for efficient deep neural network inference on embedded devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant