CN112215349B - Sparse convolutional neural network acceleration method and device based on data flow architecture

Info

Publication number
CN112215349B
CN112215349B
Authority
CN
China
Prior art keywords
instruction
matrix
activation
neural network
output activation
Prior art date
Legal status
Active
Application number
CN202010972552.4A
Other languages
Chinese (zh)
Other versions
CN112215349A (en)
Inventor
吴欣欣
范志华
欧焱
李文明
叶笑春
范东睿
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202010972552.4A
Publication of CN112215349A
Application granted
Publication of CN112215349B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a sparse convolutional neural network acceleration method based on a data flow architecture, comprising the following steps: computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations; marking the instructions related to each output activation as valid or invalid according to that sign information, obtaining instruction marking information; selecting the instructions marked valid according to the instruction marking information; and skipping the instructions marked invalid, executing only those marked valid.

Description

Sparse convolutional neural network acceleration method and device based on data flow architecture
Technical Field
The invention relates to the field of computer architecture, and in particular to a sparse convolutional neural network acceleration method and device based on a data flow architecture.
Background
Neural networks deliver state-of-the-art performance in image detection, speech recognition and natural language processing. As applications grow more complex, so do the network models, which places many demands on conventional hardware; to relieve this pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the Sparse BLAS library for CPUs and the cuSPARSE library for GPUs, which accelerate the execution of sparse networks to some extent, and dedicated accelerators show strong results in performance and power consumption. The data flow architecture is widely used in big-data processing, scientific computing and similar domains, and its decoupling of algorithm from structure gives it good generality and flexibility; its natural parallelism also matches the parallelism of neural network algorithms well. However, CPUs, GPUs and accelerators designed for dense networks cannot accelerate sparse networks effectively, while dedicated sparse-network accelerators, whose algorithms are strongly coupled to their structure, lack architectural flexibility and generality and leave little room for algorithmic innovation.
In the data flow architecture, the neural network algorithm is mapped, in the form of a data flow graph, onto an architecture composed of a computing array (PE array). The graph contains multiple nodes, each node contains multiple instructions, and the directed edges of the graph represent the dependencies between nodes. The PE array executes the mapped instructions to carry out the neural network computation.
In most deep neural networks (DNNs), the network layers make wide use of the rectified linear unit (ReLU), which forces negative activation values to 0. On the weight side, pruning and similar methods exploit the redundancy of the weight data to set some weights to 0. Together these methods produce large numbers of zero-valued output activations and weights, so a sparse network exhibits both weight sparsity and activation sparsity; modern DNN models commonly show sparsity of around 50%. Neural network computation consists mainly of multiply-accumulate operations, and since multiplying any value by 0 yields 0, such operations are effectively invalid. Executing them occupies computing resources, wasting computation and power, lengthening the execution time of the network and lowering its performance.
To remove these invalid computations in a data flow architecture, an effective approach is to generate marking information for the corresponding instructions in the data flow graph according to the characteristics of the data. Before executing, the PE array consults this marking information, executes only the valid instructions and skips the invalid ones, thereby saving computing resources, reducing power consumption and improving performance.
However, weight data and activation data have different characteristics. Weight data is static: it does not change as the neural network runs. Activation data is dynamic: the sparsity of a layer's output activations is not known until that layer has been computed.
The static nature of the weights can be exploited at compile time to generate instruction marking information, so that the PE array skips instructions related to zero-valued weights; this approach does not extend to the dynamic activation data. First, a neural network executes layer by layer: the input activations of the current layer come from the output activations of the previous layer, so activation information is unavailable at compile time. Second, which weights and input activations relate to invalid output activations is likewise unknown until the current layer has been computed. Compile-time instruction marking therefore lets the PE array exploit only weight sparsity, not activation sparsity, so not all zero-related operations can be removed; computing resources are wasted, power consumption rises, and the performance of the neural network suffers.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a sparse convolutional neural network acceleration method based on a data flow architecture, comprising the following steps (a schematic sketch follows the list):
step 1, computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
step 3, selecting the instructions marked valid according to the instruction marking information;
step 4, skipping the instructions marked invalid, and executing only the instructions marked valid.
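As a rough illustration only, the four steps can be sketched in Python/NumPy as follows. The function name, the instruction-to-activation mapping and the callable instruction blocks are assumptions made for this sketch, not part of the patent; the rank-r factors U' and V' are those of the SVD approximation described in the detailed description below.

```python
import numpy as np

def accelerate_layer(I, U_prime, V_prime, inst_blocks, inst_to_act):
    # Step 1: cheap low-rank forward pass; the sign of each predicted
    # output activation becomes its positive/negative marking.
    pred = I @ U_prime @ V_prime
    out_act_flag = (pred.ravel() > 0).astype(np.int8)

    # Step 2: each instruction inherits the flag of the output
    # activation it contributes to (1 = valid, 0 = invalid).
    inst_flag = out_act_flag[inst_to_act]

    # Step 3: select the instructions marked valid.
    valid_ids = np.flatnonzero(inst_flag == 1)

    # Step 4: execute only the valid instructions; the invalid ones
    # are never issued to the PE array.
    for i in valid_ids:
        inst_blocks[i]()   # placeholder for issuing instruction block i
```

In the actual device these roles are played by the instruction execution, prediction marking and instruction selection units described next.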
The invention also provides a sparse convolutional neural network acceleration device based on the data flow architecture, comprising:
an instruction execution unit, used to compute with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations, and to execute the instructions marked valid;
a prediction marking unit, used to mark the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
and an instruction selection unit, used to select the instructions marked valid according to the instruction marking information.
Because the sparsity of activation data arises dynamically and cannot be known in advance in a data flow architecture, the invention provides a device that predicts the sparsity of the output activation data in a short computation carried out before the network layer itself runs. The predicted activations are used to generate the marking information for the corresponding instructions of the network computation, so that the computing array skips the instructions marked invalid; operations related to invalid output activations are thus removed, accelerating the sparse neural network.
Drawings
FIG. 1 is a schematic diagram of the execution of a convolution.
FIG. 2 is a schematic diagram of the structure of the output activation prediction device.
FIG. 3 is a schematic diagram of the instructions that must be executed to calculate m+1 output activations.
FIG. 4 is a schematic diagram of the prediction process of output activations according to an embodiment of the present invention.
Detailed Description
In order to make the above features and effects of the present invention clearer, specific examples are given below with reference to the accompanying drawings.
In the data flow architecture, the PE array executes the instructions that the compiler maps onto it for the neural network computation, and the generated output data passes through a ReLU activation unit to produce the output activation data. Since the ReLU unit turns negative output values into 0, a large number of 0 values appear in the output activations, and the sparsity of each layer's output activations is known only after that layer has executed. Invalid computations therefore occur when a sparse neural network executes on the data flow architecture, and they occupy computing resources and hold back performance.
FIG. 1 shows the execution of a convolution: the input activation (Ifmap, size 4×4) and the filter (Filter, size 3×3) are convolved, producing the output data (Output, size 2×2) after four convolution operations, Conv1-Conv4; negative output values are then forced to 0 by the ReLU unit. Because Conv3 and Conv4 produce negative results, their output activations are 0 after the ReLU activation function, so Conv3 and Conv4 performed invalid work. A numeric sketch of this effect follows.
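The following minimal sketch uses invented values, not those of FIG. 1: a 4×4 input and a 3×3 filter yield a 2×2 output, and the windows whose pre-activation sums are negative turn out to be wasted work (here the second and fourth windows).

```python
import numpy as np

ifmap = np.array([[5., 2., 1., 4.],
                  [6., 3., 0., 7.],
                  [8., 1., 2., 9.],
                  [4., 0., 3., 6.]])   # 4x4 input activation (invented)
filt = np.array([[1., 0., -1.],
                 [1., 0., -1.],
                 [1., 0., -1.]])       # 3x3 filter (invented)

out = np.empty((2, 2))
for i in range(2):                     # the four windows correspond to
    for j in range(2):                 # Conv1..Conv4 in FIG. 1
        out[i, j] = np.sum(ifmap[i:i+3, j:j+3] * filt)

relu_out = np.maximum(out, 0.0)        # ReLU forces negatives to 0
# out      = [[ 16., -14.], [ 13., -18.]]
# relu_out = [[ 16.,   0.], [ 13.,   0.]]  -> two convolutions were wasted
```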
Consequently, to exploit the sparsity of the activation data, the sparsity of the output activations must be predicted in advance. The invalid operations can then be removed, reducing the computation of the neural network's forward inference, improving performance and lowering power consumption.
For sparse neural networks, the invention designs a prediction device for the output activation data based on singular value decomposition (SVD) of the weights. Before the forward computation executes, the device predicts whether each output activation will be a 0 value and generates marking information for the instructions related to each output activation according to the prediction, indicating whether those instructions are valid; the computing unit skips the invalid instructions according to that information, ultimately saving computing resources, improving performance and reducing power consumption.
The invention comprises the following key points:
key point 1, a method and device for predicting the output activation data;
key point 2, a process for marking the invalid instructions related to the sparse activation data;
key point 3, the PE array decides whether to execute an instruction according to the instruction's marking information.
(1) Output activation data prediction method and device
Assume the input activation of layer l is I_l and its weight matrix is W_l (of size h×w); the input activation of layer l+1 is then I_{l+1} = ReLU(I_l W_l). Applying SVD to the weight matrix gives W_l = UΣV^T, where U is the left singular matrix of W_l, Σ is the diagonal matrix of singular values, and V is the right singular matrix of W_l.
W_l is approximated by its rank-r truncation: W_l ≈ U_r Σ_r V_r^T, where U_r denotes the first r columns of the left singular matrix, Σ_r the first r singular values of the diagonal matrix, and V_r the first r columns of the right singular matrix.
Since r ≪ h and r ≪ w, the computational complexity after the low-rank approximation, O(r(h+w)), is much smaller than the complexity O(hw) of the original forward computation.
The low-rank approximation of W_l is applied to the network as W' = U'V', where U' = U_r and V' = Σ_r V_r^T.
If an element of the result of I_l W' (computed in convolution fashion) is smaller than 0, i.e. its sign is negative, it will be output as 0 after the ReLU activation unit. Based on this prediction, those operations can be skipped to reduce the computation of the network and speed up its execution; the sketch below illustrates the truncation.
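The truncation can be reproduced in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the patent's implementation: the sizes h, w, r, the near-low-rank construction of W, and all variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, r = 64, 48, 8
# Construct a weight matrix that is close to rank r, so the truncation
# below is a faithful approximation (illustrative setup only).
W = rng.standard_normal((h, r)) @ rng.standard_normal((r, w)) \
    + 0.01 * rng.standard_normal((h, w))

U, s, Vh = np.linalg.svd(W, full_matrices=False)  # W = U @ diag(s) @ Vh
U_prime = U[:, :r]                                # U' = U_r           (h x r)
V_prime = np.diag(s[:r]) @ Vh[:r, :]              # V' = Sigma_r V_r^T (r x w)

I_l = rng.standard_normal((1, h))                 # one input activation row
exact = I_l @ W                                   # cost O(h*w) per row
approx = (I_l @ U_prime) @ V_prime                # cost O(r*(h+w)) per row

# Entries predicted negative would be zeroed by ReLU, so the instructions
# that compute them can be skipped. The prediction is approximate: the
# rank-r product only approximates W.
agreement = np.mean(np.sign(exact) == np.sign(approx))
print(agreement)
```

Note that the agreement is empirical; the accuracy of the sign prediction is exactly what the complexity trade-off O(r(h+w)) versus O(hw) buys.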
FIG. 2 is a schematic diagram of the SVD-based output activation prediction device. As shown in FIG. 2, the prediction device consists of an output activation index (out_act_index) with its marking information (out_act_flag), and an instruction index (inst_index) with its marking information (inst_flag). out_act_index records the position of each output activation; out_act_flag stores the sign of each output activation, where 0 indicates a negative output activation value and 1 a positive one. inst_index records the position of each instruction; inst_flag stores whether each instruction is valid or invalid, e.g. 0 indicates the instruction is invalid and 1 that it is valid.
After the instruction execution unit (inst execution unit) completes the I_l U'V' computation, the predicted sign marking information of the output activations is obtained and stored in the predictor's out_act_flag, and the related instructions are marked valid or invalid according to the output activation index out_act_index and the marking information out_act_flag. The instruction selection unit (inst selection unit) then filters out the valid instructions (valid inst) according to the instructions' marking information inst_flag, so that the instruction execution unit skips invalid instructions (inst_flag of 0) and executes only valid ones (inst_flag of 1).
Predicting the output activations removes the execution of invalid instructions related to 0 values, reduces the number of instructions executed, shortens the execution time of the network, improves its performance and lowers energy consumption.
Specifically, as shown in FIG. 4, the execution of the convolution within a PE array proceeds as follows. Assume that m+1 output activation values (out0-outm) are to be calculated; the instructions required are shown in FIG. 3.
Step one: performing Singular Value Decomposition (SVD) on a weight matrix W (a filter of a convolution layer or a weight matrix of a full connection layer, wherein the information of the weight is known in the parameters of a neural network) offline to obtain U and V of the matrix;
Step two: perform the low-rank approximation of the weight matrix, i.e. keep its first r singular values, approximating the weight matrix as U'V';
Step three: the instruction execution unit in the PE array executes instructions that compute the input activation I against the approximate matrix U'V', producing m+1 predicted output activation values;
Step four: store the predicted output activation marking information (out_act_flag) in the predictor, one flag per output activation index (out_act_index). Here the out_act_flag value for out_act_index 0 is 0, indicating that this output activation is negative, while the out_act_flag values for out_act_index 1, 2, ..., m are 1, indicating that those output activations are positive;
Step five: mark the instructions related to each output activation according to the activation's marking information. The instruction block related to out_act_index 0 is block0; since its output activation is marked 0, the instructions in that block are marked invalid. The instruction blocks related to out_act_index 1, 2, ..., m are block1, block2, ..., blockm; since their output activations are marked 1, the instructions in those blocks are marked valid.
Step six: the instruction selection unit (inst selection unit) selects the valid instruction blocks block1, block2, ..., blockm according to the instructions' marking information;
Step seven: once the valid instructions have been selected, the instruction execution unit (inst execution unit) executes only the valid instructions, thereby skipping the invalid ones. A worked illustration of steps four to six follows.
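A small worked example of steps four to six, with invented predicted values and, as an assumption for the sketch, one instruction block per output activation (m = 3):

```python
import numpy as np

pred = np.array([-0.7, 2.1, 0.4, 1.3])     # step three's predictions (invented)
out_act_flag = (pred > 0).astype(np.int8)  # step four: [0, 1, 1, 1]

blocks = ["block0", "block1", "block2", "block3"]
inst_flag = out_act_flag                   # step five: block i inherits the
                                           # flag of out_act_index i
valid_blocks = [b for b, f in zip(blocks, inst_flag) if f == 1]
# step six: ["block1", "block2", "block3"]; block0 is skipped in step seven
```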
Of course, the present invention is capable of various other embodiments, and those skilled in the art can make corresponding modifications and variations in light of the invention without departing from its spirit and essence; all such modifications and variations shall fall within the protection scope of the claims appended hereto.

Claims (8)

1. A sparse convolutional neural network acceleration method based on a data flow architecture, characterized by comprising the following steps:
step 1, computing with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations;
step 2, marking the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
step 3, selecting the instructions marked valid according to the instruction marking information;
step 4, skipping the instructions marked invalid, and executing only the instructions marked valid;
the step 1 specifically includes:
singular value decomposition is carried out on the weight matrix, and low-rank approximation is carried out, so that an approximation matrix is obtained; and obtaining positive and negative value marking information of the output activation through calculation of the input activation and the approximation matrix;
will W l Is applied to the network by low rank approximation of (2) with Wherein U' =u r ,/>If I l The result after the operation of W' is smaller than 0, namely the sign is negative, and the result is output as 0 after passing through a Relu activating unit;
the input of the first layer is activated as I l The weight matrix is W l For weight matrix W l Using SVD decomposition to make W l =UΣV T WhereinU represents W l Is represented by a diagonal matrix, and V represents W l Right singular matrix of (a);
W l is approximated by:wherein->U r Represents W l Front r columns of left singular matrix, Σ r Representing the first r columns of the diagonal matrix, V r Represents W l The first r columns of the right singular matrix.
2. The sparse convolutional neural network acceleration method based on a data flow architecture according to claim 1, wherein the weight matrix comprises a filter of a convolutional layer or a weight matrix of a fully connected layer.
3. The method of claim 1, wherein the positive/negative sign marking information of the output activations uses 1 and 0 to indicate positive and negative output activation values, respectively.
4. The sparse convolutional neural network acceleration method based on a data flow architecture according to claim 1 or 3, wherein the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
5. A sparse convolutional neural network acceleration device based on a data flow architecture, comprising:
an instruction execution unit, used to compute with the input activations and the weight matrix to obtain positive/negative sign marking information for the output activations, and to execute the instructions marked valid;
a prediction marking unit, used to mark the instructions related to each output activation as valid or invalid according to that sign marking information, obtaining instruction marking information;
an instruction selection unit, used to select the instructions marked valid according to the instruction marking information;
the instruction execution unit specifically includes:
the method comprises the steps of performing singular value decomposition on the weight matrix and performing low-rank approximation to obtain an approximation matrix; and obtaining positive and negative value marking information of the output activation through calculation of the input activation and the approximation matrix;
will W l Is applied to the network by low rank approximation of (2) with Wherein U' =u r ,/>If I l The result after the operation of W' is smaller than 0, namely the sign is negative, and the result is output as 0 after passing through a Relu activating unit;
the input of the first layer is activated as I l The weight matrix is W l For weight matrix W l Using SVD decomposition to make W l =UΣV T WhereinU represents W l Is represented by a diagonal matrix, and V represents W l Right singular matrix of (a);
W l is approximated by:wherein->U r Represents W l Front r columns of left singular matrix, Σ r Representing the first r columns of the diagonal matrix, V r Represents W l The first r columns of the right singular matrix.
6. The sparse convolutional neural network acceleration device of claim 5, wherein the weight matrix comprises a filter of a convolutional layer or a weight matrix of a fully connected layer.
7. The sparse convolutional neural network acceleration device of claim 5, wherein the positive/negative sign marking information of the output activations uses 1 and 0 to indicate positive and negative output activation values, respectively.
8. The sparse convolutional neural network acceleration device of claim 5 or 7, wherein the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
CN202010972552.4A 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture Active CN112215349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010972552.4A CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Publications (2)

Publication Number Publication Date
CN112215349A CN112215349A (en) 2021-01-12
CN112215349B (en) 2024-01-12

Family

ID=74049599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010972552.4A Active CN112215349B (en) 2020-09-16 2020-09-16 Sparse convolutional neural network acceleration method and device based on data flow architecture

Country Status (1)

Country Link
CN (1) CN112215349B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505383A (en) * 2021-07-02 2021-10-15 中国科学院计算技术研究所 ECDSA algorithm execution system and method


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106297778A (en) * 2015-05-21 2017-01-04 中国科学院声学研究所 Data-driven neural network acoustic model pruning method based on singular value decomposition
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A method of optimizing an artificial neural network
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A neural network acceleration system based on block-circulant sparse matrices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning both Weights and Connections for Efficient Neural Networks; Song Han et al.; arXiv.org; 2015-10-30; full text *

Also Published As

Publication number Publication date
CN112215349A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
US10402725B2 (en) Apparatus and method for compression coding for artificial neural network
US10691996B2 (en) Hardware accelerator for compressed LSTM
Lu et al. SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs
US11568258B2 (en) Operation method
US11055063B2 (en) Systems and methods for deep learning processor
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
KR20200002607A (en) Deep neural network architecture using piecewise linear approximation
CN111783974A (en) Model construction and image processing method and device, hardware platform and storage medium
US9424032B2 (en) List vector processing apparatus, list vector processing method, storage medium, compiler, and information processing apparatus
CN112580793A (en) Neural network accelerator based on time domain memory computing and acceleration method
CN112215349B (en) Sparse convolutional neural network acceleration method and device based on data flow architecture
Wu et al. Exploring deep reuse in winograd CNN inference
US11551087B2 (en) Information processor, information processing method, and storage medium
Reddy et al. Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications
Gao et al. FPGA-based accelerator for independently recurrent neural network
Gou et al. Re-training and parameter sharing with the Hash trick for compressing convolutional neural networks
CN113722668A (en) Processing unit, correlation device, and tensor operation method
KR20220077709A (en) Neural network operation method, apparatus and keyword spotting methd using the same neural network operation
Parashar et al. Processor pipelining method for efficient deep neural network inference on embedded devices
CN112015472B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
Pietras et al. FPGA implementation of logarithmic versions of Baum-Welch and Viterbi algorithms for reduced precision hidden Markov models
Shipton et al. Implementing WaveNet Using Intel® Stratix® 10 NX FPGA for Real-Time Speech Synthesis
JP2806262B2 (en) Process allocation method for multiprocessor system
US20230196124A1 (en) Runtime predictors for neural network computation reduction
Zhang et al. Dynamic Runtime Feature Map Pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant