CN109086883A - Method and device for realizing sparse calculation based on deep learning accelerator - Google Patents
- Publication number
- CN109086883A (application CN201810803430.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- feature vector
- input
- value
- input feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a method and a device for realizing sparse calculation based on a deep learning accelerator. The method comprises the following steps: S1, starting the scoreboard processing operation when the input data from each channel contains a specified number of zeros; when the scoreboard processing operation is started, compressing the input data to filter out the zero-valued neurons and obtain compressed input data; S2, examining each input feature value in the compressed input data in channel order; if the value is judged to be data required by the computation of the current output feature value, storing it in a pre-configured data buffer, otherwise switching to another channel and examining the next input feature value; S3, sending the information of all input feature values in the data buffer to the multiplier array in the accelerator to execute the sparse calculation. The device comprises a control module and a scoreboard processing module. The method is simple to implement, low in cost, high in computational efficiency, and low in energy consumption.
Description
Technical field
The present invention relates to the field of deep learning accelerators, and more particularly to a method for realizing sparse calculation based on a deep learning accelerator.
Background technique
In neural networks and deep learning, better prediction and recognition results can be obtained through higher-performance hardware, larger labeled training datasets, or wider and deeper networks, but two major drawbacks follow. First, deeper and wider network models generate an enormous number of parameters and are therefore prone to overfitting; this problem is especially acute when labeled data is very limited, and avoiding overfitting then places very high demands on the small amount of labeled data and on training skill. Second, as network size grows, the amount of computation increases sharply and consumes more computing resources; in practical applications the computing budget is always limited, so efficiently allocating computing resources becomes ever more important as networks keep growing. The fundamental way to address both drawbacks is to convert full connections, and even general convolutions, into sparse connections. On the one hand, the connectivity of real biological neural systems is itself sparse; on the other hand, for a large-scale sparse neural network, one can analyze the statistical characteristics of activation values and cluster highly correlated outputs to construct an optimal network layer by layer; that is, a bloated network can be sparsified without loss of performance.
When selecting a model for a neural network, choosing the fewest features, or using only a subset of features, as the basis for classification can still yield good results. Research on the human brain also shows the characteristic of sparsity: in the visual pathway, many neurons respond only to specific stimuli such as color, texture, orientation, or scale, so a signal projected onto the basis formed by these neurons is sparse. In the field of signal processing, sparse representations over overcomplete bases, obtained by convex optimization of the L1-norm, have found more and more applications.
Heterogeneous accelerators offer an outstanding performance-to-power ratio, and accelerating neural network algorithms with deep learning accelerators is a current research hotspot; the key problem is how to implement a neural network processing system efficiently on the accelerator. Exploiting the fault tolerance of neural networks, feature-map values close to zero can be treated as zero, exposing the sparsity in the data and improving computational efficiency. Mainstream deep learning accelerators currently lack effective support for sparse networks: pruned weights must be padded back with zeros and then computed in the ordinary way, so the accelerator cannot benefit from sparsity. In hardware design, mask techniques are widely used at present: when the input of a data channel or memory is all zeros, it is simply not processed, which directly minimizes energy consumption but cannot mask the unnecessary clock cycles.
In deep learning algorithms, convolution operations and fully connected layers account for most of the computation, and an activation operation generally follows each convolution or fully connected layer. The most common activation is ReLU: a value greater than 0 is output as itself, while a value less than 0 is output as 0. A large number of zeros, roughly 40-60%, is therefore generated after the activation of convolutional and fully connected layers, and 0 multiplied by any number is still 0; if the zero operations could be removed, power consumption would drop greatly and computational performance would improve. However, some conventional accelerators currently use gating: if an input value is judged to be 0, the arithmetic unit is shut off and 0 is output automatically. Although this avoids the zero multiplication itself, it still wastes a beat period, still causes substantial wasted energy, and limits computational efficiency.
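The gating drawback described above can be illustrated with a small software sketch (an illustration only, not part of the patent): ReLU makes a large fraction of activations exactly zero, and a gated datapath still spends one beat per element, while a compressed stream spends beats only on the non-zeros. The `relu` helper and the cycle-counting model are illustrative assumptions.

```python
import random

def relu(x):
    """ReLU activation: values below zero become exactly 0."""
    return x if x > 0 else 0.0

# Hypothetical layer output before activation: roughly half the values
# fall below zero, so ReLU maps them to exact 0.
random.seed(0)
pre_activation = [random.uniform(-1.0, 1.0) for _ in range(1000)]
activated = [relu(v) for v in pre_activation]

zeros = sum(1 for v in activated if v == 0.0)
sparsity = zeros / len(activated)

# Gating skips the multiply for a zero but still consumes one beat per
# element; operating only on a compressed non-zero stream consumes one
# beat per non-zero element.
gating_cycles = len(activated)
compressed_cycles = len(activated) - zeros

print(f"sparsity after ReLU: {sparsity:.2f}")
print(f"beats with gating: {gating_cycles}, with compression: {compressed_cycles}")
```

Under this toy model, the beat count saved by compression equals exactly the number of zero activations that gating would otherwise spend idle beats on.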
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems of the prior art, the present invention provides a method for realizing sparse calculation based on a deep learning accelerator that is simple to implement, low in cost, high in computational efficiency, and low in energy consumption.
To solve the above technical problems, the technical solution proposed by the present invention is as follows:
A method for realizing sparse calculation based on a deep learning accelerator, the method comprising:
S1. starting the scoreboard processing operation when the input data from each channel contains a specified number of zeros; when the scoreboard processing operation is started, compressing the input data to filter out the zero-valued neurons in it, obtaining compressed input data, and proceeding to step S2;
S2. examining each input feature value in the compressed input data in channel order; if a target input feature value is judged to be data required by the computation of the current output feature value, storing the target input feature value in a pre-configured data buffer, otherwise switching to another channel and examining the next input feature value in the compressed input data, until all input feature values in the input data have been judged;
S3. sending the information of all input feature values in the data buffer to the multiplier array in the accelerator to execute the sparse calculation.
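Steps S1-S3 can be sketched as a small software model (an illustrative assumption, not the hardware implementation; the function and argument names are hypothetical):

```python
def sparse_pipeline(channels, needed_addresses, weights):
    """Software model of steps S1-S3.

    channels: per-channel lists of (true_address, value) input features.
    needed_addresses: true addresses required by the current output feature.
    weights: mapping from true_address to weight value.
    """
    # S1: compress each channel, filtering out zero-valued neurons.
    compressed = [[(a, v) for a, v in ch if v != 0] for ch in channels]

    # S2: walk the channels in order, keeping only the features that the
    # current output computation actually needs.
    buffer = []
    for ch in compressed:
        for addr, v in ch:
            if addr in needed_addresses:
                buffer.append((addr, v))

    # S3: send the buffered (address, value) pairs to the multiplier
    # array, which fetches the matching weight by true address.
    return sum(v * weights[addr] for addr, v in buffer)

# Toy usage: two channels; the output needs true addresses 0-2.
channels = [[(0, 1.0), (1, 0.0), (2, 3.0)], [(0, 0.0), (1, 2.0), (3, 4.0)]]
weights = {0: 0.5, 1: 0.5, 2: 1.0, 3: 1.0}
result = sparse_pipeline(channels, {0, 1, 2}, weights)
```

Note that the zero-valued entries never reach the multiply-accumulate loop, which is the point of the scheme.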
As a further improvement of the method of the present invention: in step S1 the input data is specifically compressed using a stride-length compression method.
As a further improvement of the method of the present invention: in step S1, specifically, the scoreboard processing operation is not started for the first-layer input of the deep learning network, and is started only after the ReLU activation operation has been performed.
As a further improvement of the method of the present invention: in step S2, specifically, the target input feature value and the true address corresponding to the target input feature value are stored in the data buffer.
As a further improvement of the method of the present invention: in step S3, specifically, the values of all input feature values in the data buffer and their corresponding true addresses are sent to the multiplier array; the multiplier array obtains the weight value corresponding to each input feature value according to its true address, performs the multiplication of the input feature value with the obtained weight value, and outputs the multiplication result.
As a further improvement of the method of the present invention: a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation are pre-configured; in step S2, if the target input feature value is judged to be data required by the computation of the current output feature value, the target input feature value, its sparse address in the compressed input data, and its true address are stored in the first data buffer; otherwise they are stored in the second data buffer.
The present invention further provides a device for implementing the above method for realizing sparse calculation based on a deep learning accelerator, comprising:
a control module, for starting the scoreboard processing operation when the input data from each channel contains a specified number of zeros;
a scoreboard processing module, for executing the scoreboard processing operation, comprising a sequentially connected data input unit, compression unit, judging unit, data buffer, and transmission unit. The data input unit receives all input data and outputs it to the compression unit; the compression unit compresses the input data to filter out the zero-valued neurons in it, obtaining compressed input data; the judging unit examines each input feature value in the compressed input data in channel order, and if a target input feature value is judged to be data required by the computation of the current output feature value, stores the target input feature value in the data buffer; the transmission unit sends the information of all input feature values in the data buffer to the multiplier array in the accelerator to execute the required computation.
As a further improvement of the device of the present invention: the data buffer comprises a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation; if the control module judges that the target input feature value is data required by the computation of the current output feature value, it stores the target input feature value, its sparse address in the compressed input data, and its true address in the first data buffer; otherwise it stores them in the second data buffer.
As a further improvement of the device of the present invention: the scoreboard processing module further comprises an address auto-increment unit connected to the control module, for obtaining the sparse address of each input feature value by auto-increment.
As a further improvement of the device of the present invention: the scoreboard processing module uses a ping-pong data processing method, i.e., while the input feature values that have completed processing are stored in one data storage area and transferred to the multiplier array through the transmission unit, the data input unit simultaneously receives new input feature values, which are judged by the judging unit and stored in the other data storage area.
Compared with the prior art, the advantages of the present invention are as follows:
1. In the method of the present invention for realizing sparse calculation based on a deep learning accelerator, the input feature values first undergo compression before being sent to the computing unit, filtering out the zero-valued neurons; each input feature value is then judged in turn, and if it is data required by the computation of the current output feature value, its information is stored in the designated data buffer. After all input feature values required by the current output feature computation have been selected, they are sent to the multiplier array in the accelerator for the sparse calculation. A deep learning accelerator realizing sparse operation on the basis of the scoreboard processing operation can fully exploit the sparse characteristics of convolution in deep learning to speed up the accelerator's computation.
2. In the method of the present invention, the multiplier array in the accelerator hardware computes only non-zero data, eliminating both the unnecessary zero multiplications and the invalid zero-multiply beats. This avoids the invalid zero-multiply operations in deep learning computation and also reduces the beats and computing resources they waste, so that power consumption is reduced by removing the invalid zero multiplications, and operation and accelerator efficiency are improved by avoiding their beats. The implementation is simple and versatile: it suffices to add the scoreboard processing operation, and non-sparse neural networks can still be processed with the ordinary calculation method.
Detailed description of the invention
Fig. 1 is the implementation process schematic diagram for the method that the present embodiment realizes sparse calculation based on deep learning accelerator.
Fig. 2 is the structural schematic diagram that the present embodiment realizes the device based on deep learning accelerator sparse calculation.
Fig. 3 is the realization principle schematic diagram based on deep learning accelerator sparse calculation in the specific embodiment of the invention.
Specific embodiments
The invention will be further described below with reference to the drawings and specific preferred embodiments, without thereby limiting the scope of protection of the invention.
As shown in Fig. 1, the steps of the method of the present embodiment for realizing sparse calculation based on a deep learning accelerator include:
S1. starting the scoreboard processing operation when the input data from each channel contains a specified number of zeros; when the scoreboard processing operation is started, compressing the input data to filter out the zero-valued neurons in it, obtaining compressed input data, and proceeding to step S2;
S2. examining each input feature value in the compressed input data in channel order; if a target input feature value is judged to be data required by the computation of the current output feature value, storing the target input feature value in a pre-configured data buffer, otherwise switching to another channel and examining the next input feature value in the compressed input data, until all input feature values in the input data have been judged;
S3. sending the information of all input feature values in the data buffer to the multiplier array in the accelerator to execute the sparse calculation.
In the present embodiment, compression is first performed before the input feature values are sent to the computing unit, filtering out the zero-valued neurons; each input feature value is then judged in turn, and the data required by the computation of the current output feature value is stored in the designated data buffer. After all input feature values required by the current output feature computation have been selected, they are sent to the multiplier array in the accelerator for the sparse calculation. A deep learning accelerator realizing sparse operation on the basis of the scoreboard processing operation can fully exploit the sparse characteristics of convolution in deep learning to speed up the accelerator's computation.
With sparse calculation realized by the above method of this embodiment, the multiplier array in the accelerator hardware computes only non-zero data, eliminating both the unnecessary zero multiplications and the invalid zero-multiply beats; while avoiding the invalid zero-multiply operations in deep learning computation, it also reduces the beats and computing resources they waste, so that power consumption is reduced and operation and accelerator efficiency are improved.
Specifically, in the present embodiment, when the accelerator works in sparse operation mode, the scoreboard processing operation is started and the sparse data produced by the scoreboard processing operation is passed to the arithmetic unit for computation; since the data after the scoreboard processing operation contains only non-zero values, everything the arithmetic unit receives is non-zero, and every beat of the whole arithmetic unit performs a significant operation. For non-sparse data, the scoreboard processing operation is not started; it is masked, and the input feature values are passed directly into the multiplier array for computation.
In step S1 of the present embodiment, the input data is specifically compressed using a stride-length compression method; this method allows the subsequent scoreboard processing operation to be executed directly, without decompression.
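One plausible reading of such a stride-length compression is that each non-zero value is stored together with its stride (distance) from the previous stored non-zero, so the true address can be rebuilt by accumulating strides and no separate decompression pass is needed. The sketch below is an assumption about the scheme, not the patented encoding:

```python
def stride_compress(values):
    """Hypothetical stride-length compression: keep only non-zero values,
    each paired with its stride (gap) from the previously kept element."""
    out, last = [], -1
    for addr, v in enumerate(values):
        if v != 0:
            out.append((addr - last, v))  # (stride, value)
            last = addr
    return out

def true_addresses(compressed):
    """Recover true addresses by accumulating strides -- the compressed
    stream can be consumed directly, with no decompression pass."""
    addrs, pos = [], -1
    for stride, _ in compressed:
        pos += stride
        addrs.append(pos)
    return addrs

# Usage: zeros vanish from the stream, yet addresses remain recoverable.
stream = stride_compress([0, 5, 0, 0, 7])
print(stream, true_addresses(stream))
```

The downstream judging logic can therefore walk the compressed stream element by element while still knowing each value's original position.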
In step S1 of the present embodiment, specifically, the scoreboard processing operation is not started for the first-layer input of the deep learning network; it is started only after the ReLU activation operation. At the first-layer input there are not many zeros in the data, so the computation cannot make use of the scoreboard circuit; in the subsequent neural network layers, a large amount of sparse data appears after ReLU activation, and the output results contain many zeros. Accordingly, at the first-layer input the present embodiment does not start the scoreboard processing operation: the input feature values are fed directly into the multiplier array for computation, without affecting the original operation, and after the computation completes, activation, pooling, normalization, and similar operations are performed as needed. After the ReLU activation operation has been performed, the scoreboard processing operation is started: the output of the previous layer is taken as input and stride-length compressed to filter out the many zeros it contains; each input feature value is then judged in turn, and the input feature values required by the output feature computation are stored in the data buffer. After this preliminary processing, the information of the input feature values in the data buffer is finally fed to the multiplier array for the sparse operation; when the multiplication completes, the multiplier array outputs the result, activation, pooling, normalization, and similar operations are performed, and the data is compressed again for the computation of the next layer.
In step S2 of the present embodiment, specifically, the target input feature value and its corresponding true address are stored in the data buffer, the true address being used to obtain the weight value corresponding to the input feature value. In step S3, specifically, the values of all input feature values in the data buffer and their corresponding true addresses are sent to the multiplier array. After receiving each transmitted input feature value and its corresponding true address, the multiplier array obtains the weight value corresponding to the input feature value according to the true address, performs the multiplication of the input feature value with the obtained weight value, and outputs the multiplication result.
The present embodiment pre-configures a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation. In step S2, if the target input feature value is judged to be data required by the computation of the current output feature value, the target input feature value, its sparse address in the compressed input data, and its true address are stored in the first data buffer; otherwise they are stored in the second data buffer. Here the sparse address is the address of the sparse data (the input feature value) after compression, while the true address is the actual address of the input feature value. The sparse address can specifically be generated by an address auto-increment device: each time a piece of compressed sparse data is stored, the corresponding sparse address is obtained from the address auto-increment device.
As shown in Fig. 2, the device of the present embodiment implementing the above method for realizing sparse calculation based on a deep learning accelerator comprises:
a control module, for starting the scoreboard processing operation when the input data from each channel contains a specified number of zeros;
a scoreboard processing module, for executing the scoreboard processing operation, comprising a sequentially connected data input unit, compression unit, judging unit, data buffer, and transmission unit. The data input unit receives all input data and outputs it to the compression unit; the compression unit compresses the input data to filter out the zero-valued neurons in it, obtaining compressed input data; the judging unit examines each input feature value in the compressed input data in channel order, and if a target input feature value is judged to be data required by the computation of the current output feature value, stores the target input feature value in the data buffer; the transmission unit sends the information of all input feature values in the data buffer to the multiplier array in the accelerator to execute the required computation.
In a concrete application embodiment, a scoreboard module is set up in the whole accelerator, and the control module controls the first-layer input data of the deep learning network to pass through the scoreboard module unchanged into the multiplier array module. When the accelerator works in sparse operation mode, for example on data that has passed through a ReLU operation, the scoreboard module is started: it first compresses the sparse data with the stride-length compression method, then judges the compressed data and passes the sparse data required for the computation (the input feature values) to the arithmetic unit for computation. This ensures that the data computed in the whole multiplier array is data from which the zeros have been rejected; after the multiplier array completes the computation, the output feature values are output and processed further.
In the present embodiment, the data buffer comprises a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation. If the control module judges that the target input feature value is data required by the computation of the current output feature value, it stores the target input feature value, its sparse address in the compressed input data, and its true address in the first data buffer; otherwise it stores them in the second data buffer.
In the present embodiment, the scoreboard processing module further comprises an address auto-increment unit connected to the control module, for obtaining the sparse address of each input feature value by auto-increment.
In the present embodiment, the scoreboard processing module uses a ping-pong data processing method: while the input feature values that have completed processing are stored in one data storage area and transferred to the multiplier array through the transmission unit, the data input unit simultaneously receives new input feature values, which are judged by the judging unit and stored in the other data storage area. The scoreboard circuit realizing the sparse operation in the accelerator adopts a ping-pong-like structure: while one part of the scoreboard circuit, having finished storing sparse data, feeds input data to the multiplier array, the other part receives input feature values from outside, judges whether each value is 0, and writes it into the scoreboard circuit.
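The ping-pong structure can be sketched in software as two banks that swap roles: one drains to the multiplier array while the other fills with newly judged inputs (a minimal sketch under assumed interfaces; the class and method names are hypothetical):

```python
class PingPongScoreboard:
    """Two storage banks swapping roles: while one bank drains to the
    multiplier array, the other fills with newly judged input values."""

    def __init__(self):
        self.banks = [[], []]
        self.fill = 0  # index of the bank currently being filled

    def accept(self, value):
        """Judging unit: zero-valued neurons are dropped, non-zeros kept."""
        if value != 0:
            self.banks[self.fill].append(value)

    def swap_and_drain(self):
        """Swap bank roles; return the bank to transmit to the multipliers."""
        drain = self.banks[self.fill]
        self.fill ^= 1
        self.banks[self.fill] = []  # fresh bank for incoming values
        return drain

# Usage: one bank drains while the other keeps accepting input.
pp = PingPongScoreboard()
for v in [1, 0, 2]:
    pp.accept(v)
batch = pp.swap_and_drain()  # [1, 2] goes to the multiplier array
pp.accept(5)                  # meanwhile the other bank fills
```

In hardware the two halves operate concurrently, so the transmission of one batch overlaps the judging of the next, hiding the judging latency.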
As shown in Fig. 3, in a concrete application embodiment the scoreboard processing module is realized by a scoreboard, which is connected to the accelerator's multiplier array through a transmission circuit; sparse addresses are obtained from an address auto-increment device, and the control module is realized by a control circuit connected to the scoreboard, the address auto-increment device, and the transmission circuit. The scoreboard is divided into a left area and a right area. The left side of the scoreboard stores the compressed sparse addresses transmitted in from external memory, the sparse values, and the true addresses corresponding to the sparse values. The control circuit judges whether each true address is an input value that the neural network layer needs in order to complete the current computation: if so, the sparse value and its true address are transferred to the right side of the scoreboard for storage; if not, the sparse address, sparse value, and true address remain stored on the left side of the scoreboard, the input address is switched to another input channel, and the above operations restart until the judgment of all input feature values is completed.
The above method of the present invention is further described below with a convolutional neural network algorithm having 4 input channels and a convolution kernel of size 5*5, taking as an example the processing of the first row of the input feature table of the four input channels.
Table 1 shows the original, unprocessed sparse input feature table:
Table 1: original sparse input feature table.
The data in Table 1 is stride-length compressed to filter out the zero values in it, giving the input feature table shown in Table 2. For ease of understanding, and because the stride-length compression algorithm is quite simple, the subscripts of the values in Table 2 do not show the real stride values but instead use the true address values.
Table 2: stride-length-compressed input feature table.
X0 | X2 | X3 | X7 | X8 |
Y0 | Y1 | Y5 | Y6 | Y8 |
Z1 | Z2 | Z6 | Z7 |    |
T3 | T4 | T5 | T6 |    |
The left side of the scoreboard storage is labeled ScoreBoard and the right side InputFeatureBuffer. First, the first value of the first input channel is taken out; this input feature value is judged to be one that should participate in the computation of the first output feature value, so the value and its true address are placed in the InputFeatureBuffer, the true address being used in the multiplier array to fetch the corresponding weight value.
Proceeding in this way, X7 is taken out at the 3rd beat; the judging circuit finds that X7 is not needed for computing the current output feature value, so the value is placed in the ScoreBoard.
The process then continues with the input and judgment of the input feature values of the second input channel. In the same way, the input and judgment of the input feature values corresponding to the current output feature value are completed by the 6th beat, and by the 12th period the access of the corresponding input feature values of all 4 input channels is completed. The values in the InputFeatureBuffer and their corresponding addresses are then sent to the multiplier array, which obtains the corresponding weight values according to the address values and completes the computation in the conventional way. The scoreboard storage proceeds as follows:
Through the above process, the values X0, X2, X3, Y0, Y1, Z1, Z2, T3, and T4 required for computing the output feature value are stored in the InputFeatureBuffer; they contain no zero values and are exactly the input feature values required for the computation, which effectively improves computational efficiency and reduces computational energy consumption.
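The walkthrough above can be reproduced with a short script over the Table 2 data (a software model only; it assumes the first output row needs true addresses 0-4 of each channel, and reads the fourth entry of channel Y as Y6):

```python
# Compressed input feature table (Table 2); each entry keeps its true
# address as the numeric suffix.
channels = {
    "X": [0, 2, 3, 7, 8],
    "Y": [0, 1, 5, 6, 8],
    "Z": [1, 2, 6, 7],
    "T": [3, 4, 5, 6],
}

# Assumption: the first row of a 5*5 convolution needs true addresses 0-4.
needed = set(range(5))

input_feature_buffer = []  # right side: values needed for this output
score_board = []           # left side: values held back for later

for name, addrs in channels.items():
    for addr in addrs:
        label = f"{name}{addr}"
        if addr in needed:
            input_feature_buffer.append(label)
        else:
            score_board.append(label)

print("InputFeatureBuffer:", input_feature_buffer)
print("ScoreBoard:", score_board)
```

Running this sketch yields the same nine features named in the walkthrough, with X7 among the values retained on the ScoreBoard side.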
The above are only preferred embodiments of the present invention and are not intended to limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any simple amendments, equivalent changes, and modifications made to the above embodiments in accordance with the technical spirit of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the scope of protection of the technical solution of the present invention.
Claims (10)
1. A method for implementing sparse computation based on a deep-learning accelerator, characterized in that the method comprises:
S1. starting a scoreboard processing operation when the input data from each channel contains a specified number of 0s; when the scoreboard processing operation is started, compressing the input data to filter out the 0-value neurons therein, obtaining compressed input data, and proceeding to step S2;
S2. obtaining each input feature value in the compressed input data in channel order in turn and judging it; if a target input feature value is judged to be data required for computing the current output feature value, storing the target input feature value in a preconfigured data buffer; otherwise switching to another channel to obtain the next input feature value in the compressed input data for judgment, until the judgment of all input feature values in the input data is completed;
S3. sending the information of all input feature values in the data buffer to the multiplier array in the accelerator to perform the sparse computation.
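Steps S1–S3 can be sketched as a small software model. This is an assumption-laden illustration, not the claimed hardware: the zero-count threshold is a parameter here, and the per-value "needed for the current output" judgment is simplified to keeping every nonzero value.

```python
def sparse_preprocess(channels, zero_threshold):
    # S1: start the scoreboard operation only when every channel's
    # input data contains at least the specified number of 0s.
    if not all(ch.count(0) >= zero_threshold for ch in channels):
        return None  # operation not started
    # Compression: filter out 0-value neurons, keeping (address, value).
    compressed = [[(i, v) for i, v in enumerate(ch) if v != 0]
                  for ch in channels]
    # S2: visit channels in order, switching channel after each value.
    buffer, cursors = [], [0] * len(compressed)
    remaining = sum(len(c) for c in compressed)
    ch = 0
    while remaining:
        if cursors[ch] < len(compressed[ch]):
            buffer.append((ch, *compressed[ch][cursors[ch]]))
            cursors[ch] += 1
            remaining -= 1
        ch = (ch + 1) % len(compressed)  # switch to another channel
    return buffer  # S3 would send these entries to the multiplier array

out = sparse_preprocess([[1, 0, 2], [0, 3, 0]], zero_threshold=1)
assert out == [(0, 0, 1), (1, 1, 3), (0, 2, 2)]
```

The round-robin channel switch means the buffer interleaves entries from different channels, matching the claim's "switch to another channel to obtain the next input feature value".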
2. The method for implementing sparse computation based on a deep-learning accelerator according to claim 1, characterized in that, in step S1, the input data is specifically compressed using a stride-length compression method.
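One plausible reading of "stride-length compression" is run-length style: store each nonzero value together with the number of zeros skipped since the previous nonzero value, so the real address can be reconstructed later. The claim does not define the scheme, so this sketch is an assumption.

```python
def stride_compress(values):
    """Encode a dense vector as (stride, value) pairs, where stride is
    the count of 0-value neurons skipped before each nonzero value."""
    out, stride = [], 0
    for v in values:
        if v == 0:
            stride += 1          # skip the 0-value neuron
        else:
            out.append((stride, v))
            stride = 0
    return out

def stride_decompress_addresses(pairs):
    """Recover (real_address, value) pairs from the stride encoding."""
    addr, result = -1, []
    for stride, v in pairs:
        addr += stride + 1       # advance past the skipped zeros
        result.append((addr, v))
    return result

pairs = stride_compress([0, 7, 0, 0, 4, 9])
assert pairs == [(1, 7), (2, 4), (0, 9)]
assert stride_decompress_addresses(pairs) == [(1, 7), (4, 4), (5, 9)]
```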
3. The method for implementing sparse computation based on a deep-learning accelerator according to claim 1, characterized in that, in step S1, the scoreboard processing operation is specifically not started for the input of the first layer of the deep-learning network, and is started after the ReLU activation operation has been performed.
4. The method for implementing sparse computation based on a deep-learning accelerator according to claim 1, 2 or 3, characterized in that, in step S2, the target input feature value and the real address corresponding to the target input feature value are specifically stored in the data buffer.
5. The method for implementing sparse computation based on a deep-learning accelerator according to claim 4, characterized in that, in step S3, the values of all input feature values in the data buffer and their corresponding real addresses are specifically sent to the multiplier array; the multiplier array obtains the weight value corresponding to each input feature value according to its real address, performs a multiplication of the input feature value with the correspondingly obtained weight value, and outputs the multiplication result.
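Claim 5's send-and-multiply step can be illustrated as follows. The weight table and entry layout are hypothetical; the point is only that the real address is the key that pairs each input feature value with its weight.

```python
def multiply_by_address(buffer_entries, weight_table):
    """For each (value, real_address) pair sent from the data buffer,
    fetch the weight by real address and output the product."""
    results = []
    for value, real_addr in buffer_entries:
        w = weight_table[real_addr]      # weight fetched by real address
        results.append((real_addr, value * w))
    return results

weights = {0: 2, 3: -1, 5: 4}            # hypothetical weight table
entries = [(7, 0), (4, 3), (9, 5)]       # (value, real_address) pairs
assert multiply_by_address(entries, weights) == [(0, 14), (3, -4), (5, 36)]
```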
6. The method for implementing sparse computation based on a deep-learning accelerator according to claim 1, 2 or 3, characterized in that a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation are configured in advance; in step S2, if the target input feature value is judged to be data required for computing the current output feature value, the target input feature value, its sparse address in the compressed input data, and its real address are stored in the first data buffer; otherwise they are stored in the second data buffer.
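The two-buffer routing of claim 6 can be sketched as below. The entry layout (value, sparse address, real address) follows the claim; the `needed_real_addresses` criterion is a simplification standing in for the actual judgment.

```python
def route_entries(compressed, needed_real_addresses):
    """Route each compressed entry to the first buffer if it is needed
    for the current output feature value, otherwise to the second.
    Each entry keeps its value, sparse address and real address."""
    first_buffer, second_buffer = [], []
    for sparse_addr, (real_addr, value) in enumerate(compressed):
        entry = (value, sparse_addr, real_addr)
        if real_addr in needed_real_addresses:
            first_buffer.append(entry)   # needed for the computation
        else:
            second_buffer.append(entry)  # not needed for this output
    return first_buffer, second_buffer

compressed = [(1, 7), (4, 4), (5, 9)]    # (real_address, value) pairs
fb, sb = route_entries(compressed, needed_real_addresses={1, 5})
assert fb == [(7, 0, 1), (9, 2, 5)]
assert sb == [(4, 1, 4)]
```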
7. A device for implementing the method for implementing sparse computation based on a deep-learning accelerator according to any one of claims 1 to 6, characterized by comprising:
a control module, configured to start a scoreboard processing operation when the input data from each channel contains a specified number of 0s;
a scoreboard processing module, configured to execute the scoreboard processing operation, comprising a data input unit, a compression unit, a judgment unit, a data buffer and a sending unit connected in sequence; the data input unit receives all input data and outputs it to the compression unit; the compression unit compresses the input data to filter out the 0-value neurons therein, obtaining compressed input data; the judgment unit obtains each input feature value in the compressed input data in channel order in turn and judges it, and if a target input feature value is judged to be data required for computing the current output feature value, stores the target input feature value in the data buffer; the sending unit sends the information of all input feature values in the data buffer to the multiplier array in the accelerator to perform the required computation.
8. The device according to claim 7, characterized in that: the data buffer comprises a first data buffer for storing the input feature values required for the computation and a second data buffer for storing the input feature values not required for the computation; if the control module judges that the target input feature value is data required for computing the current output feature value, it stores the target input feature value, its sparse address in the compressed input data, and its real address in the first data buffer; otherwise it stores them in the second data buffer.
9. The device according to claim 8, characterized in that: the scoreboard processing module further comprises an address self-increment unit connected to the control module, configured to obtain the sparse address of each input feature value by self-increment.
10. The device according to claim 8 or 9, characterized in that: the scoreboard processing module uses a ping-pong data-processing scheme, in which the input feature values whose processing is nearly complete are stored in one data storage area and transferred to the multiplier array through the sending unit, while at the same time new input feature values are received by the data input unit, judged by the judgment unit, and stored in the other data storage area.
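The ping-pong scheme of claim 10 alternates two buffers so that draining to the multiplier array and filling with newly judged values overlap. A minimal software sketch, with the batch granularity chosen arbitrarily for illustration:

```python
def ping_pong(batches):
    """Fill one buffer with each incoming batch while draining the
    other; the roles of the two buffers swap every step."""
    buffers = [[], []]
    active = 0                    # buffer currently being filled
    sent = []
    for batch in batches:
        buffers[active].extend(batch)             # fill one buffer...
        draining = 1 - active
        if buffers[draining]:
            sent.append(list(buffers[draining]))  # ...drain the other
            buffers[draining].clear()
        active = draining                         # swap roles
    for b in buffers:             # flush whatever remains at the end
        if b:
            sent.append(list(b))
    return sent

assert ping_pong([[1, 2], [3], [4, 5]]) == [[1, 2], [3], [4, 5]]
```

The payoff is concurrency: in hardware, the transfer of one buffer's contents to the multiplier array can proceed in the same cycles as the judgment and storage of new input feature values into the other buffer.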
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810803430.5A CN109086883A (en) | 2018-07-20 | 2018-07-20 | Method and device for realizing sparse calculation based on deep learning accelerator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109086883A true CN109086883A (en) | 2018-12-25 |
Family
ID=64838357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810803430.5A Pending CN109086883A (en) | 2018-07-20 | 2018-07-20 | Method and device for realizing sparse calculation based on deep learning accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086883A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858622A (en) * | 2019-01-31 | 2019-06-07 | 福州瑞芯微电子股份有限公司 | The data of deep learning neural network carry circuit and method |
CN111931921A (en) * | 2020-10-13 | 2020-11-13 | 南京风兴科技有限公司 | Ping-pong storage method and device for sparse neural network |
CN112749782A (en) * | 2019-10-31 | 2021-05-04 | 上海商汤智能科技有限公司 | Data processing method and related product |
US11823060B2 (en) | 2020-04-29 | 2023-11-21 | HCL America, Inc. | Method and system for performing deterministic data processing through artificial intelligence |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101820543A (en) * | 2010-03-30 | 2010-09-01 | 北京蓝色星河软件技术发展有限公司 | Ping-pong structure fast data access method combined with direct memory access (DMA) |
CN107066239A (en) * | 2017-03-01 | 2017-08-18 | 智擎信息系统(上海)有限公司 | A kind of hardware configuration for realizing convolutional neural networks forward calculation |
CN107657581A (en) * | 2017-09-28 | 2018-02-02 | 中国人民解放军国防科技大学 | Convolutional neural network CNN hardware accelerator and acceleration method |
US20180174036A1 (en) * | 2016-12-15 | 2018-06-21 | DeePhi Technology Co., Ltd. | Hardware Accelerator for Compressed LSTM |
CN108280514A (en) * | 2018-01-05 | 2018-07-13 | 中国科学技术大学 | Sparse neural network acceleration system based on FPGA and design method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086883A (en) | Method and device for realizing sparse calculation based on deep learning accelerator | |
CN106844294B (en) | Convolution algorithm chip and communication equipment | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
CN109784489A (en) | Convolutional neural networks IP kernel based on FPGA | |
CN107832804A (en) | A kind of information processing method and Related product | |
CN108932548A (en) | A kind of degree of rarefication neural network acceleration system based on FPGA | |
CN107239824A (en) | Apparatus and method for realizing sparse convolution neutral net accelerator | |
CN106779060A (en) | A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization | |
CN106951395A (en) | Towards the parallel convolution operations method and device of compression convolutional neural networks | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN108763159A (en) | To arithmetic accelerator before a kind of LSTM based on FPGA | |
CN107622305A (en) | Processor and processing method for neutral net | |
CN110163354A (en) | A kind of computing device and method | |
CN112529165B (en) | Deep neural network pruning method, device, terminal and storage medium | |
Que et al. | Optimizing reconfigurable recurrent neural networks | |
CN108304925A (en) | A kind of pond computing device and method | |
CN108647776A (en) | A kind of convolutional neural networks convolution expansion process circuit and method | |
WO2022112739A1 (en) | Activation compression method for deep learning acceleration | |
CN109214508A (en) | The system and method for signal processing | |
CN110163350A (en) | A kind of computing device and method | |
CN109145107A (en) | Subject distillation method, apparatus, medium and equipment based on convolutional neural networks | |
CN110119805A (en) | Convolutional neural networks algorithm based on echo state network classification | |
Que et al. | Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs | |
Liu et al. | Kiwifruit leaf disease identification using improved deep convolutional neural networks | |
CN109948787B (en) | Arithmetic device, chip and method for neural network convolution layer |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181225 |