CN109993297A - Load-balanced sparse convolutional neural network accelerator and acceleration method thereof - Google Patents
Load-balanced sparse convolutional neural network accelerator and acceleration method thereof Download PDF Info
- Publication number
- CN109993297A (application CN201910259591.7A, CN201910259591A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- data
- load balancing
- array
- computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention discloses a load-balanced sparse convolutional neural network accelerator and its acceleration method. The accelerator comprises a master controller, a data distribution module, a computing array for convolution operations, an output result cache module, a linear activation function unit, a pooling unit, an online coding unit, and an off-chip dynamic memory. The scheme of the present invention achieves highly efficient operation of the convolution computing array with very little storage resource, ensuring high reuse of input excitations and weight data as well as load balancing and high utilization of the computing array. Through static configuration, the computing array supports parallel scheduling at two levels: across rows and columns of convolutions of different sizes and scales, and across different feature maps. It therefore has good applicability and scalability.
Description
Technical field
The present invention relates to a load-balanced sparse convolutional neural network accelerator and its acceleration method, and belongs to the technical field of deep learning algorithms.
Background art
In recent years, deep learning algorithms have been widely applied, with excellent results, in computer vision, natural language processing, and other fields, and the convolutional neural network (CNN) is one of the most important such algorithms. Higher accuracy of a CNN model usually means a deeper network with more parameters and more operations, and about 90% of the computation is concentrated in the convolutional layers. Therefore, to run convolutional neural networks efficiently in embedded systems, optimizing the energy efficiency of the convolution operation is imperative.
CNN convolutional-layer computation has two main characteristics. First, the data volume is large: the feature maps and weight data required by the convolution are of large scale, so sparsifying them and storing them in compressed form saves data storage and makes maximal use of the data transfer bandwidth. Second, the data flow and control flow are complex: the convolution must process multiple channels of multiple kernels simultaneously according to the convolution dimensions while keeping the computation pipelined.
After sparsification, however, the irregular distribution of the nonzero elements introduces invalid computation, leaving a large fraction of the computing resources idle.
Summary of the invention
In view of the above problems in the prior art, the present invention aims to provide a highly efficient, load-balanced sparse convolutional neural network accelerator that achieves high reuse of weight and excitation data, a small volume of data transfer, scalable parallelism, and low demand on hardware storage and DSP resources. A further object of the present invention is to provide an acceleration method using the accelerator.
The technical solution adopted by the accelerator of the present invention is as follows:
A load-balanced sparse convolutional neural network accelerator, comprising: a master controller, for controlling the control signal flow and data flow of the convolution operation and for processing and saving data; a data distribution module, which distributes weight data to the computing array according to the block partition scheme of the convolution; a computing array for convolution, which completes the multiply-accumulate operations of the sparse convolution and outputs partial-sum results; an output result cache module, which accumulates and caches the partial sums of the computing array, arranges them into a unified format, and outputs the feature-map results awaiting activation and pooling; a linear activation function unit, which applies the bias and the activation function to the accumulated partial-sum results; a pooling unit, which pools the results processed by the activation function; an online coding unit, which encodes online the excitation values that still need to go through subsequent convolutional layers; and an off-chip dynamic memory, which stores the raw image data, the intermediate results of the computing array, and the final output feature maps.
The acceleration method of the load-balanced sparse convolutional neural network accelerator of the present invention comprises the following steps:
1) Prune the weight data of the convolutional neural network model: group the weight data according to its scale parameters, then, while preserving the overall model accuracy, apply the same pruning pattern to each group of weight data to sparsify it;
2) Formulate the load-balanced sparse convolution mapping scheme, and map the sparsified convolutional neural network onto the convolution computing array of the accelerator;
3) The accelerator reconfigures the computing array and the storage array according to the configuration information of the mapping scheme, keeping the convolution pipeline full;
4) The master controller directs the data distribution module to distribute the weight and excitation data; the computing array performs the operations and outputs convolution partial sums;
5) The partial sums are accumulated and linearly rectified, i.e., the bias and the activation function are applied;
6) Pooling is performed with the kernel size and stride required by the current convolutional layer;
7) If the current convolutional layer is not the last layer, the results are encoded online and the encoded excitations are sent to the next convolutional layer; otherwise they are output to the off-chip dynamic memory, completing the acceleration of the convolutional neural network.
Compared with the prior art, the invention has the following advantages:
The load-balanced sparse convolutional neural network accelerator and its acceleration method provided by the invention make maximal use of the sparsity of the convolution data. They achieve highly efficient operation of the convolution computing array with very little storage resource, ensuring high reuse of input excitations and weight data as well as load balancing and high utilization of the computing array. Through static configuration, the computing array supports parallel scheduling at two levels, across rows and columns of convolutions of different sizes and scales and across different feature maps, giving it good applicability and scalability. The design of the invention can well meet the current demand for low-power, high-energy-efficiency operation of convolutional neural networks in embedded systems.
Brief description of the drawings
Fig. 1 is a schematic diagram of the load-balanced sparse convolution network acceleration method.
Fig. 2 is a schematic diagram of the weight pruning pattern.
Fig. 3 is a schematic diagram of the overall hardware accelerator structure.
Fig. 4 is a schematic diagram of the convolution mapping scheme.
Fig. 5 is a schematic diagram of convolution within a PE group.
Fig. 6 is a schematic diagram of how PE-array load balancing and shared storage are realized.
Detailed description of the embodiments
The scheme of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the load-balanced sparse convolution network computation method. First, the weight data of the convolutional neural network model is pruned: the weight data is grouped according to its scale parameters, and then, while preserving the overall model accuracy, the same pruning pattern is applied to each group to sparsify it. Next, a load-balanced sparse convolution mapping scheme is formulated according to the dimensions of the input feature maps and convolution kernels, and the sparsified network is mapped onto the PE (processing element) array of the hardware accelerator. The hardware accelerator then reconfigures the PE array and the storage array according to the configuration information of the mapping scheme, keeping the convolution pipeline full. The master controller of the accelerator directs the distribution of the weight and excitation data; the PE array performs the operations and outputs convolution partial sums. The linear rectification unit accumulates the partial sums and linearly rectifies them, i.e., applies the bias and the activation function. The pooling unit performs pooling with the kernel size and stride required by the current convolutional layer, using either maximum or average pooling. Finally, if the current convolutional layer is not the last layer, the results are encoded online and the encoded excitations are sent to the next convolutional layer; otherwise they are output to off-chip storage, completing the acceleration of the whole convolution.
The load-balanced sparse convolution mapping scheme comprises the convolution mapping mode, the PE-array grouping scheme, the distribution and reuse scheme for input feature maps and weight data, and the PE-array parallel scheduling mechanism.
Convolution mapping mode: the input feature map is transformed into a matrix along the row (or column) dimension, and the weight data is unrolled into a vector along the output-channel dimension, so that the convolution is converted into a matrix-vector multiplication. The sparse matrix-vector multiplication unit designed in this way can skip the zeros in both the input feature map and the weight data, guaranteeing the efficiency of the whole computation.
PE-array grouping scheme: grouping is completed by static configuration from the master controller according to the dimension parameters of each convolutional layer. When the number of PEs exceeds the total number of three-dimensional convolution kernels, one group can compute all output feature-map channels; on this basis, the remaining PEs are divided into groups of the same size, each responsible for computing different rows of the output feature map. When the number of PEs is smaller than the total number of three-dimensional convolution kernels, one group computes the largest divisor of the number of output channels. The principle of this grouping is to keep the computation speed of every PE matched and the vacancy rate of the PE array low.
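The grouping rule above can be sketched as follows; `plan_pe_groups` and its exact divisor choice are illustrative reconstructions of the description, not the patent's own interface.

```python
def plan_pe_groups(total_pe, n_kernels):
    """Sketch of the described grouping rule (names are illustrative).
    Returns (PEs per group, number of groups)."""
    if total_pe >= n_kernels:
        num_pe = n_kernels                 # one group covers every output channel
    else:
        # largest divisor of n_kernels that fits in the available PEs,
        # so no PE in a group sits idle
        num_pe = max(d for d in range(1, total_pe + 1) if n_kernels % d == 0)
    group_pe = total_pe // num_pe
    return num_pe, group_pe

assert plan_pe_groups(32, 16) == (16, 2)   # two groups take different output rows
assert plan_pe_groups(12, 16) == (8, 1)    # 8 divides 16 and fits in 12 PEs
```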
Distribution and reuse of input feature maps and weight data: at any moment, the whole PE array is fed identical excitation data from one shared on-chip memory as the matrix required for the operation, while the data distribution module, according to the control information of the block partition, distributes to each PE the weight data it needs as the vector required for the operation. Reuse of the input feature map lies mainly in its simultaneous use by different PEs; reuse of the weights lies mainly in sharing the weight data across different groups and in each PE reusing its weight data across matrix updates without redistribution.
PE-array parallel scheduling mechanism: during computation, the PE array determines from the size of the output feature maps of the convolutional layer whether the different groups complete different rows (or columns) of the same output feature map or complete different output feature maps. This guarantees that the PE array can be scheduled in parallel at two levels: intra-layer parallelism within a single feature map, and simultaneous parallelism across different feature maps.
The load-balanced sparse convolutional neural network acceleration scheme of this embodiment comprises a software part and a hardware part. Fig. 2 is a schematic diagram of the pruning strategy of the software part. The pruning strategy is as follows: the initially dense neural network connections are grouped according to the connection count and the number of neurons of the network, and within each group the pruning pattern and positions are identical; that is, the neurons of each convolution-kernel group share the same connection pattern, and only the weight values on the connections differ. Taking an input feature map of size W*W*C (W is the feature-map width and height, C the number of input channels) and convolution kernels of size R*R*C*N (R is the kernel width and height, C the number of kernel channels, and N the number of kernels, i.e. the number of output channels) as an example: during pruning, the N kernels of size R*R*C are first treated as one kernel group, and the positions of the zero elements are made identical across all of them. If the accuracy after pruning does not meet the model's requirement, the kernel-group size can be reduced, pruning with groups of R*R*C*N1 (N1 a divisor of N).
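A small sketch of this group-wise pruning, in which every kernel of a group is zeroed at the same positions. The mean-magnitude saliency used here to pick those positions is an assumption for illustration; the patent text only requires the shared zero pattern.

```python
import numpy as np

def group_prune(kernels, sparsity):
    """Sketch of group-wise pruning: all N kernels of shape (R, R, C) in a
    group keep/zero the same positions. The selection criterion (mean
    magnitude across the group) is an illustrative assumption."""
    score = np.abs(kernels).mean(axis=0)            # (R, R, C) saliency map
    k = int(score.size * sparsity)                  # positions to zero
    thresh = np.partition(score.ravel(), k - 1)[k - 1] if k else -np.inf
    mask = score > thresh                           # shared zero pattern
    return kernels * mask, mask

rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3, 3, 2))             # N=4, R=3, C=2
pruned, mask = group_prune(kernels, 0.5)
# Every kernel in the group has zeros at exactly the same positions.
for k in pruned:
    assert np.array_equal(k == 0, ~mask)
```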
Fig. 3 is a schematic diagram of the sparse convolutional neural network accelerator structure of the hardware part. The overall structure mainly comprises: a master controller, which receives instructions from the host CPU and generates the control signal flow and data flow governing the convolution; a data distribution module, which distributes weight data to the PEs according to the block partition scheme of the convolution; the PE (processing element) array of the convolution, which is grouped according to the configuration information of the master controller, completes the multiply-accumulate operations of the sparse convolution, and outputs convolution results or partial sums; an output result cache module, which accumulates and caches the partial sums of the PEs and arranges them into a unified format before sending them on to the subsequent units; a linear activation function unit, which applies the bias and the activation function to the convolution results; a pooling unit, which performs maximum pooling on the results; an online coding unit, which applies online CSR (compressed sparse row) coding to the intermediate results so that the output meets the data format required by the subsequent convolutional layer; and the off-chip dynamic memory DDR4, which stores the raw image data, the inter-layer intermediate results, and the final output of the convolutional layers.
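The CSR coding performed by the online coding unit can be sketched as follows. This is the textbook compressed-sparse-row layout (nonzero values, their column indices, and row pointers); any bit-level packing used by the actual hardware is omitted.

```python
def csr_encode(matrix):
    """Sketch of CSR (compressed sparse row) storage: nonzero values,
    their column indices, and per-row pointers into the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # end of this row's nonzeros
    return values, col_idx, row_ptr

vals, cols, ptrs = csr_encode([[0, 5, 0], [3, 0, 0], [0, 0, 7]])
assert vals == [5, 3, 7]
assert cols == [1, 0, 2]
assert ptrs == [0, 1, 2, 3]
```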
The data distribution module comprises a fetch-address computation unit, a configurable on-chip memory storage unit, and a FIFO group for data-format caching and conversion. According to the configuration information received from the master controller, the fetch-address computation unit determines the access pattern to the off-chip dynamic memory DDR4; the fetched data is cached via the AXI4 interface into the on-chip weight memory, further formatted, and distributed into the corresponding FIFOs, ready to be sent for computation.
The PE array of the convolution comprises multiple matrix-vector multiplication computing units. As required by the static configuration information, it completes intra-layer or inter-layer parallel convolution over the feature maps and outputs the partial sums of the convolution. The PEs share a common on-chip memory for storage; thanks to the co-design of the pruning strategy and the hardware architecture, the PEs can, with very little storage resource, both skip zeros to accelerate the sparse convolution and keep the computation speeds of the different PEs matched.
Each matrix-vector multiplication computing unit comprises a pipeline controller module, a weight non-zero detection module, a pointer control module, an excitation decompression module, an MLA (multiply-accumulate) operation unit, and the common on-chip memory. The weight non-zero detection module performs non-zero detection on the weight data sent by the data distribution module and transmits only the nonzero values and their position information to the PE unit. The pointer control module and the excitation decompression module fetch, from the common on-chip memory, the excitation values corresponding to the nonzero weight values and send them simultaneously to each PE unit for computation. The MLA operation unit is mainly responsible for the multiplications and additions of the matrix-vector product.
Fig. 4 is a schematic diagram of the convolution mapping scheme. Take an input feature map of size W*W*C (W is the feature-map width and height, C the number of input channels), kernels of size R*R*C*N (R is the kernel width and height, C the number of kernel channels, N the number of kernels, i.e. the number of output channels), and output feature maps of size F. The number Num_PE of PE units in each PE group is first determined from N: if the total number of PEs is not smaller than N, Num_PE can be set equal to N, so that one batch of operations by one group directly yields the results of all output feature-map channels; otherwise Num_PE is set to a divisor M of N, and an integer number of batches outputs part of the output channels, guaranteeing that no PE is idle. The number of groups Group_PE is determined by the total number of PEs and Num_PE: if one group can complete all output channels, the different groups are responsible for different rows of the output feature map, as illustrated by the division of labor between the PE groups in the figure.
For a complete convolutional layer, a PE group consists of Num_PE PE units (i.e. matrix-vector multiplication units). Each matrix-vector multiplication unit is responsible for outputting several rows of one channel of the output feature map; for example, the first operation outputs the first several rows, the exact number being determined by the matrix size of the matrix-vector product. The matrix corresponds to the shared excitation data stored in the local on-chip memory, and the vector to the weight data sent by the data distribution module. The other PE groups may compute either subsequent rows of the same output feature map, as shown in Fig. 3, or the convolution of other input feature maps, satisfying the two parallel modes of intra-layer row-column parallelism and parallelism across different feature maps.
Fig. 5 is a schematic diagram of convolution within a PE group. Different numerical values indicate the values at different positions of the input feature map and of the different kernels. The matrix-vector scale taken in the example is a 2*12 matrix times a 12*1 vector, so each PE operation outputs a 2*1 vector. In the first operation, the vector of PE1 corresponds to the three channels of kernel 1, 12*1 in total, and the matrix corresponds to the three channels of the excitation image at positions (1,2,4,5) and (2,3,5,6); after the multiply-accumulate, the output is the first two rows of the first column of the first channel of the output feature map. The matrix is then updated first, taking the excitation values at positions (4,5,7,8) and (5,6,8,9), and the output is the first two rows of the second column of the first channel. After all columns of the corresponding rows have been output, the weight data of the vector is updated, i.e. the output of the third channel is produced next. PE2 correspondingly computes the second channel of the output feature map and, after its weight update, switches to computing the fourth output channel.
Fig. 6 is a schematic diagram of how PE-array load balancing and shared storage are realized. The on-chip memory shared by the PE array stores the nonzero values of the input excitations in CSR (compressed sparse row) format together with their index pointers; according to the positions of the nonzero values of the weight vectors sent by the data distribution module, the corresponding excitations are fetched for the multiply-accumulate. Because, under the software pruning strategy, all weight vectors within a PE group have their nonzero elements at the same positions, the excitation values required by each PE are also identical: only one copy of the excitation values needs to be kept in a very small memory and, once decoded, sent to all PEs simultaneously to satisfy the matrix requirements of the PE array. Moreover, for all PEs the nonzero positions of the matrix and the vector in the matrix-vector product are identical, so the computation speeds of the PEs match, achieving the design goal of a low-storage, load-balanced computing array. At the same time, different PE groups can also share the distributed weight data, realizing high reuse of excitations and weights.
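The load-balance argument can be sketched directly: because the group's weight vectors share one nonzero pattern, a single excitation fetch serves every PE, and each PE performs the same number of multiply-accumulates. The function and data below are illustrative.

```python
import numpy as np

def sparse_group_matvec(excitation_row, weight_vectors, nz_positions):
    """Sketch: one shared excitation fetch feeds every PE; each PE dots it
    with its own weight nonzeros. A shared nonzero pattern means identical
    MAC counts per PE, i.e. the load balance described."""
    fetched = excitation_row[nz_positions]          # one fetch for the group
    return [float(np.dot(fetched, w[nz_positions])) for w in weight_vectors]

exc = np.array([1., 0., 2., 0., 3., 0.])
nz = np.array([0, 2, 4])                            # shared pattern after pruning
weights = np.array([[2., 0., 1., 0., 1., 0.],       # PE1's weight vector
                    [1., 0., 0., 0., 2., 0.]])      # PE2's weight vector
out = sparse_group_matvec(exc, weights, nz)
assert out == [7.0, 7.0]                            # 3 MACs each, same workload
```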
In summary, the acceleration method for sparse convolutional neural networks proposed in this embodiment of the present invention can effectively save storage hardware resources and improve the reuse of input feature maps and weights. It realizes load balancing of the PE array; static configuration of the PE array satisfies different parallel computation requirements and guarantees high utilization of the array, improving the data throughput of the whole system and reaching a very high energy efficiency, suitable for low-power embedded systems.
Claims (8)
1. A load-balanced sparse convolutional neural network accelerator, characterized by comprising:
a master controller, for controlling the control signal flow and data flow of the convolution operation and for processing and saving data;
a data distribution module, which distributes weight data to the computing array according to the block partition scheme of the convolution;
a computing array for convolution, which completes the multiply-accumulate operations of the sparse convolution and outputs partial-sum results;
an output result cache module, which accumulates and caches the partial sums of the computing array, arranges them into a unified format, and outputs the feature-map results awaiting activation and pooling;
a linear activation function unit, which applies the bias and the activation function to the accumulated partial-sum results;
a pooling unit, which pools the results processed by the activation function;
an online coding unit, which encodes online the excitation values that still need to go through subsequent convolutional layers;
an off-chip dynamic memory, which stores the raw image data, the intermediate results of the computing array, and the final output feature maps.
2. The load-balanced sparse convolutional neural network accelerator according to claim 1, characterized in that the computing array of the convolution comprises matrix-vector multiplication computing units, each comprising a pipeline controller module, a weight non-zero detection module, a pointer control module, an excitation decompression module, an MLA operation unit module, and a common on-chip memory; the weight non-zero detection module performs non-zero detection on the weight data sent by the data distribution module and transmits only the nonzero values and their position information to the computing unit; the pointer control module and the excitation decompression module fetch, from the common on-chip memory, the excitation values corresponding to the nonzero weight values and send them simultaneously to each computing unit; the MLA operation unit performs the multiplications and additions of the matrix-vector product.
3. An acceleration method for a load-balanced sparse convolutional neural network accelerator, comprising the following steps:
1) pruning the weight data of the convolutional neural network model: the weight data are divided into groups according to their scale parameters, and each group is then sparsified with the same pruning approach while preserving the overall accuracy of the model;
2) formulating a load-balanced sparse convolution mapping scheme that maps the sparsified convolutional neural network onto the convolution computing array of the accelerator;
3) reconfiguring, by the accelerator, the computing array and the storage array according to the configuration information of the mapping scheme, so that the convolution operations proceed in a pipelined manner;
4) controlling, by the main controller, the data distribution module to distribute the weight data and the activation data; the computing array performs the operations and outputs convolution partial sums;
5) accumulating the partial sums and applying a linear correction, i.e., adding the bias and applying the activation function;
6) performing a pooling operation with the kernel size and stride required by the current convolutional layer;
7) judging whether the current convolutional layer is the last one: if not, performing on-line encoding and sending the encoded activation results to the next convolutional layer; if so, outputting the results to the off-chip dynamic memory, thereby completing the acceleration of the convolutional neural network.
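Steps 4) through 6) can be read as a per-layer software pipeline: sum the partial results, add the bias, apply the activation function, then pool. A simplified numerical sketch (ReLU and max pooling are illustrative choices; the claims do not fix the activation or pooling type):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(fmap, k, stride):
    """Pooling with the kernel size and stride of step 6."""
    h, w = fmap.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*stride:i*stride+k, j*stride:j*stride+k].max()
    return out

def layer_pipeline(partial_sums, bias, pool_k=2, pool_stride=2):
    """Steps 4-6: accumulate the partial sums, add the bias, apply
    the activation function, then pool the corrected feature map."""
    fmap = relu(sum(partial_sums) + bias)
    return max_pool2d(fmap, pool_k, pool_stride)
```

Step 7) then either re-encodes the pooled map for the next layer or writes it to off-chip memory.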
4. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 3, wherein in step 2) the sparse convolution mapping scheme comprises a convolution-operation mapping mode, a computing-array grouping scheme, a distribution and reuse mode for the input feature maps and weight data, and a parallel scheduling mechanism for the computing-array operations.
5. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, wherein the convolution-operation mapping mode is specifically: the input feature map is unfolded into a matrix along the row or column dimension, and the weight data are unfolded into vectors along the output-channel dimension, so that the convolution is converted into a matrix-vector multiplication.
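This mapping is the standard im2col transformation. A single-channel sketch of how a k×k convolution becomes a matrix-vector product (helper names are illustrative):

```python
import numpy as np

def im2col(x, k):
    """Unfold an HxW feature map into a matrix whose columns are the
    flattened kxk patches, one column per output position."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((k * k, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols

x = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
# The kernel, flattened over its spatial dims, becomes the vector of
# the matrix-vector product; each output pixel is one dot product.
out = kernel.ravel() @ im2col(x, 3)   # flattened 2x2 output
```

With C input channels the patches additionally stack along the channel dimension, and with multiple output channels the per-channel kernel vectors stack into the rows of a weight matrix, exactly the matrix-vector form the computing units consume.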
6. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, wherein the computing-array grouping scheme is specifically: the main controller statically configures the groups according to the dimension parameters of each convolutional layer; when the number of computing units exceeds the total number of three-dimensional convolution kernels, one array group computes all output feature-map channels and, on this basis, the remaining computing units are divided into equal-sized groups, each responsible for computing different rows of the output feature map; when the number of computing units is less than the total number of three-dimensional convolution kernels, one array group computes the largest divisor of the number of output feature-map channels.
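One way to read this grouping rule in software (the dictionary layout and helper below are illustrative assumptions, not taken from the patent):

```python
def largest_divisor_leq(n, cap):
    """Largest divisor of n that does not exceed cap."""
    return max(d for d in range(1, min(n, cap) + 1) if n % d == 0)

def plan_groups(num_units, num_kernels):
    """Static grouping: with more computing units than 3-D kernels,
    one group covers every output channel and the remaining units
    form equal-sized groups for different output rows; otherwise a
    group covers the largest channel-count divisor that fits."""
    if num_units > num_kernels:
        return {"channels_per_group": num_kernels,
                "row_groups": num_units // num_kernels}
    return {"channels_per_group": largest_divisor_leq(num_kernels, num_units),
            "row_groups": 1}
```

Choosing a divisor of the channel count keeps every group fully occupied, which is the load-balancing goal of the scheme.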
7. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, wherein the distribution and reuse mode for the input feature maps and weight data is specifically: the entire computing array simultaneously receives the same activation data, serving as the matrix of the operation, from one shared on-chip memory, while the data distribution module distributes to each computing unit, according to the control information of the blocked operation, the weight data it requires, serving as the vector of the operation.
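The sharing pattern of claim 7, one broadcast activation matrix and per-unit weight vectors, can be sketched as follows (names are illustrative):

```python
import numpy as np

def run_array(activations, weight_vectors):
    """Every computing unit sees the same activation matrix, broadcast
    from the shared on-chip memory; the data distribution module hands
    each unit its own weight vector, and each unit yields one output."""
    return [w @ activations for w in weight_vectors]

acts = np.array([[1.0, 2.0], [3.0, 4.0]])               # shared matrix
weights = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]  # one per unit
outs = run_array(acts, weights)
```

Broadcasting the activations once per cycle while streaming only the (sparse) weights is what keeps the memory bandwidth per computing unit low.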
8. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, wherein the parallel scheduling mechanism of the computing-array operations is specifically: during operation the computing array determines, from the dimension information of the output feature maps of the convolutional layer, whether the different groups complete different rows or columns of the same output feature map, or complete different output feature maps.
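A toy version of such a scheduling decision (the threshold rule here is an illustrative assumption; the claim only says the choice follows the output dimensions):

```python
def assign_work(num_groups, num_out_maps):
    """If there are at least as many output feature maps as groups,
    each group takes whole maps (round-robin); otherwise the groups
    split the rows of the same output feature map."""
    if num_out_maps >= num_groups:
        return {g: [m for m in range(num_out_maps) if m % num_groups == g]
                for g in range(num_groups)}
    return {g: ("rows", g, num_groups) for g in range(num_groups)}
```

Either assignment keeps all groups busy; which one wins depends on whether the layer is wide (many output maps) or tall (few large maps).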
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910259591.7A CN109993297A (en) | 2019-04-02 | 2019-04-02 | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109993297A true CN109993297A (en) | 2019-07-09 |
Family
ID=67132262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910259591.7A Pending CN109993297A (en) | 2019-04-02 | 2019-04-02 | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109993297A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107229967A (en) * | 2016-08-22 | 2017-10-03 | 北京深鉴智能科技有限公司 | A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
CN108932548A (en) * | 2018-05-22 | 2018-12-04 | 中国科学技术大学苏州研究院 | A kind of degree of rarefication neural network acceleration system based on FPGA |
CN109472350A (en) * | 2018-10-30 | 2019-03-15 | 南京大学 | A kind of neural network acceleration system based on block circulation sparse matrix |
2019-04-02: CN CN201910259591.7A patent/CN109993297A/en active Pending
Cited By (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516801B (en) * | 2019-08-05 | 2022-04-22 | 西安交通大学 | High-throughput-rate dynamic reconfigurable convolutional neural network accelerator |
CN110516801A (en) * | 2019-08-05 | 2019-11-29 | 西安交通大学 | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput |
CN110543900A (en) * | 2019-08-21 | 2019-12-06 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN110738310A (en) * | 2019-10-08 | 2020-01-31 | 清华大学 | sparse neural network accelerators and implementation method thereof |
CN110738310B (en) * | 2019-10-08 | 2022-02-01 | 清华大学 | Sparse neural network accelerator and implementation method thereof |
CN112766453A (en) * | 2019-10-21 | 2021-05-07 | 华为技术有限公司 | Data processing device and data processing method |
CN110807513A (en) * | 2019-10-23 | 2020-02-18 | 中国人民解放军国防科技大学 | Convolutional neural network accelerator based on Winograd sparse algorithm |
CN110852422A (en) * | 2019-11-12 | 2020-02-28 | 吉林大学 | Convolutional neural network optimization method and device based on pulse array |
CN111047008A (en) * | 2019-11-12 | 2020-04-21 | 天津大学 | Convolutional neural network accelerator and acceleration method |
CN111047008B (en) * | 2019-11-12 | 2023-08-01 | 天津大学 | Convolutional neural network accelerator and acceleration method |
CN111079919A (en) * | 2019-11-21 | 2020-04-28 | 清华大学 | Memory computing architecture supporting weight sparsity and data output method thereof |
CN111079919B (en) * | 2019-11-21 | 2022-05-20 | 清华大学 | Memory computing architecture supporting weight sparseness and data output method thereof |
CN111047010A (en) * | 2019-11-25 | 2020-04-21 | 天津大学 | Method and device for reducing first-layer convolution calculation delay of CNN accelerator |
CN110991631A (en) * | 2019-11-28 | 2020-04-10 | 福州大学 | Neural network acceleration system based on FPGA |
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | Sparse neural network accelerator based on structured pruning and acceleration method thereof |
CN111178508A (en) * | 2019-12-27 | 2020-05-19 | 珠海亿智电子科技有限公司 | Operation device and method for executing full connection layer in convolutional neural network |
CN111178508B (en) * | 2019-12-27 | 2024-04-05 | 珠海亿智电子科技有限公司 | Computing device and method for executing full connection layer in convolutional neural network |
CN111240743A (en) * | 2020-01-03 | 2020-06-05 | 上海兆芯集成电路有限公司 | Artificial intelligence integrated circuit |
CN111240743B (en) * | 2020-01-03 | 2022-06-03 | 格兰菲智能科技有限公司 | Artificial intelligence integrated circuit |
CN111199277B (en) * | 2020-01-10 | 2023-05-23 | 中山大学 | Convolutional neural network accelerator |
CN111199277A (en) * | 2020-01-10 | 2020-05-26 | 中山大学 | Convolutional neural network accelerator |
CN111368988B (en) * | 2020-02-28 | 2022-12-20 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity |
CN111368988A (en) * | 2020-02-28 | 2020-07-03 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity |
CN111401554B (en) * | 2020-03-12 | 2023-03-24 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization |
CN111401554A (en) * | 2020-03-12 | 2020-07-10 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization |
CN111415004A (en) * | 2020-03-17 | 2020-07-14 | 北京百度网讯科技有限公司 | Method and apparatus for outputting information |
CN111415004B (en) * | 2020-03-17 | 2023-11-03 | 阿波罗智联(北京)科技有限公司 | Method and device for outputting information |
CN113496274A (en) * | 2020-03-20 | 2021-10-12 | 郑桂忠 | Quantification method and system based on operation circuit architecture in memory |
CN111445012A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | FPGA-based packet convolution hardware accelerator and method thereof |
CN111401532A (en) * | 2020-04-28 | 2020-07-10 | 南京宁麒智能计算芯片研究院有限公司 | Convolutional neural network reasoning accelerator and acceleration method |
CN111445013A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | Non-zero detector for convolutional neural network and method thereof |
CN111738433A (en) * | 2020-05-22 | 2020-10-02 | 华南理工大学 | Reconfigurable convolution hardware accelerator |
CN111738433B (en) * | 2020-05-22 | 2023-09-26 | 华南理工大学 | Reconfigurable convolution hardware accelerator |
CN111667051A (en) * | 2020-05-27 | 2020-09-15 | 上海赛昉科技有限公司 | Neural network accelerator suitable for edge equipment and neural network acceleration calculation method |
CN111667052B (en) * | 2020-05-27 | 2023-04-25 | 上海赛昉科技有限公司 | Standard and nonstandard convolution consistency transformation method of special neural network accelerator |
CN111667052A (en) * | 2020-05-27 | 2020-09-15 | 上海赛昉科技有限公司 | Standard and nonstandard volume consistency transformation method for special neural network accelerator |
CN111782356B (en) * | 2020-06-03 | 2022-04-08 | 上海交通大学 | Data flow method and system of weight sparse neural network chip |
CN111782356A (en) * | 2020-06-03 | 2020-10-16 | 上海交通大学 | Data flow method and system of weight sparse neural network chip |
CN111882028A (en) * | 2020-06-08 | 2020-11-03 | 北京大学深圳研究生院 | Convolution operation device for convolution neural network |
CN116261736A (en) * | 2020-06-12 | 2023-06-13 | 墨芯国际有限公司 | Method and system for double sparse convolution processing and parallelization |
CN111967587A (en) * | 2020-07-27 | 2020-11-20 | 复旦大学 | Arithmetic unit array structure for neural network processing |
CN111967587B (en) * | 2020-07-27 | 2024-03-29 | 复旦大学 | Method for constructing operation unit array structure facing neural network processing |
CN111914999B (en) * | 2020-07-30 | 2024-04-19 | 云知声智能科技股份有限公司 | Method and equipment for reducing calculation bandwidth of neural network accelerator |
CN111914999A (en) * | 2020-07-30 | 2020-11-10 | 云知声智能科技股份有限公司 | Method and equipment for reducing calculation bandwidth of neural network accelerator |
CN112052941B (en) * | 2020-09-10 | 2024-02-20 | 南京大学 | Efficient memory calculation system applied to CNN (computer numerical network) convolution layer and operation method thereof |
CN112052941A (en) * | 2020-09-10 | 2020-12-08 | 南京大学 | Efficient storage and calculation system applied to CNN network convolution layer and operation method thereof |
CN112418417B (en) * | 2020-09-24 | 2024-02-27 | 北京计算机技术及应用研究所 | Convolutional neural network acceleration device and method based on SIMD technology |
CN112418417A (en) * | 2020-09-24 | 2021-02-26 | 北京计算机技术及应用研究所 | Convolution neural network acceleration device and method based on SIMD technology |
CN112506436B (en) * | 2020-12-11 | 2023-01-31 | 西北工业大学 | High-efficiency data dynamic storage allocation method for convolutional neural network accelerator |
CN112506436A (en) * | 2020-12-11 | 2021-03-16 | 西北工业大学 | High-efficiency data dynamic storage allocation method for convolutional neural network accelerator |
CN113159302A (en) * | 2020-12-15 | 2021-07-23 | 浙江大学 | Routing structure for reconfigurable neural network processor |
WO2022134465A1 (en) * | 2020-12-24 | 2022-06-30 | 北京清微智能科技有限公司 | Sparse data processing method for accelerating operation of re-configurable processor, and device |
CN112836803A (en) * | 2021-02-04 | 2021-05-25 | 珠海亿智电子科技有限公司 | Data placement method for improving convolution operation efficiency |
CN115145839A (en) * | 2021-03-31 | 2022-10-04 | 广东高云半导体科技股份有限公司 | Deep convolution accelerator and method for accelerating deep convolution by using same |
CN115145839B (en) * | 2021-03-31 | 2024-05-14 | 广东高云半导体科技股份有限公司 | Depth convolution accelerator and method for accelerating depth convolution |
CN113077047B (en) * | 2021-04-08 | 2023-08-22 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity |
CN113077047A (en) * | 2021-04-08 | 2021-07-06 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity |
CN113128688A (en) * | 2021-04-14 | 2021-07-16 | 北京航空航天大学 | General AI parallel reasoning acceleration structure and reasoning equipment |
CN113128688B (en) * | 2021-04-14 | 2022-10-21 | 北京航空航天大学 | General AI parallel reasoning acceleration structure and reasoning equipment |
CN113191493B (en) * | 2021-04-27 | 2024-05-28 | 北京工业大学 | Convolutional neural network accelerator based on FPGA parallelism self-adaption |
CN113191493A (en) * | 2021-04-27 | 2021-07-30 | 北京工业大学 | Convolutional neural network accelerator based on FPGA parallelism self-adaptation |
CN113435570A (en) * | 2021-05-07 | 2021-09-24 | 西安电子科技大学 | Programmable convolutional neural network processor, method, device, medium, and terminal |
CN113435570B (en) * | 2021-05-07 | 2024-05-31 | 西安电子科技大学 | Programmable convolutional neural network processor, method, device, medium and terminal |
CN113313251B (en) * | 2021-05-13 | 2023-05-23 | 中国科学院计算技术研究所 | Depth separable convolution fusion method and system based on data flow architecture |
CN113313251A (en) * | 2021-05-13 | 2021-08-27 | 中国科学院计算技术研究所 | Deep separable convolution fusion method and system based on data stream architecture |
CN113486200A (en) * | 2021-07-12 | 2021-10-08 | 北京大学深圳研究生院 | Data processing method, processor and system of sparse neural network |
CN113591025A (en) * | 2021-08-03 | 2021-11-02 | 深圳思谋信息科技有限公司 | Feature map processing method and device, convolutional neural network accelerator and medium |
CN113705794B (en) * | 2021-09-08 | 2023-09-01 | 上海交通大学 | Neural network accelerator design method based on dynamic activation bit sparseness |
CN113705794A (en) * | 2021-09-08 | 2021-11-26 | 上海交通大学 | Neural network accelerator design method based on dynamic activation bit sparsity |
CN113946538A (en) * | 2021-09-23 | 2022-01-18 | 南京大学 | Convolutional layer fusion storage device and method based on line cache mechanism |
CN113946538B (en) * | 2021-09-23 | 2024-04-12 | 南京大学 | Convolutional layer fusion storage device and method based on line caching mechanism |
CN114065927B (en) * | 2021-11-22 | 2023-05-05 | 中国工程物理研究院电子工程研究所 | Excitation data block processing method of hardware accelerator and hardware accelerator |
CN114065927A (en) * | 2021-11-22 | 2022-02-18 | 中国工程物理研究院电子工程研究所 | Excitation data blocking processing method of hardware accelerator and hardware accelerator |
CN115529475A (en) * | 2021-12-29 | 2022-12-27 | 北京智美互联科技有限公司 | Method and system for detecting video flow content and controlling wind |
CN114780910A (en) * | 2022-06-16 | 2022-07-22 | 千芯半导体科技(北京)有限公司 | Hardware system and calculation method for sparse convolution calculation |
CN116029332B (en) * | 2023-02-22 | 2023-08-22 | 南京大学 | On-chip fine tuning method and device based on LSTM network |
CN116029332A (en) * | 2023-02-22 | 2023-04-28 | 南京大学 | On-chip fine tuning method and device based on LSTM network |
CN115879530A (en) * | 2023-03-02 | 2023-03-31 | 湖北大学 | Method for optimizing array structure of RRAM (resistive random access memory) memory computing system |
CN117290279B (en) * | 2023-11-24 | 2024-01-26 | 深存科技(无锡)有限公司 | Shared tight coupling based general computing accelerator |
CN117290279A (en) * | 2023-11-24 | 2023-12-26 | 深存科技(无锡)有限公司 | Shared tight coupling based general computing accelerator |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993297A (en) | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing | |
CN207895435U (en) | Neural computing module | |
CN207458128U (en) | A kind of convolutional neural networks accelerator based on FPGA in vision application | |
CN110516801A (en) | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput | |
Geng et al. | A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing | |
EP2122542B1 (en) | Architecture, system and method for artificial neural network implementation | |
CN107301456B (en) | Deep neural network multi-core acceleration implementation method based on vector processor | |
CN105930902B (en) | A kind of processing method of neural network, system | |
JP6890615B2 (en) | Accelerator for deep neural networks | |
Klöckner et al. | Nodal discontinuous Galerkin methods on graphics processors | |
CN107239823A (en) | A kind of apparatus and method for realizing sparse neural network | |
CN107609641A (en) | Sparse neural network framework and its implementation | |
CN108932548A (en) | A kind of degree of rarefication neural network acceleration system based on FPGA | |
CN106228238A (en) | The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform | |
CN110390384A (en) | A kind of configurable general convolutional neural networks accelerator | |
CN106875013A (en) | The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear | |
JP2019522850A (en) | Accelerator for deep neural networks | |
CN109978161A (en) | A kind of general convolution-pond synchronization process convolution kernel system | |
JP2021521516A (en) | Accelerators and systems for accelerating operations | |
CN107886167A (en) | Neural network computing device and method | |
CN108229645A (en) | Convolution accelerates and computation processing method, device, electronic equipment and storage medium | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN104572011A (en) | FPGA (Field Programmable Gate Array)-based general matrix fixed-point multiplier and calculation method thereof | |
CN109447241A (en) | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field | |
KR20130090147A (en) | Neural network computing apparatus and system, and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190709 |