CN109993297A - Load-balanced sparse convolutional neural network accelerator and acceleration method - Google Patents

Load-balanced sparse convolutional neural network accelerator and acceleration method

Info

Publication number
CN109993297A
Authority
CN
China
Prior art keywords
convolution
data
load balancing
array
computing
Prior art date
2019-04-02
Legal status
Pending
Application number
CN201910259591.7A
Other languages
Chinese (zh)
Inventor
王瑶
朱志炜
秦子迪
苏岩
王宇宣
Current Assignee
Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Original Assignee
Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Priority date: 2019-04-02
Filing date: 2019-04-02
Publication date: 2019-07-09
Application filed by Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Priority to CN201910259591.7A
Publication of CN109993297A


Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks:
    • G06N3/045 Combinations of networks (under G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means (under G06N3/06)
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections (under G06N3/08 Learning methods)


Abstract

The invention discloses a load-balanced sparse convolutional neural network accelerator and an acceleration method using it. The accelerator comprises a master controller, a data distribution module, a computing array for convolution, an output result cache module, a linear activation function unit, a pooling unit, an online coding unit, and an off-chip dynamic memory. The scheme of the invention achieves high-efficiency operation of the convolution computing array with very little storage resource, and guarantees high reuse of input activations and weight data together with load balancing and high utilization of the computing array. Through static configuration, the computing array simultaneously supports parallel scheduling at two levels: between rows and columns of convolutions of different sizes and scales, and between different feature maps. It therefore has good applicability and scalability.

Description

Load-balanced sparse convolutional neural network accelerator and acceleration method
Technical field
The present invention relates to a load-balanced sparse convolutional neural network accelerator and its acceleration method, belonging to the technical field of deep learning algorithms.
Background technique
In recent years, deep learning algorithms have been widely applied with excellent results in computer vision, natural language processing, and other fields, and the convolutional neural network (CNN) is one of the most important such algorithms. Higher accuracy of a CNN model usually means a deeper network, more parameters, and a larger amount of computation, and about 90% of that computation is concentrated in the convolutional layers. To run convolutional neural networks efficiently in embedded systems, optimizing the energy-efficiency ratio of the convolution operation is therefore imperative.
Convolutional-layer computation in CNNs has two main characteristics. First, the data volume is large: the feature maps and weight data required by the convolution are of large scale, so sparsifying them and storing them in compressed form saves data storage and makes maximal use of the data transfer bandwidth. Second, the data flow and control flow of the computation are complex: the convolution must process multiple channels of multiple kernels simultaneously according to the convolution dimension information, while keeping the computation pipelined.
Because the nonzero elements of a sparsified convolutional neural network are irregularly distributed, invalid computations arise during processing, leaving a high proportion of the computing resources idle.
Summary of the invention
In view of the above problems of the prior art, the present invention aims to provide a highly efficient, load-balanced sparse convolutional neural network accelerator that achieves high reuse of weight and activation data, a small data transfer volume, scalable and high parallelism, and low demand for hardware storage and DSP resources. A further object of the present invention is to provide an acceleration method using the accelerator.
The technical solution adopted by the accelerator of the present invention is as follows:
A load-balanced sparse convolutional neural network accelerator, comprising: a master controller, which controls the control-signal flow and data flow of the convolution computation and processes and saves data; a data distribution module, which distributes weight data to the computing array according to the block partition scheme of the convolution computation; a computing array for convolution, which performs the multiply-accumulate operations of the sparse convolution and outputs partial-sum results; an output result cache module, which accumulates and caches the partial-sum results of the computing array, organizes them into a unified format, and outputs the feature map results awaiting activation and pooling; a linear activation function unit, which applies the bias and the activation function to the accumulated partial sums; a pooling unit, which pools the results processed by the activation function; an online coding unit, which encodes online the activation values still needed by subsequent convolutional layers; and an off-chip dynamic memory, which stores the raw image data, the intermediate results of the computing array, and the final output feature maps.
The acceleration method of the load-balanced sparse convolutional neural network accelerator of the present invention comprises the following steps:
1) Prune the weight data of the convolutional neural network model: group the data according to the scale parameters of the weight data, then apply an identical pruning pattern to each group of weight data to sparsify it while preserving the overall model accuracy;
2) Formulate a load-balanced sparse convolution mapping scheme, and map the sparsified convolutional neural network onto the accelerator's convolution computing array;
3) Reconfigure the computing array and storage array according to the configuration information of the mapping scheme, so that the convolution computation proceeds in a pipelined fashion;
4) The master controller directs the data distribution module to distribute the weight data and activation data; the computing array performs the computation and outputs convolution partial-sum results;
5) Accumulate the convolution partial sums and apply the linear correction, i.e., the bias and the activation function;
6) Perform pooling with the kernel size and stride required by the current convolutional layer;
7) Determine whether the current convolutional layer is the last layer; if not, perform online encoding and send the encoded activation results to the next convolutional layer; if so, write the results to the off-chip dynamic memory, completing the acceleration of the convolutional neural network. (A minimal sketch of this flow follows.)
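To make the flow concrete, the following is a minimal Python sketch of steps 4) through 7) under simplifying assumptions: dense numpy tensors stand in for the hardware datapath, pooling is max pooling with kernel size equal to stride, and the pruning, mapping, and online CSR encoding of steps 1) through 3) are elided. All function names here (`conv_layer`, `bias_relu`, `max_pool`, `accelerate`) are ours, not the patent's.

```python
import numpy as np

def conv_layer(x, w):
    # Step 4: valid convolution; x is (C, H, W), w is (N, C, R, R)
    n, c, r, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((n, h - r + 1, wd - r + 1))
    for i in range(h - r + 1):
        for j in range(wd - r + 1):
            out[:, i, j] = np.tensordot(w, x[:, i:i + r, j:j + r], axes=3)
    return out

def bias_relu(x, bias):
    # Step 5: accumulate the bias, then apply the activation function
    return np.maximum(x + bias, 0.0)

def max_pool(x, s=2):
    # Step 6: max pooling, kernel size assumed equal to stride s
    c, h, w = x.shape
    x = x[:, :h - h % s, :w - w % s]
    return x.reshape(c, h // s, s, w // s, s).max(axis=(2, 4))

def accelerate(feature_map, layers):
    # Steps 4-7, layer by layer
    for i, (weights, bias) in enumerate(layers):
        partial = conv_layer(feature_map, weights)    # step 4
        pooled = max_pool(bias_relu(partial, bias))   # steps 5 and 6
        if i < len(layers) - 1:
            feature_map = pooled   # step 7: would be CSR-encoded online
        else:
            return pooled          # step 7: written to off-chip memory

rng = np.random.default_rng(0)
net = [(rng.standard_normal((4, 3, 3, 3)), rng.standard_normal((4, 1, 1))),
       (rng.standard_normal((8, 4, 3, 3)), rng.standard_normal((8, 1, 1)))]
print(accelerate(rng.standard_normal((3, 16, 16)), net).shape)   # (8, 2, 2)
```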
Compared with the prior art, the present invention has the following advantages:
The load-balanced sparse convolutional neural network accelerator and acceleration method provided by the invention make maximal use of the sparsity of the convolution data. They achieve high-efficiency operation of the convolution computing array with very little storage resource, guarantee high reuse of input activations and weight data, and ensure load balancing and high utilization of the computing array. Through static configuration, the computing array simultaneously supports parallel scheduling at two levels: between rows and columns of convolutions of different sizes and scales, and between different feature maps. It therefore has good applicability and scalability, and meets the current demand for running convolutional neural networks in embedded systems at low power with a high energy-efficiency ratio.
Detailed description of the invention
Fig. 1 is a schematic diagram of the load-balanced sparse convolution network acceleration method.
Fig. 2 is a schematic diagram of the weight pruning pattern.
Fig. 3 is a schematic diagram of the overall structure of the hardware accelerator.
Fig. 4 is a schematic diagram of the convolution mapping scheme.
Fig. 5 is a schematic diagram of the convolution computation within a PE group.
Fig. 6 is a schematic diagram of the load balancing and shared storage realized by the PE array.
Specific embodiments
The present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 shows the flow of the load-balanced sparse convolution computation method. First, the weight data of the convolutional neural network model are pruned: the data are grouped according to the scale parameters of the weight data, and an identical pruning pattern is then applied to each group to sparsify it while preserving the overall model accuracy. Next, a load-balanced sparse convolution mapping scheme is formulated from the dimensions of the input feature maps and convolution kernels, and the sparsified network is mapped onto the PE (processing element) array of the hardware accelerator. The accelerator then reconfigures the PE array and the storage array according to the configuration information of the mapping scheme, so that the convolution proceeds in a pipelined fashion. The master controller of the accelerator directs the distribution of weight data and activation data; the PE array performs the computation and outputs convolution partial sums. The linear correction unit accumulates the partial sums and applies the linear correction, i.e., the bias and the activation function. The pooling unit performs pooling with the kernel size and stride required by the current convolutional layer, selecting either max pooling or average pooling. Finally, the accelerator determines whether the current convolutional layer is the last layer; if not, it performs online encoding and sends the encoded activation results to the next convolutional layer; if so, it writes the results to the off-chip storage, completing the acceleration of the whole convolution.
The load-balanced sparse convolution mapping scheme comprises the convolution mapping mode, the PE array grouping scheme, the distribution and reuse scheme for input feature maps and weight data, and the PE array parallel scheduling mechanism.
Convolution mapping mode: the input feature map is transformed into a matrix along the row (or column) dimension, and the weight data are unrolled into a vector along the output-channel dimension, so that the convolution is converted into a matrix-vector multiplication. A sparse matrix-vector multiplication unit designed this way can skip the zeros in both the input feature map and the weight data, guaranteeing high efficiency of the overall computation. (A sketch of this mapping appears below.)
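The dense version of this mapping can be sketched in a few lines of Python; this illustrates only the transformation (the names `im2col_rows` and `conv_as_matvec` are ours), with the zero-skipping shown separately further below.

```python
import numpy as np

def im2col_rows(x, r):
    # Unroll feature map x (C, H, W) into a matrix with one flattened
    # r*r*C patch per output position: the shared activation matrix.
    c, h, w = x.shape
    return np.stack([x[:, i:i + r, j:j + r].ravel()
                     for i in range(h - r + 1)
                     for j in range(w - r + 1)])

def conv_as_matvec(x, kernels):
    # kernels (N, C, R, R): each kernel unrolls into one weight vector,
    # turning the convolution into N matrix-vector products.
    n, _, r, _ = kernels.shape
    a = im2col_rows(x, r)
    h_out, w_out = x.shape[1] - r + 1, x.shape[2] - r + 1
    return np.stack([(a @ kernels[k].ravel()).reshape(h_out, w_out)
                     for k in range(n)])
```

Each PE is handed one such weight vector while the activation matrix is shared, which is what makes the input feature map reusable across all PEs of a group.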
PE array grouping scheme: grouping is configured statically by the master controller according to the dimension parameters of each convolutional layer. When the number of PEs exceeds the total number of three-dimensional convolution kernels, one group can compute all output feature map channels; on this basis, the remaining PEs are grouped into equal-sized groups responsible for computing different rows of the output feature map. When the number of PEs is smaller than the total number of three-dimensional kernels, one group computes the largest divisor of the output feature map channel count. Grouping this way keeps the computation speeds of the PEs matched and the idle rate of the PE array low.
Distribution and reuse of input feature maps and weight data: the entire PE array is fed from one shared on-chip memory that simultaneously distributes identical activation data as the matrix required by the computation, while the data distribution module distributes the weight data each PE needs as the vector required by the computation, according to the control information of the block-partitioned computation. Reuse of the input feature map lies mainly in its simultaneous use by different PEs; reuse of the weights lies mainly in sharing weight data across different groups and in each PE reusing its weight data across successive matrices without redistribution.
PE array parallel scheduling mechanism: during computation, the PE array determines from the size information of the convolutional layer's output feature maps whether the different groups complete different rows (or columns) of the same output feature map or complete different output feature maps. This guarantees that the PE array can schedule work in parallel at two levels: intra-layer parallelism within a single feature map, and simultaneous parallelism across different feature maps.
The load-balanced sparse convolutional neural network acceleration scheme of this embodiment comprises a software part and a hardware part. Fig. 2 illustrates the pruning strategy of the software part. The pruning strategy is as follows: the connections of the initially dense neural network are grouped according to the connection count and neuron count of the network, and every group is pruned with an identical pattern at identical positions; that is, the neurons of each convolution kernel group share the same connection pattern and differ only in the weight values of the connections. Take an input feature map of size W*W*C (W is the width and height of the feature map, C the number of input channels) and kernels of size R*R*C*N (R is the width and height of a kernel, C the number of kernel channels, N the number of kernels, i.e., the number of output channels). During pruning, the R*R*C kernels are first treated as one kernel group, N in total, and the positions of the zero elements are identical across the kernels in the group. If the post-pruning accuracy does not meet the model's requirement, the kernel group size can be adjusted, pruning with groups of R*R*C*N1 (N1 a divisor of N). (A sketch of this grouped pruning appears below.)
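A minimal magnitude-based sketch of such grouped pruning follows. The patent fixes only the constraint (identical zero positions within a kernel group, group size adjustable to a divisor N1 of N), not the scoring rule, so the summed-magnitude criterion and the name `prune_grouped` are our assumptions.

```python
import numpy as np

def prune_grouped(kernels, sparsity=0.5, group_size=None):
    # kernels: (N, C, R, R). Every kernel inside a group receives the SAME
    # zero mask, derived here from the group's summed weight magnitudes.
    n = kernels.shape[0]
    g = group_size or n                    # default: one group of all N kernels
    assert n % g == 0, "group_size must be a divisor N1 of N"
    pruned = kernels.copy()
    for s in range(0, n, g):
        group = pruned[s:s + g]
        score = np.abs(group).sum(axis=0)  # shared importance map (C, R, R)
        k = int(score.size * sparsity)
        mask = score >= np.partition(score.ravel(), k)[k]
        group *= mask                      # identical zero positions group-wide
    return pruned
```

If the pruned model misses the accuracy target, the call is repeated with a smaller `group_size`, trading storage regularity for pattern flexibility.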
Fig. 3 shows the structure of the hardware sparse convolutional neural network accelerator. The overall structure mainly comprises: a master controller, which receives instructions from the host CPU and generates the control-signal flow and data flow that govern the convolution computation; a data distribution module, which distributes weight data to the PEs according to the block partition scheme of the convolution; a PE (processing element) array for convolution, which is grouped according to the master controller's configuration information, performs the multiply-accumulate operations of the sparse convolution, and outputs convolution results or partial sums; an output result cache module, which accumulates and caches the PEs' partial sums and organizes them into a unified format before sending them to the subsequent units; a linear activation function unit, which applies the bias and the activation function to the convolution results; a pooling unit, which performs the max pooling of the results; an online coding unit, which applies online CSR (compressed sparse row) encoding to the intermediate results so that the output meets the data format required by subsequent convolutional layers; and an off-chip dynamic memory (DDR4), which stores the raw image data, inter-layer intermediate results, and the final output of the convolutional layers. (A sketch of the CSR encoding appears below.)
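For reference, online CSR encoding amounts to emitting nonzero values, their column indices, and row pointers as the activations stream out. A compact Python sketch (our naming) is:

```python
import numpy as np

def csr_encode(mat):
    # Compressed sparse row: nonzero values, their column indices,
    # and per-row pointers into the value array.
    values, cols, row_ptr = [], [], [0]
    for row in mat:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        cols.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(cols), np.array(row_ptr)

def csr_decode_row(values, cols, row_ptr, i, width):
    # What the activation decompression module does before feeding a PE.
    row = np.zeros(width)
    lo, hi = row_ptr[i], row_ptr[i + 1]
    row[cols[lo:hi]] = values[lo:hi]
    return row

a = np.array([[0., 3., 0., 1.], [2., 0., 0., 0.]])
v, c, p = csr_encode(a)
assert np.allclose(csr_decode_row(v, c, p, 0, 4), a[0])
```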
The data distribution module comprises a fetch-address calculation unit, a configurable on-chip memory bank, and a FIFO group for caching and converting the data format. According to the configuration information sent by the master controller, the fetch-address calculation unit determines the access pattern to the off-chip dynamic memory (DDR4); the fetched data are cached via the AXI4 interface into the on-chip weight memory bank, format-converted, and distributed into the corresponding FIFOs, ready to be sent for computation.
The PE array for convolution comprises multiple matrix-vector multiplication units. Following the static configuration information, it performs intra-layer or inter-layer parallel convolution of the feature maps and outputs the partial-sum results of the convolution. The PEs share a common on-chip memory; thanks to the co-design of the pruning strategy and the hardware architecture, the PEs achieve zero-skipping acceleration during sparse convolution and matched computation speed across PEs while using very little storage resource.
Each matrix-vector multiplication unit comprises a pipeline controller module, a weight non-zero detection module, a pointer control module, an activation decompression module, an MLA (multiply-accumulate) unit module, and the shared on-chip memory. The weight non-zero detection module performs non-zero detection on the weight data sent by the data distribution module and transmits only the nonzero values and their position information to the PE unit. The pointer control module and the activation decompression module fetch, from the shared on-chip memory, the activation values that each nonzero weight value needs for its computation, and send them simultaneously to each PE unit. The MLA unit is responsible for the multiplications and additions of the matrix-vector product. (A sketch of the zero-skipping product appears below.)
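Functionally, one PE's zero-skipping product can be modeled as below (a behavioral sketch, not the hardware): non-zero detection keeps only the nonzero weights and their positions, pointer control gathers the matching activation columns, and the MLA unit accumulates over nonzeros only.

```python
import numpy as np

def pe_sparse_matvec(act_matrix, weight_vec):
    nz = np.nonzero(weight_vec)[0]      # weight non-zero detection
    gathered = act_matrix[:, nz]        # pointer-controlled activation fetch
    return gathered @ weight_vec[nz]    # multiply-accumulate over nonzeros only

rng = np.random.default_rng(1)
a = rng.standard_normal((2, 12))
w = rng.standard_normal(12)
w[[1, 4, 7, 9]] = 0.0                   # zero pattern shared by the whole PE group
assert np.allclose(pe_sparse_matvec(a, w), a @ w)
```

Because pruning gives every weight vector in a group the same `nz` positions, the same gathered activations serve every PE and all PEs take the same number of multiply-accumulate cycles, which is exactly the load-balancing property that Fig. 6 relies on.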
Fig. 4 illustrates the convolution mapping scheme, again for an input feature map of W*W*C (W the feature map width and height, C the number of input channels) and kernels of R*R*C*N (R the kernel width and height, C the number of kernel channels, N the number of kernels, i.e., output channels), with F the output feature map size. The number of PE units per PE group, Num_PE, is first determined from N: if the total number of PEs exceeds N, Num_PE can be set equal to N, so that one batch of computation in each group directly yields the results for all channels of the output feature map; otherwise Num_PE is set to a divisor M of N, and an integer number of batches yields part of the output channels, guaranteeing that no PE sits idle. The number of PE groups, Group_PE, is determined by the total number of PEs and Num_PE. If one group can complete all output channels, the different groups are responsible for different rows of the output feature map, i.e., the PE-group division of labor labeled 2 in the figure. (A sketch of this grouping calculation appears below.)
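The grouping arithmetic reduces to a few lines; this sketch (the name `plan_pe_groups` is ours) follows the rule just described:

```python
def plan_pe_groups(total_pe, n):
    # Num_PE: all N output channels if enough PEs exist, otherwise the
    # largest divisor M of N that fits, so no PE in a group sits idle.
    if total_pe >= n:
        num_pe = n
    else:
        num_pe = max(d for d in range(1, total_pe + 1) if n % d == 0)
    group_pe = total_pe // num_pe   # groups then split rows or whole feature maps
    return num_pe, group_pe

print(plan_pe_groups(16, 4))   # (4, 4): 4 groups, each computing different rows
print(plan_pe_groups(6, 8))    # (4, 1): one group covers 4 of the 8 output channels
```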
For one complete convolutional layer, a PE group consists of Num_PE PE units (i.e., matrix-vector multiplication units), each responsible for outputting several rows of one channel of the output feature map; the first pass of computation outputs several rows starting from the first row, the exact row count being determined by the matrix size of the matrix-vector product. The matrix in the product corresponds to the shared activation data stored in the local on-chip memory, and the vector corresponds to the weight data sent by the data distribution module. The other PE groups may compute subsequent rows of the same output feature map (as indicated in the figure), or perform the convolution of other input feature maps; the two different parallel modes of operation, intra-layer row-column parallelism and different-feature-map parallelism, are thus both supported.
Fig. 5 illustrates the convolution computation within a PE group; different numerical values denote the values at different positions of the input feature map and of the different convolution kernels. The example uses a matrix-vector product of a 2*12 matrix with a 12*1 vector, so each PE computation outputs a 2*1 vector. In the first pass, PE1's vector corresponds to the three channels (12*1) of kernel 1, and its matrix corresponds to the three channels of positions (1,2,4,5) and (2,3,5,6) in the activation image; after the multiply-accumulate, the output is the first two rows of the first column of the first channel of the output feature map. The matrix is then updated first, taking the activation values at positions (4,5,7,8) and (5,6,8,9), and the output becomes the first two rows of the second column of the first channel. After all columns of the corresponding rows have been output, the weight data corresponding to the vector are updated, and the output of the third channel follows. PE2 correspondingly computes the second channel of the output feature map and, after its weight update, the fourth output channel. (A small numeric sketch of this tiling appears below.)
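The walkthrough can be reproduced numerically. The sketch below assumes that Fig. 5 numbers the image positions down the columns (so position 2 sits directly under position 1) and uses an all-ones kernel purely for illustration; both are our assumptions.

```python
import numpy as np

# 3x3 image positions 1..9 numbered down the columns, copied to 3 channels.
pos = np.arange(1, 10, dtype=float).reshape(3, 3, order="F")
act = np.stack([pos, pos, pos])          # (C=3, 3, 3)
vec = np.ones(12)                        # one 2x2x3 kernel as a 12x1 vector

def patch(i, j):                         # flattened 2x2x3 window
    return act[:, i:i + 2, j:j + 2].ravel()

m = np.stack([patch(0, 0), patch(1, 0)])  # rows use positions {1,2,4,5}, {2,3,5,6}
print(m @ vec)   # first two rows of output column 0: [36. 48.]

m = np.stack([patch(0, 1), patch(1, 1)])  # updated matrix: {4,5,7,8}, {5,6,8,9}
print(m @ vec)   # first two rows of output column 1: [72. 84.]
```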
Fig. 6 illustrates how load balancing and shared storage are realized in the PE array. The shared on-chip memory of the PE array stores the nonzero values of the input activations in CSR (compressed sparse row) format together with their index pointers; the activations corresponding to the positions of the nonzero weight values sent by the data distribution module are fetched for the multiply-accumulate. Because the software pruning strategy makes the nonzero-element positions of all weight vectors within a PE group identical, the activation values required by each PE are also identical: a very small memory need hold only one copy of the activations, which is decoded and broadcast to the PEs simultaneously to satisfy the matrix requirements of the whole PE array. Moreover, for all PEs the nonzero positions of both the matrix and the vector in the matrix-vector product are identical, so the computation speeds of the PEs match, achieving the design goal of a low-storage, load-balanced computing array. At the same time, different PE groups can share the distributed weight data, realizing high reuse of both activations and weights.
In summary, the acceleration method for sparse convolutional neural networks proposed in this embodiment of the invention effectively saves storage hardware resources, improves the reuse of input feature maps and weights, and achieves load balancing of the PE array. Static configuration of the PE array satisfies different parallel computation requirements and guarantees high utilization of the array, improving the data throughput of the whole system and reaching a very high energy-efficiency ratio, making it suitable for low-power embedded systems.

Claims (8)

1. A load-balanced sparse convolutional neural network accelerator, characterized by comprising:
a master controller, for controlling the control-signal flow and data flow of the convolution computation, and for processing and saving data;
a data distribution module, for distributing weight data to the computing array according to the block partition scheme of the convolution computation;
a computing array for convolution, for performing the multiply-accumulate operations of the sparse convolution and outputting partial-sum results;
an output result cache module, for accumulating and caching the partial-sum results of the computing array, organizing them into a unified format, and outputting the feature map results awaiting activation and pooling;
a linear activation function unit, for applying the bias and the activation function to the accumulated partial sums;
a pooling unit, for pooling the results processed by the activation function;
an online coding unit, for online encoding of the activation values still needed by subsequent convolutional layer computation;
an off-chip dynamic memory, for storing the raw image data, the intermediate results of the computing array, and the final output feature maps.
2. The load-balanced sparse convolutional neural network accelerator according to claim 1, characterized in that the computing array for convolution comprises matrix-vector multiplication units, each comprising a pipeline controller module, a weight non-zero detection module, a pointer control module, an activation decompression module, an MLA (multiply-accumulate) unit module, and a shared on-chip memory; the weight non-zero detection module performs non-zero detection on the weight data sent by the data distribution module and transmits only the nonzero values and their position information to the computing units; the pointer control module and the activation decompression module fetch, from the shared on-chip memory, the activation values needed by each nonzero weight value for its computation and send them simultaneously to each computing unit; and the MLA unit performs the multiplications and additions of the matrix-vector product.
3. An acceleration method of a load-balanced sparse convolutional neural network accelerator, characterized by comprising the following steps:
1) pruning the weight data of the convolutional neural network model: grouping the data according to the scale parameters of the weight data, then applying an identical pruning pattern to each group of weight data to sparsify it while preserving the overall model accuracy;
2) formulating a load-balanced sparse convolution mapping scheme, and mapping the sparsified convolutional neural network onto the convolution computing array of the accelerator;
3) reconfiguring, by the accelerator, the computing array and the storage array according to the configuration information of the mapping scheme, so that the convolution computation proceeds in a pipelined fashion;
4) controlling, by the master controller, the data distribution module to complete the distribution of the weight data and activation data; the computing array performs the computation and outputs convolution partial-sum results;
5) accumulating the convolution partial-sum results and applying the linear correction, i.e., the bias and the activation function;
6) performing pooling with the kernel size and stride required by the current convolutional layer;
7) determining whether the current convolutional layer is the last layer; if not, performing online encoding and sending the encoded activation results to the next convolutional layer; if so, outputting to the off-chip dynamic memory, completing the acceleration of the convolutional neural network.
4. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 3, characterized in that in step 2), the sparse convolution mapping scheme comprises the convolution mapping mode, the computing array grouping scheme, the distribution and reuse scheme for input feature maps and weight data, and the computing array parallel scheduling mechanism.
5. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, characterized in that the convolution mapping mode is specifically: the input feature map is transformed into a matrix along the row or column dimension, and the weight data are unrolled into a vector along the output-channel dimension, so that the convolution is converted into a matrix-vector multiplication.
6. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, characterized in that the computing array grouping scheme is specifically: grouping is configured statically by the master controller according to the dimension parameters of each convolutional layer; when the number of computing units exceeds the total number of three-dimensional convolution kernels, one group of the array computes all output feature map channels, and on this basis the remaining computing units are grouped into equal-sized groups responsible for computing different rows of the output feature map; when the number of computing units is smaller than the total number of three-dimensional kernels, one group of the array computes the largest divisor of the output feature map channel count.
7. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, characterized in that the distribution and reuse scheme for input feature maps and weight data is specifically: the entire computing array is fed from one shared on-chip memory that simultaneously distributes identical activation data as the matrix required by the computation, and the data distribution module distributes the weight data required by each computing unit as the vector required by the computation, according to the control information of the block-partitioned computation.
8. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, characterized in that the computing array parallel scheduling mechanism is specifically: during computation, the computing array determines from the dimension information of the convolutional layer's output feature maps whether the different groups complete different rows or columns of the same output feature map or complete different output feature maps.
CN201910259591.7A 2019-04-02 2019-04-02 Load-balanced sparse convolutional neural network accelerator and acceleration method Pending CN109993297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910259591.7A (CN109993297A) 2019-04-02 2019-04-02 Load-balanced sparse convolutional neural network accelerator and acceleration method


Publications (1)

Publication Number Publication Date
CN109993297A 2019-07-09

Family

ID=67132262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910259591.7A Pending CN109993297A (en) Load-balanced sparse convolutional neural network accelerator and acceleration method

Country Status (1)

Country Link
CN (1) CN109993297A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709