CN108932548A - FPGA-based sparse neural network acceleration system - Google Patents

FPGA-based sparse neural network acceleration system

Info

Publication number
CN108932548A
CN108932548A (application CN201810494819.6A)
Authority
CN
China
Prior art keywords
neural network
data
pruning
fpga
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810494819.6A
Other languages
Chinese (zh)
Inventor
Li Xi (李曦)
Zhou Xuehai (周学海)
Wang Chao (王超)
Lu Yuntao (鲁云涛)
Gong Lei (宫磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute for Advanced Study USTC
Original Assignee
Suzhou Institute for Advanced Study USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute for Advanced Study USTC
Priority to CN201810494819.6A
Publication of CN108932548A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention discloses an FPGA-based sparse neural network acceleration system comprising a hardware accelerator and software programs. The hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit. The software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form. The hardware accelerator, built on an FPGA hardware development platform, accelerates the computation that follows parameter compression. The invention optimizes neural network models generated by a novel pruning technique: the weights of each layer are processed according to the pruning method used for that layer type, and the network parameters are stored in compressed form, reducing the storage space occupied by the network model.

Description

FPGA-based sparse neural network acceleration system
Technical field
The present invention relates to the field of computer hardware acceleration, and in particular to an FPGA-based sparse neural network acceleration system and its design method.
Background art
Deep neural networks are an effective method that has been widely adopted in many fields. Convolutional neural networks, a variant of deep neural networks, are commonly used in image classification applications. As convolutional neural network models have developed, their size has kept growing, making them compute-intensive and storage-heavy and preventing their deployment on many resource-constrained devices. Traditional neural network pruning techniques apply the same pruning strategy to convolutional layers and fully connected layers despite their different computational characteristics; they delete redundant neurons and synaptic connections from the model to reduce the amount of computation and the storage space, ultimately producing network models of different sparsity levels that can be adapted to different hardware resources. Neural network parameters are stored as matrices: the parameter matrices of the original model are dense, while the parameters of the pruned model form sparse matrices containing a large number of zero elements. These zero elements can be omitted from both computation and storage without affecting the final prediction accuracy.
Meanwhile most of neural network accelerators only consider to handle the neural network model of dense parameter, it is sparse handling When the neural network model of parameter, calculating is stored in still according to traditional mode for the zero valued elements in parameter matrix, and Corresponding performance boost is not obtained from rarefaction.And the sparse neural network accelerator proposed in the recent period handles such sparse net Network model carries out compression storage to sparse parameter matrix, then calculates the parameter of compression, the sparse network accelerator Performance promotion it is also disproportionate with the degree of network parameter reduction.
For example, consider a convolutional neural network model used in a handwritten digit recognition application. In the original model, the first fully connected layer has 800 input neurons and 500 output neurons, giving 100,000 synaptic connections in this layer, and a 500 × 800 matrix is used to store the weight of each connection. The model is pruned by deleting the synaptic connections whose weights are below a certain threshold. In the fully connected layer of the resulting sparse model, the number of synaptic connections drops to 8,000; the weights of the deleted connections are set to zero, i.e. the corresponding elements of the original 500 × 800 weight matrix become zero. Because the layer's computation consists of multiplications followed by the non-linear rectification function ReLU, a product with a zero multiplier is zero and is filtered out again by the subsequent ReLU; the zero elements of the weight matrix therefore have no effect on the later results and need be neither stored nor used in subsequent computation. In theory, the storage space of the pruned sparse parameters of this fully connected layer should be one tenth of that of the original dense parameters, and the computational performance should be ten times that of the dense case. However, most accelerators designed for dense neural networks still store the zero elements of the sparse weight matrix and perform the corresponding multiplications, so their performance on the sparse network is close to that on the dense network, and the parameter storage space does not change either. Dedicated accelerators for the sparse networks produced by traditional pruning techniques, in turn, consume extra space to store parameter indices and extra time to process the compressed parameters, and cannot reach the ideal acceleration effect. In summary, neither dense neural network accelerators nor sparse neural network accelerators achieve the ideal performance gain when processing the sparse neural networks generated by traditional pruning techniques.
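A minimal C sketch of the point made above, with illustrative names and the ReLU written out explicitly (the patent itself prescribes no such code): only the retained non-zero weights of a pruned fully connected layer need to be stored and multiplied.

#include <stddef.h>

/* Dense case: every weight, including the zeros left by pruning, costs one multiply. */
float fc_output_dense(const float *w, const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];
    return acc > 0.0f ? acc : 0.0f;               /* ReLU */
}

/* Sparse case: only the nz retained weights are stored (with their column
 * indices) and multiplied; the zero weights never appear. */
float fc_output_sparse(const float *w_nz, const int *col, size_t nz, const float *x) {
    float acc = 0.0f;
    for (size_t i = 0; i < nz; i++)
        acc += w_nz[i] * x[col[i]];
    return acc > 0.0f ? acc : 0.0f;               /* ReLU */
}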
Summary of the invention
The object of the present invention is to remedy the shortcomings and deficiencies of the prior art by proposing a neural network acceleration system based on a novel pruning technique (hereinafter referred to as the acceleration system). The system targets the sparse neural networks generated by the novel pruning technique and accelerates them using a hardware/software co-design approach.
The technical solution of the present invention is as follows:
An FPGA-based sparse neural network acceleration system comprises a hardware accelerator and software programs. The hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit. The software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form. The hardware accelerator is built on an FPGA hardware development platform and accelerates the computation that follows parameter compression.
Preferably, the pruning program of the software uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense network model, deleting the neurons or synaptic connections whose values are below a given threshold; the weight processing program of the software compresses and stores the pruned sparse parameter matrices.
Preferably, the system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation; the data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures; the data transmission unit transfers the compressed parameters to the hardware buffers by direct memory access in preparation for the subsequent computation; and the computation processing unit performs the multiplications, accumulations and activation-function operations of the neural network application.
Preferably, the convolutional-layer pruning subroutine of the pruning program prunes the convolutional layers of the neural network at the granularity of neurons, which in a convolutional layer means one feature map at a time.
Preferably, the fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time: the weights of a fully connected layer are grouped according to the computing capability of the computation processing unit, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
Preferably, in the neural network model processed by the software pruning program, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, while the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices; the weight processing program compresses and stores the sparse parameter matrices of the fully connected layers using the CSR sparse-matrix storage format.
Preferably, the computation function structure inside the computation processing unit of the hardware accelerator comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter. The fixed-point multipliers multiply the input vector of a fully connected layer with the non-zero values of the weight arrays (a matrix multiplication); the accumulation tree sums the outputs of the fixed-point multipliers in parallel; the activation function is applied to the sum produced by the accumulation tree; and the non-zero filter screens the output of the activation function, discarding results that are zero and storing non-zero results in the output data buffer.
Preferably, the number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and the value of p is determined by the resource constraints of the FPGA hardware development platform.
Preferably, the computation processing unit partitions the data involved in a computation, and after the computation the partial results are merged to obtain the final output.
Preferably, the data partitioning method is as follows: in a convolutional layer, the small convolution kernel matrices of each channel are combined into a matrix set and the layer parameters are partitioned at the granularity of convolution kernels, while the data duplication caused by partitioning is kept as small as possible; in a fully connected layer, the weights are partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied, the remaining elements not being copied.
Preferably, the computation flow of the hardware accelerator for the sparse neural network inference algorithm is as follows: under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units, the computation data of a convolutional layer being the input feature maps and the convolution kernel matrices, and those of a fully connected layer being the input vector and the weight arrays; the function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers; finally, the data transmission controller writes the buffered data back to the data storage unit.
The advantages of the present invention are:
The present invention optimizes the neural network models produced by the novel pruning technique: the weights of each layer are processed according to the pruning method used for that layer type, and the network parameters are stored in compressed form, reducing the storage space occupied by the network model. At the same time, the performance advantages of a hardware accelerator are exploited: the compressed weights and the input data are processed by dedicated hardware computation units, reducing the execution time of the inference algorithm for the pruned sparse neural network.
Detailed description of the invention
The invention will be further described with reference to the accompanying drawings and embodiments:
Fig. 1 is the overall architecture of the FPGA-based sparse neural network acceleration system of the present invention;
Fig. 2 is a schematic diagram of the convolutional-layer pruning subroutine of the present invention pruning a convolutional layer of the neural network;
Fig. 3 is a schematic diagram of the fully-connected-layer pruning subroutine of the present invention pruning a fully connected layer of the neural network;
Fig. 4 is a schematic diagram of the weight processing program of the present invention compressing the pruned sparse parameters of the neural network;
Fig. 5 is a schematic diagram of the data buffer structure inside the computation processing unit of the hardware accelerator of the present invention;
Fig. 6 is a schematic diagram of the computation function structure inside the computation processing unit of the hardware accelerator of the present invention;
Fig. 7 is a schematic diagram of the data partitioning method for convolutional layers according to the present invention;
Fig. 8 is a schematic diagram of the data partitioning method for fully connected layers according to the present invention;
Fig. 9 is a flow chart of the hardware accelerator of the present invention processing the sparse neural network inference algorithm;
Fig. 10 is a schematic diagram of the computation control and data transfer of the acceleration system of the present invention.
Specific embodiment
As shown in Fig. 1, the FPGA-based sparse neural network acceleration system designed by the present invention adopts an architecture that combines software programs with a hardware accelerator. The software programs, written in the C/C++ high-level language, comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; they implement the novel pruning method and compress the pruned sparse network parameter matrices, storing only the non-zero parameter elements and thus reducing the storage space of the network parameters. The hardware accelerator, built on an FPGA hardware development platform, comprises a system control unit connected to the pruning processing unit, the weight compression unit, the data storage unit, the data transmission unit and the computation processing unit; it accelerates the computation performed on the compressed parameters and improves the computational performance of the neural network model.
The software programs comprise the pruning program and the weight processing program, which prune and compress the parameters of a dense neural network. The pruning program uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense model, deleting the neurons or synaptic connections whose values are below a given threshold; the weight processing program compresses and stores the pruned sparse parameter matrices.
The hardware accelerator accelerates the computation on the compressed weights. The system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation. The data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures. The data transmission unit transfers the compressed parameters into the hardware buffers by direct memory access in preparation for the subsequent computation. The computation processing unit is the main computing element of the neural network; it performs the multiplications, accumulations and activation-function operations of the neural network application.
The convolutional-layer pruning subroutine of the software pruning program prunes the convolutional layers of the neural network at the granularity of neurons, i.e. one feature map at a time, as shown in Fig. 2. Let the output feature map corresponding to a convolution kernel be Y. A mask layer is added in front of each output layer; every feature map of the mask layer carries two parameters, alpha and beta, and the output becomes alpha × Y. The value of alpha is determined by the value of beta: when beta is below the threshold, alpha = 0; when beta is above the threshold, alpha = 1. The elements of the convolution kernel matrices whose alpha is 0 are set to zero, and the mask layer itself is deleted after all of the feature maps to be pruned have been removed.
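The alpha/beta gating just described can be sketched in C as follows; the kernel size, the source of the beta values and the function name are illustrative assumptions, not the patent's training procedure.

#include <string.h>

#define K 5                                   /* assumed kernel side length */

/* One mask entry per output feature map: beta is the gate value carried by
 * the mask layer; a map whose beta falls below the threshold gets alpha = 0
 * and its convolution kernel is zeroed out (pruned). */
void prune_conv_feature_maps(float kernels[][K][K], const float *beta,
                             int num_maps, float threshold) {
    for (int m = 0; m < num_maps; m++) {
        int alpha = (beta[m] < threshold) ? 0 : 1;
        if (alpha == 0)
            memset(kernels[m], 0, sizeof(float) * K * K);
    }
}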
The fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time. The weights of a fully connected layer are grouped according to the computing capability of the computation processing unit; as shown in Fig. 3, with a processing-unit capability of p, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
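A short C sketch of this group-wise pruning rule for one row of a fully connected weight matrix (the group size p and the threshold are parameters; the code is illustrative only):

#include <math.h>

void prune_fc_row_by_groups(float *row, int n, int p, float threshold) {
    for (int start = 0; start < n; start += p) {
        int len = (start + p <= n) ? p : n - start;
        float sq = 0.0f;
        for (int i = 0; i < len; i++)
            sq += row[start + i] * row[start + i];
        float rms = sqrtf(sq / (float)len);       /* root mean square of the group */
        if (rms < threshold)
            for (int i = 0; i < len; i++)
                row[start + i] = 0.0f;            /* delete the whole weight group */
    }
}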
The software weight processing program compresses and stores the pruned sparse parameters of the neural network. In the model processed by the novel pruning method, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, while the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices. The sparse parameter matrices of the fully connected layers are compressed using the CSR sparse-matrix storage format, as shown in Fig. 4. The weights are grouped during pruning, each group containing p elements, and the number of non-zero elements in the weight matrix is nz. A modified CSR compression format is used, storing the non-zero elements of the original weight matrix in three one-dimensional arrays: the non-zero-value array holds all non-zero weights of the original weight matrix; the column-index array holds, for each weight group, the column index of the first weight of the group in the original matrix; and the row-pointer array holds, for each row, the position of its first non-zero element within the non-zero-value array.
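The modified, group-wise CSR format described above can be sketched in C roughly as follows; the struct and function names are illustrative, and a group is kept whenever it contains at least one non-zero element (an assumption, since the patent does not spell out the retention test):

#include <stdlib.h>

typedef struct {
    float *val;      /* all retained weights, stored group after group           */
    int   *col;      /* column index of the first weight of each retained group  */
    int   *row_ptr;  /* index in val[] of the first retained element of each row */
    int    nz;       /* number of retained weight elements                       */
    int    ngroups;  /* number of retained groups                                */
} grouped_csr;

grouped_csr compress_grouped_csr(const float *w, int rows, int cols, int p) {
    grouped_csr m;
    m.val     = malloc(sizeof(float) * rows * cols);
    m.col     = malloc(sizeof(int)   * rows * (cols / p + 1));
    m.row_ptr = malloc(sizeof(int)   * (rows + 1));
    m.nz = 0;
    m.ngroups = 0;
    for (int i = 0; i < rows; i++) {
        m.row_ptr[i] = m.nz;
        for (int j = 0; j < cols; j += p) {
            int end = (j + p < cols) ? j + p : cols;
            int kept = 0;
            for (int k = j; k < end; k++)
                if (w[i * cols + k] != 0.0f) kept = 1;
            if (!kept) continue;                 /* the whole group was pruned   */
            m.col[m.ngroups++] = j;              /* first column of the group    */
            for (int k = j; k < end; k++)
                m.val[m.nz++] = w[i * cols + k]; /* store the group's weights    */
        }
    }
    m.row_ptr[rows] = m.nz;
    return m;
}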
The system control unit of the hardware uses the ARM general-purpose processor provided on the FPGA hardware development platform, or the Intel general-purpose processor of the host, to control the computation and the data transfers of the acceleration system.
The data storage unit of the hardware accelerator uses the DDR double-data-rate synchronous DRAM provided on the FPGA hardware development platform as system memory, storing the parameter data pruned by the software programs and the compressed parameter data.
The data transmission unit of the hardware uses an AXI or PCI-E direct memory access (DMA) controller; the AXI bus or PCI-E bus connects the control unit to the computation units and controls the direct transfer of the data.
The data buffer structure inside the computation processing unit of the hardware accelerator, shown in Fig. 5, comprises convolutional-layer data buffers and fully-connected-layer data buffers. The convolutional-layer buffers hold the input feature maps, the convolution kernel matrices and the output feature maps; the fully-connected-layer buffers hold the input vector, the output vector and the three compressed weight arrays.
The computation function structure inside the computation processing unit of the hardware accelerator, shown in Fig. 6, comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter. The fixed-point multipliers compute the inner products of the input convolution kernels of a convolutional layer with the corresponding elements of the input feature maps, and multiply the input vector of a fully connected layer with the non-zero values of the weight arrays; the multipliers are implemented with the DSP48E blocks of the FPGA hardware development platform and are pipelined. The accumulation tree sums the outputs of the fixed-point multipliers in parallel, reducing the time complexity of accumulating n numbers from the serial O(n) to O(log2 n). The activation-function component is the rectified linear unit (ReLU) commonly used in neural network computation: the sum produced by the accumulation tree is compared with zero, output as is if positive and output as zero otherwise. The non-zero filter screens the output of the activation function: zero results are not stored, and non-zero results are written to the output data buffer.
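A behavioural C model of one pass through this datapath for a fully connected layer (16-bit fixed-point operands, a binary accumulation tree, ReLU and the non-zero output filter); the parallelism P and the Q8.8 rescaling are illustrative assumptions:

#include <stdint.h>

#define P 16                                        /* multipliers working in parallel */

/* Binary accumulation tree: log2(n) levels instead of n-1 serial additions. */
static int32_t adder_tree(int32_t v[], int n) {
    for (int stride = n / 2; stride >= 1; stride /= 2)
        for (int i = 0; i < stride; i++)
            v[i] += v[i + stride];
    return v[0];
}

/* Returns 1 and writes *out when the activated result is non-zero;
 * zero results are filtered out and never written to the output buffer. */
int process_weight_group(const int16_t w_nz[P], const int16_t x[P], int16_t *out) {
    int32_t prod[P];
    for (int i = 0; i < P; i++)
        prod[i] = (int32_t)w_nz[i] * (int32_t)x[i]; /* fixed-point multiplications */
    int32_t sum = adder_tree(prod, P);
    int16_t act = (sum > 0) ? (int16_t)(sum >> 8) : 0;  /* ReLU, then Q8.8 rescale */
    if (act == 0)
        return 0;
    *out = act;
    return 1;
}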
The number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and its value is determined by the resource constraints of the FPGA hardware development platform. The data involved in the computation are 16-bit fixed-point values, and the computation consists of fixed-point multiplications and accumulations. Each fixed-point multiplication occupies 4 DSP48E blocks and each fixed-point addition occupies 1. The convolution computation of a convolutional layer needs pc multipliers and (pc - 1) adders, occupying (4+1)pc DSP48E blocks; the computation of a fully connected layer needs pf multipliers and (pf - 1) adders, requiring (4+1)pf DSP48E blocks. With pc = pf = p, the processing units occupy a total of 10p DSP48E blocks. If the number of DSP48E blocks on the FPGA development platform is N_DSP48E, then 10p < N_DSP48E must hold, i.e. p < N_DSP48E/10. In addition, the additions inside a processing unit are arranged as a binary accumulation tree, so p is preferably a power of two.
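Under the resource estimate above, the parallelism can be chosen as in the following sketch, which applies the stated constraint 10p < N_DSP48E and rounds p down to a power of two so the accumulation tree stays balanced:

/* Choose the system parallelism p from the DSP48E budget: the datapath uses
 * roughly 10p DSP48E blocks, so 10p < N_DSP48E must hold, and p is kept a
 * power of two for the binary accumulation tree. */
int choose_parallelism(int n_dsp48e) {
    int p = 1;
    while (10 * (p * 2) < n_dsp48e)
        p *= 2;                       /* largest power of two with 10p < N_DSP48E */
    return p;
}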
The computing capability of a computation processing unit is limited: when the pruned sparse neural network is processed, the network parameters cannot all be held in the computation buffers of the unit. The data involved in a computation therefore have to be partitioned, and the partial results are merged after the computation to obtain the final output.
The data partitioning method is as follows. In a convolutional layer, even though the number of convolution kernels after pruning is large, each kernel is small (its side length is usually no larger than 10); the small convolution kernel matrices of each channel are therefore combined into a matrix set, and the layer parameters are partitioned at the granularity of convolution kernels. Because convolution shares a large amount of computation data, the data duplication caused by partitioning must be kept as small as possible. As shown in Fig. 7, the partitioning strategy of a convolutional layer is to split the input feature maps by channel and to split the corresponding convolution kernel matrices by channel in the same way; the input feature map and the convolution kernels of the same channel then perform the sliding convolution, yielding partial results for the corresponding elements of the output feature map; finally, the partial results obtained on the different channels are added element by element to give the final output feature map.
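The channel-wise partitioning of a convolutional layer can be sketched in C as follows (valid convolution, one kernel per channel; the sizes and names are illustrative):

/* Each channel's input feature map is convolved with its own kernel, and the
 * per-channel partial results are then merged into the output feature map. */
void conv_partitioned(const float *in,  /* [C][H][W]                          */
                      const float *ker, /* [C][K][K]                          */
                      float *out,       /* [H-K+1][W-K+1], zero-initialised   */
                      int C, int H, int W, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int c = 0; c < C; c++)                      /* one partition per channel   */
        for (int oy = 0; oy < OH; oy++)
            for (int ox = 0; ox < OW; ox++) {
                float part = 0.0f;                   /* partial result of channel c */
                for (int ky = 0; ky < K; ky++)
                    for (int kx = 0; kx < K; kx++)
                        part += in[(c * H + oy + ky) * W + ox + kx]
                              * ker[(c * K + ky) * K + kx];
                out[oy * OW + ox] += part;           /* merge the partial results   */
            }
}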
In a fully connected layer, the input vector and the output vector contain many elements and the weights are large as well. The parameters of the fully connected layer are represented by the three one-dimensional arrays, the retained weight elements being stored in the non-zero-value array in weight-group order; each row of the original weight matrix holds all weights of one element of the output vector. The weights are therefore partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied; the remaining elements are not copied. As shown in Fig. 8, the non-zero weights are partitioned according to the rows of the original weight matrix, so the row-pointer array can be omitted and only the row index is kept. Each computation processing unit likewise stores only the input-vector elements corresponding to its non-zero weights together with their column indices, reducing the storage space of the input vector. Each partition computes one element of the output vector.
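A C sketch of the row-wise partition handed to one computation processing unit; the struct and function names are assumptions made for illustration:

typedef struct {
    const float *w_nz;   /* non-zero weights of this row                */
    const int   *col;    /* their column indices in the input vector    */
    int          nnz;    /* how many weights this row retained          */
} fc_row_partition;

/* Only the input elements named by col[] would be copied into the unit's
 * buffer; here the gather and the dot product are fused for brevity. */
float fc_row_compute(const fc_row_partition *part, const float *x) {
    float acc = 0.0f;
    for (int i = 0; i < part->nnz; i++)
        acc += part->w_nz[i] * x[part->col[i]];
    return acc;                      /* one element of the output vector */
}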
The computation flow of the hardware accelerator for the sparse neural network inference algorithm is shown in Fig. 9. Under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units; the computation data of a convolutional layer are the input feature maps and the convolution kernel matrices, and those of a fully connected layer are the input vector and the weight arrays. The function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers. Finally, the data transmission controller writes the buffered data back to the data storage unit.
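The control flow of Fig. 9 can be summarised by the following C skeleton; the dma_*() and pu_*() calls are hypothetical placeholders for the DMA driver and the processing-unit interface, which the patent does not name:

enum layer_kind { CONV_LAYER, FC_LAYER };

struct layer {
    enum layer_kind kind;
    /* descriptors of the input data, parameters and output buffers in DDR */
};

/* Hypothetical driver interface, declared here only so the sketch compiles. */
void dma_load_feature_maps_and_kernels(const struct layer *l);
void dma_load_input_vector_and_weight_arrays(const struct layer *l);
void pu_run_multiply_accumulate_activate_filter(const struct layer *l);
void dma_store_outputs(const struct layer *l);

void run_inference(const struct layer *layers, int n_layers) {
    for (int i = 0; i < n_layers; i++) {
        /* 1. load inputs and (compressed) parameters into the unit buffers */
        if (layers[i].kind == CONV_LAYER)
            dma_load_feature_maps_and_kernels(&layers[i]);
        else
            dma_load_input_vector_and_weight_arrays(&layers[i]);

        /* 2. multiply, accumulate, activate and filter the non-zero results */
        pu_run_multiply_accumulate_activate_filter(&layers[i]);

        /* 3. write the buffered results back to the data storage unit */
        dma_store_outputs(&layers[i]);
    }
}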
The computation control and data transfer of the acceleration system are shown in Fig. 10, using the AXI bus as an example. The system control unit configures, initializes and controls the transfer of the computation data through the AXI-Lite control bus; AXI_MM2S and AXI_S2MM are memory-mapped AXI4 buses, while AXIS_MM2S and AXIS_S2MM are address-less AXI4-Streaming buses for continuous data transfer. The data involved in the computation are transferred between the DDR memory and the custom computation processing units.
The embodiments described above merely illustrate the technical concept and features of the present invention; their purpose is to enable those skilled in the art to understand and implement the invention, and they are not intended to limit the scope of protection of the invention. Any modification made according to the spirit of the main technical solution of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. An FPGA-based sparse neural network acceleration system, characterized in that it comprises a hardware accelerator and software programs; the hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit; the software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form; and the hardware accelerator, built on an FPGA hardware development platform, accelerates the computation that follows parameter compression.
2. The FPGA-based sparse neural network acceleration system according to claim 1, characterized in that: the pruning program of the software uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense network model, deleting the neurons or synaptic connections whose values are below a given threshold; and the weight processing program of the software compresses and stores the pruned sparse parameter matrices.
3. The FPGA-based sparse neural network acceleration system according to claim 2, characterized in that: the system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation; the data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures; the data transmission unit transfers the compressed parameters to the hardware buffers by direct memory access in preparation for the subsequent computation; and the computation processing unit performs the multiplications, accumulations and activation-function operations of the neural network application.
4. The FPGA-based sparse neural network acceleration system according to claim 2, characterized in that: the convolutional-layer pruning subroutine of the pruning program prunes the convolutional layers of the neural network at the granularity of neurons, which in a convolutional layer means one feature map at a time.
5. The FPGA-based sparse neural network acceleration system according to claim 4, characterized in that: the fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time; the weights of a fully connected layer are grouped according to the computing capability of the computation processing unit, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
6. The FPGA-based sparse neural network acceleration system according to claim 5, characterized in that: in the neural network model processed by the software pruning program, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, and the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices; the weight processing program compresses and stores the sparse parameter matrices of the fully connected layers using the CSR sparse-matrix storage format.
7. The FPGA-based sparse neural network acceleration system according to claim 5, characterized in that: the computation function structure inside the computation processing unit of the hardware accelerator comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter; the fixed-point multipliers multiply the input vector of a fully connected layer with the non-zero values of the weight arrays; the accumulation tree sums the outputs of the fixed-point multipliers in parallel; the activation function is applied to the sum produced by the accumulation tree; and the non-zero filter screens the output of the activation function, discarding zero results and storing non-zero results in the output data buffer.
8. The FPGA-based sparse neural network acceleration system according to claim 7, characterized in that: the number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and the value of p is determined by the resource constraints of the FPGA hardware development platform.
9. The FPGA-based sparse neural network acceleration system according to claim 8, characterized in that: the computation processing unit partitions the data involved in a computation, and after the computation the partial results are merged to obtain the final output.
10. The FPGA-based sparse neural network acceleration system according to claim 9, characterized in that: the data partitioning method is as follows: in a convolutional layer, the small convolution kernel matrices of each channel are combined into a matrix set and the layer parameters are partitioned at the granularity of convolution kernels, while the data duplication caused by partitioning is kept as small as possible; in a fully connected layer, the weights are partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied, the remaining elements not being copied.
11. The FPGA-based sparse neural network acceleration system according to claim 10, characterized in that: the computation flow of the hardware accelerator for the sparse neural network inference algorithm is as follows: under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units, the computation data of a convolutional layer being the input feature maps and the convolution kernel matrices, and those of a fully connected layer being the input vector and the weight arrays; the function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers; finally, the data transmission controller writes the buffered data back to the data storage unit.
CN201810494819.6A 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA Pending CN108932548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810494819.6A CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810494819.6A CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Publications (1)

Publication Number Publication Date
CN108932548A true CN108932548A (en) 2018-12-04

Family

ID=64449592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810494819.6A Pending CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Country Status (1)

Country Link
CN (1) CN108932548A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032866A1 (en) * 2016-07-28 2018-02-01 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNTAO LU et al.: "Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks", CASES '17 Companion *

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711532A (en) * 2018-12-06 2019-05-03 东南大学 A kind of accelerated method inferred for hardware realization rarefaction convolutional neural networks
CN111291875B (en) * 2018-12-06 2023-10-24 意法半导体(鲁塞)公司 Method and apparatus for determining memory size
CN109711532B (en) * 2018-12-06 2023-05-12 东南大学 Acceleration method for realizing sparse convolutional neural network inference aiming at hardware
CN111291875A (en) * 2018-12-06 2020-06-16 意法半导体(鲁塞)公司 Method and apparatus for determining memory size
CN111291871A (en) * 2018-12-10 2020-06-16 中科寒武纪科技股份有限公司 Computing device and related product
CN111353591A (en) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 Computing device and related product
CN109685205A (en) * 2018-12-26 2019-04-26 上海大学 A kind of depth network model accelerated method based on sparse matrix
CN111758104B (en) * 2019-01-29 2024-04-16 深爱智能科技有限公司 Neural network parameter optimization method and neural network calculation method and device suitable for hardware implementation
CN111758104A (en) * 2019-01-29 2020-10-09 深爱智能科技有限公司 Neural network parameter optimization method suitable for hardware implementation, neural network calculation method and device
CN111723922A (en) * 2019-03-20 2020-09-29 爱思开海力士有限公司 Neural network acceleration device and control method thereof
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN109978142B (en) * 2019-03-29 2022-11-29 腾讯科技(深圳)有限公司 Neural network model compression method and device
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN111860800A (en) * 2019-04-26 2020-10-30 爱思开海力士有限公司 Neural network acceleration device and operation method thereof
US11507642B2 (en) * 2019-05-02 2022-11-22 Silicon Storage Technology, Inc. Configurable input blocks and output blocks and physical layout for analog neural memory in deep learning artificial neural network
US20200349421A1 (en) * 2019-05-02 2020-11-05 Silicon Storage Technology, Inc. Configurable input blocks and output blocks and physical layout for analog neural memory in deep learning artificial neural network
CN111985632A (en) * 2019-05-24 2020-11-24 三星电子株式会社 Decompression apparatus and control method thereof
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
CN110458289B (en) * 2019-06-10 2022-06-10 北京达佳互联信息技术有限公司 Multimedia classification model construction method, multimedia classification method and device
CN110458289A (en) * 2019-06-10 2019-11-15 北京达佳互联信息技术有限公司 The construction method of multimedia class model, multimedia class method and device
CN110211121B (en) * 2019-06-10 2021-07-16 北京百度网讯科技有限公司 Method and device for pushing model
CN110211121A (en) * 2019-06-10 2019-09-06 北京百度网讯科技有限公司 Method and apparatus for pushing model
CN110796238A (en) * 2019-10-29 2020-02-14 上海安路信息科技有限公司 Convolutional neural network weight compression method and system
CN112749782A (en) * 2019-10-31 2021-05-04 上海商汤智能科技有限公司 Data processing method and related product
CN110889259A (en) * 2019-11-06 2020-03-17 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix
CN111026700B (en) * 2019-11-21 2022-02-01 清华大学 Memory computing architecture for realizing acceleration and acceleration method thereof
CN111026700A (en) * 2019-11-21 2020-04-17 清华大学 Memory computing architecture for realizing acceleration and acceleration method thereof
CN111078189A (en) * 2019-11-23 2020-04-28 复旦大学 Sparse matrix multiplication accelerator for recurrent neural network natural language processing
CN111078189B (en) * 2019-11-23 2023-05-02 复旦大学 Sparse matrix multiplication accelerator for cyclic neural network natural language processing
CN110909801A (en) * 2019-11-26 2020-03-24 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN110991631A (en) * 2019-11-28 2020-04-10 福州大学 Neural network acceleration system based on FPGA
CN112966807A (en) * 2019-12-13 2021-06-15 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN112966807B (en) * 2019-12-13 2022-09-16 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN113128658A (en) * 2019-12-31 2021-07-16 Tcl集团股份有限公司 Neural network processing method, accelerator and storage medium
CN113159272A (en) * 2020-01-07 2021-07-23 阿里巴巴集团控股有限公司 Method and system for processing neural network
CN111340206A (en) * 2020-02-20 2020-06-26 云南大学 Alexnet forward network accelerator based on FPGA
CN111368988B (en) * 2020-02-28 2022-12-20 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111368988A (en) * 2020-02-28 2020-07-03 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111507473B (en) * 2020-04-20 2023-05-12 上海交通大学 Pruning method and system based on Crossbar architecture
CN111507473A (en) * 2020-04-20 2020-08-07 上海交通大学 Pruning method and system based on Crossbar architecture
CN112101178A (en) * 2020-09-10 2020-12-18 电子科技大学 Intelligent SOC terminal assisting blind people in perceiving external environment
CN112381206A (en) * 2020-10-20 2021-02-19 广东电网有限责任公司中山供电局 Deep neural network compression method, system, storage medium and computer equipment
CN112508184B (en) * 2020-12-16 2022-04-29 重庆邮电大学 Design method of fast image recognition accelerator based on convolutional neural network
CN112508184A (en) * 2020-12-16 2021-03-16 重庆邮电大学 Design method of fast image recognition accelerator based on convolutional neural network
CN112906887A (en) * 2021-02-20 2021-06-04 上海大学 Sparse GRU neural network acceleration realization method and device
CN113112002A (en) * 2021-04-06 2021-07-13 济南大学 Design method of lightweight convolution accelerator based on FPGA
CN113159297A (en) * 2021-04-29 2021-07-23 上海阵量智能科技有限公司 Neural network compression method and device, computer equipment and storage medium
CN113159297B (en) * 2021-04-29 2024-01-09 上海阵量智能科技有限公司 Neural network compression method, device, computer equipment and storage medium
CN113657595B (en) * 2021-08-20 2024-03-12 中国科学院计算技术研究所 Neural network accelerator based on neural network real-time pruning
CN113657595A (en) * 2021-08-20 2021-11-16 中国科学院计算技术研究所 Neural network real-time pruning method and system and neural network accelerator
CN114581676A (en) * 2022-03-01 2022-06-03 北京百度网讯科技有限公司 Characteristic image processing method and device and storage medium
CN114581676B (en) * 2022-03-01 2023-09-26 北京百度网讯科技有限公司 Processing method, device and storage medium for feature image
CN116187408A (en) * 2023-04-23 2023-05-30 成都甄识科技有限公司 Sparse acceleration unit, calculation method and sparse neural network hardware acceleration system
CN116187408B (en) * 2023-04-23 2023-07-21 成都甄识科技有限公司 Sparse acceleration unit, calculation method and sparse neural network hardware acceleration system
CN116167425A (en) * 2023-04-26 2023-05-26 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium
CN116167425B (en) * 2023-04-26 2023-08-04 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN108932548A (en) A kind of degree of rarefication neural network acceleration system based on FPGA
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
CN106529670B (en) It is a kind of based on weight compression neural network processor, design method, chip
Su et al. Redundancy-reduced mobilenet acceleration on reconfigurable logic for imagenet classification
CN110378468A (en) A kind of neural network accelerator quantified based on structuring beta pruning and low bit
DE102019114243A1 (en) Architecture for deep neural networks using piecewise linear approximation
CN107832082A (en) A kind of apparatus and method for performing artificial neural network forward operation
CN106529668A (en) Operation device and method of accelerating chip which accelerates depth neural network algorithm
CN108268947A (en) For improving the device and method of the processing speed of neural network and its application
Barrois et al. The hidden cost of functional approximation against careful data sizing—A case study
CN109711539A (en) Operation method, device and Related product
CN106650925A (en) Deep learning framework Caffe system and algorithm based on MIC cluster
CN110163359A (en) A kind of computing device and method
CN109934336A (en) Neural network dynamic based on optimum structure search accelerates platform designing method and neural network dynamic to accelerate platform
Que et al. Mapping large LSTMs to FPGAs with weight reuse
Peyrl et al. Parallel implementations of the fast gradient method for high-speed MPC
CN110163350A (en) A kind of computing device and method
CN108304925A (en) A kind of pond computing device and method
CN110069444A (en) A kind of computing unit, array, module, hardware system and implementation method
CN109214502A (en) Neural network weight discretization method and system
CN109117455A (en) Computing device and method
CN112529165A (en) Deep neural network pruning method, device, terminal and storage medium
CN111091183B (en) Neural network acceleration system and method
CN112734020A (en) Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN209708122U (en) A kind of computing unit, array, module, hardware system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181204)