CN108932548A - FPGA-based sparse neural network acceleration system - Google Patents

FPGA-based sparse neural network acceleration system

Info

Publication number
CN108932548A
CN108932548A (application CN201810494819.6A)
Authority
CN
China
Prior art keywords
neural network
data
pruning
fpga
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810494819.6A
Other languages
Chinese (zh)
Inventor
Li Xi (李曦)
Zhou Xuehai (周学海)
Wang Chao (王超)
Lu Yuntao (鲁云涛)
Gong Lei (宫磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute for Advanced Study USTC
Original Assignee
Suzhou Institute for Advanced Study USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute for Advanced Study USTC
Priority to CN201810494819.6A
Publication of CN108932548A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention discloses an FPGA-based sparse neural network acceleration system comprising a hardware accelerator and software programs. The hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit. The software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form. The hardware accelerator, built on an FPGA hardware development platform, accelerates the computation that follows parameter compression. The invention optimizes neural network models generated by a novel pruning technique: the weights of each layer are processed according to the pruning method used for that layer type, and the network parameters are stored in compressed form, reducing the storage space occupied by the network model.

Description

FPGA-based sparse neural network acceleration system
Technical field
The present invention relates to the field of computer hardware acceleration, and in particular to an FPGA-based sparse neural network acceleration system and its design method.
Background art
Deep neural networks are an effective method that has been widely adopted in many fields. Convolutional neural networks, a variant of deep neural networks, are commonly used in image classification applications. As convolutional neural network models have developed, their size has kept growing, making them compute-intensive and storage-heavy and preventing their deployment on many resource-constrained devices. Traditional neural network pruning techniques apply the same pruning strategy to convolutional layers and fully connected layers despite their different computational characteristics; they delete redundant neurons and synaptic connections from the model to reduce the amount of computation and the storage space, ultimately producing network models of different sparsity levels that can be adapted to different hardware resources. Neural network parameters are stored as matrices: the parameter matrices of the original model are dense, while the parameters of the pruned model form sparse matrices containing a large number of zero elements. These zero elements can be omitted from both computation and storage without affecting the final prediction accuracy.
Meanwhile most of neural network accelerators only consider to handle the neural network model of dense parameter, it is sparse handling When the neural network model of parameter, calculating is stored in still according to traditional mode for the zero valued elements in parameter matrix, and Corresponding performance boost is not obtained from rarefaction.And the sparse neural network accelerator proposed in the recent period handles such sparse net Network model carries out compression storage to sparse parameter matrix, then calculates the parameter of compression, the sparse network accelerator Performance promotion it is also disproportionate with the degree of network parameter reduction.
For example, consider a convolutional neural network model used in a handwritten digit recognition application. In the original model, the first fully connected layer has 800 input neurons and 500 output neurons, giving 100,000 synaptic connections in this layer, and a 500 × 800 matrix is used to store the weight of each connection. The model is pruned by deleting the synaptic connections whose weights are below a certain threshold. In the fully connected layer of the resulting sparse model, the number of synaptic connections drops to 8,000; the weights of the deleted connections are set to zero, i.e. the corresponding elements of the original 500 × 800 weight matrix become zero. Because the layer's computation consists of multiplications followed by the non-linear rectification function ReLU, a product with a zero multiplier is zero and is filtered out again by the subsequent ReLU; the zero elements of the weight matrix therefore have no effect on the later results and need be neither stored nor used in subsequent computation. In theory, the storage space of the pruned sparse parameters of this fully connected layer should be one tenth of that of the original dense parameters, and the computational performance should be ten times that of the dense case. However, most accelerators designed for dense neural networks still store the zero elements of the sparse weight matrix and perform the corresponding multiplications, so their performance on the sparse network is close to that on the dense network, and the parameter storage space does not change either. Dedicated accelerators for the sparse networks produced by traditional pruning techniques, in turn, consume extra space to store parameter indices and extra time to process the compressed parameters, and cannot reach the ideal acceleration effect. In summary, neither dense neural network accelerators nor sparse neural network accelerators achieve the ideal performance gain when processing the sparse neural networks generated by traditional pruning techniques.
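A minimal C sketch of the point made above, with illustrative names and the ReLU written out explicitly (the patent itself prescribes no such code): only the retained non-zero weights of a pruned fully connected layer need to be stored and multiplied.

#include <stddef.h>

/* Dense case: every weight, including the zeros left by pruning, costs one multiply. */
float fc_output_dense(const float *w, const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];
    return acc > 0.0f ? acc : 0.0f;               /* ReLU */
}

/* Sparse case: only the nz retained weights are stored (with their column
 * indices) and multiplied; the zero weights never appear. */
float fc_output_sparse(const float *w_nz, const int *col, size_t nz, const float *x) {
    float acc = 0.0f;
    for (size_t i = 0; i < nz; i++)
        acc += w_nz[i] * x[col[i]];
    return acc > 0.0f ? acc : 0.0f;               /* ReLU */
}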
Summary of the invention
The object of the present invention is to remedy the shortcomings and deficiencies of the prior art by proposing a neural network acceleration system based on a novel pruning technique (hereinafter referred to as the acceleration system). The system targets the sparse neural networks generated by the novel pruning technique and accelerates them using a hardware/software co-design approach.
The technical solution of the present invention is as follows:
An FPGA-based sparse neural network acceleration system comprises a hardware accelerator and software programs. The hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit. The software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form. The hardware accelerator is built on an FPGA hardware development platform and accelerates the computation that follows parameter compression.
Preferably, the pruning program of the software uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense network model, deleting the neurons or synaptic connections whose values are below a given threshold; the weight processing program of the software compresses and stores the pruned sparse parameter matrices.
Preferably, the system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation; the data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures; the data transmission unit transfers the compressed parameters to the hardware buffers by direct memory access in preparation for the subsequent computation; and the computation processing unit performs the multiplications, accumulations and activation-function operations of the neural network application.
Preferably, the convolutional-layer pruning subroutine of the pruning program prunes the convolutional layers of the neural network at the granularity of neurons, which in a convolutional layer means one feature map at a time.
Preferably, the fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time: the weights of a fully connected layer are grouped according to the computing capability of the computation processing unit, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
Preferably, in the neural network model processed by the software pruning program, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, while the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices; the weight processing program compresses and stores the sparse parameter matrices of the fully connected layers using the CSR sparse-matrix storage format.
Preferably, the computation function structure inside the computation processing unit of the hardware accelerator comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter. The fixed-point multipliers multiply the input vector of a fully connected layer with the non-zero values of the weight arrays (a matrix multiplication); the accumulation tree sums the outputs of the fixed-point multipliers in parallel; the activation function is applied to the sum produced by the accumulation tree; and the non-zero filter screens the output of the activation function, discarding results that are zero and storing non-zero results in the output data buffer.
Preferably, the number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and the value of p is determined by the resource constraints of the FPGA hardware development platform.
Preferably, the computation processing unit partitions the data involved in a computation, and after the computation the partial results are merged to obtain the final output.
Preferably, the data partitioning method is as follows: in a convolutional layer, the small convolution kernel matrices of each channel are combined into a matrix set and the layer parameters are partitioned at the granularity of convolution kernels, while the data duplication caused by partitioning is kept as small as possible; in a fully connected layer, the weights are partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied, the remaining elements not being copied.
Preferably, the computation flow of the hardware accelerator for the sparse neural network inference algorithm is as follows: under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units, the computation data of a convolutional layer being the input feature maps and the convolution kernel matrices, and those of a fully connected layer being the input vector and the weight arrays; the function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers; finally, the data transmission controller writes the buffered data back to the data storage unit.
The advantages of the present invention are:
The present invention optimizes the neural network models produced by the novel pruning technique: the weights of each layer are processed according to the pruning method used for that layer type, and the network parameters are stored in compressed form, reducing the storage space occupied by the network model. At the same time, the performance advantages of a hardware accelerator are exploited: the compressed weights and the input data are processed by dedicated hardware computation units, reducing the execution time of the inference algorithm for the pruned sparse neural network.
Detailed description of the invention
The invention will be further described with reference to the accompanying drawings and embodiments:
Fig. 1 is the overall architecture of the FPGA-based sparse neural network acceleration system of the present invention;
Fig. 2 is a schematic diagram of the convolutional-layer pruning subroutine of the present invention pruning a convolutional layer of the neural network;
Fig. 3 is a schematic diagram of the fully-connected-layer pruning subroutine of the present invention pruning a fully connected layer of the neural network;
Fig. 4 is a schematic diagram of the weight processing program of the present invention compressing the pruned sparse parameters of the neural network;
Fig. 5 is a schematic diagram of the data buffer structure inside the computation processing unit of the hardware accelerator of the present invention;
Fig. 6 is a schematic diagram of the computation function structure inside the computation processing unit of the hardware accelerator of the present invention;
Fig. 7 is a schematic diagram of the data partitioning method for convolutional layers according to the present invention;
Fig. 8 is a schematic diagram of the data partitioning method for fully connected layers according to the present invention;
Fig. 9 is a flow chart of the hardware accelerator of the present invention processing the sparse neural network inference algorithm;
Fig. 10 is a schematic diagram of the computation control and data transfer of the acceleration system of the present invention.
Specific embodiment
As shown in Fig. 1, the FPGA-based sparse neural network acceleration system designed by the present invention adopts an architecture that combines software programs with a hardware accelerator. The software programs, written in the C/C++ high-level language, comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; they implement the novel pruning method and compress the pruned sparse network parameter matrices, storing only the non-zero parameter elements and thus reducing the storage space of the network parameters. The hardware accelerator, built on an FPGA hardware development platform, comprises a system control unit connected to the pruning processing unit, the weight compression unit, the data storage unit, the data transmission unit and the computation processing unit; it accelerates the computation performed on the compressed parameters and improves the computational performance of the neural network model.
The software programs comprise the pruning program and the weight processing program, which prune and compress the parameters of a dense neural network. The pruning program uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense model, deleting the neurons or synaptic connections whose values are below a given threshold; the weight processing program compresses and stores the pruned sparse parameter matrices.
The hardware accelerator accelerates the computation on the compressed weights. The system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation. The data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures. The data transmission unit transfers the compressed parameters into the hardware buffers by direct memory access in preparation for the subsequent computation. The computation processing unit is the main computing element of the neural network; it performs the multiplications, accumulations and activation-function operations of the neural network application.
The convolutional-layer pruning subroutine of the software pruning program prunes the convolutional layers of the neural network at the granularity of neurons, i.e. one feature map at a time, as shown in Fig. 2. Let the output feature map corresponding to a convolution kernel be Y. A mask layer is added in front of each output layer; every feature map of the mask layer carries two parameters, alpha and beta, and the output becomes alpha × Y. The value of alpha is determined by the value of beta: when beta is below the threshold, alpha = 0; when beta is above the threshold, alpha = 1. The elements of the convolution kernel matrices whose alpha is 0 are set to zero, and the mask layer itself is deleted after all of the feature maps to be pruned have been removed.
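The alpha/beta gating just described can be sketched in C as follows; the kernel size, the source of the beta values and the function name are illustrative assumptions, not the patent's training procedure.

#include <string.h>

#define K 5                                   /* assumed kernel side length */

/* One mask entry per output feature map: beta is the gate value carried by
 * the mask layer; a map whose beta falls below the threshold gets alpha = 0
 * and its convolution kernel is zeroed out (pruned). */
void prune_conv_feature_maps(float kernels[][K][K], const float *beta,
                             int num_maps, float threshold) {
    for (int m = 0; m < num_maps; m++) {
        int alpha = (beta[m] < threshold) ? 0 : 1;
        if (alpha == 0)
            memset(kernels[m], 0, sizeof(float) * K * K);
    }
}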
The fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time. The weights of a fully connected layer are grouped according to the computing capability of the computation processing unit; as shown in Fig. 3, with a processing-unit capability of p, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
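A short C sketch of this group-wise pruning rule for one row of a fully connected weight matrix (the group size p and the threshold are parameters; the code is illustrative only):

#include <math.h>

void prune_fc_row_by_groups(float *row, int n, int p, float threshold) {
    for (int start = 0; start < n; start += p) {
        int len = (start + p <= n) ? p : n - start;
        float sq = 0.0f;
        for (int i = 0; i < len; i++)
            sq += row[start + i] * row[start + i];
        float rms = sqrtf(sq / (float)len);       /* root mean square of the group */
        if (rms < threshold)
            for (int i = 0; i < len; i++)
                row[start + i] = 0.0f;            /* delete the whole weight group */
    }
}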
The software weight processing program compresses and stores the pruned sparse parameters of the neural network. In the model processed by the novel pruning method, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, while the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices. The sparse parameter matrices of the fully connected layers are compressed using the CSR sparse-matrix storage format, as shown in Fig. 4. The weights are grouped during pruning, each group containing p elements, and the number of non-zero elements in the weight matrix is nz. A modified CSR compression format is used, storing the non-zero elements of the original weight matrix in three one-dimensional arrays: the non-zero-value array holds all non-zero weights of the original weight matrix; the column-index array holds, for each weight group, the column index of the first weight of the group in the original matrix; and the row-pointer array holds, for each row, the position of its first non-zero element within the non-zero-value array.
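The modified, group-wise CSR format described above can be sketched in C roughly as follows; the struct and function names are illustrative, and a group is kept whenever it contains at least one non-zero element (an assumption, since the patent does not spell out the retention test):

#include <stdlib.h>

typedef struct {
    float *val;      /* all retained weights, stored group after group           */
    int   *col;      /* column index of the first weight of each retained group  */
    int   *row_ptr;  /* index in val[] of the first retained element of each row */
    int    nz;       /* number of retained weight elements                       */
    int    ngroups;  /* number of retained groups                                */
} grouped_csr;

grouped_csr compress_grouped_csr(const float *w, int rows, int cols, int p) {
    grouped_csr m;
    m.val     = malloc(sizeof(float) * rows * cols);
    m.col     = malloc(sizeof(int)   * rows * (cols / p + 1));
    m.row_ptr = malloc(sizeof(int)   * (rows + 1));
    m.nz = 0;
    m.ngroups = 0;
    for (int i = 0; i < rows; i++) {
        m.row_ptr[i] = m.nz;
        for (int j = 0; j < cols; j += p) {
            int end = (j + p < cols) ? j + p : cols;
            int kept = 0;
            for (int k = j; k < end; k++)
                if (w[i * cols + k] != 0.0f) kept = 1;
            if (!kept) continue;                 /* the whole group was pruned   */
            m.col[m.ngroups++] = j;              /* first column of the group    */
            for (int k = j; k < end; k++)
                m.val[m.nz++] = w[i * cols + k]; /* store the group's weights    */
        }
    }
    m.row_ptr[rows] = m.nz;
    return m;
}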
The system control unit of the hardware uses the ARM general-purpose processor provided on the FPGA hardware development platform, or the Intel general-purpose processor of the host, to control the computation and the data transfers of the acceleration system.
The data storage unit of the hardware accelerator uses the DDR double-data-rate synchronous DRAM provided on the FPGA hardware development platform as system memory, storing the parameter data pruned by the software programs and the compressed parameter data.
The data transmission unit of the hardware uses an AXI or PCI-E direct memory access (DMA) controller; the AXI bus or PCI-E bus connects the control unit to the computation units and controls the direct transfer of the data.
The data buffer structure inside the computation processing unit of the hardware accelerator, shown in Fig. 5, comprises convolutional-layer data buffers and fully-connected-layer data buffers. The convolutional-layer buffers hold the input feature maps, the convolution kernel matrices and the output feature maps; the fully-connected-layer buffers hold the input vector, the output vector and the three compressed weight arrays.
The computation function structure inside the computation processing unit of the hardware accelerator, shown in Fig. 6, comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter. The fixed-point multipliers compute the inner products of the input convolution kernels of a convolutional layer with the corresponding elements of the input feature maps, and multiply the input vector of a fully connected layer with the non-zero values of the weight arrays; the multipliers are implemented with the DSP48E blocks of the FPGA hardware development platform and are pipelined. The accumulation tree sums the outputs of the fixed-point multipliers in parallel, reducing the time complexity of accumulating n numbers from the serial O(n) to O(log2 n). The activation-function component is the rectified linear unit (ReLU) commonly used in neural network computation: the sum produced by the accumulation tree is compared with zero, output as is if positive and output as zero otherwise. The non-zero filter screens the output of the activation function: zero results are not stored, and non-zero results are written to the output data buffer.
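A behavioural C model of one pass through this datapath for a fully connected layer (16-bit fixed-point operands, a binary accumulation tree, ReLU and the non-zero output filter); the parallelism P and the Q8.8 rescaling are illustrative assumptions:

#include <stdint.h>

#define P 16                                        /* multipliers working in parallel */

/* Binary accumulation tree: log2(n) levels instead of n-1 serial additions. */
static int32_t adder_tree(int32_t v[], int n) {
    for (int stride = n / 2; stride >= 1; stride /= 2)
        for (int i = 0; i < stride; i++)
            v[i] += v[i + stride];
    return v[0];
}

/* Returns 1 and writes *out when the activated result is non-zero;
 * zero results are filtered out and never written to the output buffer. */
int process_weight_group(const int16_t w_nz[P], const int16_t x[P], int16_t *out) {
    int32_t prod[P];
    for (int i = 0; i < P; i++)
        prod[i] = (int32_t)w_nz[i] * (int32_t)x[i]; /* fixed-point multiplications */
    int32_t sum = adder_tree(prod, P);
    int16_t act = (sum > 0) ? (int16_t)(sum >> 8) : 0;  /* ReLU, then Q8.8 rescale */
    if (act == 0)
        return 0;
    *out = act;
    return 1;
}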
The number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and its value is determined by the resource constraints of the FPGA hardware development platform. The data involved in the computation are 16-bit fixed-point values, and the computation consists of fixed-point multiplications and accumulations. Each fixed-point multiplication occupies 4 DSP48E blocks and each fixed-point addition occupies 1. The convolution computation of a convolutional layer needs pc multipliers and (pc - 1) adders, occupying (4+1)pc DSP48E blocks; the computation of a fully connected layer needs pf multipliers and (pf - 1) adders, requiring (4+1)pf DSP48E blocks. With pc = pf = p, the processing units occupy a total of 10p DSP48E blocks. If the number of DSP48E blocks on the FPGA development platform is N_DSP48E, then 10p < N_DSP48E must hold, i.e. p < N_DSP48E/10. In addition, the additions inside a processing unit are arranged as a binary accumulation tree, so p is preferably a power of two.
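Under the resource estimate above, the parallelism can be chosen as in the following sketch, which applies the stated constraint 10p < N_DSP48E and rounds p down to a power of two so the accumulation tree stays balanced:

/* Choose the system parallelism p from the DSP48E budget: the datapath uses
 * roughly 10p DSP48E blocks, so 10p < N_DSP48E must hold, and p is kept a
 * power of two for the binary accumulation tree. */
int choose_parallelism(int n_dsp48e) {
    int p = 1;
    while (10 * (p * 2) < n_dsp48e)
        p *= 2;                       /* largest power of two with 10p < N_DSP48E */
    return p;
}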
The computing capability of a computation processing unit is limited: when the pruned sparse neural network is processed, the network parameters cannot all be held in the computation buffers of the unit. The data involved in a computation therefore have to be partitioned, and the partial results are merged after the computation to obtain the final output.
The data partitioning method is as follows. In a convolutional layer, even though the number of convolution kernels after pruning is large, each kernel is small (its side length is usually no larger than 10); the small convolution kernel matrices of each channel are therefore combined into a matrix set, and the layer parameters are partitioned at the granularity of convolution kernels. Because convolution shares a large amount of computation data, the data duplication caused by partitioning must be kept as small as possible. As shown in Fig. 7, the partitioning strategy of a convolutional layer is to split the input feature maps by channel and to split the corresponding convolution kernel matrices by channel in the same way; the input feature map and the convolution kernels of the same channel then perform the sliding convolution, yielding partial results for the corresponding elements of the output feature map; finally, the partial results obtained on the different channels are added element by element to give the final output feature map.
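The channel-wise partitioning of a convolutional layer can be sketched in C as follows (valid convolution, one kernel per channel; the sizes and names are illustrative):

/* Each channel's input feature map is convolved with its own kernel, and the
 * per-channel partial results are then merged into the output feature map. */
void conv_partitioned(const float *in,  /* [C][H][W]                          */
                      const float *ker, /* [C][K][K]                          */
                      float *out,       /* [H-K+1][W-K+1], zero-initialised   */
                      int C, int H, int W, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int c = 0; c < C; c++)                      /* one partition per channel   */
        for (int oy = 0; oy < OH; oy++)
            for (int ox = 0; ox < OW; ox++) {
                float part = 0.0f;                   /* partial result of channel c */
                for (int ky = 0; ky < K; ky++)
                    for (int kx = 0; kx < K; kx++)
                        part += in[(c * H + oy + ky) * W + ox + kx]
                              * ker[(c * K + ky) * K + kx];
                out[oy * OW + ox] += part;           /* merge the partial results   */
            }
}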
In a fully connected layer, the input vector and the output vector contain many elements and the weights are large as well. The parameters of the fully connected layer are represented by the three one-dimensional arrays, the retained weight elements being stored in the non-zero-value array in weight-group order; each row of the original weight matrix holds all weights of one element of the output vector. The weights are therefore partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied; the remaining elements are not copied. As shown in Fig. 8, the non-zero weights are partitioned according to the rows of the original weight matrix, so the row-pointer array can be omitted and only the row index is kept. Each computation processing unit likewise stores only the input-vector elements corresponding to its non-zero weights together with their column indices, reducing the storage space of the input vector. Each partition computes one element of the output vector.
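A C sketch of the row-wise partition handed to one computation processing unit; the struct and function names are assumptions made for illustration:

typedef struct {
    const float *w_nz;   /* non-zero weights of this row                */
    const int   *col;    /* their column indices in the input vector    */
    int          nnz;    /* how many weights this row retained          */
} fc_row_partition;

/* Only the input elements named by col[] would be copied into the unit's
 * buffer; here the gather and the dot product are fused for brevity. */
float fc_row_compute(const fc_row_partition *part, const float *x) {
    float acc = 0.0f;
    for (int i = 0; i < part->nnz; i++)
        acc += part->w_nz[i] * x[part->col[i]];
    return acc;                      /* one element of the output vector */
}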
The computation flow of the hardware accelerator for the sparse neural network inference algorithm is shown in Fig. 9. Under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units; the computation data of a convolutional layer are the input feature maps and the convolution kernel matrices, and those of a fully connected layer are the input vector and the weight arrays. The function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers. Finally, the data transmission controller writes the buffered data back to the data storage unit.
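The control flow of Fig. 9 can be summarised by the following C skeleton; the dma_*() and pu_*() calls are hypothetical placeholders for the DMA driver and the processing-unit interface, which the patent does not name:

enum layer_kind { CONV_LAYER, FC_LAYER };

struct layer {
    enum layer_kind kind;
    /* descriptors of the input data, parameters and output buffers in DDR */
};

/* Hypothetical driver interface, declared here only so the sketch compiles. */
void dma_load_feature_maps_and_kernels(const struct layer *l);
void dma_load_input_vector_and_weight_arrays(const struct layer *l);
void pu_run_multiply_accumulate_activate_filter(const struct layer *l);
void dma_store_outputs(const struct layer *l);

void run_inference(const struct layer *layers, int n_layers) {
    for (int i = 0; i < n_layers; i++) {
        /* 1. load inputs and (compressed) parameters into the unit buffers */
        if (layers[i].kind == CONV_LAYER)
            dma_load_feature_maps_and_kernels(&layers[i]);
        else
            dma_load_input_vector_and_weight_arrays(&layers[i]);

        /* 2. multiply, accumulate, activate and filter the non-zero results */
        pu_run_multiply_accumulate_activate_filter(&layers[i]);

        /* 3. write the buffered results back to the data storage unit */
        dma_store_outputs(&layers[i]);
    }
}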
The computation control and data transfer of the acceleration system are shown in Fig. 10, using the AXI bus as an example. The system control unit configures, initializes and controls the transfer of the computation data through the AXI-Lite control bus; AXI_MM2S and AXI_S2MM are memory-mapped AXI4 buses, while AXIS_MM2S and AXIS_S2MM are address-less AXI4-Streaming buses for continuous data transfer. The data involved in the computation are transferred between the DDR memory and the custom computation processing units.
The embodiments described above merely illustrate the technical concept and features of the present invention; their purpose is to enable those skilled in the art to understand and implement the invention, and they are not intended to limit the scope of protection of the invention. Any modification made according to the spirit of the main technical solution of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. An FPGA-based sparse neural network acceleration system, characterized in that it comprises a hardware accelerator and software programs; the hardware accelerator comprises a system control unit connected to a pruning processing unit, a weight compression unit, a data storage unit, a data transmission unit and a computation processing unit; the software programs comprise a pruning program and a weight processing program stored in the pruning processing unit and the weight compression unit respectively; the software prunes the network parameter matrices and stores the resulting sparse matrices in compressed form; and the hardware accelerator, built on an FPGA hardware development platform, accelerates the computation that follows parameter compression.
2. The FPGA-based sparse neural network acceleration system according to claim 1, characterized in that: the pruning program of the software uses a convolutional-layer pruning subroutine and a fully-connected-layer pruning subroutine to prune the dense network model, deleting the neurons or synaptic connections whose values are below a given threshold; and the weight processing program of the software compresses and stores the pruned sparse parameter matrices.
3. The FPGA-based sparse neural network acceleration system according to claim 2, characterized in that: the system control unit controls the execution flow of the whole system, taking the neural network parameters through pruning, compression, storage and computation; the data storage unit stores the pruned sparse parameter matrices and the compressed parameter data structures; the data transmission unit transfers the compressed parameters to the hardware buffers by direct memory access in preparation for the subsequent computation; and the computation processing unit performs the multiplications, accumulations and activation-function operations of the neural network application.
4. The FPGA-based sparse neural network acceleration system according to claim 2, characterized in that: the convolutional-layer pruning subroutine of the pruning program prunes the convolutional layers of the neural network at the granularity of neurons, which in a convolutional layer means one feature map at a time.
5. The FPGA-based sparse neural network acceleration system according to claim 4, characterized in that: the fully-connected-layer pruning subroutine of the software pruning program prunes the fully connected layers of the neural network one weight group at a time; the weights of a fully connected layer are grouped according to the computing capability of the computation processing unit, the root mean square of all elements of each weight group is computed, the groups whose root mean square is below a threshold are deleted, and the corresponding elements of the weight matrix are set to zero.
6. The FPGA-based sparse neural network acceleration system according to claim 5, characterized in that: in the neural network model processed by the software pruning program, the convolutional-layer parameters remain in dense form, stored as the data structure of multiple convolution kernel matrices, and the fully-connected-layer parameters are in sparse form, stored as sparse parameter matrices; the weight processing program compresses and stores the sparse parameter matrices of the fully connected layers using the CSR sparse-matrix storage format.
7. The FPGA-based sparse neural network acceleration system according to claim 5, characterized in that: the computation function structure inside the computation processing unit of the hardware accelerator comprises fixed-point multipliers, an accumulation tree, an activation function and a non-zero filter; the fixed-point multipliers multiply the input vector of a fully connected layer with the non-zero values of the weight arrays; the accumulation tree sums the outputs of the fixed-point multipliers in parallel; the activation function is applied to the sum produced by the accumulation tree; and the non-zero filter screens the output of the activation function, discarding zero results and storing non-zero results in the output data buffer.
8. The FPGA-based sparse neural network acceleration system according to claim 7, characterized in that: the number of computation processing units in the hardware accelerator, i.e. the system parallelism, is p, and the value of p is determined by the resource constraints of the FPGA hardware development platform.
9. The FPGA-based sparse neural network acceleration system according to claim 8, characterized in that: the computation processing unit partitions the data involved in a computation, and after the computation the partial results are merged to obtain the final output.
10. The FPGA-based sparse neural network acceleration system according to claim 9, characterized in that: the data partitioning method is as follows: in a convolutional layer, the small convolution kernel matrices of each channel are combined into a matrix set and the layer parameters are partitioned at the granularity of convolution kernels, while the data duplication caused by partitioning is kept as small as possible; in a fully connected layer, the weights are partitioned according to the rows of the original matrix, and for the input vector of each computation processing unit only the elements corresponding to non-zero weights are copied, the remaining elements not being copied.
11. The FPGA-based sparse neural network acceleration system according to claim 10, characterized in that: the computation flow of the hardware accelerator for the sparse neural network inference algorithm is as follows: under the control of the general-purpose processor, the data transmission controller loads the input data and the network parameters from the data storage unit into the buffers of the computation processing units, the computation data of a convolutional layer being the input feature maps and the convolution kernel matrices, and those of a fully connected layer being the input vector and the weight arrays; the function units inside the computation processing units then carry out the multiplication, accumulation, activation-function and non-zero filtering stages, and the output results are stored in the buffers; finally, the data transmission controller writes the buffered data back to the data storage unit.
CN201810494819.6A 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA Pending CN108932548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810494819.6A CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810494819.6A CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Publications (1)

Publication Number Publication Date
CN108932548A true CN108932548A (en) 2018-12-04

Family

ID=64449592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810494819.6A Pending CN108932548A (en) 2018-05-22 2018-05-22 A kind of degree of rarefication neural network acceleration system based on FPGA

Country Status (1)

Country Link
CN (1) CN108932548A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032866A1 (en) * 2016-07-28 2018-02-01 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNTAO LU et al.: "Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks", CASES '17 Companion *

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711532A (en) * 2018-12-06 2019-05-03 东南大学 A kind of accelerated method inferred for hardware realization rarefaction convolutional neural networks
CN111291875B (en) * 2018-12-06 2023-10-24 意法半导体(鲁塞)公司 Method and apparatus for determining memory size
CN109711532B (en) * 2018-12-06 2023-05-12 东南大学 Acceleration method for realizing sparse convolutional neural network inference aiming at hardware
CN111291875A (en) * 2018-12-06 2020-06-16 意法半导体(鲁塞)公司 Method and apparatus for determining memory size
CN111291871A (en) * 2018-12-10 2020-06-16 中科寒武纪科技股份有限公司 Computing device and related product
CN111353591A (en) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 Computing device and related product
CN109685205A (en) * 2018-12-26 2019-04-26 上海大学 A kind of depth network model accelerated method based on sparse matrix
CN111758104B (en) * 2019-01-29 2024-04-16 深爱智能科技有限公司 Neural network parameter optimization method and neural network calculation method and device suitable for hardware implementation
CN111758104A (en) * 2019-01-29 2020-10-09 深爱智能科技有限公司 Neural network parameter optimization method suitable for hardware implementation, neural network calculation method and device
CN111723922A (en) * 2019-03-20 2020-09-29 爱思开海力士有限公司 Neural network acceleration device and control method thereof
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN109978142B (en) * 2019-03-29 2022-11-29 腾讯科技(深圳)有限公司 Neural network model compression method and device
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN111860800A (en) * 2019-04-26 2020-10-30 爱思开海力士有限公司 Neural network acceleration device and operation method thereof
US11507642B2 (en) * 2019-05-02 2022-11-22 Silicon Storage Technology, Inc. Configurable input blocks and output blocks and physical layout for analog neural memory in deep learning artificial neural network
US20200349421A1 (en) * 2019-05-02 2020-11-05 Silicon Storage Technology, Inc. Configurable input blocks and output blocks and physical layout for analog neural memory in deep learning artificial neural network
CN111985632A (en) * 2019-05-24 2020-11-24 三星电子株式会社 Decompression apparatus and control method thereof
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
CN110458289B (en) * 2019-06-10 2022-06-10 北京达佳互联信息技术有限公司 Multimedia classification model construction method, multimedia classification method and device
CN110458289A (en) * 2019-06-10 2019-11-15 北京达佳互联信息技术有限公司 The construction method of multimedia class model, multimedia class method and device
CN110211121B (en) * 2019-06-10 2021-07-16 北京百度网讯科技有限公司 Method and device for pushing model
CN110211121A (en) * 2019-06-10 2019-09-06 北京百度网讯科技有限公司 Method and apparatus for pushing model
CN110796238A (en) * 2019-10-29 2020-02-14 上海安路信息科技有限公司 Convolutional neural network weight compression method and system
CN112749782A (en) * 2019-10-31 2021-05-04 上海商汤智能科技有限公司 Data processing method and related product
CN110889259A (en) * 2019-11-06 2020-03-17 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix
CN111026700B (en) * 2019-11-21 2022-02-01 清华大学 Memory computing architecture for realizing acceleration and acceleration method thereof
CN111026700A (en) * 2019-11-21 2020-04-17 清华大学 Memory computing architecture for realizing acceleration and acceleration method thereof
CN111078189A (en) * 2019-11-23 2020-04-28 复旦大学 Sparse matrix multiplication accelerator for recurrent neural network natural language processing
CN111078189B (en) * 2019-11-23 2023-05-02 复旦大学 Sparse matrix multiplication accelerator for cyclic neural network natural language processing
CN110909801A (en) * 2019-11-26 2020-03-24 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN110991631A (en) * 2019-11-28 2020-04-10 福州大学 Neural network acceleration system based on FPGA
CN112966807A (en) * 2019-12-13 2021-06-15 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN112966807B (en) * 2019-12-13 2022-09-16 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN113128658A (en) * 2019-12-31 2021-07-16 Tcl集团股份有限公司 Neural network processing method, accelerator and storage medium
CN113159272A (en) * 2020-01-07 2021-07-23 阿里巴巴集团控股有限公司 Method and system for processing neural network
CN111340206A (en) * 2020-02-20 2020-06-26 云南大学 Alexnet forward network accelerator based on FPGA
CN111368988B (en) * 2020-02-28 2022-12-20 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111368988A (en) * 2020-02-28 2020-07-03 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111507473B (en) * 2020-04-20 2023-05-12 上海交通大学 Pruning method and system based on Crossbar architecture
CN111507473A (en) * 2020-04-20 2020-08-07 上海交通大学 Pruning method and system based on Crossbar architecture
CN112101178A (en) * 2020-09-10 2020-12-18 电子科技大学 Intelligent SOC terminal assisting blind people in perceiving external environment
CN112381206A (en) * 2020-10-20 2021-02-19 广东电网有限责任公司中山供电局 Deep neural network compression method, system, storage medium and computer equipment
CN112508184B (en) * 2020-12-16 2022-04-29 重庆邮电大学 Design method of fast image recognition accelerator based on convolutional neural network
CN112508184A (en) * 2020-12-16 2021-03-16 重庆邮电大学 Design method of fast image recognition accelerator based on convolutional neural network
CN112906887A (en) * 2021-02-20 2021-06-04 上海大学 Sparse GRU neural network acceleration realization method and device
CN113112002A (en) * 2021-04-06 2021-07-13 济南大学 Design method of lightweight convolution accelerator based on FPGA
CN113159297A (en) * 2021-04-29 2021-07-23 上海阵量智能科技有限公司 Neural network compression method and device, computer equipment and storage medium
CN113159297B (en) * 2021-04-29 2024-01-09 上海阵量智能科技有限公司 Neural network compression method, device, computer equipment and storage medium
CN113657595B (en) * 2021-08-20 2024-03-12 中国科学院计算技术研究所 Neural network accelerator based on neural network real-time pruning
CN113657595A (en) * 2021-08-20 2021-11-16 中国科学院计算技术研究所 Neural network real-time pruning method and system and neural network accelerator
CN114581676A (en) * 2022-03-01 2022-06-03 北京百度网讯科技有限公司 Characteristic image processing method and device and storage medium
CN114581676B (en) * 2022-03-01 2023-09-26 北京百度网讯科技有限公司 Processing method, device and storage medium for feature image
CN116187408A (en) * 2023-04-23 2023-05-30 成都甄识科技有限公司 Sparse acceleration unit, calculation method and sparse neural network hardware acceleration system
CN116187408B (en) * 2023-04-23 2023-07-21 成都甄识科技有限公司 Sparse acceleration unit, calculation method and sparse neural network hardware acceleration system
CN116167425A (en) * 2023-04-26 2023-05-26 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium
CN116167425B (en) * 2023-04-26 2023-08-04 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN108932548A (en) A kind of degree of rarefication neural network acceleration system based on FPGA
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
CN106529670B (en) It is a kind of based on weight compression neural network processor, design method, chip
Su et al. Redundancy-reduced mobilenet acceleration on reconfigurable logic for imagenet classification
CN110378468A (en) A kind of neural network accelerator quantified based on structuring beta pruning and low bit
DE102019114243A1 (en) Architecture for deep neural networks using piecewise linear approximation
CN107832082A (en) A kind of apparatus and method for performing artificial neural network forward operation
CN106529668A (en) Operation device and method of accelerating chip which accelerates depth neural network algorithm
CN108268947A (en) For improving the device and method of the processing speed of neural network and its application
Barrois et al. The hidden cost of functional approximation against careful data sizing—A case study
CN109711539A (en) Operation method, device and Related product
CN106650925A (en) Deep learning framework Caffe system and algorithm based on MIC cluster
CN110163359A (en) A kind of computing device and method
CN109934336A (en) Neural network dynamic based on optimum structure search accelerates platform designing method and neural network dynamic to accelerate platform
Que et al. Mapping large LSTMs to FPGAs with weight reuse
Peyrl et al. Parallel implementations of the fast gradient method for high-speed MPC
CN110163350A (en) A kind of computing device and method
CN108304925A (en) A kind of pond computing device and method
CN110069444A (en) A kind of computing unit, array, module, hardware system and implementation method
CN109214502A (en) Neural network weight discretization method and system
CN109117455A (en) Computing device and method
CN112529165A (en) Deep neural network pruning method, device, terminal and storage medium
CN111091183B (en) Neural network acceleration system and method
CN112734020A (en) Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN209708122U (en) A kind of computing unit, array, module, hardware system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181204)