CN111126569A - Convolutional neural network device supporting pruning sparse compression and calculation method - Google Patents

Convolutional neural network device supporting pruning sparse compression and calculation method

Info

Publication number
CN111126569A
Authority
CN
China
Prior art keywords
weight
source data
neural network
processing unit
zero
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911312338.XA
Other languages
Chinese (zh)
Other versions
CN111126569B (en)
Inventor
丁永林
曹学成
廖湘萍
邱蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 52 Research Institute
Original Assignee
CETHIK Group Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETHIK Group Ltd filed Critical CETHIK Group Ltd
Priority to CN201911312338.XA priority Critical patent/CN111126569B/en
Publication of CN111126569A publication Critical patent/CN111126569A/en
Application granted granted Critical
Publication of CN111126569B publication Critical patent/CN111126569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolutional neural network device supporting pruning sparse compression and a calculation method. The convolutional neural network device comprises a weight buffer, a zero value detection unit, a source data buffer, a source data first-in first-out queue, a weight first-in first-out queue, a convolution block processing unit and a target data buffer. Unconstrained pruning and sparsification can be carried out according to the positions of the zero weights, so that unconstrained support for pruning and sparsification is realized, the pruning and sparsification effect is markedly improved, and the convolution calculation efficiency is improved.

Description

Convolutional neural network device supporting pruning sparse compression and calculation method
Technical Field
The application belongs to the field of artificial intelligence chip design and FPGA design, and particularly relates to a convolutional neural network device supporting pruning sparse compression and a calculation method.
Background
Artificial intelligence is a branch of computer science that has attracted increasing attention. By simulating human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), it takes the place of people in complex tasks that previously had to be accomplished by people.
Image recognition is a typical application field of artificial intelligence. Its goal is to let computers, rather than people, process large amounts of physical information and, through acquisition, preprocessing, and feature extraction of image information, finally classify it and even make decisions.
Convolutional Neural Networks (CNNs) are a class of deep feed-forward neural networks that include convolution operations. They are one of the representative algorithms of deep learning and are widely used in various image recognition algorithms.
Convolutional neural networks are inspired by the structure of the visual system: convolution kernels play the role of neurons, and the multilayer network models the transfer of information between neurons. Input-layer data passes through several hidden layers and the result is finally delivered to the output layer, realizing image recognition; the recognition accuracy generally grows with the number of hidden layers. The core operation of the whole network is the convolution layer, and the amount of convolution computation grows geometrically as the number of network layers increases. Reducing the amount of convolution computation is therefore vital for improving the performance of a convolutional neural network and saving its power consumption.
The conventional way to reduce the amount of convolution computation is mainly pruning and sparsification of the weight data. By pruning and sparsifying the weights, the redundancy of the convolution kernels can be reduced and the computational complexity lowered. In learning on the ImageNet data set, a convolutional neural network sparsified by 90% runs 2 to 10 times faster than a conventional network of the same structure, while the output classification accuracy drops by only 2%.
However, due to the limitations of the prior art, chip designs and FPGA designs cannot fully support pruning and sparsification of weight data with arbitrary structure; only certain specific sparsity patterns can be supported. Because too many constraints are imposed on pruning and sparsification at the algorithm level, the actual pruning and sparsification effect is greatly reduced.
Therefore, in chip design and FPGA design there is a need for unconstrained support of pruning and sparsification.
Disclosure of Invention
The convolutional neural network device and calculation method supporting pruning sparse compression provided by the application achieve unconstrained support for pruning and sparsification, markedly reduce the convolution computation load, and improve the convolution calculation efficiency.
In order to achieve the purpose, the technical scheme adopted by the application is as follows:
a convolutional neural network device supporting pruning sparse compression comprises a weight buffer, a zero value detection unit, a source data buffer, a source data first-in first-out queue, a weight first-in first-out queue, a convolutional block processing unit and a target data buffer, wherein:
the weight buffer is used for storing the weights of the convolutional neural network and outputting the weights to the zero value detection unit, and each weight corresponds to different position information;
the zero value detection unit is used for judging whether the received weight value is zero or not, outputting a non-zero weight value to the weight value first-in first-out queue and outputting position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer is used for storing source data of the convolutional neural network and outputting the corresponding source data to the source data first-in first-out queue according to the position information corresponding to the received nonzero weight;
the source data first-in first-out queue is used for storing the source data output by the source data buffer and outputting S source data to the convolution block processing unit according to a first-in first-out principle;
the weight first-in first-out queue is used for storing the non-zero weights output by the zero value detection unit and outputting the non-zero weights to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
and the target data buffer is used for buffering S target data output by the convolution block processing unit.
Preferably, the convolution block processing unit includes S multiply-accumulators, each multiply-accumulator for calculating a convolution operation of a single target data.
Preferably, the S multiply-accumulators perform their calculations in parallel.
Preferably, the pixels of the input layer of the convolutional neural network are M rows × N columns, and the number S of multiply-accumulators satisfies one of the following relations:
S > N;
or, S = N;
or, S < N.
The application also provides a calculation method based on the convolutional neural network device supporting pruning sparse compression according to any one of the above technical solutions. The calculation method involves the weight buffer, the zero value detection unit, the source data buffer, the source data first-in first-out queue, the weight first-in first-out queue, the convolution block processing unit and the target data buffer, and includes:
outputting weights to the zero value detection unit through the weight buffer, wherein each weight corresponds to different position information;
the zero value detection unit judges whether the received weight value is zero or not, outputs a non-zero weight value to the weight value first-in first-out queue, and outputs position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer outputs corresponding source data to the source data first-in first-out queue according to the received position information corresponding to the nonzero weight value;
the source data first-in first-out queue outputs S source data to the convolution block processing unit according to the first-in first-out principle;
the weight first-in first-out queue outputs a non-zero weight to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
the target data buffer buffers the S target data output by the convolution block processing unit.
Preferably, the convolution block processing unit includes S multiply-accumulators, each multiply-accumulator for calculating a convolution operation of a single target data.
Preferably, the S multiply-accumulators perform their calculations in parallel.
Preferably, the pixels of the input layer of the convolutional neural network are M rows × N columns, and the number S of multiply-accumulators satisfies one of the following relations:
S > N;
or, S = N;
or, S < N.
According to the convolutional neural network device and calculation method supporting pruning sparse compression, before the convolution operation it is judged whether each weight is zero, and according to the result only the non-zero weights and the source data corresponding to them are output. Pruning and sparsification thus take the input data as the entry point, reducing the redundancy of the convolution kernels and lowering the computational complexity. At the same time, unconstrained pruning and sparsification can be performed according to the positions of the zero weights, realizing unconstrained support for pruning and sparsification, markedly improving the pruning and sparsification effect, and improving the convolution calculation efficiency.
Drawings
FIG. 1 is a schematic structural diagram of a convolutional neural network device supporting pruning sparsification compression according to the present application;
FIG. 2 is a schematic diagram of the internal structure of the convolution block processing unit according to the present application;
FIG. 3 is a schematic diagram of the operation of a single multiply-accumulator according to the present application;
FIG. 4 is a schematic diagram of the present application in which 1024 multiply-accumulate units complete the convolution operations that generate an entire row of target data of the output layer.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
As shown in fig. 1, in one embodiment, a convolutional neural network device supporting pruning sparsification compression is provided, which can achieve unconstrained support for pruning and sparsification.
Specifically, the Convolutional Neural Network (CNN) device supporting pruning sparse compression of the present embodiment includes a weight buffer (WGT BUF), a zero value detection unit, a source data buffer (SRC BUF), a source data first-in first-out queue (SRC FIFO), a weight first-in first-out queue (WGT FIFO), a convolution BLOCK processing unit (CONV BLOCK), and a target data buffer (DES BUF), where:
the weight buffer is used for storing the weights of the convolutional neural network and outputting the weights to the zero value detection unit, and each weight corresponds to different position information;
the zero value detection unit is used for judging whether the received weight value is zero or not, outputting a non-zero weight value to the weight value first-in first-out queue and outputting position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer is used for storing source data of the convolutional neural network and outputting the corresponding source data to the source data first-in first-out queue according to the position information corresponding to the received nonzero weight;
the source data first-in first-out queue is used for storing the source data output by the source data buffer and outputting S source data to the convolution block processing unit according to a first-in first-out principle;
the weight first-in first-out queue is used for storing the non-zero weights output by the zero value detection unit and outputting the non-zero weights to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
and the target data buffer is used for buffering S target data output by the convolution block processing unit.
The above-mentioned non-zero weight is understood to be a weight whose value is non-zero, and a zero weight is understood to be a weight whose value is zero.
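For clarity, the following is a minimal software sketch of this dataflow, assuming plain Python lists stand in for the hardware buffers and FIFOs; the function names (zero_value_detect, fetch_source, conv_block) and the toy numbers are illustrative and not taken from the patent.

def zero_value_detect(weights):
    """Keep only the non-zero weights; report their positions to the source data buffer."""
    wgt_fifo, positions = [], []
    for pos, w in enumerate(weights):
        if w != 0:
            wgt_fifo.append(w)      # non-zero weight -> weight first-in first-out queue
            positions.append(pos)   # position info   -> source data buffer
    return wgt_fifo, positions

def fetch_source(src_buf, positions):
    """For each surviving weight position, fetch the S source values it multiplies."""
    return [src_buf[pos] for pos in positions]   # each entry holds S source values

def conv_block(wgt_fifo, src_fifo, S):
    """S multiply-accumulators computing S target data in parallel."""
    acc = [0.0] * S
    for w, src_group in zip(wgt_fifo, src_fifo):
        for i in range(S):
            acc[i] += w * src_group[i]
    return acc

# Toy example: 4 weights (two pruned to zero), S = 3 target data.
S = 3
weights = [0.5, 0.0, -0.2, 0.0]
src_buf = [[1, 2, 3], [9, 9, 9], [4, 5, 6], [9, 9, 9]]   # per-position source groups
wgt_fifo, positions = zero_value_detect(weights)
src_fifo = fetch_source(src_buf, positions)
des_buf = conv_block(wgt_fifo, src_fifo, S)   # only non-zero weights reach this stage
print(des_buf)                                # approximately [-0.3, 0.0, 0.3]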
As shown in FIG. 2, in one embodiment, the convolution block processing unit includes S multiply-accumulators, each for calculating the convolution operation of a single target data. The S multiply-accumulators compute in parallel, that is, the convolution block processing unit can carry out the convolution operations of S target data at once, which improves the computation efficiency.
It should be noted that the number of multiply-accumulate units included in the convolution block processing unit is adjusted according to the computing capability and computing requirement of the computer device.
In one embodiment, if the pixels of the input layer of the convolutional neural network are M rows × N columns, the number S of multiply-accumulators satisfies one of the following relations:
S > N; or, S = N; or, S < N.
According to the above relations, the number of parallel computations performed by the convolution block processing unit at a time can be larger than, equal to, or smaller than the number of pixels in one row of the input layer, i.e. there is no necessary constraint between the two.
To facilitate an understanding of the apparatus of the present application, further details are provided by the following examples.
Example 1
Suppose the input layer of the convolutional neural network has 768 rows × 1024 columns of pixels, 4 input channels, 3 × 3 convolution kernels, a stride of 1, and 1 output channel; the output layer of the network then also has 768 rows × 1024 columns of pixels.
As shown in fig. 3, first, a multiply-accumulator is taken as an example for explanation:
When the target data in row m, column n of the output layer needs to be calculated, the source data around row m, column n is taken. For each input channel, the positions of the 3 × 3 convolution kernel are defined, from top-left to bottom-right, as (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2). The source data window contributing to the target data in row m, column n of each input channel is likewise a 3 × 3 structure, whose positions are (m-1,n-1), (m-1,n), (m-1,n+1), (m,n-1), (m,n), (m,n+1), (m+1,n-1), (m+1,n), (m+1,n+1). The weight at each kernel position is multiplied by the source datum at the corresponding window position; that is, according to the convolution formula, the target data y(m,n) in row m, column n of the output layer is:
y(m,n) =
// input channel 1
(x(m-1,n-1)*w(0,0) + x(m-1,n)*w(0,1) + x(m-1,n+1)*w(0,2) +
x(m,n-1)*w(1,0) + x(m,n)*w(1,1) + x(m,n+1)*w(1,2) +
x(m+1,n-1)*w(2,0) + x(m+1,n)*w(2,1) + x(m+1,n+1)*w(2,2))
+
// input channel 2
(x(m-1,n-1)*w(0,0) + x(m-1,n)*w(0,1) + x(m-1,n+1)*w(0,2) +
x(m,n-1)*w(1,0) + x(m,n)*w(1,1) + x(m,n+1)*w(1,2) +
x(m+1,n-1)*w(2,0) + x(m+1,n)*w(2,1) + x(m+1,n+1)*w(2,2))
+
// input channel 3
(x(m-1,n-1)*w(0,0) + x(m-1,n)*w(0,1) + x(m-1,n+1)*w(0,2) +
x(m,n-1)*w(1,0) + x(m,n)*w(1,1) + x(m,n+1)*w(1,2) +
x(m+1,n-1)*w(2,0) + x(m+1,n)*w(2,1) + x(m+1,n+1)*w(2,2))
+
// input channel 4
(x(m-1,n-1)*w(0,0) + x(m-1,n)*w(0,1) + x(m-1,n+1)*w(0,2) +
x(m,n-1)*w(1,0) + x(m,n)*w(1,1) + x(m,n+1)*w(1,2) +
x(m+1,n-1)*w(2,0) + x(m+1,n)*w(2,1) + x(m+1,n+1)*w(2,2))
where x and w in each block denote the source data and weights of the corresponding input channel.
According to the above calculation process, after 3 × 3 × 4 = 36 cycles the multiply-accumulator completes the convolution operation for the target data in row m, column n of the output layer.
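For illustration, a minimal Python sketch of this per-pixel computation follows, assuming dense (all non-zero) weights; the array names, shapes and the function target_pixel are illustrative and not taken from the patent.

import numpy as np

C, K = 4, 3                        # input channels, kernel size
x = np.random.rand(C, 770, 1026)   # source data with a 1-pixel border (illustrative padding)
w = np.random.rand(C, K, K)        # one 3 x 3 kernel per input channel

def target_pixel(m, n):
    """Dense convolution for one output position (m, n): 3 x 3 x 4 = 36 MAC steps."""
    acc, cycles = 0.0, 0
    for c in range(C):
        for i in range(K):
            for j in range(K):
                acc += x[c, m - 1 + i, n - 1 + j] * w[c, i, j]
                cycles += 1        # one multiply-accumulate per cycle
    return acc, cycles

y_mn, cycles = target_pixel(m=5, n=7)
print(cycles)                      # 36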
The above assumes that all weights w(0,0) to w(2,2) are non-zero. In the present application, however, the weight first-in first-out queue stores only the non-zero weights: zero weights are discarded, and the source data at the positions corresponding to zero weights is discarded as well. In other words, only the non-zero weights and the corresponding source data take part in the convolution operation, which markedly improves the operation efficiency.
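Continuing the same illustrative sketch (and reusing x, w, C and K from above), the zero-skipping variant below streams only the weight/position pairs that survive zero-value detection, so the number of multiply-accumulate cycles shrinks with the sparsity.

def sparse_target_pixel(m, n):
    """Zero-skipping convolution for one output position (illustrative sketch)."""
    # Zero-value detection: keep (channel, i, j, weight) only where the weight is non-zero.
    nonzero = [(c, i, j, w[c, i, j])
               for c in range(C) for i in range(K) for j in range(K)
               if w[c, i, j] != 0]
    acc = 0.0
    for c, i, j, wv in nonzero:                  # one cycle per non-zero weight
        acc += x[c, m - 1 + i, n - 1 + j] * wv
    return acc, len(nonzero)                     # cycles = number of non-zero weights

# With 80% of the weights pruned to zero, roughly 36 * 0.2 = 7.2 cycles remain on average.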
On the basis of one multiply-accumulator, S multiply-accumulators are taken as an example for explanation:
As shown in FIG. 4, in a preferred embodiment S = N is set, that is, the convolution block processing unit includes 1024 multiply-accumulators; it calculates the convolution operations of 1024 target data at a time and generates an entire row of target data of the output layer.
At this point the source data first-in first-out queue outputs S source data at a time; each source datum is fed into one multiply-accumulator and convolved with the convolution kernel inside it. The convolution kernels in all multiply-accumulators are identical, so the S target data are calculated simultaneously, and when the calculation finishes an entire row of target data of the output layer is obtained.
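Reusing the illustrative arrays above, the following vectorized sketch shows this whole-row mode: each non-zero weight is broadcast to S multiply-accumulators, each of which accumulates one target datum of row m, so one cycle is spent per non-zero weight.

def sparse_target_row(m, S=1024):
    """Whole-row mode (S = N): S accumulators, one per output column of row m."""
    acc = np.zeros(S)
    cycles = 0
    for c in range(C):
        for i in range(K):
            for j in range(K):
                wv = w[c, i, j]
                if wv == 0:
                    continue                        # pruned weight: skipped entirely
                src = x[c, m - 1 + i, j : j + S]    # S source values for this weight position
                acc += wv * src                     # S multiply-accumulates in parallel (one cycle)
                cycles += 1
    return acc, cycles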
For the convolutional neural network device of the present application, another possible solution is to set S < N, i.e. S is chosen smaller than the number of target data in one entire row of the output layer, here S < 1024.
For example, S is chosen as 200, i.e. the convolution block processing unit includes 200 multiply-accumulators, and one pass simultaneously calculates the convolution operations of 200 target data, producing part of one row of the output layer. After the current pass finishes, the next 200 source data are taken from the not-yet-calculated data and convolved, until all data have been processed.
If fewer than 200 source data remain in the last pass, for example only 32, the calculation is performed by only the first 32 multiply-accumulators, or the 32 remaining data are assigned to any 32 of the multiply-accumulators.
For the convolutional neural network device of the present application, yet another possible solution is to set S > N, i.e. S is chosen larger than the number of target data in one entire row of the output layer, here S > 1024.
For example, S is chosen as 2000, that is, the convolution block processing unit is composed of 2000 multiply-accumulators, and one pass simultaneously calculates the convolution operations of 2000 target data, covering more than one entire row of target data of the output layer. After the pass finishes, the next 2000 source data are taken from the not-yet-calculated data and convolved, until all data have been processed.
If fewer than 2000 source data remain in the last pass, for example only 432, the calculation is performed by only the first 432 multiply-accumulators, or the 432 remaining data are assigned to any 432 of the multiply-accumulators.
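A short sketch of this tiling, assuming the 768 × 1024 target data of the output layer are processed in flattened row-major order, S at a time; the chunking logic and names are illustrative.

M, N = 768, 1024            # output layer: rows x columns
total = M * N               # 786432 target data in row-major order
S = 2000                    # number of multiply-accumulators in the convolution block

passes = 0
for start in range(0, total, S):
    chunk = min(S, total - start)   # the last pass may use fewer multiply-accumulators (here 432)
    # the convolution block would compute `chunk` target data in parallel in this pass
    passes += 1
print(passes)               # 394 passes; 786432 - 393 * 2000 = 432 remain for the final pass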
In an actual convolutional neural network compressed by a basic pruning and sparsification algorithm, the non-zero weights are only about 1/5 of all weights. After the zero weights are removed, the convolution block processing unit therefore needs on average only 7.2 cycles (36 × 1/5) to complete the convolution operations of S target data.
Since the data are pruned according to the positions of the zero weights, unconstrained support for pruning and sparsification is realized, the pruning and sparsification effect is markedly improved, and the convolution calculation efficiency is improved.
A convolutional neural network apparatus supporting pruning sparsification compression may be a computer device, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In another embodiment, a calculation method is provided, where the calculation method is implemented based on the foregoing convolutional neural network device supporting pruning sparse compression, that is, the calculation method involves a weight buffer, a zero detection unit, a source data buffer, a source data first-in first-out queue, a weight first-in first-out queue, a convolutional block processing unit, and a target data buffer, and the calculation method includes:
outputting weights to the zero value detection unit through the weight buffer, wherein each weight corresponds to different position information;
the zero value detection unit judges whether the received weight value is zero or not, outputs a non-zero weight value to the weight value first-in first-out queue, and outputs position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer outputs corresponding source data to the source data first-in first-out queue according to the received position information corresponding to the nonzero weight value;
the source data first-in first-out queue outputs S source data to the convolution block processing unit according to the first-in first-out principle;
the weight first-in first-out queue outputs a non-zero weight to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
the target data buffer buffers the S target data output by the convolution block processing unit.
Specifically, the convolution block processing unit includes S multiply-accumulators, each of which is used to calculate a convolution operation of a single target data, and S of the multiply-accumulators are calculated in parallel.
If the pixels of the input layer of the convolutional neural network are M rows × N columns, the number S of multiply-accumulators satisfies one of the following relations:
S > N; or, S = N; or, S < N. That is, the number of multiply-accumulators in the convolution block processing unit can be adjusted according to the actual situation.
In the calculation method provided by this embodiment, before the convolution operation it is judged whether each weight is zero, and according to the result only the non-zero weights and the source data corresponding to them are output. Pruning and sparsification thus take the input data as the entry point, reducing the redundancy of the convolution kernels and lowering the computational complexity. At the same time, unconstrained pruning and sparsification can be performed according to the positions of the zero weights, markedly improving the pruning and sparsification effect and improving the convolution calculation efficiency.
For further limitation of the calculation method, reference may be made to the above-mentioned limitation on the convolutional neural network device supporting pruning sparseness compression, and details thereof are not repeated here.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A convolutional neural network device supporting pruning sparse compression, characterized in that the convolutional neural network device comprises a weight buffer, a zero value detection unit, a source data buffer, a source data first-in first-out queue, a weight first-in first-out queue, a convolution block processing unit and a target data buffer, wherein:
the weight buffer is used for storing the weights of the convolutional neural network and outputting the weights to the zero value detection unit, and each weight corresponds to different position information;
the zero value detection unit is used for judging whether the received weight value is zero or not, outputting a non-zero weight value to the weight value first-in first-out queue and outputting position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer is used for storing source data of the convolutional neural network and outputting the corresponding source data to the source data first-in first-out queue according to the position information corresponding to the received nonzero weight;
the source data first-in first-out queue is used for storing the source data output by the source data buffer and outputting S source data to the convolution block processing unit according to a first-in first-out principle;
the weight first-in first-out queue is used for storing the non-zero weights output by the zero value detection unit and outputting the non-zero weights to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
and the target data buffer is used for buffering S target data output by the convolution block processing unit.
2. The convolutional neural network device supporting pruning sparsification compression as claimed in claim 1, wherein the convolutional block processing unit comprises S multiply-accumulators, each multiply-accumulator for calculating a convolution operation of a single target data.
3. The convolutional neural network device supporting pruning sparsification compression as claimed in claim 2, wherein the S multiply-accumulators perform their calculations in parallel.
4. The convolutional neural network device supporting pruning sparsification compression as claimed in claim 2, wherein the pixels of the input layer of the convolutional neural network are M rows × N columns, and the number S of multiply-accumulators satisfies one of the following relations:
S > N;
or, S = N;
or, S < N.
5. A calculation method based on the convolutional neural network device supporting pruning sparse compression according to claim 1, the calculation method involving a weight buffer, a zero value detection unit, a source data buffer, a source data first-in first-out queue, a weight first-in first-out queue, a convolution block processing unit, and a target data buffer, the calculation method comprising:
outputting weights to the zero value detection unit through the weight buffer, wherein each weight corresponds to different position information;
the zero value detection unit judges whether the received weight value is zero or not, outputs a non-zero weight value to the weight value first-in first-out queue, and outputs position information corresponding to the non-zero weight value to the source data buffer;
the source data buffer outputs corresponding source data to the source data first-in first-out queue according to the received position information corresponding to the nonzero weight value;
the source data first-in first-out queue outputs S source data to the convolution block processing unit according to the first-in first-out principle;
the weight first-in first-out queue outputs a non-zero weight to the convolution block processing unit according to the first-in first-out principle;
the convolution block processing unit is used for calculating convolution operation of S target data in parallel according to the received nonzero weight and S source data and outputting the S target data obtained through calculation to the target data buffer;
the target data buffer buffers the S target data output by the convolution block processing unit.
6. The calculation method of claim 5, wherein the convolution block processing unit includes S multiply-accumulators, each multiply-accumulator for calculating a convolution operation of a single target data.
7. The calculation method of claim 6, wherein the S multiply-accumulators perform their calculations in parallel.
8. The calculation method of claim 6, wherein the pixels of the input layer of the convolutional neural network are M rows × N columns, and the number S of multiply-accumulators satisfies one of the following relations:
S > N;
or, S = N;
or, S < N.
CN201911312338.XA 2019-12-18 2019-12-18 Convolutional neural network device supporting pruning sparse compression and calculation method Active CN111126569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911312338.XA CN111126569B (en) 2019-12-18 2019-12-18 Convolutional neural network device supporting pruning sparse compression and calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911312338.XA CN111126569B (en) 2019-12-18 2019-12-18 Convolutional neural network device supporting pruning sparse compression and calculation method

Publications (2)

Publication Number Publication Date
CN111126569A true CN111126569A (en) 2020-05-08
CN111126569B CN111126569B (en) 2022-11-11

Family

ID=70498305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911312338.XA Active CN111126569B (en) 2019-12-18 2019-12-18 Convolutional neural network device supporting pruning sparse compression and calculation method

Country Status (1)

Country Link
CN (1) CN111126569B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738435A (en) * 2020-06-22 2020-10-02 上海交通大学 Online sparse training method and system based on mobile equipment
CN113592072A (en) * 2021-07-26 2021-11-02 中国人民解放军国防科技大学 Sparse convolution neural network accelerator oriented to memory access optimization
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
US20180218518A1 (en) * 2017-02-01 2018-08-02 Nvidia Corporation Data compaction and memory bandwidth reduction for sparse neural networks
CN108510066A (en) * 2018-04-08 2018-09-07 清华大学 A kind of processor applied to convolutional neural networks
CN110070178A (en) * 2019-04-25 2019-07-30 北京交通大学 A kind of convolutional neural networks computing device and method
CN110222835A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of convolutional neural networks hardware system and operation method based on zero value detection
US20190347554A1 (en) * 2018-05-14 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus for universal pruning and compression of deep convolutional neural networks under joint sparsity constraints

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
US20180218518A1 (en) * 2017-02-01 2018-08-02 Nvidia Corporation Data compaction and memory bandwidth reduction for sparse neural networks
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108510066A (en) * 2018-04-08 2018-09-07 清华大学 A kind of processor applied to convolutional neural networks
US20190347554A1 (en) * 2018-05-14 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus for universal pruning and compression of deep convolutional neural networks under joint sparsity constraints
CN110070178A (en) * 2019-04-25 2019-07-30 北京交通大学 A kind of convolutional neural networks computing device and method
CN110222835A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of convolutional neural networks hardware system and operation method based on zero value detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANGSHUMAN PARASHAR ET AL.: "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks", 《HTTPS://ARXIV.ORG/ABS/1708.04485》 *
L. LU ET AL.: "An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs", 《2019 IEEE 27TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738435A (en) * 2020-06-22 2020-10-02 上海交通大学 Online sparse training method and system based on mobile equipment
CN111738435B (en) * 2020-06-22 2024-03-29 上海交通大学 Online sparse training method and system based on mobile equipment
CN113592072A (en) * 2021-07-26 2021-11-02 中国人民解放军国防科技大学 Sparse convolution neural network accelerator oriented to memory access optimization
CN113592072B (en) * 2021-07-26 2024-05-14 中国人民解放军国防科技大学 Sparse convolutional neural network accelerator for memory optimization
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation

Also Published As

Publication number Publication date
CN111126569B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN111684473B (en) Improving performance of neural network arrays
CN111126569B (en) Convolutional neural network device supporting pruning sparse compression and calculation method
US11307865B2 (en) Data processing apparatus and method
CN110050267B (en) System and method for data management
CN109478144B (en) Data processing device and method
Ju et al. An FPGA implementation of deep spiking neural networks for low-power and fast classification
CN109543832B (en) Computing device and board card
KR102637735B1 (en) Neural network processing unit including approximate multiplier and system on chip including the same
CN110163357B (en) Computing device and method
CN113051216B (en) MobileNet-SSD target detection device and method based on FPGA acceleration
CN110163350B (en) Computing device and method
CN110766127B (en) Neural network computing special circuit and related computing platform and implementation method thereof
CN112115801B (en) Dynamic gesture recognition method and device, storage medium and terminal equipment
CN111353591A (en) Computing device and related product
US20210089888A1 (en) Hybrid Filter Banks for Artificial Neural Networks
Vinh et al. Facial expression recognition system on SoC FPGA
CN114925320B (en) Data processing method and related device
Maurya et al. Complex human activities recognition based on high performance 1D CNN model
Gonzalez et al. An inference hardware accelerator for EEG-based emotion detection
Feng et al. An Efficient Model-Compressed EEGNet Accelerator for Generalized Brain-Computer Interfaces With Near Sensor Intelligence
CN110716751B (en) High-parallelism computing platform, system and computing implementation method
CN112836793B (en) Floating point separable convolution calculation accelerating device, system and image processing method
US20200104207A1 (en) Data processing apparatus and method
Tapiador-Morales et al. Event-based row-by-row multi-convolution engine for dynamic-vision feature extraction on fpga
CN111382848A (en) Computing device and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20200519

Address after: No. 36, Ma Cheng Road, Hangzhou City, Zhejiang Province, 310012

Applicant after: NO.52 RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY GROUP Corp.

Address before: Yuhang District, Hangzhou City, Zhejiang Province, 311121 West No. 1500 Building 1 room 311

Applicant before: CETHIK GROUP Co.,Ltd.

TA01 Transfer of patent application right
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant