CN113128675A - Multiplication-free convolution scheduler based on impulse neural network and hardware implementation method thereof - Google Patents
- Publication number
- CN113128675A (application CN202110431741.5A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolution
- calculation
- neuron
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks:
- G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 Learning methods
Abstract
The invention provides a multiplication-free convolution scheduler based on a spiking neural network (SNN) and a hardware implementation method thereof, which exploit the event-driven nature of the SNN to realize convolution calculation in hardware and provide an effective convolution scheduling method for SNNs in image segmentation. The method buffers the input neuron states in a FIFO and sends them to a '1' filter that passes only the valid states, so that invalid states never participate in the calculation; this improves calculation efficiency and requires no multiplication. A parallel storage structure tailored to the characteristics of the data stream achieves parallel storage with few storage resources, matching the high parallelism of the calculation unit. During calculation, the result of each time step is stored back in place, which improves the utilization of storage resources. The scheduler realizes 3 × 3 convolution calculation on inputs of any size to the spiking neural network and supports 64-way parallel calculation. The method improves the performance of convolution calculation in the neural network, reduces calculation complexity and power consumption, and offers high flexibility.
Description
Technical Field
The invention relates to the field of convolutional neural network algorithms, and in particular to a multiplication-free convolution scheduler based on a spiking neural network and a hardware implementation method thereof.
Background
In recent years, neural networks have grown steadily more complex. A traditional artificial neural network (ANN) typically has a huge number of parameters that must participate in matrix multiplications, so it consumes large amounts of memory and power when implemented on a hardware platform. Compared with the traditional ANN, the spiking neural network (SNN), inspired by the biological brain, recreates neurons in hardware more directly: each neuron has thousands of synapses, and the spike connections between neurons are binary, which means no multiplication is needed and the operations can be completed with additions alone. At the same time, the SNN is event-driven, so the sparsity of activated neurons can be exploited to pursue higher efficiency and lower power consumption.
In a spiking neural network (SNN), each neuron carries a membrane potential that is updated per time step. In each step, the inputs to a neuron are the spike activation signals of the neurons in the previous layer; the new membrane potential is obtained by accumulating the weights corresponding to the valid neurons within the convolution kernel, and when the membrane potential reaches a threshold, the neuron is activated and emits a spike activation signal to the downstream neurons.
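The per-time-step update just described can be summarized in a short behavioral sketch. This is a minimal model, not the patented circuit: the threshold value and the reset-to-zero behavior are illustrative assumptions, since the text fixes neither.

```python
import numpy as np

def update_neuron(vmem, in_states, weights, threshold=1.0):
    """One time-step update of a single output neuron.

    vmem      -- membrane potential carried over from the previous time step
    in_states -- binary (0/1) spike states of the previous-layer neurons
                 covered by the kernel
    weights   -- corresponding synaptic weights (same shape as in_states)
    threshold -- firing threshold (illustrative value)
    """
    # Spikes are binary, so the weighted sum degenerates into a plain
    # accumulation of the weights where the state is 1: no multiplication.
    vmem = vmem + weights[in_states == 1].sum()
    fired = vmem >= threshold
    if fired:
        vmem = 0.0  # reset on firing (assumed reset scheme)
    return vmem, int(fired)
```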
By exploiting these advantages of SNNs, realizing an efficient SNN biological model in hardware has gradually become a research hotspot. However, current CPU and GPU architectures execute SNNs far from ideally: compared with the architecture a CPU provides, an SNN needs much higher parallelism, while a GPU can deliver highly parallel computation but is ill-suited to an event-driven computation mode. Moreover, although an SNN can achieve higher efficiency by reducing latency and computational effort, it must receive and process input data across multiple time steps, and the repeated data accesses can result in low throughput. In addition, the activated neurons of an SNN are sparse, so its data storage structure and control flow also need special consideration.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides a multiplication-free convolution scheduler based on a spiking neural network and a hardware implementation method thereof. The input neuron states are buffered in a FIFO and sent to a '1' filter that passes only the valid states, so that invalid states do not participate in the calculation, calculation efficiency is improved, and no multiplication is needed. A parallel storage structure tailored to the characteristics of the data stream achieves parallel storage with few storage resources, matching the high parallelism of the calculation unit. During calculation, the result of each time step is stored back in place, improving the utilization of storage resources. Finally, 3 × 3 convolution calculation on inputs of any size to the spiking neural network is realized, with support for 64-way parallel calculation.
The technical scheme is as follows: a multiplication-free convolution scheduler based on a spiking neural network comprises a processor, an external DDR memory and a hardware accelerator, wherein the hardware accelerator comprises a convolution controller, a storage unit and a calculation unit;
the convolution controller is responsible for decoding instructions from the processor, controlling the overall execution of the convolution calculation, reading and writing data from the storage unit, managing the input and output of the calculation unit, and updating the neuron states according to the data read;
the storage unit comprises three separate parts, storing the neuron states (NeuronState), the membrane potentials (Vmem) and the synaptic weights (Weight), respectively;
the calculation unit bears most of the computational load: it processes the spike signals emitted by the valid neurons of the previous layer, judges from them whether the neurons in the current layer are activated, and finally updates the neuron states.
In a further embodiment, the convolution controller, when managing the input and output of the calculation unit, considers only the influence of activated neurons on the next layer and keeps non-activated neurons out of the calculation, thereby effectively saving calculation time.
In a further embodiment, the convolution controller controls the overall execution of the convolution calculation as follows: first, the neuron states corresponding to one kernel are read into the FIFO in order; next, the FIFO entry is sent to the '1' filter, which filters out the indexes of the neurons whose state value is 1 within the kernel and decodes each index into a weight address; then the corresponding Weight and Vmem values are fetched from the storage unit according to the weight address; finally, the values are sent to the calculation unit for calculation, and the resulting Vmem and State are stored back in place.
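The four phases of this control flow can be modeled in a few lines. A minimal sketch, assuming the index-to-address decode is the identity and collapsing the FIFO handshake; all names are illustrative:

```python
def schedule_kernel(states, weight_mem, vmem):
    """Behavioral model of the convolution controller's flow for one kernel.

    states     -- binary neuron states of one kernel, already in FIFO order
    weight_mem -- weight storage, indexed by the decoded weight address
    vmem       -- membrane potential of the target neuron before this step
    """
    fifo = list(states)                # phase 1: states buffered into the FIFO
    for idx, s in enumerate(fifo):     # phase 2: the '1' filter emits only the
        if s == 1:                     #          indexes of neurons in state 1
            waddr = idx                # index decoded to a weight address (identity here)
            vmem += weight_mem[waddr]  # phase 3: fetch Weight, accumulate onto Vmem
    return vmem                        # phase 4: result written back in place
```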
In a further embodiment, a comparatively small RAM is used and the storage structure of NeuronState is specially arranged, so that one layer of NeuronState within a kernel can be fetched 8 ways in parallel; with three levels of buffering, all the NeuronState values of a kernel can be fetched in one cycle and stored into the FIFO, achieving pipelined access.
In a further embodiment, each position in the FIFO buffers N neuron states, where N is the number of weights corresponding to one kernel and '1' denotes an activated neuron. With M activated neurons in the kernel, the '1' filter extracts the indexes of the M neurons in state '1' in M cycles, so the processing speed is N/M times that of a conventional CNN; for sparse data streams, i.e. when N/M is very large, this yields a strong optimization effect.
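A minimal sketch of such a '1' filter, modeled as a priority encoder that isolates and clears the lowest set bit each cycle; the assertion reuses the 27-bit example value given in embodiment two below:

```python
def one_filter(word):
    """Emit one index per cycle; M set bits cost exactly M cycles."""
    indexes = []
    while word:
        lsb = word & -word                    # isolate the lowest '1' (one cycle)
        indexes.append(lsb.bit_length() - 1)  # its bit position is the index
        word &= word - 1                      # clear it and continue
    return indexes

# 27-bit example value used in embodiment two (5 ones -> 5 cycles):
word = int("000_010_000_101_000_100_000_000_010".replace("_", ""), 2)
assert one_filter(word) == [1, 11, 15, 17, 22]
```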
In a further embodiment, the convolution module adopts 64-way parallel calculation and supports 3 × 3 convolution calculation on inputs of any size: 8 controllers compute the convolution results of 8 rows, and each controller is responsible for the results of 8 channels, where one channel is one channel of the convolution result; all channels share the same input neuron states and correspond to different weights.
In a further embodiment, Vmem and Weight require 8-way parallel storage to match the high parallelism of the calculation unit. The Vmem data are stored row by row across the RAMs numbered 0-7, so that 8 rows of source data can be supplied simultaneously during calculation; within each RAM word, the Vmem results of channels 1-8 are packed in order from high to low, and if there are more than 8 channels, channels 9-16 are stored after the results of channels 1-8, and so on.
In a further embodiment, the Weight specification is 3 × 3 × InC, where InC is the number of channels of the input layer; each layer of the result shares the same Weight values, and there are OutC weight kernels of specification 3 × 3 × InC, where OutC is the number of channels of the result. Each RAM word stores the Weight values of kernels 1-8 in order from high to low, and if OutC is larger than 8, storage continues downwards.
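The two layouts can be made concrete with a small address-mapping sketch. Only the row-to-RAM distribution and the high-to-low channel packing are fixed by the text; the concrete address arithmetic, the rows_per_ram parameter and the 16-bit per-channel width are assumptions for illustration:

```python
def vmem_location(row, col, ch, n_cols, rows_per_ram, ch_bits=16):
    """Map (row, col, channel) of the output layer to (RAM id, address, bit field)."""
    ram_id = row % 8                    # rows distributed over RAMs 0-7
    group, slot = divmod(ch, 8)         # channels packed 8 to a word,
    bit_hi = ch_bits * (8 - slot) - 1   # first channel of the group in the top bits
    addr = group * rows_per_ram * n_cols + (row // 8) * n_cols + col
    return ram_id, addr, (bit_hi, bit_hi - ch_bits + 1)
```

The Weight words follow the same packing idea: one word holds the weights of 8 kernels from high to low, with further groups of 8 stored at the following addresses.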
To sum up, the hardware implementation method of the multiplication-free convolution scheduler based on the spiking neural network comprises the following steps:
step 1, reading the neuron states corresponding to one kernel into the FIFO in order;
step 2, sending the FIFO entry into the '1' filter, decoding the addresses of the valid weights, and fetching the next entry once filtering is finished;
step 3, fetching the corresponding Weight and Vmem values from the storage unit according to the decoded addresses;
and step 4, sending the Weight and Vmem into the calculation unit, and storing the resulting Vmem and NeuronState back to the storage unit in place.
Advantageous effects:
first, the convolution controller achieves pipelined access to NeuronState: with parallel storage and buffering of NeuronState, all the states corresponding to one kernel can be stored into the FIFO in one beat, which greatly reduces the time spent moving data and raises the data throughput;
second, the '1' filter exploits the sparsity of activated neurons to keep inactive neurons out of the calculation, which effectively saves calculation time and achieves low power consumption alongside high efficiency;
third, in an SNN the spike connections between neurons are binary, so no hardware multiplier is needed, reducing computational resources and the complexity of the hardware implementation.
In summary, the present invention exploits the advantages of the SNN to provide a hardware implementation of convolution calculation in the SNN, which can effectively improve the performance of convolution calculation in a neural network, reduce calculation complexity, obtain higher efficiency and lower power consumption, and offer high flexibility.
Drawings
Fig. 1 is a hardware configuration diagram of the convolution scheduler in the present invention.
Fig. 2 is a control flow diagram of the convolution scheduler in the present invention.
Fig. 3 shows the read-neuron-state module of the convolution scheduler in the present invention.
Fig. 4 is a block diagram of the filtering and access module in the present invention.
Fig. 5 is a block diagram of the calculation and result-saving module in the present invention.
Fig. 6 is a storage structure diagram of NeuronState in the present invention.
Fig. 7 is a storage structure diagram of Weight in the present invention.
Fig. 8 is a storage structure diagram of Vmem in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Embodiment one:
As shown in Fig. 1, this embodiment proposes the hardware structure of a multiplication-free convolution scheduler based on a spiking neural network, composed of a processor, an external DDR memory, a storage unit, a calculation unit and a convolution controller. The storage unit stores the neuron states (NeuronState), membrane potentials (Vmem) and synaptic weights (Weight); the calculation unit snpu comprises 8 × 8 = 64 parallel calculation units and is responsible for judging whether the neurons in the current layer are activated and for updating the neuron states; the convolution controller is responsible for decoding instructions from the processor, controlling the overall execution of the convolution calculation, reading and writing data from the storage unit and managing the input and output of the calculation unit.
The convolution controller controls the execution of the convolution calculation; the overall control flow is shown in Fig. 2. First, the neuron states (STATE) corresponding to one kernel are read into the FIFO in order; the FIFO entry is then sent to the '1' filter, which filters out the indexes of the neurons whose STATE value is 1 within the kernel and decodes them into weight addresses; next, the corresponding Weight and Vmem values are fetched from the storage unit according to the weight address; finally, the values are sent to the calculation unit for calculation, and the resulting Vmem and State are stored back in place.
For the 3 × 3 convolution calculation with stride = 1, the input layer has specification (N+2) × (N+2) × InC after zero padding, there are OutC weight kernels of specification 3 × 3 × InC, and the output layer is N × N × OutC, where N is the length or width of the input image (zero padding is applied before the convolution), InC is the number of channels of the input layer, and OutC is the number of channels of the output layer, which also equals the number of weight kernels.
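The shape bookkeeping implied by these definitions, checked with the concrete values of embodiment two below:

```python
# Stride-1 3x3 convolution with zero padding of 1 on each border.
N, InC, OutC = 80, 3, 8                 # values used in embodiment two
padded = (N + 2, N + 2, InC)            # input layer after zero padding
kernels = (OutC, 3, 3, InC)             # OutC weight kernels of 3 x 3 x InC
output = (N, N, OutC)                   # one output channel per kernel
assert padded[0] - 3 + 1 == output[0]   # (N + 2) - 3 + 1 = N output rows
```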
Embodiment two:
Building on embodiment one, an example of the present invention is described in detail below with N = 80, InC = 3 and OutC = 8; in practical applications, the invention supports 3 × 3 convolution calculation with inputs of any size. The hardware acceleration system is designed in the Verilog HDL language, and basic test verification was completed with VCS and an FPGA. The specific steps are as follows:
Step 1
fetch_STATE_to_FIFO reads the STATE values into the FIFO. As shown in Fig. 3, one kernel corresponds to 3 × 3 × 3 = 27 STATEs, which are stored into the FIFO in a fixed order: FIFO_DATA[2:0] holds the 3 STATEs of row 1, column 1 of the kernel, FIFO_DATA[5:3] holds the 3 STATEs of row 1, column 2, and so on, up to FIFO_DATA[26:24], which holds the 3 STATEs of row 3, column 3.
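The packing order of Fig. 3 can be expressed as a nested loop; a sketch assuming the states are indexed as states[row][col][channel] with binary values:

```python
def pack_states(states):
    """Pack the 27 neuron states of one 3 x 3 x 3 kernel into one FIFO word.

    Bit 0 holds row 1 / column 1 / channel 1 (FIFO_DATA[2:0] is row 1,
    column 1), and bit 26 holds row 3 / column 3 / channel 3.
    """
    word = 0
    bit = 0
    for r in range(3):           # rows of the kernel window
        for c in range(3):       # columns within the row
            for ch in range(3):  # the 3 channel states of that position
                word |= (states[r][c][ch] & 1) << bit
                bit += 1
    return word                  # one 27-bit FIFO entry
```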
The method adopts 64-way parallel calculation: with stride = 1, the 8 controllers compute the convolution results of 8 rows, whose source data are different STATEs, so 8 STATE FIFOs are needed, each storing the kernel source STATEs of one of the 8 rows of convolution results; at the same time, each controller computes the results of 8 channels, and all channels share the same source STATE while corresponding to different weights.
Further, for the result at row 1, column 1, the STATEs of rows 1-3 and columns 1-3 must be fetched, the positions of the 1s in the STATE determined, and the corresponding weights fetched; this requires the 27 STATEs to be stored in the FIFO according to the unified order above. For example, the 27-bit value 000_010_000_101_000_100_000_000_010 is stored in the STATE FIFO, as shown in Fig. 3.
In this design, the convolutional layer is followed by a pooling layer that performs max pooling with kernel = 2 × 2, so the convolution results of row 1 must be computed before row 2; that is, the 8 odd rows are computed first and the even rows afterwards. The computation of the odd rows is denoted state1 and that of the even rows state2: rows 1, 3, 5, ... 15 are computed in state1, then rows 2, 4, ... 16 are computed with 8-way parallelism in state2, and after the convolution of the first 16 rows is complete, the convolution results of the next 16 rows are computed, as sketched below.
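A sketch of that row ordering as a generator of output-row indexes (1-based rows, as in the text; total_rows is assumed to be a multiple of 16):

```python
def row_schedule(total_rows):
    """Order rows as state1 (odd) then state2 (even) per block of 16 rows."""
    order = []
    for base in range(0, total_rows, 16):
        order += [base + r for r in range(1, 17, 2)]  # state1: odd rows
        order += [base + r for r in range(2, 17, 2)]  # state2: even rows
    return order

# row_schedule(32) -> [1, 3, ..., 15, 2, 4, ..., 16, 17, 19, ..., 31, 18, ..., 32]
```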
Furthermore, to improve read efficiency, the 27 STATE values must be fetched within one clock cycle, i.e. eight 27-bit words are written into the 8 FIFOs per beat; realizing the 8-way parallel fetch of one layer of neuron STATE within a kernel places certain requirements on how the STATEs are stored.
Fig. 6 shows the storage structure of NeuronState. When computing the convolution result of row 16, STATE rows 16, 17 and 18 must be read, which requires rows 16, 17 and 18 to live at the same address in the NeuronState RAM; likewise, computing the convolution result of row 17 requires STATE rows 17, 18 and 19 at the same address. Since the RAM width is limited, the boundary rows must be stored repeatedly; as shown in Fig. 6, STATE rows 17 and 18 are stored twice, which makes pipelined STATE fetching possible.
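A simplified model of this layout, ignoring the 8-way banking and the RAM width: each output row r owns one word holding input rows r, r+1 and r+2, so rows shared by neighbouring windows are stored twice on purpose:

```python
def state_words(state_rows):
    """Pack input STATE rows into per-output-row words as in Fig. 6."""
    words = {}
    for r in range(len(state_rows) - 2):
        # output row r needs input rows r, r+1, r+2 in a single fetch
        words[r] = (state_rows[r], state_rows[r + 1], state_rows[r + 2])
    return words  # boundary rows deliberately appear in two words
```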
Step 2
This module works 8 ways in parallel: it takes the STATE signals out of the 8 FIFOs and sends them to the filter, which outputs the indexes of the neurons whose STATE value is 1 within the kernel; once one round of filtering is finished, the entry is popped from the FIFO.
In Fig. 3, the 27-bit value 000_010_000_101_000_100_000_000_010 is stored in the STATE FIFO; 5 of its bits are 1, so after 5 beats the filtering operation is complete, and the indexes output by the filter are 1, 11, 15, 17 and 22.
Step 3
The fetch_src_to_snpu module decodes the filter output index into a Weight address, fetches the corresponding Weight and Vmem values from the storage unit according to that address, and sends them into the snpu calculation unit for calculation.
The index resolved by the filter is decoded into a Weight address, the activated weight is fetched, and the corresponding 8 Vmem values are fetched as well; the 8 controllers compute the convolution results of 8 rows while each controller simultaneously computes the results of 8 channels, so the calculation unit completes 8 × 8 = 64-way parallel convolution calculation, and the calculation between layers is pipelined.
Fig. 4 contains more details of step 3. One kernel corresponds to 27 weights, 8 kernels correspond to the 8 channels of the result, and every channel shares the same input NeuronState, so the valid positions of the weights in kernels 0-7 are identical and the same address can be fetched for all of them simultaneously. Each address stores the Weight values of kernels 0-7 in order from high to low, and within each kernel the weights are stored at addresses 0-26 in row-major order; the storage structure of Weight is shown in Fig. 7.
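A sketch of the corresponding weight fetch: the filter index selects one of the 27 addresses, and the word read out carries the weights of kernels 0-7 packed from high to low. The per-weight bit width is an assumption:

```python
def fetch_weights(weight_ram, index, n_kernels=8, w_bits=8):
    """Read one address and unpack the weights of kernels 0-7 for that index."""
    word = weight_ram[index]           # same address serves all 8 channels
    weights = []
    for k in range(n_kernels):         # kernel 0 sits in the highest bits
        shift = w_bits * (n_kernels - 1 - k)
        weights.append((word >> shift) & ((1 << w_bits) - 1))
    return weights                     # one weight per output channel
```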
At the same time, the Vmem value of the output layer from the previous time step must be read and updated on that basis. The storage structure of Vmem is shown in Fig. 8: the Vmem data are stored row by row across the RAMs numbered 0-7, so 8 rows of source data can be supplied during calculation; within each RAM word, the Vmem results of channels 1-8 are packed in order from high to low, and if there are more than 8 channels, channels 9-16 follow the results of channels 1-8, and so on.
Step 4
fetch_res_from_snpu saves the result Vmem and the neuron states returned by the calculation unit back to the storage unit in place.
Design verification was carried out according to this scheme. With N neurons in total, of which M are activated, an optimization of N/M times over a conventional convolutional neural network is obtained, and the sparser the activity, the better the effect. In summary, the hardware implementation method of the multiplication-free convolution scheduler based on the spiking neural network supports 3 × 3 convolution calculation on inputs of any size, supports 64-way parallel calculation, needs no hardware multiplier, effectively improves the performance of convolution calculation in the neural network, reduces calculation complexity, obtains higher efficiency and lower power consumption, and offers high flexibility.
Although the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the details of those embodiments; various equivalent modifications can be made within the technical spirit of the present invention, and such modifications all fall within the scope of the present invention.
Claims (7)
1. A multiplication-free convolution scheduler based on a spiking neural network, comprising:
a processor;
a storage unit comprising at least one set of first storage regions for storing neuron states, at least one set of second storage regions for storing membrane potentials, and at least one set of third storage regions for storing synaptic weights;
a convolution controller for decoding instructions from the processor and controlling the overall execution of the convolution calculation, the convolution controller reading and writing data from the storage unit, managing the input and output of the calculation unit and updating the neuron states according to the data;
and a calculation unit electrically connected with the convolution controller, for processing the spike signals emitted by the valid neurons of the previous layer, judging from them whether the neurons of the current layer are activated, and finally updating the neuron states.
2. The multiplication-free convolution scheduler based on a spiking neural network of claim 1, wherein the membrane potentials and synaptic weights are stored with at least 8-way parallelism; the membrane potential data are stored row by row across the RAMs numbered 0-7, each RAM packs the membrane potential results of channels 1-8 in order from high to low, and if there are more than 8 channels, channels 9-16 are stored after the results of channels 1-8, and so on.
3. The multiplication-free convolution scheduler based on a spiking neural network of claim 1, wherein the synaptic weight specification is 3 × 3 × InC, where InC is the number of channels of the input layer; each layer of the result shares the same weights, and there are OutC weights of specification 3 × 3 × InC, where OutC is the number of channels of the result; each RAM stores the 1st to 8th synaptic weight values in order from high to low, and if OutC is larger than 8, storage continues downwards.
4. The multiplication-free convolution scheduler based on a spiking neural network of claim 1, wherein the convolution controller adopts at least 64-way parallel calculation: 8 controllers compute the convolution results of 8 rows, each controller is responsible for the results of 8 channels, one channel being one channel of the convolution result; all channels share the same input neuron states and correspond to different weights.
5. A hardware implementation method of a multiplication-free convolution scheduler based on a spiking neural network, characterized by comprising the following steps:
step 1, storing the neuron states corresponding to one kernel into a first-in first-out queue in the order of the results of the next layer;
step 2, sending the contents of the first-in first-out queue into a '1' filter, filtering out the position indexes of the neurons whose state value is 1 within the kernel, decoding the addresses of the valid weights, and fetching the next entry once filtering is finished;
step 3, reading the corresponding synaptic weight value and membrane potential value from the storage unit according to the decoded address;
and step 4, sending the synaptic weight value and the membrane potential value into the calculation unit, and storing the calculation result back to the storage unit in place.
6. The hardware implementation method of the multiplication-free convolution scheduler based on a spiking neural network of claim 5, wherein a RAM smaller than a predetermined size is used to fetch one layer of neuron states within at least one kernel in parallel, all the neuron states of the kernel are fetched within one predetermined cycle through at least three levels of buffering, and the fetched neuron states are stored into the first-in first-out queue to achieve pipelined fetching.
7. The hardware implementation method of the multiplication-free convolution scheduler based on a spiking neural network of claim 5, wherein each position in the first-in first-out queue stores N neuron states;
wherein N is the number of weights corresponding to one kernel, '1' denotes that a neuron is activated, M is the number of activated neurons in the kernel, and the indexes corresponding to the M neurons in state '1' are filtered out in M cycles by the '1' filter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110431741.5A | 2021-04-21 | 2021-04-21 | Multiplication-free convolution scheduler based on impulse neural network and hardware implementation method thereof
Publications (2)
Publication Number | Publication Date |
---|---|
CN113128675A (en) | 2021-07-16
CN113128675B CN113128675B (en) | 2023-12-26 |
Family
ID=76778696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110431741.5A | CN113128675B (en), Active | 2021-04-21 | 2021-04-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113128675B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015016640A1 (en) * | 2013-08-02 | 2015-02-05 | Ahn Byungik | Neural network computing device, system and method |
US20180189648A1 (en) * | 2016-12-30 | 2018-07-05 | Intel Corporation | Event driven and time hopping neural network |
CN107092959A (en) * | 2017-04-07 | 2017-08-25 | 武汉大学 | Hardware friendly impulsive neural networks model based on STDP unsupervised-learning algorithms |
US20190005376A1 (en) * | 2017-06-30 | 2019-01-03 | Intel Corporation | In-memory spiking neural networks for memory array architectures |
CN108846408A (en) * | 2018-04-25 | 2018-11-20 | 中国人民解放军军事科学院军事医学研究院 | Image classification method and device based on impulsive neural networks |
KR20210004349A (en) * | 2019-07-04 | 2021-01-13 | 한국과학기술연구원 | Neuromodule device and signaling method performed on the same |
Non-Patent Citations (1)
Title |
---|
沈阳靖; 沈君成; 叶俊; 马琪: "基于FPGA的脉冲神经网络加速器设计" [Design of an FPGA-based spiking neural network accelerator], 电子科技 (Electronic Science and Technology), no. 10 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688983A (en) * | 2021-08-09 | 2021-11-23 | 上海新氦类脑智能科技有限公司 | Convolution operation implementation method, circuit and terminal for reducing weight storage in impulse neural network |
CN114781633A (en) * | 2022-06-17 | 2022-07-22 | 电子科技大学 | Processor fusing artificial neural network and pulse neural network |
CN114819114A (en) * | 2022-07-04 | 2022-07-29 | 南京大学 | Pulse neural network hardware accelerator and optimization method thereof in convolution operation |
CN114819114B (en) * | 2022-07-04 | 2022-09-13 | 南京大学 | Pulse neural network hardware accelerator and optimization method thereof in convolution operation |
CN116205274A (en) * | 2023-04-27 | 2023-06-02 | 苏州浪潮智能科技有限公司 | Control method, device, equipment and storage medium of impulse neural network |
CN117054396A (en) * | 2023-10-11 | 2023-11-14 | 天津大学 | Raman spectrum detection method and device based on double-path multiplicative neural network |
CN117054396B (en) * | 2023-10-11 | 2024-01-05 | 天津大学 | Raman spectrum detection method and device based on double-path multiplicative neural network |
Also Published As
Publication number | Publication date |
---|---|
CN113128675B (en) | 2023-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113128675B (en) | Multiplication-free convolution scheduler based on impulse neural network and hardware implementation method thereof | |
CN109598338B (en) | Convolutional neural network accelerator based on FPGA (field programmable Gate array) for calculation optimization | |
CN111325321B (en) | Brain-like computing system based on multi-neural network fusion and execution method of instruction set | |
CN110334799A (en) | Integrated ANN Reasoning and training accelerator and its operation method are calculated based on depositing | |
CN115516450B (en) | Inference engine circuit architecture | |
CN111105023B (en) | Data stream reconstruction method and reconfigurable data stream processor | |
CN111626403B (en) | Convolutional neural network accelerator based on CPU-FPGA memory sharing | |
CN111144556A (en) | Hardware circuit of range batch processing normalization algorithm for deep neural network training and reasoning | |
Sommer et al. | Efficient hardware acceleration of sparsely active convolutional spiking neural networks | |
CN111582465A (en) | Convolutional neural network acceleration processing system and method based on FPGA and terminal | |
US11501151B2 (en) | Pipelined accumulator | |
CN115423081A (en) | Neural network accelerator based on CNN _ LSTM algorithm of FPGA | |
CN113191488A (en) | LSTM network model-oriented hardware acceleration system | |
CN114429214A (en) | Arithmetic unit, related device and method | |
CN116822600A (en) | Neural network search chip based on RISC-V architecture | |
CN108073548B (en) | Convolution operation device and convolution operation method | |
CN115222028A (en) | One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method | |
CN115719088B (en) | Intermediate cache scheduling circuit device supporting in-memory CNN | |
Yu et al. | Implementation of convolutional neural network with co-design of high-level synthesis and verilog HDL | |
CN118070855B (en) | Convolutional neural network accelerator based on RISC-V architecture | |
CN115936064B (en) | Neural network acceleration array based on weight circulation data stream | |
CN113988280B (en) | Array computing accelerator architecture based on binarization neural network | |
US20220284265A1 (en) | Hardware architecture for spiking neural networks and method of operating | |
CN110738310B (en) | Sparse neural network accelerator and implementation method thereof | |
CN112487352B (en) | Fast Fourier transform operation method on reconfigurable processor and reconfigurable processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 