CN113568597A - Convolution neural network-oriented DSP packed word multiplication method and system - Google Patents
- Publication number
- CN113568597A (application number CN202110802058.8A)
- Authority
- CN
- China
- Prior art keywords
- dsp
- multiplication
- packed word
- neural network
- oriented
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a convolutional neural network-oriented DSP packed word multiplication method and system, and designs a packed word multiplication calculation mode realized with DSP resources on an FPGA. Packed word multiplication exploits the low bit width of quantized data to complete several four-bit multiplications in a single DSP, improving resource utilization efficiency. In addition, because the FPGA specially optimizes the cascade connection between DSP units, the invention also uses the DSP cascade to realize packed word multiply-accumulation: after multiple packed word multiplications and accumulations are finished, the operation results are extracted from the packed word product. The invention makes full use of the characteristics of the DSP, improves DSP utilization efficiency and benefits the optimization of the system's energy efficiency ratio.
Description
Technical Field
The invention relates to the technical field of convolutional neural networks, in particular to a convolution neural network-oriented DSP packed word multiplication method and system.
Background
Neural network technology is an important branch of artificial intelligence. A large number of interconnected neurons form a layered structure similar to that of the human brain; this structure is a neural network, generally consisting of an input layer, an output layer and several hidden layers.
Neural networks offer high accuracy and strong learning ability, and have wide and important applications in fields such as image and speech recognition and pattern recognition. There are many types of neural networks, including BP neural networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Among them, the convolutional neural network plays an important role in image recognition thanks to characteristics such as weight sharing and local feature extraction. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the best image recognition results have been achieved by convolutional-neural-network-based algorithms.
However, convolutional neural networks are computation- and parameter-intensive models, which places high demands on the computing power and storage capacity of the hardware. Considering the real-time and safety requirements of applications, the forward inference of the model is often deployed at the edge, near the data source. The edge is an energy- and resource-constrained environment, which challenges the efficient execution of convolutional neural networks there. On the premise of preserving model accuracy, how to improve throughput while reducing power consumption and resource usage has become a topic of great interest in the industry.
In order to break through the bottleneck of deploying convolutional neural networks at the edge, current research focuses mainly on two aspects, algorithms and hardware design: at the algorithm level, the original model is compressed on the premise of preserving accuracy, or with only a small accuracy loss, for example by model quantization, which quantizes the weights and activation values to low bit widths; at the hardware level, efficient dedicated acceleration designs matching the operation patterns of convolutional neural networks are built to meet the requirements of edge deployment. The FPGA supports fine-grained design, has good reconfigurability and facilitates the rapid deployment of various convolutional neural network models.
The core operations of the convolutional neural network (i.e., multiply-accumulate operations) are usually mapped onto the DSP units of the FPGA. However, a DSP on the FPGA platform supports 27-bit × 18-bit multiplication, whereas if both the weights and the activation values are quantized to four bits, only 4-bit × 4-bit multiplications are needed in the convolution calculation. In this case, without a dedicated hardware design, the EDA tool usually maps each multiplication in the hardware description language to one DSP, which greatly wastes DSP resources: it not only degrades the energy efficiency ratio of the accelerator but also makes DSP resources a constraint on deploying the network at the edge.
Publication CN101976044A discloses a neural-network-based wind power system modeling and DSP implementation method. By analyzing the working mechanisms of the wind power generation system and the neural network, the method determines their input and output signals: the input signals comprise wind speed and pitch angle, and the output signals comprise power, wind wheel rotating speed and wind wheel torque. A BP neural network is combined with the wind power generation system; by establishing a BP neural network model with enough hidden-layer nodes to reach arbitrary training precision, the weights of each layer are determined and the behavior of the modeled object is well fitted, making its application feasible.
Therefore, a packed word multiplication calculation mode is proposed, in which several low-bit multiplications are mapped into one DSP unit, improving the utilization of hardware resources and the energy efficiency ratio of model deployment.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a convolution neural network-oriented DSP packed word multiplication method and system.
The invention provides a convolution neural network-oriented DSP packed word multiplication method, which comprises the following steps:
step S1: packing the four inputs of the multiply-accumulate unit, namely two weights and two input activation values, respectively through a shift-add module;
step S2: taking packed word form as the operand of DSP;
step S3: using DSP to complete multiplication operation at the same time;
step S4: extracting the calculation results of the multiplications from the output result of the DSP, obtaining the four partial sums of the convolution multiply-accumulate; the partial sums are further accumulated to complete the full convolution operation.
Preferably, the weights in step S1 are two 4-bit values, and the input activation values are two 4-bit values.
Preferably, the number of operands in the step S2 is two.
Preferably, in step S3, four multiplication operations are performed by using one DSP.
Preferably, the method is used for realizing efficient mapping of the multiply-accumulate operations of the convolutional neural network onto the FPGA; the same input activation value is multiplied by two different weights, regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, regarded as convolution window parallelism, also with degree 2.
Preferably, the calculation result of each multiplication occupies 11bits in the packed word product, and the extraction of the calculation result is performed after the completion of multiple multiply-accumulate operations.
The invention also provides a convolution neural network-oriented DSP packed word multiplication system, which comprises the following modules:
module M1: packing the weight and the input activation value respectively;
module M2: taking packed word form as the operand of DSP;
module M3: using DSP to complete multiplication operation at the same time;
module M4: the calculation result of the multiplication operation is extracted from the output result of the DSP.
Preferably, the weights in the module M1 are two 4-bit values, and the input activation values are two 4-bit values;
the number of the operands in the module M2 is two;
four multiplication operations are performed in the module M3 using one DSP.
Preferably, the system is used for realizing efficient mapping of the multiply-accumulate operations of the convolutional neural network onto the FPGA; the same input activation value is multiplied by two different weights, regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, regarded as convolution window parallelism, also with degree 2.
Preferably, the calculation result of each multiplication occupies 11bits in the packed word product, and the extraction of the calculation result is performed after the completion of multiple multiply-accumulate operations.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention designs a packed word multiplication calculation mode based on the digital signal processing units (DSPs) of a field programmable gate array (FPGA), utilizing the low bit width obtained after model quantization, so as to improve the energy efficiency ratio of convolutional neural networks deployed at the edge;
2. the invention fully utilizes the characteristics of the DSP, improves the utilization efficiency of the DSP and is beneficial to the optimization of the energy efficiency ratio of the system;
3. the convolution operation link provided by the invention can make full use of the optimized circuits inside the FPGA, facilitates placement and routing, and is beneficial to improving performance and reducing power consumption.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a packed word multiplication operation according to the present invention;
FIG. 3 is a diagram of a one-dimensional convolution operation link structure based on DSP according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a convolutional neural network-oriented DSP packed word multiplication method, in which the packed word multiplication calculation mode packs two 4-bit weights and two 4-bit input activation values respectively and uses the packed words as the two operands of a DSP, so that one DSP simultaneously completes four multiplication operations. The results of the four multiplications can then be extracted from the output of the DSP.
Referring to FIG. 1 and FIG. 2, a 4-bit input activation value a0 is shifted left by 22 bits and added to another 4-bit activation value a1 to form one operand of the DSP; a 4-bit weight w0 is shifted left by 11 bits and added to another 4-bit weight w1 to form the other operand of the DSP. The multiplication performed by the DSP, corresponding to FIG. 1, is shown in equation (1), where P denotes the result of the multiplication.

P = (a0 << 22 + a1) × (w0 << 11 + w1)
= (w1 × a1) + (w0 × a1 << 11) + (w1 × a0 << 22) + (w0 × a0 << 33)    (1)

Since the product of a signed 4-bit weight and an unsigned 4-bit input activation value occupies 8 bits, and considering the effect of two's complement representation on the above operation flow, the results of the four multiplications can be extracted from the packed word product P by bit selection, namely P[10:0], P[21:11] + P[10], P[32:22] + P[21], and P[43:33] + P[32], as shown in equation (2).
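As an illustrative check (not part of the claimed invention), the packing of equation (1) and the bit-selection extraction of equation (2) can be simulated with plain integers; the function names packed_mul and extract are assumptions of this sketch:

```python
def packed_mul(a0, a1, w0, w1):
    """Pack two unsigned 4-bit activations and two signed 4-bit weights
    into the two DSP operands and return their single product (equation (1))."""
    act = (a0 << 22) | a1        # unsigned activations in slots 22 and 0
    wgt = (w0 << 11) + w1        # signed weights in slots 11 and 0
    return act * wgt


def extract(P, width=11, slots=4):
    """Recover the four products from the packed word product P (equation (2)):
    each 11-bit slot, corrected by the MSB of the slot below, read as signed."""
    mask = (1 << width) - 1
    half = 1 << (width - 1)
    out, carry = [], 0
    for k in range(slots):
        raw = (P >> (width * k)) & mask
        val = raw + carry                    # two's-complement borrow fix
        out.append(val - (1 << width) if val >= half else val)
        carry = raw >> (width - 1)           # this slot's MSB corrects the next
    return out                               # [w1*a1, w0*a1, w1*a0, w0*a0]


# exhaustive check over all signed 4-bit weights and unsigned 4-bit activations
for a0 in range(16):
    for w0 in range(-8, 8):
        for a1, w1 in ((3, -5), (15, 7), (0, -8)):
            P = packed_mul(a0, a1, w0, w1)
            assert extract(P) == [w1 * a1, w0 * a1, w1 * a0, w0 * a0]
```

The exhaustive loop confirms that the MSB-correction terms of equation (2) recover all four signed products exactly, including the borrow propagation between slots.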
Based on the above calculation mode, the same input activation value is multiplied by two different weights, which can be regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, which can be regarded as convolution window parallelism, also with degree 2. The packed word multiplication calculation mode therefore combines the two parallel schemes, convolution window parallelism and output channel parallelism, for a total parallelism of 4.
The result of each multiplication occupies 11 bits in the packed word product, while the product of a 4-bit multiplication occupies only 8 bits, so the extraction of the results can be deferred until after multiple multiply-accumulate operations. Equations (1) and (2) can accordingly be rewritten as equations (3) and (4), where a0(i), a1(i), w0(i) and w1(i) respectively denote the first input activation value, the second input activation value, the first weight and the second weight in the i-th group of inputs:

P = Σ (i = 1..N) (a0(i) << 22 + a1(i)) × (w0(i) << 11 + w1(i))    (3)

and equation (4) extracts the four accumulated partial sums from P by the same bit selection as equation (2).
where N is the number of accumulated products. Under this calculation mode, the range of N is derived as follows:

An m-bit signed number lies in the range [-2^(m-1), 2^(m-1) - 1] and an m-bit unsigned number lies in the range [0, 2^m - 1], so their product lies in [-2^(m-1) × (2^m - 1), (2^(m-1) - 1) × (2^m - 1)]. After accumulating N such products, the value lies in [-N × 2^(m-1) × (2^m - 1), N × (2^(m-1) - 1) × (2^m - 1)]. As shown in equation (4), owing to the two's complement effect, the actual value range of the packed word product is [-N × 2^(m-1) × (2^m - 1) - 1, N × (2^(m-1) - 1) × (2^m - 1) - 1]. If this value is to be expressed as a q-bit signed number, equation (5) must be satisfied.
Based on the above analysis, the present invention proposes a DSP-based one-dimensional convolution operation link structure. Referring to FIG. 3, the packing of the two weight values is completed by an independent adder, while the packing of the two input activation values is realized by the 27-bit pre-adder inside the DSP; after the data are packed, the multiplier inside the DSP completes the packed word multiplication; the packed word product is then accumulated with the partial sum output by the previous-stage DSP, using the accumulator inside the DSP, to obtain a new partial sum as the input of the next-stage DSP. After the accumulation of the whole link is completed, the packed word product is split according to equation (4) to obtain 4 partial sums as the output of the module.
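The operation link above can be sketched behaviorally in Python (the real structure is a cascade of FPGA DSP slices; the function names dsp_stage and split are illustrative assumptions):

```python
def dsp_stage(partial_in, a0, a1, w0, w1):
    """One DSP in the cascade: the pre-adder packs the activations, the
    multiplier forms the packed product, the accumulator adds the
    previous stage's partial sum arriving on the cascade input."""
    act = (a0 << 22) + a1          # pre-adder inside the DSP
    wgt = (w0 << 11) + w1          # weights packed by an independent adder
    return partial_in + act * wgt  # accumulate with the cascade input

def split(P, width=11, slots=4):
    """Split the accumulated packed word into 4 partial sums (equation (4))."""
    mask, half, out, carry = (1 << width) - 1, 1 << (width - 1), [], 0
    for k in range(slots):
        raw = (P >> (width * k)) & mask
        val = raw + carry
        out.append(val - (1 << width) if val >= half else val)
        carry = raw >> (width - 1)
    return out

# drive an N-stage link with sample data (N <= 8 keeps the slots from overflowing)
groups = [(5, 3, 2, -1), (1, 2, -8, 7), (9, 14, 3, 3)]
P = 0
for a0, a1, w0, w1 in groups:
    P = dsp_stage(P, a0, a1, w0, w1)
expected = [sum(w1 * a1 for a0, a1, w0, w1 in groups),
            sum(w0 * a1 for a0, a1, w0, w1 in groups),
            sum(w1 * a0 for a0, a1, w0, w1 in groups),
            sum(w0 * a0 for a0, a1, w0, w1 in groups)]
assert split(P) == expected
```

Note that the splitting is performed only once, after the whole chain has accumulated, which is exactly what lets one DSP per stage carry four multiply-accumulate lanes.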
The structure makes full use of the resources inside the DSP and maps most operations of the packed word multiplication into the DSP, reducing the use of logic resources on the FPGA. In addition, the DSPs in the FPGA are arranged in arrays, and the FPGA specially optimizes the cascade connection between DSP units; therefore the convolution operation link provided by the invention can make full use of the optimized circuits inside the FPGA, facilitates placement and routing, and is beneficial to improving performance and reducing power consumption.
In addition, to further improve throughput, the invention introduces a pipeline structure into the operation link, with the number of pipeline stages equal to the number N of DSP units. Because the amount of convolution computation is huge, the clock cycles spent filling and draining the pipeline are negligible, so in practical analysis the structure is approximately equivalent to N DSP units operating in parallel, realizing either intra-kernel multiplication parallelism or input channel parallelism. Analyzed from the perspective of three-dimensional convolution, intra-kernel multiplication parallelism and input channel parallelism have much in common; considering the design of the subsequent data flow, the one-dimensional convolution operation link is used to realize input channel parallelism. Combined with the parallel scheme realized by packed word multiplication, a one-dimensional convolution operation link can simultaneously realize convolution window parallelism with degree 2, output channel parallelism with degree 2 and input channel parallelism with degree N.
Aiming at the multiply-accumulate operations in convolutional neural networks, the invention designs the packed word multiplication calculation mode by exploiting the low bit width of quantized data, realizing several four-bit multiplications in one DSP. The invention also realizes packed word multiply-accumulation by using the special structure of the DSP cascade in the FPGA. The design makes full use of the characteristics of the DSP, improves DSP utilization efficiency and benefits the optimization of the system's parallelism and energy efficiency ratio.
The invention also provides a convolution neural network-oriented DSP packed word multiplication system, which comprises the following modules:
module M1: packing the weight and the input activation value respectively; the weight is two 4bits and the input activation value is two 4 bits.
Module M2: taking packed word form as the operand of DSP; the number of operands is two.
Module M3: using DSP to complete multiplication operation at the same time; four multiplication operations are performed using one DSP.
Module M4: the calculation result of the multiplication operation is extracted from the output result of the DSP.
The system is used for realizing efficient mapping of the multiply-accumulate operations of the convolutional neural network onto the FPGA: the same input activation value is multiplied by two different weights, regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, regarded as convolution window parallelism, also with degree 2. The result of each multiplication occupies 11 bits in the packed word product, and the extraction of the results is performed after multiple multiply-accumulate operations have been completed.
The invention designs a packed word multiplication calculation mode based on the digital signal processing units (DSPs) of a field programmable gate array (FPGA), utilizing the low bit width obtained after model quantization, so as to improve the energy efficiency ratio of convolutional neural networks deployed at the edge; the characteristics of the DSP are fully utilized, DSP utilization efficiency is improved, and the optimization of the system's energy efficiency ratio is facilitated; the proposed convolution operation link can make full use of the optimized circuits inside the FPGA, facilitates placement and routing, and benefits both performance and power consumption.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A convolutional neural network-oriented DSP packed word multiplication method, comprising the steps of:
step S1: packing the four inputs of the multiply-accumulate unit, namely two weights and two input activation values, respectively through a shift-add module;
step S2: taking packed word form as the operand of DSP;
step S3: using DSP to complete multiplication operation at the same time;
step S4: extracting the calculation results of the multiplications from the output result of the DSP, obtaining the four partial sums of the convolution multiply-accumulate; the partial sums are further accumulated to complete the full convolution operation.
2. The convolutional neural network-oriented DSP packed word multiplication method as claimed in claim 1, wherein the weights in step S1 are two 4-bit values, and the input activation values are two 4-bit values.
3. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein the number of operands in the step S2 is two.
4. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein four multiplication operations are performed in step S3 using one DSP.
5. The convolutional neural network-oriented DSP packed word multiplication method as claimed in claim 1, wherein the method is used for realizing efficient mapping of the multiply-accumulate operations of the convolutional neural network onto the FPGA; the same input activation value is multiplied by two different weights, regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, regarded as convolution window parallelism, also with degree 2.
6. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein the calculation result of each multiplication occupies 11 bits in the packed word product, and the extraction of the calculation results is performed after multiple multiply-accumulate operations have been completed.
7. A convolutional neural network-oriented DSP packed word multiplication system, comprising:
module M1: packing the weight and the input activation value respectively;
module M2: taking packed word form as the operand of DSP;
module M3: using DSP to complete multiplication operation at the same time;
module M4: the calculation result of the multiplication operation is extracted from the output result of the DSP.
8. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the weights in the module M1 are two 4-bit values, and the input activation values are two 4-bit values;
the number of the operands in the module M2 is two;
four multiplication operations are performed in the module M3 using one DSP.
9. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the system is configured to implement efficient mapping of the multiply-accumulate operations of the convolutional neural network onto the FPGA; the same input activation value is multiplied by two different weights, regarded as output channel parallelism with degree 2; the same weight is multiplied by two different activation values, regarded as convolution window parallelism, also with degree 2.
10. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the calculation result of each multiplication occupies 11 bits in the packed word product, and the extraction of the calculation results is performed after multiple multiply-accumulate operations have been completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110802058.8A CN113568597B (en) | 2021-07-15 | 2021-07-15 | Convolution neural network-oriented DSP packed word multiplication method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113568597A true CN113568597A (en) | 2021-10-29 |
CN113568597B CN113568597B (en) | 2024-07-26 |
Family
ID=78165006
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110802058.8A Active CN113568597B (en) | 2021-07-15 | 2021-07-15 | Convolution neural network-oriented DSP packed word multiplication method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113568597B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06274167A (en) * | 1993-03-18 | 1994-09-30 | Casio Comput Co Ltd | Device and method for adding effect |
CN110555516A (en) * | 2019-08-27 | 2019-12-10 | 上海交通大学 | FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method |
CN110780845A (en) * | 2019-10-17 | 2020-02-11 | 浙江大学 | Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof |
WO2020215124A1 (en) * | 2019-04-26 | 2020-10-29 | The University Of Sydney | An improved hardware primitive for implementations of deep neural networks |
CN112434801A (en) * | 2020-10-30 | 2021-03-02 | 西安交通大学 | Convolution operation acceleration method for carrying out weight splitting according to bit precision |
CN112734020A (en) * | 2020-12-28 | 2021-04-30 | 中国电子科技集团公司第十五研究所 | Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network |
Non-Patent Citations (3)
Title |
---|
YONGQUAN SHI et al.: "Fast FPGA-Based Emulation for ReRAM-Enabled Deep Neural Network Accelerator", 2021 IEEE International Symposium on Circuits and Systems * |
YUNHE WANG et al.: "AdderNet and Its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence", arXiv:2101.10015v2 [cs.LG] * |
LI Yongbo et al.: "Design of a Sparse Convolutional Neural Network Accelerator", Microelectronics & Computer, vol. 37, no. 6, pages 30-39 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220012593A1 (en) | Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization | |
CN111459877B (en) | Winograd YOLOv2 target detection model method based on FPGA acceleration | |
Wang et al. | PipeCNN: An OpenCL-based open-source FPGA accelerator for convolution neural networks | |
CN109146067B (en) | Policy convolution neural network accelerator based on FPGA | |
CN107423816B (en) | Multi-calculation-precision neural network processing method and system | |
CN111832719A (en) | Fixed point quantization convolution neural network accelerator calculation circuit | |
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
CN112434801B (en) | Convolution operation acceleration method for carrying out weight splitting according to bit precision | |
CN110543936B (en) | Multi-parallel acceleration method for CNN full-connection layer operation | |
CN113283587B (en) | Winograd convolution operation acceleration method and acceleration module | |
Xiao et al. | FPGA implementation of CNN for handwritten digit recognition | |
CN115018062A (en) | Convolutional neural network accelerator based on FPGA | |
Véstias et al. | A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs | |
Yang et al. | A sparse CNN accelerator for eliminating redundant computations in intra-and inter-convolutional/pooling layers | |
CN113568597B (en) | Convolution neural network-oriented DSP packed word multiplication method and system | |
Reddy et al. | Low Power and Efficient Re-Configurable Multiplier for Accelerator | |
CN102185585B (en) | Lattice type digital filter based on genetic algorithm | |
Adel et al. | Accelerating deep neural networks using FPGA | |
Jha et al. | Performance analysis of single-precision floating-point MAC for deep learning | |
Kumar et al. | Complex multiplier: implementation using efficient algorithms for signal processing application | |
Alhussain et al. | Hardware-efficient template-based deep CNNs accelerator design | |
CN112836793A (en) | Floating point separable convolution calculation accelerating device, system and image processing method | |
Cruz et al. | Extensible hardware inference accelerator for fpga using models from tensorflow lite | |
Li | A single precision floating point multiplier for machine learning hardware acceleration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||