CN113568597A - Convolution neural network-oriented DSP packed word multiplication method and system - Google Patents

Convolution neural network-oriented DSP packed word multiplication method and system

Info

Publication number
CN113568597A
Authority
CN
China
Prior art keywords
dsp
multiplication
packed word
neural network
oriented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110802058.8A
Other languages
Chinese (zh)
Other versions
CN113568597B (en)
Inventor
莫志文
杜培栋
郭梦原
王琴
景乃锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110802058.8A
Publication of CN113568597A
Application granted
Publication of CN113568597B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443Sum of products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/153Multidimensional correlation or convolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a convolutional neural network-oriented DSP packed word multiplication method and system, designing a packed word multiplication calculation mode realized with the DSP resources on an FPGA. The packed word multiplication exploits the low bit widths produced by data quantization to realize several four-bit multiplications in one DSP, improving the utilization efficiency of resources. In addition, because the FPGA specially optimizes the cascade connections among DSP units, the invention also uses the cascading of DSP units to realize packed word multiply-accumulation: the operation results are extracted from the packed word product only after multiple packed word multiplications and accumulations have been completed. The invention makes full use of the characteristics of the DSP, improves the utilization efficiency of the DSP, and benefits the optimization of the system's energy efficiency ratio.

Description

Convolution neural network-oriented DSP packed word multiplication method and system
Technical Field
The invention relates to the technical field of convolutional neural networks, in particular to a convolution neural network-oriented DSP packed word multiplication method and system.
Background
Neural network technology is an important branch of artificial intelligence. A neural network is formed by interconnecting a large number of neurons into a layered structure loosely resembling the human brain, and generally consists of an input layer, an output layer and several hidden layers.
Neural networks offer high accuracy and strong learning ability, and have wide and important applications in fields such as image recognition, speech recognition and pattern recognition. There are many types of neural networks, including BP neural networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Among them, the convolutional neural network plays an important role in the field of image recognition thanks to characteristics such as weight sharing and the extraction of local features. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), convolutional-neural-network-based algorithms have achieved the best image recognition results.
However, convolutional neural networks are computation- and parameter-intensive models, which places high demands on the computing power and storage capacity of the hardware. Given the real-time and safety requirements of applications, forward inference of the model is often deployed at the edge, near the data source. The edge is an energy- and resource-constrained environment, which challenges the efficient execution of convolutional neural networks there. On the premise of preserving model accuracy, how to improve throughput while reducing power consumption and resource usage has become a topic of great concern in the industry.
To break through the bottleneck of deploying convolutional neural networks at the edge, current research focuses mainly on two aspects, algorithms and hardware design. At the algorithm level, the original model is compressed on the premise of preserving accuracy, or accepting only a small accuracy loss; for example, model quantization applies low-bit quantization to the weights and activation values. At the hardware level, efficient dedicated acceleration designs matched to the operation patterns of convolutional neural networks are built to meet the requirements of edge deployment. FPGAs support fine-grained design, offer good reconfigurability, and facilitate the rapid deployment of various convolutional neural network models.
The core operations of a convolutional neural network, namely multiply-accumulate operations, are usually mapped onto the DSP units of the FPGA. However, a DSP on the FPGA platform supports a 27-bit × 18-bit multiplication, while if both the weights and the activation values are quantized to four bits, only 4-bit × 4-bit multiplications are needed in the convolution calculation. In this case, without a dedicated hardware design, the EDA tool typically maps each multiplication in the hardware description language to one DSP, which wastes DSP resources severely: it not only degrades the accelerator's energy efficiency ratio but also makes DSP resources a constraint on deploying the network at the edge.
Publication CN101976044A discloses a neural-network-based method for modeling a wind power system and implementing it on a DSP. The method determines the input and output signals of the wind power generation system and the neural network by analyzing their working mechanisms: the input signals comprise wind speed and pitch angle, and the output signals comprise power, wind wheel rotating speed and wind wheel torque. A BP neural network is combined with the wind power generation system; by establishing a BP neural network model in which the hidden layers are set large enough to achieve arbitrary training accuracy, the weights of each layer are determined, so that the behavior of the modeled object can be fitted well and the feasibility of the application is demonstrated.
Against this background, the present invention proposes a packed word multiplication calculation mode that maps several low-bit multiplications into one DSP unit, improving the utilization of hardware resources and the energy efficiency ratio of model deployment.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a convolution neural network-oriented DSP packed word multiplication method and system.
The invention provides a convolution neural network-oriented DSP packed word multiplication method, which comprises the following steps:
step S1: packing the four inputs of the multiply-accumulate unit, namely two weights and two input activation values, respectively through shift-add modules;
step S2: using the packed words as the operands of the DSP;
step S3: using one DSP to complete the multiplication operations simultaneously;
step S4: extracting the calculation results of the multiplication operations from the output result of the DSP to obtain the four partial sums of the convolution multiply-accumulation; and further accumulating these partial sums to complete the full convolution operation.
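By way of illustration (not part of the claims), the following minimal Python sketch emulates steps S1 to S4 for a single DSP multiplication. It assumes unsigned 4-bit input activation values, signed 4-bit weights, and the 11-bit result fields at offsets 0, 11, 22 and 33 derived in the detailed description below; all function names are illustrative and correspond to no hardware primitive.

    def pack_operands(a0, a1, w0, w1):
        # Step S1: shift-add packing of two unsigned 4-bit activations
        # (a0, a1 in [0, 15]) and two signed 4-bit weights (w0, w1 in [-8, 7]).
        act_op = (a0 << 22) | a1      # one 27-bit DSP operand
        wgt_op = (w0 << 11) + w1      # one 18-bit DSP operand (two's complement, hence +)
        return act_op, wgt_op

    def sext11(x):
        # Interpret an 11-bit field as a signed value.
        return x - (1 << 11) if x & (1 << 10) else x

    def packed_word_multiply(a0, a1, w0, w1):
        # Steps S2/S3: a single wide multiplication stands in for the
        # DSP's 27 x 18 multiplier.
        act_op, wgt_op = pack_operands(a0, a1, w0, w1)
        p = act_op * wgt_op
        # Step S4: extract the four products; each +MSB term undoes the
        # two's-complement borrow taken from the field below it.
        w1a1 = sext11(p & 0x7FF)
        w0a1 = sext11((p >> 11) & 0x7FF) + ((p >> 10) & 1)
        w1a0 = sext11((p >> 22) & 0x7FF) + ((p >> 21) & 1)
        w0a0 = sext11((p >> 33) & 0x7FF) + ((p >> 32) & 1)
        return w1a1, w0a1, w1a0, w0a0

    # Exhaustive check against the four direct 4-bit products.
    for a0 in range(16):
        for a1 in range(16):
            for w0 in range(-8, 8):
                for w1 in range(-8, 8):
                    assert packed_word_multiply(a0, a1, w0, w1) == \
                           (w1 * a1, w0 * a1, w1 * a0, w0 * a0)

Python integers behave as infinitely sign-extended two's complement under bitwise operations, so the field extraction above mirrors the bit selection performed in hardware.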
Preferably, the weights in the step S1 are two 4-bit values, and the input activation values are two 4-bit values.
Preferably, the number of operands in the step S2 is two.
Preferably, in step S3, four multiplication operations are performed by using one DSP.
Preferably, the method is used to realize an efficient mapping of the multiply-accumulate operations of a convolutional neural network onto the FPGA; the same input activation value needs to be multiplied by two different weights, which is regarded as output-channel parallelism with a degree of 2; the same weight needs to be multiplied by two different activation values, which is regarded as convolution-window parallelism, also with a degree of 2.
Preferably, the calculation result of each multiplication occupies 11 bits in the packed word product, and the calculation results are extracted only after multiple multiply-accumulate operations have been completed.
The invention also provides a convolution neural network-oriented DSP packed word multiplication system, which comprises the following modules:
module M1: packing the weights and the input activation values respectively;
module M2: using the packed words as the operands of the DSP;
module M3: using one DSP to complete the multiplication operations simultaneously;
module M4: extracting the calculation results of the multiplication operations from the output result of the DSP.
Preferably, the weights in the module M1 are two 4-bit values, and the input activation values are two 4-bit values;
the number of operands in the module M2 is two;
four multiplication operations are performed in the module M3 using one DSP.
Preferably, the system is used to realize an efficient mapping of the multiply-accumulate operations of a convolutional neural network onto the FPGA; the same input activation value needs to be multiplied by two different weights, which is regarded as output-channel parallelism with a degree of 2; the same weight needs to be multiplied by two different activation values, which is regarded as convolution-window parallelism, also with a degree of 2.
Preferably, the calculation result of each multiplication occupies 11 bits in the packed word product, and the calculation results are extracted only after multiple multiply-accumulate operations have been completed.
Compared with the prior art, the invention has the following beneficial effects:
1. using the low bit widths that result from model quantization, the invention designs a packed word multiplication calculation mode based on the digital signal processing (DSP) units of a field programmable gate array (FPGA), so as to improve the energy efficiency ratio of convolutional neural networks deployed at the edge;
2. the invention fully utilizes the characteristics of the DSP, improves the utilization efficiency of the DSP and is beneficial to the optimization of the energy efficiency ratio of the system;
3. the convolution operation link provided by the invention can fully utilize an optimization circuit in the FPGA, is convenient for layout and wiring, and is beneficial to improving the performance and power consumption.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a packed word multiplication operation according to the present invention;
FIG. 3 is a diagram of a one-dimensional convolution operation link structure based on DSP according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a convolutional neural network-oriented DSP packed word multiplication method, in which the packed word multiplication calculation mode packs two 4-bit weights and two 4-bit input activation values respectively and uses the two packed words as the two operands of a DSP, so that one DSP completes four multiplication operations simultaneously. The calculation results of the four multiplication operations can then be extracted from the output result of the DSP.
Referring to fig. 1 and 2, a 4-bit input activation value a0 is shifted left by 22 bits and then added to another 4-bit activation value a1 to form one operand of the DSP; a 4-bit weight w0 is shifted left by 11 bits and then added to another 4-bit weight w1 to form the other operand of the DSP. The multiplication performed by the DSP, corresponding to fig. 1, is shown in equation (1), where P denotes the result of the multiplication:

P = ((a0 << 22) + a1) · ((w0 << 11) + w1)
  = (w1·a1) + (w0·a1 << 11) + (w1·a0 << 22) + (w0·a0 << 33)    (1)
Since the product of a signed 4-bit weight and an unsigned 4-bit input activation value is 8 bits, and taking the effect of two's complement representation on the above operation flow into account, the calculation results of the four multiplications can be extracted from the packed word product P by bit selection, with each 11-bit field interpreted as a signed number, as shown in equation (2):

w1·a1 = P[10:0]
w0·a1 = P[21:11] + P[10]
w1·a0 = P[32:22] + P[21]
w0·a0 = P[43:33] + P[32]    (2)
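As a concrete illustration of the complement effect (an illustrative calculation, not part of the claimed method): take a0 = 3, a1 = 5, w0 = -2 and w1 = 7, so that P = 35 + (-10)·2^11 + 21·2^22 + (-6)·2^33. In two's complement the negative partial product -10 is stored in its field as 2048 - 10 = 2038 after borrowing one unit from the field above it, so the raw slice P[32:22] reads 20 rather than 21; P[21], the top bit of the stored 2038, equals 1, and adding it back restores w1·a0 = 20 + 1 = 21. The other correction terms in equation (2) work in the same way.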
Based on the above calculation mode, the same input activation value needs to be multiplied by two different weights, which can be regarded as output-channel parallelism with a degree of 2; the same weight needs to be multiplied by two different activation values, which can be regarded as convolution-window parallelism, also with a degree of 2. The packed word multiplication calculation mode therefore combines the two parallel schemes of convolution-window parallelism and output-channel parallelism, for a total parallelism of 4.
The calculation result of each multiplication occupies 11 bits in the packed word product, while the product of a 4-bit multiplication is only 8 bits, so the extraction of the calculation results can be deferred until multiple multiply-accumulate operations have been completed. Equations (1) and (2) can then be rewritten as equations (3) and (4), where a_i^0, a_i^1, w_i^0 and w_i^1 respectively denote the first input activation value, the second input activation value, the first weight and the second weight in the i-th group of inputs:

P = Σ_{i=1}^{N} ((a_i^0 << 22) + a_i^1) · ((w_i^0 << 11) + w_i^1)
  = Σ(w_i^1·a_i^1) + (Σ(w_i^0·a_i^1) << 11) + (Σ(w_i^1·a_i^0) << 22) + (Σ(w_i^0·a_i^0) << 33)    (3)

Σ(w_i^1·a_i^1) = P[10:0]
Σ(w_i^0·a_i^1) = P[21:11] + P[10]
Σ(w_i^1·a_i^0) = P[32:22] + P[21]
Σ(w_i^0·a_i^0) = P[43:33] + P[32]    (4)
where N is the number of products accumulated. Under the current calculation mode, the value range of N is derived as follows:

An m-bit signed number lies in the range [-2^(m-1), 2^(m-1) - 1] and an m-bit unsigned number lies in the range [0, 2^m - 1], so their product lies in the range [-2^(m-1)(2^m - 1), (2^(m-1) - 1)(2^m - 1)]. After accumulating N such products the value lies in the range [-N·2^(m-1)(2^m - 1), N·(2^(m-1) - 1)(2^m - 1)]. As shown in equation (4), because of the two's complement borrow effect, the actual value range of a field of the packed word product is [-N·2^(m-1)(2^m - 1) - 1, N·(2^(m-1) - 1)(2^m - 1) - 1]. If this is to be expressed as a q-bit signed number, equation (5) must be satisfied:

N·2^(m-1)(2^m - 1) + 1 ≤ 2^(q-1)    (5)
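As a quick check of equation (5) (an illustrative computation; the helper name is ours), the maximum accumulation depth for the 4-bit operands (m = 4) and 11-bit fields (q = 11) used here can be evaluated as follows.

    # Largest N permitted by equation (5): N * 2^(m-1) * (2^m - 1) + 1 <= 2^(q-1).
    def max_accumulation_depth(m, q):
        neg_magnitude = (1 << (m - 1)) * ((1 << m) - 1)  # magnitude of the most negative product
        return ((1 << (q - 1)) - 1) // neg_magnitude

    print(max_accumulation_depth(4, 11))  # -> 8

Under this reading, up to 8 packed word products can be accumulated in each 11-bit field before the result must be split, which bounds the length of the operation link described below.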
Based on the above analysis, the present invention proposes a one-dimensional convolution operation link structure based on DSPs. Referring to fig. 3, the packing of the two weight values is completed by an independent adder, while the packing of the two input activation values is realized by the 27-bit pre-adder inside the DSP. After the data are packed, the multiplier inside the DSP completes the packed word multiplication; the packed word product is then accumulated with the partial sum output by the previous-stage DSP, using the accumulator inside the DSP, to obtain a new partial sum as the input of the next-stage DSP. After the accumulation of the whole link is completed, the packed word product is split according to equation (4), and 4 partial sums are obtained as the output of the module.
This structure makes full use of the resources inside the DSP and maps most operations of the packed word multiplication into the DSP, reducing the use of logic resources on the FPGA. In addition, the DSPs in the FPGA are arranged in arrays, and the FPGA specially optimizes the cascade connections among DSP units, so the convolution operation link proposed by the invention can make full use of the FPGA's optimized internal circuitry, facilitates placement and routing, and benefits both performance and power consumption.
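The behavior of such a link can be sketched functionally as follows (a software model only, under the same assumptions as the earlier sketch; the running Python sum stands in for the DSP cascade, and no pipeline registers are modeled):

    def sext11(x):
        # Interpret an 11-bit field as a signed value.
        return x - (1 << 11) if x & (1 << 10) else x

    def convolution_link(acts, wgts):
        # acts: list of (a0, a1) unsigned 4-bit pairs; wgts: list of
        # (w0, w1) signed 4-bit pairs; at most 8 taps (see equation (5)).
        partial = 0
        for (a0, a1), (w0, w1) in zip(acts, wgts):
            act_op = (a0 << 22) | a1    # packing by the pre-adder inside the DSP
            wgt_op = (w0 << 11) + w1    # packing by the independent adder
            partial += act_op * wgt_op  # multiply, then cascade accumulate
        # Split the accumulated packed word once, per equation (4).
        return (sext11(partial & 0x7FF),
                sext11((partial >> 11) & 0x7FF) + ((partial >> 10) & 1),
                sext11((partial >> 22) & 0x7FF) + ((partial >> 21) & 1),
                sext11((partial >> 33) & 0x7FF) + ((partial >> 32) & 1))

    # Example: a 3-tap link; the last output is the sum of the w0*a0 products.
    acts = [(3, 5), (0, 15), (9, 2)]
    wgts = [(-2, 7), (6, -8), (1, 3)]
    assert convolution_link(acts, wgts)[3] == sum(
        w0 * a0 for (a0, _), (w0, _) in zip(acts, wgts))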
In addition, to further improve the throughput rate, the invention introduces a pipeline structure into the operation link, with the number of pipeline stages equal to the number N of DSP units. Because the amount of computation in convolution is huge, the clock cycles spent filling and draining the pipeline are negligible, so in practical analysis the link is approximately equivalent to N DSP units operating in parallel, realizing a parallel scheme of either intra-kernel multiplication parallelism or input-channel parallelism. Analyzed from the perspective of three-dimensional convolution, intra-kernel multiplication parallelism and input-channel parallelism have much in common; considering the design of the subsequent data flow, the one-dimensional convolution operation link is used to realize input-channel parallelism. Combined with the parallel scheme realized by packed word multiplication, a one-dimensional convolution operation link can simultaneously realize convolution-window parallelism of degree 2, output-channel parallelism of degree 2 and input-channel parallelism of degree N.
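As an illustrative figure (assuming the depth bound N = 8 derived above), a single link then sustains 2 × 2 × 8 = 32 four-bit multiply-accumulate operations per cycle from 8 DSP units, compared with 8 per cycle under the naive one-multiplication-per-DSP mapping.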
Aiming at the multiply-accumulate operations in convolutional neural networks, the invention exploits the low bit widths of quantized data to design the packed word multiplication calculation mode, realizing several four-bit multiplications in one DSP. The invention also realizes packed word multiply-accumulation by utilizing the FPGA's dedicated structure for DSP links. The design makes full use of the characteristics of the DSP, improves the utilization efficiency of the DSP, and benefits the optimization of the system's parallelism and energy efficiency ratio.
The invention also provides a convolution neural network-oriented DSP packed word multiplication system, which comprises the following modules:
module M1: packing the weight and the input activation value respectively; the weight is two 4bits and the input activation value is two 4 bits.
Module M2: taking packed word form as the operand of DSP; the number of operands is two.
Module M3: using DSP to complete multiplication operation at the same time; four multiplication operations are performed using one DSP.
Module M4: the calculation result of the multiplication operation is extracted from the output result of the DSP.
The system is used for realizing the high-efficiency mapping of multiply-accumulate operation in the convolutional neural network on the FPGA; the same input activation value needs to be multiplied by two different weights, the input activation value is regarded as an output channel to be parallel, and the parallelism is 2; the same weight needs to be multiplied by two different activation values, and is regarded as the parallel of the convolution windows, and the parallelism is also 2. The calculation result of each multiplication occupies 11bits in the packed word product, and the extraction of the calculation result is performed after the multiplication and accumulation operations are completed for multiple times.
Using the low bit widths that result from model quantization, the invention designs a packed word multiplication calculation mode based on the digital signal processing (DSP) units of a field programmable gate array (FPGA), so as to improve the energy efficiency ratio of convolutional neural networks deployed at the edge; the characteristics of the DSP are fully utilized, the utilization efficiency of the DSP is improved, and the optimization of the system's energy efficiency ratio is facilitated. The proposed convolution operation link can make full use of the FPGA's optimized internal circuitry, facilitates placement and routing, and benefits both performance and power consumption.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A convolutional neural network-oriented DSP packed word multiplication method, comprising the steps of:
step S1: packing the four inputs of the multiply-accumulate unit, namely two weights and two input activation values, respectively through shift-add modules;
step S2: using the packed words as the operands of the DSP;
step S3: using one DSP to complete the multiplication operations simultaneously;
step S4: extracting the calculation results of the multiplication operations from the output result of the DSP to obtain the four partial sums of the convolution multiply-accumulation; and further accumulating these partial sums to complete the full convolution operation.
2. The convolutional neural network-oriented DSP packed word multiplication method as claimed in claim 1, wherein the weights in step S1 are two 4-bit values, and the input activation values are two 4-bit values.
3. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein the number of operands in the step S2 is two.
4. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein four multiplication operations are performed in step S3 using one DSP.
5. The convolutional neural network-oriented DSP packed word multiplication method as claimed in claim 1, wherein the method is used to realize an efficient mapping of the multiply-accumulate operations of a convolutional neural network onto the FPGA; the same input activation value needs to be multiplied by two different weights, which is regarded as output-channel parallelism with a degree of 2; and the same weight needs to be multiplied by two different activation values, which is regarded as convolution-window parallelism, also with a degree of 2.
6. The convolutional neural network-oriented DSP packed word multiplication method of claim 1, wherein the calculation result of each multiplication occupies 11 bits in the packed word product, and the calculation results are extracted only after multiple multiply-accumulate operations have been completed.
7. A convolutional neural network-oriented DSP packed word multiplication system, comprising:
module M1: packing the weight and the input activation value respectively;
module M2: taking packed word form as the operand of DSP;
module M3: using DSP to complete multiplication operation at the same time;
module M4: the calculation result of the multiplication operation is extracted from the output result of the DSP.
8. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the weights in the module M1 are two 4-bit values, and the input activation values are two 4-bit values;
the number of operands in the module M2 is two;
and four multiplication operations are performed in the module M3 using one DSP.
9. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the system is used to realize an efficient mapping of the multiply-accumulate operations of a convolutional neural network onto the FPGA; the same input activation value needs to be multiplied by two different weights, which is regarded as output-channel parallelism with a degree of 2; and the same weight needs to be multiplied by two different activation values, which is regarded as convolution-window parallelism, also with a degree of 2.
10. The convolutional neural network-oriented DSP packed word multiplication system of claim 7, wherein the calculation result of each multiplication occupies 11 bits in the packed word product, and the calculation results are extracted only after multiple multiply-accumulate operations have been completed.
CN202110802058.8A 2021-07-15 2021-07-15 Convolution neural network-oriented DSP packed word multiplication method and system Active CN113568597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110802058.8A CN113568597B (en) Convolution neural network-oriented DSP packed word multiplication method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110802058.8A CN113568597B (en) Convolution neural network-oriented DSP packed word multiplication method and system

Publications (2)

Publication Number Publication Date
CN113568597A true CN113568597A (en) 2021-10-29
CN113568597B CN113568597B (en) 2024-07-26

Family

ID=78165006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110802058.8A Active CN113568597B (en) 2021-07-15 2021-07-15 Convolution neural network-oriented DSP compact word multiplication method and system

Country Status (1)

Country Link
CN (1) CN113568597B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06274167A (en) * 1993-03-18 1994-09-30 Casio Comput Co Ltd Device and method for adding effect
WO2020215124A1 (en) * 2019-04-26 2020-10-29 The University Of Sydney An improved hardware primitive for implementations of deep neural networks
CN110555516A (en) * 2019-08-27 2019-12-10 上海交通大学 FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method
CN110780845A (en) * 2019-10-17 2020-02-11 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof
CN112434801A (en) * 2020-10-30 2021-03-02 西安交通大学 Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN112734020A (en) * 2020-12-28 2021-04-30 中国电子科技集团公司第十五研究所 Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YONGQUAN SHI et al.: "Fast FPGA-Based Emulation for ReRAM-Enabled Deep Neural Network Accelerator", 2021 IEEE International Symposium on Circuits and Systems
YUNHE WANG et al.: "AdderNet and Its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence", arXiv:2101.10015v2 [cs.LG]
LI Yongbo et al.: "Design of a sparse convolutional neural network accelerator", Microelectronics & Computer, vol. 37, no. 6, pp. 30-39

Also Published As

Publication number Publication date
CN113568597B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
US20220012593A1 (en) Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization
CN111459877B (en) Winograd YOLOv2 target detection model method based on FPGA acceleration
Wang et al. PipeCNN: An OpenCL-based open-source FPGA accelerator for convolution neural networks
CN109146067B (en) Policy convolution neural network accelerator based on FPGA
CN107423816B (en) Multi-calculation-precision neural network processing method and system
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
CN110543939B (en) Hardware acceleration realization device for convolutional neural network backward training based on FPGA
CN110991631A (en) Neural network acceleration system based on FPGA
CN112434801B (en) Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN110543936B (en) Multi-parallel acceleration method for CNN full-connection layer operation
CN113283587B (en) Winograd convolution operation acceleration method and acceleration module
Xiao et al. FPGA implementation of CNN for handwritten digit recognition
CN115018062A (en) Convolutional neural network accelerator based on FPGA
Véstias et al. A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs
Yang et al. A sparse CNN accelerator for eliminating redundant computations in intra-and inter-convolutional/pooling layers
CN113568597B (en) Convolution neural network-oriented DSP packed word multiplication method and system
Reddy et al. Low Power and Efficient Re-Configurable Multiplier for Accelerator
CN102185585B (en) Lattice type digital filter based on genetic algorithm
Adel et al. Accelerating deep neural networks using FPGA
Jha et al. Performance analysis of single-precision floating-point MAC for deep learning
Kumar et al. Complex multiplier: implementation using efficient algorithms for signal processing application
Alhussain et al. Hardware-efficient template-based deep CNNs accelerator design
CN112836793A (en) Floating point separable convolution calculation accelerating device, system and image processing method
Cruz et al. Extensible hardware inference accelerator for fpga using models from tensorflow lite
Li A single precision floating point multiplier for machine learning hardware acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant