CN112149814A - Convolutional neural network acceleration system based on FPGA - Google Patents

Convolutional neural network acceleration system based on FPGA

Info

Publication number
CN112149814A
CN112149814A (application CN202011009835.5A)
Authority
CN
China
Prior art keywords
module
convolution
fpga
data
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011009835.5A
Other languages
Chinese (zh)
Inventor
罗中明 (Luo Zhongming)
周磊 (Zhou Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202011009835.5A priority Critical patent/CN112149814A/en
Publication of CN112149814A publication Critical patent/CN112149814A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional neural network acceleration system based on an FPGA. Deep convolutional neural networks are well known to be computationally intensive, with convolution operations accounting for more than 90% of the total operations. The invention comprises a data preprocessing module, an FPGA module and a controller. A convolution kernel for performing the convolutional neural network operation on input data is arranged in the FPGA module, and the data preprocessing module is used for reading the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and preprocessing them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is unfolded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly. The invention is used for FPGA-based convolutional neural network acceleration.

Description

Convolutional neural network acceleration system based on FPGA
Technical Field
The invention relates to a convolutional neural network acceleration system based on an FPGA (field programmable gate array).
Background
In recent years, the use of deep neural networks has grown rapidly and has had a significant impact on economic and social activities worldwide. Deep convolutional neural network technology has received a great deal of attention in many machine learning fields, including speech recognition, natural language processing, and intelligent image processing; in image recognition in particular, deep convolutional neural networks have achieved notable results, reaching accuracy that exceeds that of humans. The superiority of the deep convolutional neural network stems from its ability to extract high-level features from raw data after statistical learning on large amounts of data.
Deep convolutional neural networks are well known to be computationally intensive, with convolution operations accounting for more than 90% of the total operations. Reducing this large volume of computation by exploiting runtime information and the algorithmic structure of the convolution calculations, that is, reducing the work required for inference, has therefore become a new focus of research.
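As a rough sanity check of the 90% figure, the multiply-accumulate (MAC) counts of convolutional versus fully-connected layers can be tallied for a hypothetical VGG-style layer stack. The layer sizes below are illustrative assumptions, not taken from the patent:

```python
def conv_macs(h, w, cin, cout, k):
    # each of the h*w*cout output values needs k*k*cin multiply-accumulates
    return h * w * cout * k * k * cin

def fc_macs(n_in, n_out):
    # a fully-connected layer costs one MAC per weight
    return n_in * n_out

# hypothetical VGG-style stack on a 224x224 RGB image (illustrative sizes)
conv_total = (conv_macs(224, 224, 3, 64, 3)       # conv1
              + conv_macs(112, 112, 64, 128, 3)   # conv2, after 2x2 pooling
              + conv_macs(56, 56, 128, 256, 3))   # conv3
fc_total = fc_macs(256 * 7 * 7, 4096) + fc_macs(4096, 1000)

conv_share = conv_total / (conv_total + fc_total)
print(f"convolution share of MACs: {conv_share:.1%}")
```

Even with this truncated stack, the convolution layers dominate the operation count, consistent with the figure quoted above.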
FPGAs offer abundant computing resources, high flexibility, and high energy efficiency, and, compared with conventional digital circuit systems, have the advantages of programmability, high integration, high speed, and high reliability; they have therefore repeatedly been applied to accelerating neural networks. OpenCL is a heterogeneous computing language based on the conventional C language that can run on acceleration processors such as CPUs, GPUs, FPGAs, and DSPs. Its high level of language abstraction lets a programmer develop a high-performance application without knowing the hardware circuits and low-level details, greatly reducing the complexity of the programming process.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a convolutional neural network acceleration system based on an FPGA (field-programmable gate array).
In order to achieve the above object, the present invention provides an FPGA-based convolutional neural network acceleration system, which includes a data preprocessing module, an FPGA module, and a controller, where the FPGA module is internally provided with a convolution kernel for performing the convolutional neural network operation on input data, and the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
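One plausible reading of the preprocessing described above is an im2col-style transform: the 4-D kernel tensor is flattened into a 2-D parameter sequence, and each sliding-window patch of the input feature map is copied out as a column, so the convolution becomes a direct matrix multiply. The sketch below (NumPy, with hypothetical shapes; the `preprocess` helper name is not from the patent) illustrates the idea:

```python
import numpy as np

def preprocess(kernels, fmap, stride=1):
    """Flatten 4-D kernel parameters and expand the input feature map
    with a sliding window so each local patch lines up one-to-one with
    the flattened kernel parameters (an im2col-style transform)."""
    cout, cin, kh, kw = kernels.shape
    _, h, w = fmap.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    # 4-D convolution kernel parameters rearranged into a 2-D sequence
    kmat = kernels.reshape(cout, cin * kh * kw)
    # copy out every local feature-map patch covered by the window
    cols = np.empty((cin * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            patch = fmap[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * ow + j] = patch.ravel()
    return kmat, cols

# convolution is now a single matrix multiply
k = np.random.rand(8, 3, 3, 3)   # (cout, cin, kh, kw)
x = np.random.rand(3, 6, 6)      # (cin, h, w)
km, xc = preprocess(k, x)
out = (km @ xc).reshape(8, 4, 4)
```

The duplication of overlapping window contents trades memory for a regular, directly computable data layout, which suits the fixed dataflow of an FPGA multiply-accumulate array.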
As a further description of the FPGA-based convolutional neural network acceleration system according to the present invention, preferably, the Data-Mux is connected to the input terminal of an original image address generator for inputting the original image, and the Pooling-Mux is connected to the output terminal.
As a further description of the FPGA-based convolutional neural network acceleration system, preferably, the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and is connected to a Pooling address generator of a convolutional cache module of each path of the operation processing unit, and an output of each Pooling module is connected to the controller.
As a further description of the FPGA-based convolutional neural network acceleration system of the present invention, preferably, the data preprocessing module includes a data transmission sub-module, a convolution kernel parameter preprocessing sub-module, and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
As a further description of the FPGA-based convolutional neural network acceleration system of the present invention, preferably, the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
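The seven controller states can be sketched as a simple finite-state machine. The linear ordering and the completion-driven transitions below are assumptions; the patent names the states but does not specify the exact transition conditions:

```python
from enum import Enum

class State(Enum):
    WAIT = 0
    WRITE_FEATURE_MAP = 1
    WRITE_INPUT_INDEX = 2
    WRITE_KERNEL = 3
    WRITE_WEIGHT_INDEX = 4
    CONVOLVE = 5
    SEND_RESULT = 6

ORDER = list(State)  # assumed linear progression through the 7 states

def next_state(state, stage_done):
    """Stay in the current state until its stage signals completion,
    then advance; after SEND_RESULT the controller returns to WAIT."""
    if not stage_done:
        return state
    return ORDER[(ORDER.index(state) + 1) % len(ORDER)]
```

In hardware this would correspond to a one-hot or encoded state register whose outputs drive the control signals of the corresponding sub-modules.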
The invention has the beneficial effects that:
1. By using runtime information and the algorithmic structure of the convolution calculation, the invention reduces redundant useless calculation and redundant reading of parameter data; accelerating the convolutional neural network with an FPGA can improve the real-time performance of the DCNN, achieve higher computing performance, and reduce energy consumption.
2. The invention realizes an FPGA-based convolutional neural network accelerator. The convolutional neural network algorithm is analyzed and, in order to improve the generality of the architecture design and adapt to various input image sizes, a data-alignment parallel processing method is adopted to realize parallel processing and transmission of the data layer.
Drawings
FIG. 1 is a schematic structural diagram of a convolutional neural network acceleration system based on an FPGA;
Detailed Description
To further explain the structure, characteristics and other objects of the present invention, a detailed description is given below with reference to the accompanying preferred embodiments, which are intended only to illustrate the technical solutions of the present invention and not to limit it.
In a first specific embodiment, a convolutional neural network acceleration system based on an FPGA includes a data preprocessing module, an FPGA module, and a controller, where a convolution kernel for performing the convolutional neural network operation on input data is provided in the FPGA module, and the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
In a second specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system described in the first specific embodiment, where the Data-Mux is connected to the input end of an original image address generator for inputting the original image, and the Pooling-Mux is connected to the output end.
In a third specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system described in the first specific embodiment, where the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and to the pooling address generator of the convolution cache module of each path of the operation processing unit, and an output of each pooling module is connected to the controller.
In a fourth specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system in the first specific embodiment, where the data preprocessing module includes a data transmission sub-module, a convolution kernel parameter preprocessing sub-module, and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
In a fifth specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system according to the first specific embodiment, where the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
It should be noted that the above summary and the detailed description are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent substitutions, or improvements may be made by those skilled in the art within the spirit and principles of the invention. The scope of the invention is to be determined by the appended claims.

Claims (5)

1. The convolutional neural network acceleration system based on the FPGA is characterized by comprising a data preprocessing module, an FPGA module and a controller, wherein a convolution kernel for performing the convolutional neural network operation on input data is arranged in the FPGA module, and the data preprocessing module is used for reading the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and preprocessing them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
2. The FPGA-based convolutional neural network acceleration system of claim 1, wherein said Data-Mux is connected to the input of an original image address generator for inputting the original image, and said Pooling-Mux is connected to the output.
3. The system according to claim 2, wherein the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and to the pooling address generator of the convolution buffer module of each path of the arithmetic processing unit, and an output of each pooling module is connected to the controller.
4. The FPGA-based convolutional neural network acceleration system of claim 3, wherein the data preprocessing module comprises a data transmission sub-module, a convolution kernel parameter preprocessing sub-module and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
5. The FPGA-based convolutional neural network acceleration system of claim 4, wherein the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
CN202011009835.5A 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA Pending CN112149814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011009835.5A CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011009835.5A CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Publications (1)

Publication Number Publication Date
CN112149814A true CN112149814A (en) 2020-12-29

Family

ID=73896180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011009835.5A Pending CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Country Status (1)

Country Link
CN (1) CN112149814A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344179A (en) * 2021-05-31 2021-09-03 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN114327676A (en) * 2021-12-28 2022-04-12 北京航天自动控制研究所 High-reliability accelerator for convolutional neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109598338A (en) * 2018-12-07 2019-04-09 东南大学 A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109598338A (en) * 2018-12-07 2019-04-09 东南大学 A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344179A (en) * 2021-05-31 2021-09-03 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN113344179B (en) * 2021-05-31 2022-06-14 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN114327676A (en) * 2021-12-28 2022-04-12 北京航天自动控制研究所 High-reliability accelerator for convolutional neural network

Similar Documents

Publication Publication Date Title
Chen et al. Embedded system real-time vehicle detection based on improved YOLO network
CN111488983A (en) Lightweight CNN model calculation accelerator based on FPGA
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN111967468A (en) FPGA-based lightweight target detection neural network implementation method
CN106228240A (en) Degree of depth convolutional neural networks implementation method based on FPGA
CN112149814A (en) Convolutional neural network acceleration system based on FPGA
CN109472734B (en) Target detection network based on FPGA and implementation method thereof
Sun et al. A high-performance accelerator for large-scale convolutional neural networks
CN113792621B (en) FPGA-based target detection accelerator design method
Qian et al. R-cnn object detection inference with deep learning accelerator
Shi et al. Design of parallel acceleration method of convolutional neural network based on fpga
CN116822600A (en) Neural network search chip based on RISC-V architecture
Liu et al. CASSANN-v2: A high-performance CNN accelerator architecture with on-chip memory self-adaptive tuning
Yu et al. Optimizing FPGA-based convolutional encoder-decoder architecture for semantic segmentation
Adel et al. Accelerating deep neural networks using FPGA
Fang et al. A sort-less FPGA-based non-maximum suppression accelerator using multi-thread computing and binary max engine for object detection
CN115640772A (en) Neighborhood connected heterogeneous design method based on self-adaptive chip
CN114595813A (en) Heterogeneous acceleration processor and data calculation method
Wen FPGA-Based Deep Convolutional Neural Network Optimization Method
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Wang et al. The inference operation optimization of an improved LeNet-5 convolutional neural network and its FPGA hardware implementation
CN109003222B (en) Asynchronous energy-efficient graph calculation accelerator
Zhang et al. Design of a Convolutional Neural Network Accelerator based on PYNQ
CN112330524B (en) Device and method for quickly realizing convolution in image tracking system
CN111860781A (en) Convolutional neural network feature decoding system realized based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201229