CN112149814A - Convolutional neural network acceleration system based on FPGA - Google Patents

Convolutional neural network acceleration system based on FPGA

Info

Publication number
CN112149814A
CN112149814A (application CN202011009835.5A)
Authority
CN
China
Prior art keywords
module
convolution
fpga
data
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011009835.5A
Other languages
Chinese (zh)
Inventor
罗中明 (Luo Zhongming)
周磊 (Zhou Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202011009835.5A priority Critical patent/CN112149814A/en
Publication of CN112149814A publication Critical patent/CN112149814A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional neural network acceleration system based on an FPGA. Deep convolutional neural networks are well known to be computationally intensive, with convolution operations accounting for more than 90% of the total operations. The invention comprises a data preprocessing module, an FPGA module and a controller. A convolution kernel for performing the convolutional neural network operation on input data is arranged in the FPGA module, and the data preprocessing module is used for reading the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and preprocessing them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is unfolded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly. The invention is used for FPGA-based convolutional neural network acceleration.

Description

Convolutional neural network acceleration system based on FPGA
Technical Field
The invention relates to a convolutional neural network acceleration system based on an FPGA (field programmable gate array).
Background
In recent years, the use of deep neural networks has grown rapidly and has had a significant impact on economic and social activities worldwide. Deep convolutional neural network technology has received a great deal of attention in many machine learning fields, including speech recognition, natural language processing, and intelligent image processing; in image recognition in particular, deep convolutional neural networks have achieved notable results, reaching accuracy that exceeds that of humans. The superiority of the deep convolutional neural network stems from its ability to extract high-level features from raw data after statistical learning on large amounts of data.
Deep convolutional neural networks are well known to be computationally intensive, with convolution operations accounting for more than 90% of the total operations. Reducing this large volume of computation by exploiting runtime information and the algorithmic structure of the convolution calculations, that is, reducing the work required for inference, has therefore become a new focus of research.
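As a rough sanity check of the 90% figure, the multiply-accumulate (MAC) counts of convolutional versus fully-connected layers can be tallied for a hypothetical VGG-style layer stack. The layer sizes below are illustrative assumptions, not taken from the patent:

```python
def conv_macs(h, w, cin, cout, k):
    # each of the h*w*cout output values needs k*k*cin multiply-accumulates
    return h * w * cout * k * k * cin

def fc_macs(n_in, n_out):
    # a fully-connected layer costs one MAC per weight
    return n_in * n_out

# hypothetical VGG-style stack on a 224x224 RGB image (illustrative sizes)
conv_total = (conv_macs(224, 224, 3, 64, 3)       # conv1
              + conv_macs(112, 112, 64, 128, 3)   # conv2, after 2x2 pooling
              + conv_macs(56, 56, 128, 256, 3))   # conv3
fc_total = fc_macs(256 * 7 * 7, 4096) + fc_macs(4096, 1000)

conv_share = conv_total / (conv_total + fc_total)
print(f"convolution share of MACs: {conv_share:.1%}")
```

Even with this truncated stack, the convolution layers dominate the operation count, consistent with the figure quoted above.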
FPGAs offer abundant computing resources, high flexibility, and high energy efficiency, and, compared with conventional digital circuit systems, have the advantages of programmability, high integration, high speed, and high reliability; they have therefore repeatedly been applied to accelerating neural networks. OpenCL is a heterogeneous computing language based on the conventional C language that can run on acceleration processors such as CPUs, GPUs, FPGAs, and DSPs. Its high level of language abstraction lets a programmer develop a high-performance application without knowing the hardware circuits and low-level details, greatly reducing the complexity of the programming process.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a convolutional neural network acceleration system based on an FPGA (field-programmable gate array).
In order to achieve the above object, the present invention provides an FPGA-based convolutional neural network acceleration system, which includes a data preprocessing module, an FPGA module, and a controller, where the FPGA module is internally provided with a convolution kernel for performing the convolutional neural network operation on input data, and the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
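One plausible reading of the preprocessing described above is an im2col-style transform: the 4-D kernel tensor is flattened into a 2-D parameter sequence, and each sliding-window patch of the input feature map is copied out as a column, so the convolution becomes a direct matrix multiply. The sketch below (NumPy, with hypothetical shapes; the `preprocess` helper name is not from the patent) illustrates the idea:

```python
import numpy as np

def preprocess(kernels, fmap, stride=1):
    """Flatten 4-D kernel parameters and expand the input feature map
    with a sliding window so each local patch lines up one-to-one with
    the flattened kernel parameters (an im2col-style transform)."""
    cout, cin, kh, kw = kernels.shape
    _, h, w = fmap.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    # 4-D convolution kernel parameters rearranged into a 2-D sequence
    kmat = kernels.reshape(cout, cin * kh * kw)
    # copy out every local feature-map patch covered by the window
    cols = np.empty((cin * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            patch = fmap[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * ow + j] = patch.ravel()
    return kmat, cols

# convolution is now a single matrix multiply
k = np.random.rand(8, 3, 3, 3)   # (cout, cin, kh, kw)
x = np.random.rand(3, 6, 6)      # (cin, h, w)
km, xc = preprocess(k, x)
out = (km @ xc).reshape(8, 4, 4)
```

The duplication of overlapping window contents trades memory for a regular, directly computable data layout, which suits the fixed dataflow of an FPGA multiply-accumulate array.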
As a further description of the FPGA-based convolutional neural network acceleration system according to the present invention, preferably, the Data-Mux is connected to the input terminal of an original image address generator for inputting the original image, and the Pooling-Mux is connected to the output terminal.
As a further description of the FPGA-based convolutional neural network acceleration system, preferably, the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and is connected to a Pooling address generator of a convolutional cache module of each path of the operation processing unit, and an output of each Pooling module is connected to the controller.
As a further description of the FPGA-based convolutional neural network acceleration system of the present invention, preferably, the data preprocessing module includes a data transmission sub-module, a convolution kernel parameter preprocessing sub-module, and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
As a further description of the FPGA-based convolutional neural network acceleration system of the present invention, preferably, the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
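The seven controller states can be sketched as a simple finite-state machine. The linear ordering and the completion-driven transitions below are assumptions; the patent names the states but does not specify the exact transition conditions:

```python
from enum import Enum

class State(Enum):
    WAIT = 0
    WRITE_FEATURE_MAP = 1
    WRITE_INPUT_INDEX = 2
    WRITE_KERNEL = 3
    WRITE_WEIGHT_INDEX = 4
    CONVOLVE = 5
    SEND_RESULT = 6

ORDER = list(State)  # assumed linear progression through the 7 states

def next_state(state, stage_done):
    """Stay in the current state until its stage signals completion,
    then advance; after SEND_RESULT the controller returns to WAIT."""
    if not stage_done:
        return state
    return ORDER[(ORDER.index(state) + 1) % len(ORDER)]
```

In hardware this would correspond to a one-hot or encoded state register whose outputs drive the control signals of the corresponding sub-modules.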
The invention has the beneficial effects that:
1. By using runtime information and the algorithmic structure of the convolution calculation, the invention reduces redundant useless calculation and redundant reading of parameter data; accelerating the convolutional neural network with an FPGA can improve the real-time performance of the DCNN, achieve higher computing performance, and reduce energy consumption.
2. The invention realizes an FPGA-based convolutional neural network accelerator. The convolutional neural network algorithm is analyzed and, in order to improve the generality of the architecture design and adapt to various input image sizes, a data-alignment parallel processing method is adopted to realize parallel processing and transmission of the data layer.
Drawings
FIG. 1 is a schematic structural diagram of a convolutional neural network acceleration system based on an FPGA;
Detailed Description
To further explain the structure, characteristics and other objects of the present invention, a detailed description is given below with reference to the accompanying preferred embodiments, which are intended only to illustrate the technical solutions of the present invention and not to limit it.
In a first specific embodiment, a convolutional neural network acceleration system based on an FPGA includes a data preprocessing module, an FPGA module, and a controller, where a convolution kernel for performing the convolutional neural network operation on input data is provided in the FPGA module, and the data preprocessing module is configured to read the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and to preprocess them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
In a second specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system described in the first specific embodiment, where the Data-Mux is connected to the input end of an original image address generator for inputting the original image, and the Pooling-Mux is connected to the output end.
In a third specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system described in the first specific embodiment, where the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and to the pooling address generator of the convolution cache module of each path of the operation processing unit, and an output of each pooling module is connected to the controller.
In a fourth specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system in the first specific embodiment, where the data preprocessing module includes a data transmission sub-module, a convolution kernel parameter preprocessing sub-module, and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
In a fifth specific embodiment, the present embodiment is a further description of the FPGA-based convolutional neural network acceleration system according to the first specific embodiment, where the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
It should be noted that the above summary and the detailed description are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent substitutions, or improvements may be made by those skilled in the art within the spirit and principles of the invention. The scope of the invention is to be determined by the appended claims.

Claims (5)

1. The convolutional neural network acceleration system based on the FPGA is characterized by comprising a data preprocessing module, an FPGA module and a controller, wherein a convolution kernel for performing the convolutional neural network operation on input data is arranged in the FPGA module, and the data preprocessing module is used for reading the corresponding convolution kernel parameters and input feature maps from a data storage module according to the current calculation stage and preprocessing them: the 4-dimensional convolution kernel parameters are rearranged into 3 dimensions, and the input feature map is expanded and copied using a sliding window, so that the local feature maps in the sliding window correspond one-to-one with the convolution kernel parameters, yielding a convolution kernel parameter sequence and a local feature-map sequence that can be computed on directly;
the FPGA module comprises a pooling buffer module, a convolution buffer module and a pooling module;
the pooling cache module is connected with an address generator and a convolution address generator;
the convolution cache module is connected with an address generator and a pooling address generator;
a Data selection module Data-Mux for selecting the Data input to the convolution module is arranged between the pooling cache module and the convolution module;
a convolution selector Conv-Mux for selecting the pooling module to be used after convolution is arranged between the convolution module and the convolution cache module;
a Pooling selector Pooling-Mux for selecting the operation to follow Pooling is arranged between the Pooling module and the Pooling cache module;
the controller is used for controlling the working state of the accelerator and realizing the conversion between the working states.
2. The FPGA-based convolutional neural network acceleration system of claim 1, wherein said Data-Mux is connected to the input of an original image address generator for inputting the original image, and said Pooling-Mux is connected to the output.
3. The system according to claim 2, wherein the controller is connected to the Data-Mux, the Conv-Mux, and the Pooling-Mux, and to the pooling address generator of the convolution buffer module of each path of the arithmetic processing unit, and an output of each pooling module is connected to the controller.
4. The FPGA-based convolutional neural network acceleration system of claim 3, wherein the data preprocessing module comprises a data transmission sub-module, a convolution kernel parameter preprocessing sub-module and a feature map preprocessing sub-module; the data transmission sub-module is used for controlling the transmission of the feature map and the convolution kernel parameters between the data storage module and the convolutional neural network computing module; the convolution kernel parameter preprocessing sub-module is used for rearranging and sorting the convolution kernel parameters; and the feature map preprocessing sub-module is used for expanding, copying and sorting the feature map.
5. The FPGA-based convolutional neural network acceleration system of claim 4, wherein the controller has 7 states, which are respectively: waiting, writing the feature map, writing the input index, writing the convolution kernel, writing the weight index, performing the convolution calculation, and sending the calculation result; in each state the controller sends the corresponding control signals to the corresponding sub-modules to complete the corresponding function.
CN202011009835.5A 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA Pending CN112149814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011009835.5A CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011009835.5A CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Publications (1)

Publication Number Publication Date
CN112149814A true CN112149814A (en) 2020-12-29

Family

ID=73896180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011009835.5A Pending CN112149814A (en) 2020-09-23 2020-09-23 Convolutional neural network acceleration system based on FPGA

Country Status (1)

Country Link
CN (1) CN112149814A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344179A (en) * 2021-05-31 2021-09-03 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN114327676A (en) * 2021-12-28 2022-04-12 北京航天自动控制研究所 High-reliability accelerator for convolutional neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109598338A (en) * 2018-12-07 2019-04-09 东南大学 A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109598338A (en) * 2018-12-07 2019-04-09 东南大学 A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344179A (en) * 2021-05-31 2021-09-03 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN113344179B (en) * 2021-05-31 2022-06-14 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN114327676A (en) * 2021-12-28 2022-04-12 北京航天自动控制研究所 High-reliability accelerator for convolutional neural network

Similar Documents

Publication Publication Date Title
Chen et al. Embedded system real-time vehicle detection based on improved YOLO network
CN111488983A (en) Lightweight CNN model calculation accelerator based on FPGA
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN111967468A (en) FPGA-based lightweight target detection neural network implementation method
CN106228240A (en) Degree of depth convolutional neural networks implementation method based on FPGA
CN112149814A (en) Convolutional neural network acceleration system based on FPGA
CN109472734B (en) Target detection network based on FPGA and implementation method thereof
Sun et al. A high-performance accelerator for large-scale convolutional neural networks
CN113792621B (en) FPGA-based target detection accelerator design method
Qian et al. R-cnn object detection inference with deep learning accelerator
Shi et al. Design of parallel acceleration method of convolutional neural network based on fpga
CN116822600A (en) Neural network search chip based on RISC-V architecture
Liu et al. CASSANN-v2: A high-performance CNN accelerator architecture with on-chip memory self-adaptive tuning
Yu et al. Optimizing FPGA-based convolutional encoder-decoder architecture for semantic segmentation
Adel et al. Accelerating deep neural networks using FPGA
Fang et al. A sort-less FPGA-based non-maximum suppression accelerator using multi-thread computing and binary max engine for object detection
CN115640772A (en) Neighborhood connected heterogeneous design method based on self-adaptive chip
CN114595813A (en) Heterogeneous acceleration processor and data calculation method
Wen FPGA-Based Deep Convolutional Neural Network Optimization Method
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Wang et al. The inference operation optimization of an improved LeNet-5 convolutional neural network and its FPGA hardware implementation
CN109003222B (en) Asynchronous energy-efficient graph calculation accelerator
Zhang et al. Design of a Convolutional Neural Network Accelerator based on PYNQ
CN112330524B (en) Device and method for quickly realizing convolution in image tracking system
CN111860781A (en) Convolutional neural network feature decoding system realized based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201229