CN109800867B - Data calling method based on FPGA off-chip memory - Google Patents
- Publication number: CN109800867B (application CN201811545237.2A)
- Authority: CN (China)
- Prior art keywords: fifo, data, fpga, group, line
- Legal status: Active (the status listed is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Abstract
The invention provides a data calling method based on an FPGA off-chip memory, in which the data of a feature map are stored into a group of fifos row by row. On each read-write operation, the first M fifos output their currently stored first data out of the fifo group, while the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M lower; at the same time, the first unread data of row L+1 of the feature map is written to the data tail of fifo L-1, and the first unread data of row L+2 is written to the data tail of fifo L. As the fifos continuously output data out of the group in order, the remaining data of the feature map are written into the group in order and wait to be read, until the whole feature map has been traversed. The invention therefore never calls the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.
Description
Technical Field
The invention belongs to the technical field of image classification and recognition, and particularly relates to a data calling method based on an FPGA (field-programmable gate array) off-chip memory.
Background
In the past five years, convolutional neural networks have achieved good results in fields such as image feature extraction, classification, and recognition. Because convolutional neural network architectures are flexible and variable, they have mainly been implemented on software platforms such as CPUs (central processing units) and GPUs (graphics processing units). In current engineering applications, however, the requirements for real-time performance and low power consumption are increasingly prominent, so accelerating convolutional neural network computation on a hardware platform to reduce system power consumption has become a hot research problem for convolutional neural networks in engineering applications. One of the promising platforms is the field-programmable gate array (FPGA). However, the on-chip storage resources of an FPGA can hardly satisfy the storage of the image data, parameters, and intermediate results of a convolutional neural network, so when convolutional neural network computation is accelerated with an FPGA, the FPGA's off-chip storage resources must be called to satisfy the storage requirements of the system. Reasonably calling the FPGA's off-chip storage has therefore become a focus of current research.
With reasonable calling of the FPGA off-chip storage, a convolutional neural network computing unit designed on an FPGA can fully exploit the parallel-computing characteristics of the convolutional neural network algorithm, accelerate the convolution computation to the greatest extent, and improve the throughput of the system. The calling optimization of the FPGA off-chip storage has therefore become one of the important research directions for accelerating convolutional neural network computation on FPGAs.
Existing optimization methods for calling the FPGA off-chip storage optimize mainly around the structure of the off-chip memory itself. The data in the FPGA off-chip memory are stored in different banks, and the main idea of current methods is to store different input feature maps of the convolutional neural network in different banks of the off-chip memory as far as possible.
Disclosure of Invention
In order to solve the above problems, the invention provides a data calling method based on an FPGA off-chip memory, which does not call the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.
A data calling method based on an FPGA off-chip memory is applied to a convolutional neural network and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the group comprising L fifos numbered 1 to L in sequence, and determining the number M of fifos that need to output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
wherein kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that need to be generated simultaneously, with N ≥ 2;
S2: storing the first L rows of data of the feature map in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the size of the feature map;
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for the first M fifos, each fifo outputs its currently stored first data out of the fifo group as sliding-window data of the convolutional neural network, and its second data becomes its first data; for the last M fifos, the first data of each fifo is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, the first data of row L+2 of the feature map is written into the data tail of fifo L, and the update of every fifo in the fifo group is completed;
S4: repeating step S3, performing the read-write operation again on each fifo in the updated fifo group, until all data of the feature map have been traversed.
Beneficial effects:
the invention provides a data calling method based on an FPGA off-chip memory, which is characterized in that data of a feature diagram are sequentially stored in fifo line by line, M fifo outputs the first currently stored data before each read-write operation, M fifo writes the first currently stored data back to the tail of the fifo stored data corresponding to the number L-M smaller than the number of the fifo, meanwhile, the first data of the line L +1 of the feature diagram is written into the tail of the data stored in the line L-1 of the fifo, and the first data of the line L +2 of the feature diagram is written into the tail of the data stored in the line L fifo, so that when the fifo continuously outputs the data out of the fifo group in sequence, the rest data of the feature diagram is written into the fifo group in sequence and waits to be read until the data traversal of the whole feature diagram is completed; therefore, the fifo group is constructed in the FPGA chip by utilizing the fifo, and according to the data sequence requirement required by convolution calculation, the data of the whole characteristic diagram stored in the FPGA off-chip memory are output to the convolution calculation unit outside the group one by each fifo, so that the data of the FPGA off-chip memory are not directly called in the process of calling the data from the FPGA off-chip memory to the FPGA on-chip memory, the complex address jump is avoided, and the efficiency of calling the data of the FPGA off-chip memory is greatly improved.
Drawings
FIG. 1 is a flow chart of a data calling method based on an FPGA off-chip memory according to the present invention;
FIG. 2 is a schematic diagram of data storage of each fifo in the fifo group when no read-write operation is performed according to the present invention;
FIG. 3 is a schematic diagram of data storage of each fifo in the fifo group after the first read-write operation is performed according to the present invention;
FIG. 4 is a schematic diagram of data storage of each fifo in the fifo group after the second read-write operation is performed according to the present invention;
FIG. 5 is a schematic diagram of data storage of each fifo in the fifo group after the third read-write operation is performed according to the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Example one
Referring to fig. 1, the figure is a flowchart of the data calling method based on an FPGA off-chip memory according to this embodiment. The method is applied to a convolutional neural network, specifically to the process of extracting the data of a feature map of size S × S in a sliding-window manner during convolutional neural network calculation, and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the group comprising L fifos (first-in first-out queues) numbered 1 to L in sequence, and determining the number M of fifos that need to output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2) (1)
M=kernel+Stride×(N-1) (2)
wherein kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that need to be generated simultaneously, with N ≥ 2.
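As a quick sanity check, formulas (1) and (2) are easy to evaluate in software before sizing the hardware. The sketch below is illustrative only; the function name and the Python modelling are ours, not part of the invention:

```python
def fifo_group_params(kernel, stride, n):
    """Evaluate formulas (1) and (2): the fifo count L and the number M
    of fifos that output data out of the group simultaneously."""
    assert n >= 2, "the method requires N >= 2 groups of sliding windows"
    L = 2 * kernel + stride * (n - 2)
    M = kernel + stride * (n - 1)
    return L, M

print(fifo_group_params(3, 1, 2))  # -> (6, 4), the configuration of Example two
```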
It should be noted that a fifo queue follows the traditional sequential discipline of a computer: the item that enters first is completed and retired first, and only then is the next one handled.
S2: storing the first L rows of data of the feature map in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the feature-map size S.
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for the first M fifos counted from the front, the first data stored in each fifo is output as sliding-window data of the convolutional neural network; for the last M fifos counted from the back, the first data stored in each fifo is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, the first data of row L+2 of the feature map is written into the data tail of fifo L, and the update of every fifo in the fifo group is completed.
It should be noted that, for the last M fifos, the first data stored in each fifo is written back to the data tail of the fifo whose number is L-M lower: the first data of fifo L-M+1 is written back to the data tail of fifo 1, the first data of fifo L-M+2 to the data tail of fifo 2, and so on, until the first data of fifo L is written back to the data tail of fifo M.
It should be noted that, in the physical storage of the actual FPGA on-chip memory, after the currently stored first data of each fifo is output out of the fifo group, the data in each fifo move forward one position in sequence because a fifo follows the first-in first-out storage policy: the second data becomes the first, the third becomes the second, and so on, leaving the last position empty. So that the data of the last M fifos are not lost, the first data stored in each of them is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, and the first data of row L+2 into the data tail of fifo L.
S4: repeating step S3, performing the read-write operation again on each fifo in the updated fifo group, until all data of the feature map have been traversed.
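Before committing the scheme to FPGA logic, steps S2 to S4 can be prototyped as a software model. The sketch below is our Python rendering (a behavioural model only, not the patented hardware): each call performs one read-write operation on a group of L fifos, and the two fresh feature-map values consumed per operation are supplied by the caller.

```python
from collections import deque

def read_write_op(fifos, M, fresh):
    """One step-S3 read-write operation on a group of L fifos.

    Every fifo pops its head. The first M heads are emitted as
    sliding-window data; the last M heads are recycled to the tails of
    the fifos L-M places earlier; `fresh` holds the two new feature-map
    values destined for fifos L-1 and L.
    """
    L = len(fifos)
    heads = [f.popleft() for f in fifos]   # every fifo advances one position
    for i, v in enumerate(heads[L - M:]):
        fifos[i].append(v)                 # write-back: fifo i+L-M+1 -> fifo i+1
    fifos[L - 2].append(fresh[0])          # first unread datum of row L+1
    fifos[L - 1].append(fresh[1])          # first unread datum of row L+2
    return heads[:M]                       # sliding-window data for the conv units

# Demo with the parameters of Example two: L = 6, M = 4, 15 x 15 map.
S = 15
fmap = [[r * S + c + 1 for c in range(S)] for r in range(S)]
fifos = [deque(fmap[i]) for i in range(6)]            # step S2: preload rows 1-6
out1 = read_write_op(fifos, 4, (fmap[6][0], fmap[7][0]))
print(out1)  # [1, 16, 31, 46]
```

Because 2M - L = Stride × N > 0, every fifo belongs to the first M, the last M, or both, so each fifo's head is consumed exactly once per operation, which is why the single `popleft` per fifo is consistent with the text.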
Example two
Based on the above embodiment, this embodiment describes the FPGA off-chip memory calling method in detail, taking as an example a feature map of size 15 × 15, a convolution kernel of size 3 × 3, a sliding-window step length Stride of 1 in the convolution calculation of the feature map, and 2 convolution calculation units on the FPGA chip, that is, the number N of sliding windows to be processed simultaneously is 2.
Step one: determine the number L of fifos in the fifo group.
The number L of fifos in the fifo group is determined from three parameters, namely the convolution kernel size (kernel), the step length (Stride) of the sliding window in the convolution calculation of the feature map, and the number (N) of convolution calculation units on the FPGA chip, and satisfies formula (1):
L=2×3+1×(2-2)=6
that is, there are 6 fifos in the fifo group.
Step two: determine the number M of fifos that need to output data out of the fifo group simultaneously.
Under the assumed configuration, with convolution kernel size (kernel) 3, sliding-window step length (Stride) 1, and number of convolution calculation units (N) 2, the number M of fifos that need to output data simultaneously is determined from these three parameters and satisfies formula (2):
M=3+1×(2-1)=4
that is, each read-write operation requires the data of 4 fifos to be output out of the fifo group simultaneously.
Step three: determine the depth of the fifos.
The fifo depth must satisfy depth ≥ size, where the size S of the feature map is 15, so the depth of each fifo can be chosen as 16.
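The text only requires depth ≥ size; picking 16 for S = 15 is consistent with rounding up to the next power of two, a common fifo sizing choice on FPGAs (that rounding rule is our assumption, not stated in the patent):

```python
def fifo_depth(s):
    """Smallest power of two satisfying depth >= s, the feature-map size.
    The power-of-two rounding is an assumption; the patent only requires
    depth >= size."""
    d = 1
    while d < s:
        d *= 2
    return d

print(fifo_depth(15))  # -> 16, the depth chosen in this embodiment
```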
Step four: store the first 6 rows of data of the feature map into the fifo group row by row, each fifo storing one row of the feature map.
Referring to fig. 2, the figure is a schematic diagram of the data storage of each fifo in the fifo group before any read-write operation is performed. The fifos in the group are numbered 1 to 6 from top to bottom. Assuming the data of the first eight rows of the input feature map are numbered 1 to 120, before any fifo read-write operation the data of rows 1 to 6 of the input feature map are written into the fifos of the group: the fifo numbered 1 is loaded with the first row, the fifo numbered 2 with the second row, and so on. The resulting data storage of each fifo in the fifo group is shown in fig. 2.
Referring to fig. 3, the figure is a schematic diagram of the data storage of each fifo in the fifo group after the first read-write operation of this embodiment. The first data stored in the 4 fifos numbered 1 to 4, namely the 4 data numbered 1, 16, 31, and 46, are output out of the fifo group simultaneously and stored into the two convolution calculation units on the FPGA chip. The first data stored in the 4 fifos numbered 3 to 6 are written into the data tails of the fifos numbered 2 lower: the first data 31 of fifo 3 is written into the data tail of fifo 1, the first data 46 of fifo 4 into the data tail of fifo 2, the first data 61 of fifo 5 into the data tail of fifo 3, and the first data 76 of fifo 6 into the data tail of fifo 4. Meanwhile, the first data 91 of row 7 of the feature map is written into the data tail of fifo 5, and the first data 106 of row 8 into the data tail of fifo 6, completing the update of each fifo in the fifo group, as shown in fig. 3.
Referring to fig. 4 and fig. 5, which are schematic diagrams of the data storage of each fifo in the fifo group after the second and the third read-write operation of this embodiment, respectively: the data transfer of each fifo in the group is similar to that of the first read-write operation and is not described again here.
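The contents of figs. 2 to 5 can be reproduced numerically. The following sketch (our Python model of the embodiment, not the authors' implementation) preloads rows 1 to 6 of the 15 × 15 map numbered 1 to 225 and runs a full row of read-write operations:

```python
from collections import deque

S, L, M = 15, 6, 4                             # parameters of Example two
fmap = [[r * S + c + 1 for c in range(S)] for r in range(S)]  # values 1..225
fifos = [deque(fmap[i]) for i in range(L)]     # Fig. 2: rows 1-6 preloaded

outputs = []
for col in range(S):                           # one full row of operations
    heads = [f.popleft() for f in fifos]       # every fifo advances one position
    outputs.append(heads[:M])                  # heads of fifos 1-4 -> conv units
    for i, v in enumerate(heads[L - M:]):      # heads of fifos 3-6 written back
        fifos[i].append(v)                     #   to the tails of fifos 1-4
    fifos[L - 2].append(fmap[L][col])          # row 7 data feed fifo 5
    fifos[L - 1].append(fmap[L + 1][col])      # row 8 data feed fifo 6

print(outputs[0])             # [1, 16, 31, 46]  -> Fig. 3
print(outputs[1])             # [2, 17, 32, 47]  -> Fig. 4
print(outputs[2])             # [3, 18, 33, 48]  -> Fig. 5
print([f[0] for f in fifos])  # [31, 46, 61, 76, 91, 106]
```

After the 15 operations the fifos hold rows 3 to 8, so the output band has advanced by N × Stride = 2 rows, which is exactly the starting position the next group of sliding windows needs.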
Thus, in the FPGA off-chip memory calling method provided by this embodiment, the data of the feature map are stored into the fifos row by row. On each read-write operation, the first M fifos output their currently stored first data, the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M lower, and at the same time the first data of row L+1 of the feature map is written into the data tail of fifo L-1 and the first data of row L+2 into the data tail of fifo L, so that while the fifos continuously output data out of the group in order, the remaining data of the feature map are written into the group in order and wait to be read, until the whole feature map has been traversed. A fifo group is thus constructed inside the FPGA chip using fifos, and, following the data order required by the convolution calculation, each fifo outputs the data of the whole feature map to the convolution calculation unit outside the group one by one, the convolution calculation unit itself also being on the FPGA chip. In the process of calling data from the FPGA off-chip memory into the FPGA on-chip memory, the off-chip data are never called directly, complex address jumps are avoided, and the efficiency of calling data from the FPGA off-chip memory is greatly improved.
In addition, the existing calling optimization methods for the FPGA off-chip memory are easily affected by the number of input feature maps in the convolution calculation: when the number of input feature maps exceeds the number of banks of the off-chip memory, the problem of address-jump access is encountered.
Furthermore, the existing calling optimization methods for the FPGA off-chip memory can hardly satisfy the need of convolutional neural network calculation to flexibly configure the convolution input data for different convolution kernel sizes, different feature-map sliding-window step lengths, and different numbers of convolution calculation units. The method of this embodiment determines, through formulas (1) and (2), the number L of fifos in the fifo group and the number M of fifos that need to output data out of the group simultaneously, so the number of fifos in each group can be adjusted and flexible configuration is achieved.
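To illustrate the flexibility point, the sketch below (illustrative Python, names ours) evaluates formulas (1) and (2) for several configurations; only the group sizes L and M change, not the calling scheme itself:

```python
def fifo_group_params(kernel, stride, n):
    """L and M per formulas (1) and (2) of the disclosure."""
    return 2 * kernel + stride * (n - 2), kernel + stride * (n - 1)

# Reconfiguring for other kernel sizes, strides, or conv-unit counts
# only changes L and M; the values below follow from the formulas.
for cfg in [(3, 1, 2), (3, 2, 2), (5, 1, 2), (5, 2, 3)]:
    print(cfg, fifo_group_params(*cfg))
# (3,1,2) -> (6,4)   (3,2,2) -> (6,5)   (5,1,2) -> (10,6)   (5,2,3) -> (12,9)
```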
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (1)
1. A data calling method based on an FPGA off-chip memory is applied to a convolutional neural network and is characterized by comprising the following steps:
s1: the method comprises the following steps of setting fifo groups in an FPGA on-chip memory, wherein each fifo group comprises L fifos, numbering the fifos in sequence from 1 to L, and determining the number M of the fifos which need to output data out of the fifo groups at the same time, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
wherein, kernel is the size of a preset convolution kernel, Stride is the step length of a sliding window adopted in convolution calculation, and N is the group number of the sliding window data which needs to be generated simultaneously, wherein N is more than or equal to 2;
s2: storing the front L rows of data in the feature map in the FPGA off-chip memory into fifo groups line by line, wherein each fifo stores a row of data of the feature map, and the depth of the fifo is greater than the size of the feature map;
s3: performing read-write operation on each fifo in the fifo group, wherein the read-write operation specifically comprises:
for the first M fifos, each fifo outputs its currently stored first data out of the fifo group as sliding-window data of the convolutional neural network, and its second data becomes its first data; for the last M fifos, the first data of each fifo is written into the data tail of the fifo whose number is L-M lower; meanwhile, the first data of the L+1 th line of the feature map is written into the data tail of the L-1 th fifo, the first data of the L+2 th line of the feature map is written into the data tail of the L th fifo, and the updating of each fifo in the fifo group is completed;
s4: and repeating the step S3, and performing read-write operation on each fifo in the updated fifo group again until the traversal of all data of the feature map is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811545237.2A CN109800867B (en) | 2018-12-17 | 2018-12-17 | Data calling method based on FPGA off-chip memory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109800867A CN109800867A (en) | 2019-05-24 |
CN109800867B true CN109800867B (en) | 2020-09-29 |
Family
ID=66556986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811545237.2A Active CN109800867B (en) | 2018-12-17 | 2018-12-17 | Data calling method based on FPGA off-chip memory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109800867B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583095B (en) * | 2020-05-22 | 2022-03-22 | 浪潮电子信息产业股份有限公司 | Image data storage method, image data processing system and related device |
CN112488305B (en) * | 2020-12-22 | 2023-04-18 | 西北工业大学 | Neural network storage device and configurable management method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolution neutral network hardware and AXI bus IP core thereof |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Degree of depth convolutional neural networks implementation method based on FPGA |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846591B2 (en) * | 2015-12-29 | 2020-11-24 | Synopsys, Inc. | Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks |
CN108229645B (en) * | 2017-04-28 | 2021-08-06 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing method and device, electronic equipment and storage medium |
KR102008287B1 (en) * | 2017-05-23 | 2019-08-07 | 고려대학교 산학협력단 | Bidirectional fifo memoy and processing device for convoultion using the same |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA |
CN107862650B (en) * | 2017-11-29 | 2021-07-06 | 中科亿海微电子科技(苏州)有限公司 | Method for accelerating calculation of CNN convolution of two-dimensional image |
CN108764182B (en) * | 2018-06-01 | 2020-12-08 | 阿依瓦(北京)技术有限公司 | Optimized acceleration method and device for artificial intelligence |
CN108717571B (en) * | 2018-06-01 | 2020-09-15 | 阿依瓦(北京)技术有限公司 | Acceleration method and device for artificial intelligence |
CN108681984B (en) * | 2018-07-26 | 2023-08-15 | 珠海一微半导体股份有限公司 | Acceleration circuit of 3*3 convolution algorithm |
- 2018-12-17: application CN201811545237.2A filed; granted as patent CN109800867B (status: Active)
Non-Patent Citations (1)
Title |
---|
Jin Hee Kim et al., "FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software," arXiv, 2018-07-27, pp. 1-6. * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |