CN109800867B - Data calling method based on FPGA off-chip memory - Google Patents

Data calling method based on FPGA off-chip memory

Info

Publication number
CN109800867B
CN109800867B
Authority
CN
China
Prior art keywords
fifo
data
fpga
group
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811545237.2A
Other languages
Chinese (zh)
Other versions
CN109800867A (en)
Inventor
龙腾 (Long Teng)
魏鑫 (Wei Xin)
陈禾 (Chen He)
陈磊 (Chen Lei)
陈亮 (Chen Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201811545237.2A
Publication of CN109800867A
Application granted
Publication of CN109800867B

Abstract

The invention provides a data calling method based on an FPGA off-chip memory. Data of a feature map are stored into fifos row by row; at each read-write operation, the first M fifos output their currently stored first data out of the fifo group, the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M smaller, and at the same time the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 of the feature map is written to the data tail of the L-th fifo. Thus, while the fifos continuously output data out of the fifo group in order, the remaining data of the feature map are written into the fifo group in order and wait to be read, until the whole feature map has been traversed. The invention therefore does not call the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.

Description

Data calling method based on FPGA off-chip memory
Technical Field
The invention belongs to the technical field of image classification and identification, and particularly relates to a data calling method based on an FPGA (field programmable gate array) off-chip memory.
Background
In the past five years, convolutional neural networks have achieved excellent results in fields such as image feature extraction, classification and identification. Because convolutional-neural-network architectures are flexible and changeable, conventional convolutional neural networks are mainly implemented on software platforms such as CPUs (central processing units) and GPUs (graphics processing units). In current engineering applications, however, the requirements on real-time performance and low power consumption are increasingly prominent, so accelerating convolutional-neural-network computation on a hardware platform in order to reduce system power consumption has become a hot research problem for convolutional neural networks in engineering applications. One of the promising solutions is the field-programmable gate array (FPGA). However, the on-chip storage resources of an FPGA can hardly accommodate the image data, parameters and intermediate results of a convolutional neural network, so when the computation of a convolutional neural network is accelerated with an FPGA, the off-chip storage resources of the FPGA must be called to satisfy the storage requirements of the system. The problem of calling the FPGA off-chip storage reasonably has therefore become the focus of current research.
With reasonable calling of the FPGA off-chip storage, a convolutional-neural-network computing unit designed on an FPGA can fully exploit the parallelism inherent in the convolutional-neural-network algorithm, accelerate the convolution computation to the maximum extent, and improve the throughput of the system. The calling optimization of the FPGA off-chip storage has therefore become one of the important research directions for the future development of convolutional-neural-network computation acceleration on FPGAs.
Existing optimization methods for calling the FPGA off-chip storage optimize mainly around the structure of the off-chip memory. The data in the FPGA off-chip memory are stored in different banks, and the main idea of current methods is to store the different input-feature-map data of the convolutional neural network in different banks of the FPGA off-chip memory as far as possible.
Disclosure of Invention
To solve the above problems, the invention provides a data calling method based on an FPGA off-chip memory that does not call the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.
A data calling method based on an FPGA off-chip memory is applied to a convolutional neural network and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the fifo group containing L fifos numbered sequentially from 1 to L, and determining the number M of fifos that must output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
where kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that must be generated simultaneously, with N ≥ 2;
S2: storing the first L rows of the feature map held in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the size of the feature map;
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for each of the first M fifos, the currently stored first data is output out of the fifo group as sliding-window data of the convolutional neural network, and the second data becomes the first data; for each of the last M fifos, the currently stored first data is written to the data tail of the fifo whose number is L-M smaller; meanwhile, the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 of the feature map is written to the data tail of the L-th fifo, completing the update of every fifo in the fifo group;
S4: repeating step S3, performing the read-write operation on each fifo in the updated fifo group again, until all data of the feature map have been traversed.
Beneficial effects:
the invention provides a data calling method based on an FPGA off-chip memory, which is characterized in that data of a feature diagram are sequentially stored in fifo line by line, M fifo outputs the first currently stored data before each read-write operation, M fifo writes the first currently stored data back to the tail of the fifo stored data corresponding to the number L-M smaller than the number of the fifo, meanwhile, the first data of the line L +1 of the feature diagram is written into the tail of the data stored in the line L-1 of the fifo, and the first data of the line L +2 of the feature diagram is written into the tail of the data stored in the line L fifo, so that when the fifo continuously outputs the data out of the fifo group in sequence, the rest data of the feature diagram is written into the fifo group in sequence and waits to be read until the data traversal of the whole feature diagram is completed; therefore, the fifo group is constructed in the FPGA chip by utilizing the fifo, and according to the data sequence requirement required by convolution calculation, the data of the whole characteristic diagram stored in the FPGA off-chip memory are output to the convolution calculation unit outside the group one by each fifo, so that the data of the FPGA off-chip memory are not directly called in the process of calling the data from the FPGA off-chip memory to the FPGA on-chip memory, the complex address jump is avoided, and the efficiency of calling the data of the FPGA off-chip memory is greatly improved.
Drawings
FIG. 1 is a flow chart of a data calling method based on an FPGA off-chip memory according to the present invention;
FIG. 2 is a schematic diagram of data storage of each fifo in the fifo group when no read-write operation is performed according to the present invention;
FIG. 3 is a schematic diagram of data storage of each fifo in the fifo group after the first read-write operation is performed according to the present invention;
FIG. 4 is a schematic diagram of data storage of each fifo in the fifo group after the second read-write operation is performed according to the present invention;
fig. 5 is a schematic diagram of data storage of each fifo in the fifo group after the third read-write operation is performed according to the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Example one
Referring to fig. 1, which is a flowchart of the data calling method based on an FPGA off-chip memory according to this embodiment: the method is applied to a convolutional neural network, specifically to the process of extracting data from a feature map of size S×S in a sliding-window manner during the convolution calculation, and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the fifo group containing L fifos (fifo: first-in, first-out queue) numbered sequentially from 1 to L, and determining the number M of fifos that must output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2) (1)
M=kernel+Stride×(N-1) (2)
where kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that must be generated simultaneously, with N ≥ 2.
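As an aside, formulas (1) and (2) can be checked directly in software. The following is a minimal Python sketch; the function name is illustrative and not part of the invention:

def fifo_group_params(kernel, stride, n):
    # Formula (1): number L of fifos in the group.
    # Formula (2): number M of fifos that output data out of the group at once.
    L = 2 * kernel + stride * (n - 2)
    M = kernel + stride * (n - 1)
    return L, M

print(fifo_group_params(3, 1, 2))  # (6, 4): the configuration of Example two below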
It should be noted that a fifo queue follows the traditional first-in, first-out discipline of sequential execution: the element that enters first is the first to complete and leave, and only then is the next element handled.
S2: storing the first L rows of the feature map held in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the feature-map size S.
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for each of the first M fifos counted from the front, the stored first data is output out of the fifo group as sliding-window data of the convolutional neural network; for each of the last M fifos counted from the back, the stored first data is written to the data tail of the fifo whose number is L-M smaller; at the same time, the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 of the feature map is written to the data tail of the L-th fifo, completing the update of every fifo in the fifo group.
It should be noted that, for the last M fifos, writing the stored first data back to the data tail of the fifo numbered L-M lower means that the first data in the (L-M+1)-th fifo is written back to the data tail of the 1st fifo, the first data in the (L-M+2)-th fifo is written back to the data tail of the 2nd fifo, and so on, until the first data in the L-th fifo is written back to the data tail of the M-th fifo.
It should be noted that, in the physical storage of the actual FPGA on-chip memory, after the currently stored first data of a fifo is output out of the fifo group, the remaining data move forward one position in sequence because the fifo follows a first-in, first-out storage policy: the second data becomes the first, the third becomes the second, and so on, leaving the tail position empty. The freed tail positions are then filled as described above: the first data of each of the last M fifos is written to the data tail of the fifo whose number is L-M smaller, while the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 to the data tail of the L-th fifo.
S4: repeating step S3, performing the read-write operation on each fifo in the updated fifo group again, until all data of the feature map have been traversed.
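To make steps S1 to S4 concrete, the following is a minimal Python software model of the fifo group, assuming the parameters of Example two below (a 15×15 feature map with data numbered 1 to 225, kernel = 3, Stride = 1, N = 2). The deque objects stand in for the hardware fifos; all names are illustrative, and this is a sketch of the data movement, not an implementation of the invention:

from collections import deque
from itertools import chain

S = 15                                    # feature-map size
fmap = [[r * S + c + 1 for c in range(S)] for r in range(S)]  # data numbered 1..225

kernel, Stride, N = 3, 1, 2
L = 2 * kernel + Stride * (N - 2)         # formula (1): L = 6 fifos
M = kernel + Stride * (N - 1)             # formula (2): M = 4 simultaneous outputs

# S2: the first L rows fill the fifo group, one row per fifo (depth 16 > S)
fifos = [deque(fmap[r]) for r in range(L)]

# Remaining rows L+1, L+3, ... feed fifo L-1; rows L+2, L+4, ... feed fifo L
feed_a = chain.from_iterable(fmap[r] for r in range(L, S, 2))
feed_b = chain.from_iterable(fmap[r] for r in range(L + 1, S, 2))

def read_write_op():
    # S3: one read-write operation on the fifo group
    fronts = [f.popleft() for f in fifos]     # every fifo pops its first datum
    out = fronts[:M]                          # first M fifos: datum leaves the group
    for n in range(L - M, L):                 # last M fifos: datum is written back
        fifos[n - (L - M)].append(fronts[n])  # to the fifo numbered L-M lower
    fifos[L - 2].append(next(feed_a, 0))      # next datum of row L+1, L+3, ... (0 pads at the end)
    fifos[L - 1].append(next(feed_b, 0))      # next datum of row L+2, L+4, ...
    return out

print(read_write_op())  # [1, 16, 31, 46]: one column serving both convolution units

S4 is then the repetition of read_write_op until the map is traversed; with kernel = 3 and Stride = 1, each call would yield a column of M = 4 values, rows 1 to 3 serving the first convolution unit and rows 2 to 4 the second.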
Example two
Based on the above embodiment, this embodiment describes the FPGA off-chip memory calling method in detail, taking as an example a feature map of size 15×15, a convolution kernel of size 3×3, a sliding-window step length Stride of 1 in the convolution calculation of the feature map, and 2 convolution calculation units on the FPGA chip, i.e. the number N of sliding-window groups that must be processed simultaneously is 2.
Step one: determine the number L of fifos in the fifo group
The number L of fifos in the fifo group is determined from three parameters, namely the convolution kernel size (kernel), the sliding-window step length (Stride) in the convolution calculation of the feature map, and the number (N) of convolution calculation units on the FPGA chip, according to formula (1):
L=2×3+1×(2-2)=6
That is, there are 6 fifos in the fifo group.
Step two: determine the number M of fifos that must output data out of the fifo group simultaneously
Under the assumed conditions, namely a kernel size (kernel) of 3, a sliding-window step length (Stride) of 1, and 2 convolution calculation units (N) on the FPGA chip, the number M of fifos that must output data simultaneously follows from formula (2):
M=3+1×(2-1)=4
That is, each read-write operation requires 4 fifos to output data out of the fifo group at the same time.
Step three: determine the depth of each fifo
According to the constraint in step S2 that the fifo depth must be greater than the feature-map size S = 15, the depth of each fifo can be chosen as 16.
Step four: store the first 6 rows of the feature map into the fifo group row by row, each fifo storing one row of the feature map.
Referring to fig. 2, which is a schematic diagram of the data stored in each fifo of the fifo group before any read-write operation: the fifos in the group are numbered 1 to 6 from top to bottom. Assuming the data of the first eight rows of the input feature map are numbered 1 to 120, rows 1 to 6 of the input feature map are written into the fifos of the group before any read-write operation is performed, fifo 1 receiving the first row, fifo 2 the second row, and so on, giving the storage state of each fifo shown in fig. 2.
Referring to fig. 3, which shows the data stored in each fifo of the group after the first read-write operation of this embodiment: the first data stored in the 4 fifos numbered 1 to 4 are output out of the fifo group, i.e. the 4 data numbered 1, 16, 31 and 46 leave the group simultaneously and are delivered to the two convolution calculation units on the FPGA chip. The first data stored in the 4 fifos numbered 3 to 6 are written to the data tails of the fifos numbered 2 lower: the first data 31 of fifo 3 is written to the data tail of fifo 1, the first data 46 of fifo 4 to the data tail of fifo 2, the first data 61 of fifo 5 to the data tail of fifo 3, and the first data 76 of fifo 6 to the data tail of fifo 4. Meanwhile, the first data 91 of row 7 of the feature map is written to the data tail of fifo 5, and the first data 106 of row 8 is written to the data tail of fifo 6, completing the update of every fifo in the group, as shown in fig. 3 and in the sketch below.
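The following minimal Python sketch reproduces this first read-write operation; the deques stand in for the hardware fifos, and the numbering follows fig. 2 and fig. 3:

from collections import deque

fifos = [deque(range(1 + 15 * r, 16 + 15 * r)) for r in range(6)]  # rows 1..6 of fig. 2
fronts = [f.popleft() for f in fifos]   # [1, 16, 31, 46, 61, 76]
out = fronts[:4]                        # data 1, 16, 31, 46 leave the fifo group
for n in range(2, 6):                   # fifos 3..6 write back to fifos 1..4
    fifos[n - 2].append(fronts[n])
fifos[4].append(91)                     # first datum of row 7 -> tail of fifo 5
fifos[5].append(106)                    # first datum of row 8 -> tail of fifo 6
print(out)                              # [1, 16, 31, 46]
print(list(fifos[0]))                   # [2, 3, ..., 15, 31], matching fig. 3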
Referring to fig. 4 and fig. 5, which show the data stored in each fifo of the group after the second and third read-write operations, respectively: the data transfer of each fifo in the group is similar to the first read-write operation and is not described again in this embodiment.
Therefore, in the FPGA off-chip memory calling method provided by this embodiment, the data of the feature map are stored into the fifos row by row; at each read-write operation the first M fifos output their currently stored first data, the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M smaller, and at the same time the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 to the data tail of the L-th fifo, so that while the fifos continuously output data out of the group in order, the remaining data of the feature map are written into the group in order and wait to be read, until the whole feature map has been traversed. In this embodiment, a fifo group is thus built from fifos inside the FPGA chip, and each fifo outputs the data of the whole feature map stored in the FPGA off-chip memory, one by one and in the data order required by the convolution calculation, to the convolution calculation units outside the group, which also reside on the FPGA chip. In the process of calling data from the FPGA off-chip memory to the FPGA on-chip memory, the data of the FPGA off-chip memory are therefore not called directly, complex address jumps are avoided, and the efficiency of calling data from the FPGA off-chip memory is greatly improved.
In addition, the existing calling-optimization methods for the FPGA off-chip memory are easily affected by the number of input feature maps in the convolution calculation: when the number of input feature maps exceeds the number of banks in the off-chip memory, the problem of address-jump accesses arises.
Furthermore, the existing calling-optimization methods for the FPGA off-chip memory can hardly meet the requirement of convolutional-neural-network computation that the input of convolution data be flexibly configured for different convolution kernel sizes, different feature-map sliding-window step lengths and different numbers of convolution calculation units. The method of this embodiment determines, through formulas (1) and (2), the number L of fifos in the fifo group and the number M of fifos that must output data out of the group simultaneously, so that the number of fifos in each fifo group can be adjusted and the configuration completed flexibly, as the sketch below illustrates.
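As an illustration of this flexibility, formulas (1) and (2) can be tabulated for several hypothetical parameter sets (a sketch; these configurations are examples, not claimed values):

for kernel, stride, n in [(3, 1, 2), (5, 1, 2), (3, 2, 3)]:
    L = 2 * kernel + stride * (n - 2)   # formula (1)
    M = kernel + stride * (n - 1)       # formula (2)
    print(f"kernel={kernel}, Stride={stride}, N={n}: L={L}, M={M}")
# kernel=3, Stride=1, N=2: L=6, M=4
# kernel=5, Stride=1, N=2: L=10, M=6
# kernel=3, Stride=2, N=3: L=8, M=7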
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (1)

1. A data calling method based on an FPGA off-chip memory, applied to a convolutional neural network, characterized by comprising the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the fifo group containing L fifos numbered sequentially from 1 to L, and determining the number M of fifos that must output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
where kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that must be generated simultaneously, with N ≥ 2;
S2: storing the first L rows of the feature map held in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the size of the feature map;
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for each of the first M fifos, the stored first data is output out of the fifo group as sliding-window data of the convolutional neural network, and the second data becomes the first data; for each of the last M fifos, the stored first data is written to the data tail of the fifo whose number is L-M smaller; meanwhile, the first data of row L+1 of the feature map is written to the data tail of the (L-1)-th fifo and the first data of row L+2 of the feature map is written to the data tail of the L-th fifo, completing the update of each fifo in the fifo group;
S4: repeating step S3, performing the read-write operation on each fifo in the updated fifo group again, until all data of the feature map have been traversed.
CN201811545237.2A 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory Active CN109800867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811545237.2A CN109800867B (en) 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811545237.2A CN109800867B (en) 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory

Publications (2)

Publication Number Publication Date
CN109800867A CN109800867A (en) 2019-05-24
CN109800867B (en) 2020-09-29

Family

ID=66556986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811545237.2A Active CN109800867B (en) 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory

Country Status (1)

Country Link
CN (1) CN109800867B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583095B (en) * 2020-05-22 2022-03-22 浪潮电子信息产业股份有限公司 Image data storage method, image data processing system and related device
CN112488305B (en) * 2020-12-22 2023-04-18 西北工业大学 Neural network storage device and configurable management method thereof

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10846591B2 (en) * 2015-12-29 2020-11-24 Synopsys, Inc. Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
CN108229645B (en) * 2017-04-28 2021-08-06 北京市商汤科技开发有限公司 Convolution acceleration and calculation processing method and device, electronic equipment and storage medium
KR102008287B1 (en) * 2017-05-23 2019-08-07 고려대학교 산학협력단 (Korea University Industry-Academic Cooperation Foundation) Bidirectional fifo memory and processing device for convolution using the same
CN107392309A (en) * 2017-09-11 2017-11-24 东南大学—无锡集成电路技术研究所 A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
CN107862650B (en) * 2017-11-29 2021-07-06 中科亿海微电子科技(苏州)有限公司 Method for accelerating calculation of CNN convolution of two-dimensional image
CN108764182B (en) * 2018-06-01 2020-12-08 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
CN108717571B (en) * 2018-06-01 2020-09-15 阿依瓦(北京)技术有限公司 Acceleration method and device for artificial intelligence
CN108681984B (en) * 2018-07-26 2023-08-15 珠海一微半导体股份有限公司 Acceleration circuit of 3*3 convolution algorithm

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jin Hee Kim et al., "FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software," arXiv, 2018-07-27, pp. 1-6 *

Also Published As

Publication number Publication date
CN109800867A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
US10140123B2 (en) SIMD processing lanes storing input pixel operand data in local register file for thread execution of image processing operations
US11775430B1 (en) Memory access for multiple circuit components
US8094157B1 Performing an occurrence count of radices
US8676874B2 (en) Data structure for tiling and packetizing a sparse matrix
US10346507B2 (en) Symmetric block sparse matrix-vector multiplication
CN111414994B (en) FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
US8769216B2 (en) Optimizing output vector data generation using a formatted matrix data structure
CN108388537B (en) Convolutional neural network acceleration device and method
CN108717571B (en) Acceleration method and device for artificial intelligence
CN110321997B (en) High-parallelism computing platform, system and computing implementation method
US7689541B1 (en) Reordering data using a series of offsets
CN110674927A (en) Data recombination method for pulse array structure
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN112668708B (en) Convolution operation device for improving data utilization rate
US7624107B1 (en) Radix sort algorithm for graphics processing units
CN109800867B (en) Data calling method based on FPGA off-chip memory
CN111008040A (en) Cache device and cache method, computing device and computing method
WO2022206556A1 (en) Matrix operation method and apparatus for image data, device, and storage medium
CN110796236A (en) Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN110377874B (en) Convolution operation method and system
CN108764182B (en) Optimized acceleration method and device for artificial intelligence
CN108416430A (en) The pond arithmetic unit and method of convolutional neural networks
CN113743587A (en) Convolutional neural network pooling calculation method, system and storage medium
Shahbahrami et al. FPGA implementation of parallel histogram computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant