CN109800867B - Data calling method based on FPGA off-chip memory - Google Patents
- Publication number: CN109800867B (application CN201811545237.2A)
- Authority: CN (China)
- Prior art keywords: fifo, data, fpga, group, line
- Legal status: Active (the status listed is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Abstract
The invention provides a data calling method based on an FPGA off-chip memory, in which the data of a feature map are stored into a group of fifos row by row. On each read-write operation, the first M fifos output their currently stored first data out of the fifo group, while the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M lower; at the same time, the first unread data of row L+1 of the feature map is written to the data tail of fifo L-1, and the first unread data of row L+2 is written to the data tail of fifo L. As the fifos continuously output data out of the group in order, the remaining data of the feature map are written into the group in order and wait to be read, until the whole feature map has been traversed. The invention therefore never calls the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.
Description
Technical Field
The invention belongs to the technical field of image classification and recognition, and particularly relates to a data calling method based on an FPGA (field-programmable gate array) off-chip memory.
Background
In the past five years, convolutional neural networks have achieved good results in fields such as image feature extraction, classification, and recognition. Because convolutional neural network architectures are flexible and variable, they have mainly been implemented on software platforms such as CPUs (central processing units) and GPUs (graphics processing units). In current engineering applications, however, the requirements for real-time performance and low power consumption are increasingly prominent, so accelerating convolutional neural network computation on a hardware platform to reduce system power consumption has become a hot research problem for convolutional neural networks in engineering applications. One of the promising platforms is the field-programmable gate array (FPGA). However, the on-chip storage resources of an FPGA can hardly satisfy the storage of the image data, parameters, and intermediate results of a convolutional neural network, so when convolutional neural network computation is accelerated with an FPGA, the FPGA's off-chip storage resources must be called to satisfy the storage requirements of the system. Reasonably calling the FPGA's off-chip storage has therefore become a focus of current research.
With reasonable calling of the FPGA off-chip storage, a convolutional neural network computing unit designed on an FPGA can fully exploit the parallel-computing characteristics of the convolutional neural network algorithm, accelerate the convolution computation to the greatest extent, and improve the throughput of the system. The calling optimization of the FPGA off-chip storage has therefore become one of the important research directions for accelerating convolutional neural network computation on FPGAs.
Existing optimization methods for calling the FPGA off-chip storage optimize mainly around the structure of the off-chip memory itself. The data in the FPGA off-chip memory are stored in different banks, and the main idea of current methods is to store different input feature maps of the convolutional neural network in different banks of the off-chip memory as far as possible.
Disclosure of Invention
In order to solve the above problems, the invention provides a data calling method based on an FPGA off-chip memory, which does not call the data of the FPGA off-chip memory directly, avoids complex address jumps, and greatly improves the efficiency of calling data from the FPGA off-chip memory.
A data calling method based on an FPGA off-chip memory is applied to a convolutional neural network and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the group comprising L fifos numbered 1 to L in sequence, and determining the number M of fifos that need to output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
wherein kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that need to be generated simultaneously, with N ≥ 2;
S2: storing the first L rows of data of the feature map in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the size of the feature map;
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for the first M fifos, each fifo outputs its currently stored first data out of the fifo group as sliding-window data of the convolutional neural network, and its second data becomes its first data; for the last M fifos, the first data of each fifo is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, the first data of row L+2 of the feature map is written into the data tail of fifo L, and the update of every fifo in the fifo group is completed;
S4: repeating step S3, performing the read-write operation again on each fifo in the updated fifo group, until all data of the feature map have been traversed.
Beneficial effects:
the invention provides a data calling method based on an FPGA off-chip memory, which is characterized in that data of a feature diagram are sequentially stored in fifo line by line, M fifo outputs the first currently stored data before each read-write operation, M fifo writes the first currently stored data back to the tail of the fifo stored data corresponding to the number L-M smaller than the number of the fifo, meanwhile, the first data of the line L +1 of the feature diagram is written into the tail of the data stored in the line L-1 of the fifo, and the first data of the line L +2 of the feature diagram is written into the tail of the data stored in the line L fifo, so that when the fifo continuously outputs the data out of the fifo group in sequence, the rest data of the feature diagram is written into the fifo group in sequence and waits to be read until the data traversal of the whole feature diagram is completed; therefore, the fifo group is constructed in the FPGA chip by utilizing the fifo, and according to the data sequence requirement required by convolution calculation, the data of the whole characteristic diagram stored in the FPGA off-chip memory are output to the convolution calculation unit outside the group one by each fifo, so that the data of the FPGA off-chip memory are not directly called in the process of calling the data from the FPGA off-chip memory to the FPGA on-chip memory, the complex address jump is avoided, and the efficiency of calling the data of the FPGA off-chip memory is greatly improved.
Drawings
FIG. 1 is a flow chart of a data calling method based on an FPGA off-chip memory according to the present invention;
FIG. 2 is a schematic diagram of data storage of each fifo in the fifo group when no read-write operation is performed according to the present invention;
FIG. 3 is a schematic diagram of data storage of each fifo in the fifo group after the first read-write operation is performed according to the present invention;
FIG. 4 is a schematic diagram of data storage of each fifo in the fifo group after the second read-write operation is performed according to the present invention;
FIG. 5 is a schematic diagram of data storage of each fifo in the fifo group after the third read-write operation is performed according to the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Example one
Referring to fig. 1, the figure is a flowchart of the data calling method based on an FPGA off-chip memory according to this embodiment. The method is applied to a convolutional neural network, specifically to the process of extracting the data of a feature map of size S × S in a sliding-window manner during convolutional neural network calculation, and comprises the following steps:
S1: setting up a fifo group in the FPGA on-chip memory, the group comprising L fifos (first-in first-out queues) numbered 1 to L in sequence, and determining the number M of fifos that need to output data out of the fifo group simultaneously, specifically:
L=2×kernel+Stride×(N-2) (1)
M=kernel+Stride×(N-1) (2)
wherein kernel is the size of the preset convolution kernel, Stride is the step length of the sliding window used in the convolution calculation, and N is the number of groups of sliding-window data that need to be generated simultaneously, with N ≥ 2.
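As a quick sanity check, formulas (1) and (2) are easy to evaluate in software before sizing the hardware. The sketch below is illustrative only; the function name and the Python modelling are ours, not part of the invention:

```python
def fifo_group_params(kernel, stride, n):
    """Evaluate formulas (1) and (2): the fifo count L and the number M
    of fifos that output data out of the group simultaneously."""
    assert n >= 2, "the method requires N >= 2 groups of sliding windows"
    L = 2 * kernel + stride * (n - 2)
    M = kernel + stride * (n - 1)
    return L, M

print(fifo_group_params(3, 1, 2))  # -> (6, 4), the configuration of Example two
```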
It should be noted that a fifo queue follows the traditional sequential discipline of a computer: the item that enters first is completed and retired first, and only then is the next one handled.
S2: storing the first L rows of data of the feature map in the FPGA off-chip memory into the fifo group row by row, each fifo storing one row of the feature map, the depth of each fifo being greater than the feature-map size S.
S3: performing a read-write operation on each fifo in the fifo group, the read-write operation specifically comprising:
for the first M fifos counted from the front, the first data stored in each fifo is output as sliding-window data of the convolutional neural network; for the last M fifos counted from the back, the first data stored in each fifo is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, the first data of row L+2 of the feature map is written into the data tail of fifo L, and the update of every fifo in the fifo group is completed.
It should be noted that, for the last M fifos, the first data stored in each fifo is written back to the data tail of the fifo whose number is L-M lower: the first data of fifo L-M+1 is written back to the data tail of fifo 1, the first data of fifo L-M+2 to the data tail of fifo 2, and so on, until the first data of fifo L is written back to the data tail of fifo M.
It should be noted that, in the physical storage of the actual FPGA on-chip memory, after the currently stored first data of each fifo is output out of the fifo group, the data in each fifo move forward one position in sequence because a fifo follows the first-in first-out storage policy: the second data becomes the first, the third becomes the second, and so on, leaving the last position empty. So that the data of the last M fifos are not lost, the first data stored in each of them is written into the data tail of the fifo whose number is L-M lower; at the same time, the first data of row L+1 of the feature map is written into the data tail of fifo L-1, and the first data of row L+2 into the data tail of fifo L.
S4: repeating step S3, performing the read-write operation again on each fifo in the updated fifo group, until all data of the feature map have been traversed.
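Before committing the scheme to FPGA logic, steps S2 to S4 can be prototyped as a software model. The sketch below is our Python rendering (a behavioural model only, not the patented hardware): each call performs one read-write operation on a group of L fifos, and the two fresh feature-map values consumed per operation are supplied by the caller.

```python
from collections import deque

def read_write_op(fifos, M, fresh):
    """One step-S3 read-write operation on a group of L fifos.

    Every fifo pops its head. The first M heads are emitted as
    sliding-window data; the last M heads are recycled to the tails of
    the fifos L-M places earlier; `fresh` holds the two new feature-map
    values destined for fifos L-1 and L.
    """
    L = len(fifos)
    heads = [f.popleft() for f in fifos]   # every fifo advances one position
    for i, v in enumerate(heads[L - M:]):
        fifos[i].append(v)                 # write-back: fifo i+L-M+1 -> fifo i+1
    fifos[L - 2].append(fresh[0])          # first unread datum of row L+1
    fifos[L - 1].append(fresh[1])          # first unread datum of row L+2
    return heads[:M]                       # sliding-window data for the conv units

# Demo with the parameters of Example two: L = 6, M = 4, 15 x 15 map.
S = 15
fmap = [[r * S + c + 1 for c in range(S)] for r in range(S)]
fifos = [deque(fmap[i]) for i in range(6)]            # step S2: preload rows 1-6
out1 = read_write_op(fifos, 4, (fmap[6][0], fmap[7][0]))
print(out1)  # [1, 16, 31, 46]
```

Because 2M - L = Stride × N > 0, every fifo belongs to the first M, the last M, or both, so each fifo's head is consumed exactly once per operation, which is why the single `popleft` per fifo is consistent with the text.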
Example two
Based on the above embodiment, this embodiment describes the FPGA off-chip memory calling method in detail, taking as an example a feature map of size 15 × 15, a convolution kernel of size 3 × 3, a sliding-window step length Stride of 1 in the convolution calculation of the feature map, and 2 convolution calculation units on the FPGA chip, that is, the number N of sliding windows to be processed simultaneously is 2.
Step one: determine the number L of fifos in the fifo group.
The number L of fifos in the fifo group is determined from three parameters, namely the convolution kernel size (kernel), the step length (Stride) of the sliding window in the convolution calculation of the feature map, and the number (N) of convolution calculation units on the FPGA chip, and satisfies formula (1):
L=2×3+1×(2-2)=6
that is, there are 6 fifos in the fifo group.
Step two: determine the number M of fifos that need to output data out of the fifo group simultaneously.
Under the assumed configuration, with convolution kernel size (kernel) 3, sliding-window step length (Stride) 1, and number of convolution calculation units (N) 2, the number M of fifos that need to output data simultaneously is determined from these three parameters and satisfies formula (2):
M=3+1×(2-1)=4
that is, each read-write operation requires the data of 4 fifos to be output out of the fifo group simultaneously.
Step three: determine the depth of the fifos.
The fifo depth must satisfy depth ≥ size, where the size S of the feature map is 15, so the depth of each fifo can be chosen as 16.
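The text only requires depth ≥ size; picking 16 for S = 15 is consistent with rounding up to the next power of two, a common fifo sizing choice on FPGAs (that rounding rule is our assumption, not stated in the patent):

```python
def fifo_depth(s):
    """Smallest power of two satisfying depth >= s, the feature-map size.
    The power-of-two rounding is an assumption; the patent only requires
    depth >= size."""
    d = 1
    while d < s:
        d *= 2
    return d

print(fifo_depth(15))  # -> 16, the depth chosen in this embodiment
```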
Step four: store the first 6 rows of data of the feature map into the fifo group row by row, each fifo storing one row of the feature map.
Referring to fig. 2, the figure is a schematic diagram of the data storage of each fifo in the fifo group before any read-write operation is performed. The fifos in the group are numbered 1 to 6 from top to bottom. Assuming the data of the first eight rows of the input feature map are numbered 1 to 120, before any fifo read-write operation the data of rows 1 to 6 of the input feature map are written into the fifos of the group: the fifo numbered 1 is loaded with the first row, the fifo numbered 2 with the second row, and so on. The resulting data storage of each fifo in the fifo group is shown in fig. 2.
Referring to fig. 3, the figure is a schematic diagram of the data storage of each fifo in the fifo group after the first read-write operation of this embodiment. The first data stored in the 4 fifos numbered 1 to 4, namely the 4 data numbered 1, 16, 31, and 46, are output out of the fifo group simultaneously and stored into the two convolution calculation units on the FPGA chip. The first data stored in the 4 fifos numbered 3 to 6 are written into the data tails of the fifos numbered 2 lower: the first data 31 of fifo 3 is written into the data tail of fifo 1, the first data 46 of fifo 4 into the data tail of fifo 2, the first data 61 of fifo 5 into the data tail of fifo 3, and the first data 76 of fifo 6 into the data tail of fifo 4. Meanwhile, the first data 91 of row 7 of the feature map is written into the data tail of fifo 5, and the first data 106 of row 8 into the data tail of fifo 6, completing the update of each fifo in the fifo group, as shown in fig. 3.
Referring to fig. 4 and fig. 5, which are schematic diagrams of the data storage of each fifo in the fifo group after the second and the third read-write operation of this embodiment, respectively: the data transfer of each fifo in the group is similar to that of the first read-write operation and is not described again here.
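The contents of figs. 2 to 5 can be reproduced numerically. The following sketch (our Python model of the embodiment, not the authors' implementation) preloads rows 1 to 6 of the 15 × 15 map numbered 1 to 225 and runs a full row of read-write operations:

```python
from collections import deque

S, L, M = 15, 6, 4                             # parameters of Example two
fmap = [[r * S + c + 1 for c in range(S)] for r in range(S)]  # values 1..225
fifos = [deque(fmap[i]) for i in range(L)]     # Fig. 2: rows 1-6 preloaded

outputs = []
for col in range(S):                           # one full row of operations
    heads = [f.popleft() for f in fifos]       # every fifo advances one position
    outputs.append(heads[:M])                  # heads of fifos 1-4 -> conv units
    for i, v in enumerate(heads[L - M:]):      # heads of fifos 3-6 written back
        fifos[i].append(v)                     #   to the tails of fifos 1-4
    fifos[L - 2].append(fmap[L][col])          # row 7 data feed fifo 5
    fifos[L - 1].append(fmap[L + 1][col])      # row 8 data feed fifo 6

print(outputs[0])             # [1, 16, 31, 46]  -> Fig. 3
print(outputs[1])             # [2, 17, 32, 47]  -> Fig. 4
print(outputs[2])             # [3, 18, 33, 48]  -> Fig. 5
print([f[0] for f in fifos])  # [31, 46, 61, 76, 91, 106]
```

After the 15 operations the fifos hold rows 3 to 8, so the output band has advanced by N × Stride = 2 rows, which is exactly the starting position the next group of sliding windows needs.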
Thus, in the FPGA off-chip memory calling method provided by this embodiment, the data of the feature map are stored into the fifos row by row. On each read-write operation, the first M fifos output their currently stored first data, the last M fifos write their currently stored first data back to the data tail of the fifo whose number is L-M lower, and at the same time the first data of row L+1 of the feature map is written into the data tail of fifo L-1 and the first data of row L+2 into the data tail of fifo L, so that while the fifos continuously output data out of the group in order, the remaining data of the feature map are written into the group in order and wait to be read, until the whole feature map has been traversed. A fifo group is thus constructed inside the FPGA chip using fifos, and, following the data order required by the convolution calculation, each fifo outputs the data of the whole feature map to the convolution calculation unit outside the group one by one, the convolution calculation unit itself also being on the FPGA chip. In the process of calling data from the FPGA off-chip memory into the FPGA on-chip memory, the off-chip data are never called directly, complex address jumps are avoided, and the efficiency of calling data from the FPGA off-chip memory is greatly improved.
In addition, the existing calling optimization methods for the FPGA off-chip memory are easily affected by the number of input feature maps in the convolution calculation: when the number of input feature maps exceeds the number of banks of the off-chip memory, the problem of address-jump access is encountered.
Furthermore, the existing calling optimization methods for the FPGA off-chip memory can hardly satisfy the need of convolutional neural network calculation to flexibly configure the convolution input data for different convolution kernel sizes, different feature-map sliding-window step lengths, and different numbers of convolution calculation units. The method of this embodiment determines, through formulas (1) and (2), the number L of fifos in the fifo group and the number M of fifos that need to output data out of the group simultaneously, so the number of fifos in each group can be adjusted and flexible configuration is achieved.
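To illustrate the flexibility point, the sketch below (illustrative Python, names ours) evaluates formulas (1) and (2) for several configurations; only the group sizes L and M change, not the calling scheme itself:

```python
def fifo_group_params(kernel, stride, n):
    """L and M per formulas (1) and (2) of the disclosure."""
    return 2 * kernel + stride * (n - 2), kernel + stride * (n - 1)

# Reconfiguring for other kernel sizes, strides, or conv-unit counts
# only changes L and M; the values below follow from the formulas.
for cfg in [(3, 1, 2), (3, 2, 2), (5, 1, 2), (5, 2, 3)]:
    print(cfg, fifo_group_params(*cfg))
# (3,1,2) -> (6,4)   (3,2,2) -> (6,5)   (5,1,2) -> (10,6)   (5,2,3) -> (12,9)
```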
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (1)
1. A data calling method based on an FPGA off-chip memory is applied to a convolutional neural network and is characterized by comprising the following steps:
s1: the method comprises the following steps of setting fifo groups in an FPGA on-chip memory, wherein each fifo group comprises L fifos, numbering the fifos in sequence from 1 to L, and determining the number M of the fifos which need to output data out of the fifo groups at the same time, specifically:
L=2×kernel+Stride×(N-2)
M=kernel+Stride×(N-1)
wherein, kernel is the size of a preset convolution kernel, Stride is the step length of a sliding window adopted in convolution calculation, and N is the group number of the sliding window data which needs to be generated simultaneously, wherein N is more than or equal to 2;
s2: storing the front L rows of data in the feature map in the FPGA off-chip memory into fifo groups line by line, wherein each fifo stores a row of data of the feature map, and the depth of the fifo is greater than the size of the feature map;
s3: performing read-write operation on each fifo in the fifo group, wherein the read-write operation specifically comprises:
for the first M fifos, each fifo outputs its currently stored first data out of the fifo group as sliding-window data of the convolutional neural network, and its second data becomes its first data; for the last M fifos, the first data of each fifo is written into the data tail of the fifo whose number is L-M lower; meanwhile, the first data of the L+1 th line of the feature map is written into the data tail of the L-1 th fifo, the first data of the L+2 th line of the feature map is written into the data tail of the L th fifo, and the updating of each fifo in the fifo group is completed;
s4: and repeating the step S3, and performing read-write operation on each fifo in the updated fifo group again until the traversal of all data of the feature map is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811545237.2A CN109800867B (en) | 2018-12-17 | 2018-12-17 | Data calling method based on FPGA off-chip memory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109800867A CN109800867A (en) | 2019-05-24 |
CN109800867B true CN109800867B (en) | 2020-09-29 |
Family
ID=66556986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811545237.2A Active CN109800867B (en) | 2018-12-17 | 2018-12-17 | Data calling method based on FPGA off-chip memory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109800867B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583095B (en) * | 2020-05-22 | 2022-03-22 | 浪潮电子信息产业股份有限公司 | Image data storage method, image data processing system and related device |
CN112488305B (en) * | 2020-12-22 | 2023-04-18 | 西北工业大学 | Neural network storage device and configurable management method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolution neutral network hardware and AXI bus IP core thereof |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Degree of depth convolutional neural networks implementation method based on FPGA |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846591B2 (en) * | 2015-12-29 | 2020-11-24 | Synopsys, Inc. | Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks |
CN108229645B (en) * | 2017-04-28 | 2021-08-06 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing method and device, electronic equipment and storage medium |
KR102008287B1 (en) * | 2017-05-23 | 2019-08-07 | 고려대학교 산학협력단 | Bidirectional fifo memoy and processing device for convoultion using the same |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA |
CN107862650B (en) * | 2017-11-29 | 2021-07-06 | 中科亿海微电子科技(苏州)有限公司 | Method for accelerating calculation of CNN convolution of two-dimensional image |
CN108764182B (en) * | 2018-06-01 | 2020-12-08 | 阿依瓦(北京)技术有限公司 | Optimized acceleration method and device for artificial intelligence |
CN108717571B (en) * | 2018-06-01 | 2020-09-15 | 阿依瓦(北京)技术有限公司 | Acceleration method and device for artificial intelligence |
CN108681984B (en) * | 2018-07-26 | 2023-08-15 | 珠海一微半导体股份有限公司 | Acceleration circuit of 3*3 convolution algorithm |
- 2018-12-17: application CN201811545237.2A filed; granted as patent CN109800867B (status: Active)
Non-Patent Citations (1)
Title |
---|
Jin Hee Kim et al., "FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software," arXiv, 2018-07-27, pp. 1-6. * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |