CN109800867A - Data calling method based on FPGA off-chip memory - Google Patents

Data calling method based on FPGA off-chip memory

Info

Publication number
CN109800867A
Authority
CN
China
Prior art keywords
fifo
data
feature map
group
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811545237.2A
Other languages
Chinese (zh)
Other versions
CN109800867B (en)
Inventor
龙腾
魏鑫
陈禾
陈磊
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201811545237.2A priority Critical patent/CN109800867B/en
Publication of CN109800867A publication Critical patent/CN109800867A/en
Application granted granted Critical
Publication of CN109800867B publication Critical patent/CN109800867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The present invention provides a data calling method based on an FPGA off-chip memory. The data of a feature map are stored row by row, in order, into a group of FIFOs. In each read-write operation, the first M FIFOs each output their currently first stored data word; each of the last M FIFOs writes its currently first stored data word back to the tail of the FIFO whose number is L-M smaller than its own; at the same time, the first data word of row L+1 of the feature map is written to the tail of FIFO L-1, and the first data word of row L+2 is written to the tail of FIFO L. Thus, while the FIFOs continuously output data in order to the outside of the FIFO group, the remaining data of the feature map are written in order into the group to wait to be read, until the entire feature map has been traversed. The method therefore never calls the data of the FPGA off-chip memory directly, avoids complicated address jumps, and greatly improves the efficiency of calling FPGA off-chip memory data.

Description

Data calling method based on FPGA off-chip memory
Technical field
The invention belongs to the technical field of image classification and recognition, and in particular relates to a data calling method based on an FPGA off-chip memory.
Background art
Over the past five years, convolutional neural networks (CNNs) have achieved good results in fields such as image feature extraction, classification, and recognition. Because CNN architectures are flexible and variable, current CNNs are mainly implemented on software platforms such as CPUs and GPUs. In present engineering applications, however, the demands for real-time performance and low power consumption are increasingly prominent, so using a hardware platform to accelerate CNN computation and thereby reduce system power consumption has become a research hotspot for CNNs in engineering applications. The field-programmable gate array (FPGA) is one of the promising solutions. However, the on-chip storage resources of an FPGA can hardly hold the image data, parameters, and intermediate results of a CNN; therefore, when an FPGA is used to accelerate CNN computation, storage resources outside the FPGA chip must be called to meet the storage demand of the system. How to call the off-chip storage of the FPGA reasonably has thus become a focus of current research.
A reasonable calling scheme for the FPGA off-chip storage allows the convolutional computation units of an FPGA-based CNN design to fully exploit the parallelism inherent in CNN algorithms, accelerating the convolution computation to the greatest extent and improving system throughput. The optimization of off-chip storage calling has therefore become one of the important research directions for accelerating CNN computation on FPGAs.
Existing optimization methods for calling FPGA off-chip storage are mainly based on the structure of the off-chip memory, in which data are stored in separate banks. The main idea of current methods is to store the different input feature maps of the CNN in different banks of the off-chip memory as far as possible. Such methods, however, must access the off-chip memory with frequent address jumps, so their read-write efficiency is low; for large-scale CNN computations in particular, they cannot meet the efficiency requirements of data calling.
Summary of the invention
To solve the above problems, the present invention provides a data calling method based on an FPGA off-chip memory that does not call the data of the off-chip memory directly, avoids complicated address jumps, and greatly improves the efficiency of calling FPGA off-chip memory data.
A data calling method based on an FPGA off-chip memory, applied to convolutional neural networks, comprising the following steps:
S1: set up a FIFO group in the FPGA on-chip memory, where the FIFO group contains L FIFOs; number the FIFOs consecutively from 1 to L, and determine the number M of FIFOs that must output data to the outside of the group simultaneously, specifically:
L = 2 × kernel + Stride × (N - 2)
M = kernel + Stride × (N - 1)
where kernel is the preset convolution kernel size, Stride is the step length of the sliding window used in the convolution computation, and N is the number of groups of sliding-window data that must be generated simultaneously, N ≥ 2;
S2: store the first L rows of the feature map in the FPGA off-chip memory into the FIFO group row by row, where each FIFO stores one row of the feature map, and the depth of each FIFO is greater than the size of the feature map;
S3: perform a read-write operation on each FIFO in the FIFO group, where the read-write operation is specifically:
for the first M FIFOs counted from the front, each FIFO outputs its first stored data word to the outside of the FIFO group as sliding-window data for the convolutional neural network, and its second data word becomes the first; for the last M FIFOs counted from the back, each FIFO writes its first stored data word to the tail of the FIFO whose number is L-M smaller than its own; at the same time, the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L, completing the update of every FIFO in the group;
S4: repeat step S3, performing the read-write operation again on each FIFO in the updated FIFO group, until all data of the feature map have been traversed.
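As an illustrative sketch only (not part of the patent text; the function names are our own), the two S1 formulas translate directly into Python:

```python
def fifo_group_size(kernel, stride, n):
    """Number L of FIFOs in the group, per step S1."""
    return 2 * kernel + stride * (n - 2)

def fifos_read_per_step(kernel, stride, n):
    """Number M of FIFOs read simultaneously per operation, per step S1."""
    return kernel + stride * (n - 1)

# 3x3 kernel, stride 1, N = 2 sliding-window groups (the worked example below)
print(fifo_group_size(3, 1, 2), fifos_read_per_step(3, 1, 2))  # 6 4
```

With these values, a 3 × 3 kernel at stride 1 feeding two groups of sliding-window data needs a 6-FIFO group, 4 of which are read on every operation.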
Beneficial effects:
The present invention provides a data calling method based on an FPGA off-chip memory. The data of the feature map are stored row by row, in order, into FIFOs. In each read-write operation the first M FIFOs output their currently first stored data; each of the last M FIFOs writes its currently first stored data back to the tail of the FIFO whose number is L-M smaller than its own; at the same time, the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L. While the FIFOs continuously output data in order to the outside of the group, the remaining data of the feature map are written in order into the group to wait to be read, until the whole feature map has been traversed. The invention thus builds a FIFO group from FPGA on-chip FIFOs and, following the data order required by the convolution computation, outputs the data of the whole feature map stored in the off-chip memory, word by word, to the convolutional computation units outside the group. During this data calling from the off-chip memory to the on-chip memory, the data of the off-chip memory are never called directly; complicated address jumps are avoided, and the efficiency of calling FPGA off-chip memory data is greatly improved.
Brief description of the drawings
Fig. 1 is a flowchart of the data calling method based on an FPGA off-chip memory provided by the present invention;
Fig. 2 is a schematic diagram of the data stored in each FIFO of the FIFO group before any read-write operation is performed;
Fig. 3 is a schematic diagram of the data stored in each FIFO of the FIFO group after the first read-write operation;
Fig. 4 is a schematic diagram of the data stored in each FIFO of the FIFO group after the second read-write operation;
Fig. 5 is a schematic diagram of the data stored in each FIFO of the FIFO group after the third read-write operation.
Detailed description of the embodiments
To make those skilled in the art better understand the scheme of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings.
Embodiment one
Referring to Fig. 1, a flowchart of the data calling method based on an FPGA off-chip memory provided in this embodiment. The method is applied to convolutional neural networks, in particular to the process in which a CNN extracts data from a feature map of size S × S by means of a sliding window during its computation, and comprises the following steps:
S1: set up a FIFO group in the FPGA on-chip memory, where the FIFO group contains L FIFOs (first input, first output), number the FIFOs consecutively from 1 to L, and determine the number M of FIFOs that must output data to the outside of the group simultaneously, specifically:
L = 2 × kernel + Stride × (N - 2) (1)
M = kernel + Stride × (N - 1) (2)
where kernel is the preset convolution kernel size, Stride is the step length of the sliding window used in the convolution computation, and N is the number of groups of sliding-window data that must be generated simultaneously, N ≥ 2.
It should be noted that, in a computer, first-in first-out is the traditional sequential execution order: the instruction that enters first is completed and retired first, and only then is the next instruction executed.
S2: store the first L rows of the feature map in the FPGA off-chip memory into the FIFO group row by row, where each FIFO stores one row of the feature map, and the depth of each FIFO is greater than the size S of the feature map.
S3: perform a read-write operation on each FIFO in the FIFO group, where the read-write operation is specifically:
for the first M FIFOs counted from the front, each FIFO outputs its first stored data word to the outside of the FIFO group as sliding-window data for the convolutional neural network; for the last M FIFOs counted from the back, each FIFO writes its first stored data word to the tail of the FIFO whose number is L-M smaller than its own; at the same time, the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L, completing the update of every FIFO in the group.
It should be noted that, for the last M FIFOs, "each FIFO writes its first stored data word to the tail of the FIFO whose number is L-M smaller" means that the first data word of FIFO L-M+1 is written back to the tail of FIFO 1, the first data word of FIFO L-M+2 is written back to the tail of FIFO 2, and so on, until the first data word of FIFO L is written back to the tail of FIFO M.
It should be noted that, in the physical storage of the actual FPGA on-chip memory, after the currently first data word of a FIFO is output from the FIFO group, the data stored in that FIFO each move forward one position, because a FIFO follows the first-in first-out storage strategy: the second data word becomes the first, the third becomes the second, and so on, until one position is vacated at the tail. Only then can each of the last M FIFOs write its first data word to the tail of the FIFO whose number is L-M smaller, while the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L.
S4: repeat step S3, performing the read-write operation again on each FIFO in the updated FIFO group, until all data of the feature map have been traversed.
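The read-write operation of step S3 can be modeled in software with Python deques standing in for the hardware FIFOs (a minimal sketch under our own naming; the patent describes hardware behavior, not code):

```python
from collections import deque

def read_write_step(fifos, L, M, word_row_l1, word_row_l2):
    """One read-write operation on the FIFO group (step S3).

    fifos       -- list of L deques; fifos[0] models the FIFO numbered 1
    word_row_l1 -- next word of feature-map row L+1 streamed from off-chip memory
    word_row_l2 -- next word of feature-map row L+2
    Returns the M words sent out of the group to the convolution units.
    """
    heads = [f.popleft() for f in fifos]       # every FIFO shifts forward one word
    for i in range(L - M, L):                  # last M heads are written back to the
        fifos[i - (L - M)].append(heads[i])    # FIFO numbered L-M smaller
    fifos[L - 2].append(word_row_l1)           # row L+1 word -> tail of FIFO L-1
    fifos[L - 1].append(word_row_l2)           # row L+2 word -> tail of FIFO L
    return heads[:M]                           # front M heads leave the group
```

Note the overlap when M > L-M: with L = 6 and M = 4, FIFOs 3 and 4 both output their head and write it back, which is exactly what the first read-write operation of embodiment two shows.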
Embodiment two
Based on the above embodiment, this embodiment describes the FPGA off-chip memory calling method in detail with a feature map of size 15 × 15, a convolution kernel of size 3 × 3, a sliding-window step length Stride of 1, and 2 on-chip convolutional computation units, i.e. the number of sliding windows to be processed simultaneously is N = 2.
Step 1: determine the number L of FIFOs in the FIFO group.
The number L of FIFOs in each FIFO group is determined by three parameters: the size of the convolution kernel (kernel), the step length of the sliding window in the feature-map convolution (Stride), and the number of on-chip convolutional computation units (N), satisfying the following formula:
L = 2 × 3 + 1 × (2 - 2) = 6
That is, there are 6 FIFOs in the FIFO group.
Step 2: determine the number M of FIFOs that must output data to the outside of the FIFO group simultaneously.
In the assumed case, the size of the convolution kernel (kernel) is 3, the step length of the sliding window (Stride) is 1, and the number of on-chip convolutional computation units (N) is 2, so the number M of FIFOs whose data the group must output simultaneously is determined by these three parameters, satisfying the following formula:
M = 3 + 1 × (2 - 1) = 4
That is, each FIFO read-write operation must output the data of 4 FIFOs to the outside of the FIFO group simultaneously.
Step 3: determine the depth of each FIFO.
Since the depth must satisfy depth ≥ S = 15, the depth of each FIFO is chosen as 16.
Step 4: store the first 6 rows of the feature map into the FIFO group row by row, each FIFO storing one row of the feature map.
Referring to Fig. 2, which shows the data stored in each FIFO of the FIFO group before any read-write operation is performed. The FIFOs in the group are numbered 1 to 6 from top to bottom. Assume the data of the first eight rows of the input feature map are numbered 1 to 120. Before the FIFO read-write operations begin, the FIFOs in the group are filled with rows 1 to 6 of the input feature map: the FIFO numbered 1 holds the data of the first row, the FIFO numbered 2 holds the data of the second row, and so on. The resulting data stored in each FIFO are as shown in Fig. 2.
Referring to Fig. 3, which shows the data stored in each FIFO of the FIFO group after the first read-write operation. The first data words stored in the 4 FIFOs numbered 1 to 4, i.e. the feature-map data numbered 1, 16, 31, and 46, are output simultaneously to the outside of the FIFO group and stored into the two on-chip convolutional computation units. Each of the 4 FIFOs numbered 3 to 6 writes its first stored data word to the tail of the FIFO numbered 2 smaller: the first data word 31 of the FIFO numbered 3 is written to the tail of the FIFO numbered 1, the first data word 46 of the FIFO numbered 4 to the tail of the FIFO numbered 2, the first data word 61 of the FIFO numbered 5 to the tail of the FIFO numbered 3, and the first data word 76 of the FIFO numbered 6 to the tail of the FIFO numbered 4. At the same time, the first data word 91 of row 7 of the feature map is written to the tail of the FIFO numbered 5, and the first data word 106 of row 8 is written to the tail of the FIFO numbered 6, completing the update of every FIFO in the group, as shown in Fig. 3.
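The Fig. 2 to Fig. 3 transition can be reproduced with a short self-contained simulation (our own sketch; the variable names are assumptions, and Python deques stand in for the hardware FIFOs):

```python
from collections import deque

S, L, M = 15, 6, 4
# 15 x 15 feature map numbered 1..225 row by row, as in this embodiment
feature_map = [[r * S + c + 1 for c in range(S)] for r in range(S)]

# Fig. 2 state: FIFOs 1..6 preloaded with feature-map rows 1..6
fifos = [deque(feature_map[r]) for r in range(L)]

# first read-write operation
heads = [f.popleft() for f in fifos]
to_conv_units = heads[:M]                       # data 1, 16, 31, 46 leave the group
for i in range(L - M, L):                       # FIFOs 3..6 write back two FIFOs up
    fifos[i - (L - M)].append(heads[i])
fifos[L - 2].append(feature_map[6][0])          # 91, first word of row 7 -> FIFO 5
fifos[L - 1].append(feature_map[7][0])          # 106, first word of row 8 -> FIFO 6

print(to_conv_units)                            # [1, 16, 31, 46]
print([f[-1] for f in fifos])                   # [31, 46, 61, 76, 91, 106]
```

The two printed lists match the outputs and the new FIFO tails described for Fig. 3.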
Referring to Fig. 4 and Fig. 5, which show the data stored in each FIFO of the FIFO group after the second and third read-write operations, respectively. The way data are stored in each FIFO is similar to that of the first read-write operation and is not repeated here.
It can be seen that in the FPGA off-chip memory calling method provided by this embodiment, the data of the feature map are stored row by row, in order, into FIFOs; in each read-write operation the first M FIFOs output their currently first stored data, each of the last M FIFOs writes its currently first stored data back to the tail of the FIFO whose number is L-M smaller, and at the same time the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L. Thus, while the FIFOs continuously output data in order to the outside of the group, the remaining data of the feature map are written in order into the group to wait to be read, until the whole feature map has been traversed. This embodiment therefore builds a FIFO group from on-chip FIFOs and, following the data order required by the convolution computation, outputs the data of the whole feature map word by word to the convolutional computation units outside the group (the convolutional computation units also belong to the FPGA on-chip memory). During this data calling from the off-chip memory to the on-chip memory, the data of the off-chip memory are never called directly; complicated address jumps are avoided, and the efficiency of calling FPGA off-chip memory data is greatly improved.
In addition, the existing optimization methods for calling FPGA off-chip memory are easily affected by the number of input feature maps in the convolution computation: when the number of input feature maps exceeds the number of banks in the off-chip memory, the problem of jumping address accesses arises again. The method of this embodiment is not affected by the number of input feature maps and can flexibly meet the computation needs of different CNN structures.
Furthermore, the existing optimization methods for calling FPGA off-chip memory can hardly meet the requirement, in CNN computation, of flexibly configuring the data input of the convolution computation for different convolution kernel sizes, different sliding-window step lengths, and different numbers of convolutional computation units. The method of this embodiment determines, through formulas (1) and (2), the number L of FIFOs in the FIFO group and the number M of FIFOs that must output data to the outside of the group simultaneously, so that flexible configuration is achieved by adjusting the number of FIFOs in each FIFO group.
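To make that configurability concrete (a sweep of our own choosing, not taken from the patent), formulas (1) and (2) give the group configuration for several kernel/stride/N combinations:

```python
def group_config(kernel, stride, n):
    """(L, M) per formulas (1) and (2)."""
    return 2 * kernel + stride * (n - 2), kernel + stride * (n - 1)

# arbitrary illustrative parameter combinations
for kernel, stride, n in [(3, 1, 2), (5, 1, 2), (3, 2, 3), (7, 2, 2)]:
    L, M = group_config(kernel, stride, n)
    print(f"kernel={kernel} stride={stride} N={n} -> L={L}, M={M}")
```

Changing the kernel size, step length, or number of simultaneously generated window groups only changes L and M, i.e. how many FIFOs are instantiated and how many are read per operation; the read-write operation itself is unchanged.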
Of course, the present invention may also have various other embodiments. Those skilled in the art can make various corresponding changes and modifications according to the present invention without departing from its spirit and essence, but all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (1)

1. A data calling method based on an FPGA off-chip memory, applied to convolutional neural networks, characterized by comprising the following steps:
S1: set up a FIFO group in the FPGA on-chip memory, where the FIFO group contains L FIFOs; number the FIFOs consecutively from 1 to L, and determine the number M of FIFOs that must output data to the outside of the group simultaneously, specifically:
L = 2 × kernel + Stride × (N - 2)
M = kernel + Stride × (N - 1)
where kernel is the preset convolution kernel size, Stride is the step length of the sliding window used in the convolution computation, and N is the number of groups of sliding-window data that must be generated simultaneously, N ≥ 2;
S2: store the first L rows of the feature map in the FPGA off-chip memory into the FIFO group row by row, where each FIFO stores one row of the feature map, and the depth of each FIFO is greater than the size of the feature map;
S3: perform a read-write operation on each FIFO in the FIFO group, where the read-write operation is specifically:
for the first M FIFOs counted from the front, each FIFO outputs its first stored data word to the outside of the FIFO group as sliding-window data for the convolutional neural network, and its second data word becomes the first; for the last M FIFOs counted from the back, each FIFO writes its first stored data word to the tail of the FIFO whose number is L-M smaller than its own; at the same time, the first data word of row L+1 of the feature map is written to the tail of FIFO L-1 and the first data word of row L+2 to the tail of FIFO L, completing the update of every FIFO in the group;
S4: repeat step S3, performing the read-write operation again on each FIFO in the updated FIFO group, until all data of the feature map have been traversed.
CN201811545237.2A 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory Active CN109800867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811545237.2A CN109800867B (en) 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory


Publications (2)

Publication Number Publication Date
CN109800867A true CN109800867A (en) 2019-05-24
CN109800867B CN109800867B (en) 2020-09-29

Family

ID=66556986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811545237.2A Active CN109800867B (en) 2018-12-17 2018-12-17 Data calling method based on FPGA off-chip memory

Country Status (1)

Country Link
CN (1) CN109800867B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170236053A1 (en) * 2015-12-29 2017-08-17 Synopsys, Inc. Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks
CN107392309A (en) * 2017-09-11 2017-11-24 东南大学—无锡集成电路技术研究所 A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
CN107862650A (en) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 The method of speed-up computation two dimensional image CNN convolution
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution accelerates and computation processing method, device, electronic equipment and storage medium
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 A kind of accelerating circuit of 3*3 convolution algorithms
CN108717571A (en) * 2018-06-01 2018-10-30 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence
CN108764182A (en) * 2018-06-01 2018-11-06 阿依瓦(北京)技术有限公司 A kind of acceleration method and device for artificial intelligence of optimization
US20180341621A1 (en) * 2017-05-23 2018-11-29 Korea University Research And Business Foundation Bi-directional fifo memory and convolution processing device using the same

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915322B (en) * 2015-06-09 2018-05-01 中国人民解放军国防科学技术大学 A kind of hardware-accelerated method of convolutional neural networks
CN106228240B (en) * 2016-07-30 2020-09-01 复旦大学 Deep convolution neural network implementation method based on FPGA
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021232843A1 (en) * 2020-05-22 2021-11-25 浪潮电子信息产业股份有限公司 Image data storage method, image data processing method and system, and related apparatus
EP4156079A4 (en) * 2020-05-22 2024-03-27 Inspur Electronic Information Industry Co., Ltd Image data storage method, image data processing method and system, and related apparatus
CN112488305A (en) * 2020-12-22 2021-03-12 西北工业大学 Neural network storage organization structure and configurable management method thereof
CN112488305B (en) * 2020-12-22 2023-04-18 西北工业大学 Neural network storage device and configurable management method thereof

Also Published As

Publication number Publication date
CN109800867B (en) 2020-09-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant