CN109447893A - Image preprocessing method and device for convolutional neural network FPGA acceleration - Google Patents
- Publication number
- CN109447893A (application CN201910077362.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- image array
- array
- preset frequency
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The present invention provides an image preprocessing method and device for convolutional neural network FPGA acceleration. The method comprises: S1, dividing the camera image into an N × M × 3 image matrix; S2, setting n = 1, m = 1; S3, storing into the cache, in order, the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices; S4, reading the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and convolving them with a 3 × 3 × 3 filter; S5, storing the (n+3) × (m+3) × 3 image matrix into the FPGA cache; S6, reading the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and convolving them with the 3 × 3 × 3 filter; S7, determining whether n+3 equals N and m+3 equals M, and executing S8 if not or S9 if so; S8, setting n = n+4, m = m+4 and executing S3; S9, outputting the operation result.
Description
Technical field
The present invention relates to the technical field of FPGA (field programmable gate array) acceleration of convolutional neural networks, and in particular to an image preprocessing method and device for convolutional neural network FPGA acceleration.
Background
Convolutional neural network image processing algorithms are computationally very demanding and place high requirements on the speed of the computing platform, so a GPU (Graphics Processing Unit) is usually used to accelerate them. GPUs, however, are expensive and power-hungry, and cannot meet the needs of application scenarios that are especially sensitive to real-time performance and power consumption. Such scenarios usually use an FPGA, which has low power consumption and can compute in parallel, to accelerate the convolution. Each convolutional layer of a convolutional neural network takes the result of the previous layer as its input image, uses the FPGA's parallel computation to accelerate the convolution between the input image and the filter, and stores the result in the cache. The input to the first convolutional layer is the RGB image from the camera. The FPGA reads image data from the cache far faster than it reads image data from the camera, so the first convolutional layer is slow and becomes the bottleneck in convolutional neural network FPGA acceleration. In addition, storing the entire image in the cache consumes a large amount of FPGA cache resources, which limits the use of convolutional neural network image processing algorithms on low-cost FPGAs.
Summary of the invention
The present invention aims to provide an image preprocessing method and device for convolutional neural network FPGA acceleration that overcome the above problems or at least partially solve them.
To achieve the above objective, the technical solution of the present invention is implemented as follows:
One aspect of the present invention provides an image preprocessing method for convolutional neural network FPGA acceleration, comprising: S1, dividing the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4; S2, setting n = 1, m = 1; S3, at a first preset frequency, storing the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; S4, at a second preset frequency, reading the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feeding them, together with a 3 × 3 × 3 filter, into the PE for convolution; S5, at the first preset frequency, storing the (n+3) × (m+3) × 3 image matrix into the FPGA cache; S6, at the second preset frequency, reading the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feeding them, together with the 3 × 3 × 3 filter, into the PE for convolution; S7, determining whether n+3 equals N and m+3 equals M, and executing step S8 if not or step S9 if so; S8, setting n = n+4, m = m+4 and returning to step S3; S9, completing the convolution of the camera image with the 3 × 3 × 3 filter and outputting the operation result.
The first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
The camera image is in RGB888 format.
Another aspect of the present invention provides an image preprocessing device for convolutional neural network FPGA acceleration, comprising: a division module, configured to divide the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4; an assignment module, configured to set n = 1, m = 1; a cache module, configured to store, at a first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; a convolution module, configured to read, at a second preset frequency, the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feed them, together with a 3 × 3 × 3 filter, into the PE for convolution; the cache module being further configured to store, at the first preset frequency, the (n+3) × (m+3) × 3 image matrix into the FPGA cache; the convolution module being further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feed them, together with the 3 × 3 × 3 filter, into the PE for convolution; a judgment module, configured to determine whether n+3 equals N and m+3 equals M, and, if not, to notify the assignment module to set n = n+4, m = m+4, or, if so, to notify the output module to perform its operation; the assignment module being further configured to set n = n+4, m = m+4 and notify the cache module to perform its operation; and an output module, configured to complete the convolution of the camera image with the 3 × 3 × 3 filter and output the operation result.
The first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
The camera image is in RGB888 format.
A further aspect of the present invention provides an image preprocessing device for convolutional neural network FPGA acceleration, comprising: a camera image module, configured to divide the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4; a 4-row cache module, configured to set n = 1, m = 1 and to store, at a first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; a PE module, configured to read, at a second preset frequency, the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feed them, together with a 3 × 3 × 3 filter, into the PE for convolution; the 4-row cache module being further configured to store, at the first preset frequency, the (n+3) × (m+3) × 3 image matrix into the FPGA cache; the PE module being further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feed them, together with the 3 × 3 × 3 filter, into the PE for convolution; and to determine whether n+3 equals N and m+3 equals M, and, if not, to set n = n+4, m = m+4 and notify the 4-row cache module to store, at the first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices, or, if so, to complete the convolution of the camera image with the 3 × 3 × 3 filter and output the operation result.
The first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
The camera image is in RGB888 format.
It can be seen that, with the image preprocessing method and device for convolutional neural network FPGA acceleration provided by the embodiments of the present invention, the entire image does not need to be stored in the cache. Only four row caches are occupied, and each row cache is the same size as one row of the camera image. Convolution starts while the camera image is still being transferred; once the transfer completes, only the last three rows of camera image data remain to be convolved in order to finish the first-layer convolution in convolutional neural network FPGA acceleration.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the image preprocessing method for convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an image preprocessing device for convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the order in which the camera image is written into the four row caches in the image preprocessing method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the image reading scheme in the image preprocessing method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a specific image reading scheme in the image preprocessing method provided by an embodiment of the present invention;
Fig. 6 is another schematic structural diagram of the image preprocessing device for convolutional neural network FPGA acceleration provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The image preprocessing method and device for convolutional neural network FPGA acceleration provided by the embodiments of the present invention address two problems in the first layer of a convolutional neural network image processing algorithm: the large cache space occupied by storing the whole camera image, and the slow computation speed.
Fig. 1 shows a flowchart of the image preprocessing method for convolutional neural network FPGA acceleration provided by an embodiment of the present invention. Referring to Fig. 1, the method comprises:
S1, dividing the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
S2, setting n = 1, m = 1;
S3, at a first preset frequency, storing the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
S4, at a second preset frequency, reading the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feeding them, together with a 3 × 3 × 3 filter, into the PE for convolution;
S5, at the first preset frequency, storing the (n+3) × (m+3) × 3 image matrix into the FPGA cache;
S6, at the second preset frequency, reading the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feeding them, together with the 3 × 3 × 3 filter, into the PE for convolution;
S7, determining whether n+3 equals N and m+3 equals M, and executing step S8 if not or step S9 if so;
S8, setting n = n+4, m = m+4 and returning to step S3;
S9, completing the convolution of the camera image with the 3 × 3 × 3 filter and outputting the operation result.
In a specific implementation, referring to Fig. 2, an image preprocessing device for convolutional neural network FPGA acceleration provided by an embodiment of the present invention can be realized with the following components: a camera image module, a 4-row cache module, a 3 × 3 × 3 filter, and a PE module (the PE is a combination of multipliers and adders; for example, it can multiply the corresponding elements of a 3 × 3 × 3 image matrix and a 3 × 3 × 3 filter matrix and then sum the products), wherein:
the camera image module is configured to divide the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
the 4-row cache module is configured to set n = 1, m = 1 and to store, at a first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
the PE module is configured to read, at a second preset frequency, the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feed them, together with the 3 × 3 × 3 filter, into the PE for convolution;
the 4-row cache module is further configured to store, at the first preset frequency, the (n+3) × (m+3) × 3 image matrix into the FPGA cache;
the PE module is further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feed them, together with the 3 × 3 × 3 filter, into the PE for convolution; and to determine whether n+3 equals N and m+3 equals M, and, if not, to set n = n+4, m = m+4 and notify the 4-row cache module to store, at the first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices, or, if so, to complete the convolution of the camera image with the 3 × 3 × 3 filter and output the operation result.
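The PE is described above as multiplying corresponding elements of a 3 × 3 × 3 image matrix and a 3 × 3 × 3 filter matrix and summing the products. A minimal Python sketch of that multiply-accumulate follows; the function name `pe_mac` and the `[row][column][channel]` list layout are assumptions made for illustration and do not come from the patent.

```python
def pe_mac(window, kernel):
    """Multiply matching elements of two 3x3x3 arrays and sum the 27 products."""
    total = 0
    for r in range(3):            # window row
        for c in range(3):        # window column
            for ch in range(3):   # colour channel
                total += window[r][c][ch] * kernel[r][c][ch]
    return total


# Example: an all-ones window against an all-twos filter gives 27 * 2.
ones = [[[1] * 3 for _ in range(3)] for _ in range(3)]
twos = [[[2] * 3 for _ in range(3)] for _ in range(3)]
print(pe_mac(ones, twos))  # 54
```

In hardware the 27 multiplications would typically run in parallel; the nested loops here only model the arithmetic, not the timing.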
Specifically, four row caches are allocated in the FPGA, and the camera image is written into the FPGA cache at 25 MHz, starting from the first row. When storage reaches the first pixel of the fourth row, the first three rows in the cache begin to be read at 225 MHz and are fed, together with the 3 × 3 × 3 filter, into the PE for convolution; by the time the fourth row of camera image data has been stored, the convolution of the first three rows has just finished. The fifth row of image data is then written into the first row cache, while the camera image data in row caches 2 to 4 is read and fed, together with the 3 × 3 × 3 filter, into the PE for convolution. The cycle continues in this way. Once the camera image has been fully received, only the last three rows remain to be convolved with the 3 × 3 × 3 filter to complete the first-layer convolution in convolutional neural network FPGA acceleration, which greatly reduces image storage space and improves computation speed.
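The row-by-row scheme described above can be modelled in software as a four-slot circular buffer. The sketch below is a behavioural illustration only, assuming a stride of 1, no padding, and images given as nested lists indexed `[row][column][channel]`; the function names are hypothetical, and clock-level timing is not modelled.

```python
import random


def conv_rows(rows, kernel):
    """Valid 3x3x3 convolution of three image rows; returns one output row."""
    width = len(rows[0])
    out = []
    for c in range(width - 2):
        acc = 0
        for r in range(3):
            for dc in range(3):
                for ch in range(3):
                    acc += rows[r][c + dc][ch] * kernel[r][dc][ch]
        out.append(acc)
    return out


def streaming_conv(image, kernel):
    """Process rows as they arrive, holding at most four rows at a time."""
    line_buf = [None] * 4                  # the four row caches
    out = []
    for i, row in enumerate(image):        # i is the 0-based camera row index
        line_buf[i % 4] = row              # row i overwrites the oldest slot
        if i >= 3:
            # While row i is being stored, rows i-3 .. i-1 are convolved.
            out.append(conv_rows([line_buf[(i - k) % 4] for k in (3, 2, 1)], kernel))
    # After the last row arrives, only the final three rows remain.
    n = len(image)
    out.append(conv_rows([line_buf[(n - k) % 4] for k in (3, 2, 1)], kernel))
    return out


def full_frame_conv(image, kernel):
    """Reference: the same convolution with the entire image held in memory."""
    return [conv_rows(image[r:r + 3], kernel) for r in range(len(image) - 2)]


random.seed(0)
img = [[[random.randrange(256) for _ in range(3)] for _ in range(8)] for _ in range(6)]
ker = [[[random.randrange(-2, 3) for _ in range(3)] for _ in range(3)] for _ in range(3)]
print(streaming_conv(img, ker) == full_frame_conv(img, ker))  # True
```

The comparison against `full_frame_conv` checks that buffering four rows yields the same result as buffering the whole frame, which is the storage saving the passage describes.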
As an optional implementation of the embodiment of the present invention, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
As an optional implementation of the embodiment of the present invention, the camera image is in RGB888 format.
Below, with reference to Fig. 2 and taking a camera image size of 416 × 416 as an example, the image preprocessing method for convolutional neural network FPGA acceleration provided by the embodiment of the present invention is described; the invention is not limited to this example.
For ease of explanation: before the convolution is performed, the 3 × 3 matrix formed by 9 RGB888 (24-bit) pixels needs to be split into a 3 × 3 × 3 matrix of 27 8-bit values. The first pixel in the camera image is numbered 1, the second pixel is numbered 2, and so on, with each subsequent pixel's number incremented by 1.
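The split from nine 24-bit RGB888 pixels into 27 8-bit values can be sketched as follows. The byte order (R in the most significant byte) and the function names are assumptions for illustration; the text does not specify how the three channels are packed.

```python
def unpack_rgb888(pixel):
    """Split one 24-bit value into three 8-bit channels (assumed order: R, G, B)."""
    return [(pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF]


def split_window(window_3x3):
    """Turn a 3x3 grid of 24-bit pixels into a 3x3x3 grid of 8-bit values."""
    return [[unpack_rgb888(p) for p in row] for row in window_3x3]


window = [[0x102030, 0x000000, 0xFFFFFF],
          [0x0000FF, 0x00FF00, 0xFF0000],
          [0x123456, 0x654321, 0x0A0B0C]]
cube = split_window(window)
print(cube[0][0])                                   # [16, 32, 48]
print(sum(len(px) for row in cube for px in row))   # 27
```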
The four row caches are Block RAM (block random access memory) in the FPGA. Each is structured as true dual-port RAM, with two complete sets of 36-bit read/write data buses and the corresponding control buses, and each row cache can store one row of the camera image.
Referring to Fig. 3, the camera image is written into the four row caches in the following order: row 1 of the camera image is stored in cache 1, row 2 in cache 2, row 3 in cache 3, row 4 in cache 4, row 5 in cache 1, row 6 in cache 2, and so on, cycling until the entire image has passed through the caches.
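The write order above amounts to a modulo-4 mapping from camera row to cache. A small Python sketch, using 1-based row and cache numbers as in the text (the function name is hypothetical):

```python
def cache_for_row(row):
    """Return the cache number (1-4) that receives camera row `row` (1-based)."""
    return (row - 1) % 4 + 1


print([cache_for_row(r) for r in range(1, 9)])  # [1, 2, 3, 4, 1, 2, 3, 4]
```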
Images are read from the caches as shown in Fig. 4. First, three pixels 1, 2, 3 are read in order from addresses 0, 1, 2 of cache 1; then three pixels 417, 418, 419 are read in order from addresses 0, 1, 2 of cache 2; then three pixels 833, 834, 835 are read in order from addresses 0, 1, 2 of cache 3. These 9 pixels form the first 3 × 3 matrix. Similarly, three pixels 2, 3, 4 are read from addresses 1, 2, 3 of cache 1, three pixels 418, 419, 420 from addresses 1, 2, 3 of cache 2, and three pixels 834, 835, 836 from addresses 1, 2, 3 of cache 3; these 9 pixels form the second 3 × 3 matrix. Reading continues by the same rule until all three row caches have been read completely. After caches 1, 2, and 3 have been read, caches 2, 3, and 4 are read by the same rule.
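The pixel numbers quoted above follow from row-major numbering of a 416-pixel-wide image. The sketch below reproduces them; the helper names are hypothetical, and the row and address arguments are 0-based.

```python
WIDTH = 416  # pixels per row in the 416 x 416 example


def pixel_number(row, col):
    """1-based pixel number under row-major numbering (row, col are 0-based)."""
    return row * WIDTH + col + 1


def window_pixels(top_row, addr):
    """Pixel numbers of the 3x3 window whose top-left corner is (top_row, addr)."""
    return [[pixel_number(top_row + r, addr + c) for c in range(3)] for r in range(3)]


print(window_pixels(0, 0))  # [[1, 2, 3], [417, 418, 419], [833, 834, 835]]
print(window_pixels(0, 1))  # [[2, 3, 4], [418, 419, 420], [834, 835, 836]]
```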
Specifically, Fig. 5 shows how camera image data is read from the four row caches to form 3 × 3 image matrices. One pixel is read on each system clock, and rd_data is the data read from the first three row caches. Data is read according to the following rule: three points are read from addresses addr0, addr1, addr2 of cache 1, three points from addresses addr0, addr1, addr2 of cache 2, and three points from addresses addr0, addr1, addr2 of cache 3, for a total of 9 points. Similarly, three points are read from addresses addr1, addr2, addr3 of cache 1, three points from addresses addr1, addr2, addr3 of cache 2, and three points from addresses addr1, addr2, addr3 of cache 3, for a total of 9 points. This continues until all data in the first three rows has been traversed. The rd_data signal is then delayed by up to 8 system clocks to obtain ra_data_dalay1, ra_data_dalay2, ra_data_dalay3, ... and ra_data_dalay8. Every 9 system clocks, rd_data, ra_data_dalay1, ra_data_dalay2, ... are sampled at the same moment, and the resulting 9 points are output as the data for one convolution window.
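The delay-and-sample step can be modelled as a nine-deep shift register that is sampled once every nine clocks. This is a behavioural sketch only: the function name is hypothetical, and the `deque` stands in for the current sample plus its eight delayed copies.

```python
from collections import deque


def windows_from_stream(stream):
    """Group a one-pixel-per-clock stream into 9-pixel windows via 8 delay taps."""
    taps = deque(maxlen=9)       # current sample plus 8 delayed copies
    out = []
    for clock, rd_data in enumerate(stream, start=1):
        taps.append(rd_data)     # oldest sample falls off the far end
        if clock % 9 == 0:       # every 9th clock the taps hold one full window
            out.append(list(taps))
    return out


# Read order for the first two windows of the 416-wide example.
stream = [1, 2, 3, 417, 418, 419, 833, 834, 835,
          2, 3, 4, 418, 419, 420, 834, 835, 836]
for window in windows_from_stream(stream):
    print(window)
```

Sampling all nine taps at once is what lets a serial, one-pixel-per-clock read produce a complete 3 × 3 window for the PE in a single step.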
It can be seen that the image preprocessing method for convolutional neural network FPGA acceleration provided by the embodiment of the present invention does not need to store the entire image in the cache. Only four row caches are occupied, and each row cache is the same size as one row of the camera image. Taking a 416 × 416 camera image as an example, only 4 × 416 of cache space is needed, so the proposed method occupies less than 1% of the cache originally required.
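The "less than 1%" figure can be checked directly from the sizes given in the example, counting cache space in pixels:

```python
WIDTH = HEIGHT = 416

full_frame = WIDTH * HEIGHT   # pixels buffered when the whole image is stored
four_rows = 4 * WIDTH         # pixels buffered by the four row caches
ratio = four_rows / full_frame

print(four_rows, full_frame)  # 1664 173056
print(f"{ratio:.2%}")         # 0.96%
```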
Further, convolution starts while the camera image is still being transferred; once the transfer completes, only the last three rows of camera image data remain to be convolved in order to finish the first-layer convolution in convolutional neural network FPGA acceleration. Taking a 416 × 416 camera image as an example, the present invention does not need to wait until the camera image has been fully transferred before starting the convolution, and the time spent on the convolution part is less than 1% of the original.
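One way to see why convolution can keep pace with the incoming image is to compare the time to write one row at 25 MHz with the time to read all windows of three buffered rows at 225 MHz. The check below is an interpretation rather than a statement from the patent: it assumes a stride of 1, nine pixel reads per window, and one read per 225 MHz clock.

```python
from fractions import Fraction

WRITE_HZ = 25_000_000    # first preset frequency (cache write clock)
READ_HZ = 225_000_000    # second preset frequency (cache read clock)
WIDTH = 416              # pixels per row in the 416 x 416 example

reads_per_write = Fraction(READ_HZ, WRITE_HZ)
windows_per_row = WIDTH - 2         # 3-wide windows per row (assumed stride 1)
read_cycles = windows_per_row * 9   # assumed: 9 reads per window, 1 per clock

row_write_time = Fraction(WIDTH, WRITE_HZ)
row_conv_time = Fraction(read_cycles, READ_HZ)

print(reads_per_write)                  # 9
print(row_conv_time <= row_write_time)  # True
```

Under these assumptions the factor of 9 between the two clocks matches the 9 pixels in a 3 × 3 window, so three buffered rows are consumed in slightly less time than the next row takes to arrive.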
Fig. 6 shows a schematic structural diagram of the image preprocessing device for convolutional neural network FPGA acceleration provided by an embodiment of the present invention. The device applies the method described above, so only its structure is briefly described here; for matters not covered, refer to the description of the method above. Referring to Fig. 6, the image preprocessing device for convolutional neural network FPGA acceleration provided by the embodiment of the present invention comprises:
a division module 601, configured to divide the camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
an assignment module 605, configured to set n = 1, m = 1;
a cache module 602, configured to store, at a first preset frequency, the N × M × 3 image matrix into the FPGA cache in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
a convolution module 603, configured to read, at a second preset frequency, the n × m × 3, (n+1) × (m+1) × 3, and (n+2) × (m+2) × 3 image matrices and feed them, together with a 3 × 3 × 3 filter, into the PE for convolution;
the cache module 602, further configured to store, at the first preset frequency, the (n+3) × (m+3) × 3 image matrix into the FPGA cache;
the convolution module 603, further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3, (n+2) × (m+2) × 3, and (n+3) × (m+3) × 3 image matrices and feed them, together with the 3 × 3 × 3 filter, into the PE for convolution;
a judgment module 604, configured to determine whether n+3 equals N and m+3 equals M, and, if not, to notify the assignment module to set n = n+4, m = m+4, or, if so, to notify the output module to perform its operation;
the assignment module 605, further configured to set n = n+4, m = m+4 and notify the cache module to perform its operation;
an output module 606, configured to complete the convolution of the camera image with the 3 × 3 × 3 filter and output the operation result.
It can be seen that the image preprocessing device for convolutional neural network FPGA acceleration provided by the embodiment of the present invention does not need to store the entire image in the cache. Only four row caches are occupied, and each row cache is the same size as one row of the camera image. Further, convolution starts while the camera image is still being transferred; once the transfer completes, only the last three rows of camera image data remain to be convolved in order to finish the first-layer convolution in convolutional neural network FPGA acceleration.
As an optional implementation of the embodiment of the present invention, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
As an optional implementation of the embodiment of the present invention, the camera image is in RGB888 format.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are only embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (9)
1. An image preprocessing method in convolutional neural network FPGA acceleration, characterized by comprising:
S1, dividing a camera image into an N × M × 3 image array, where N ≥ 4 and M ≥ 4;
S2, setting n = 1 and m = 1;
S3, storing, at a first preset frequency, the N × M × 3 image array into the cache of the FPGA in the order of the
n × m × 3 image array, the (n+1) × (m+1) × 3 image array, and the (n+2) × (m+2) × 3 image array, where
1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
S4, reading, at a second preset frequency, the n × m × 3 image array, the (n+1) × (m+1) × 3 image array, and the
(n+2) × (m+2) × 3 image array, and feeding them together with a 3 × 3 × 3 filter into the PE for convolution;
S5, storing the (n+3) × (m+3) × 3 image array into the cache of the FPGA at the first preset frequency;
S6, reading, at the second preset frequency, the (n+1) × (m+1) × 3 image array, the (n+2) × (m+2) × 3 image array,
and the (n+3) × (m+3) × 3 image array, and feeding them together with the 3 × 3 × 3 filter into the PE for convolution;
S7, determining whether n+3 equals N and m+3 equals M; if not, executing step S8; if so, executing step S9;
S8, setting n = n+4 and m = m+4, and returning to step S3;
S9, completing the convolution of the camera image with the 3 × 3 × 3 filter, and outputting the result.
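The control flow of steps S2 to S9 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed hardware: it reads each "image array" named in the claim as one image row of M RGB pixels, slides the 3 × 3 × 3 filter across all columns of the three buffered rows, and tracks only the row index n (the claim advances m in lockstep). The function names `conv_rows` and `preprocess` are hypothetical.

```python
def conv_rows(rows, filt):
    """Valid 3x3x3 multiply-accumulate over three image rows.

    rows: three rows, each a list of (R, G, B) tuples of equal length.
    filt: 3x3x3 nested list indexed as filt[row][col][channel].
    Returns one output value per horizontal window position.
    """
    width = len(rows[0])
    out = []
    for x in range(width - 2):
        acc = 0
        for r in range(3):
            for c in range(3):
                for ch in range(3):
                    acc += rows[r][x + c][ch] * filt[r][c][ch]
        out.append(acc)
    return out


def preprocess(image, filt):
    """Follow S2-S9 literally with a four-row buffer (assumed row-wise reading)."""
    N = len(image)
    if N < 4 or N % 4 != 0:
        # With n = 1, 5, 9, ... the test n + 3 == N only succeeds for N = 4k.
        raise ValueError("this sketch assumes N is a multiple of 4")
    results = []
    n = 1                                             # S2 (1-based, as in the claim)
    while True:
        buf = [image[n - 1], image[n], image[n + 1]]  # S3: rows n, n+1, n+2
        results.append(conv_rows(buf[0:3], filt))     # S4: convolve rows n..n+2
        buf.append(image[n + 2])                      # S5: add row n+3
        results.append(conv_rows(buf[1:4], filt))     # S6: convolve rows n+1..n+3
        if n + 3 == N:                                # S7
            return results                            # S9
        n += 4                                        # S8
```

In this model, each pass through the loop produces two output rows from four buffered input rows, which mirrors the two convolution steps S4 and S6.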
2. The method according to claim 1, characterized in that the first preset frequency is 25 MHz and the second preset
frequency is 225 MHz.
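The two frequencies stand in an integer ratio of 225 / 25 = 9. The short sketch below only checks that arithmetic. The observation that 9 equals the number of positions in a 3 × 3 window is an assumption offered for illustration; the claim itself does not state a rationale for the values. The constant and function names are hypothetical.

```python
WRITE_HZ = 25_000_000    # first preset frequency (cache write side)
READ_HZ = 225_000_000    # second preset frequency (cache read side)


def reads_per_write(read_hz, write_hz):
    """Number of read-clock cycles that fit in one write-clock cycle."""
    if read_hz % write_hz != 0:
        raise ValueError("clocks are not in an integer ratio")
    return read_hz // write_hz


ratio = reads_per_write(READ_HZ, WRITE_HZ)
# Assumption: 9 matches the 3 x 3 spatial positions of the filter window.
window_positions = 3 * 3
```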
3. The method according to claim 1, characterized in that the format of the camera image is RGB888.
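RGB888 stores each pixel as three 8-bit channels, 24 bits in total, which is why every image array above has a depth of 3. A minimal sketch of packing and unpacking one pixel follows. The bit layout (red in the most significant byte) is an assumption, since the claim does not specify byte order, and the function names are hypothetical.

```python
def pack_rgb888(r, g, b):
    """Pack three 8-bit channel values into one 24-bit word (assumed 0xRRGGBB)."""
    for v in (r, g, b):
        if not 0 <= v <= 0xFF:
            raise ValueError("channel value out of 8-bit range")
    return (r << 16) | (g << 8) | b


def unpack_rgb888(word):
    """Split a 24-bit word into its (R, G, B) channel values."""
    return ((word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)
```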
4. An image preprocessing device in convolutional neural network FPGA acceleration, characterized by comprising:
a division module, configured to divide a camera image into an N × M × 3 image array, where N ≥ 4 and M ≥ 4;
an assignment module, configured to set n = 1 and m = 1;
a cache module, configured to store, at a first preset frequency, the N × M × 3 image array into the cache of the FPGA
in the order of the n × m × 3 image array, the (n+1) × (m+1) × 3 image array, and the (n+2) × (m+2) × 3 image array,
where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
a convolution calculation module, configured to read, at a second preset frequency, the n × m × 3 image array, the
(n+1) × (m+1) × 3 image array, and the (n+2) × (m+2) × 3 image array, and to feed them together with a 3 × 3 × 3
filter into the PE for convolution;
the cache module being further configured to store the (n+3) × (m+3) × 3 image array into the cache of the FPGA at
the first preset frequency;
the convolution calculation module being further configured to read, at the second preset frequency, the
(n+1) × (m+1) × 3 image array, the (n+2) × (m+2) × 3 image array, and the (n+3) × (m+3) × 3 image array, and to feed
them together with the 3 × 3 × 3 filter into the PE for convolution;
a judgment module, configured to determine whether n+3 equals N and m+3 equals M; if not, to notify the assignment
module to set n = n+4 and m = m+4; if so, to notify the output module to operate;
the assignment module being further configured to set n = n+4 and m = m+4 and to notify the cache module to operate;
an output module, configured to complete the convolution of the camera image with the 3 × 3 × 3 filter and to output
the result.
5. The device according to claim 4, characterized in that the first preset frequency is 25 MHz and the second preset
frequency is 225 MHz.
6. The device according to claim 4, characterized in that the format of the camera image is RGB888.
7. An image preprocessing device in convolutional neural network FPGA acceleration, characterized by comprising: a
camera image module, a 4-row cache module, a 3 × 3 × 3 filter, and a PE module; wherein:
the camera image module is configured to divide a camera image into an N × M × 3 image array, where N ≥ 4 and M ≥ 4;
the 4-row cache module is configured to set n = 1 and m = 1, and to store, at a first preset frequency, the N × M × 3
image array into the cache of the FPGA in the order of the n × m × 3 image array, the (n+1) × (m+1) × 3 image array,
and the (n+2) × (m+2) × 3 image array, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
the PE module is configured to read, at a second preset frequency, the n × m × 3 image array, the (n+1) × (m+1) × 3
image array, and the (n+2) × (m+2) × 3 image array, and to feed them together with the 3 × 3 × 3 filter into the PE
for convolution;
the 4-row cache module is further configured to store the (n+3) × (m+3) × 3 image array into the cache of the FPGA at
the first preset frequency;
the PE module is further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3 image array, the
(n+2) × (m+2) × 3 image array, and the (n+3) × (m+3) × 3 image array, and to feed them together with the 3 × 3 × 3
filter into the PE for convolution; and to determine whether n+3 equals N and m+3 equals M; if not, to set n = n+4 and
m = m+4 and notify the 4-row cache module to perform the operation of storing, at the first preset frequency, the
N × M × 3 image array into the cache of the FPGA in the order of the n × m × 3 image array, the (n+1) × (m+1) × 3
image array, and the (n+2) × (m+2) × 3 image array; if so, to complete the convolution of the camera image with the
3 × 3 × 3 filter and to output the result.
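The role of the 4-row cache module in claim 7 can be illustrated with a small software model. This is a sketch under assumptions, not the claimed circuit: it treats each cached image array as one row, holds at most four rows, and exposes the two three-row windows used in the two read operations. The class name `FourRowCache` and its methods are hypothetical.

```python
from collections import deque


class FourRowCache:
    """Software model of a buffer that holds the four most recent rows."""

    def __init__(self):
        # maxlen=4 makes the deque drop the oldest row when a fifth is stored.
        self._rows = deque(maxlen=4)

    def store(self, row):
        """Write one row into the cache (the write-side operation)."""
        self._rows.append(row)

    def window(self, start):
        """Return three consecutive cached rows beginning at offset 0 or 1."""
        if start not in (0, 1) or len(self._rows) < start + 3:
            raise ValueError("requested window is not available yet")
        return list(self._rows)[start:start + 3]
```

With three rows stored, `window(0)` yields the first read; after the fourth row is stored, `window(1)` yields the second read.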
8. The device according to claim 7, characterized in that the first preset frequency is 25 MHz and the second preset
frequency is 225 MHz.
9. The device according to claim 7, characterized in that the format of the camera image is RGB888.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910077362.3A CN109447893A (en) | 2019-01-28 | 2019-01-28 | A kind of convolutional neural networks FPGA accelerate in image preprocessing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109447893A true CN109447893A (en) | 2019-03-08 |
Family
ID=65544231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910077362.3A Pending CN109447893A (en) | 2019-01-28 | 2019-01-28 | A kind of convolutional neural networks FPGA accelerate in image preprocessing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109447893A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180125843A (en) * | 2017-05-16 | 2018-11-26 | 광운대학교 산학협력단 | A hardware classifier applicable to various CNN models |
CN108537334A (en) * | 2018-04-26 | 2018-09-14 | 济南浪潮高新科技投资发展有限公司 | A kind of acceleration array design methodology for CNN convolutional layer operations |
CN108805274A (en) * | 2018-05-28 | 2018-11-13 | 重庆大学 | The hardware-accelerated method and system of Tiny-yolo convolutional neural networks based on FPGA |
Non-Patent Citations (2)
Title |
---|
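张榜 (Zhang Bang) et al.: "Design and Implementation of an FPGA-Based Convolutional Neural Network Accelerator" (一种基于FPGA的卷积神经网络加速器的设计与实现), 《复旦学报(自然科学版)》 (Journal of Fudan University, Natural Science) * |
李杏华 (Li Xinghua) et al.: "Design of an FPGA-Based Real-Time Image Processing System" (基于FPGA的图像实时处理系统的设计), 《半导体光电》 (Semiconductor Optoelectronics) * |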
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175670A (en) * | 2019-04-09 | 2019-08-27 | 华中科技大学 | A kind of method and system for realizing YOLOv2 detection network based on FPGA |
CN111626405A (en) * | 2020-05-15 | 2020-09-04 | Tcl华星光电技术有限公司 | CNN acceleration method, CNN acceleration device and computer readable storage medium |
CN111626405B (en) * | 2020-05-15 | 2024-05-07 | Tcl华星光电技术有限公司 | CNN acceleration method, acceleration device and computer readable storage medium |
CN112905954A (en) * | 2020-12-28 | 2021-06-04 | 北京计算机技术及应用研究所 | CNN model convolution operation accelerated calculation method using FPGA BRAM |
WO2023024668A1 (en) * | 2021-08-27 | 2023-03-02 | 深圳云天励飞技术股份有限公司 | Convolution calculation method, system and device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190308 |