CN109447893A - Image preprocessing method and device for convolutional neural network FPGA acceleration - Google Patents


Info

Publication number
CN109447893A
CN109447893A (application CN201910077362.3A)
Authority
CN
China
Prior art keywords
image
image array
array
predeterminated frequency
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910077362.3A
Other languages
Chinese (zh)
Inventor
陈海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Original Assignee
DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DeepBlue AI Chips Research Institute Jiangsu Co Ltd filed Critical DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority to CN201910077362.3A priority Critical patent/CN109447893A/en
Publication of CN109447893A publication Critical patent/CN109447893A/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 — General purpose image data processing
    • G06T1/20 — Processor architectures; processor configuration, e.g. pipelining
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides an image preprocessing method and device for convolutional neural network FPGA acceleration. The method includes: S1, dividing the camera image into an N×M×3 image matrix; S2, setting n = 1, m = 1; S3, storing the n×m×3, (n+1)×(m+1)×3 and (n+2)×(m+2)×3 image matrices into the cache in order; S4, reading the n×m×3, (n+1)×(m+1)×3 and (n+2)×(m+2)×3 image matrices and performing a convolution operation with the 3×3×3 filter; S5, storing the (n+3)×(m+3)×3 image matrix into the cache of the FPGA; S6, reading the (n+1)×(m+1)×3, (n+2)×(m+2)×3 and (n+3)×(m+3)×3 image matrices and performing a convolution operation with the 3×3×3 filter; S7, judging whether n+3 equals N and m+3 equals M; if not, executing S8; if so, executing S9; S8, setting n = n+4, m = m+4 and executing S3; S9, outputting the operation result.

Description

Image preprocessing method and device for convolutional neural network FPGA acceleration
Technical field
The present invention relates to the technical field of convolutional neural network FPGA (field-programmable gate array) acceleration, and in particular to an image preprocessing method and device for convolutional neural network FPGA acceleration.
Background technique
Convolutional neural network image processing algorithms are computationally intensive and place high demands on the computing speed of the platform, so a GPU (Graphics Processing Unit) is usually used for acceleration. However, GPUs are expensive and power-hungry, and cannot meet the needs of application scenarios that are especially sensitive to real-time performance and power consumption; such scenarios usually use a low-power FPGA capable of parallel computation to accelerate the convolution. Each convolution layer of the network takes the result of the previous layer as its image input; the FPGA then accelerates, in parallel, the convolution between the input image and the filters, and stores the result in the cache. The input to the first convolution layer, however, is the RGB image from the camera, and the FPGA reads image data from its cache far faster than it can read image data from the camera. The first convolution layer is therefore slow and becomes the bottleneck of convolutional neural network FPGA acceleration; moreover, storing the entire image in the cache consumes a large amount of FPGA cache resources, which limits the application of convolutional neural network image processing algorithms on low-cost FPGAs.
Summary of the invention
The present invention aims to provide an image preprocessing method and device for convolutional neural network FPGA acceleration that overcomes, or at least partially solves, one of the above problems.
To achieve the above objectives, the technical solution of the present invention is realized as follows:
One aspect of the present invention provides an image preprocessing method for convolutional neural network FPGA acceleration, comprising: S1, dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4; S2, setting n = 1, m = 1; S3, at a first preset frequency, storing the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; S4, at a second preset frequency, reading the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with a 3×3×3 filter into the PE for a convolution operation; S5, at the first preset frequency, storing the (n+3)×(m+3)×3 image matrix into the cache of the FPGA; S6, at the second preset frequency, reading the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, and sending them together with the 3×3×3 filter into the PE for a convolution operation; S7, judging whether n+3 equals N and m+3 equals M; if not, executing step S8; if so, executing step S9; S8, setting n = n+4, m = m+4 and returning to step S3; S9, completing the convolution operation between the camera image and the 3×3×3 filter, and outputting the operation result.
In an embodiment, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
In an embodiment, the camera image is in RGB888 format.
Another aspect of the present invention provides an image preprocessing device for convolutional neural network FPGA acceleration, comprising: a division module, for dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4; an assignment module, for setting n = 1, m = 1; a cache module, for storing, at a first preset frequency, the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; a convolution calculation module, for reading, at a second preset frequency, the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with a 3×3×3 filter into the PE for a convolution operation; the cache module being further used for storing, at the first preset frequency, the (n+3)×(m+3)×3 image matrix into the cache of the FPGA; the convolution calculation module being further used for reading, at the second preset frequency, the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, and sending them together with the 3×3×3 filter into the PE for a convolution operation; a judgment module, for judging whether n+3 equals N and m+3 equals M; if not, notifying the assignment module to execute the operation of setting n = n+4, m = m+4; if so, notifying the output module to execute its operation; the assignment module being further used for setting n = n+4, m = m+4 and notifying the cache module to execute its operation; and an output module, for completing the convolution operation between the camera image and the 3×3×3 filter and outputting the operation result.
In an embodiment, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
In an embodiment, the camera image is in RGB888 format.
Yet another aspect of the present invention provides an image preprocessing device for convolutional neural network FPGA acceleration, comprising: a camera image module, for dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4; a 4-row cache module, for setting n = 1, m = 1 and storing, at a first preset frequency, the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3; and a PE module, for reading, at a second preset frequency, the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with a 3×3×3 filter into the PE for a convolution operation; the 4-row cache module being further used for storing, at the first preset frequency, the (n+3)×(m+3)×3 image matrix into the cache of the FPGA; the PE module being further used for reading, at the second preset frequency, the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, sending them together with the 3×3×3 filter into the PE for a convolution operation, and judging whether n+3 equals N and m+3 equals M; if not, setting n = n+4, m = m+4 and notifying the 4-row cache module to repeat the storing operation at the first preset frequency; if so, completing the convolution operation between the camera image and the 3×3×3 filter and outputting the operation result.
In an embodiment, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
In an embodiment, the camera image is in RGB888 format.
It can be seen that with the image preprocessing method and device for convolutional neural network FPGA acceleration provided by the embodiments of the present invention, the entire image does not need to be stored in the cache; only 4 row buffers are occupied, each the same size as one row of the camera image. The convolution computation starts while the camera image is still being transferred, and after the transfer completes only the last three rows of camera image data need to be fed into the computation to finish the first-layer convolution in convolutional neural network FPGA acceleration.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an image preprocessing device in convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the order in which the camera image is written into the four row buffers in the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the image reading mode in the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a specific image reading mode in the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention;
Fig. 6 is another schematic structural diagram of the image preprocessing device in convolutional neural network FPGA acceleration provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The image preprocessing method and device for convolutional neural network FPGA acceleration provided by the embodiments of the present invention can solve both the problem that the first layer of a convolutional neural network image processing algorithm occupies a large cache space by storing the whole camera image, and the problem that the first layer of the algorithm computes slowly.
Fig. 1 shows the flow of the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention. Referring to Fig. 1, the method comprises:
S1, dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4;
S2, setting n = 1, m = 1;
S3, at a first preset frequency, storing the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
S4, at a second preset frequency, reading the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with a 3×3×3 filter into the PE for a convolution operation;
S5, at the first preset frequency, storing the (n+3)×(m+3)×3 image matrix into the cache of the FPGA;
S6, at the second preset frequency, reading the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, and sending them together with the 3×3×3 filter into the PE for a convolution operation;
S7, judging whether n+3 equals N and m+3 equals M; if not, executing step S8; if so, executing step S9;
S8, setting n = n+4, m = m+4 and returning to step S3;
S9, completing the convolution operation between the camera image and the 3×3×3 filter, and outputting the operation result.
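The row-buffered flow above can be sketched as a behavioral model in software. The sketch below holds only 4 image rows in a simulated cache at any time, starts computing as soon as 3 rows are available, and slides a 3×3×3 window against a single filter; the function and variable names are illustrative and not taken from the patent, and the model ignores clocking, modeling only the data movement.

```python
import numpy as np

def conv3x3_line_buffered(image, kernel):
    """Behavioral sketch of the 4-row-buffer scheme: convolve an
    N x M x 3 image with one 3x3x3 filter while keeping only 4
    image rows cached at any time."""
    N, M, C = image.shape
    assert C == 3 and kernel.shape == (3, 3, 3)
    out = np.zeros((N - 2, M - 2))
    row_cache = [None] * 4                     # the 4 FPGA row buffers
    for r in range(N):
        row_cache[r % 4] = image[r]            # write one row (slow camera clock)
        if r >= 2:                             # 3 rows cached: compute can start
            rows = np.stack([row_cache[(r - 2) % 4],
                             row_cache[(r - 1) % 4],
                             row_cache[r % 4]])          # shape (3, M, 3)
            for c in range(M - 2):
                window = rows[:, c:c + 3, :]             # one 3x3x3 patch
                out[r - 2, c] = np.sum(window * kernel)  # PE: multiply-accumulate
    return out
```

Note that row r overwrites the buffer holding row r-4, which the window no longer needs, so 4 buffers suffice.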
In a specific implementation, referring to Fig. 2, the image preprocessing device in convolutional neural network FPGA acceleration provided by an embodiment of the present invention can be implemented with the following components: a camera image module, a 4-row cache module, a 3×3×3 filter and a PE module (the PE is a combination of multipliers and adders; for example, it can multiply each element of a 3×3×3 image matrix with the corresponding element of a 3×3×3 filter matrix and then sum the products); wherein:
the camera image module is used for dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4;
the 4-row cache module is used for setting n = 1, m = 1 and storing, at a first preset frequency, the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
the PE module is used for reading, at a second preset frequency, the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with the 3×3×3 filter into the PE for a convolution operation;
the 4-row cache module is further used for storing, at the first preset frequency, the (n+3)×(m+3)×3 image matrix into the cache of the FPGA;
the PE module is further used for reading, at the second preset frequency, the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, sending them together with the 3×3×3 filter into the PE for a convolution operation, and judging whether n+3 equals N and m+3 equals M; if not, setting n = n+4, m = m+4 and notifying the 4-row cache module to repeat the storing operation at the first preset frequency; if so, completing the convolution operation between the camera image and the 3×3×3 filter and outputting the operation result.
Specifically, 4 row buffers are opened in the FPGA, and the camera image is written into the FPGA cache at a frequency of 25 MHz starting from the first row. When the first pixel of the fourth row is being stored, the first 3 rows of the image are read from the cache at a frequency of 225 MHz and sent, together with the 3×3×3 filter, into the PE for the convolution operation; by the time the fourth row of camera image data has been stored, the convolution of the first three rows has just finished. Then the fifth row of image data is written into the first row buffer while the camera image data in buffers 2 to 4 are read and sent, together with the 3×3×3 filter, into the PE for the convolution operation. This cycle continues; when the camera image has been fully received, only the last three rows of the image need to be convolved with the 3×3×3 filter to complete the first-layer convolution in convolutional neural network FPGA acceleration, which greatly saves image storage space and improves the operation speed.
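One plausible reading of why these two clocks pair well, not stated explicitly in the patent: each 3×3 window position requires 9 pixel reads (one pixel per system clock, per the Fig. 5 description), while one new pixel arrives per write clock, so a read clock exactly 9× the write clock lets the PE consume windows at the same rate pixels arrive. A quick sanity check of that ratio:

```python
WRITE_MHZ = 25   # first preset frequency: camera pixels into the row buffers
READ_MHZ = 225   # second preset frequency: pixel reads out of the buffers
READS_PER_WINDOW = 3 * 3   # one pixel read per system clock, 9 per 3x3 window

# One window must be assembled in the time one new pixel is written,
# so the read clock must be at least 9x the write clock.
assert READ_MHZ // WRITE_MHZ == READS_PER_WINDOW
print(READ_MHZ / WRITE_MHZ)  # -> 9.0
```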
As an optional embodiment of the present invention, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
As an optional embodiment of the present invention, the camera image is in RGB888 format.
Hereinafter, taking a camera image of size 416×416 as an example and with reference to Fig. 2, the image preprocessing method in convolutional neural network FPGA acceleration provided by an embodiment of the present invention is illustrated, but the present invention is not limited thereto:
For ease of explanation, before the convolution operation the 3×3 matrix formed by 9 RGB888 (24-bit) pixels is split into a 3×3×3 matrix of 27 8-bit values. For illustration, the first pixel in the camera image is numbered 1, the next pixel is numbered 2, and each subsequent pixel increments by 1.
The 4 row buffers are Block RAM (block random access memory) in the FPGA; structurally, each is a true dual-port RAM containing two complete sets of 36-bit read/write data buses and corresponding control buses. Each row buffer can store one row of the camera image.
Referring to Fig. 3, the order in which the camera image is written into the 4 row buffers is: the 1st row of image data of the camera image is stored in buffer 1, the 2nd row in buffer 2, the 3rd row in buffer 3, the 4th row in buffer 4, the 5th row in buffer 1, the 6th row in buffer 2, and so on cyclically until the entire image has been stored in the cache.
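The cyclic write order is a simple modulo mapping, sketched below with an illustrative helper (the function name is not from the patent):

```python
def target_buffer(row):
    # Which of the 4 row buffers receives camera image row `row`
    # (1-indexed), per the cyclic order described for Fig. 3.
    return (row - 1) % 4 + 1

# rows 1..8 go to buffers 1, 2, 3, 4, 1, 2, 3, 4
assert [target_buffer(r) for r in range(1, 9)] == [1, 2, 3, 4, 1, 2, 3, 4]
```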
The way the image is read from the buffers, shown in Fig. 4, is: first, three pixels 1, 2, 3 are read in order from addresses 0, 1, 2 of buffer 1; then three pixels 417, 418, 419 from addresses 0, 1, 2 of buffer 2; then three pixels 833, 834, 835 from addresses 0, 1, 2 of buffer 3; the 9 pixels read form the first 3×3 matrix. Similarly, three pixels 2, 3, 4 are read from addresses 1, 2, 3 of buffer 1, three pixels 418, 419, 420 from addresses 1, 2, 3 of buffer 2, and three pixels 834, 835, 836 from addresses 1, 2, 3 of buffer 3; these 9 pixels form the second 3×3 matrix. Reading continues by this rule until the three row buffers have been read completely. After buffers 1, 2 and 3 have been read through, buffers 2, 3 and 4 are read by the same rule.
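The pixel numbers in the Fig. 4 example follow directly from row-major, 1-indexed numbering of a 416-wide image. The sketch below reconstructs the two example windows; the helper names are illustrative, not from the patent:

```python
WIDTH = 416  # row length in the patent's 416x416 example

def pixel_id(row, col):
    # Sequential 1-indexed, row-major pixel numbering used in Fig. 4.
    return (row - 1) * WIDTH + col

def window_ids(top_row, left_col):
    # The 3x3 block of pixel numbers whose top-left pixel is at
    # (top_row, left_col); each row of the block comes from a
    # different row buffer.
    return [[pixel_id(top_row + r, left_col + c) for c in range(3)]
            for r in range(3)]

# First window (addresses 0, 1, 2 of buffers 1-3):
assert window_ids(1, 1) == [[1, 2, 3], [417, 418, 419], [833, 834, 835]]
# Second window (addresses 1, 2, 3 of buffers 1-3):
assert window_ids(1, 2) == [[2, 3, 4], [418, 419, 420], [834, 835, 836]]
```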
Specifically, referring to Fig. 5, the way camera image data are read from the four row buffers to form 3×3 image matrices is as follows. One pixel is read per system clock, and rd_data is the data read from the first three row buffers. The read rule is: read three pixels from addr0, addr1, addr2 of buffer 1, three pixels from addr0, addr1, addr2 of buffer 2, and three pixels from addr0, addr1, addr2 of buffer 3, 9 pixels in total; similarly, read three pixels from addr1, addr2, addr3 of buffer 1, three from addr1, addr2, addr3 of buffer 2, and three from addr1, addr2, addr3 of buffer 3, another 9 pixels; and so on until all data of the first three rows have been traversed. The rd_data stream is then delayed by 1 to 8 system clocks to obtain rd_data_delay1, rd_data_delay2, ..., rd_data_delay8. In every 9 system clocks, rd_data and rd_data_delay1 through rd_data_delay8 are sampled at the same moment; these 9 pixels together form one convolution kernel's worth of data and are output.
It can be seen that the image preprocessing method in convolutional neural network FPGA acceleration provided by the embodiments of the present invention does not need to store the entire image in the cache; it only needs to occupy 4 row buffers, each the same size as one row of the camera image. Taking a 416×416 camera image as an example, only 4×416 pixels of cache space are needed; the new method proposed by the present invention occupies less than 1% of the original cache.
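The under-1% cache claim for the 416×416 example checks out arithmetically:

```python
FULL_FRAME = 416 * 416   # pixels cached if the whole image is buffered
FOUR_ROWS = 4 * 416      # pixels cached by the 4-row-buffer scheme

ratio = FOUR_ROWS / FULL_FRAME
assert ratio < 0.01      # under 1% of the original cache, as claimed
print(f"{ratio:.4%}")    # -> 0.9615%
```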
Further, the convolution computation starts while the camera image is being transferred; after the transfer completes, only the last three rows of camera image data need to be fed into the computation to finish the first-layer convolution in convolutional neural network FPGA acceleration. Taking a 416×416 camera image as an example, the present invention does not need to wait for the camera image transfer to finish before performing the convolution operation, and the time spent on the convolution computation part is less than 1% of the original.
Fig. 6 shows the structure of the image preprocessing device in convolutional neural network FPGA acceleration provided by an embodiment of the present invention. The device is applied to the above method; below, only the structure of the device is briefly described, and for anything not covered here, please refer to the related description of the image preprocessing method above. Referring to Fig. 6, the image preprocessing device in convolutional neural network FPGA acceleration provided by an embodiment of the present invention comprises:
a division module 601, for dividing the camera image into an N×M×3 image matrix, where N ≥ 4 and M ≥ 4;
an assignment module 605, for setting n = 1, m = 1;
a cache module 602, for storing, at a first preset frequency, the N×M×3 image matrix into the cache of the FPGA in the order of the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, where 1 ≤ n ≤ N-3 and 1 ≤ m ≤ M-3;
a convolution calculation module 603, for reading, at a second preset frequency, the n×m×3 image matrix, the (n+1)×(m+1)×3 image matrix and the (n+2)×(m+2)×3 image matrix, and sending them together with a 3×3×3 filter into the PE for a convolution operation;
the cache module 602 being further used for storing, at the first preset frequency, the (n+3)×(m+3)×3 image matrix into the cache of the FPGA;
the convolution calculation module 603 being further used for reading, at the second preset frequency, the (n+1)×(m+1)×3 image matrix, the (n+2)×(m+2)×3 image matrix and the (n+3)×(m+3)×3 image matrix, and sending them together with the 3×3×3 filter into the PE for a convolution operation;
a judgment module 604, for judging whether n+3 equals N and m+3 equals M; if not, notifying the assignment module to execute the operation of setting n = n+4, m = m+4; if so, notifying the output module to execute its operation;
the assignment module 605 being further used for setting n = n+4, m = m+4 and notifying the cache module to execute its operation;
an output module 606, for completing the convolution operation between the camera image and the 3×3×3 filter and outputting the operation result.
It can be seen that with the image preprocessing device in convolutional neural network FPGA acceleration provided by the embodiments of the present invention, the entire image does not need to be stored in the cache; only 4 row buffers are occupied, each the same size as one row of the camera image. Further, the convolution computation starts while the camera image is still being transferred, and after the transfer completes only the last three rows of camera image data need to be fed into the computation to finish the first-layer convolution in convolutional neural network FPGA acceleration.
As an optional embodiment of the present invention, the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
As an optional embodiment of the present invention, the camera image is in RGB888 format.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, which realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and a memory.
The memory may include non-volatile memory in a computer-readable medium, random access memory (RAM) and/or other forms of non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can realize information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes of the present application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of the claims of this application.

Claims (9)

1. An image preprocessing method for convolutional neural network FPGA acceleration, characterized by comprising:
S1: dividing a camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
S2: setting n = 1 and m = 1;
S3: storing, at a first preset frequency, the N × M × 3 image matrix into a cache of the FPGA in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N−3 and 1 ≤ m ≤ M−3;
S4: reading, at a second preset frequency, the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, and feeding them together with a 3 × 3 × 3 filter into a PE for convolution;
S5: storing the (n+3) × (m+3) × 3 image matrix into the cache of the FPGA at the first preset frequency;
S6: reading, at the second preset frequency, the (n+1) × (m+1) × 3 image matrix, the (n+2) × (m+2) × 3 image matrix, and the (n+3) × (m+3) × 3 image matrix, and feeding them together with the 3 × 3 × 3 filter into the PE for convolution;
S7: judging whether n+3 equals N and m+3 equals M; if not, performing step S8; if so, performing step S9;
S8: setting n = n+4 and m = m+4, and returning to step S3;
S9: completing the convolution of the camera image with the 3 × 3 × 3 filter, and outputting the result of the operation.
2. The method according to claim 1, characterized in that the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
3. The method according to claim 1, characterized in that the camera image is in RGB888 format.
4. An image preprocessing device for convolutional neural network FPGA acceleration, characterized by comprising:
a division module, configured to divide a camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
an assignment module, configured to set n = 1 and m = 1;
a cache module, configured to store, at a first preset frequency, the N × M × 3 image matrix into a cache of the FPGA in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N−3 and 1 ≤ m ≤ M−3;
a convolution calculation module, configured to read, at a second preset frequency, the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, and feed them together with a 3 × 3 × 3 filter into a PE for convolution;
the cache module being further configured to store the (n+3) × (m+3) × 3 image matrix into the cache of the FPGA at the first preset frequency;
the convolution calculation module being further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3 image matrix, the (n+2) × (m+2) × 3 image matrix, and the (n+3) × (m+3) × 3 image matrix, and feed them together with the 3 × 3 × 3 filter into the PE for convolution;
a judgment module, configured to judge whether n+3 equals N and m+3 equals M; if not, to notify the assignment module to set n = n+4 and m = m+4; and if so, to notify an output module to perform its operation;
the assignment module being further configured to set n = n+4 and m = m+4 and to notify the cache module to perform its operation; and
an output module, configured to complete the convolution of the camera image with the 3 × 3 × 3 filter and to output the result of the operation.
5. The device according to claim 4, characterized in that the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
6. The device according to claim 4, characterized in that the camera image is in RGB888 format.
7. An image preprocessing device for convolutional neural network FPGA acceleration, characterized by comprising: a camera image module, a 4-row cache module, a 3 × 3 × 3 filter, and a PE module; wherein:
the camera image module is configured to divide a camera image into an N × M × 3 image matrix, where N ≥ 4 and M ≥ 4;
the 4-row cache module is configured to set n = 1 and m = 1 and to store, at a first preset frequency, the N × M × 3 image matrix into a cache of the FPGA in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, where 1 ≤ n ≤ N−3 and 1 ≤ m ≤ M−3;
the PE module is configured to read, at a second preset frequency, the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix, and to perform convolution on them together with the 3 × 3 × 3 filter;
the 4-row cache module is further configured to store the (n+3) × (m+3) × 3 image matrix into the cache of the FPGA at the first preset frequency;
the PE module is further configured to read, at the second preset frequency, the (n+1) × (m+1) × 3 image matrix, the (n+2) × (m+2) × 3 image matrix, and the (n+3) × (m+3) × 3 image matrix, and to perform convolution on them together with the 3 × 3 × 3 filter; to judge whether n+3 equals N and m+3 equals M; if not, to set n = n+4 and m = m+4 and notify the 4-row cache module to repeat the operation of storing, at the first preset frequency, the N × M × 3 image matrix into the cache of the FPGA in the order of the n × m × 3 image matrix, the (n+1) × (m+1) × 3 image matrix, and the (n+2) × (m+2) × 3 image matrix; and if so, to complete the convolution of the camera image with the 3 × 3 × 3 filter and output the result of the operation.
8. The device according to claim 7, characterized in that the first preset frequency is 25 MHz and the second preset frequency is 225 MHz.
9. The device according to claim 7, characterized in that the camera image is in RGB888 format.
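The window-by-window multiply-accumulate that claims 1, 4, and 7 feed into the PE can be illustrated in software. The sketch below is a plain-Python simulation of a stride-1, valid 3 × 3 × 3 convolution over an RGB image; the function name `conv3x3x3` and the stride-1 interpretation of the window sequence are assumptions for illustration, and the sketch deliberately does not model the two clock domains (25 MHz write / 225 MHz read) or the 4-row cache of the hardware.

```python
def conv3x3x3(image, kernel):
    """Valid, stride-1 convolution of an H x W x 3 image (nested lists)
    with a single 3 x 3 x 3 kernel.

    Each (i, j) iteration mimics one PE operation from claim 1: a 3x3x3
    window of the cached image is multiplied element-wise with the filter
    and accumulated into one output value.
    """
    H, W = len(image), len(image[0])
    out = [[0.0] * (W - 2) for _ in range(H - 2)]
    for i in range(H - 2):        # row position, as held in the line buffers
        for j in range(W - 2):    # column position within the buffered rows
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    for c in range(3):  # R, G, B channels
                        acc += image[i + di][j + dj][c] * kernel[di][dj][c]
            out[i][j] = acc
    return out

# Example: an 8 x 8 RGB image of ones convolved with an all-ones filter.
img = [[[1.0] * 3 for _ in range(8)] for _ in range(8)]
ker = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
res = conv3x3x3(img, ker)
# The valid output is 6 x 6; each entry sums 27 ones, giving 27.0.
```

In the hardware of claim 7, the two inner row/column loops are what the line buffers and the faster read clock serve: rows are written once at the slow camera rate, then re-read three at a time at the fast rate so the PE never stalls waiting for pixels.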
CN201910077362.3A 2019-01-28 2019-01-28 Image preprocessing method and device for convolutional neural network FPGA acceleration Pending CN109447893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910077362.3A CN109447893A (en) 2019-01-28 2019-01-28 Image preprocessing method and device for convolutional neural network FPGA acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910077362.3A CN109447893A (en) 2019-01-28 2019-01-28 Image preprocessing method and device for convolutional neural network FPGA acceleration

Publications (1)

Publication Number Publication Date
CN109447893A true CN109447893A (en) 2019-03-08

Family

ID=65544231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910077362.3A Pending CN109447893A (en) Image preprocessing method and device for convolutional neural network FPGA acceleration

Country Status (1)

Country Link
CN (1) CN109447893A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175670A (en) * 2019-04-09 2019-08-27 Huazhong University of Science and Technology Method and system for implementing a YOLOv2 detection network based on FPGA
CN111626405A (en) * 2020-05-15 2020-09-04 Tcl华星光电技术有限公司 CNN acceleration method, CNN acceleration device and computer readable storage medium
CN112905954A (en) * 2020-12-28 2021-06-04 北京计算机技术及应用研究所 CNN model convolution operation accelerated calculation method using FPGA BRAM
WO2023024668A1 (en) * 2021-08-27 2023-03-02 深圳云天励飞技术股份有限公司 Convolution calculation method, system and device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537334A (en) * 2018-04-26 2018-09-14 Jinan Inspur Hi-Tech Investment and Development Co., Ltd. Acceleration array design method for CNN convolutional layer operations
CN108805274A (en) * 2018-05-28 2018-11-13 Chongqing University Hardware acceleration method and system for Tiny-yolo convolutional neural networks based on FPGA
KR20180125843A (en) * 2017-05-16 2018-11-26 Kwangwoon University Industry-Academic Collaboration Foundation A hardware classifier applicable to various CNN models

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180125843A (en) * 2017-05-16 2018-11-26 Kwangwoon University Industry-Academic Collaboration Foundation A hardware classifier applicable to various CNN models
CN108537334A (en) * 2018-04-26 2018-09-14 Jinan Inspur Hi-Tech Investment and Development Co., Ltd. Acceleration array design method for CNN convolutional layer operations
CN108805274A (en) * 2018-05-28 2018-11-13 Chongqing University Hardware acceleration method and system for Tiny-yolo convolutional neural networks based on FPGA

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Bang et al., "Design and Implementation of an FPGA-Based Convolutional Neural Network Accelerator", Journal of Fudan University (Natural Science) *
Li Xinghua et al., "Design of an FPGA-Based Real-Time Image Processing System", Semiconductor Optoelectronics *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175670A (en) * 2019-04-09 2019-08-27 Huazhong University of Science and Technology Method and system for implementing a YOLOv2 detection network based on FPGA
CN111626405A (en) * 2020-05-15 2020-09-04 Tcl华星光电技术有限公司 CNN acceleration method, CNN acceleration device and computer readable storage medium
CN111626405B (en) * 2020-05-15 2024-05-07 Tcl华星光电技术有限公司 CNN acceleration method, acceleration device and computer readable storage medium
CN112905954A (en) * 2020-12-28 2021-06-04 北京计算机技术及应用研究所 CNN model convolution operation accelerated calculation method using FPGA BRAM
WO2023024668A1 (en) * 2021-08-27 2023-03-02 深圳云天励飞技术股份有限公司 Convolution calculation method, system and device, and storage medium

Similar Documents

Publication Publication Date Title
CN109447893A (en) Image preprocessing method and device for convolutional neural network FPGA acceleration
CN106358003A (en) Video analysis and acceleration method based on thread-level pipelining
CN106251392A (en) Method and apparatus for performing interleaving
US20230394615A1 (en) Task execution in a simd processing unit with parallel groups of processing lanes
CN109461119A (en) Image padding method and device for convolutional neural network FPGA acceleration
CN109275011A (en) Processing method and device for smart television motion-mode switching, and user equipment
CN111476706A (en) Vertex parallel processing method and device, computer storage medium and electronic equipment
CN106101712B (en) Processing method and device for video stream data
CN107783933A (en) Image processing method and equipment
CN117271136A (en) Data processing method, device, equipment and storage medium
CN112749011A (en) FPGA (field-programmable gate array) acceleration method based on a non-maximum suppression algorithm
WO2021070303A1 (en) Computation processing device
CN104899840A (en) Guided-filtering optimization speed-up method based on CUDA
CN109448092B (en) Load balancing cluster rendering method based on dynamic task granularity
US6771271B2 (en) Apparatus and method of processing image data
CN115860080A (en) Computing core, accelerator, computing method, device, equipment, medium and system
US20220188380A1 (en) Data processing method and apparatus applied to graphics processing unit, and electronic device
CN117785480B (en) Processor, reduction calculation method and electronic equipment
CN110647984B (en) Chip, integrated processing device and operation method thereof
WO2022027818A1 (en) Data batch processing method and batch processing apparatus thereof, and storage medium
US20210272232A1 (en) Filter Independent L1 Mapping Of Convolution Data Into General Purpose Register
CN114692851A (en) Calculation method and device of neural network model, terminal and storage medium
Lin et al. A High-Throughout Real-Time Prewitt Operator on Embedded NEON+ ARM System
CN116934973A (en) 3D multi-frame file reconstruction method and device
CN117422608A (en) Image guided filtering method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190308