CN110276444A - Image processing method and device based on convolutional neural networks - Google Patents

Image processing method and device based on convolutional neural networks

Info

Publication number
CN110276444A
Authority
CN
China
Prior art keywords
pooling
result
convolution operation
module
convolution
Prior art date
Legal status
Granted
Application number
CN201910480468.8A
Other languages
Chinese (zh)
Other versions
CN110276444B (en)
Inventor
Zhou Fangkun
OuYang Peng
Yin Shouyi
Li Xiudong
Wang Bo
Current Assignee
Beijing Qingwei Intelligent Technology Co Ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co Ltd filed Critical Beijing Qingwei Intelligent Technology Co Ltd
Priority to CN201910480468.8A
Publication of CN110276444A
Application granted
Publication of CN110276444B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention discloses an image processing method and device based on a convolutional neural network. The method comprises: obtaining the convolution results of an image to be processed; caching the convolution results of the image to be processed in convolution order; reading the cached convolution results in pooling order; and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed. The invention can substantially reduce the data cache space of the pooling module and improve resource utilization.

Description

Image processing method and device based on convolutional neural networks
Technical field
The present invention relates to the field of image processing, and in particular to an image processing method and device based on a convolutional neural network.
Background art
This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description here is not admitted to be prior art merely by its inclusion in this section.
A convolutional neural network (CNN) is one of the representative algorithms of deep learning. In a convolutional neural network, a pooling layer is often added after a convolutional layer. The convolutional layer extracts features from the input image and produces feature maps. The pooling layer compresses and further extracts the feature maps output by the convolutional layer: on the one hand it makes the feature maps smaller and simplifies the computational complexity of the network; on the other hand it compresses the features and extracts the main ones.
It should be noted that the hardware part of the pooling module realizing the pooling function consists of a data cache module and a data processing module, and the working mode of the module depends on how the image data are input. Fig. 1 is a schematic diagram of pooling an image as provided by the prior art. As shown in Fig. 1, suppose the pooling window applied to the image data is (pooling stride = 2, pooling size = 2), where pooling stride denotes the step size of the pooling window and pooling size denotes the size of the pooling window. Without considering the number of channels, performing max pooling on the image shown in Fig. 1 is equivalent to computing max(A, B, H, I), max(C, D, J, K), and so on.
As shown in Fig. 1, when the input order of the image data is (A, B, H, I, C, D, J, K, ...), the required pooling logic is very simple (each arriving datum is compared with the previous ones, and a maximum is taken every four values). In this case the capacity of the data cache module only needs the space of one datum, and the pooling function is also relatively easy to realize in hardware. When the input order of the image data is (A, B, C, D, E, F, G, H, ...), i.e. each row of the real image is input in full before the next row of data begins, the maximum of (A, B) must be cached, and the data cache space it occupies can only be released after the maximum of (H, I) has been computed. The data cache space needed by the pooling module thus depends on the size of a row of the real image. Similarly, when the input order of the image data is (A, H, O, V, ..., B, I, P, W, ...), the data cache space needed by the pooling module depends on the size of a column of the real image.
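To make the dependence on arrival order concrete, the following is a minimal Python sketch (not part of the patent; the function names and the 4x4 test image are illustrative) contrasting the two input orders for a 2x2 max pooling window: window order needs only one running register, while row-major order must keep one partial maximum per output column cached until the next row arrives.

```python
import numpy as np

def maxpool_window_order(stream):
    """Data arrive window by window (A, B, H, I, C, D, J, K, ...):
    a single running register is enough."""
    out, acc, count = [], None, 0
    for v in stream:
        acc = v if acc is None else max(acc, v)  # compare with earlier values
        count += 1
        if count == 4:                           # one 2x2 window is complete
            out.append(acc)
            acc, count = None, 0
    return out

def maxpool_row_order(img):
    """Data arrive row by row (A, B, C, D, ...): one partial maximum per
    output column must stay cached until the next row has been received."""
    h, w = img.shape
    out = []
    for r in range(0, h, 2):
        row_buf = [max(img[r, c], img[r, c + 1]) for c in range(0, w, 2)]
        # row_buf is the cache whose size grows with the image width
        out.extend(max(row_buf[i], img[r + 1, c], img[r + 1, c + 1])
                   for i, c in enumerate(range(0, w, 2)))
    return out

img = np.arange(16.0).reshape(4, 4)
window_stream = [img[r + dr, c + dc]
                 for r in range(0, 4, 2) for c in range(0, 4, 2)
                 for dr in (0, 1) for dc in (0, 1)]
assert maxpool_window_order(window_stream) == maxpool_row_order(img)
```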
From the above analysis, the data cache capacity (i.e. the size of the cache space) with which the pooling module realizes pooling depends on the order in which the output data of the convolution module arrive. Since hardware resources are limited (e.g. in an ASIC or FPGA), a pooling module design tends toward higher resource utilization; how to optimize the data input mode of the pooling module so as to reduce the data cache space is therefore a problem to be solved urgently.
Summary of the invention
An embodiment of the present invention provides an image processing method based on a convolutional neural network, to solve the technical problem that in existing convolutional neural networks the pooling layer performs the pooling operation directly on the convolution results output by the convolutional layer, which makes the data cache space large. The method comprises: obtaining the convolution results of an image to be processed; caching the convolution results of the image to be processed in convolution order; reading the cached convolution results in pooling order; and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
An embodiment of the present invention also provides an image processing method based on a convolutional neural network, to solve the same technical problem. The method comprises: determining the convolution order according to the pooling order; outputting the convolution results of the image to be processed in the convolution order; performing a pooling operation on the convolution results of the image to be processed to obtain a pooling result; and caching the pooling result and determining the pooling order according to the pooling result.
An embodiment of the present invention also provides an image processing device based on a convolutional neural network, to solve the same technical problem. The device comprises: a convolution module for outputting the convolution results of an image to be processed; a cache module, connected to the convolution module, for caching the convolution results output by the convolution module in the convolution order of the convolution module; and a pooling module, connected to the cache module, for reading the cached convolution results in the pooling order of the pooling module and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
An embodiment of the present invention also provides another image processing device based on a convolutional neural network, to solve the same technical problem. The device comprises: a convolution module for determining the convolution order according to the pooling order and outputting the convolution results of the image to be processed in the convolution order; a pooling module, connected to the convolution module, for performing a pooling operation on the convolution results output by the convolution module to obtain a pooling result; and a cache module, connected to both the pooling module and the convolution module, for caching the pooling result output by the pooling module and determining the pooling order of the pooling module according to the pooling result of the pooling module.
In the embodiments of the present invention, the convolution results computed by the convolution module for the image to be processed are cached in the convolution order of the convolution module, and the pooling module reads the corresponding convolution results from the cache in pooling order, so that the pooling module can perform the pooling operation in the pooling order and obtain the pooling result of the image to be processed. The embodiments of the present invention thereby greatly reduce the data cache space of the pooling module and improve resource utilization.
Brief description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic diagram of pooling an image as provided by the prior art;
Fig. 2 is a schematic diagram of an image processing device with a pipeline structure provided in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an image processing device with a non-pipeline structure provided in an embodiment of the present invention;
Fig. 4 is a flowchart of an image processing method based on a convolutional neural network provided in an embodiment of the present invention;
Fig. 5 is a flowchart of another image processing method based on a convolutional neural network provided in an embodiment of the present invention;
Fig. 6 is a schematic diagram of processing multi-channel image data in parallel provided in an embodiment of the present invention;
Fig. 7 is a schematic diagram of processing multi-channel image data by time-division multiplexing provided in an embodiment of the present invention;
Fig. 8 is a timing diagram provided in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention;
Fig. 10 is a schematic diagram of the result of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention;
Fig. 11 is a schematic diagram of the process of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention;
Fig. 12 is a schematic diagram of the result of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings. Here, the exemplary embodiments of the invention and their descriptions are used to explain the invention but are not intended to limit it.
As noted in the background section, in existing convolutional neural networks the pooling layer is connected after the convolutional layer and performs the pooling operation directly on the convolution results output by the convolutional layer. Once the convolution is finished, the time consumed by the pooling layer is negligible. The pipeline structure formed by the convolutional layer and the pooling layer therefore has the advantage that no extra clock cycles are consumed beyond those of the convolution, saving total clock cycles; its disadvantage is that the pooling layer must match the output data of the convolutional layer and reserve cache resources accordingly, so the data cache space is large.
To solve the above problem, an embodiment of the invention provides the image processing device with a pipeline structure shown in Fig. 2. As shown in Fig. 2, the device comprises a convolution module 21, a pooling module 22 and a cache module 23, where the pooling module 22 is connected to the convolution module 21, and the cache module 23 is connected to both the pooling module 22 and the convolution module 21.
Specifically, the convolution module 21 determines its convolution order according to the pooling order obtained from the cache module 23 and outputs the convolution results of the image to be processed in that convolution order; the pooling module 22 performs the pooling operation directly on the convolution results output by the convolution module 21 to obtain the pooling result of the image to be processed. The cache module 23 caches the pooling result output by the pooling module 22 and determines the pooling order of the pooling module 22 according to the pooling result.
It should be noted that the pipeline-structured image processing device shown in Fig. 2 follows a pooling-first principle: the output of the convolution module is adjusted according to the data needed by the pooling module. With this structure, the input data are produced in pooling order (i.e. the distribution of convolution operations is guided by the pooling result, and the convolution results are computed in the order the pooling needs them), which not only saves cache capacity and reduces data cache space, but also, because a pipeline is used between the convolution module and the pooling module, saves the total clock cycles of the data processing. However, this structure imposes stricter ordering requirements on how the convolution module reads the raw data: the convolution module needs a more complex structure and advance configuration to cooperate with the pooling module. In addition, reading the image data in the cache module involves more complicated address calculation, which may increase the read time and the required data cache space.
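As a rough illustration of this pooling-first principle (a sketch under assumed coordinates, not the patent's actual configuration logic), the convolution output order can be derived directly from the pooling window traversal:

```python
def pooling_first_schedule(h, w, size=2, stride=2):
    """Emit the (row, col) order in which the convolution module should
    produce outputs so that each pooling window completes as early as
    possible and the pooling cache stays one value deep."""
    order = []
    for pr in range(0, h - size + 1, stride):     # pooling windows, row-major
        for pc in range(0, w - size + 1, stride):
            for dr in range(size):                # the pixels of one window
                for dc in range(size):
                    order.append((pr + dr, pc + dc))
    return order

# For a 4x4 feature map this yields (0,0), (0,1), (1,0), (1,1), (0,2), ...
# which is exactly the (A, B, H, I, C, D, ...) order of Fig. 1.
print(pooling_first_schedule(4, 4)[:8])
```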
As a preferred scheme, an embodiment of the invention therefore also provides the image processing device with a non-pipeline structure shown in Fig. 3. As shown in Fig. 3, the device likewise comprises a convolution module 21, a pooling module 22 and a cache module 23, where the convolution module 21 and the pooling module 22 are each connected to the cache module 23.
As can be seen from Fig. 3, because the cache module 23 sits between the convolution module 21 and the pooling module 22, the convolution module 21 does not need to match the output of the pooling module 22: it simply caches the convolution results of the image to be processed into the cache module 23 in convolution order. Likewise, the pooling module 22 does not need to match the output of the convolution module 21: it reads the cached convolution results from the cache module 23 on demand, in pooling order, and performs the pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
It should be noted that the non-pipeline-structured image processing device shown in Fig. 3 allows the pooling module to fetch data from the cache module conveniently and precisely according to its own needs, which not only saves its own cache resources but also allows the access mode to be adjusted at any time for different networks, giving greater flexibility. Its disadvantage is that, compared with the pipeline-structured image processing device shown in Fig. 2, it wastes more clock cycles, and the programming of the pooling module becomes more complex.
As an optional embodiment, the cache module 23 shown in Fig. 2 and Fig. 3 may use static random-access memory (SRAM); with this kind of memory, the stored data remain unchanged as long as power is maintained.
It should be noted that after analyzing the spare data volume and clock budget in the network, a design better suited to the ASIC or FPGA can be adopted. In practical ASIC or FPGA applications, the resources and time consumed by the convolution operation are far greater than those of the pooling operation; when a pipeline structure is used to save clock cycles without changing the convolution output, how to change the "adaptability" of the pooling to the data is the key point for improving performance.
An embodiment of the invention also provides an image processing method based on a convolutional neural network, which can be applied to, but is not limited to, the non-pipeline-structured image processing device shown in Fig. 3.
Fig. 4 is a flowchart of an image processing method based on a convolutional neural network provided in an embodiment of the present invention. As shown in Fig. 4, the method comprises the following steps:
S401: obtain the convolution results of the image to be processed;
S402: cache the convolution results of the image to be processed in convolution order;
S403: read the cached convolution results in pooling order;
S404: perform the pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
With the scheme provided by S401 to S404, the convolution results computed for the image to be processed are cached in the convolution order of the convolution module, and the pooling module reads the corresponding convolution results from the cache in pooling order, so that the pooling module can perform the pooling operation in the pooling order and obtain the pooling result of the image to be processed. This greatly reduces the data cache space of the pooling module and improves resource utilization.
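The following is a minimal software model of S401 to S404 (illustrative only: the cache module is modeled as a Python dict, a 2x2 max pooling window is assumed, and `process` is a hypothetical name):

```python
import numpy as np

def process(conv_stream, h, w, size=2, stride=2):
    """S401/S402: cache conv results as they arrive in convolution
    (row-major) order; S403/S404: read back in pooling order and pool."""
    cache = {}
    for (r, c), v in conv_stream:        # convolution-order arrival
        cache[(r, c)] = v                # cache module modeled as a dict
    pooled = []
    for pr in range(0, h - size + 1, stride):    # pooling-order reads
        for pc in range(0, w - size + 1, stride):
            window = [cache[(pr + dr, pc + dc)]
                      for dr in range(size) for dc in range(size)]
            pooled.append(float(max(window)))    # max pooling as the example mode
    return pooled

conv = np.arange(16.0).reshape(4, 4)     # stand-in convolution output
stream = [((r, c), conv[r, c]) for r in range(4) for c in range(4)]
print(process(stream, 4, 4))             # [5.0, 7.0, 13.0, 15.0]
```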
An embodiment of the invention also provides an image processing method based on a convolutional neural network, which can be applied to, but is not limited to, the pipeline-structured image processing device shown in Fig. 2.
Fig. 5 is a flowchart of another image processing method based on a convolutional neural network provided in an embodiment of the present invention. As shown in Fig. 5, the method comprises the following steps:
S501: determine the convolution order according to the pooling order;
S502: output the convolution results of the image to be processed in the convolution order;
S503: perform the pooling operation on the convolution results of the image to be processed to obtain the pooling result;
S504: cache the pooling result, and determine the pooling order according to the pooling result.
With the scheme provided by S501 to S504, the convolution order of the convolution module is determined according to the pooling order of the pooling module, so that the convolution module outputs the convolution results of the image to be processed in a way that matches the pooling module. The pooling module can then pool the convolution results output by the convolution module in real time and obtain the corresponding pooling result; the pooling result of the pooling module is cached into the cache module, from which the cache module determines the pooling order of the next pooling operation according to the pooling result.
It should be noted that when the image to be processed in the embodiments of the invention is a multi-channel image, the convolution results of the image contain image data of multiple channels. Pooling multi-channel image data only requires replicating enough pooling modules to process the image data of each channel in the convolution results, and finally combining the pooling results of all channels into the output. From this it can also be seen that when pooling is realized in hardware, the data cache capacity depends on the order in which the upper-layer results arrive, the complexity of the data processing depends on how the data are cached, and the resources occupied by the whole module also depend on the number of channels.
Therefore, in S404 of Fig. 4 or in S503 of Fig. 5, as an optional embodiment, pooling the convolution results of multi-channel image data may specifically comprise the following steps: obtaining the image data of each channel contained in the convolution results; pooling the image data of each channel contained in the convolution results to obtain the pooling result of each channel; and merging the pooling results of all channels to obtain the pooling result of the image to be processed.
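A sketch of these three steps, assuming a (C, H, W) layout and reusing any single-channel pooling routine (such as the hypothetical `process` sketch above) as the per-channel operator:

```python
import numpy as np

def pool_multichannel(conv_result, pool_fn):
    """Split the conv result by channel, pool each channel independently,
    then merge the per-channel pooling results back together."""
    per_channel = [pool_fn(conv_result[ch])       # pool one channel
                   for ch in range(conv_result.shape[0])]
    return np.stack(per_channel)                  # merge all channels
```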
Optionally, pooling the image data of each channel contained in the convolution results to obtain the pooling result of each channel can be realized in either of the following two ways:
The first way is based on time-division multiplexing: one operator pools the image data of each channel contained in the convolution results in turn, obtaining the pooling result of each channel.
Specifically, this way may comprise: caching the image data of each channel contained in the convolution results; reading the cached image data of each channel in first-in-first-out order; pooling the image data of each channel as it is read; caching the pooling result of each channel; and outputting the pooling results of the channels in first-in-first-out order.
The second way is based on parallelism: multiple operators pool the image data of the channels contained in the convolution results simultaneously, obtaining the pooling result of each channel.
Specifically, if the number of operators is greater than the number of channels, as many operators as there are channels directly pool the image data of the channels contained in the convolution results, obtaining the pooling result of each channel; if the number of operators is smaller than the number of channels, the operators are multiplexed to pool the image data of the channels contained in the convolution results, obtaining the pooling result of each channel.
In a convolutional neural network, each convolutional layer computes feature-map data for different channels according to the network configuration and the corresponding weight data. In a hardware implementation, this multi-channel operation can be completed by adding operators or by reusing an operator. Fig. 6 is a schematic diagram of processing multi-channel image data in parallel provided in an embodiment of the present invention. As shown in Fig. 6, assume the width of a single image datum in the network is 16 bits. When the valid data entering the pooling stage are 48 bits wide (i.e. 3 channels), the pooling module uses 3 identical ALU computing modules (also called operators) to pool the image data of each channel separately, and after the results are computed, integrates them into a 48-bit pooling result for output. This way needs no extra programming, but when the number of channels exceeds the number of preset operators, other strategies are still needed.
Because a design with many duplicated operators can appear bloated in large-scale ASIC or FPGA applications, shrinking the number of operators to match the network more flexibly is one direction for optimizing resources. Fig. 7 is a schematic diagram of processing multi-channel image data by time-division multiplexing provided in an embodiment of the present invention. As shown in Fig. 7, when the same 3 channels of data are fed into the pooling module, the pooling module uses only 1 operator for the pooling operation; a FIFO memory is then needed to cache the data entering the pooling module, and each pooling pass releases only 16 bits of data for computation. At the same time, another FIFO memory is needed to cache the pooling results of the channels, and the complete 48-bit result is output only after the results of all 3 channels are finished.
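A minimal model of the Fig. 7 arrangement, under the stated assumptions (3 channels, one shared max operator, input and result FIFOs modeled as deques):

```python
from collections import deque

def tdm_max_pool(window_groups):
    """One shared operator serves all channels in turn: for each pixel,
    window_groups holds one pooling window per channel, e.g. 3 lists of
    4 values for 3 channels and a 2x2 window."""
    results = []
    for per_channel in window_groups:
        in_fifo = deque(per_channel)    # FIFO caching data entering the module
        out_fifo = deque()              # FIFO caching per-channel results
        while in_fifo:
            out_fifo.append(max(in_fifo.popleft()))   # one channel per pass
        results.append(tuple(out_fifo)) # output only when all channels are done
    return results

print(tdm_max_pool([[[1, 5, 2, 0], [7, 3, 3, 1], [4, 4, 9, 2]]]))  # [(5, 7, 9)]
```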
It should be noted that the pooling operation is much simpler than the convolution operation, so the resources used by the operators do not account for much, while the time savings can be considerable, especially in the case where the pooling module is directly connected to the convolution module as in Fig. 2: the pooling can finish almost at the same moment the convolution ends. Thus the parallel scheme shown in Fig. 6 is suitable for a channel-first input mode, i.e. the image data of each pixel are input in full across its channels before the input of the next pixel begins; the time-division multiplexing shown in Fig. 7 is suitable for environments where the timing requirements are not very strict, the number of channels is small, and resources are tight.
In addition, it should also be noted that in a convolutional neural network the number of data channels varies from layer to layer, so it is difficult to fix the number of operators with a single value. If the number of channels of a certain layer is greater than the number of operators, a certain strategy is needed to handle the data without increasing the number of operators. Accordingly, when multiple operators pool the image data of the channels contained in the convolution results in parallel: if the number of operators is greater than the number of channels, as many operators as there are channels directly pool the image data of the channels, obtaining the pooling result of each channel; if the number of operators is smaller than the number of channels, the operators must be multiplexed to pool the image data of the channels, obtaining the pooling result of each channel.
Although multiplexing operators to compute the data beyond the operator count is a common method, it consumes more clock cycles, so the data must be additionally partitioned in advance and logic designed to feed the data to the operators.
(1) For the case where the number of operators is smaller than the number of channels, whether the timing requirements can be met must be considered when multiplexing the operators. Suppose the number of available operators is t, the input pooling data width is w_s, the width of a single image datum is w_t, and the number of times each operator must be reused to compute the data of all channels once is n; then:
n = ⌈w_s / (t × w_t)⌉
If n is greater than 1, the data must be split by programming: a FIFO and a storage controller push the partitioned data into the operators step by step, and the obtained results are saved until the computation for all channels of the pixel is finished and then combined for output. The total pooling time then increases to more than n times. For a pipeline structure in which the pooling module is directly connected to the convolution module, it must be guaranteed that the pooling has finished before the convolution produces its next result, so as not to introduce delay. As shown in Fig. 8, conv1, conv2 and conv3 represent 3 convolution operations whose results are output when each convolution completes, and a, b, c and d, e, f respectively represent the time occupied by the operator performing 3 repeated pooling operations.
Suppose the time occupied by one convolution operation is t_c, the time occupied by one pooling operation is t_p, and the number of times the operator must be reused is n; then the following must be satisfied:
t_c > n × t_p
When this inequality is satisfied, reusing the operator will not introduce delay during pipeline operation.
Thus, under channel-first transmission with the pooling module and the convolution module operating in a pipeline structure, when there are not enough operators to compute all the data at once and the operation must be repeated several times, the time consumed by the convolution must exceed the total time of the repeated operator passes. Note that if a pipeline structure is not used, the time consumed by the convolution need not be considered.
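As a worked check with made-up numbers (16-bit image data, a 48-bit pooling input, a single operator, and assumed cycle counts):

```python
import math

# Made-up numbers for illustration: 16-bit image data (w_t), a 48-bit
# pooling input (w_s, i.e. 3 channels) and a single operator (t = 1).
t, w_s, w_t = 1, 48, 16
n = math.ceil(w_s / (t * w_t))   # n = 3 reuse passes per pixel
t_c, t_p = 100, 5                # assumed cycles per convolution / pooling pass
assert t_c > n * t_p             # 100 > 15: pooling hides inside the pipeline
```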
When the number of channels required by a certain layer in the CNN network is small and the pooling part has resources to spare, adjusting the weighting on the hardware may accelerate the pooling and thus reduce the time of the whole operation.
Correspondingly, for the case where the number of operators is greater than the number of channels, several times the channel count of operators can be used simultaneously to shorten the pooling time. Because the convolution time is usually longer than the pooling time, the pooling input in this case keeps waiting for the convolution to complete, so the pipeline structure is not considered. As shown in Fig. 3, the input of the pooling module can be obtained via the cache module. This accelerates the whole system in two respects: first, the pooling module can read data from the cache module in the most resource-saving order, and this data input mode reduces the cache resources the pooling module needs; second, obtaining at once a data volume that puts all operators to work accelerates the pooling.
It should be noted that the structure shown in Fig. 3, in which the cache module serves the convolution module and the pooling module at the same time, brings the waste of time and the program complexity of transmitting identical data. In practical applications, the frequency with which layers whose channel count is smaller than the number of pooling operators occur in the network should also be considered when deciding whether to add such acceleration logic to optimize the hardware. Under normal conditions, a specific network can be optimized in the ASIC design; for example, the overall design for a network with many channels such as Faster-RCNN, or with fewer channels such as MTCNN, may hand spare resources or clock cycles to other modules because of the complexity of the data or the computation.
It should also be noted that when the convolution module outputs the convolution results (the feature map) of the image to be processed, the convolution results can be input to the pooling module by row or by column. In one embodiment, if the data output by the convolution module are input to the pooling module by row, the pooling module needs to cache the convolution results of every row. Still taking Fig. 1 as an example, when the pooling window is (pooling stride = 2, pooling size = 3), intermediate results such as max(A, B, C) and max(C, D, E) need to be cached, and the number of intermediate results is related to the length of a row (i.e. the width of the image). Similarly, if the data output by the convolution module are input to the pooling module by column, the data to cache are related to the length of a column (i.e. the height of the image). Consequently, when the convolution results of the convolution module are not stored in the cache module but transferred directly to the pooling module, the shorter of the rows and columns of the convolution results (i.e. the width or the height of the feature map) can be chosen as the side fed into the pooling module, to save the size of the pooling module's internal cache.
Accordingly, the above S503 may specifically comprise the following steps: obtaining the row width and column width of the convolution results; if the row width is smaller than the column width, performing the pooling operation on the convolution results of the image to be processed by row; if the row width is greater than the column width, performing the pooling operation on the convolution results of the image to be processed by column.
Further, after the pooling operation is performed on the convolution results of the image to be processed by row or by column, the intermediate pooling result of each row or column and the combined result of each row or column with the next row or column can be further cached; the cached data are then read repeatedly in first-in-first-out order until the pooling result of the image to be processed is obtained.
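A sketch of this choice (illustrative: max pooling is assumed, and feeding "by column" is modeled by transposing the feature map so that the buffered line is always the shorter side):

```python
import numpy as np

def pool_shorter_side_first(fmap, size=2, stride=2):
    """Stream the feature map along its shorter side so the line buffer
    the pooling module needs is as small as possible."""
    transposed = fmap.shape[1] > fmap.shape[0]  # row width > column height?
    if transposed:
        fmap = fmap.T                           # feed column by column instead
    h, w = fmap.shape
    out = np.array([[fmap[r:r + size, c:c + size].max()
                     for c in range(0, w - size + 1, stride)]
                    for r in range(0, h - size + 1, stride)])
    return out.T if transposed else out
```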
For the case where the pooling module and the convolution module are directly connected in a pipeline structure, the stride and the size of the pooling window affect how the pooling module's own cache resources are used. The cache quantities needed for pooling under two pooling strides are described below, so as to reduce the data cache space as much as possible.
(1) The pooling window is (pooling stride = 1, pooling size = 3)
Assume the bit width of the image data is 16 bits and the data are fed into the pooling module channel-first. Before pooling, 3 registers of the same width as the input data (16 bits) need to be prepared to cache the received data, and 2 FIFO memories of depth (w_i - 1) (i.e. fifo_0 and fifo_1) to cache the temporary pooling values of the previous row; the 2 FIFO memories are thus necessary, each of depth (w_i - 1), where w_i denotes the width of the image. Fig. 9 is a schematic diagram of the process of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention. As shown in Fig. 9, the caching process is as follows:
1. Receive the 1st datum A of row 1 and store it in data_buf_0;
2. Receive the 2nd datum B of row 1 and store it in data_buf_1;
3. Receive the 3rd datum C of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_0, then store B in data_buf_0 and C in data_buf_1;
4. Receive the 4th datum D of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_0, then store C in data_buf_0 and D in data_buf_1;
5. Continue until the comparisons of row 1 are complete, then start reading the 1st datum H of row 2 and store it in data_buf_0;
6. Receive the 2nd datum I of row 2 and store it in data_buf_1;
7. Receive the 3rd datum J of row 2, store it in data_buf_2, pop 1 value from fifo_0, compare it with data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_1, and push the maximum of data_buf_0, data_buf_1 and data_buf_2 into fifo_0;
8. Repeat the above steps. As shown in Fig. 10, after the results of row 0 have been stored in fifo_0, the combined result of row 0 and row 1 is computed while the results of row 1 are being obtained, and that result is pushed into fifo_1; at the same time the results of row 1 are also pushed into fifo_0 for caching. When the results of row 2 are received, the values in fifo_1 are popped and compared with row 2, the results are stored into the cache module, and the row-2 results are pushed into fifo_0 in turn. The above steps are repeated so that, using the 2 FIFO memories, all pooling results are stored into the cache module, and a completion interrupt is issued.
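A software model of this two-FIFO scheme (stride 1, size 3, max pooling assumed; the FIFOs are modeled as deques, and each row's horizontal comparisons are collapsed into a list comprehension):

```python
from collections import deque

def maxpool_s1k3(img):
    """Line-buffered 3x3 max pooling with stride 1: fifo_0 caches the
    horizontal maxima of the previous row, fifo_1 the combined maxima
    of the previous two rows, mirroring Fig. 9/10."""
    h, w = len(img), len(img[0])
    fifo_0, fifo_1, out = deque(), deque(), []
    for r in range(h):
        row_max = [max(img[r][c], img[r][c + 1], img[r][c + 2])
                   for c in range(w - 2)]        # 3-wide horizontal maxima
        if r >= 2:                               # rows r-2..r complete a window
            out.append([max(fifo_1.popleft(), v) for v in row_max])
        if r >= 1:                               # combine previous row with this one
            fifo_1.extend(max(fifo_0.popleft(), v) for v in row_max)
        fifo_0.extend(row_max)                   # cache this row's maxima
    return out

print(maxpool_s1k3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [[9]]
```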
(2) The pooling window is (pooling stride = 2, pooling size = 3)
Before pooling, 3 registers of the same width as the input data (16 bits) need to be prepared to cache the received data, and 1 FIFO of depth ⌊(w_i - 1)/2⌋ is used to cache the temporary pooling values. Fig. 11 is a schematic diagram of the process of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention. As shown in Fig. 11, the caching process is as follows:
1. Receive the 1st datum A of row 1 and store it in data_buf_0;
2. Receive the 2nd datum B of row 1 and store it in data_buf_1;
3. Receive the 3rd datum C of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into the FIFO, then store C in data_buf_0;
4. Receive the 4th datum D of row 1 and store it in data_buf_1; when the 5th datum E of row 1 arrives, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into the FIFO, then store E in data_buf_0;
5. Repeat the above steps. As shown in Fig. 12, the results of row 0 are first stored into the FIFO; when the results of row 1 come out, values are popped and compared with the row-1 results, and the larger values are pushed back into the FIFO in turn. When the results of row 2 come out, values are popped again and the larger values are stored into the SRAM, while the row-2 data are pushed into the FIFO. This is repeated until all results are stored in the SRAM.
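Correspondingly, a software model of this single-FIFO scheme (stride 2, size 3, max pooling assumed; complete windows only):

```python
from collections import deque

def maxpool_s2k3(img):
    """3x3 max pooling with stride 2 using one FIFO of depth roughly
    (w_i - 1) // 2, mirroring Fig. 11/12; handles complete windows only."""
    h, w = len(img), len(img[0])
    fifo, out = deque(), []
    for r in range(h):
        row_max = [max(img[r][c], img[r][c + 1], img[r][c + 2])
                   for c in range(0, w - 2, 2)]   # stride-2 horizontal maxima
        if r == 0:
            fifo.extend(row_max)                  # row 0: just cache
        elif r % 2 == 1:                          # middle row of a window:
            fifo.extend(max(fifo.popleft(), v) for v in row_max)
        else:                                     # last row of a window:
            out.append([max(fifo.popleft(), v) for v in row_max])
            fifo.extend(row_max)                  # this row also starts the next window
    return out

print(maxpool_s2k3([[r * 5 + c for c in range(5)] for r in range(5)]))
# [[12, 14], [22, 24]]
```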
It should be noted that for the channel-first case (pooling multi-channel image data with multiple operators), in the embodiments of the invention the intermediate results are stored in FIFOs as much as possible, and the clock cycles for repeatedly reading the FIFOs are minimized, so as to save as many resources as possible.
In addition, it should be noted that in a convolutional neural network the pooling mode changes with the network; the common ones are max pooling, average pooling and stochastic pooling. In the pooling structure proposed here, only the algorithm of the operator needs to be changed to adapt to different pooling modes. For max pooling, only the current maximum needs to be kept in the pooling cache and continuously updated within the pooling range; average pooling requires each stored datum to be weighted by its share of the pooling range and the weighted values to be accumulated continuously to obtain the final average; stochastic pooling requires a random vector generated before each read, and records the value at the location the random vector points to. The pooling mode is usually not the bottleneck of the pooling module design, and the module in the hardware implementation does not change on a large scale because the pooling mode changes; changing the computing behavior of the operator according to the pooling mode of the network can adapt to most pooling scenarios.
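A sketch of this operator swap (hypothetical operator signatures; each operator receives the values of one pooling window):

```python
import random

def max_op(window):
    return max(window)                     # keep only the running maximum

def avg_op(window):
    share = 1.0 / len(window)              # each datum's share of the range
    return sum(v * share for v in window)  # accumulate weighted values

def stochastic_op(window):
    return random.choice(window)           # value at a randomly chosen location

def pool(windows, op):
    return [op(w) for w in windows]        # same structure, different operator

print(pool([[1, 5, 2, 0]], max_op), pool([[1, 5, 2, 0]], avg_op))
```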
An embodiment of the present invention also provides a computer device, to solve the technical problem that in existing convolutional neural networks the pooling layer pools the convolution results output by the convolutional layer directly, which makes the data cache space large. The computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor executes any optional or preferred image processing method based on a convolutional neural network from the above method embodiments.
An embodiment of the present invention also provides a computer-readable storage medium, to solve the same technical problem; the computer-readable storage medium stores a computer program for executing any optional or preferred image processing method based on a convolutional neural network from the above method embodiments.
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system or a computer program product. The invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical memory) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the invention in detail. It should be understood that the above are only specific embodiments of the invention and are not intended to limit the protection scope of the invention; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall be included in the protection scope of the invention.

Claims (10)

1. An image processing method based on a convolutional neural network, characterized by comprising:
obtaining convolution results of an image to be processed;
caching the convolution results of the image to be processed in convolution order;
reading the cached convolution results in pooling order;
performing a pooling operation on the read convolution results to obtain a pooling result of the image to be processed.
2. The method according to claim 1, characterized in that the image to be processed is a multi-channel image and the convolution results contain image data of multiple channels, wherein performing the pooling operation on the read convolution results to obtain the pooling result of the image to be processed comprises:
obtaining the image data of each channel contained in the convolution results;
performing the pooling operation on the image data of each channel contained in the convolution results to obtain a pooling result of each channel;
merging the pooling results of all channels to obtain the pooling result of the image to be processed.
3. The method according to claim 2, characterized in that performing the pooling operation on the image data of each channel contained in the convolution results to obtain the pooling result of each channel comprises either of the following:
based on time-division multiplexing, performing the pooling operation with one operator on the image data of each channel contained in the convolution results to obtain the pooling result of each channel;
based on parallelism, performing the pooling operation with multiple operators on the image data of the channels contained in the convolution results to obtain the pooling result of each channel.
4. The method according to claim 3, characterized in that, based on time-division multiplexing, performing the pooling operation with one operator on the image data of each channel contained in the convolution results to obtain the pooling result of each channel comprises:
caching the image data of each channel contained in the convolution results;
reading the cached image data of each channel in first-in-first-out order;
performing the pooling operation on the image data of each channel as it is read;
caching the pooling result of each channel;
outputting the pooling results of the channels in first-in-first-out order.
5. The method according to claim 3, characterized in that, based on parallelism, performing the pooling operation with multiple operators on the image data of the channels contained in the convolution results to obtain the pooling result of each channel comprises:
if the number of operators is greater than the number of channels, directly performing the pooling operation with as many operators as there are channels on the image data of the channels contained in the convolution results, to obtain the pooling result of each channel;
if the number of operators is smaller than the number of channels, performing the pooling operation on the image data of the channels contained in the convolution results by multiplexing the operators, to obtain the pooling result of each channel.
6. An image processing method based on a convolutional neural network, characterized by comprising:
determining a convolution order according to a pooling order;
outputting convolution results of an image to be processed in the convolution order;
performing a pooling operation on the convolution results of the image to be processed to obtain a pooling result;
caching the pooling result, and determining the pooling order according to the pooling result.
7. The method according to claim 6, characterized in that performing the pooling operation on the convolution results of the image to be processed to obtain the pooling result comprises:
obtaining the row width and column width of the convolution results;
if the row width is smaller than the column width, performing the pooling operation on the convolution results of the image to be processed by row;
if the row width is greater than the column width, performing the pooling operation on the convolution results of the image to be processed by column.
8. The method according to claim 7, characterized in that, after performing the pooling operation on the convolution results of the image to be processed by row or by column, the method further comprises:
caching the intermediate pooling result of each row or column and the combined result of each row or column with the next row or column;
reading the cached data repeatedly in first-in-first-out order until the pooling result of the image to be processed is obtained.
9. An image processing device based on a convolutional neural network, characterized by comprising:
a convolution module for outputting convolution results of an image to be processed;
a cache module, connected to the convolution module, for caching the convolution results output by the convolution module in the convolution order of the convolution module;
a pooling module, connected to the cache module, for reading the cached convolution results in the pooling order of the pooling module, and performing a pooling operation on the read convolution results to obtain a pooling result of the image to be processed.
10. An image processing device based on a convolutional neural network, characterized by comprising:
a convolution module for determining a convolution order according to a pooling order, and outputting convolution results of an image to be processed in the convolution order;
a pooling module, connected to the convolution module, for performing a pooling operation on the convolution results output by the convolution module to obtain a pooling result;
a cache module, connected to both the pooling module and the convolution module, for caching the pooling result output by the pooling module, and determining the pooling order of the pooling module according to the pooling result of the pooling module.
CN201910480468.8A 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network Active CN110276444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480468.8A CN110276444B (en) 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN110276444A true CN110276444A (en) 2019-09-24
CN110276444B CN110276444B (en) 2021-05-07

Family

ID=67961966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480468.8A Active CN110276444B (en) 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110276444B (en)



Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107836001A (en) * 2015-06-29 2018-03-23 微软技术许可有限责任公司 Convolutional neural networks on hardware accelerator
WO2017044214A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
US20170294010A1 (en) * 2016-04-12 2017-10-12 Adobe Systems Incorporated Utilizing deep learning for rating aesthetics of digital images
US20180137407A1 (en) * 2016-11-14 2018-05-17 Kneron, Inc. Convolution operation device and convolution operation method
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution accelerates and computation processing method, device, electronic equipment and storage medium
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN107704923A (en) * 2017-10-19 2018-02-16 珠海格力电器股份有限公司 Convolutional neural networks computing circuit
CN108805267A (en) * 2018-05-28 2018-11-13 重庆大学 The data processing method hardware-accelerated for convolutional neural networks
CN108805274A (en) * 2018-05-28 2018-11-13 重庆大学 The hardware-accelerated method and system of Tiny-yolo convolutional neural networks based on FPGA
CN109344878A (en) * 2018-09-06 2019-02-15 北京航空航天大学 A kind of imitative hawk brain feature integration Small object recognition methods based on ResNet
CN109447241A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Anthony G. Scanlan, "Low power & mobile hardware accelerators for deep convolutional neural networks", Integration *
Changxi Liu et al., "T1000: Mitigating the memory footprint of convolution neural networks with decomposition and re-fusion", Future Generation Computer Systems *
He Kaixuan et al., "Hardware architecture design of convolutional neural network based on FPGA dynamic reconfiguration", Information Technology and Network Security (in Chinese) *
Lu Liqiang et al., "FPGA design for convolutional neural networks", Scientia Sinica Informationis (in Chinese) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784952A (en) * 2019-11-04 2021-05-11 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN112784952B (en) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN111105015A (en) * 2019-12-06 2020-05-05 浪潮(北京)电子信息产业有限公司 General CNN reasoning accelerator, control method thereof and readable storage medium
CN111027682A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Neural network processor, electronic device and data processing method
CN111340224A (en) * 2020-02-27 2020-06-26 杭州雄迈集成电路技术股份有限公司 Accelerated design method of CNN network suitable for low-resource embedded chip
CN111340224B (en) * 2020-02-27 2023-11-21 浙江芯劢微电子股份有限公司 Accelerated design method of CNN (computer network) suitable for low-resource embedded chip
CN111445420A (en) * 2020-04-09 2020-07-24 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN111445420B (en) * 2020-04-09 2023-06-06 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN117391149A (en) * 2023-11-30 2024-01-12 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data
CN117391149B (en) * 2023-11-30 2024-03-26 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data

Also Published As

Publication number Publication date
CN110276444B (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventors after: Zhou Fangkun, OuYang Peng, Li Xiudong, Wang Bo
Inventors before: Zhou Fangkun, OuYang Peng, Yin Shouyi, Li Xiudong, Wang Bo