CN110276444A - Image processing method and device based on convolutional neural networks - Google Patents

Image processing method and device based on convolutional neural networks

Info

Publication number
CN110276444A
Authority
CN
China
Prior art keywords
pooling
result
convolution operation
module
convolution
Prior art date
Legal status
Granted
Application number
CN201910480468.8A
Other languages
Chinese (zh)
Other versions
CN110276444B (en)
Inventor
Zhou Fangkun
OuYang Peng
Yin Shouyi
Li Xiudong
Wang Bo
Current Assignee
Beijing Qingwei Intelligent Technology Co Ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co Ltd filed Critical Beijing Qingwei Intelligent Technology Co Ltd
Priority to CN201910480468.8A
Publication of CN110276444A
Application granted
Publication of CN110276444B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention discloses an image processing method and device based on a convolutional neural network. The method comprises: obtaining the convolution results of an image to be processed; caching the convolution results of the image to be processed in convolution order; reading the cached convolution results in pooling order; and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed. The invention can substantially reduce the data cache space of the pooling module and improve resource utilization.

Description

Image processing method and device based on convolutional neural networks
Technical field
The present invention relates to the field of image processing, and in particular to an image processing method and device based on a convolutional neural network.
Background art
This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description here is not admitted to be prior art merely by its inclusion in this section.
A convolutional neural network (CNN) is one of the representative algorithms of deep learning. In a convolutional neural network, a pooling layer is often added after a convolutional layer. The convolutional layer extracts features from the input image and produces feature maps. The pooling layer compresses and further extracts the feature maps output by the convolutional layer: on the one hand it makes the feature maps smaller and simplifies the computational complexity of the network; on the other hand it compresses the features and extracts the main ones.
It should be noted that the hardware part of the pooling module realizing the pooling function consists of a data cache module and a data processing module, and the working mode of the module depends on how the image data are input. Fig. 1 is a schematic diagram of pooling an image as provided by the prior art. As shown in Fig. 1, suppose the pooling window applied to the image data is (pooling stride = 2, pooling size = 2), where pooling stride denotes the step size of the pooling window and pooling size denotes the size of the pooling window. Without considering the number of channels, performing max pooling on the image shown in Fig. 1 is equivalent to computing max(A, B, H, I), max(C, D, J, K), and so on.
As shown in Fig. 1, when the input order of the image data is (A, B, H, I, C, D, J, K, ...), the required pooling logic is very simple (each arriving datum is compared with the previous ones, and a maximum is taken every four values). In this case the capacity of the data cache module only needs the space of one datum, and the pooling function is also relatively easy to realize in hardware. When the input order of the image data is (A, B, C, D, E, F, G, H, ...), i.e. each row of the real image is input in full before the next row of data begins, the maximum of (A, B) must be cached, and the data cache space it occupies can only be released after the maximum of (H, I) has been computed. The data cache space needed by the pooling module thus depends on the size of a row of the real image. Similarly, when the input order of the image data is (A, H, O, V, ..., B, I, P, W, ...), the data cache space needed by the pooling module depends on the size of a column of the real image.
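To make the dependence on arrival order concrete, the following is a minimal Python sketch (not part of the patent; the function names and the 4x4 test image are illustrative) contrasting the two input orders for a 2x2 max pooling window: window order needs only one running register, while row-major order must keep one partial maximum per output column cached until the next row arrives.

```python
import numpy as np

def maxpool_window_order(stream):
    """Data arrive window by window (A, B, H, I, C, D, J, K, ...):
    a single running register is enough."""
    out, acc, count = [], None, 0
    for v in stream:
        acc = v if acc is None else max(acc, v)  # compare with earlier values
        count += 1
        if count == 4:                           # one 2x2 window is complete
            out.append(acc)
            acc, count = None, 0
    return out

def maxpool_row_order(img):
    """Data arrive row by row (A, B, C, D, ...): one partial maximum per
    output column must stay cached until the next row has been received."""
    h, w = img.shape
    out = []
    for r in range(0, h, 2):
        row_buf = [max(img[r, c], img[r, c + 1]) for c in range(0, w, 2)]
        # row_buf is the cache whose size grows with the image width
        out.extend(max(row_buf[i], img[r + 1, c], img[r + 1, c + 1])
                   for i, c in enumerate(range(0, w, 2)))
    return out

img = np.arange(16.0).reshape(4, 4)
window_stream = [img[r + dr, c + dc]
                 for r in range(0, 4, 2) for c in range(0, 4, 2)
                 for dr in (0, 1) for dc in (0, 1)]
assert maxpool_window_order(window_stream) == maxpool_row_order(img)
```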
From the above analysis, the data cache capacity (i.e. the size of the cache space) with which the pooling module realizes pooling depends on the order in which the output data of the convolution module arrive. Since hardware resources are limited (e.g. in an ASIC or FPGA), a pooling module design tends toward higher resource utilization; how to optimize the data input mode of the pooling module so as to reduce the data cache space is therefore a problem to be solved urgently.
Summary of the invention
An embodiment of the present invention provides an image processing method based on a convolutional neural network, to solve the technical problem that in existing convolutional neural networks the pooling layer performs the pooling operation directly on the convolution results output by the convolutional layer, which makes the data cache space large. The method comprises: obtaining the convolution results of an image to be processed; caching the convolution results of the image to be processed in convolution order; reading the cached convolution results in pooling order; and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
An embodiment of the present invention also provides an image processing method based on a convolutional neural network, to solve the same technical problem. The method comprises: determining the convolution order according to the pooling order; outputting the convolution results of the image to be processed in the convolution order; performing a pooling operation on the convolution results of the image to be processed to obtain a pooling result; and caching the pooling result and determining the pooling order according to the pooling result.
An embodiment of the present invention also provides an image processing device based on a convolutional neural network, to solve the same technical problem. The device comprises: a convolution module for outputting the convolution results of an image to be processed; a cache module, connected to the convolution module, for caching the convolution results output by the convolution module in the convolution order of the convolution module; and a pooling module, connected to the cache module, for reading the cached convolution results in the pooling order of the pooling module and performing a pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
An embodiment of the present invention also provides another image processing device based on a convolutional neural network, to solve the same technical problem. The device comprises: a convolution module for determining the convolution order according to the pooling order and outputting the convolution results of the image to be processed in the convolution order; a pooling module, connected to the convolution module, for performing a pooling operation on the convolution results output by the convolution module to obtain a pooling result; and a cache module, connected to both the pooling module and the convolution module, for caching the pooling result output by the pooling module and determining the pooling order of the pooling module according to the pooling result of the pooling module.
In the embodiments of the present invention, the convolution results computed by the convolution module for the image to be processed are cached in the convolution order of the convolution module, and the pooling module reads the corresponding convolution results from the cache in pooling order, so that the pooling module can perform the pooling operation in the pooling order and obtain the pooling result of the image to be processed. The embodiments of the present invention thereby greatly reduce the data cache space of the pooling module and improve resource utilization.
Brief description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic diagram of pooling an image as provided by the prior art;
Fig. 2 is a schematic diagram of an image processing device with a pipeline structure provided in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an image processing device with a non-pipeline structure provided in an embodiment of the present invention;
Fig. 4 is a flowchart of an image processing method based on a convolutional neural network provided in an embodiment of the present invention;
Fig. 5 is a flowchart of another image processing method based on a convolutional neural network provided in an embodiment of the present invention;
Fig. 6 is a schematic diagram of processing multi-channel image data in parallel provided in an embodiment of the present invention;
Fig. 7 is a schematic diagram of processing multi-channel image data by time-division multiplexing provided in an embodiment of the present invention;
Fig. 8 is a timing diagram provided in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention;
Fig. 10 is a schematic diagram of the result of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention;
Fig. 11 is a schematic diagram of the process of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention;
Fig. 12 is a schematic diagram of the result of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings. Here, the exemplary embodiments of the invention and their descriptions are used to explain the invention but are not intended to limit it.
As noted in the background section, in existing convolutional neural networks the pooling layer is connected after the convolutional layer and performs the pooling operation directly on the convolution results output by the convolutional layer. Once the convolution is finished, the time consumed by the pooling layer is negligible. The pipeline structure formed by the convolutional layer and the pooling layer therefore has the advantage that no extra clock cycles are consumed beyond those of the convolution, saving total clock cycles; its disadvantage is that the pooling layer must match the output data of the convolutional layer and reserve cache resources accordingly, so the data cache space is large.
To solve the above problem, an embodiment of the invention provides the image processing device with a pipeline structure shown in Fig. 2. As shown in Fig. 2, the device comprises a convolution module 21, a pooling module 22 and a cache module 23, where the pooling module 22 is connected to the convolution module 21, and the cache module 23 is connected to both the pooling module 22 and the convolution module 21.
Specifically, the convolution module 21 determines its convolution order according to the pooling order obtained from the cache module 23 and outputs the convolution results of the image to be processed in that convolution order; the pooling module 22 performs the pooling operation directly on the convolution results output by the convolution module 21 to obtain the pooling result of the image to be processed. The cache module 23 caches the pooling result output by the pooling module 22 and determines the pooling order of the pooling module 22 according to the pooling result.
It should be noted that the pipeline-structured image processing device shown in Fig. 2 follows a pooling-first principle: the output of the convolution module is adjusted according to the data needed by the pooling module. With this structure, the input data are produced in pooling order (i.e. the distribution of convolution operations is guided by the pooling result, and the convolution results are computed in the order the pooling needs them), which not only saves cache capacity and reduces data cache space, but also, because a pipeline is used between the convolution module and the pooling module, saves the total clock cycles of the data processing. However, this structure imposes stricter ordering requirements on how the convolution module reads the raw data: the convolution module needs a more complex structure and advance configuration to cooperate with the pooling module. In addition, reading the image data in the cache module involves more complicated address calculation, which may increase the read time and the required data cache space.
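As a rough illustration of this pooling-first principle (a sketch under assumed coordinates, not the patent's actual configuration logic), the convolution output order can be derived directly from the pooling window traversal:

```python
def pooling_first_schedule(h, w, size=2, stride=2):
    """Emit the (row, col) order in which the convolution module should
    produce outputs so that each pooling window completes as early as
    possible and the pooling cache stays one value deep."""
    order = []
    for pr in range(0, h - size + 1, stride):     # pooling windows, row-major
        for pc in range(0, w - size + 1, stride):
            for dr in range(size):                # the pixels of one window
                for dc in range(size):
                    order.append((pr + dr, pc + dc))
    return order

# For a 4x4 feature map this yields (0,0), (0,1), (1,0), (1,1), (0,2), ...
# which is exactly the (A, B, H, I, C, D, ...) order of Fig. 1.
print(pooling_first_schedule(4, 4)[:8])
```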
As a preferred scheme, an embodiment of the invention therefore also provides the image processing device with a non-pipeline structure shown in Fig. 3. As shown in Fig. 3, the device likewise comprises a convolution module 21, a pooling module 22 and a cache module 23, where the convolution module 21 and the pooling module 22 are each connected to the cache module 23.
As can be seen from Fig. 3, because the cache module 23 sits between the convolution module 21 and the pooling module 22, the convolution module 21 does not need to match the output of the pooling module 22: it simply caches the convolution results of the image to be processed into the cache module 23 in convolution order. Likewise, the pooling module 22 does not need to match the output of the convolution module 21: it reads the cached convolution results from the cache module 23 on demand, in pooling order, and performs the pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
It should be noted that the non-pipeline-structured image processing device shown in Fig. 3 allows the pooling module to fetch data from the cache module conveniently and precisely according to its own needs, which not only saves its own cache resources but also allows the access mode to be adjusted at any time for different networks, giving greater flexibility. Its disadvantage is that, compared with the pipeline-structured image processing device shown in Fig. 2, it wastes more clock cycles, and the programming of the pooling module becomes more complex.
As an optional embodiment, the cache module 23 shown in Fig. 2 and Fig. 3 may use static random-access memory (SRAM); with this kind of memory, the stored data remain unchanged as long as power is maintained.
It should be noted that after analyzing the spare data volume and clock budget in the network, a design better suited to the ASIC or FPGA can be adopted. In practical ASIC or FPGA applications, the resources and time consumed by the convolution operation are far greater than those of the pooling operation; when a pipeline structure is used to save clock cycles without changing the convolution output, how to change the "adaptability" of the pooling to the data is the key point for improving performance.
An embodiment of the invention also provides an image processing method based on a convolutional neural network, which can be applied to, but is not limited to, the non-pipeline-structured image processing device shown in Fig. 3.
Fig. 4 is a flowchart of an image processing method based on a convolutional neural network provided in an embodiment of the present invention. As shown in Fig. 4, the method comprises the following steps:
S401: obtain the convolution results of the image to be processed;
S402: cache the convolution results of the image to be processed in convolution order;
S403: read the cached convolution results in pooling order;
S404: perform the pooling operation on the read convolution results to obtain the pooling result of the image to be processed.
With the scheme provided by S401 to S404, the convolution results computed for the image to be processed are cached in the convolution order of the convolution module, and the pooling module reads the corresponding convolution results from the cache in pooling order, so that the pooling module can perform the pooling operation in the pooling order and obtain the pooling result of the image to be processed. This greatly reduces the data cache space of the pooling module and improves resource utilization.
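The following is a minimal software model of S401 to S404 (illustrative only: the cache module is modeled as a Python dict, a 2x2 max pooling window is assumed, and `process` is a hypothetical name):

```python
import numpy as np

def process(conv_stream, h, w, size=2, stride=2):
    """S401/S402: cache conv results as they arrive in convolution
    (row-major) order; S403/S404: read back in pooling order and pool."""
    cache = {}
    for (r, c), v in conv_stream:        # convolution-order arrival
        cache[(r, c)] = v                # cache module modeled as a dict
    pooled = []
    for pr in range(0, h - size + 1, stride):    # pooling-order reads
        for pc in range(0, w - size + 1, stride):
            window = [cache[(pr + dr, pc + dc)]
                      for dr in range(size) for dc in range(size)]
            pooled.append(float(max(window)))    # max pooling as the example mode
    return pooled

conv = np.arange(16.0).reshape(4, 4)     # stand-in convolution output
stream = [((r, c), conv[r, c]) for r in range(4) for c in range(4)]
print(process(stream, 4, 4))             # [5.0, 7.0, 13.0, 15.0]
```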
An embodiment of the invention also provides an image processing method based on a convolutional neural network, which can be applied to, but is not limited to, the pipeline-structured image processing device shown in Fig. 2.
Fig. 5 is a flowchart of another image processing method based on a convolutional neural network provided in an embodiment of the present invention. As shown in Fig. 5, the method comprises the following steps:
S501: determine the convolution order according to the pooling order;
S502: output the convolution results of the image to be processed in the convolution order;
S503: perform the pooling operation on the convolution results of the image to be processed to obtain the pooling result;
S504: cache the pooling result, and determine the pooling order according to the pooling result.
With the scheme provided by S501 to S504, the convolution order of the convolution module is determined according to the pooling order of the pooling module, so that the convolution module outputs the convolution results of the image to be processed in a way that matches the pooling module. The pooling module can then pool the convolution results output by the convolution module in real time and obtain the corresponding pooling result; the pooling result of the pooling module is cached into the cache module, from which the cache module determines the pooling order of the next pooling operation according to the pooling result.
It should be noted that when the image to be processed in the embodiments of the invention is a multi-channel image, the convolution results of the image contain image data of multiple channels. Pooling multi-channel image data only requires replicating enough pooling modules to process the image data of each channel in the convolution results, and finally combining the pooling results of all channels into the output. From this it can also be seen that when pooling is realized in hardware, the data cache capacity depends on the order in which the upper-layer results arrive, the complexity of the data processing depends on how the data are cached, and the resources occupied by the whole module also depend on the number of channels.
Therefore, in S404 of Fig. 4 or in S503 of Fig. 5, as an optional embodiment, pooling the convolution results of multi-channel image data may specifically comprise the following steps: obtaining the image data of each channel contained in the convolution results; pooling the image data of each channel contained in the convolution results to obtain the pooling result of each channel; and merging the pooling results of all channels to obtain the pooling result of the image to be processed.
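A sketch of these three steps, assuming a (C, H, W) layout and reusing any single-channel pooling routine (such as the hypothetical `process` sketch above) as the per-channel operator:

```python
import numpy as np

def pool_multichannel(conv_result, pool_fn):
    """Split the conv result by channel, pool each channel independently,
    then merge the per-channel pooling results back together."""
    per_channel = [pool_fn(conv_result[ch])       # pool one channel
                   for ch in range(conv_result.shape[0])]
    return np.stack(per_channel)                  # merge all channels
```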
Optionally, pooling the image data of each channel contained in the convolution results to obtain the pooling result of each channel can be realized in either of the following two ways:
The first way is based on time-division multiplexing: one operator pools the image data of each channel contained in the convolution results in turn, obtaining the pooling result of each channel.
Specifically, this way may comprise: caching the image data of each channel contained in the convolution results; reading the cached image data of each channel in first-in-first-out order; pooling the image data of each channel as it is read; caching the pooling result of each channel; and outputting the pooling results of the channels in first-in-first-out order.
The second way is based on parallelism: multiple operators pool the image data of the channels contained in the convolution results simultaneously, obtaining the pooling result of each channel.
Specifically, if the number of operators is greater than the number of channels, as many operators as there are channels directly pool the image data of the channels contained in the convolution results, obtaining the pooling result of each channel; if the number of operators is smaller than the number of channels, the operators are multiplexed to pool the image data of the channels contained in the convolution results, obtaining the pooling result of each channel.
In a convolutional neural network, each convolutional layer computes feature-map data for different channels according to the network configuration and the corresponding weight data. In a hardware implementation, this multi-channel operation can be completed by adding operators or by reusing an operator. Fig. 6 is a schematic diagram of processing multi-channel image data in parallel provided in an embodiment of the present invention. As shown in Fig. 6, assume the width of a single image datum in the network is 16 bits. When the valid data entering the pooling stage are 48 bits wide (i.e. 3 channels), the pooling module uses 3 identical ALU computing modules (also called operators) to pool the image data of each channel separately, and after the results are computed, integrates them into a 48-bit pooling result for output. This way needs no extra programming, but when the number of channels exceeds the number of preset operators, other strategies are still needed.
Because a design with many duplicated operators can appear bloated in large-scale ASIC or FPGA applications, shrinking the number of operators to match the network more flexibly is one direction for optimizing resources. Fig. 7 is a schematic diagram of processing multi-channel image data by time-division multiplexing provided in an embodiment of the present invention. As shown in Fig. 7, when the same 3 channels of data are fed into the pooling module, the pooling module uses only 1 operator for the pooling operation; a FIFO memory is then needed to cache the data entering the pooling module, and each pooling pass releases only 16 bits of data for computation. At the same time, another FIFO memory is needed to cache the pooling results of the channels, and the complete 48-bit result is output only after the results of all 3 channels are finished.
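A minimal model of the Fig. 7 arrangement, under the stated assumptions (3 channels, one shared max operator, input and result FIFOs modeled as deques):

```python
from collections import deque

def tdm_max_pool(window_groups):
    """One shared operator serves all channels in turn: for each pixel,
    window_groups holds one pooling window per channel, e.g. 3 lists of
    4 values for 3 channels and a 2x2 window."""
    results = []
    for per_channel in window_groups:
        in_fifo = deque(per_channel)    # FIFO caching data entering the module
        out_fifo = deque()              # FIFO caching per-channel results
        while in_fifo:
            out_fifo.append(max(in_fifo.popleft()))   # one channel per pass
        results.append(tuple(out_fifo)) # output only when all channels are done
    return results

print(tdm_max_pool([[[1, 5, 2, 0], [7, 3, 3, 1], [4, 4, 9, 2]]]))  # [(5, 7, 9)]
```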
It should be noted that the pooling operation is much simpler than the convolution operation, so the resources used by the operators do not account for much, while the time savings can be considerable, especially in the case where the pooling module is directly connected to the convolution module as in Fig. 2: the pooling can finish almost at the same moment the convolution ends. Thus the parallel scheme shown in Fig. 6 is suitable for a channel-first input mode, i.e. the image data of each pixel are input in full across its channels before the input of the next pixel begins; the time-division multiplexing shown in Fig. 7 is suitable for environments where the timing requirements are not very strict, the number of channels is small, and resources are tight.
In addition, it should also be noted that in a convolutional neural network the number of data channels varies from layer to layer, so it is difficult to fix the number of operators with a single value. If the number of channels of a certain layer is greater than the number of operators, a certain strategy is needed to handle the data without increasing the number of operators. Accordingly, when multiple operators pool the image data of the channels contained in the convolution results in parallel: if the number of operators is greater than the number of channels, as many operators as there are channels directly pool the image data of the channels, obtaining the pooling result of each channel; if the number of operators is smaller than the number of channels, the operators must be multiplexed to pool the image data of the channels, obtaining the pooling result of each channel.
Although multiplexing operators to compute the data beyond the operator count is a common method, it consumes more clock cycles, so the data must be additionally partitioned in advance and logic designed to feed the data to the operators.
(1) For the case where the number of operators is smaller than the number of channels, whether the timing requirements can be met must be considered when multiplexing the operators. Suppose the number of available operators is t, the input pooling data width is w_s, the width of a single image datum is w_t, and the number of times each operator must be reused to compute the data of all channels once is n; then:
n = ⌈w_s / (t × w_t)⌉
If n is greater than 1, the data must be split by programming: a FIFO and a storage controller push the partitioned data into the operators step by step, and the obtained results are saved until the computation for all channels of the pixel is finished and then combined for output. The total pooling time then increases to more than n times. For a pipeline structure in which the pooling module is directly connected to the convolution module, it must be guaranteed that the pooling has finished before the convolution produces its next result, so as not to introduce delay. As shown in Fig. 8, conv1, conv2 and conv3 represent 3 convolution operations whose results are output when each convolution completes, and a, b, c and d, e, f respectively represent the time occupied by the operator performing 3 repeated pooling operations.
Suppose the time occupied by one convolution operation is t_c, the time occupied by one pooling operation is t_p, and the number of times the operator must be reused is n; then the following must be satisfied:
t_c > n × t_p
When this inequality is satisfied, reusing the operator will not introduce delay during pipeline operation.
Thus, under channel-first transmission with the pooling module and the convolution module operating in a pipeline structure, when there are not enough operators to compute all the data at once and the operation must be repeated several times, the time consumed by the convolution must exceed the total time of the repeated operator passes. Note that if a pipeline structure is not used, the time consumed by the convolution need not be considered.
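As a worked check with made-up numbers (16-bit image data, a 48-bit pooling input, a single operator, and assumed cycle counts):

```python
import math

# Made-up numbers for illustration: 16-bit image data (w_t), a 48-bit
# pooling input (w_s, i.e. 3 channels) and a single operator (t = 1).
t, w_s, w_t = 1, 48, 16
n = math.ceil(w_s / (t * w_t))   # n = 3 reuse passes per pixel
t_c, t_p = 100, 5                # assumed cycles per convolution / pooling pass
assert t_c > n * t_p             # 100 > 15: pooling hides inside the pipeline
```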
When the number of channels required by a certain layer in the CNN network is small and the pooling part has resources to spare, adjusting the weighting on the hardware may accelerate the pooling and thus reduce the time of the whole operation.
Correspondingly, for the case where the number of operators is greater than the number of channels, several times the channel count of operators can be used simultaneously to shorten the pooling time. Because the convolution time is usually longer than the pooling time, the pooling input in this case keeps waiting for the convolution to complete, so the pipeline structure is not considered. As shown in Fig. 3, the input of the pooling module can be obtained via the cache module. This accelerates the whole system in two respects: first, the pooling module can read data from the cache module in the most resource-saving order, and this data input mode reduces the cache resources the pooling module needs; second, obtaining at once a data volume that puts all operators to work accelerates the pooling.
It should be noted that the structure shown in Fig. 3, in which the cache module serves the convolution module and the pooling module at the same time, brings the waste of time and the program complexity of transmitting identical data. In practical applications, the frequency with which layers whose channel count is smaller than the number of pooling operators occur in the network should also be considered when deciding whether to add such acceleration logic to optimize the hardware. Under normal conditions, a specific network can be optimized in the ASIC design; for example, the overall design for a network with many channels such as Faster-RCNN, or with fewer channels such as MTCNN, may hand spare resources or clock cycles to other modules because of the complexity of the data or the computation.
It should also be noted that when the convolution module outputs the convolution results (the feature map) of the image to be processed, the convolution results can be input to the pooling module by row or by column. In one embodiment, if the data output by the convolution module are input to the pooling module by row, the pooling module needs to cache the convolution results of every row. Still taking Fig. 1 as an example, when the pooling window is (pooling stride = 2, pooling size = 3), intermediate results such as max(A, B, C) and max(C, D, E) need to be cached, and the number of intermediate results is related to the length of a row (i.e. the width of the image). Similarly, if the data output by the convolution module are input to the pooling module by column, the data to cache are related to the length of a column (i.e. the height of the image). Consequently, when the convolution results of the convolution module are not stored in the cache module but transferred directly to the pooling module, the shorter of the rows and columns of the convolution results (i.e. the width or the height of the feature map) can be chosen as the side fed into the pooling module, to save the size of the pooling module's internal cache.
Accordingly, the above S503 may specifically comprise the following steps: obtaining the row width and column width of the convolution results; if the row width is smaller than the column width, performing the pooling operation on the convolution results of the image to be processed by row; if the row width is greater than the column width, performing the pooling operation on the convolution results of the image to be processed by column.
Further, after the pooling operation is performed on the convolution results of the image to be processed by row or by column, the intermediate pooling result of each row or column and the combined result of each row or column with the next row or column can be further cached; the cached data are then read repeatedly in first-in-first-out order until the pooling result of the image to be processed is obtained.
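A sketch of this choice (illustrative: max pooling is assumed, and feeding "by column" is modeled by transposing the feature map so that the buffered line is always the shorter side):

```python
import numpy as np

def pool_shorter_side_first(fmap, size=2, stride=2):
    """Stream the feature map along its shorter side so the line buffer
    the pooling module needs is as small as possible."""
    transposed = fmap.shape[1] > fmap.shape[0]  # row width > column height?
    if transposed:
        fmap = fmap.T                           # feed column by column instead
    h, w = fmap.shape
    out = np.array([[fmap[r:r + size, c:c + size].max()
                     for c in range(0, w - size + 1, stride)]
                    for r in range(0, h - size + 1, stride)])
    return out.T if transposed else out
```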
For the case where the pooling module and the convolution module are directly connected in a pipeline structure, the stride and the size of the pooling window affect how the pooling module's own cache resources are used. The cache quantities needed for pooling under two pooling strides are described below, so as to reduce the data cache space as much as possible.
(1) The pooling window is (pooling stride = 1, pooling size = 3)
Assume the bit width of the image data is 16 bits and the data are fed into the pooling module channel-first. Before pooling, 3 registers of the same width as the input data (16 bits) need to be prepared to cache the received data, and 2 FIFO memories of depth (w_i - 1) (i.e. fifo_0 and fifo_1) to cache the temporary pooling values of the previous row; the 2 FIFO memories are thus necessary, each of depth (w_i - 1), where w_i denotes the width of the image. Fig. 9 is a schematic diagram of the process of caching intermediate pooling results with two FIFO memories provided in an embodiment of the present invention. As shown in Fig. 9, the caching process is as follows:
1. Receive the 1st datum A of row 1 and store it in data_buf_0;
2. Receive the 2nd datum B of row 1 and store it in data_buf_1;
3. Receive the 3rd datum C of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_0, then store B in data_buf_0 and C in data_buf_1;
4. Receive the 4th datum D of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_0, then store C in data_buf_0 and D in data_buf_1;
5. Continue until the comparisons of row 1 are complete, then start reading the 1st datum H of row 2 and store it in data_buf_0;
6. Receive the 2nd datum I of row 2 and store it in data_buf_1;
7. Receive the 3rd datum J of row 2, store it in data_buf_2, pop 1 value from fifo_0, compare it with data_buf_0, data_buf_1 and data_buf_2 and push the largest value into fifo_1, and push the maximum of data_buf_0, data_buf_1 and data_buf_2 into fifo_0;
8. Repeat the above steps. As shown in Fig. 10, after the results of row 0 have been stored in fifo_0, the combined result of row 0 and row 1 is computed while the results of row 1 are being obtained, and that result is pushed into fifo_1; at the same time the results of row 1 are also pushed into fifo_0 for caching. When the results of row 2 are received, the values in fifo_1 are popped and compared with row 2, the results are stored into the cache module, and the row-2 results are pushed into fifo_0 in turn. The above steps are repeated so that, using the 2 FIFO memories, all pooling results are stored into the cache module, and a completion interrupt is issued.
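A software model of this two-FIFO scheme (stride 1, size 3, max pooling assumed; the FIFOs are modeled as deques, and each row's horizontal comparisons are collapsed into a list comprehension):

```python
from collections import deque

def maxpool_s1k3(img):
    """Line-buffered 3x3 max pooling with stride 1: fifo_0 caches the
    horizontal maxima of the previous row, fifo_1 the combined maxima
    of the previous two rows, mirroring Fig. 9/10."""
    h, w = len(img), len(img[0])
    fifo_0, fifo_1, out = deque(), deque(), []
    for r in range(h):
        row_max = [max(img[r][c], img[r][c + 1], img[r][c + 2])
                   for c in range(w - 2)]        # 3-wide horizontal maxima
        if r >= 2:                               # rows r-2..r complete a window
            out.append([max(fifo_1.popleft(), v) for v in row_max])
        if r >= 1:                               # combine previous row with this one
            fifo_1.extend(max(fifo_0.popleft(), v) for v in row_max)
        fifo_0.extend(row_max)                   # cache this row's maxima
    return out

print(maxpool_s1k3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [[9]]
```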
(2) The pooling window is (pooling stride = 2, pooling size = 3)
Before pooling, 3 registers of the same width as the input data (16 bits) need to be prepared to cache the received data, and 1 FIFO of depth ⌊(w_i - 1)/2⌋ is used to cache the temporary pooling values. Fig. 11 is a schematic diagram of the process of caching intermediate pooling results with a single FIFO memory provided in an embodiment of the present invention. As shown in Fig. 11, the caching process is as follows:
1. Receive the 1st datum A of row 1 and store it in data_buf_0;
2. Receive the 2nd datum B of row 1 and store it in data_buf_1;
3. Receive the 3rd datum C of row 1, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into the FIFO, then store C in data_buf_0;
4. Receive the 4th datum D of row 1 and store it in data_buf_1; when the 5th datum E of row 1 arrives, store it in data_buf_2, compare data_buf_0, data_buf_1 and data_buf_2 and push the largest value into the FIFO, then store E in data_buf_0;
5. Repeat the above steps. As shown in Fig. 12, the results of row 0 are first stored into the FIFO; when the results of row 1 come out, values are popped and compared with the row-1 results, and the larger values are pushed back into the FIFO in turn. When the results of row 2 come out, values are popped again and the larger values are stored into the SRAM, while the row-2 data are pushed into the FIFO. This is repeated until all results are stored in the SRAM.
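Correspondingly, a software model of this single-FIFO scheme (stride 2, size 3, max pooling assumed; complete windows only):

```python
from collections import deque

def maxpool_s2k3(img):
    """3x3 max pooling with stride 2 using one FIFO of depth roughly
    (w_i - 1) // 2, mirroring Fig. 11/12; handles complete windows only."""
    h, w = len(img), len(img[0])
    fifo, out = deque(), []
    for r in range(h):
        row_max = [max(img[r][c], img[r][c + 1], img[r][c + 2])
                   for c in range(0, w - 2, 2)]   # stride-2 horizontal maxima
        if r == 0:
            fifo.extend(row_max)                  # row 0: just cache
        elif r % 2 == 1:                          # middle row of a window:
            fifo.extend(max(fifo.popleft(), v) for v in row_max)
        else:                                     # last row of a window:
            out.append([max(fifo.popleft(), v) for v in row_max])
            fifo.extend(row_max)                  # this row also starts the next window
    return out

print(maxpool_s2k3([[r * 5 + c for c in range(5)] for r in range(5)]))
# [[12, 14], [22, 24]]
```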
It should be noted that for the channel-first case (pooling multi-channel image data with multiple operators), in the embodiments of the invention the intermediate results are stored in FIFOs as much as possible, and the clock cycles for repeatedly reading the FIFOs are minimized, so as to save as many resources as possible.
In addition, it should be noted that in a convolutional neural network the pooling mode changes with the network; the common ones are max pooling, average pooling and stochastic pooling. In the pooling structure proposed here, only the algorithm of the operator needs to be changed to adapt to different pooling modes. For max pooling, only the current maximum needs to be kept in the pooling cache and continuously updated within the pooling range; average pooling requires each stored datum to be weighted by its share of the pooling range and the weighted values to be accumulated continuously to obtain the final average; stochastic pooling requires a random vector generated before each read, and records the value at the location the random vector points to. The pooling mode is usually not the bottleneck of the pooling module design, and the module in the hardware implementation does not change on a large scale because the pooling mode changes; changing the computing behavior of the operator according to the pooling mode of the network can adapt to most pooling scenarios.
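A sketch of this operator swap (hypothetical operator signatures; each operator receives the values of one pooling window):

```python
import random

def max_op(window):
    return max(window)                     # keep only the running maximum

def avg_op(window):
    share = 1.0 / len(window)              # each datum's share of the range
    return sum(v * share for v in window)  # accumulate weighted values

def stochastic_op(window):
    return random.choice(window)           # value at a randomly chosen location

def pool(windows, op):
    return [op(w) for w in windows]        # same structure, different operator

print(pool([[1, 5, 2, 0]], max_op), pool([[1, 5, 2, 0]], avg_op))
```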
An embodiment of the present invention also provides a computer device, to solve the technical problem that in existing convolutional neural networks the pooling layer pools the convolution results output by the convolutional layer directly, which makes the data cache space large. The computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor executes any optional or preferred image processing method based on a convolutional neural network from the above method embodiments.
An embodiment of the present invention also provides a computer-readable storage medium, to solve the same technical problem; the computer-readable storage medium stores a computer program for executing any optional or preferred image processing method based on a convolutional neural network from the above method embodiments.
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system or a computer program product. The invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical memory) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the invention in detail. It should be understood that the above are only specific embodiments of the invention and are not intended to limit the protection scope of the invention; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall be included in the protection scope of the invention.

Claims (10)

1. An image processing method based on a convolutional neural network, characterized by comprising:
obtaining convolution results of an image to be processed;
caching the convolution results of the image to be processed in convolution order;
reading the cached convolution results in pooling order;
performing a pooling operation on the read convolution results to obtain a pooling result of the image to be processed.
2. The method according to claim 1, characterized in that the image to be processed is a multi-channel image and the convolution results contain image data of multiple channels, wherein performing the pooling operation on the read convolution results to obtain the pooling result of the image to be processed comprises:
obtaining the image data of each channel contained in the convolution results;
performing the pooling operation on the image data of each channel contained in the convolution results to obtain a pooling result of each channel;
merging the pooling results of all channels to obtain the pooling result of the image to be processed.
3. The method according to claim 2, characterized in that performing the pooling operation on the image data of each channel contained in the convolution results to obtain the pooling result of each channel comprises either of the following:
based on time-division multiplexing, performing the pooling operation with one operator on the image data of each channel contained in the convolution results to obtain the pooling result of each channel;
based on parallelism, performing the pooling operation with multiple operators on the image data of the channels contained in the convolution results to obtain the pooling result of each channel.
4. The method according to claim 3, characterized in that, based on time-division multiplexing, performing the pooling operation with one operator on the image data of each channel contained in the convolution results to obtain the pooling result of each channel comprises:
caching the image data of each channel contained in the convolution results;
reading the cached image data of each channel in first-in-first-out order;
performing the pooling operation on the image data of each channel as it is read;
caching the pooling result of each channel;
outputting the pooling results of the channels in first-in-first-out order.
5. The method according to claim 3, characterized in that, based on parallelism, performing the pooling operation with multiple operators on the image data of the channels contained in the convolution results to obtain the pooling result of each channel comprises:
if the number of operators is greater than the number of channels, directly performing the pooling operation with as many operators as there are channels on the image data of the channels contained in the convolution results, to obtain the pooling result of each channel;
if the number of operators is smaller than the number of channels, performing the pooling operation on the image data of the channels contained in the convolution results by multiplexing the operators, to obtain the pooling result of each channel.
6. An image processing method based on a convolutional neural network, characterized by comprising:
determining a convolution order according to a pooling order;
outputting convolution results of an image to be processed in the convolution order;
performing a pooling operation on the convolution results of the image to be processed to obtain a pooling result;
caching the pooling result, and determining the pooling order according to the pooling result.
7. The method according to claim 6, characterized in that performing the pooling operation on the convolution results of the image to be processed to obtain the pooling result comprises:
obtaining the row width and column width of the convolution results;
if the row width is smaller than the column width, performing the pooling operation on the convolution results of the image to be processed by row;
if the row width is greater than the column width, performing the pooling operation on the convolution results of the image to be processed by column.
8. The method according to claim 7, characterized in that, after performing the pooling operation on the convolution results of the image to be processed by row or by column, the method further comprises:
caching the intermediate pooling result of each row or column and the combined result of each row or column with the next row or column;
reading the cached data repeatedly in first-in-first-out order until the pooling result of the image to be processed is obtained.
9. An image processing device based on a convolutional neural network, characterized by comprising:
a convolution module for outputting convolution results of an image to be processed;
a cache module, connected to the convolution module, for caching the convolution results output by the convolution module in the convolution order of the convolution module;
a pooling module, connected to the cache module, for reading the cached convolution results in the pooling order of the pooling module, and performing a pooling operation on the read convolution results to obtain a pooling result of the image to be processed.
10. An image processing device based on a convolutional neural network, characterized by comprising:
a convolution module for determining a convolution order according to a pooling order, and outputting convolution results of an image to be processed in the convolution order;
a pooling module, connected to the convolution module, for performing a pooling operation on the convolution results output by the convolution module to obtain a pooling result;
a cache module, connected to both the pooling module and the convolution module, for caching the pooling result output by the pooling module, and determining the pooling order of the pooling module according to the pooling result of the pooling module.
CN201910480468.8A 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network Active CN110276444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480468.8A CN110276444B (en) 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN110276444A true CN110276444A (en) 2019-09-24
CN110276444B CN110276444B (en) 2021-05-07

Family

ID=67961966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480468.8A Active CN110276444B (en) 2019-06-04 2019-06-04 Image processing method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110276444B (en)



Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107836001A (en) * 2015-06-29 2018-03-23 微软技术许可有限责任公司 Convolutional neural networks on hardware accelerator
WO2017044214A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
US20170294010A1 (en) * 2016-04-12 2017-10-12 Adobe Systems Incorporated Utilizing deep learning for rating aesthetics of digital images
US20180137407A1 (en) * 2016-11-14 2018-05-17 Kneron, Inc. Convolution operation device and convolution operation method
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution accelerates and computation processing method, device, electronic equipment and storage medium
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN107704923A (en) * 2017-10-19 2018-02-16 珠海格力电器股份有限公司 Convolutional neural networks computing circuit
CN108805267A (en) * 2018-05-28 2018-11-13 重庆大学 The data processing method hardware-accelerated for convolutional neural networks
CN108805274A (en) * 2018-05-28 2018-11-13 重庆大学 The hardware-accelerated method and system of Tiny-yolo convolutional neural networks based on FPGA
CN109344878A (en) * 2018-09-06 2019-02-15 北京航空航天大学 A kind of imitative hawk brain feature integration Small object recognition methods based on ResNet
CN109447241A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Anthony G. Scanlan, "Low power & mobile hardware accelerators for deep convolutional neural networks", Integration *
Changxi Liu et al., "T1000: Mitigating the memory footprint of convolution neural networks with decomposition and re-fusion", Future Generation Computer Systems *
He Kaixuan et al., "Hardware architecture design of convolutional neural network based on FPGA dynamic reconfiguration", Information Technology and Network Security (in Chinese) *
Lu Liqiang et al., "FPGA design for convolutional neural networks", Scientia Sinica Informationis (in Chinese) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784952A (en) * 2019-11-04 2021-05-11 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN112784952B (en) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN111105015A (en) * 2019-12-06 2020-05-05 浪潮(北京)电子信息产业有限公司 General CNN reasoning accelerator, control method thereof and readable storage medium
CN111027682A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Neural network processor, electronic device and data processing method
CN111340224A (en) * 2020-02-27 2020-06-26 杭州雄迈集成电路技术股份有限公司 Accelerated design method of CNN network suitable for low-resource embedded chip
CN111340224B (en) * 2020-02-27 2023-11-21 浙江芯劢微电子股份有限公司 Accelerated design method of CNN (computer network) suitable for low-resource embedded chip
CN111445420A (en) * 2020-04-09 2020-07-24 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN111445420B (en) * 2020-04-09 2023-06-06 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN117391149A (en) * 2023-11-30 2024-01-12 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data
CN117391149B (en) * 2023-11-30 2024-03-26 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data

Also Published As

Publication number Publication date
CN110276444B (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventors after: Zhou Fangkun, OuYang Peng, Li Xiudong, Wang Bo
Inventors before: Zhou Fangkun, OuYang Peng, Yin Shouyi, Li Xiudong, Wang Bo