CN110276444B

CN110276444B - Image processing method and device based on convolutional neural network

Info

Publication number: CN110276444B
Application number: CN201910480468.8A
Authority: CN
Inventors: 周方坤; 欧阳鹏; 尹首一; 李秀东; 王博
Original assignee: Beijing Qingwei Intelligent Technology Co ltd
Current assignee: Beijing Qingwei Intelligent Technology Co ltd
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2021-05-07
Anticipated expiration: 2039-06-04
Also published as: CN110276444A

Abstract

The invention discloses an image processing method and device based on a convolutional neural network, wherein the method comprises the following steps: obtaining a convolution operation result of an image to be processed; caching the convolution operation result of the image to be processed according to the convolution operation sequence; reading a cached convolution operation result according to the pooling operation sequence; and performing pooling operation on the read convolution operation result to obtain a pooling operation result of the image to be processed. The invention can greatly reduce the data caching space of the pooling module and improve the resource utilization rate.

Description

Image processing method and device based on convolutional neural network

Technical Field

The present invention relates to the field of image processing, and in particular, to an image processing method and apparatus based on a convolutional neural network.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Convolutional Neural Networks (CNN) are one of the representative algorithms for deep learning. In convolutional neural networks, convolutional layers are often followed by a pooling layer. The convolutional layer is used for carrying out feature extraction on the input image to obtain a feature map. The pooling layer is used for compressing and extracting the feature map output by the convolutional layer, so that the feature map is reduced, and the network computation complexity is simplified; and on the other hand, feature compression is carried out to extract main features.

It should be noted that the hardware part of the pooling module for implementing the pooling function is composed of a data caching module and a data processing module. The mode of operation of the module depends, among other things, on the way in which the image data is input. Fig. 1 is a schematic diagram of a pooling process of image data according to the prior art, as shown in fig. 1, a pooling window for pooling the image data is set to (poolingstride ═ 2, poolingsize ═ 2), where poolingstride represents a step size of the pooling window, and poolingsize represents a size of the pooling window, and when a maximum pooling operation is performed on the image shown in fig. 1 without considering the number of channels, it is equivalent to the need to calculate max (a, B, H, I), max (C, D, J, K), and the like.

As shown in fig. 1, when the input order of the image data is (a, B, H, I, C, D, J, K, …), the data to be pooled is very simple (i.e., the data to be pooled each time is compared with the previous data, and the maximum value is obtained every four times), in this case, the capacity of the data buffer module only needs 1 data size space, and the function of the pooling module implemented on hardware is relatively simple. When the input order of the image data is (a, B, C, D, E, F, G, H, …), that is, when the next line of data comes after the input of all lines of the actual image is completed, the maximum value of (a, B) needs to be buffered, and when the maximum value of (H, I) is calculated, the occupied data buffer space can be released. Thus, the data buffer space required by the pooling module depends on the size of a line of the actual image. Similarly, when the input order of the image data is (a, H, O, V, … B, I, P, W, …), the data buffer space required by the pooling module depends on the size of one column of the actual image.

From the above analysis, the data buffer capacity (i.e. the size of the buffer space) when the pooling module implements pooling depends on the manner in which the data output by the convolution module arrives. Since hardware resources are limited (for example, ASIC or FPGA), a higher resource utilization rate tends to be obtained when the pooling module is designed, and therefore, how to optimize a data input manner of the pooling module to reduce a data cache space is a problem to be solved.

Disclosure of Invention

The embodiment of the invention provides an image processing method based on a convolutional neural network, which is used for solving the technical problem that a pooling layer in the conventional convolutional neural network directly performs pooling operation on a convolutional operation result output by the convolutional layer, so that the data cache space is larger, and the method comprises the following steps: acquiring a convolution operation result of an image to be processed, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels; caching the convolution operation result of the image to be processed according to the convolution operation sequence; reading a cached convolution operation result according to the pooling operation sequence; acquiring image data of each channel contained in a convolution operation result; performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data; merging the pooling operation results of all channel image data to obtain a pooling operation result of the image to be processed; performing pooling operation on each channel image data included in the convolution operation result to obtain a pooling operation result of each channel image data, wherein the pooling operation result includes any one of the following:

based on a time division multiplexing mode, performing pooling operation on each channel image data contained in the convolution operation result by adopting an operator to obtain a pooling operation result of each channel image data;

and based on a parallel mode, performing pooling operation on each channel image data contained in the convolution operation result by adopting a plurality of operators to obtain a pooling operation result of each channel image data.

The embodiment of the invention also provides an image processing method based on the convolutional neural network, which is used for solving the technical problem that a pooling layer in the conventional convolutional neural network directly performs pooling operation on a convolutional operation result output by the convolutional layer, so that the data cache space is larger, and the method comprises the following steps: determining a convolution operation sequence according to the pooling operation sequence; outputting a convolution operation result of the image to be processed according to the convolution operation sequence, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels; acquiring image data of each channel contained in a convolution operation result; performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data; merging the pooling operation results of all channel image data to obtain a pooling operation result of the image to be processed; caching the pooling operation result, and determining a pooling operation sequence according to the pooling operation result; performing pooling operation on each channel image data included in the convolution operation result to obtain a pooling operation result of each channel image data, wherein the pooling operation result includes any one of the following:

The embodiment of the invention also provides an image processing device based on a convolutional neural network, which is used for solving the technical problem that a pooling layer in the conventional convolutional neural network directly performs pooling operation on a convolutional operation result output by the convolutional layer, so that the data cache space is larger, and the device comprises: the convolution module is used for outputting a convolution operation result of the image to be processed, the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels; the buffer module is connected with the convolution module and used for buffering convolution operation results output by the convolution module according to the convolution operation sequence of the convolution module; the pooling module is connected with the caching module and used for reading a cached convolution operation result according to a pooling operation sequence of the pooling module, acquiring each channel image data contained in the convolution operation result, performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data, and merging the pooling operation results of all channel image data to obtain a pooling operation result of an image to be processed; the pooling module is further used for performing pooling operation on each channel image data contained in the convolution operation result by adopting an operator based on a time division multiplexing mode to obtain a pooling operation result of each channel image data; or based on a parallel mode, performing pooling operation on each channel image data contained in the convolution operation result by adopting a plurality of operators to obtain a pooling operation result of each channel image data.

The embodiment of the invention also provides an image processing device based on a convolutional neural network, which is used for solving the technical problem that a pooling layer in the conventional convolutional neural network directly performs pooling operation on a convolutional operation result output by the convolutional layer, so that the data cache space is larger, and the device comprises: the convolution module is used for determining a convolution operation sequence according to the pooling operation sequence and outputting a convolution operation result of the image to be processed according to the convolution operation sequence, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels; the pooling module is connected with the convolution module and used for acquiring each channel image data contained in the convolution operation result, performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data, and combining the pooling operation results of all the channel image data to obtain a pooling operation result of the image to be processed; the cache module is respectively connected with the pooling module and the convolution module and used for caching the pooling module to output a pooling operation result and determining the pooling operation sequence of the pooling module according to the pooling operation result of the pooling module; the pooling module is further used for performing pooling operation on each channel image data contained in the convolution operation result by adopting an operator based on a time division multiplexing mode to obtain a pooling operation result of each channel image data; or based on a parallel mode, performing pooling operation on each channel image data contained in the convolution operation result by adopting a plurality of operators to obtain a pooling operation result of each channel image data.

In the embodiment of the invention, the result of the convolution operation performed by the convolution module on the image to be processed is cached according to the convolution operation sequence of the convolution module, and the pooling module reads the corresponding convolution operation result in the cache according to the pooling operation sequence, so that the pooling module can execute the pooling operation according to the pooling operation sequence to obtain the pooling operation result of the image to be processed. By the embodiment of the invention, the data cache space of the pooling module is greatly reduced, and the resource utilization rate is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:

FIG. 1 is a diagram illustrating a pooling process of image data provided by the prior art;

FIG. 2 is a schematic diagram of an image processing apparatus with a pipeline structure according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an image processing apparatus with a non-pipeline structure according to an embodiment of the present invention;

FIG. 4 is a flowchart of an image processing method based on a convolutional neural network according to an embodiment of the present invention;

FIG. 5 is a flowchart of another image processing method based on a convolutional neural network according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a parallel processing of multi-channel image data according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of processing multi-channel image data in a time-division multiplexing manner according to an embodiment of the present invention;

FIG. 8 is a timing diagram according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a process for buffering intermediate results of a pooling operation using two FIFO memories according to an embodiment of the present invention;

FIG. 10 is a diagram illustrating an example of buffering the intermediate result of the pooling operation using two FIFO memories according to the present invention;

FIG. 11 is a schematic diagram illustrating a process of buffering intermediate results of a pooling operation using a single FIFO memory according to an embodiment of the present invention;

fig. 12 is a diagram illustrating a result of buffering the intermediate result of the pooling operation by using a single FIFO memory according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

As can be seen from the background of the present application, in the conventional convolutional neural network, after the pooling layer is connected to the convolutional layer, the pooling operation is directly performed on the convolutional operation result output by the convolutional layer, and after the convolutional operation is completed, the time consumed by the pooling layer can be ignored, so that the pipeline (pipeline) structure formed by the convolutional layer and the pooling layer has the advantages that no additional clock cycle is consumed except the clock cycle consumed by the convolutional layer, and thus the total clock cycle of the operation can be saved; the disadvantage is that the pooling layer needs to be matched with the output data of the convolution layer to prepare the cache resource, so that the data cache space occupies a larger space.

In order to solve the above problem, an embodiment of the present invention provides a schematic diagram of an image processing apparatus with a pipeline structure shown in fig. 2, as shown in fig. 2, the apparatus includes: a convolution module 21, a pooling module 22 and a buffering module 23; wherein, the pooling module 22 is connected with the convolution module 21, and the buffer module 23 is respectively connected with the pooling module 22 and the convolution module 21;

specifically, the convolution module 21 determines a convolution operation sequence of the convolution module 21 according to the pooling operation sequence obtained from the buffer module 23, and outputs a convolution operation result of the image to be processed according to the convolution operation sequence; the pooling module 22 directly performs pooling operation on the convolution operation result output by the convolution module 21 to obtain a pooling operation result of the image to be processed. The buffer module 23 may buffer the pooling operation result output by the pooling module 22, and determine the pooling operation sequence of the pooling module 23 according to the pooling operation result.

It should be noted that the image processing apparatus with the pipeline structure shown in fig. 2 employs the pooling priority principle, that is, the output of the convolution module is adjusted according to the data required by the pooling module. By the structure, data are input according to the pooling operation sequence (namely, the convolution operation is guided by the distribution of the pooling result, and the convolution result is calculated according to the data required by pooling), so that the capacity of cache resources can be saved, and the data cache space is reduced; and because a pipeline structure is adopted between the convolution module and the pooling module, the total clock period of data processing of operation can be saved. However, this structure imposes a more strict order requirement on the convolution module to read the raw data, i.e., the convolution module needs a more complex structure and a pre-configuration to fit the pooling module. In addition, when the image data in the buffer module is read, it is more complicated to calculate the image address, which may increase the time for reading the data and the required data buffer space.

Thus, as a more preferable solution, the embodiment of the present invention further provides a schematic diagram of an image processing apparatus of a non-pipeline structure shown in fig. 3, as shown in fig. 3, the apparatus also includes: a convolution module 21, a pooling module 22 and a buffering module 23; the convolution module 21 and the pooling module 22 are respectively connected with the buffer module 23;

as can be seen from fig. 3, since the buffer module 23 is disposed between the convolution module 21 and the pooling module 22, the convolution module 21 may directly buffer the convolution operation result of the image to be processed into the buffer module 23 according to the convolution operation sequence without matching the output of the pooling module 22, the pooling module 22 may also not match the output of the convolution module 21, the convolution operation result buffered in the buffer module 23 is read from the buffer module 23 according to the pooling operation sequence, and the read convolution operation result is subjected to pooling operation to obtain the pooling operation result of the image to be processed.

It should be noted that, in the image processing apparatus with a non-pipeline structure shown in fig. 3, the pooling module can conveniently and accurately fetch data from the cache module according to its own needs, which not only saves its own cache resources, but also can adjust the fetching mode at any time in different networks, and has stronger flexibility; the disadvantage is that the image processing apparatus is more clock-cycle-consuming than the pipeline structure shown in fig. 2, and the programming of the pooling module is also complicated.

As an alternative embodiment, the cache module 23 shown in fig. 2 and 3 may employ a Static Random-Access Memory (SRAM), which is used to store data, and as long as power is kept on, the data stored therein may be constantly kept.

It should be noted that after analyzing the data volume and clock margin in the network, a more appropriate design for an ASIC or FPGA may be adopted. In the practical application of ASIC or FPGA, the resource and time consumed by convolution operation are far greater than those of pooling operation, and in the mode that convolution output cannot be changed and a pipeline structure is adopted for saving clock period, how to change the 'adaptability' of pooling to data is the key point for improving performance.

The embodiment of the invention also provides an image processing method based on the convolutional neural network, which can be applied to but not limited to the image processing device with the non-pipeline structure shown in fig. 3.

Fig. 4 is a flowchart of an image processing method based on a convolutional neural network according to an embodiment of the present invention, and as shown in fig. 4, the method includes the following steps:

s401, obtaining a convolution operation result of an image to be processed;

s402, caching the convolution operation result of the image to be processed according to the convolution operation sequence;

s403, reading the cached convolution operation result according to the pooling operation sequence;

s404, performing pooling operation on the read convolution operation result to obtain a pooling operation result of the image to be processed.

Through the scheme provided by the S401 to S404, the result of the convolution operation of the convolution module on the image to be processed is cached according to the convolution operation sequence of the convolution module, and the pooling module reads the corresponding convolution operation result in the cache according to the pooling operation sequence, so that the pooling module can execute the pooling operation according to the pooling operation sequence to obtain the pooling operation result of the image to be processed, the data caching space of the pooling module is greatly reduced, and the resource utilization rate is improved.

The embodiment of the invention also provides an image processing method based on the convolutional neural network, which can be applied to but not limited to the image processing device with the pipeline structure shown in fig. 2.

Fig. 5 is a flowchart of another image processing method based on a convolutional neural network according to an embodiment of the present invention, and as shown in fig. 5, the method includes the following steps:

s501, determining a convolution operation sequence according to a pooling operation sequence;

s502, outputting a convolution operation result of the image to be processed according to the convolution operation sequence;

s503, performing pooling operation on the convolution operation result of the image to be processed to obtain a pooling operation result;

s504, caching the pooling operation result, and determining the pooling operation sequence according to the pooling operation result.

Through the scheme provided by the above S501 to S504, the convolution operation sequence of the convolution module is determined according to the pooling operation sequence of the pooling module, so that the convolution module matches the pooling module to output the convolution operation result of the image to be processed, the pooling module can perform pooling operation according to the convolution operation result output by the convolution module in real time to obtain a corresponding pooling operation result, and the pooling operation result of the pooling module is cached to the caching module so that the caching module determines the pooling operation sequence of the next pooling operation according to the pooling operation result.

It should be noted that, in the embodiment of the present invention, when the image to be processed is a multi-channel image, a convolution operation result obtained by performing a convolution operation on the image to be processed includes image data of a plurality of channels. For the pooling operation of the multi-channel image data, only enough pooling modules need to be copied to process each channel image data in the convolution operation result, and finally, the pooling result of each channel image data is combined and output. It can also be seen from this that, when pooling is implemented based on hardware, the capacity of the data cache depends on the way the upper layer results come, the complexity of data processing depends on the way of data cache, and the resources occupied by the whole module also depends on the number of channels.

Therefore, in S404 shown in fig. 4 or S503 shown in fig. 5, as an alternative embodiment, when performing the pooling operation on the convolution operation result of the multi-channel image data, the method may specifically include the following steps: acquiring image data of each channel contained in a convolution operation result; performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data; and merging the pooling operation results of all the channel image data to obtain the pooling operation result of the image to be processed.

Optionally, when performing pooling operation on each channel image data included in the convolution operation result to obtain a pooling operation result of each channel image data, the pooling operation result may be implemented by any one of the following two ways:

in the first mode, based on a time division multiplexing mode, an operator is used for performing pooling operation on each channel of image data included in a convolution operation result to obtain a pooling operation result of each channel of image data.

Specifically, the above method may specifically include: caching the image data of each channel contained in the convolution operation result; reading the image data of each channel cached in a first-in first-out mode; performing pooling operation on the read image data of each channel; caching the pooling operation result of each channel of image data; and outputting the pooling operation result of the image data of each channel in a first-in first-out mode.

And in the second mode, based on a parallel mode, a plurality of operators are adopted to perform pooling operation on each channel of image data contained in the convolution operation result, so that a pooling operation result of each channel of image data is obtained.

Specifically, if the number of operators is greater than the number of channels, pooling operation is directly performed on each channel of image data included in the convolution operation result by using an operator which is multiple of the number of channels, so that a pooling operation result of each channel of image data is obtained; and if the number of the operators is less than the number of the channels, performing pooling operation on each channel of image data contained in the convolution operation result in an operator multiplexing mode to obtain a pooling operation result of each channel of image data.

In the convolutional neural network, each layer of convolution operation can calculate feature map data under different channels according to network configuration and corresponding weight data. In hardware applications, this multi-pass form of operation can be accomplished by adding operators or reusing operators. FIG. 6 is a diagram illustrating a parallel processing of multi-channel image data according to an embodiment of the present invention; as shown in fig. 6, assuming that the width of a single image data in the network is 16 bits, when the effective data entering the pooling process is 48 bits (i.e. 3 channels), the pooling module uses 3 same ALU calculation modules (also called operators) to perform the pooling operation on each channel image data, and after calculating the result, the result is unified and combined into 48-bit result pooling operation result for output.

Because the design of applying a large number of repeated operators in a large-scale ASIC or FPGA is bloated, the number of operators is compressed by more flexibly matching with a network, and the direction of optimizing resources is provided. Fig. 7 is a schematic diagram of processing multi-channel image data in a time division multiplexing manner according to an embodiment of the present invention, as shown in fig. 7, when 3-channel data is sent to a pooling module, the pooling module only uses 1 operator to perform pooling operation, but at this time, a FIFO memory is required to buffer the data entering the pooling module, and only 16 bits of data are released for operation in each pooling. Meanwhile, a FIFO memory is needed to buffer the pooling and operation results of each channel, and a complete 48-bit result is output when the data result of the 3 channels is finished.

It should be noted that the pooling operation is much simpler than the convolution operation, so the resources used by the operator do not occupy much and are much saved in time, especially in the case that the pooling module is directly connected to the convolution module as shown in fig. 2, the pooling can be almost completed at the same time as the convolution is finished. Therefore, the parallel mode shown in fig. 6 can be applied to a channel-first input mode in which image data is input at the next point after all the input of the image data at each point is completed. The time division multiplexing mode shown in fig. 7 can be applied to an environment with less strict time requirement, fewer channels, and less resources.

In addition, it should be noted that, in the convolutional neural network, since the number of data channels between layers varies constantly, it is difficult to set a fixed value to determine the number of operators. If the number of channels of a certain layer is larger than the number of operators, a certain strategy is needed to process data while the number of operators is not increased. Therefore, when a plurality of operators are adopted to perform pooling operation on each channel image data contained in the convolution operation result based on a parallel mode, if the number of the operators is greater than the number of the channels, the operators with the multiples of the number of the channels are directly adopted to perform pooling operation on each channel image data contained in the convolution operation result, and a pooling operation result of each channel image data is obtained; and if the number of the operators is less than the number of the channels, performing pooling operation on each channel of image data contained in the convolution operation result in an operator multiplexing mode to obtain a pooling operation result of each channel of image data.

Although it is a common method to multiplex operators to calculate data exceeding the number of channels, it also consumes more clock cycles, and therefore, it is necessary to perform additional data division in advance and design logic to prepare data for the operators.

And (I) when the number of operators is less than the number of channels, whether the time sequence meets the requirement or not needs to be considered when multiplexing the operators. Assuming that the number of existing operators is t, the input pooling data width is w_sWidth of image data being w_tAfter one-time calculationIf the number of times that each operator needs to be reused when there is data in a channel is n, the following equation exists:

if n is larger than 1, the data needs to be segmented by programming, the data is segmented and then gradually pressed into an operator for calculation by using an FIFO (first in first out) and a storage controller, and the obtained result is stored until all the channels of the pixel point are calculated and then is combined and output. So that the total pooling time will increase to more than n times. If the pipeline structure that the pooling module and the convolution module are directly connected is adopted, in order to avoid delay, pooling calculation needs to be ensured to be completed before the next convolution calculation result. As shown in fig. 8, conv1, conv2 and conv3 represent that convolution operations are performed 3 times, the result of the convolution operation is output after the convolution is completed, and a, b, c and d, e, f represent the time taken by the operator to perform 3 repeated pooling operations, respectively.

Suppose that the time taken for a convolution operation is t_cThe time occupied by one pooling operation is t_pAnd if the number of times that the operator needs to be reused is n, the following requirements are met:

t_c＞n×t_p；

when the operator is repeatedly used while satisfying the above equation, no delay is generated in the pipeline operation process.

Therefore, if the number of operators is not enough to finish all data at one time and the operations need to be finished for multiple times under the operation mode that the channel is transmitted preferentially and the pooling module and the convolution module are in a pipeline structure, the time consumed by convolution needs to be considered to be longer than the total time of multiple operator operations; it should be noted that if the pipeline structure operation is not adopted, the time consumed by convolution does not need to be considered.

When the number of channels required by a certain layer in the CNN network is small and the pooled resources overflow, it is possible to reduce the overall operation time after speeding up the pooling if the weight on the hardware is adjusted.

Thus, for the case where the number of operators is greater than the number of channels, the pooling time can be shortened by using the number of operators that is multiple of the number of channels at the same time. The pipeline architecture is left alone because the convolution time is typically greater than the pooling operation time, in which case the pooled input will wait for the convolution to complete. As shown in fig. 3, the input of the pooling module is obtained via the buffer module. This speeds up the overall system in two ways: firstly, the pooling module reads data from the cache module according to the sequence of the most saved resources, and the data input mode can enable the pooling module to need fewer cache resources; and secondly, the data size which enables all operators to work is obtained simultaneously, and the pooling speed is accelerated.

It should be noted that the structure shown in fig. 3 would bring about the waste of time and the complexity of the procedure caused by the buffer module transmitting the same data as the convolution module and the pooling module. In practical applications, the hardware should be optimized by considering how frequently channels exist in the network and the number of channels is smaller than the number of pooling operators, and whether to add such accelerated logic. Typically, ASIC designs are optimized for a particular network, such as a master-RCNN network with a large number of channels or an MTCNN network with a small number of channels, and the overall design may give resources or redundant clocks to other modules due to the complexity of data or computation.

When the convolution module outputs the convolution operation result (feature map) of the image to be processed, the convolution operation result may be input to the pooling module in rows or columns. In one embodiment, if the data output by the convolution module is input to the pooling module by row, the pooling module needs to buffer the convolution result of each row. Still taking fig. 1 as an example, when the pooling window is (pooling strand is 2, pooling size is 3), it is necessary to buffer intermediate results such as max (a, B, C) and max (C, D, E), and the number of intermediate results is related to the length of the line (i.e., the width of the image). The same can be concluded that if the data output by the convolution module is input to the pooling module by row, the data that needs to be buffered is related to the length of the column (i.e. the height of the image). Therefore, under the condition that the convolution operation result of the convolution module is not stored in the cache module but is directly transmitted to the pooling module, the shorter side of the row or the column (namely the width or the height of the feature map) of the convolution operation result can be selected and transmitted into the pooling module, so that the size of the cache in the pooling module is saved.

Thus, the step S503 may specifically include the following steps: acquiring the line width and the column width of a convolution operation result; if the line width is smaller than the column width, performing pooling operation on the convolution operation result of the image to be processed according to the lines; and if the line width is larger than the column width, performing pooling operation on the convolution operation result of the image to be processed according to the columns.

Further, after performing pooling operation on the convolution operation result of the image to be processed by row or column, the intermediate result of the pooling operation of each row or column and the operation result of each row or column and the next row or column can be further cached; and repeatedly reading the cached data in a first-in first-out mode until a pooling operation result of the image to be processed is obtained.

For the situation that the pooling module and the convolution module are directly connected by adopting a pipeline structure, the step length and the size of the pooling window can influence the use condition of the caching resource of the pooling module. The number of buffers required for pooling in two pooling steps is described below to minimize the data buffer space.

(1) The pooling window is (posing strand ═ 1, posing size ═ 3)

Assuming that the bit width of the image data is 16 bits, the data is sent to the pooling module in a channel-first manner. Before pooling, 3 registers with the same word width (16 bits) as the input data are needed to be prepared for caching the received data; 2 depth of (w)_i-1) FIFO memories (FIFO _0 and FIFO _1) to buffer the temporarily popped value of the first 1 line. So 2 FIFO memories are necessary, all the depths being (w)_i-1), wherein w_iRepresenting the width of the image. Fig. 9 is a schematic diagram of a process of buffering an intermediate result of a pooling operation by using two FIFO memories according to an embodiment of the present invention, as shown in fig. 9, the buffering process is as follows:

firstly, when receiving the 1 st data A in the 1 st line, storing the data A into data _ buf _ 0;

secondly, when receiving the 2 nd data B in the 1 st line, storing the data B into the data _ buf _ 1;

thirdly, when receiving the 3 rd data C in the 1 st line, storing the data _ buf _2, comparing the data _ buf _0 with the data _ buf _1 and the data _ buf _2, pressing a larger value into the fifo _0, storing the B into the data _ buf _0, and storing the C into the data _ buf _ 1;

fourthly, when receiving the 4 th data D in the 1 st line, storing the data into the data _ buf _2, comparing the data _ buf _0 with the data _ buf _1 and the data _ buf _2, pressing a larger value into the fifo _0, storing the C into the data _ buf _0, and storing the D into the data _ buf _ 1;

fifthly, after the result comparison of the 1 st line is completed, the 1 st number H of the 2 nd line is read and stored into the data _ buf _ 0;

when receiving the 2 nd data I in the 2 nd row, storing the data I into the data _ buf _ 1;

seventhly, when receiving the 3 rd data J in the 2 nd line, storing the data _ buf _2, pop out 1 number in fifo _0, comparing the number with the data _ buf _0, the data _ buf _1 and the data _ buf _2, pressing a larger value into fifo _1, and pressing the data _ buf _0, the data _ buf _1 and the data _ buf _2 into fifo _ 0;

repeating the above steps, as shown in fig. 10, after storing the 0 th row result in fifo _0, obtaining the 1 st row result, calculating the 0 th row result and the 1 st row result, and pushing the results into fifo _1, and pushing the 1 st row result into fifo _0 for caching. When receiving the result of line 2, pop compares the result in fifo _1 with line 2 and stores the result in the buffer module, and at the same time, pushes the second line result into fifo _0 in turn. And repeating the steps, storing all pooling operation results into a cache module by using 2 FIFO memories, and sending out the completion interrupt.

(2) The pooling window is (posing slide ═ 2, posing size ═ 3)

Before pooling, 3 registers with the same word width (16 bits) as the input data are needed to be prepared for caching the received data; 1 is provided with

The deep fifo is used to buffer temporarily pooled values. FIG. 11 is a diagram of a single FIFO memory pair poolAs shown in fig. 11, the process of caching the intermediate result of the chemometrics operation is as follows:

thirdly, when receiving the 3 rd data C in the 1 st line, storing the data into the data _ buf _2, comparing the data _ buf _0 with the data _ buf _1 and the data _ buf _2, pressing a larger value into the fifo, and storing the C into the data _ buf _ 0;

fourthly, when receiving the 4 th data D in the 1 st line, storing the data into the data _ buf _1, when storing the 5 th data E in the 1 st line, storing the data into the data _ buf _2, comparing the data _ buf _0 with the data _ buf _1 and the data _ buf _2, pressing a larger value into the fifo, and storing the E into the data _ buf _ 0;

fifthly, repeating the steps, as shown in fig. 12, storing the result of line 0 in fifo, comparing the number of pop output when the result of line 1 comes out with the result of line 1, and sequentially pressing fifo in. When the 2 nd line results out and pop out, the larger number is stored in SRAM, and the 2 nd line data is pressed into fifo. And repeating the steps in sequence until all the results are stored in the SRAM.

It should be noted that, for the case of channel-first (i.e. performing pooling operation on multi-channel image data by using multiple operators), in the embodiment of the present invention, FIFOs are used as much as possible to store intermediate results, and the FIFO is repeatedly read by minimizing clock cycles, so as to achieve the purpose of saving more resources.

In addition, it should be noted that the way in which pooling occurs in convolutional neural networks varies from network to network. The method is characterized by comprising the following steps of maximum pooling, average pooling and random pooling. In the pooling structure proposed herein, only the algorithm of the operator needs to be changed to accommodate the different pooling approaches. For example, the maximum value is pooled, the maximum value only needs to be stored in the pooled cache all the time, and the maximum value is continuously updated in the pooled range; the average value pooling needs to store the weight of the data occupying the pooling range every time, and the final average value is obtained continuously by the phase weighted value; the pooling of random values requires that a random vector be generated each time before data is read in, and the value of the position pointed by the vector is recorded through the random vector. The pooling approach is not generally a bottleneck in the design of the pooled module, nor is the module subject to extensive modifications in hardware applications due to changes in the pooling approach. The computing behavior of the operator is changed according to the pooling mode corresponding to the network, so that the method can adapt to most pooling scenes.

The embodiment of the present invention further provides a computer device, which is used to solve the technical problem that a pooling layer in an existing convolutional neural network directly performs pooling operation on a convolutional operation result output by a convolutional layer, so that a data cache space is large.

An embodiment of the present invention further provides a computer-readable storage medium, which is used to solve the technical problem that a pooling layer in an existing convolutional neural network directly performs pooling operation on a convolutional operation result output by a convolutional layer, so that a data cache space is large.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An image processing method based on a convolutional neural network, comprising:

acquiring a convolution operation result of an image to be processed, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels;

caching the convolution operation result of the image to be processed according to the convolution operation sequence;

reading a cached convolution operation result according to the pooling operation sequence;

acquiring image data of each channel contained in a convolution operation result;

performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data;

merging the pooling operation results of all channel image data to obtain the pooling operation result of the image to be processed;

performing pooling operation on each channel image data included in the convolution operation result to obtain a pooling operation result of each channel image data, wherein the pooling operation result includes any one of the following:

and on the basis of a parallel mode, performing pooling operation on each channel of image data contained in the convolution operation result by adopting a plurality of operators to obtain a pooling operation result of each channel of image data.

2. The method of claim 1, wherein performing a pooling operation on each channel of image data included in the convolution operation result by using an operator based on a time division multiplexing method to obtain a pooling operation result of each channel of image data comprises:

caching the image data of each channel contained in the convolution operation result;

reading the image data of each channel cached in a first-in first-out mode;

performing pooling operation on the read image data of each channel;

caching the pooling operation result of each channel of image data;

and outputting the pooling operation result of the image data of each channel in a first-in first-out mode.

3. The method of claim 1, wherein performing a pooling operation on each channel of image data included in the convolution operation result using a plurality of operators on a parallel basis to obtain a pooling operation result for each channel of image data comprises:

if the number of the operators is larger than the number of the channels, directly adopting the operators with the multiple of the number of the channels to perform pooling operation on each channel of image data contained in the convolution operation result to obtain a pooling operation result of each channel of image data;

and if the number of the operators is less than the number of the channels, performing pooling operation on each channel of image data contained in the convolution operation result in an operator multiplexing mode to obtain a pooling operation result of each channel of image data.

4. An image processing method based on a convolutional neural network, comprising:

determining a convolution operation sequence according to the pooling operation sequence;

outputting a convolution operation result of an image to be processed according to the convolution operation sequence, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels;

caching the pooling operation result, and determining a pooling operation sequence according to the pooling operation result;

5. The method of claim 4, wherein pooling the convolution operation results of the image to be processed to obtain a pooled operation result comprises:

acquiring the line width and the column width of a convolution operation result;

if the line width is smaller than the column width, performing pooling operation on the convolution operation result of the image to be processed according to the lines;

and if the line width is larger than the column width, performing pooling operation on the convolution operation result of the image to be processed according to the columns.

6. The method of claim 4, wherein after pooling the results of the convolution operations on the image to be processed by row or column, the method further comprises:

caching the middle result of the pooling operation of each row or column and the operation result of each row or column and the next row or column;

and repeatedly reading the cached data in a first-in first-out mode until the pooling operation result of the image to be processed is obtained.

7. An image processing apparatus based on a convolutional neural network, comprising:

the convolution module is used for outputting a convolution operation result of an image to be processed, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels;

the buffer module is connected with the convolution module and used for buffering the convolution operation result output by the convolution module according to the convolution operation sequence of the convolution module;

the pooling module is connected with the caching module and used for reading a cached convolution operation result according to a pooling operation sequence of the pooling module, acquiring each channel image data contained in the convolution operation result, performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data, and merging the pooling operation results of all channel image data to obtain a pooling operation result of the image to be processed;

the pooling module is further used for performing pooling operation on each channel image data contained in the convolution operation result by adopting an operator based on a time division multiplexing mode to obtain a pooling operation result of each channel image data; or based on a parallel mode, performing pooling operation on each channel image data contained in the convolution operation result by adopting a plurality of operators to obtain a pooling operation result of each channel image data.

8. An image processing apparatus based on a convolutional neural network, comprising:

the convolution module is used for determining a convolution operation sequence according to the pooling operation sequence and outputting a convolution operation result of an image to be processed according to the convolution operation sequence, wherein the image to be processed is a multi-channel image, and the convolution operation result comprises image data of a plurality of channels;

the pooling module is connected with the convolution module and used for acquiring each channel image data contained in a convolution operation result, performing pooling operation on each channel image data contained in the convolution operation result to obtain a pooling operation result of each channel image data, and merging the pooling operation results of all the channel image data to obtain a pooling operation result of the image to be processed;

the cache module is respectively connected with the pooling module and the convolution module and is used for caching the result of the pooling operation output by the pooling module and determining the pooling operation sequence of the pooling module according to the result of the pooling operation of the pooling module;