CN114758209B - Convolution result obtaining method and device, computer equipment and storage medium - Google Patents

Convolution result obtaining method and device, computer equipment and storage medium

Info

Publication number
CN114758209B
Authority
CN
China
Prior art keywords
row
data
matrix
column
conversion processing
Prior art date
Legal status
Active
Application number
CN202210668794.3A
Other languages
Chinese (zh)
Other versions
CN114758209A (en)
Inventor
钱祎剑
张斌
沈小勇
吕江波
Current Assignee
Suzhou Simou Intelligent Technology Co ltd
Shenzhen Smartmore Technology Co Ltd
Original Assignee
Suzhou Simou Intelligent Technology Co ltd
Shenzhen Smartmore Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Simou Intelligent Technology Co ltd, Shenzhen Smartmore Technology Co Ltd
Priority to CN202210668794.3A
Publication of CN114758209A
Application granted
Publication of CN114758209B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a convolution result obtaining method and device, computer equipment, and a storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring a feature map of an image to be processed under each channel; for the feature map under each channel, sliding a sliding window of a preset size over the feature map with a preset step length, and, for the current feature matrix in the sliding window, extracting a first preset amount of data from the current feature matrix each time according to a first preset rule; determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each datum in the weight matrix according to a second preset rule to obtain an adjusted weight matrix; and determining the convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix. With this method and device, the utilization rate of the computing units can be improved, and the consumption of logic resources thereby reduced.

Description

Convolution result obtaining method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a convolution result obtaining method and apparatus, a computer device, and a storage medium.
Background
In the artificial intelligence era, the application of Convolutional Neural Networks (CNNs) in various scenarios involves a large number of multiplication operations. The Winograd algorithm is a fast convolution algorithm whose optimization of multiplier resources in CNN convolution calculation is significant. The Winograd computation for optimizing a 3 × 3 convolution is: S = A^T[(G g G^T) ⊙ (B^T d B)]A, where ⊙ denotes the element-wise (dot) product. A sliding window of size 4 × 4 with step 2 is input on the feature map d, and for the data in the sliding window the Winograd operation comprises the B^T calculation, the B calculation, the matrix dot product, the A^T calculation, and the A calculation. Since the B and A matrices contain only 1, 0 and -1, these transforms can be done with adders instead of multipliers. However, Winograd needs to go through 4 matrix transformations in the calculation process (B^T d, B^T d B, and the two output transformations with A^T and A), which turns Winograd's optimization of multiplier resources into consumption of other logic resources and makes CNN deployment difficult on a low-cost FPGA (Field Programmable Gate Array) or a small ASIC (Application Specific Integrated Circuit).
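The formula above can be illustrated with a small pure-Python sketch (a hedged example, not the patent's implementation): it evaluates S = A^T[(G g G^T) ⊙ (B^T d B)]A for one 4 × 4 input tile using the commonly published F(2×2, 3×3) transform matrices, and checks the 2 × 2 result against direct 3 × 3 convolution.

```python
# Winograd F(2x2, 3x3): S = A^T [ (G g G^T) . (B^T d B) ] A
# Pure-Python sketch; matrix values follow the commonly published convention.

def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# B and A contain only 0, 1 and -1, so the input and output transforms
# need only adders and subtractors in hardware; G carries the 1/2 factors.
B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G   = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_tile(d, g):
    """2x2 output tile for a 4x4 input tile d and a 3x3 kernel g."""
    U = matmul(matmul(G, g), transpose(G))      # weight matrix G g G^T (4x4)
    V = matmul(matmul(B_T, d), transpose(B_T))  # input transform B^T d B (4x4)
    M = [[u * v for u, v in zip(ru, rv)] for ru, rv in zip(U, V)]  # dot product
    return matmul(matmul(A_T, M), transpose(A_T))  # output transform (2x2)

def direct_tile(d, g):
    """Reference: direct 3x3 convolution over the 4x4 tile (2x2 output)."""
    return [[sum(g[m][n] * d[i + m][j + n] for m in range(3) for n in range(3))
             for j in range(2)] for i in range(2)]
```

Both routines compute the same 2 × 2 tile; the Winograd path replaces most of the 36 multiplications of the direct path with the 16 of the dot product plus additions and subtractions.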
In the current common convolution result obtaining method, because the total data bit width increases after the matrix transformations, a large number of registers is used for data caching in the pipeline; meanwhile, to process the increased data in parallel, adders are added correspondingly, and these adders sit idle while the data cache is not yet filled. Logic resources are therefore consumed without reaching high utilization. Hence, the conventional convolution result obtaining method suffers from large logic resource consumption.
Disclosure of Invention
The application provides a convolution result obtaining method, a convolution result obtaining device, computer equipment and a storage medium, which can improve the utilization rate of a computing unit and further reduce the consumption of logic resources.
In a first aspect, the present application provides a convolution result obtaining method, including:
acquiring a characteristic diagram of an image to be processed under each channel;
for the feature map under each channel, sliding a sliding window of a preset size over the feature map with a preset step length, and, for the current feature matrix in the sliding window, extracting a first preset amount of data from the current feature matrix each time according to a first preset rule;
determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
and determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix.
In a second aspect, the present application further provides a convolution result obtaining apparatus, including:
the first acquisition module is used for acquiring a characteristic map of the image to be processed under each channel;
the extraction module is used for, for the feature map under each channel, sliding a sliding window of a preset size over the feature map with a preset step length, and extracting a first preset amount of data from the current feature matrix in the sliding window each time according to a first preset rule;
the adjusting module is used for determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each datum in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
and the second acquisition module is used for determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix.
In a third aspect, the present application further provides a computer device, where the computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the convolution result obtaining method when executing the computer program.
In a fourth aspect, the present application further provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above convolution result obtaining method.
In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the convolution result acquisition method described above.
According to the method, the feature map of the image to be processed under each channel is processed in sliding-window form, and data are extracted from the current feature matrix in the sliding window in the order given by the first preset rule, so the input bandwidth of the feature map can be compressed and the consumption of logic resources reduced. The weight matrix determined from the convolution kernel and the convolution kernel transformation matrix is adjusted according to the second preset rule, and the convolution result corresponding to the image to be processed is determined from the first preset amount of data extracted each time under each channel together with the adjusted weight matrix, which further improves the utilization rate of the computing units and thereby reduces the consumption of logic resources.
Drawings
Fig. 1 is an application environment diagram of a convolution result obtaining method according to an embodiment of the present application;
fig. 2 is a first flowchart illustrating a convolution result obtaining method according to an embodiment of the present application;
fig. 3 is a second flowchart illustrating a convolution result obtaining method according to an embodiment of the present application;
fig. 4 is a schematic sub-flow chart of S860 according to an embodiment of the present disclosure;
fig. 5 is a schematic view of a sub-flow of S864 according to an embodiment of the present disclosure;
fig. 6 is a third schematic flowchart of a convolution result obtaining method according to an embodiment of the present application;
fig. 7 is a fourth flowchart of a convolution result obtaining method according to an embodiment of the present application;
fig. 8 is a data pipeline structure diagram of a convolution result obtaining method according to an embodiment of the present application;
fig. 9 is a schematic diagram of an input method of a current feature matrix of a feature map according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a method for adjusting positions of data in a weight matrix according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a B^T module calculation method according to an embodiment of the present application;
fig. 12 is a schematic diagram illustrating a processing method of edge position data according to an embodiment of the present disclosure;
fig. 13 is a schematic diagram of a calculation method of a module B according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of an A^T and A module calculation method provided in the embodiments of the present application;
fig. 15 is a block diagram of a convolution result obtaining apparatus according to an embodiment of the present application;
fig. 16 is an internal structural diagram of a computer device according to an embodiment of the present application;
fig. 17 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
The convolution result obtaining method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the computer device 102 communicates with the server 104 over a communication network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or other network server. The computer device 102 acquires a feature map of the image to be processed under each channel; for a feature map under each channel, sliding a sliding window with a preset size on the feature map by a preset step length, and for a current feature matrix in the sliding window, extracting a first preset amount of data from the current feature matrix each time according to a first preset rule; determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix; and determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time in each channel and the adjusted weight matrix, and sending the convolution result to the server 104. The computer device 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or a server cluster comprised of multiple servers.
In some embodiments, as shown in fig. 2, a convolution result obtaining method is provided, which is described by taking the method as an example applied to the computer device 102 in fig. 1, and includes the following steps:
and S200, acquiring characteristic diagrams of the image to be processed under each channel.
Here, a channel refers to a channel of the input layer of the convolutional neural network, also called an input channel; the convolution kernels of the various input channels are not identical. The feature map refers to the feature map under each input channel of the convolutional neural network; the feature map of the image to be processed is obtained after the image is convolved by a filter in the convolutional neural network. Since one image to be processed corresponds to one input channel, and usually there is at least one image to be processed and a plurality of input channels, acquiring the feature map of the image to be processed under each channel means acquiring the feature map of each of the images to be processed under its corresponding input channel.
S400, for the feature map under each channel, sliding a sliding window with a preset size on the feature map by a preset step length, and extracting a first preset amount of data from the current feature matrix every time according to a first preset rule for the current feature matrix in the sliding window.
The step length and the sliding window are basic operations in constructing a convolutional neural network. The sliding window is a window of a preset size; the convolution kernel moves the sliding window over the feature map under each channel in left-to-right, top-to-bottom order so as to traverse every pixel of the feature map, and the distance the window moves each time is the step length. The current feature matrix in the sliding window is the matrix formed by the feature data covered by the current window position in the feature map. For example, for a feature map of size 24 × 24 with a 4 × 4 sliding window, a step length of 2, and padding of 1 (padding being a ring of pixels filled around the edge of the feature map), every pixel of the feature map is traversed after 144 slides, and the current feature matrix in the window is 4 × 4. If data were input at the granularity of the current feature matrix, 4 × 4 feature data would have to be fed into the convolutional neural network each time. In the present application, a first preset amount of data is instead extracted from the current feature matrix each time according to the first preset rule, so only that amount of data is input per cycle, which compresses the input bandwidth of the feature map.
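The traversal described above can be sketched as follows (a minimal illustration; the function name, zero-padding, and default parameters are assumptions, since the patent does not fix them):

```python
def extract_tiles(fmap, win=4, stride=2, pad=1):
    """Slide a win x win window with the given stride over a zero-padded
    feature map, left to right and top to bottom, and collect the current
    feature matrix at every window position."""
    h, w = len(fmap), len(fmap[0])
    ph, pw = h + 2 * pad, w + 2 * pad
    padded = [[0] * pw for _ in range(ph)]       # ring of padding pixels
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = fmap[i][j]
    tiles = []
    for i in range(0, ph - win + 1, stride):     # top to bottom
        for j in range(0, pw - win + 1, stride): # left to right
            tiles.append([row[j:j + win] for row in padded[i:i + win]])
    return tiles
```

For a 24 × 24 feature map with win = 4, stride = 2 and pad = 1, this yields exactly the 144 window positions counted in the text.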
S600, determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain the adjusted weight matrix.
The convolution kernels in the convolutional neural network are convolved in turn with image blocks at different positions in the feature map of the image to be processed under each channel to obtain an output image, which is output through the output channels of the convolutional layers of the network. Each input channel corresponds to one convolution kernel, and each output channel corresponds to one group of convolution kernels whose number equals the number of input channels: for example, with Ni input channels and 1 output channel there are Ni convolution kernels, and with Ni input channels and No output channels there are Ni × No convolution kernels. The convolution kernel transformation matrix is used to adjust the weights of a convolution kernel: the weight matrix is determined after the convolution kernel is convolved with the convolution kernel transformation matrix. The number of weight matrices equals the number of convolution kernels, and the size of a weight matrix may differ from the size of the convolution kernel. After the position of each datum in the weight matrix is determined, the positions are adjusted according to the second preset rule to obtain the adjusted weight matrix. The number of adjusted weight matrices equals the number of convolution kernels, and the adjusted weight matrices can be convolved in turn with image blocks at different positions in the feature map under each channel of the image to be processed to obtain the output image. The adjusted weight matrix has the same size and the same data as the weight matrix; only the positions of the data differ.
And S800, determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted in each channel each time and the adjusted weight matrix.
After the image to be processed is convolved by the convolutional neural network, the convolution result is output through the output channels. Based on the first preset amount of data extracted each time from the feature map under each input channel, multiple groups of the first preset amount of data are obtained after repeated extraction according to the size and step length of the sliding window. In the order in which they were input, these groups of data form multiple matrices, each with the same size as the current feature matrix in the sliding window. Each of these matrices is convolved with the adjusted weight matrix to obtain multiple convolution calculation results, which are then summed by channel to obtain groups of summed convolution calculation results; the number of summed results equals the number of output channels of the convolutional neural network, and the summed convolution calculation results are the convolution result corresponding to the image to be processed. For example, with 3 input channels and 1 output channel, a 24 × 24 feature map under each input channel, a 4 × 4 sliding window, a step length of 2, and padding = 1, the 3 adjusted weight matrices are all 4 × 4 and each input channel corresponds to one of them. Under each input channel, 2 data are extracted from the feature map each time according to the first preset rule; one 4 × 4 matrix is obtained every 8 cycles, the feature map of one channel is traversed in 576 cycles, yielding 144 4 × 4 matrices per channel and 432 matrices across the 3 channels. Winograd convolution is performed between each of the 144 matrices under each channel and the adjusted weight matrix corresponding to that channel, giving 432 convolution calculation results; summing these across the channels gives one 24 × 24 convolution calculation result, which is the convolution result corresponding to the image to be processed.
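The per-channel tiling and channel-wise summation in the example above can be sketched as follows, using a direct per-tile 3 × 3 convolution as a stand-in for the per-tile Winograd computation (a hedged illustration with assumed names and parameterized sizes, not the patent's hardware pipeline):

```python
def conv_tile(d, g, out=2, k=3):
    """Direct k x k convolution over one tile -> out x out result (stand-in
    for the per-tile Winograd computation described in the text)."""
    return [[sum(g[m][n] * d[i + m][j + n] for m in range(k) for n in range(k))
             for j in range(out)] for i in range(out)]

def multichannel_conv(fmaps, kernels, win=4, stride=2, pad=1):
    """One output channel: convolve each input channel tile by tile and
    accumulate the per-tile results across channels into one output map."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for fmap, g in zip(fmaps, kernels):   # one kernel per input channel
        padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
        for i in range(h):
            for j in range(w):
                padded[i + pad][j + pad] = fmap[i][j]
        for ti in range(0, h + 2 * pad - win + 1, stride):
            for tj in range(0, w + 2 * pad - win + 1, stride):
                tile = [row[tj:tj + win] for row in padded[ti:ti + win]]
                res = conv_tile(tile, g)          # 2x2 per-tile result
                for a in range(len(res)):         # sum over input channels
                    for b in range(len(res[0])):
                        out[ti + a][tj + b] += res[a][b]
    return out
```

With 3 input channels and 24 × 24 feature maps this produces one 24 × 24 output map, matching the accounting in the text (144 tiles per channel, 2 × 2 output per tile).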
In the convolution result obtaining method, the feature map of the image to be processed under each channel is processed in sliding-window form, and a first preset amount of data is extracted each time from the current feature matrix in the window in the order given by the first preset rule, so the input bandwidth of the feature map can be compressed and the consumption of logic resources reduced. The convolution result corresponding to the image to be processed is then determined from the first preset amount of data extracted each time under each channel and the adjusted weight matrix, which further improves the utilization rate of the computing units and reduces the consumption of logic resources.
In some embodiments, as shown in fig. 3, determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix includes:
s820, aiming at each channel, determining target data which is not subjected to first conversion processing from the first preset number of data extracted each time; according to the first input transformation matrix, performing first transformation processing on the target data to obtain a first transformation processing result corresponding to the target data;
s840, determining a first conversion processing matrix corresponding to the current feature matrix according to a first conversion processing result corresponding to data subjected to first conversion processing in the first preset amount of data extracted each time and a first conversion processing result corresponding to target data;
and S860, determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix in each channel and the adjusted weight matrix.
In this embodiment, for each channel, the target data not yet subjected to the first conversion processing is determined from the first preset amount of data extracted each time. Specifically: from the extracted data, the most recently input first preset amount of data is taken, in input order, as the target data not yet subjected to conversion processing; the amount of such target data equals the first preset amount.
The first conversion processing is performed on the target data according to the first input transformation matrix to obtain the first conversion processing result corresponding to the target data. Specifically: a first preset number of rows of data in the first input transformation matrix is convolved with the target data not yet subjected to the first conversion processing to obtain an initial conversion processing result for the target data; through this first conversion processing the target data is converted from unprocessed to processed and is input into registers for caching, and the initial conversion processing result contains a first preset number of columns of data. The data in the first input transformation matrix includes only 0, 1 and -1, so the first conversion processing requires only an adder and a subtractor. Target data not yet subjected to the first conversion processing continues to be acquired, is combined with the target data already processed, and is convolved with the first preset number of rows of data in the first input transformation matrix to obtain the first conversion processing result corresponding to the target data, which contains a first preset number of columns of data.
According to the first conversion processing result corresponding to the already-processed data among the first preset amount of data extracted each time, and the first conversion processing result corresponding to the target data, the column data of the first conversion processing matrix corresponding to the current feature matrix is determined. The column data is collected to obtain the first conversion processing matrix corresponding to the current feature matrix, and the first conversion processing matrix under each channel is convolved with the adjusted weight matrix to determine the convolution result corresponding to the image to be processed.
In the solution of the foregoing embodiment, the first conversion processing result corresponding to the target data is obtained by applying the first conversion processing to the not-yet-processed data with the first input transformation matrix. The first conversion processing needs only an adder and a subtractor, simplifying the matrix calculation into additions and subtractions and improving calculation efficiency. The data that has undergone the first conversion processing is cached in registers, and only a first preset number of registers is needed, which saves register resources. The first conversion processing matrix corresponding to the current feature matrix is determined from the first conversion processing results of the already-processed data and of the target data, and determining the convolution result for the image to be processed from the first conversion processing matrix under each channel and the adjusted weight matrix further improves the utilization rate of the computing unit, thereby reducing the consumption of logic resources.
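The column-wise first conversion processing can be illustrated for the F(2×2, 3×3) case, where B^T contains only 0, 1 and -1, so each transformed column needs only additions and subtractions. The two-values-per-cycle buffering below is an assumption modeled on the "first preset amount" of 2 used in the earlier example, not the patent's exact register layout:

```python
from collections import deque

def bt_stream(chunks):
    """Toy pipeline for the first conversion processing: values arrive a few
    at a time, are buffered in a small register file, and once a full
    4-element column is available it is multiplied by B^T using only
    additions and subtractions (B^T holds only 0, 1, -1)."""
    buf = deque()
    columns = []
    for chunk in chunks:          # e.g. 2 new values per cycle
        buf.extend(chunk)
        while len(buf) >= 4:
            d0, d1, d2, d3 = (buf.popleft() for _ in range(4))
            # One column of B^T d for the standard F(2x2, 3x3) B matrix:
            columns.append([d0 - d2, d1 + d2, d2 - d1, d1 - d3])
    return columns
```

Only four register slots are ever occupied at once, reflecting the text's point that the first preset number of registers suffices.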
In some embodiments, as shown in fig. 4, determining a convolution result corresponding to the image to be processed based on the first transformation processing matrix corresponding to the current feature matrix in each channel and the adjusted weight matrix includes:
s862, extracting a second preset number of line data from the first conversion processing matrix corresponding to the current feature matrix for each channel; determining a second conversion processing result corresponding to the second preset number of row data based on the second preset number of row data and the second input transformation matrix;
s864, determining a convolution result corresponding to the image to be processed based on the second conversion processing result corresponding to the second preset number of line data in each channel and the adjusted weight matrix.
In this embodiment, for each channel, a second preset number of rows of data is extracted each time from the first conversion processing matrix corresponding to the current feature matrix and convolved with the second input transformation matrix to determine the corresponding second conversion processing result. The data in the second input transformation matrix includes only 0, 1 and -1, so the second conversion processing likewise needs only an adder and a subtractor. The second conversion processing result corresponding to the second preset number of rows of data is part of the data in a matrix, and the amount of that partial data equals the second preset number. After the second preset number of rows of data has been extracted multiple times and the corresponding second conversion processing results determined, those results can form a matrix. The second conversion processing result under each channel is convolved with the adjusted weight matrix under that channel to determine the convolution result corresponding to the image to be processed.
According to the scheme of this embodiment, extracting the second preset number of rows of data each time from the first conversion processing matrix ensures that those rows undergo matrix conversion with the second input transformation matrix, and the corresponding second conversion processing results are obtained in sequence. The input order of the first conversion processing matrix determines the output order of the second conversion processing results, so those results can be combined with the adjusted weight matrix to obtain the convolution result corresponding to the image to be processed. The second conversion processing needs only an adder and a subtractor, simplifying the matrix calculation into additions and subtractions and improving calculation efficiency.
In some embodiments, as shown in fig. 5, determining a convolution result corresponding to the image to be processed based on the second conversion processing result corresponding to the second preset number of rows of data in each channel and the adjusted weight matrix includes:
S865: for each channel, performing dot product processing on the adjusted weight matrix and the second conversion processing result corresponding to the second preset number of rows of data, to obtain a third conversion processing result corresponding to the second preset number of rows of data;
S866: accumulating the third conversion processing results corresponding to the second preset number of rows of data across the channels, to obtain a fourth conversion processing result corresponding to the second preset number of rows of data;
S867: determining a second conversion processing matrix corresponding to the current feature matrix based on the fourth conversion processing result corresponding to each extracted group of the second preset number of rows of data;
S868: determining the convolution result corresponding to the image to be processed based on the second conversion processing matrix corresponding to the current feature matrix, the first output transformation matrix and the second output transformation matrix.
In this embodiment, dot product calculation multiplies the corresponding elements of two matrices to obtain their dot product result. For each channel, performing dot product processing on the adjusted weight matrix and the second conversion processing result corresponding to the second preset number of rows of data means multiplying each datum in the adjusted weight matrix of that channel with the datum at the corresponding position in that second conversion processing result, yielding the third conversion processing result, which may itself be a matrix. The third conversion processing results are then accumulated channel by channel to obtain the fourth conversion processing result. For example, with 3 input channels and 1 output channel, the third conversion processing results of the 3 input channels are summed to obtain 1 output result, and this result of the output channel is the fourth conversion processing result corresponding to the second preset number of rows of data.
The second conversion processing results corresponding to the current feature matrix comprise a plurality of results; the fourth conversion processing results corresponding to each extracted group of the second preset number of rows of data form matrices in the order of extraction, and these matrices constitute the second conversion processing matrix corresponding to the current feature matrix. The second conversion processing matrix is transformed with the first output transformation matrix and the second output transformation matrix respectively, and the result obtained is the convolution result corresponding to the image to be processed.
In the scheme of the above embodiment, for each channel, the adjusted weight matrix and the second conversion processing result corresponding to the second preset number of rows of data undergo dot product processing and accumulation to obtain the fourth conversion processing result. Because one second conversion processing result is taken each time for the dot product with the adjusted weight matrix, the output order of the fourth conversion processing results is determined, so that the second conversion processing matrix corresponding to the current feature matrix can be transformed with the first output transformation matrix and the second output transformation matrix in that order to obtain the convolution result corresponding to the image to be processed. This further improves the utilization rate of the calculation units and thereby reduces the consumption of logic resources.
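As a minimal numerical sketch of steps S865 and S866 (the array shapes and the name `accumulate_channels` are illustrative assumptions, not part of the embodiment), the per-channel dot product and the cross-channel accumulation can be expressed as:

```python
import numpy as np

def accumulate_channels(second_results, adjusted_weights):
    """S865/S866 sketch: per-channel element-wise (dot) product of the
    adjusted weight matrix with the second conversion processing result,
    then accumulation over the input channels.
    Both inputs: (channels, 4, 4) arrays (hypothetical layout)."""
    third = second_results * adjusted_weights   # S865: dot product per channel
    fourth = third.sum(axis=0)                  # S866: accumulate over channels
    return fourth

rng = np.random.default_rng(1)
V = rng.standard_normal((3, 4, 4))   # second conversion results, 3 input channels
W = rng.standard_normal((3, 4, 4))   # adjusted weight matrices, 3 input channels
Y = accumulate_channels(V, W)        # one 4x4 result for the single output channel
```

In hardware this accumulation is a running sum over channels rather than a dense tensor operation, but the arithmetic is the same.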
In some embodiments, as shown in fig. 6, the predetermined size is 4 × 4, and the first predetermined number is 2; according to a first preset rule, extracting a first preset amount of data from the current feature matrix every time, wherein the method comprises the following steps:
S420: extracting the data of the 2nd row and the 3rd row in the 1st column of the current feature matrix, then the data of the 1st row and the 4th row in the 1st column; then the data of the 2nd row and the 3rd row in the 2nd column, followed by the data of the 1st row and the 4th row in the 2nd column; then the data of the 2nd row and the 3rd row in the 3rd column, followed by the data of the 1st row and the 4th row in the 3rd column; and finally the data of the 2nd row and the 3rd row in the 4th column, followed by the data of the 1st row and the 4th row in the 4th column.
In this embodiment, after the current feature matrix of a sliding window has been extracted, the sliding window is slid on the feature map by the preset step length, from left to right and from top to bottom, to obtain a new sliding window and a new current feature matrix; for each new current feature matrix, 2 data are again extracted each time according to the first preset rule, until the sliding window has traversed all corresponding data in the feature map.
In the scheme of the embodiment, by setting the preset size of the sliding window to be 4 × 4 and the first preset number to be 2, the sliding window of 4 × 4 is slid on the feature map by a preset step length, the size of the current feature matrix in the sliding window of 4 × 4 is 4 × 4, and according to the first preset rule, 2 pieces of data are extracted from the current feature matrix each time, so that the input bandwidth of the feature map can be compressed, and the resource consumption of the computing unit is reduced.
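The first preset rule of S420 can be sketched as follows (the function name and the pair representation are illustrative assumptions); for each column of the 4 × 4 tile, the middle two rows are emitted first, then the head and tail rows:

```python
import numpy as np

def first_preset_rule_order(d):
    """Return the data of a 4x4 current feature matrix d as pairs, in the
    first-preset-rule order: per column, rows 2 and 3 first (1-indexed),
    then rows 1 and 4."""
    pairs = []
    for col in range(4):
        pairs.append((int(d[1, col]), int(d[2, col])))  # rows 2 and 3
        pairs.append((int(d[0, col]), int(d[3, col])))  # rows 1 and 4
    return pairs

d = np.arange(16).reshape(4, 4)   # element value == row-major position 0..15
order = first_preset_rule_order(d)
```

Only 2 values travel per cycle, which is why the input bandwidth stays at 2 data in parallel.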
In some embodiments, as shown in fig. 7, the convolution kernel size is 3 × 3, the weight matrix size is 4 × 4, and the adjusting of the position of each data in the weight matrix according to the second preset rule to obtain the adjusted weight matrix includes:
S620: arranging the data of the 2nd row 2nd column, 3rd row 2nd column, 2nd row 3rd column and 3rd row 3rd column of the weight matrix in sequence in the 1st row 1st column, 1st row 2nd column, 1st row 3rd column and 1st row 4th column of the adjusted weight matrix;
S640: arranging the data of the 1st row 2nd column, 4th row 2nd column, 1st row 3rd column and 4th row 3rd column of the weight matrix in sequence in the 2nd row 1st column, 2nd row 2nd column, 2nd row 3rd column and 2nd row 4th column of the adjusted weight matrix;
S660: arranging the data of the 2nd row 1st column, 3rd row 1st column, 2nd row 4th column and 3rd row 4th column of the weight matrix in sequence in the 3rd row 1st column, 3rd row 2nd column, 3rd row 3rd column and 3rd row 4th column of the adjusted weight matrix;
S680: arranging the data of the 1st row 1st column, 4th row 1st column, 1st row 4th column and 4th row 4th column of the weight matrix in sequence in the 4th row 1st column, 4th row 2nd column, 4th row 3rd column and 4th row 4th column of the adjusted weight matrix.
In this embodiment, a convolution kernel of size 3 × 3 is selected. The convolution kernel transform matrices comprise a first convolution kernel transform matrix of size 4 × 3 and a second convolution kernel transform matrix of size 3 × 4, the first being the transpose of the second. The convolution kernel is transformed with the first and second convolution kernel transform matrices respectively, yielding a weight matrix of size 4 × 4. The position of each datum in the weight matrix is then adjusted according to the second preset rule to obtain the adjusted weight matrix, whose data can be input into the convolutional neural network in parallel to participate in the convolution calculation.
In the scheme of this embodiment, with a convolution kernel of size 3 × 3 and a weight matrix of size 4 × 4, the positions of the data in the weight matrix are adjusted according to the second preset rule to obtain the adjusted weight matrix. The second preset rule ensures that, when the adjusted weight matrix and the second conversion processing result corresponding to the second preset number of rows of data undergo dot product processing, the order of the data in the adjusted weight matrix matches the order of the data in that second conversion processing result in the subsequent calculation, which helps reduce the consumption of logic resources.
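Steps S620–S680 amount to a fixed permutation of the 4 × 4 weight matrix. A short sketch (the function name is hypothetical; the 1-indexed positions follow the text):

```python
import numpy as np

def rearrange_weight(gp):
    """Reorder the 4x4 weight matrix per the second preset rule, so the
    weights stream in the same order as the transformed feature data
    (centre block first, then edges, then corners)."""
    r = lambda i, j: gp[i - 1, j - 1]   # 1-indexed access, as in the text
    return np.array([
        [r(2, 2), r(3, 2), r(2, 3), r(3, 3)],   # S620: centre 2x2 block
        [r(1, 2), r(4, 2), r(1, 3), r(4, 3)],   # S640: top/bottom of middle columns
        [r(2, 1), r(3, 1), r(2, 4), r(3, 4)],   # S660: left/right of middle rows
        [r(1, 1), r(4, 1), r(1, 4), r(4, 4)],   # S680: four corners
    ])

gp = np.arange(16).reshape(4, 4)   # labels 0..15 in row-major order, as in fig. 10
adj = rearrange_weight(gp)
```

With the 0–15 labelling of fig. 10, the first adjusted row carries labels 5, 9, 6, 10 — the four centre positions.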
In some embodiments, the second preset number is 2, and extracting, each time from the first conversion processing matrix corresponding to the current feature matrix, a second preset number of line data includes: firstly, extracting the data of the 2 nd row and the 3 rd row in the first conversion processing matrix corresponding to the current feature matrix, and then extracting the data of the 1 st row and the 4 th row in the first conversion processing matrix corresponding to the current feature matrix.
In this embodiment, the second preset number is set to 2, and 2 rows of data are extracted each time from the first conversion processing matrix corresponding to the current feature matrix: first the data of the 2nd row and the 3rd row, then the data of the 1st row and the 4th row. Since a first preset number of data is extracted each time, the first conversion processing results are output a first preset number at a time; the first conversion processing matrix corresponding to the current feature matrix is determined from these first conversion processing results, and 2 rows of data are then extracted from it each time.
When the first preset number is 2, the preset size of the sliding window is 4 × 4 and the preset step is 2, the number of registers configured to buffer the first conversion processing results may be 12. Half of the data of each input current feature matrix is duplicated, and the duplicated data can be multiplexed in the convolution calculation of the next sliding window. Data of 2 current feature matrices are input every cycle and 2 first conversion processing results are output every cycle. Taking the output over 4 cycles: in the first cycle, the 2 newly output first conversion processing results are added to the 8 buffered from the previous cycle, so the first cycle actually holds 10 first conversion processing results, corresponding to the 4 data of the 1st column, the 4 data of the 2nd column, the 1 datum of the 3rd column 2nd row and the 1 datum of the 3rd column 3rd row of the first conversion processing matrix corresponding to the current feature matrix. In the second cycle, the 2 new results are added to the multiplexed 10 buffered results, giving 12 in total, corresponding to the 4 data of the 1st column, the 4 data of the 2nd column and the 4 data of the 3rd column. In the third cycle, the 2 new results are added to the multiplexed 12 buffered results, giving 14 in total, corresponding to the 4 data of the 1st column, the 4 data of the 2nd column, the 4 data of the 3rd column, the 1 datum of the 4th column 2nd row and the 1 datum of the 4th column 3rd row. In the fourth cycle, the 2 new results are added to the 12 buffered from the previous cycle, again giving 14 in total, corresponding to the 1 datum of the 1st column 1st row, the 1 datum of the 1st column 4th row, the 4 data of the 2nd column, the 4 data of the 3rd column and the 4 data of the 4th column.
Extracting first the data of the 2nd row and the 3rd row and then the data of the 1st row and the 4th row of the first conversion processing matrix corresponding to the current feature matrix proceeds specifically as follows. In the first cycle, the data of the 2nd row and the 3rd row are extracted and transformed with the 2nd column and the 3rd column of the second input conversion matrix, giving a group of second conversion processing results corresponding to the second preset number of rows of data, namely the data of the 2nd row 2nd column, 2nd row 3rd column, 3rd row 2nd column and 3rd row 3rd column of the corresponding matrix. In the second cycle, the data of the 1st row and the 4th row are extracted and transformed with the 2nd column and the 3rd column of the second input conversion matrix, giving the data of the 1st row 2nd column, 4th row 2nd column, 1st row 3rd column and 4th row 3rd column. In the third cycle, the data of the 2nd row and the 3rd row are extracted and transformed with the 1st column and the 4th column of the second input conversion matrix, giving the data of the 2nd row 1st column, 3rd row 1st column, 2nd row 4th column and 3rd row 4th column. In the fourth cycle, the data of the 1st row and the 4th row are extracted and transformed with the 1st column and the 4th column of the second input conversion matrix, giving the data of the 1st row 1st column, 4th row 1st column, 1st row 4th column and 4th row 4th column of the corresponding matrix.
In the scheme of the embodiment, by setting the second preset number to be 2, 2 rows of data are extracted from the first conversion processing matrix corresponding to the current feature matrix each time, and the extraction sequence is that the middle two rows are extracted first, and then the upper and lower two rows are extracted.
To describe the convolution result obtaining method and its effect in detail, a detailed embodiment follows:
Take as an example a convolutional neural network based on the Winograd F(2 × 2, 3 × 3) algorithm with a 3 × 3 convolution, step size 1 and padding = 1. The calculation process of the Winograd algorithm is: S = A^T[(G·g·G^T) ⊙ (B^T·d·B)]·A, where ⊙ denotes the element-wise product, d is the current feature matrix, B^T is the first input transform matrix, B is the second input transform matrix, g is the convolution kernel, G is the first convolution kernel transform matrix, G^T is the second convolution kernel transform matrix, G·g·G^T is the weight matrix, recorded as g', A^T is the first output transformation matrix, A is the second output transformation matrix, and S is the convolution result corresponding to the image to be processed.
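The concrete transform matrices appear in the original only as an image. Below, the standard Winograd F(2 × 2, 3 × 3) matrices are assumed (they are consistent with the B^T, B and A^T calculations described later), and the formula S = A^T[(G·g·G^T) ⊙ (B^T·d·B)]·A is checked in numpy against direct 3 × 3 correlation:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transforms (an assumption; the patent's own
# matrices are given in an image not reproduced here).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """S = A^T[(G g G^T) . (B^T d B)]A for one 4x4 tile d and 3x3 kernel g."""
    gp = G @ g @ G.T        # 4x4 weight matrix g'
    D2 = BT @ d @ BT.T      # B^T d B (B is the transpose of B^T)
    return AT @ (gp * D2) @ AT.T

def direct_3x3(d, g):
    """Reference: direct 3x3 correlation of the 4x4 tile, 2x2 output."""
    out = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
assert np.allclose(winograd_f2x2_3x3(d, g), direct_3x3(d, g))
```

Each 4 × 4 tile thus yields a 2 × 2 output block with 16 multiplications instead of 36.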
the size of the sliding window is 4 × 4, the step length is 2, the first preset number is 2, and the second preset number is 2, for obtaining the convolution resultThe data pipeline of (A) is as shown in FIG. 8, the pipeline includes B T Module, B T d data buffer, B module, multiplication module, accumulation sum module and A T The input of the pipeline comprises a feature map of the image to be processed under 3 input channels and a pass GgG T The output of the pipeline is the convolution result corresponding to the image to be processed of 1 channel, as shown in fig. 9, the input method of the current feature matrix d in a channel feature map is shown schematically, and when padding takes 1, the input method of d is as follows: and starting from each odd row i of d, sequentially transmitting two data of an i +1 th row and an i-1 th row from left to right in each column, and then transmitting two data of an i +2 th row and an i-1 th row, wherein if the row sequence number i exceeds the size of d, 0 is supplemented. GgG T The size of the weight matrix g 'of the matrix transformation is 4 × 4, and as shown in fig. 
10, the size is a schematic diagram of adjusting the position of each data in the weight matrix, and each weight matrix g' is labeled with serial numbers 0 to 15 according to the sequence of first column and second row, and is divided into 4 partial data and rearranged into the adjusted weight matrix, specifically: arranging data of a 2 nd row, a 2 nd column, a 3 rd row, a 2 nd column, a 2 nd row, a 3 rd column and a 3 rd row, a 3 rd column in the weight matrix g' in a 1 st row, a 2 nd column, a 1 st row, a 3 rd column and a 1 st row, a 4 th column in sequence; arranging the data of the 1 st row, the 2 nd column, the 4 th row, the 2 nd column, the 1 st row, the 3 rd column and the 4 th row, the 3 rd column in the weight matrix g' in the 2 nd row, the 1 st column, the 2 nd row, the 2 nd column, the 2 nd row, the 3 rd column and the 2 nd row, the 4 th column in sequence; arranging the data of the 2 nd row, the 1 st column, the 3 rd row, the 1 st column, the 2 nd row, the 4 th column and the 3 rd row, the 4 th column in the weight matrix g' in the 3 rd row, the 1 st column, the 3 rd row, the 2 nd column, the 3 rd row, the 3 rd column and the 3 rd row, the 4 th column in sequence; and sequentially arranging the data of the 1 st row, the 1 st column, the 4 th row, the 1 st column, the 1 st row, the 4 th column and the 4 th row, the 4 th column in the weight matrix g' in the 1 st row, the 1 st column, the 4 th row, the 2 nd column, the 4 th row, the 3 rd column and the 4 th row, the 4 th column to obtain an adjusted weight matrix, and then inputting the data in the adjusted weight matrix to the multiplication module in parallel. 
The B^T module in the pipeline comprises 1 adder and 1 subtractor, each able to work in both addition and subtraction modes. The adder and the subtractor each have 1 output register for buffering their output, and there are two input registers, each buffering one input datum of d. The calculation result of B^T·d is recorded as D1, and a superscript (n) indicates data buffered for n cycles. Fig. 11 is a schematic diagram of the B^T module calculation method, specifically: the B^T module computes D1 column by column, decomposing B^T·d into independent operations on each column of d. When the input data are the middle data of a column of d, the adder works in addition mode and two column data of D1 are calculated, D1_1 and D1_2, where D1_1 = d1 + d2 and D1_2 = d2 - d1; meanwhile the input registers buffer the 2 input data. When the input data are the head and tail data of a column of d, the adder works in subtraction mode, and the remaining two column data of D1, D1_0 and D1_3, are calculated from the 2 input data and the 2 data buffered for 1 cycle in the input registers, where D1_0 = d0 - d2^(1) and D1_3 = d1^(1) - d3. The columns of d are processed in turn in this way to complete the B^T·d operation. Because the Winograd 3 × 3 convolution algorithm takes out the data of a 4 × 4 sliding window stepping by 2 on d each time, in row-then-column order, 2 columns of the D1 data calculated from adjacent sliding windows are repeated, so the B^T module outputs one D1 matrix every 4 cycles.
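One column of the B^T module's calculation can be sketched as follows (the variable names d0–d3 follow the text; the check against a full B^T multiply assumes the standard F(2 × 2, 3 × 3) input transform):

```python
import numpy as np

# Standard B^T for Winograd F(2x2, 3x3), assumed consistent with the text.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)

def bt_column(d0, d1, d2, d3):
    """B^T applied to one column of d using only adds/subtracts, in the
    order the module produces them: middle pair first, then head/tail pair."""
    D1_1 = d1 + d2   # adder in addition mode
    D1_2 = d2 - d1   # subtractor
    D1_0 = d0 - d2   # adder in subtraction mode, d2 from the input register
    D1_3 = d1 - d3   # subtractor, d1 from the input register
    return np.array([D1_0, D1_1, D1_2, D1_3])

col = np.array([3.0, 1.0, 4.0, 1.5])
assert np.allclose(bt_column(*col), BT @ col)
```

No multiplier is needed anywhere in this stage, which is the point of B containing only 0, 1 and -1.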
The B^T·d data cache in the pipeline comprises 6 levels of 2 registers each, every register having an initial value of 0 and buffering output data of the B^T module; every 2 registers form 1 level of data cache, and 6 cycles of D1 data can be cached, containing the D1 results calculated from 3 columns of d. These data are used as data of two adjacent D1 matrices for the B module calculation. When padding = 1, edge values of 0 occur; fig. 12 is a schematic diagram of the processing method for edge-position data of the input d. With a continuous d input sequence, the input 4 × 4 sliding window meets positions at the tail and head of a row of d whose data are 0. When a 0 value of the row-tail edge padding is met, a data selector selects the 0 datum to input to the next pipeline stage, and the value in the data cache is not changed; when a 0 value of the row-head edge padding is met, the corresponding data register is directly reset. The B module comprises 2 adders and 2 subtractors which can work in both addition and subtraction modes, each with 1 output register; the B module completes its calculation from the output of the B^T module and the B^T·d data cache. Fig. 13 is a schematic diagram of the B module calculation method: the B^T·d·B matrix is obtained through 4 cycles of calculation, and the result is recorded as D2. Cycle 1: the 2 adders work in addition mode, the middle two rows of D1 are transformed with the middle two columns of B, and the data in the middle of the D2 matrix are calculated. Cycle 2: the 2 adders work in addition mode, the upper and lower rows of D1 are transformed with the middle two columns of B, and the middle left and right data of the D2 matrix are calculated. Cycle 3: the 2 adders work in subtraction mode, the middle two rows of D1 are transformed with the left and right columns of B, and the upper and lower data in the middle of the D2 matrix are calculated. Cycle 4: the 2 adders work in subtraction mode, the upper and lower rows of D1 are transformed with the left and right columns of B, and the data of the 4 corners of the D2 matrix are calculated. The data output of the B module is thus in the same order as the input of the adjusted weight matrix. The multiplication module comprises 4 multipliers and 4 one-out-of-four data selectors; it receives the data output by the B module and the input adjusted weight matrix for dot product calculation. In every cycle the data selectors select, from the adjusted weight matrix, the 4 data corresponding to the 4 D2 data output by the B module and calculate their dot product, completing the dot product of D2 with the adjusted weight matrix in 4 cycles.
The accumulation-and-sum module sums the results of all 3 input channels corresponding to 1 output channel, specifically: the dot product results of D2 with the adjusted weight matrix of all 3 input channels are summed; 1 summation result matrix is obtained every 4 cycles, and the calculated summation result matrix is recorded as the Y matrix. The A^T·A module comprises 4 adders, add1, add2, add3 and add4, and 4 subtractors, sub1, sub2, sub3 and sub4; it calculates a 2 × 2 matrix recorded as C_out, which is the convolution result corresponding to the image to be processed. Fig. 14 is a schematic diagram of the A^T·A module calculation method; its calculation is completed in 4 cycles. In the first cycle, the data of the 2nd row 2nd column, 3rd row 2nd column, 2nd row 3rd column and 3rd row 3rd column of the Y matrix are input; in the second cycle, the data of the 1st row 2nd column, 4th row 2nd column, 1st row 3rd column and 4th row 3rd column; in the third cycle, the data of the 2nd row 1st column, 3rd row 1st column, 2nd row 4th column and 3rd row 4th column; in the fourth cycle, the data of the 1st row 1st column, 4th row 1st column, 1st row 4th column and 4th row 4th column. The Y matrix is transformed with A^T and A to obtain C0_out, C1_out, C2_out and C3_out, where, labelling the elements of Y as Y0 to Y15 in row-major order: C0_out = Y0 + Y1 + Y2 + Y4 + Y5 + Y6 + Y8 + Y9 + Y10; C1_out = Y1 - Y2 - Y3 + Y5 - Y6 - Y7 + Y9 - Y10 - Y11; C2_out = Y4 + Y5 + Y6 - Y8 - Y9 - Y10 - Y12 - Y13 - Y14; C3_out = Y5 - Y6 - Y7 - Y9 + Y10 + Y11 - Y13 + Y14 + Y15.
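The four expanded sums for C0_out–C3_out can be checked against the matrix form A^T·Y·A (the standard F(2 × 2, 3 × 3) output transform is assumed), with Y labelled Y0–Y15 in row-major order:

```python
import numpy as np

# Standard A^T for Winograd F(2x2, 3x3), assumed consistent with the text.
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def output_transform(Ymat):
    """A^T Y A: 4x4 summed matrix Y -> 2x2 convolution result C_out."""
    return AT @ Ymat @ AT.T

Ymat = np.arange(16, dtype=float).reshape(4, 4)
Y = Ymat.ravel()                      # Y0..Y15 in row-major order
C = output_transform(Ymat)
# The expanded sums from the text, term by term:
assert np.isclose(C[0, 0], Y[0]+Y[1]+Y[2]+Y[4]+Y[5]+Y[6]+Y[8]+Y[9]+Y[10])
assert np.isclose(C[0, 1], Y[1]-Y[2]-Y[3]+Y[5]-Y[6]-Y[7]+Y[9]-Y[10]-Y[11])
assert np.isclose(C[1, 0], Y[4]+Y[5]+Y[6]-Y[8]-Y[9]-Y[10]-Y[12]-Y[13]-Y[14])
assert np.isclose(C[1, 1], Y[5]-Y[6]-Y[7]-Y[9]+Y[10]+Y[11]-Y[13]+Y[14]+Y[15])
```

Every term carries coefficient +1 or -1, which is why the module needs only adders and subtractors.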
The usage of the adders and subtractors in each cycle is as follows. First cycle: the input data of the Y matrix are Y5, Y6, Y9 and Y10; add1 calculates Y5 + Y9, add2 calculates Y6 + Y10, sub1 calculates Y5 - Y9, sub2 calculates Y6 - Y10, leaving 2 adders and 2 subtractors to compute the A^T·A of the previous Winograd round. Second cycle: the input data are Y1, Y2, Y13 and Y14; add1 calculates add1 + Y1, obtaining Y5 + Y9 + Y1; add2 calculates add2 + Y2, obtaining Y6 + Y10 + Y2; sub1 calculates sub1 - Y13, obtaining Y5 - Y9 - Y13; sub2 calculates sub2 - Y14, obtaining Y6 - Y10 - Y14; 2 adders and 2 subtractors remain idle. Third cycle: the input data are Y4, Y8, Y7 and Y11; add1 calculates add1 + Y4, obtaining Y5 + Y9 + Y1 + Y4; add2 calculates add2 + Y7, obtaining Y6 + Y10 + Y2 + Y7; sub1 calculates sub1 - Y8, obtaining Y5 - Y9 - Y13 - Y8; sub2 calculates sub2 - Y11, obtaining Y6 - Y10 - Y14 - Y11; add3 calculates add2 + Y8, obtaining Y6 + Y10 + Y2 + Y8; add4 calculates sub2 + Y4, obtaining Y6 - Y10 - Y14 + Y4; sub3 calculates add1 - Y11, obtaining Y5 + Y9 + Y1 - Y11; sub4 calculates sub1 - Y7, obtaining Y5 - Y9 - Y13 - Y7. Fourth cycle: the input data are Y0, Y3, Y12 and Y15; add1 calculates add1 + Y0, obtaining Y5 + Y9 + Y1 + Y4 + Y0; add2 calculates add2 + Y3, obtaining Y6 + Y10 + Y2 + Y7 + Y3; sub1 calculates sub1 - Y12, obtaining Y5 - Y9 - Y13 - Y8 - Y12; sub2 calculates sub2 - Y15, obtaining Y6 - Y10 - Y14 - Y11 - Y15; add3, add4, sub3 and sub4 add or subtract 0 and hold their values. In the first cycle of the next round of the A^T·A module: add3 calculates add1 + add3, obtaining Y5 + Y9 + Y1 + Y4 + Y0 + Y6 + Y10 + Y2 + Y8, i.e. C0_out; add4 calculates add4 + sub1, obtaining Y6 - Y10 - Y14 + Y4 + Y5 - Y9 - Y13 - Y8 - Y12, i.e. C2_out; sub3 calculates sub3 - add2, obtaining Y5 + Y9 + Y1 - Y11 - Y6 - Y10 - Y2 - Y7 - Y3, i.e. C1_out; sub4 calculates sub4 - sub2, obtaining Y5 - Y9 - Y13 - Y7 - Y6 + Y10 + Y14 + Y11 + Y15, i.e. C3_out; the remaining 2 adders and 2 subtractors calculate the A^T·A of the next Winograd round.
According to the convolution result obtaining method, the feature maps of the image to be processed in each channel are processed by extracting, in the order of the first preset rule, a first preset number of data each time from the current feature matrix in the sliding window; only 2 feature map data need to be input in parallel, which relieves the data input bandwidth pressure of the pipeline, reduces the data input bandwidth required by the calculation modules, reduces the consumption of logic resources and lowers the hardware cost. The weight matrix determined from the convolution kernel and the convolution kernel transformation matrices is adjusted according to the second preset rule, and the convolution result corresponding to the image to be processed is determined from the first preset number of data extracted each time under each channel together with the adjusted weight matrix, which further improves the utilization rate of the calculation units and further reduces the consumption of logic resources: only 2 adders and 2 subtractors of the A^T·A module have a per-cycle utilization rate of 75%, while the other calculation units reach 100% utilization under continuous data input, improving the usage efficiency of the calculation units.
In addition, there are several possible schemes for computing the A^T·A module; the one described here is characterized by the data output order of the B module: in the A^T·A calculation a high/low usage-frequency arrangement is used, with the data in the middle of the Y matrix computed first and the data at the 4 corners of the Y matrix computed last. The output data rate can be adjusted at the output of the A^T·A module by adding registers and data selectors: taking as an example the output of 4 data once every 4 cycles, it can also be adjusted to output 2 data once every 2 cycles, or 1 datum every cycle, and if a still higher output data rate is needed it can be achieved by replicating the pipeline. The order of the accumulation module and the A^T·A module can be exchanged according to the actual hardware implementation without affecting the final result, although placing the A^T·A module first consumes more logic resources; the case is not excluded in which multiple A^T·A hardware units cost less than multiple accumulation modules. The computing units in the above method include adders, subtractors, adder-subtractors and multipliers, which may be implemented with different IP cores (Intellectual Property cores) or circuits depending on the actual hardware scheme; the case of optimizing resources or pipelines inside a computing unit to improve the performance of the whole circuit, without changing the external data flow, is likewise not excluded.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily executed consecutively but may be executed in turn with, or alternately with, other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a convolution result acquisition device. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so the specific limitations in the following embodiment of the convolution result obtaining device may refer to the limitations on the convolution result obtaining method in the above, and are not described herein again.
In some embodiments, as shown in fig. 15, there is provided a convolution result acquisition apparatus 100 including:
a first obtaining module 120, configured to obtain feature maps of the to-be-processed image in each channel;
an extracting module 140, configured to slide a sliding window with a preset size on the feature map by a preset step length for the feature map under each channel, and extract, according to a first preset rule, a first preset amount of data from a current feature matrix in the sliding window each time;
the adjusting module 160 is configured to determine a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjust the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
the second obtaining module 180 is configured to determine a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time in each channel and the adjusted weight matrix.
In some embodiments, in terms of determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time in each channel and the adjusted weight matrix, the second obtaining module 180 is specifically configured to:
for each channel, determining target data which is not subjected to first conversion processing from the first preset number of data extracted each time; according to the first input transformation matrix, performing first transformation processing on the target data to obtain a first transformation processing result corresponding to the target data;
determining a first conversion processing matrix corresponding to the current feature matrix according to a first conversion processing result corresponding to data subjected to first conversion processing in a first preset amount of extracted data each time and a first conversion processing result corresponding to target data;
and determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current characteristic matrix under each channel and the adjusted weight matrix.
In some embodiments, in terms of determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix in each channel and the adjusted weight matrix, the second obtaining module 180 is specifically configured to:
for each channel, extracting a second preset number of row data from the first conversion processing matrix corresponding to the current feature matrix each time; determining a second conversion processing result corresponding to the second preset number of row data based on the second preset number of row data and the second input transformation matrix;
and determining a convolution result corresponding to the image to be processed based on a second conversion processing result corresponding to a second preset number of row data under each channel and the adjusted weight matrix.
In some embodiments, in terms of determining a convolution result corresponding to the image to be processed based on the second conversion processing result corresponding to the second preset number of line data in each channel and the adjusted weight matrix, the second obtaining module 180 is specifically configured to:
performing dot product processing on the adjusted weight matrix and a second conversion processing result corresponding to a second preset number of line data for each channel to obtain a third conversion processing result corresponding to the second preset number of line data;
accumulating and processing third conversion processing results corresponding to a second preset number of line data under each channel to obtain fourth conversion processing results corresponding to the second preset number of line data;
determining a second conversion processing matrix corresponding to the current feature matrix based on a fourth conversion processing result corresponding to each extracted second preset number of rows of data;
and determining a convolution result corresponding to the image to be processed based on the second conversion processing matrix, the first output transformation matrix and the second output transformation matrix corresponding to the current feature matrix.
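The chain described above — dot product with the adjusted weights, cross-channel accumulation, then the two output transforms — is the standard Winograd F(2×2, 3×3) flow. The following sketch, hedged on the standard transform matrices B^T, G and A^T (the patent factors the input and output transforms into row and column stages, but their product is the same), checks one 4×4 tile against direct 3×3 convolution:

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices (an assumption here;
# the patent splits B^T and A^T into two per-dimension stages).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_tile(d, g):
    """d: (C,4,4) input tile per channel, g: (C,3,3) kernel per channel.
    Returns the 2x2 output tile accumulated over channels."""
    M = np.zeros((4, 4))
    for dc, gc in zip(d, g):
        U = G @ gc @ G.T         # weight transform (4x4)
        V = B_T @ dc @ B_T.T     # input transform (4x4)
        M += U * V               # elementwise product, accumulated over channels
    return A_T @ M @ A_T.T       # output transform (2x2)

def direct_tile(d, g):
    """Direct 3x3 valid convolution (cross-correlation) on the same tile."""
    out = np.zeros((2, 2))
    for dc, gc in zip(d, g):
        for i in range(2):
            for j in range(2):
                out[i, j] += np.sum(dc[i:i+3, j:j+3] * gc)
    return out

rng = np.random.default_rng(1)
d = rng.standard_normal((3, 4, 4))   # 3 channels
g = rng.standard_normal((3, 3, 3))
assert np.allclose(winograd_tile(d, g), direct_tile(d, g))
```

Each 4×4 tile thus yields a 2×2 output block with only 16 multiplications per channel instead of 36, which is the motivation for the transform pipeline described above.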
In some embodiments, the preset size is 4 × 4, the first preset number is 2, and in terms of extracting a first preset number of data from the current feature matrix each time according to a first preset rule, the extracting module 140 is specifically configured to:
firstly, extracting data of a 2 nd row and a 3 rd row in a 1 st column in a current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 1 st column; then extracting data of a 2 nd row and a 3 rd row in a 2 nd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 2 nd column; then extracting data of a 2 nd row and a 3 rd row in a 3 rd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 3 rd column; then, the data of the 2 nd row and the 3 rd row in the 4 th column in the current feature matrix are extracted, and then the data of the 1 st row and the 4 th row in the 4 th column are extracted.
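As an illustration only (0-based indices, whereas the text counts rows and columns from 1; the generator name is hypothetical), the read-out order of this first preset rule can be written compactly:

```python
def extraction_order():
    """Yield pairs of (row, col) indices, 0-based, in the order of the
    first preset rule: for each column, the middle rows 2 and 3 first,
    then the outer rows 1 and 4 (1-based numbering), two data per read."""
    for col in range(4):
        yield (1, col), (2, col)   # rows 2 and 3 (1-based)
        yield (0, col), (3, col)   # rows 1 and 4 (1-based)

# Flatten into the full sequence of 16 element positions.
order = [pair for step in extraction_order() for pair in step]
```

Each `yield` corresponds to one extraction of the first preset number (2) of data, so the 4×4 window is drained in 8 reads.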
In some embodiments, the convolution kernel size is 3 × 3, the weight matrix size is 4 × 4, and in terms of adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix, the adjusting module 160 is specifically configured to:
arranging the data of the 2 nd row and the 2 nd column, the 3 rd row and the 2 nd column, the 2 nd row and the 3 rd column and the 3 rd row and the 3 rd column in the weight matrix in the adjusted weight matrix in the 1 st row and the 1 st column, the 1 st row and the 2 nd column, the 1 st row and the 3 rd column and the 1 st row and the 4 th column in sequence;
arranging the data of the 1 st row, the 2 nd column, the 4 th row, the 2 nd column, the 1 st row, the 3 rd column and the 4 th row, the 3 rd column in the weight matrix in the adjusted 2 nd row, the 1 st column, the 2 nd row, the 2 nd column, the 2 nd row, the 3 rd column and the 2 nd row, the 4 th column in the weight matrix in sequence;
arranging the data of the 2 nd row, the 1 st column, the 3 rd row, the 1 st column, the 2 nd row, the 4 th column and the 3 rd row, the 4 th column in the weight matrix in the adjusted weight matrix in the 3 rd row, the 1 st column, the 3 rd row, the 2 nd column, the 3 rd row, the 3 rd column and the 3 rd row, the 4 th column in sequence;
and sequentially arranging the data of the 1 st row, the 1 st column, the 4 th row, the 1 st column, the 1 st row, the 4 th column and the 4 th row, the 4 th column in the weight matrix in the 1 st row, the 1 st column, the 4 th row, the 2 nd column, the 4 th row, the 3 rd column and the 4 th row, the 4 th column in the adjusted weight matrix.
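The four arrangement rules above can be sketched as a single permutation (0-based indices, whereas the text counts from 1; the function name is hypothetical):

```python
import numpy as np

def adjust_weight_matrix(w):
    """Rearrange a 4x4 weight matrix per the second preset rule."""
    w = np.asarray(w)
    return np.array([
        [w[1, 1], w[2, 1], w[1, 2], w[2, 2]],  # center 2x2 block
        [w[0, 1], w[3, 1], w[0, 2], w[3, 2]],  # top/bottom edge middles
        [w[1, 0], w[2, 0], w[1, 3], w[2, 3]],  # left/right edge middles
        [w[0, 0], w[3, 0], w[0, 3], w[3, 3]],  # four corners
    ])

w = np.arange(16).reshape(4, 4)
adjusted = adjust_weight_matrix(w)
```

With `w = np.arange(16).reshape(4, 4)`, row 0 of the adjusted matrix holds the center weights and row 3 the corner weights, mirroring the cycle-by-cycle input order of the A^T·A schedule (middle of the matrix first, corners last).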
The modules in the above convolution result acquisition apparatus may be implemented wholly or partially in software, in hardware, or in a combination of the two. In hardware form, the modules may be embedded in, or independent of, a processor of the computer device; in software form, they may be stored in a memory of the computer device so that the processor can invoke them and perform the operations corresponding to each module.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 16. The computer device includes a processor, a memory, an Input/Output interface (I/O for short), and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing the characteristic diagram, the convolution kernel transformation matrix, the weight matrix, the adjusted weight matrix and the convolution result corresponding to the image to be processed under each channel of the image to be processed. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement the steps of the convolution result acquisition method described above.
Those skilled in the art will appreciate that the architecture shown in fig. 16 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In some embodiments, the present application further provides a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program being adapted to perform the steps of:
acquiring a characteristic diagram of an image to be processed under each channel;
for a feature map under each channel, sliding a sliding window with a preset size on the feature map by a preset step length, and for a current feature matrix in the sliding window, extracting a first preset amount of data from the current feature matrix each time according to a first preset rule;
determining a weight matrix based on the convolution kernel and the convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
and determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix.
In some embodiments, in determining the convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time in each channel and the adjusted weight matrix, the processor is specifically configured to implement the following steps when executing the computer program:
for each channel, determining target data which is not subjected to first conversion processing from the first preset number of data extracted each time; according to the first input transformation matrix, performing first transformation processing on the target data to obtain a first transformation processing result corresponding to the target data;
determining a first conversion processing matrix corresponding to the current feature matrix according to a first conversion processing result corresponding to data subjected to first conversion processing in a first preset amount of extracted data each time and a first conversion processing result corresponding to target data;
and determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix under each channel and the adjusted weight matrix.
In some embodiments, in determining the convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix in each channel and the adjusted weight matrix, the processor, when executing the computer program, is specifically configured to implement the following steps:
for each channel, extracting a second preset number of line data from the first conversion processing matrix corresponding to the current feature matrix each time; determining a second conversion processing result corresponding to the second preset number of row data based on the second preset number of row data and the second input transformation matrix;
and determining a convolution result corresponding to the image to be processed based on a second conversion processing result corresponding to a second preset number of row data under each channel and the adjusted weight matrix.
In some embodiments, in determining the convolution result corresponding to the image to be processed based on the second conversion processing result corresponding to the second preset number of line data in each channel and the adjusted weight matrix, the processor is specifically configured to implement the following steps when executing the computer program:
performing dot product processing on the adjusted weight matrix and a second conversion processing result corresponding to a second preset number of line data for each channel to obtain a third conversion processing result corresponding to the second preset number of line data;
accumulating and processing third conversion processing results corresponding to a second preset number of line data under each channel to obtain fourth conversion processing results corresponding to the second preset number of line data;
determining a second conversion processing matrix corresponding to the current feature matrix based on a fourth conversion processing result corresponding to each extracted second preset number of rows of data;
and determining a convolution result corresponding to the image to be processed based on the second conversion processing matrix, the first output transformation matrix and the second output transformation matrix corresponding to the current feature matrix.
In some embodiments, the predetermined size is 4 × 4, the first predetermined number is 2, and the processor is specifically configured to implement the following steps when executing the computer program, in each case extracting a first predetermined number of data from the current feature matrix according to a first predetermined rule:
firstly, extracting data of a 2 nd row and a 3 rd row in a 1 st column in a current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 1 st column; then extracting data of a 2 nd row and a 3 rd row in a 2 nd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 2 nd column; then extracting data of a 2 nd row and a 3 rd row in a 3 rd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 3 rd column; then, the data of the 2 nd row and the 3 rd row in the 4 th column in the current feature matrix are extracted, and then the data of the 1 st row and the 4 th row in the 4 th column are extracted.
In some embodiments, the convolution kernel size is 3 × 3, the weight matrix size is 4 × 4, and in terms of adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix, the processor is specifically configured to implement the following steps when executing the computer program:
arranging the data of the 2 nd row and the 2 nd column, the 3 rd row and the 2 nd column, the 2 nd row and the 3 rd column and the 3 rd row and the 3 rd column in the weight matrix in the adjusted weight matrix in the 1 st row and the 1 st column, the 1 st row and the 2 nd column, the 1 st row and the 3 rd column and the 1 st row and the 4 th column in sequence; arranging the data of the 1 st row, the 2 nd column, the 4 th row, the 2 nd column, the 1 st row, the 3 rd column and the 4 th row, the 3 rd column in the weight matrix in the adjusted 2 nd row, the 1 st column, the 2 nd row, the 2 nd column, the 2 nd row, the 3 rd column and the 2 nd row, the 4 th column in the weight matrix in sequence; arranging the data of the 2 nd row, the 1 st column, the 3 rd row, the 1 st column, the 2 nd row, the 4 th column and the 3 rd row, the 4 th column in the weight matrix in the adjusted weight matrix in the 3 rd row, the 1 st column, the 3 rd row, the 2 nd column, the 3 rd row, the 3 rd column and the 3 rd row, the 4 th column in sequence; and sequentially arranging the data of the 1 st row, the 1 st column, the 4 th row, the 1 st column, the 1 st row, the 4 th column and the 4 th row, the 4 th column in the weight matrix in the 1 st row, the 1 st column, the 4 th row, the 2 nd column, the 4 th row, the 3 rd column and the 4 th row, the 4 th column in the adjusted weight matrix.
In some embodiments, the present application further provides a computer-readable storage medium 900 on which a computer program 920 is stored; when executed by a processor, the computer program 920 implements the steps of the convolution result acquisition method described above. Its internal structure may be as shown in fig. 17.
In some embodiments, the present application further provides a computer program product comprising a computer program that when executed by a processor implements the steps of the convolution result acquisition method described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors involved in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and so on.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A convolution result acquisition method, comprising:
acquiring a characteristic diagram of an image to be processed under each channel;
for a feature map under each channel, sliding a sliding window with a preset size on the feature map by a preset step length, and extracting a first preset amount of data from a current feature matrix in the sliding window each time according to a first preset rule; the preset size is 4 × 4, and the first preset number is 2;
determining a weight matrix based on a convolution kernel and a convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
and determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted each time under each channel and the adjusted weight matrix.
2. The method according to claim 1, wherein the determining the convolution result corresponding to the image to be processed based on the first preset amount of data extracted at each time in each channel and the adjusted weight matrix comprises:
for each channel, determining target data which is not subjected to first conversion processing from the first preset number of data extracted each time; according to a first input transformation matrix, performing first transformation processing on the target data to obtain a first transformation processing result corresponding to the target data;
determining a first conversion processing matrix corresponding to the current feature matrix according to a first conversion processing result corresponding to the data which has undergone the first conversion processing in the first preset number of extracted data each time and a first conversion processing result corresponding to the target data;
and determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix under each channel and the adjusted weight matrix.
3. The method according to claim 2, wherein the determining the convolution result corresponding to the image to be processed based on the first transformation processing matrix corresponding to the current feature matrix in each channel and the adjusted weight matrix comprises:
for each channel, extracting a second preset number of row data from the first conversion processing matrix corresponding to the current feature matrix each time; determining a second conversion processing result corresponding to the second preset number of row data based on the second preset number of row data and a second input transformation matrix;
and determining a convolution result corresponding to the image to be processed based on a second conversion processing result corresponding to the second preset number of rows of data in each channel and the adjusted weight matrix.
4. The method according to claim 3, wherein the determining the convolution result corresponding to the image to be processed based on the second conversion processing result corresponding to the second preset number of line data in each channel and the adjusted weight matrix comprises:
performing dot product processing on the adjusted weight matrix and a second conversion processing result corresponding to the second preset number of line data for each channel to obtain a third conversion processing result corresponding to the second preset number of line data;
accumulating and processing third conversion processing results corresponding to the second preset number of line data under each channel to obtain fourth conversion processing results corresponding to the second preset number of line data;
determining a second conversion processing matrix corresponding to the current feature matrix based on a fourth conversion processing result corresponding to each extracted line data of the second preset number;
and determining a convolution result corresponding to the image to be processed based on a second conversion processing matrix, a first output transformation matrix and a second output transformation matrix corresponding to the current feature matrix.
5. The method according to claim 1, wherein said extracting a first preset amount of data from said current feature matrix at a time according to a first preset rule comprises:
firstly, extracting data of a 2 nd row and a 3 rd row in a 1 st column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 1 st column; then extracting data of a 2 nd row and a 3 rd row in a 2 nd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 2 nd column; then extracting data of a 2 nd row and a 3 rd row in a 3 rd column in the current feature matrix, and then extracting data of a 1 st row and a 4 th row in the 3 rd column; then extracting data of the 2 nd row and the 3 rd row in the 4 th column in the current feature matrix, and then extracting data of the 1 st row and the 4 th row in the 4 th column.
6. The method of claim 1, wherein the convolution kernel size is 3 × 3, the weight matrix size is 4 × 4, and the adjusting the position of each data in the weight matrix according to the second preset rule to obtain an adjusted weight matrix comprises:
arranging the data of the 2 nd row and the 2 nd column, the 3 rd row and the 2 nd column, the 2 nd row and the 3 rd column and the 3 rd row and the 3 rd column in the weight matrix in the adjusted weight matrix in sequence of the 1 st row and the 1 st column, the 1 st row and the 2 nd column, the 1 st row and the 3 rd column and the 1 st row and the 4 th column;
arranging the data of the 1 st row, the 2 nd column, the 4 th row, the 2 nd column, the 1 st row, the 3 rd column and the 4 th row, the 3 rd column in the weight matrix in the adjusted row, the 2 nd row, the 1 st column, the 2 nd row, the 2 nd column, the 2 nd row, the 3 rd column and the 2 nd row, the 4 th column in the weight matrix in sequence;
arranging the data of the 2 nd row, the 1 st column, the 3 rd row, the 1 st column, the 2 nd row, the 4 th column and the 3 rd row, the 4 th column in the weight matrix in the adjusted weight matrix in sequence from the 3 rd row, the 1 st column, the 3 rd row, the 2 nd column, the 3 rd row, the 3 rd column and the 3 rd row, the 4 th column;
and sequentially arranging the data of the 1 st row and the 1 st column, the 4 th row and the 1 st column, the 1 st row and the 4 th column and the 4 th row and the 4 th column in the weight matrix in the 4 th row and the 1 st column, the 4 th row and the 2 nd column, the 4 th row and the 3 rd column and the 4 th row and the 4 th column in the adjusted weight matrix.
7. A convolution result acquisition apparatus, comprising:
the first acquisition module is used for acquiring a characteristic map of the image to be processed under each channel;
the extraction module is used for sliding a sliding window with a preset size on the feature map by a preset step length according to the feature map under each channel, and extracting a first preset amount of data from a current feature matrix in the sliding window each time according to a first preset rule; the preset size is 4 × 4, and the first preset number is 2;
the adjusting module is used for determining a weight matrix based on a convolution kernel and a convolution kernel transformation matrix, and adjusting the position of each data in the weight matrix according to a second preset rule to obtain an adjusted weight matrix;
and the second acquisition module is used for determining a convolution result corresponding to the image to be processed based on the first preset amount of data extracted in each channel and the adjusted weight matrix.
8. The apparatus according to claim 7, wherein in the aspect of determining the convolution result corresponding to the image to be processed based on the first preset number of data extracted at each time in each channel and the adjusted weight matrix, the second obtaining module is specifically configured to:
for each channel, determining target data which is not subjected to first conversion processing from the first preset number of data extracted each time; according to a first input transformation matrix, performing first transformation processing on the target data to obtain a first transformation processing result corresponding to the target data;
determining a first conversion processing matrix corresponding to the current feature matrix according to the first conversion processing results corresponding to the data that has already undergone the first conversion processing among the first preset number of data extracted each time, together with the first conversion processing result corresponding to the target data;
and determining a convolution result corresponding to the image to be processed based on the first conversion processing matrix corresponding to the current feature matrix under each channel and the adjusted weight matrix.
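The reuse behaviour of claim 8 — converting only data that has not yet undergone the first conversion processing — can be sketched as follows, assuming the standard F(2×2, 3×3) Winograd input-transform matrix B^T as the "first input transformation matrix" (an assumption; names and the column-wise cache are illustrative):

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd input-transform matrix B^T; using it as the
# claim's "first input transformation matrix" is an assumption.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)

def first_conversion(feature_map, i, j, cache):
    """First conversion processing B^T d B for the 4x4 tile d at (i, j).
    Column transforms BT @ column are cached, so columns shared with a
    previously processed window in the same tile row (stride 2 shares two
    columns) are not converted again; only the target data is new work."""
    cols = []
    for c in range(j, j + 4):
        if (i, c) not in cache:
            cache[(i, c)] = BT @ feature_map[i:i+4, c]
        cols.append(cache[(i, c)])
    return np.stack(cols, axis=1) @ BT.T  # (B^T d) B
```

Processing two horizontally adjacent windows then computes only six column transforms instead of eight; the element-wise product with the adjusted weight matrix and the output transform would follow this step.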
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
CN202210668794.3A 2022-06-14 2022-06-14 Convolution result obtaining method and device, computer equipment and storage medium Active CN114758209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210668794.3A CN114758209B (en) 2022-06-14 2022-06-14 Convolution result obtaining method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114758209A CN114758209A (en) 2022-07-15
CN114758209B true CN114758209B (en) 2022-09-02

Family

ID=82336604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210668794.3A Active CN114758209B (en) 2022-06-14 2022-06-14 Convolution result obtaining method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114758209B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115456858B (en) * 2022-09-16 2023-07-18 深圳思谋信息科技有限公司 Image processing method, device, computer equipment and computer readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution accelerates and computation processing method, device, electronic equipment and storage medium
CN110517183A (en) * 2019-07-26 2019-11-29 电子科技大学 A kind of high-speed low-power-consumption image processor based on retinal mechanisms
WO2020073211A1 (en) * 2018-10-09 2020-04-16 华为技术有限公司 Operation accelerator, processing method, and related device
CN113283591A (en) * 2021-07-22 2021-08-20 南京大学 Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN114003201A (en) * 2021-10-29 2022-02-01 浙江大华技术股份有限公司 Matrix transformation method and device and convolutional neural network accelerator

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10949661B2 (en) * 2018-11-21 2021-03-16 Amazon Technologies, Inc. Layout-agnostic complex document processing system

Non-Patent Citations (2)

Title
A Multiwindow Partial Buffering Scheme for FPGA-Based 2-D Convolvers; Hui Zhang et al.; IEEE Transactions on Circuits and Systems II: Express Briefs; 2007-02-20; Vol. 54, No. 2; pp. 200-204 *
Research on Software-Hardware Co-Design Methods for Deep Neural Network Accelerators; Xu Ke; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2022-02-15; No. 02; I140-24 *

Similar Documents

Publication Publication Date Title
TWI765168B (en) Method, system and computer storage medium for transposing neural network matrices in hardware
CN106875011B (en) Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
CN111667051B (en) Neural network accelerator applicable to edge equipment and neural network acceleration calculation method
CN107729996B (en) Zero coefficient skip convolutional neural network engine
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
JP7007488B2 (en) Hardware-based pooling system and method
CN107797962B (en) Neural network based computational array
CN116541647A (en) Operation accelerator, processing method and related equipment
CN110543939B (en) Hardware acceleration realization device for convolutional neural network backward training based on FPGA
CN112703511B (en) Operation accelerator and data processing method
CN108897716B (en) Data processing device and method for reducing calculation amount through memory read-write operation
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN114995782B (en) Data processing method, device, equipment and readable storage medium
CN103369326A (en) Transition coder applicable to HEVC ( high efficiency video coding) standards
CN112836813A (en) Reconfigurable pulsation array system for mixed precision neural network calculation
Xu et al. Reconfigurable and low-complexity accelerator for convolutional and generative networks over finite fields
CN111768458A (en) Sparse image processing method based on convolutional neural network
CN107368459B (en) Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication
CN112966729A (en) Data processing method and device, computer equipment and storage medium
Yang et al. BSRA: Block-based super resolution accelerator with hardware efficient pixel attention
WO2023124371A1 (en) Data processing apparatus and method, and chip, computer device and storage medium
CN116051345A (en) Image data processing method, device, computer equipment and readable storage medium
CN116400884A (en) Control method and device of multiplier-adder computer device and storage medium
CN116167425A (en) Neural network acceleration method, device, equipment and medium
CN116090518A (en) Feature map processing method and device based on systolic operation array and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant