CN112085167A - Convolution processing method and device, multi-core DSP platform and readable storage medium - Google Patents

Convolution processing method and device, multi-core DSP platform and readable storage medium Download PDF

Info

Publication number
CN112085167A
CN112085167A (application CN202010951445.3A)
Authority
CN
China
Prior art keywords
convolution processing
convolution
kernel
image
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010951445.3A
Other languages
Chinese (zh)
Inventor
何涛
施慧莉
杨峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leihua Electronic Technology Research Institute Aviation Industry Corp of China
Original Assignee
Leihua Electronic Technology Research Institute Aviation Industry Corp of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leihua Electronic Technology Research Institute Aviation Industry Corp of China filed Critical Leihua Electronic Technology Research Institute Aviation Industry Corp of China
Priority to CN202010951445.3A priority Critical patent/CN112085167A/en
Publication of CN112085167A publication Critical patent/CN112085167A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)

Abstract

A convolution processing method comprising: dividing an image into a plurality of regions; having each convolution processing core correspond to one region and perform convolution processing on the portion of each layer of the image located in its corresponding region; and synthesizing the convolution processing results of the layers. A convolution processing apparatus comprising: a region dividing module for dividing an image into a plurality of regions; a convolution processing module that makes each convolution processing core correspond to one region, each convolution processing core performing convolution processing on the portion of each layer of the image located in its corresponding region; and a comprehensive processing module for synthesizing the convolution processing results of the layers. A multi-core DSP platform comprising: a plurality of convolution processing cores; and a memory storing a computer program configured to implement the convolution processing method described above when executed by each convolution processing core. A computer-readable storage medium storing a computer program which, when executed, is capable of implementing the convolution processing method described above.

Description

Convolution processing method and device, multi-core DSP platform and readable storage medium
Technical Field
The application belongs to the technical field of convolutional neural network processing, and particularly relates to a convolutional processing method and device, a multi-core DSP platform and a computer readable storage medium.
Background
Convolutional neural networks (CNNs) are widely applied in computer-vision fields such as image classification and target recognition and localization. A CNN consists mainly of convolution processing and involves a large number of convolution operations, which can be processed in parallel across different convolution kernels and different image layers, making CNNs well suited to accelerated processing on highly parallel hardware such as GPUs and CPUs.
The principle of convolution processing is shown in fig. 1. Fig. 1 shows an example in which a single-channel image (1, W, H), i.e. a single-channel image with 1 layer, width W, and height H, is convolved with 3 convolution kernels to generate a 3-channel image (3, W, H), i.e. an image with 3 layers, width W, and height H.
The network architecture of actual convolution processing is shown in fig. 2. Most input images are multi-channel images (C, W, H) with C layers, width W, and height H, and there are N convolution kernels; after each convolution kernel is convolved with the multi-channel image, a multi-channel summation is performed to generate a multi-channel output image (N, W, H) with N layers, width W, and height H.
At present, to improve the speed of convolution processing, GPUs mostly use the Nvidia cuDNN library for acceleration, while CPUs mostly use multithreaded parallelism and MPI for multitask acceleration; computational optimizations of convolution mostly convert multi-channel image convolution into matrix operations. Under this GPU/CPU design approach, the weight data and input of the convolution should be stored in fast-access memory. However, the fast-access memory of a multi-core DSP platform is limited compared with GPU and CPU platforms, and in most cases is smaller than the weight data and input of the convolution. Acceleration of convolution processing on a multi-core DSP platform therefore cannot follow the GPU/CPU design approach, and the speed of convolution processing on multi-core DSP platforms is greatly limited.
The present application has been made in view of the above-mentioned technical drawbacks.
It should be noted that the above background disclosure is only intended to assist understanding of the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of the present patent application, and, absent explicit evidence that the above content was disclosed before the filing date of the present application, it should not be used to evaluate the novelty and inventive step of the present application.
Disclosure of Invention
The present application is directed to a convolution processing method and apparatus, a multi-core DSP platform, and a computer-readable storage medium, so as to overcome or alleviate at least one of the technical drawbacks of the known prior art.
The technical scheme of the application is as follows:
One aspect provides a convolution processing method, including:
dividing an image into a plurality of regions;
having each convolution processing core correspond to one region, each convolution processing core performing convolution processing on the portion of each layer of the image located in its corresponding region;
and synthesizing the convolution processing results of the layers.
According to at least one embodiment of the present application, in the convolution processing method, synthesizing the convolution processing results of the layers specifically includes:
superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region;
and merging the convolution processing results of the regions.
According to at least one embodiment of the present application, in the convolution processing method, each convolution processing core performs convolution processing on the portion of each layer located in its corresponding region, specifically:
each convolution processing core performs convolution processing on the portion of each layer located in its corresponding region based on a plurality of convolution kernels;
superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region specifically includes:
corresponding to each convolution kernel, superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region;
merging the convolution processing results of the regions, specifically:
corresponding to each convolution kernel, merging the convolution processing results of the regions.
According to at least one embodiment of the present application, the convolution processing method further includes:
and carrying out edge PADDING processing on the image.
According to at least one embodiment of the present application, the convolution processing method further includes:
importing the convolution kernels.
According to at least one embodiment of the present application, in the convolution processing method, the importing of a convolution kernel specifically includes:
when the weight data size of the convolution kernel is smaller than the shared cache, loading the convolution kernel into the shared cache;
and when the weight data size of the convolution kernel is larger than the shared cache, loading the convolution kernel into the shared cache stage by stage during operation.
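The two-branch import rule above can be sketched in C. `shared_cache`, `SHARED_CACHE_BYTES`, and `import_kernels` are hypothetical names introduced here for illustration, not a real DSP API or the patent's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the kernel-import rule: if the weight data
 * fits in the shared cache, copy it in once; otherwise it must be
 * staged in chunks during operation. All names are illustrative. */
#define SHARED_CACHE_BYTES 4096
static unsigned char shared_cache[SHARED_CACHE_BYTES];

/* Returns 1 if the weights are fully resident in the shared cache,
 * 0 if stage-by-stage loading is required at run time. */
static int import_kernels(const unsigned char *weights, size_t nbytes) {
    if (nbytes <= SHARED_CACHE_BYTES) {
        memcpy(shared_cache, weights, nbytes);
        return 1;               /* fully resident in fast memory */
    }
    return 0;   /* caller streams chunks from external memory */
}
```

In the staged case a real implementation would overlap DMA transfers of the next weight chunk with computation on the current one; that scheduling is omitted here.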
Another aspect provides a convolution processing apparatus including:
a region dividing module for dividing the image into a plurality of regions;
the convolution processing module enables each convolution processing core to correspond to one area, and each convolution processing core performs convolution processing on the part, located in the corresponding area, of each image layer of the image;
and the comprehensive processing module is used for synthesizing the convolution processing results of all the layers.
In yet another aspect, a multi-core DSP platform comprises:
a plurality of convolution processing kernels;
a memory storing a computer program configured to be capable of implementing any of the above-described convolution processing methods when executed by the respective convolution processing cores.
Yet another aspect is a computer-readable storage medium storing a computer program that, when executed, is capable of implementing any of the above-described convolution processing methods.
Drawings
FIG. 1 is a schematic diagram of convolution processing a single-channel image;
FIG. 2 is a schematic diagram of a convolution processing multi-channel image;
fig. 3 is a schematic diagram of a convolution processing method according to an embodiment of the present application.
For the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; further, the drawings are for illustrative purposes, and terms describing positional relationships are limited to illustrative illustrations only and are not to be construed as limiting the patent.
Detailed Description
In order to make the technical solutions and advantages of the present application clearer, the technical solutions of the present application will be further clearly and completely described in the following detailed description with reference to the accompanying drawings, and it should be understood that the specific embodiments described herein are only some of the embodiments of the present application, and are only used for explaining the present application, but not limiting the present application. It should be noted that, for convenience of description, only the parts related to the present application are shown in the drawings, other related parts may refer to general designs, and the embodiments and technical features in the embodiments in the present application may be combined with each other to obtain a new embodiment without conflict.
In addition, unless otherwise defined, technical or scientific terms used in the description of the present application shall have the ordinary meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "upper", "lower", "left", "right", "center", "vertical", "horizontal", "inner", "outer", and the like used in the description of the present application, which indicate orientations, are used only to indicate relative directions or positional relationships, and do not imply that the devices or elements must have a specific orientation, be constructed and operated in a specific orientation, and when the absolute position of the object to be described is changed, the relative positional relationships may be changed accordingly, and thus, should not be construed as limiting the present application. The use of "first," "second," "third," and the like in the description of the present application is for descriptive purposes only to distinguish between different components and is not to be construed as indicating or implying relative importance. The use of the terms "a," "an," or "the" and similar referents in the context of describing the application is not to be construed as an absolute limitation on the number, but rather as the presence of at least one. The word "comprising" or "comprises", and the like, when used in this description, is intended to specify the presence of stated elements or items, but not the exclusion of other elements or items.
Further, it is noted that, unless expressly stated or limited otherwise, the terms "mounted," "connected," and the like are used in the description of the invention in a generic sense, e.g., connected as either a fixed connection or a removable connection or integrally connected; can be mechanically or electrically connected; they may be directly connected or indirectly connected through an intermediate medium, or they may be connected through the inside of two elements, and those skilled in the art can understand their specific meaning in this application according to the specific situation.
The present application is described in further detail below with reference to fig. 1 to 3.
One aspect provides a convolution processing method, including:
dividing an image into a plurality of regions;
having each convolution processing core correspond to one region, each convolution processing core performing convolution processing on the portion of each layer of the image located in its corresponding region;
and synthesizing the convolution processing results of the layers.
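As a minimal sketch of the first step, the image can be split into one horizontal strip per core. `region_for_core` and the strip-shaped partition are assumptions for illustration; the patent does not fix a particular partition shape:

```c
#include <assert.h>

/* Sketch of dividing a W x H image into one region per convolution
 * processing core. Horizontal strips are assumed for simplicity. */
typedef struct { int y0, rows, cols; } Region;

static Region region_for_core(int core, int n_cores, int W, int H) {
    Region r;
    int strip = H / n_cores;          /* rows per core */
    r.y0   = core * strip;
    r.cols = W;
    /* the last core absorbs any remainder rows */
    r.rows = (core == n_cores - 1) ? H - r.y0 : strip;
    return r;
}
```

With 4 cores this reproduces the four equal-size regions of the embodiment below whenever H is divisible by 4.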
In some optional embodiments of the convolution processing method, synthesizing the convolution processing results of the layers specifically includes:
superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region;
and merging the convolution processing results of the regions.
In some optional embodiments of the convolution processing method, each convolution processing core performs convolution processing on the portion of each layer located in its corresponding region, specifically:
each convolution processing core performs convolution processing on the portion of each layer located in its corresponding region based on a plurality of convolution kernels;
superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region specifically includes:
corresponding to each convolution kernel, superimposing, by each convolution processing core, the convolution processing results of the layers within its corresponding region;
merging the convolution processing results of the regions, specifically:
corresponding to each convolution kernel, merging the convolution processing results of the regions.
In some optional embodiments, the convolution processing method further includes:
performing edge padding (PADDING) on the image, where padding with margin pad is given by:
y[m, n] = x[i, j], with m = i + pad, n = j + pad,
i ∈ [0, W-1], j ∈ [0, H-1], m ∈ [0, 2·pad+W-1], n ∈ [0, 2·pad+H-1],
the entries of y outside the copied range being zero-filled.
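The padding formula can be implemented directly. This is a sketch for a single layer in row-major layout (the patent does not specify a memory layout), with all entries of y outside the copied window left zero:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Edge padding per the formula above: y[m, n] = x[i, j] with
 * m = i + pad (columns) and n = j + pad (rows); the border of y is
 * zero-filled. Single layer, row-major float buffers (an assumption). */
static void pad_image(const float *x, int W, int H, int pad, float *y) {
    int Wp = W + 2 * pad;                       /* padded width  */
    int Hp = H + 2 * pad;                       /* padded height */
    memset(y, 0, (size_t)Wp * Hp * sizeof(float));
    for (int j = 0; j < H; ++j)                 /* rows of x */
        for (int i = 0; i < W; ++i)             /* cols of x */
            y[(j + pad) * Wp + (i + pad)] = x[j * W + i];
}
```

For a 2×2 image with pad = 1 this produces a 4×4 buffer whose central 2×2 block holds the original pixels and whose one-pixel border is zero.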
In some optional embodiments, the convolution processing method further includes:
importing the convolution kernels.
In some optional embodiments of the convolution processing method, the importing of a convolution kernel specifically includes:
when the weight data size of the convolution kernel is smaller than the shared cache, loading the convolution kernel into the shared cache;
and when the weight data size of the convolution kernel is larger than the shared cache, loading the convolution kernel into the shared cache stage by stage during operation.
Regarding the convolution processing method disclosed in the above embodiments, those skilled in the art will understand that it can be applied to a multi-core DSP, with each processing core on the multi-core DSP serving as a convolution processing core. By splitting the convolution of each layer into single convolutions over multiple regions, each convolution processing core only needs to traverse its corresponding region, and the weight data can be moved in stages, so that all convolution operations are performed in fast-access memory, reducing the number of direct accesses and DMA accesses to external memory.
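The per-region traversal described above can be sketched as a plain k×k convolution restricted to one core's rows of the output. `conv_region`, its argument layout, and the `+=` accumulation (so that calling it once per layer superimposes the per-layer results) are illustrative assumptions, not the patent's implementation:

```c
#include <assert.h>

/* One core's share of the work: slide a k x k kernel over only the
 * output rows [y0, y0 + rows) of its region, reading from an already
 * padded single-layer input so no bounds checks are needed.
 * Accumulates with += so calling once per layer superimposes the
 * per-layer results. Names and layout are illustrative. */
static void conv_region(const float *padded, int Wp,  /* padded input, padded width */
                        const float *kernel, int k,   /* k x k kernel, row-major   */
                        int y0, int rows, int W,      /* region rows; W output cols */
                        float *out)                   /* full W-wide output plane   */
{
    for (int r = y0; r < y0 + rows; ++r)
        for (int c = 0; c < W; ++c) {
            float acc = 0.0f;
            for (int u = 0; u < k; ++u)
                for (int v = 0; v < k; ++v)
                    acc += padded[(r + u) * Wp + (c + v)] * kernel[u * k + v];
            out[r * W + c] += acc;
        }
}
```

Because each core writes only its own rows of `out`, the cores never contend for the same output cells, which is what makes the region split embarrassingly parallel.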
In order to make it easier for those skilled in the art to understand and implement the convolution processing method of the multi-core DSP platform disclosed in the present application, the present application provides the following more specific embodiments:
importing 4 convolution kernels into the multi-core DSP platform: when the weight data size of the convolution kernels is smaller than the multi-core DSP shared cache, the convolution kernels are loaded into the shared cache; when the weight data size is larger than the shared cache, the convolution kernels are loaded into the shared cache stage by stage during operation;
importing an image of size (3, W, H) into the multi-core DSP platform, i.e. an image with 3 layers, width W, and height H, and performing edge padding on the image;
dividing the image into a plurality of regions: in this embodiment the multi-core DSP platform has 4 processing cores, i.e. 4 convolution processing cores (convolution processing core 0, convolution processing core 1, convolution processing core 2, and convolution processing core 3), so the image can be divided into 4 regions A, B, C, and D, each of size W × H / 4; the portions of the 3 layers corresponding to region A are A0, A1, and A2; those corresponding to region B are B0, B1, and B2; those corresponding to region C are C0, C1, and C2; and those corresponding to region D are D0, D1, and D2;
assigning each convolution processing core to one region (it should be understood that the number of regions into which the image is divided should not exceed the number of convolution processing cores, so that each region has a convolution processing core corresponding to it); in this embodiment, convolution processing core 0 corresponds to region A, core 1 to region B, core 2 to region C, and core 3 to region D;
each convolution processing core performs convolution processing on the portions of the layers located in its corresponding region based on the plurality of convolution kernels, i.e. the 4 convolution processing cores each convolve the portions of the 3 layers in their regions with the 4 convolution kernels: core 0 convolves portions A0, A1, and A2 of region A with the 4 kernels; core 1 convolves B0, B1, and B2 of region B; core 2 convolves C0, C1, and C2 of region C; and core 3 convolves D0, D1, and D2 of region D;
each convolution processing core, corresponding to each convolution kernel, superimposes the convolution processing results of the layers within its corresponding region, i.e. the 4 convolution processing cores superimpose their per-layer results corresponding to the 4 convolution kernels respectively; for a more concrete expression, assume the 4 convolution kernels are convolution kernel 0, convolution kernel 1, convolution kernel 2, and convolution kernel 3, and write S(X, k) for the superimposed result of convolution kernel k over the three layer portions X0, X1, X2 of region X (the sum of the three per-layer convolution results), where:
corresponding to convolution kernel 0, convolution processing core 0 superimposes its results over A0, A1, A2 to obtain S(A, 0); core 1 superimposes its results over B0, B1, B2 to obtain S(B, 0); core 2 obtains S(C, 0); and core 3 obtains S(D, 0);
corresponding to convolution kernel 1, the cores likewise obtain S(A, 1), S(B, 1), S(C, 1), and S(D, 1);
corresponding to convolution kernel 2, the cores obtain S(A, 2), S(B, 2), S(C, 2), and S(D, 2);
corresponding to convolution kernel 3, the cores obtain S(A, 3), S(B, 3), S(C, 3), and S(D, 3);
the convolution processing results of the regions are then merged, corresponding to each of the 4 convolution kernels:
corresponding to convolution kernel 0, the region results S(A, 0), S(B, 0), S(C, 0), and S(D, 0) are combined, thereby obtaining the convolution processing result of convolution kernel 0;
corresponding to convolution kernel 1, S(A, 1), S(B, 1), S(C, 1), and S(D, 1) are combined, thereby obtaining the convolution processing result of convolution kernel 1;
corresponding to convolution kernel 2, S(A, 2), S(B, 2), S(C, 2), and S(D, 2) are combined, thereby obtaining the convolution processing result of convolution kernel 2;
corresponding to convolution kernel 3, S(A, 3), S(B, 3), S(C, 3), and S(D, 3) are combined, thereby obtaining the convolution processing result of convolution kernel 3.
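The final per-kernel merge then amounts to placing each region's superimposed strip back into the full output plane for that kernel. `merge_region_strip` and the horizontal-strip layout are assumed here for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of merging: copy one region's superimposed result strip
 * (rows [y0, y0 + rows) of a W-column plane) into the full output
 * plane for one convolution kernel. Names and layout are illustrative. */
static void merge_region_strip(float *full, int W,
                               const float *strip, int y0, int rows) {
    memcpy(full + (size_t)y0 * W, strip, (size_t)rows * W * sizeof(float));
}
```

Running this once per region and per kernel assembles the N output layers; since the strips are disjoint, the copies can also be done concurrently.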
another aspect provides a convolution processing apparatus including:
a region dividing module for dividing the image into a plurality of regions;
the convolution processing module enables each convolution processing core to correspond to one area, and each convolution processing core performs convolution processing on the part, located in the corresponding area, of each image layer of the image;
and the comprehensive processing module is used for synthesizing the convolution processing results of all the layers.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
For the apparatus disclosed in the above embodiment, since it corresponds to the method disclosed above, the description is kept brief; for the relevant details refer to the method description, and for the technical effects likewise refer to those of the method, which are not repeated here.
Furthermore, those skilled in the art should realize that the various modules and units of the apparatus disclosed in the embodiments of the present application can be implemented by electronic hardware, computer software, or a combination of both. To clearly explain the interchangeability of hardware and software, the described functions may generally be implemented in either; which implementation is chosen depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present application.
Yet another aspect provides a multi-core DSP platform comprising:
a plurality of convolution processing kernels;
a memory storing a computer program configured to be capable of implementing any of the above-described convolution processing methods when executed by the respective convolution processing cores.
In some alternative embodiments, the memory may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory; the volatile memory may be random access memory (RAM) and/or cache memory, and the non-volatile memory may be read-only memory (ROM), a hard disk, flash memory, and so forth. The memory may store a computer program that is executed by the processor to implement the functions of the embodiments of the present application and/or other desired functions, and may also store various application programs and various data.
It should be noted that, for clarity and conciseness, not all constituent units of the multi-core DSP platform are shown in the foregoing embodiments; to implement the necessary functions of the multi-core DSP platform, a person skilled in the art may provide and arrange other constituent units as needed.
For the multi-core DSP platform disclosed in the foregoing embodiments, since the convolution processing cores can implement any of the foregoing methods when executing the computer program stored in the memory, the technical effects of those methods apply correspondingly and are not repeated here.
Yet another aspect provides a computer-readable storage medium storing a computer program which, when executed, implements any of the above-described convolution processing methods.
In some alternative embodiments, the computer-readable storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, any combination of the above, or other suitable storage media.
Having thus described the present application with reference to the preferred embodiments illustrated in the accompanying drawings, it will be understood by those skilled in the art that the scope of the present application is not limited to those specific embodiments. Equivalent modifications or substitutions of the relevant technical features may be made without departing from the principle of the present application, and such modifications or substitutions fall within the scope of the present application.

Claims (9)

1. A convolution processing method, comprising:
dividing an image into a plurality of regions;
assigning each convolution processing core to one region, each convolution processing core performing convolution processing on the part of each layer of the image located in its corresponding region;
and synthesizing the convolution processing results of all the layers.
2. The convolution processing method according to claim 1, wherein
synthesizing the convolution processing results of all the layers specifically comprises:
superposing, for each convolution processing core, the convolution processing results of the layers within its corresponding region;
and merging the convolution processing results of the regions.
3. The convolution processing method according to claim 2, wherein
each convolution processing core performing convolution processing on the part of each layer located in its corresponding region specifically comprises:
each convolution processing core performing convolution processing, based on a plurality of convolution kernels, on the part of each layer located in its corresponding region;
superposing, for each convolution processing core, the convolution processing results of the layers within its corresponding region specifically comprises:
superposing, for each convolution kernel, the convolution processing results of the layers within each convolution processing core's corresponding region;
and merging the convolution processing results of the regions specifically comprises:
merging, for each convolution kernel, the convolution processing results of the respective regions.
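Claims 1 to 3 describe partitioning the image among cores, convolving each layer's portion within a region, superposing the per-layer results, and merging the regions. The following pure-Python sketch (not the patented DSP implementation; function names and the strip-wise partitioning are assumptions, and the cores are simulated sequentially) illustrates the data flow for a single convolution kernel:

```python
def conv2d_valid(region, kernel):
    """Plain 'valid'-mode 2-D convolution of one region with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(region) - kh + 1
    out_w = len(region[0]) - kw + 1
    return [[sum(region[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def multicore_convolve(layers, kernel, n_regions):
    """Each simulated core handles one horizontal strip of every layer;
    per-layer results are superposed (summed), then strips are merged.
    Assumes the image has at least n_regions rows."""
    h = len(layers[0])
    kh = len(kernel)
    strip = h // n_regions
    merged = []
    for r in range(n_regions):
        top = r * strip
        # Overlap kh-1 rows so each strip's 'valid' output tiles seamlessly.
        bottom = h if r == n_regions - 1 else top + strip + kh - 1
        acc = None
        for layer in layers:                      # superpose across layers
            part = conv2d_valid(layer[top:bottom], kernel)
            acc = part if acc is None else [
                [a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, part)]
        merged.extend(acc)                        # merge region results
    return merged
```

Because convolution is linear, summing the per-layer results within each region and then concatenating the regions yields the same output as convolving the whole, layer-summed image at once.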
4. The convolution processing method according to claim 1, further comprising:
performing edge padding processing on the image.
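The edge padding step of claim 4 can be illustrated with a minimal helper. This is a hedged sketch only: the claim does not specify the padding value or width, so a symmetric zero-valued border is assumed here, and the function name is hypothetical.

```python
def zero_pad(image, pad):
    """Surround a 2-D image with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    zeros = [[0] * width for _ in range(pad)]
    body = [[0] * pad + row + [0] * pad for row in image]
    return zeros + body + zeros
```

For an odd k-by-k kernel, choosing pad = (k - 1) // 2 makes a subsequent 'valid'-mode convolution preserve the input's spatial size.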
5. The convolution processing method according to claim 1, further comprising:
importing a convolution kernel.
6. The convolution processing method according to claim 5, wherein importing a convolution kernel specifically comprises:
when the weight data size of the convolution kernel is smaller than the shared cache, importing the convolution kernel into the shared cache in a single transfer;
and when the weight data size of the convolution kernel is larger than the shared cache, importing the convolution kernel into the shared cache in stages.
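Claim 6 distinguishes a one-shot import, when the kernel's weight data fits in the shared cache, from a staged import when it does not. A schematic sketch follows; the shared cache is modeled simply as a capacity limit in weight counts, and the function name is hypothetical rather than from the patent:

```python
def import_kernels(weights, cache_size):
    """Yield successive batches of kernel weights, each no larger than
    the shared-cache capacity: one batch if everything fits, otherwise
    a sequence of cache-sized stages."""
    if len(weights) <= cache_size:
        yield weights                                 # single transfer
    else:
        for start in range(0, len(weights), cache_size):
            yield weights[start:start + cache_size]   # staged transfers
```

Each yielded batch corresponds to one transfer into the shared cache; the consumer would process (or distribute to cores) one stage before requesting the next.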
7. A convolution processing apparatus, comprising:
a region dividing module configured to divide an image into a plurality of regions;
a convolution processing module configured to assign each convolution processing core to one region, each convolution processing core performing convolution processing on the part of each layer of the image located in its corresponding region;
and a synthesis module configured to synthesize the convolution processing results of all the layers.
8. A multi-core DSP platform, comprising:
a plurality of convolution processing kernels;
a memory storing a computer program which, when executed by the convolution processing cores, implements the convolution processing method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the convolution processing method of any one of claims 1 to 6.
CN202010951445.3A 2020-09-11 2020-09-11 Convolution processing method and device, multi-core DSP platform and readable storage medium Pending CN112085167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010951445.3A CN112085167A (en) 2020-09-11 2020-09-11 Convolution processing method and device, multi-core DSP platform and readable storage medium

Publications (1)

Publication Number Publication Date
CN112085167A true CN112085167A (en) 2020-12-15

Family

ID=73737461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010951445.3A Pending CN112085167A (en) 2020-09-11 2020-09-11 Convolution processing method and device, multi-core DSP platform and readable storage medium

Country Status (1)

Country Link
CN (1) CN112085167A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190347847A1 (en) * 2018-05-09 2019-11-14 Massachusetts Institute Of Technology View generation from a single image using fully convolutional neural networks
CN110473137A (en) * 2019-04-24 2019-11-19 华为技术有限公司 Image processing method and device
CN111199273A (en) * 2019-12-31 2020-05-26 深圳云天励飞技术有限公司 Convolution calculation method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
US10909418B2 (en) Neural network method and apparatus
US11468301B2 (en) Method and apparatus for performing operation of convolutional layer in convolutional neural network
CN109871936B (en) Method and apparatus for processing convolution operations in a neural network
EP3591572A1 (en) Method and system for automatic chromosome classification
EP3489862A1 (en) Method and apparatus for performing operation of convolutional layers in convolutional neural network
Huang The paradigmatic crisis in Chinese studies: Paradoxes in social and economic history
US10482177B2 (en) Deep reading machine and method
CN106779057B (en) Method and device for calculating binary neural network convolution based on GPU
TW202032579A (en) Method, apparatus and device for detecting lesion, and storage medium
Levinthal et al. Common-fate grouping as feature selection
US20220092325A1 (en) Image processing method and device, electronic apparatus and storage medium
CN112001923B (en) Retina image segmentation method and device
JP7104546B2 (en) Information processing equipment, information processing method
EP3651080A1 (en) Electronic device and control method thereof
CN112085167A (en) Convolution processing method and device, multi-core DSP platform and readable storage medium
CN105830160B (en) For the device and method of buffer will to be written to through shielding data
CN114121269A (en) Traditional Chinese medicine facial diagnosis auxiliary diagnosis method and device based on face feature detection and storage medium
CN111340790B (en) Bounding box determination method, device, computer equipment and storage medium
CN106681590B (en) Method and device for displaying screen content of driving recording device
Castro et al. Opencnn: a winograd minimal filtering algorithm implementation in cuda
Agarwal Predictive analysis in health care system using AI
Li et al. Research on object detection of PCB assembly scene based on effective receptive field anchor allocation
CN104765459B (en) The implementation method and device of pseudo operation
CN116400964A (en) Multithreading lock-free data processing method and related equipment
JP2021517310A (en) Processing for multiple input datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination