CN111105015A - General CNN inference accelerator, control method thereof and readable storage medium - Google Patents

General CNN inference accelerator, control method thereof and readable storage medium

Info

Publication number
CN111105015A
CN111105015A (application CN201911243224.4A)
Authority
CN
China
Prior art keywords
module
target data
pooling
calculation
convolution
Prior art date
Legal status
Withdrawn
Application number
CN201911243224.4A
Other languages
Chinese (zh)
Inventor
徐天赐
景璐
Current Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201911243224.4A
Publication of CN111105015A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Advance Control (AREA)

Abstract

The application discloses a general CNN inference accelerator, comprising: a preprocessing module for acquiring target data, convolution kernel data and a module timing; a convolution and activation module for performing convolution calculation and activation calculation on the target data; and a pooling module for performing pooling calculation or data structure conversion on the target data. The module timing comprises the order in which the target data enters the convolution and activation module and/or the pooling module. In this application, the pooling module can perform pooling calculation or data structure conversion on the target data and output target data of the required size, removing the restriction on the computation size of the target data; meanwhile, the order in which the target data passes through the convolution and activation module or the pooling module can be flexibly adjusted through the module timing, so that the target data can enter or bypass a given module multiple times, removing the restriction on the operator order. The application also discloses a control method of the general CNN inference accelerator and a readable storage medium with the same beneficial effects.

Description

General CNN inference accelerator, control method thereof and readable storage medium
Technical Field
The invention relates to the field of CNN inference acceleration, and in particular to a general CNN inference accelerator, a control method thereof and a readable storage medium.
Background
In recent years, with the growth of computing power and the development of CNN (Convolutional Neural Network) structures, the recognition accuracy of CNN networks has improved greatly; at the same time, CNNs keep getting deeper, their network structures more complex, and their computational load ever larger. Heterogeneous computing devices such as GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits) are therefore needed to accelerate CNN inference computation.
An FPGA-based general CNN inference accelerator has two main implementation methods: multi-layer instance implementation and single-layer instance implementation. A multi-layer instance implementation maps the inference calculation of every hidden layer of the CNN model into hardware inside the FPGA; the data stream flows in at the first layer and out at the last layer, and the CNN inference calculation is completed in a single pass through the FPGA hardware. A single-layer instance implementation abstracts the inference calculation of one hidden layer of the CNN model into hardware inside the FPGA, and the data stream passes through that hardware cyclically and repeatedly to complete the CNN inference calculation.
A multi-layer instance implementation is limited by the number of CNN layers: the more layers the CNN has, the greater the hardware resource pressure of implementing the CNN inference model in the FPGA. A single-layer instance implementation is limited by the relatively fixed operator order of single-layer neural network calculation: when different CNN models use different operator orders, the FPGA hardware in the general CNN inference accelerator struggles to accommodate a flexible operator order, and the hardware implementation sometimes has to be changed again to meet the requirements of a CNN model.
Therefore, how to solve the above technical problems is a problem to be addressed by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a general CNN inference accelerator that supports target data of various sizes and a flexibly configurable module timing, a control method thereof, and a readable storage medium. The specific scheme is as follows:
A general CNN inference accelerator, comprising:
the preprocessing module is used for acquiring target data, convolution kernel data and a module timing;
the convolution and activation module is used for performing convolution calculation and activation calculation on the target data;
the pooling module is used for performing pooling calculation or data structure conversion on the target data;
the module timing comprises an order in which the target data enters the convolution and activation module and/or the pooling module.
Preferably, the pooling module comprises:
the general pooling module is used for performing pooling calculation with a preset size, or data structure conversion, on the target data;
the full-size pooling module is used for performing full-size pooling calculation on the target data;
the module timing specifically includes an order of the target data through the convolution and activation module and/or the general pooling module and/or the full-size pooling module.
Preferably, the pooling calculation is specifically: a maximum pooling calculation or an average pooling calculation.
Preferably, the general CNN inference accelerator is an inference accelerator implemented by a single-layer instance.
Preferably, the general CNN inference accelerator is an inference accelerator implemented by a multi-layer instance.
Preferably, the general CNN inference accelerator further comprises:
a data organization module for organizing a data stream of the target data;
and the storage access module is used for calculating a storage address corresponding to the target data.
Preferably, the general CNN inference accelerator is an ASIC- or FPGA-based inference accelerator.
Correspondingly, the invention also discloses a control method of the general CNN inference accelerator, applied to the general CNN inference accelerator described above and comprising the following steps:
acquiring target data, convolution kernel data and a module timing through the preprocessing module, the module timing comprising the order in which the target data enters the convolution and activation module and/or the pooling module;
and, according to the module timing, performing convolution calculation and activation calculation on the target data through the convolution and activation module and/or performing pooling calculation or data structure conversion on the target data through the pooling module.
Accordingly, the present invention also discloses a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the control method of the general CNN inference accelerator described above.
The application discloses a general CNN inference accelerator, comprising: a preprocessing module for acquiring target data, convolution kernel data and a module timing; a convolution and activation module for performing convolution calculation and activation calculation on the target data; and a pooling module for performing pooling calculation or data structure conversion on the target data. The module timing comprises the order in which the target data enters the convolution and activation module and/or the pooling module. In this application, the pooling module can perform pooling calculation or data structure conversion on the target data and output target data of the required size, removing the restriction on the computation size of the target data; meanwhile, the order in which the target data passes through the convolution and activation module or the pooling module can be flexibly adjusted through the module timing, so that the target data can enter or bypass a given module multiple times, removing the restriction on the operator order.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a structural distribution diagram of a general CNN inference accelerator according to an embodiment of the present invention;
Fig. 2 is a logic diagram of a general CNN inference accelerator according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating the steps of a control method of a general CNN inference accelerator according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The multi-layer instance implementation of an FPGA-based general CNN inference accelerator is limited by the number of CNN layers: a CNN inference model with more layers puts greater hardware resource pressure on the FPGA. The single-layer instance implementation is limited by the relatively fixed operator order of single-layer neural network calculation: when different CNN models use different operator orders, the FPGA hardware in the general CNN inference accelerator struggles to accommodate a flexible operator order, and the hardware implementation sometimes has to be changed again to meet the requirements of a CNN model.
In this application, the pooling module can perform pooling calculation or data structure conversion on the target data, outputting target data of the required size and removing the restriction on the computation size of the target data; meanwhile, the order in which the target data passes through the convolution and activation module or the pooling module can be flexibly adjusted through the module timing, so that the target data can enter or bypass a given module multiple times, removing the restriction on the operator order.
The embodiment of the invention discloses a general CNN inference accelerator which, as shown in Fig. 1, comprises the following components:
the preprocessing module 1 is used for acquiring target data, convolution kernel data and a module timing;
the convolution and activation module 2 is used for performing convolution calculation and activation calculation on the target data;
the pooling module 3 is used for performing pooling calculation or data structure conversion on the target data;
the module timing comprises the order in which the target data enters the convolution and activation module 2 and/or the pooling module 3.
Wherein the pooling calculation includes, but is not limited to, a maximum pooling calculation or an average pooling calculation.
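As a software illustration only (the patent describes hardware modules; the function below is an assumption made for exposition, not the claimed implementation), maximum and average pooling over a 2-D feature map can be sketched as:

    import numpy as np

    def pool2d(x, size=2, stride=2, mode="max"):
        """Minimal max/average pooling over a 2-D feature map (illustrative only)."""
        h_out = (x.shape[0] - size) // stride + 1
        w_out = (x.shape[1] - size) // stride + 1
        out = np.empty((h_out, w_out))
        for i in range(h_out):
            for j in range(w_out):
                win = x[i * stride:i * stride + size, j * stride:j * stride + size]
                out[i, j] = win.max() if mode == "max" else win.mean()
        return out

For example, pool2d(np.arange(16.0).reshape(4, 4)) reduces a 4x4 map to the 2x2 map [[5, 7], [13, 15]].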
It is understood that the general CNN inference accelerator can be implemented either as a single-layer instance or as a multi-layer instance.
It is understood that, in this embodiment, there may be one or more convolution and activation modules 2 and pooling modules 3; whether the target data passes through a given convolution and activation module 2 or pooling module 3, and in what order, is determined by the module timing.
It can be understood that the module timing is determined by the external system design and then delivered to the general CNN inference accelerator. Because the general CNN inference accelerator contains the pooling module 3 internally, and the module timing determines the calculation order of the internal modules, this embodiment can be applied to target data of various sizes, types and positions without any change to the internal hardware of the general CNN inference accelerator, removing the restrictions on the size and calculation order of the target data.
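For exposition only, the module timing can be modeled in software as an ordered schedule supplied by the external system; the module names and the run_schedule helper below are assumptions of this sketch, not the patent's actual interface:

    # Hypothetical model of a module timing: an ordered list of module names that
    # the target data visits. A module absent from the list is bypassed, and a
    # module may appear more than once.
    MODULE_TIMING = ["conv_act", "general_pool", "conv_act", "full_size_pool"]

    def run_schedule(data, modules, timing):
        """Push the data through the named modules in the externally supplied order."""
        for name in timing:
            data = modules[name](data)
        return data

Because the schedule rather than the hardware fixes the order, supporting a different CNN model only requires a different timing, not a new hardware implementation.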
Further, the pooling module 3 may include:
the general pooling module 31 is used for performing pooling calculation with a preset size, or data structure conversion, on the target data;
a full-size pooling module 32 for performing full-size pooling calculation on the target data;
it will be appreciated that the module timing in this case specifically includes the order of the target data through the convolution and activation module 2 and/or the general pooling module 31 and/or the full-size pooling module 32.
The numbers of convolution and activation modules 2, general pooling modules 31 and full-size pooling modules 32 are not limited and are set according to actual requirements.
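As an illustrative sketch (assuming a channels-first (C, H, W) feature-map layout, which the patent does not specify), full-size pooling reduces each channel of the entire feature map to a single value:

    import numpy as np

    def full_size_pool(fmap, mode="average"):
        """Global pooling over a whole (C, H, W) feature map: one value per channel."""
        return fmap.max(axis=(1, 2)) if mode == "max" else fmap.mean(axis=(1, 2))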
Further, the general CNN inference accelerator typically further comprises:
a data organization module 4 for organizing the data stream of the target data;
and the storage access module 5 is used for calculating a storage address corresponding to the target data.
Specifically, when the general CNN inference accelerator performs inference calculation on target data according to the module timing, the target data, convolution kernel data and module timing are first obtained through the preprocessing module 1. The target data then enters the data organization module 4, and the module timing determines whether it enters the convolution and activation module 2 and/or the general pooling module 31. After all calculations of the convolution and activation module 2 and the general pooling module 31 are finished, the storage access module 5 receives the hidden-layer feature map data corresponding to the target data and calculates its storage address, so that the data organization module 4 writes the received hidden-layer feature map data into the on-chip memory according to that address. At this point, the target data from the storage access module 5 may enter the full-size pooling module 32 and, after pooling calculation, enter the data organization module 4, or it may bypass the full-size pooling module 32 and enter the data organization module 4 directly. The entire data calculation path of the target data follows the module timing: whether the target data passes through each convolution and activation module 2, each general pooling module 31 and the full-size pooling module 32, and in what order, can be configured by setting the module timing.
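Purely as an assumption for illustration (the patent fixes neither a memory layout nor an element width), the storage-address calculation performed by the storage access module 5 might look like a row-major mapping from feature-map coordinates to an on-chip address:

    def storage_address(base, c, y, x, height, width, elem_bytes=2):
        """Row-major address of element (c, y, x) of a hidden-layer feature map."""
        return base + ((c * height + y) * width + x) * elem_bytes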
Specifically, in the logic diagram of Fig. 2, "pooling" and "convolution" are short for pooling calculation and for convolution and activation calculation, respectively. When the general pooling module 31 is located after the convolution and activation module 2 (as with pooling 11 in Fig. 2), its pooling calculation can be merged into the hidden-layer inference calculation to which the preceding convolution belongs. When the general pooling module 31 is located before the convolution and activation module 2 (as with pooling 12 in Fig. 2), it can serve as an independent pooling hidden layer: after the previous layer's inference calculation is completed, the data organization module 4 performs data organization dedicated to the independent pooling hidden layer, the target data then enters the general pooling module 31 directly through an independent pooling shortcut for pooling calculation, and finally enters the subsequent links to complete the current layer's calculation. When no pooling calculation is required before or after the convolution and activation module 2, the target data passes through the general pooling module 31 without pooling, performing only data structure conversion, that is, a change of the data storage structure. In this way, the embodiment achieves flexible configuration of the calculation order of each module through the module timing.
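The behaviors of the general pooling link described above can be summarized in an illustrative sketch (the function and parameter names are assumptions): whether the pooling is merged after a convolution or acts as an independent pooling hidden layer, the link pools when the timing schedules pooling and otherwise only converts the data structure:

    def general_pool_link(data, pooling_scheduled, pool, convert):
        """General pooling module 31 as a sketch: perform the scheduled pooling
        calculation, or pass the data through with only a storage-structure
        conversion when no pooling is required."""
        return pool(data) if pooling_scheduled else convert(data)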
Similarly, the full-size pooling module 32, which is generally located at the end of the CNN model, is usually merged into the hidden layer to which the preceding convolution calculation belongs. The storage access module 5 receives the hidden-layer feature map data corresponding to the target data after that calculation and computes its storage address, so that the data organization module 4 writes the received data into the on-chip memory according to that address. The target data from the storage access module 5 may then enter the full-size pooling module 32 and, after pooling calculation, enter the data organization module 4, or it may bypass the full-size pooling module 32 and enter the data organization module 4 directly.
Furthermore, the general CNN inference accelerator is an ASIC- or FPGA-based inference accelerator, which greatly improves its universality: popular CNN models such as ResNet50, GoogLeNet, SqueezeNet and VGG can be supported flexibly, improving the overall performance of the general CNN inference accelerator.
The application discloses a general CNN inference accelerator, comprising: a preprocessing module for acquiring target data, convolution kernel data and a module timing; a convolution and activation module for performing convolution calculation and activation calculation on the target data; and a pooling module for performing pooling calculation or data structure conversion on the target data. The module timing comprises the order in which the target data enters the convolution and activation module and/or the pooling module. In this application, the pooling module can perform pooling calculation or data structure conversion on the target data and output target data of the required size, removing the restriction on the computation size of the target data; meanwhile, the order in which the target data passes through the convolution and activation module or the pooling module can be flexibly adjusted through the module timing, so that the target data can enter or bypass a given module multiple times, removing the restriction on the operator order.
Correspondingly, the present invention also discloses a control method of the general CNN inference accelerator, applied to the general CNN inference accelerator described above. As shown in Fig. 3, the method comprises:
s1: acquiring target data, convolution kernel data and a module time sequence through a preprocessing module; the module timing sequence comprises the sequence of the target data entering the convolution and activation module and/or the pooling module;
s2: and according to the module time sequence, performing convolution calculation and activation calculation on the target data through the convolution and activation module and/or performing pooling calculation or data structure conversion on the target data through the pooling module.
For the general CNN inference accelerator in this embodiment, reference may be made to the detailed description in the above embodiments, which is not repeated here.
The control method of the general CNN inference accelerator in this embodiment has the same beneficial effects as the general CNN inference accelerator in the above embodiments, which are not repeated here.
Correspondingly, the present invention also discloses a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the control method of the general CNN inference accelerator, specifically including:
acquiring target data, convolution kernel data and a module timing through the preprocessing module, the module timing comprising the order in which the target data enters the convolution and activation module and/or the pooling module;
and, according to the module timing, performing convolution calculation and activation calculation on the target data through the convolution and activation module and/or performing pooling calculation or data structure conversion on the target data through the pooling module.
Specifically, for the control method of the general CNN inference accelerator in this embodiment, reference may be made to the detailed description in the above embodiments, which is not repeated here.
The readable storage medium in this embodiment has the same beneficial effects as the control method of the general CNN inference accelerator in the above embodiments, and is not described herein again.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The general CNN inference accelerator, the control method thereof, and the readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (9)

1. A generic CNN inference accelerator, comprising:
the preprocessing module is used for acquiring target data, convolution kernel data and a module timing;
the convolution and activation module is used for performing convolution calculation and activation calculation on the target data;
the pooling module is used for performing pooling calculation or data structure conversion on the target data;
the module timing comprises an order in which the target data enters the convolution and activation module and/or the pooling module.
2. The generic CNN inference accelerator of claim 1, wherein the pooling module comprises:
the general pooling module is used for performing pooling calculation with a preset size, or data structure conversion, on the target data;
the full-size pooling module is used for performing full-size pooling calculation on the target data;
the module timing specifically includes an order of the target data through the convolution and activation module and/or the general pooling module and/or the full-size pooling module.
3. The generic CNN inference accelerator of claim 1, wherein the pooling calculation is specifically: a maximum pooling calculation or an average pooling calculation.
4. The generic CNN inference accelerator of claim 3, wherein the generic CNN inference accelerator is a single-layer instance-implemented inference accelerator.
5. The generic CNN inference accelerator of claim 3, wherein the generic CNN inference accelerator is a multi-layer instance-implemented inference accelerator.
6. The generic CNN inference accelerator of any of claims 1-5, further comprising:
a data organization module for organizing a data stream of the target data;
and the storage access module is used for calculating a storage address corresponding to the target data.
7. The generic CNN inference accelerator of claim 6, wherein the generic CNN inference accelerator is an ASIC or FPGA based inference accelerator.
8. A control method of a generic CNN inference accelerator, applied to the generic CNN inference accelerator of any one of claims 1 to 7, comprising:
acquiring target data, convolution kernel data and a module timing through a preprocessing module; the module timing comprises the order in which the target data enters the convolution and activation module and/or the pooling module;
and according to the module timing, performing convolution calculation and activation calculation on the target data through the convolution and activation module and/or performing pooling calculation or data structure conversion on the target data through the pooling module.
9. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of controlling a generic CNN inference accelerator as claimed in claim 8.
CN201911243224.4A 2019-12-06 2019-12-06 General CNN inference accelerator, control method thereof and readable storage medium Withdrawn CN111105015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911243224.4A CN111105015A (en) 2019-12-06 2019-12-06 General CNN inference accelerator, control method thereof and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911243224.4A CN111105015A (en) 2019-12-06 2019-12-06 General CNN inference accelerator, control method thereof and readable storage medium

Publications (1)

Publication Number Publication Date
CN111105015A (en) 2020-05-05

Family

ID=70421801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911243224.4A Withdrawn CN111105015A (en) 2019-12-06 2019-12-06 General CNN reasoning accelerator, control method thereof and readable storage medium

Country Status (1)

Country Link
CN (1) CN111105015A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046616A1 (en) * 2015-08-15 2017-02-16 Salesforce.Com, Inc. Three-dimensional (3d) convolution with 3d batch normalization
CN107704922A (en) * 2017-04-19 2018-02-16 北京深鉴科技有限公司 Artificial neural network processing unit
CN108205704A (en) * 2017-09-27 2018-06-26 深圳市商汤科技有限公司 A kind of neural network chip
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN110276444A (en) * 2019-06-04 2019-09-24 北京清微智能科技有限公司 Image processing method and device based on convolutional neural networks

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931911A (en) * 2020-07-30 2020-11-13 山东云海国创云计算装备产业创新中心有限公司 CNN accelerator configuration method, system and device
CN111931911B (en) * 2020-07-30 2022-07-08 山东云海国创云计算装备产业创新中心有限公司 CNN accelerator configuration method, system and device

Similar Documents

Publication Publication Date Title
CN109062611B (en) Neural network processing device and method for executing vector scaling instruction
CN111176727B (en) Computing device and computing method
CN112214726B (en) Operation accelerator
CN109543832B (en) Computing device and board card
EP3451157B1 (en) Device and method for performing forward operation of convolutional neural network
CN109522052B (en) Computing device and board card
EP3660706A1 (en) Convolutional operation device and method
CN107832844A (en) A kind of information processing method and Related product
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN112633490B (en) Data processing device, method and related product for executing neural network model
US20190354156A1 (en) Dynamic voltage frequency scaling device and method
CN109947573A (en) Intelligence suitable for electric system edge calculations accelerates chip
CN111210005A (en) Equipment operation method and device, storage medium and electronic equipment
CN112799599A (en) Data storage method, computing core, chip and electronic equipment
CN109711540B (en) Computing device and board card
CN111105015A (en) General CNN reasoning accelerator, control method thereof and readable storage medium
KR20220028899A (en) Accelerator, method for operating the same, and electronic device including the same
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
US20220230069A1 (en) Neural network sparsification device and method, and related product
CN111143208B (en) Verification method for assisting FPGA to realize AI algorithm based on processor technology
CN114723024A (en) Linear programming-based neural network mapping method for storage and calculation integrated chip
CN111832714B (en) Operation method and device
CN111260070B (en) Operation method, device and related product
Zhang et al. Research of Heterogeneous Acceleration Optimization of Convolutional Neural Network Algorithm for Unmanned Vehicle Based on FPGA
CN113283593B (en) Convolution operation coprocessor and rapid convolution method based on processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200505)