CN110414663B - Convolution implementation method of neural network and related product

Info

Publication number: CN110414663B
Application number: CN201810402644.1A
Authority: CN (China)
Other version: CN110414663A (published 2019-11-05)
Priority and filing date: 2018-04-28
Grant publication date: 2022-03-25
Legal status: Active (granted)
Inventors: 曹庆新, 黎立煌, 李炜
Original and current assignee: Shenzhen Intellifusion Technologies Co Ltd
Other language: Chinese (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

The invention provides a convolution implementation method of a neural network and a related product, where the method includes the following steps: acquiring input data and weight data; cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; and performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result. The technical scheme provided by the application has the advantage of a high calculation speed.

Description

Convolution implementation method of neural network and related product
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a convolution implementation method of a neural network and a related product.
Background
With the increasing maturity of artificial intelligence technology, application scenarios and product requirements across industries are growing explosively, and artificial intelligence algorithms iterate very quickly. A hardware computing platform therefore needs to be flexible enough to meet changing algorithm requirements, and its development cycle needs to be as short as possible to withstand product competition. For the algorithm models of artificial intelligence computation, especially neural network models, the convolution operation is a basic operation. The kernel size (KERNEL SIZE) used in the convolution operation is not fixed across different neural network models, and any size may be applied; an existing hardware platform cannot support the operations and variations of all kernel sizes, which affects the convolution speed and, in turn, the user experience.
Disclosure of Invention
The embodiments of the application provide a convolution implementation method of a neural network and related products. By fitting different kernel sizes into a standard kernel size, convolution operations are always performed with the standard kernel size, which improves the convolution speed and the user experience.
In a first aspect, an embodiment of the present application provides a convolution implementation method for a neural network, where the method includes the following steps:
acquiring input data and weight data;
cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result;
wherein A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers which are greater than or equal to 1.
Optionally, fitting each data block into X convolution kernels with the kernel size [A][B] includes:
if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-filling the edges of the data block so that its kernel size becomes [n+b][m+c], and then cutting the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0.
Optionally, fitting each data block into X convolution kernels with the kernel size [A][B] includes:
if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], cutting the data block into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and zero-filling the edges of the F smaller convolution kernels so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
Optionally, the kernel size [A][B] specifically includes: kernel size [2][2], kernel size [3][3], or kernel size [5][5].
In a second aspect, a neural network chip is provided;
the neural network chip is used for acquiring input data and weight data;
the neural network chip is used for cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result;
wherein A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers greater than or equal to 1.
Optionally, the neural network chip is further configured to, when a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-fill the edges of the data block so that its kernel size becomes [n+b][m+c], and then cut the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0.
Optionally, the neural network chip is further configured to, when a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], cut the data block into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and zero-fill the edges of the F convolution kernels so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
Optionally, the kernel size [A][B] specifically includes: kernel size [2][2], kernel size [3][3], or kernel size [5][5].
In a third aspect, an electronic device is provided, which may include the neural network chip of the second aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method as provided in the first aspect.
In a fifth aspect, there is provided a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform the method provided by the first aspect.
The embodiment of the application has the following beneficial effects:
It can be seen that, no matter what the sizes n and m of the kernel are, the technical scheme fits the kernel size [n][m] into X set kernel sizes [A][B], so that subsequent convolution operations always use the kernel size [A][B] as the basic unit. The hardware therefore only needs to support convolution operations matched to the kernel size [A][B], which improves the convolution speed and the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic structural diagram of an electronic device.
Fig. 2 is a schematic flow chart of a convolution implementation method of a neural network.
Fig. 3a is a schematic diagram of one way of cutting a data block provided in the present application.
Fig. 3b is a schematic diagram of another way of cutting a data block provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, in the description and claims of this application and the drawings described herein are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device in the present application may take various forms. By way of example and not limitation, and for convenience of description, the electronic device is referred to in the following embodiments as User Equipment (UE), a terminal, or an electronic device. Of course, in practical applications, the user equipment is not limited to the above presentation forms and may also include an intelligent vehicle-mounted terminal, computer equipment, and the like.
The structure of the electronic device is shown in Fig. 1. Specifically, the electronic device may include: a processor 101, a memory 102, and a neural network chip 103, where the processor 101 is connected with the memory 102 and the neural network chip 103. In an optional embodiment, the neural network chip 103 may be integrated in the processor 101. The memory 102 may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), and the like. The technical solution of the present invention is not limited by whether the neural network chip 103 is provided separately or integrated in the processor 101.
In one embodiment, the neural network chip 103 is used for obtaining input data and weight data.
The neural network chip 103 is further used for cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; and performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result.
Wherein A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers which are greater than or equal to 1.
Optionally, the neural network chip 103 is specifically configured to, if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-fill the edges of the data block so that its kernel size becomes [n+b][m+c], and then cut the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0.
Optionally, the neural network chip 103 is specifically configured to, if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], cut the data block into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and zero-fill the edges of the F convolution kernels so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
Optionally, the kernel size [A][B] specifically includes: kernel size [2][2], kernel size [3][3], or kernel size [5][5].
Referring to Fig. 2, a convolution implementation method of a neural network is provided. The method is implemented by an electronic device, whose specific structure may be the electronic device shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S201: the neural network chip 103 obtains input data [CI][H][W] and weight data [CP][CO][n][m];
wherein CI is the depth value of the input data, H is the height value of the input data, W is the width value of the input data, CP is the depth value of the weight data, CO is the number value of the weight data (the number of convolution kernels), [n][m] is the convolution kernel size (KERNEL SIZE) of the weight data, CI = CP, and CI, H, W, CP, CO, n, and m are integers greater than or equal to 1.
Step S202: the neural network chip 103 cuts the weight data into a plurality of data blocks with the kernel size [n][m], and fits each data block into X convolution kernels with the kernel size [A][B];
Step S203: performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result.
A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers which are greater than or equal to 1.
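To make the data layout of step S201 and the cutting of step S202 concrete, the following minimal Python sketch illustrates the tensors involved; the concrete shape values (CI = CP = 16, CO = 32, H = W = 24, n = m = 5) are hypothetical values chosen only for this example, not values from the embodiment:

```python
import numpy as np

# Hypothetical example shapes (illustration only, not from the embodiment).
CI, H, W = 16, 24, 24        # input data  [CI][H][W]
CP, CO, n, m = 16, 32, 5, 5  # weight data [CP][CO][n][m]
assert CI == CP              # required by the method: CI = CP

inputs  = np.random.randn(CI, H, W)
weights = np.random.randn(CP, CO, n, m)

# Step S202: cutting the weight data yields CP * CO data blocks,
# each with the kernel size [n][m].
blocks = weights.reshape(-1, n, m)
print(blocks.shape)          # (512, 5, 5)
```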
The technical scheme of the application has the advantage that, no matter what the sizes n and m of a data block's kernel are, the neural network chip 103 fits the kernel size [n][m] into X set kernel sizes [A][B], so that subsequent convolution operations always use the kernel size [A][B] as the basic unit. A set kernel size [A][B] can be matched well to the corresponding hardware computation, so one large convolution kernel is split into several convolution kernels of the set size without changing the total amount of computation. The hardware only needs to be adapted to convolution kernels of the set size, which improves the convolution speed and the user experience.
Optionally, the fitting of each data block into X convolution kernels with the kernel size [A][B] may specifically include:
if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-filling the edges of the data block so that its kernel size becomes [n+b][m+c], and then cutting the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0.
To illustrate with a practical example, assume the kernel size [n][m] of a data block is [5][5] and the set kernel size [A][B] of the convolution kernel is [3][3]. A row and a column of zero elements are added to the data block, giving the kernel size [5+1][5+1], and the [6][6] data block is then cut into 4 convolution kernels with the kernel size [3][3], as shown schematically in Fig. 3a, where each dashed box represents one [3][3] kernel. In this way, a data block with the kernel size 5×5 is fitted into 4 convolution kernels with the kernel size 3×3. Similarly, a data block with any kernel size can be fitted into X convolution kernels with the kernel size [A][B], so that a hardware structure adapted to convolution kernels with the kernel size [A][B] is compatible with all kernel sizes, which improves the calculation speed and efficiency of the hardware structure.
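The whole scheme, zero-filling, cutting, per-sub-kernel convolution, and combining of the intermediate results, can be sketched end to end as follows. This is a minimal single-channel NumPy/SciPy illustration under stated assumptions, not the chip's implementation: convolution is taken as the usual valid cross-correlation, and the helper names split_kernel and conv_by_blocks are hypothetical.

```python
import numpy as np
from scipy.signal import correlate2d

def split_kernel(w, A, B):
    """Zero-fill the bottom/right edges of an [n][m] kernel until its size
    is a multiple of [A][B], then cut it into [A][B] sub-kernels, keeping
    each sub-kernel's (row, column) offset inside the padded kernel."""
    n, m = w.shape
    b, c = (-n) % A, (-m) % B                 # zero filling to [n+b][m+c]
    w_pad = np.pad(w, ((0, b), (0, c)))
    return [(u, v, w_pad[u:u + A, v:v + B])
            for u in range(0, n + b, A)
            for v in range(0, m + c, B)]

def conv_by_blocks(x, w, A, B):
    """Valid correlation of x with an [n][m] kernel, computed as the sum of
    shifted correlations with the fitted [A][B] sub-kernels (the
    'intermediate results' of the method)."""
    n, m = w.shape
    b, c = (-n) % A, (-m) % B
    x = np.pad(x, ((0, b), (0, c)))           # absorb the kernel padding
    H, W = x.shape
    oh, ow = H - (n + b) + 1, W - (m + c) + 1
    out = np.zeros((oh, ow))
    for u, v, blk in split_kernel(w, A, B):
        part = correlate2d(x, blk, mode='valid')  # one [A][B] convolution
        out += part[u:u + oh, v:v + ow]           # shift by block offset
    return out

# The combined result matches a direct 5x5 correlation.
x = np.random.randn(8, 8)
w = np.random.randn(5, 5)
assert np.allclose(conv_by_blocks(x, w, 3, 3),
                   correlate2d(x, w, mode='valid'))
```

The final assertion checks the key property: summing the offset-shifted [3][3] outputs reproduces the direct [5][5] result, which is exactly why hardware supporting only the set kernel size can serve all kernel sizes.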
Optionally, if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], the data block is cut into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and the edges of the F smaller convolution kernels are zero-filled so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
To illustrate with a practical example, assume the kernel size [n][m] of a data block is [5][5] and the set kernel size [A][B] of the convolution kernel is [3][3]. The [5][5] data block is cut into 4 kernels: one with the kernel size [3][3], one with [3][2], one with [2][3], and one with [2][2], as shown schematically in Fig. 3b, where each dashed box represents one cut kernel. The edges of the [3][2], [2][3], and [2][2] kernels are then zero-filled to the kernel size [3][3]. In this way, a data block with the kernel size 5×5 is likewise fitted into 4 convolution kernels with the kernel size 3×3, so that a hardware structure adapted to convolution kernels with the kernel size [A][B] is compatible with convolution kernels of all kernel sizes, which improves the calculation speed and efficiency of the hardware structure.
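This second scheme can be sketched in the same hypothetical Python setting. Cutting first and zero-filling only the F smaller edge kernels yields the same set of [A][B] sub-kernels at the same offsets, so the helper below (cut_then_pad is a hypothetical name) can replace split_kernel in the conv_by_blocks sketch above without any other change:

```python
import numpy as np

def cut_then_pad(w, A, B):
    """Cut an [n][m] kernel into E full [A][B] kernels plus F smaller
    edge kernels, then zero-fill only the F smaller ones to [A][B]."""
    n, m = w.shape
    blocks = []
    for u in range(0, n, A):
        for v in range(0, m, B):
            blk = w[u:u + A, v:v + B]   # may be smaller than [A][B] at edges
            pad = ((0, A - blk.shape[0]), (0, B - blk.shape[1]))
            blocks.append((u, v, np.pad(blk, pad)))
    return blocks

# For a 5x5 kernel with A = B = 3: E = 1 full kernel, F = 3 edge kernels
# (3x2, 2x3 and 2x2 before zero filling), X = E + F = 4 in total.
w = np.random.randn(5, 5)
print([blk.shape for _, _, blk in cut_then_pad(w, 3, 3)])
# -> [(3, 3), (3, 3), (3, 3), (3, 3)]
```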
The above kernel sizes are merely examples. In practical applications, the kernel size [n][m] may take other values, such as kernel size [5][7], kernel size [6][6], kernel size [9][9], and so on, and the two values A and B in the kernel size [A][B] may be different; the present application does not require A and B to be the same.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the convolution implementation methods of a neural network as described in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform part or all of the steps of any one of the convolution implementation methods of a neural network as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative.
In addition, the processor and the chip in the embodiments of the present application may be integrated into one processing unit, may exist alone physically, or two or more of them may be integrated into one unit. Based on such understanding, the technical solution of the present application, in substance or in the part contributing to the prior art, may be embodied in whole or in part in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, and a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic or optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (6)

1. A convolution implementation method of a neural network is characterized by comprising the following steps:
acquiring input data and weight data;
cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result;
wherein A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers greater than or equal to 1;
fitting each data block into X convolution kernels with the kernel size [A][B] includes:
if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-filling the edges of the data block so that its kernel size becomes [n+b][m+c], and then cutting the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0; or,
if a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], cutting the data block into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and zero-filling the edges of the F smaller convolution kernels so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
2. The method of claim 1,
the kernel size [A][B] specifically includes: kernel size [2][2], kernel size [3][3], or kernel size [5][5].
3. A neural network chip, characterized in that,
the neural network chip is used for acquiring input data and weight data;
the neural network chip is used for cutting the weight data into a plurality of data blocks with the kernel size [n][m], and fitting each data block into X convolution kernels with the kernel size [A][B]; performing a convolution operation on each convolution kernel and the input data to obtain an intermediate result, and processing all the intermediate results to obtain the convolution result;
wherein A is less than or equal to n, B is less than or equal to m, and A, B, m, n and X are integers which are greater than or equal to 1;
the neural network chip is further used for, when a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], zero-filling the edges of the data block so that its kernel size becomes [n+b][m+c], and then cutting the data block with the kernel size [n+b][m+c] into X convolution kernels with the kernel size [A][B], where b and c are both integers greater than or equal to 0;
the neural network chip is further used for, when a data block cannot be cut into an integer number of convolution kernels with the kernel size [A][B], cutting the data block into E convolution kernels with the kernel size equal to [A][B] and F convolution kernels with the kernel size smaller than [A][B], and zero-filling the edges of the F convolution kernels so that their kernel size becomes [A][B], where E + F = X, and E and F are both integers greater than or equal to zero.
4. The neural network chip of claim 3,
the kernel size [A][B] specifically includes: kernel size [2][2], kernel size [3][3], or kernel size [5][5].
5. An electronic device, characterized in that it comprises the neural network chip according to claim 3 or 4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to claim 1 or 2.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant