CN116306840A - Neural network operation method, device, chip, electronic equipment and storage medium


Info

Publication number
CN116306840A
CN116306840A
Authority
CN
China
Prior art keywords
sub
convolution
convolution kernel
input data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111466758.0A
Other languages
Chinese (zh)
Inventor
徐东
熊先奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN202111466758.0A priority Critical patent/CN116306840A/en
Priority to PCT/CN2022/121427 priority patent/WO2023098256A1/en
Publication of CN116306840A publication Critical patent/CN116306840A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/15 - Correlation function computation including computation of convolution operations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The present application relates to the field of data computation, and provides a neural network operation method, a device, a chip, an electronic device and a storage medium. The neural network operation method comprises the following steps: acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, and these sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups. The operation step comprises: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with its rearranged input data to obtain a convolution result for each sub-convolution kernel group; and accumulating the convolution results of the sub-convolution kernel groups to obtain an accumulated result, taking the data located at the effective positions in the accumulated result as the output result of the neural network operation. This eliminates the hardware design overhead, the increased data access, and the increased dynamic power consumption caused by img2col.

Description

Neural network operation method, device, chip, electronic equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of data computation, and in particular to a neural network operation method, a neural network operation device, a chip, an electronic device and a storage medium.
Background
About 90% of the computation of a neural network lies in convolution and fully connected layers, and a fully connected layer is in essence a special convolution operation. Convolution operations are currently almost always converted into matrix operations, which are implemented by systolic arrays or by general matrix multiplication (General Matrix Multiplication, abbreviated GEMM). Existing research on neural networks mainly focuses on implementing the multiply-accumulate operations of convolution efficiently, while ignoring the influence of data access on computational efficiency; memory access also increases power consumption.
For convenience of scheduling, existing neural network accelerators generally arrange the weights and the activation data in img2col fashion. After img2col is applied to the weights and the input data, the two resulting matrices are fed into a matrix operation unit, so that the product of the two matrices, i.e. the convolution output of the neural network, is conveniently obtained. The weight data does not grow in size after img2col; it only needs to be rearranged, and since the weights can be arranged offline, img2col on the weights adds no extra overhead. For the input data, however, img2col significantly increases the data volume because of the convolution sliding window. As shown in fig. 1, for an original input picture with W = 10 and H = 10, the total data amount is 10×10 = 100, while the data after img2col is 64×9 = 576 (64 sliding-window positions, 9 elements each), an expansion of nearly 6 times; the larger the input size (W×H), the closer the theoretical data expansion gets to Kw×Kh times, where Kw and Kh are the width and height of the convolution kernel. img2col may be implemented in software or in hardware, but either way it increases the accesses to the input data, which increases dynamic power consumption. This growth in data volume also degrades performance, because neural network computation is itself limited by memory access.
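As a concrete check on these numbers, the following minimal Python sketch (an illustration only, assuming unit stride and no padding) computes the img2col expansion for the 10×10 input and 3×3 kernel discussed above:

```python
# A minimal sketch of the data expansion caused by img2col for the example in
# the text: a 10x10 input convolved with a 3x3 kernel, unit stride, no padding.

def img2col_expansion(w, h, kw, kh):
    """Return (original element count, element count after img2col)."""
    out_w, out_h = w - kw + 1, h - kh + 1      # sliding-window output size
    original = w * h                           # 10*10 = 100
    expanded = out_w * out_h * kw * kh         # 8*8*9 = 576
    return original, expanded

orig, expanded = img2col_expansion(10, 10, 3, 3)
print(orig, expanded, expanded / orig)         # 100 576 5.76 (~6x expansion)
```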
Disclosure of Invention
The embodiments of the present application mainly aim at providing a neural network operation method, a device, a chip, an electronic device and a storage medium, with the goal of eliminating the hardware design overhead, the increase in data access, and the increase in dynamic power consumption caused by img2col.
In order to achieve the above object, an embodiment of the present application provides a neural network operation method, including: acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the N×Wk×Hk sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, and N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain a convolution result corresponding to each sub-convolution kernel group; accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein the parts of the rearranged input data corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions.
In order to achieve the above object, an embodiment of the present application further provides a neural network operation method, including: acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the N×Wk×Hk sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C; N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same. The operation step includes: convolving each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group; rearranging the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that sub-convolution kernel group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group; accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein the effective convolution results in the rearranged convolution results corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions.
In order to achieve the above object, an embodiment of the present application further provides a neural network operation method, including: acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the N×Wk×Hk sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C; N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same. The operation step includes: convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result; wherein convolving the i-th sub-convolution kernel group with the part of the input data corresponding to it yields the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains this effective convolution result; rearranging the (i-1)-th accumulated result so that the effective convolution result in the rearranged (i-1)-th accumulated result and the effective convolution result in the i-th convolution result have the same data positions; accumulating the rearranged (i-1)-th accumulated result and the i-th convolution result to obtain the i-th accumulated result; if i is smaller than Wk×Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk×Hk, taking the effective convolution result in the i-th accumulated result as the output result of the neural network operation; where the initial value of i is 1, and when i = 1 the 0-th accumulated result is set to zero and the effective convolution result in the rearranged 0-th accumulated result is taken by default to have the same data positions as the effective convolution result in the 1st convolution result.
To achieve the above object, an embodiment of the present application further provides a neural network computing device, including: a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit and an addition unit; the first storage unit is used for storing the input data of the neural network operation, and the second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, and N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same; the control unit is used for acquiring the input data from the first storage unit and inputting it into the first data rearrangement unit, and is also used for sending the data rearrangement modes corresponding to the sub-convolution kernel groups to the first data rearrangement unit; the first data rearrangement unit is configured to rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group, obtain the rearranged input data corresponding to each sub-convolution kernel group, and output it to the convolution unit; wherein the parts of the rearranged input data corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions; the control unit is further configured to acquire each sub-convolution kernel group from the second storage unit and send it to the convolution unit; the convolution unit is used for convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group, and outputting it to the addition unit; the addition unit is used for accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data at the effective positions in the accumulated result as the output result of the neural network operation.
To achieve the above object, an embodiment of the present application further provides a neural network computing device, including: a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit and an addition unit; the first storage unit is used for storing the input data of the neural network operation, and the second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, and N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same; the control unit is used for acquiring the input data from the first storage unit and inputting it into the convolution unit, and is also used for acquiring each sub-convolution kernel group from the second storage unit and inputting it into the convolution unit; the convolution unit is used for convolving each sub-convolution kernel group with the input data respectively to obtain the convolution result corresponding to each sub-convolution kernel group, and outputting it to the second data rearrangement unit; the control unit is further configured to send the data rearrangement mode corresponding to each sub-convolution kernel group to the second data rearrangement unit; the second data rearrangement unit rearranges the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that group, obtains the rearranged convolution result corresponding to each sub-convolution kernel group, and outputs it to the addition unit; wherein the effective convolution results in the rearranged convolution results corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions; the addition unit is used for accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data at the effective positions in the accumulated result as the output result of the neural network operation.
To achieve the above object, an embodiment of the present application further provides a neural network computing device, including: a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit and an addition unit; the first storage unit is used for storing the input data of the neural network operation, and the second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, and N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same; the control unit is used for acquiring the input data from the first storage unit and inputting it into the convolution unit, and is also used for acquiring the i-th sub-convolution kernel group from the second storage unit and inputting it into the convolution unit; the convolution unit is used for convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, and outputting it to the addition unit; wherein convolving the i-th sub-convolution kernel group with the part of the input data corresponding to it yields the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains this effective convolution result; the control unit is further configured to acquire the (i-1)-th accumulated result from the third storage unit and send it to the third data rearrangement unit; the third data rearrangement unit is configured to rearrange the (i-1)-th accumulated result so that the effective convolution result in the rearranged (i-1)-th accumulated result and the effective convolution result in the i-th convolution result have the same data positions, and to output the rearranged (i-1)-th accumulated result to the addition unit; the addition unit accumulates the rearranged (i-1)-th accumulated result and the i-th convolution result to obtain the i-th accumulated result, which is stored into the third storage unit, overwriting the (i-1)-th accumulated result; the control unit is further configured to judge the value of i: if i is less than Wk×Hk, i is updated to i+1 and the above operation is executed again; if i is equal to Wk×Hk, the effective convolution result in the i-th accumulated result is taken as the output result of the neural network operation; where the initial value of i is 1, and when i = 1 the 0-th accumulated result is set to zero and the effective convolution result in the rearranged 0-th accumulated result is taken by default to have the same data positions as the effective convolution result in the 1st convolution result.
To achieve the above object, an embodiment of the present application further provides a chip, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the neural network operation method described above.
To achieve the above object, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the neural network operation method described above.
To achieve the above object, an embodiment of the present application further provides a computer readable storage medium storing a computer program, where the computer program implements the above neural network operation method when executed by a processor.
In the neural network operation method provided by the embodiments of the present application, during the neural network operation, the input data and the Wk×Hk sub-convolution kernel groups of the neural network operation are acquired and the operation step is entered; the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, which are divided into Wk×Hk sub-convolution kernel groups, each containing N sub-convolution kernels of size 1×1×C, where N, Wk, Hk and C are integers greater than or equal to 1; in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group; accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein the rearranged input data corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions. By splitting the convolution, the input data is multiplexed without being reordered for scheduling and computation; the computation is performed directly on the original input data without img2col conversion, thereby eliminating the hardware design overhead, the increase in data access, and the increase in dynamic power consumption caused by img2col.
Drawings
FIG. 1 is a schematic diagram of the img2col process on input data in the prior art;
FIG. 2 is a flowchart of a neural network operation method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of input data provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a set of sub-convolution kernels provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the effective positions of rearranged input data provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of splitting input data provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a prior-art convolution of input data;
FIG. 8 is a schematic diagram of convolving input data provided by an embodiment of the present application;
FIG. 9 is a flowchart of a neural network operation method provided in an embodiment of the present application;
FIG. 10 is a flowchart of a neural network operation method provided in an embodiment of the present application;
FIG. 11 is a flowchart of a neural network operation method provided in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a neural network computing device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a neural network computing device according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a neural network computing device according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a chip provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. As will be appreciated by those of ordinary skill in the art, numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and should not be construed as limiting the specific implementation of the present application; the embodiments may be combined with and refer to each other where there is no contradiction.
One embodiment of the present application relates to a neural network operation method, as shown in fig. 2, including:
Step 101, obtaining input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering the operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, where N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
In an example implementation, as shown in FIG. 3, the acquired input data is of size Win×Hin×Cin; when Cin = 1 the input data is two-dimensional, and when Cin > 1 the input data is three-dimensional.
In an example implementation, as shown in fig. 4, the obtained sub-convolution kernel groups are produced by splitting the convolution kernels. The number of sub-convolution kernels contained in one sub-convolution kernel group is determined by the number of convolution kernels being split, for example: splitting 9 convolution kernels yields sub-convolution kernel groups that each contain 9 sub-convolution kernels. The number of sub-convolution kernel groups is determined by the width Wk and height Hk of the convolution kernels, for example: if the convolution kernels being split are of size 3*3, then 3*3 = 9 sub-convolution kernel groups are obtained after splitting.
In an example implementation, each sub-convolution kernel corresponds to part of the input data, for example: the sub-convolution kernel 00 of fig. 4 corresponds to positions 00 through 77 in the input data of fig. 3, the sub-convolution kernel 01 of fig. 4 corresponds to 01 through 78, the sub-convolution kernel 02 of fig. 4 corresponds to 02 through 79, and so on, up to the sub-convolution kernel 22 of fig. 4, which corresponds to 22 through 99. When a sub-convolution kernel group contains two or more sub-convolution kernels, the input data of the N sub-convolution kernels in each group are the same, for example: every sub-convolution kernel 00 in the 1st sub-convolution kernel group of fig. 4 corresponds to 00 through 77 in the input data of fig. 3, every sub-convolution kernel 01 in the 2nd group corresponds to 01 through 78, every sub-convolution kernel 02 in the 3rd group corresponds to 02 through 79, and so on, up to the 9th group, in which every sub-convolution kernel 22 corresponds to 22 through 99.
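The splitting and the group-to-tap correspondence can be pictured with a short numpy sketch (an illustration only; the N×Hk×Wk×C kernel layout is an assumption made for the example):

```python
import numpy as np

# Split N convolution kernels of size Wk*Hk*C into Wk*Hk groups, each holding
# N sub-convolution kernels of size 1*1*C, as in fig. 4.
N, Hk, Wk, C = 4, 3, 3, 16
kernels = np.random.randn(N, Hk, Wk, C).astype(np.float32)

# groups[(kh, kw)] is one sub-convolution kernel group: the N 1x1xC vectors
# taken from the same spatial tap (kh, kw) of every kernel.
groups = {(kh, kw): kernels[:, kh, kw, :]          # shape (N, C)
          for kh in range(Hk) for kw in range(Wk)}

assert len(groups) == Wk * Hk                      # 9 groups for a 3x3 kernel
assert groups[(0, 0)].shape == (N, C)              # each group: N 1x1xC sub-kernels
```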
Step 102, rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with its rearranged input data to obtain a convolution result corresponding to each sub-convolution kernel group; accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein the rearranged input data corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions.
In an example implementation, the operation of each sub-convolution kernel group with the input data is completed in a matrix operation unit of the neural network. Each sub-convolution kernel group has a corresponding data rearrangement mode; before a sub-convolution kernel group is convolved with the input data, its data rearrangement mode is first acquired, and the input data is rearranged accordingly to obtain the rearranged input data corresponding to that group. Each sub-convolution kernel group is then convolved with its rearranged input data to obtain the corresponding convolution result, the convolution results of all groups are accumulated to obtain the accumulated result, and the data at the effective positions in the accumulated result is taken as the output result of the neural network operation.
In an example implementation, the rearranged input data corresponding to each sub-convolution kernel group have the same data positions, and these same data positions are the effective positions. That is, the rearranged input data of every sub-convolution kernel group occupies the same data positions, but the data placed on those positions has been rearranged. For example, the data rearrangement mode corresponding to the 1st sub-convolution kernel group 00 in fig. 4 leaves the position of every datum in the input data unchanged; the rearranged input data corresponding to the 1st group 00 is shown in fig. 5. The data rearrangement mode corresponding to the 2nd sub-convolution kernel group 01 in fig. 4 shifts the input data forward by one column; the rearranged input data corresponding to the 2nd group 01 is shown in fig. 5. By analogy, the data rearrangement mode corresponding to the Wk×Hk-th sub-convolution kernel group 22 in fig. 4 shifts the input data forward by two columns and up by two rows; the rearranged input data corresponding to that group is shown in fig. 5. The effective positions are the solid-line portion of the data in fig. 5.
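The shift-style rearrangement of fig. 5 can be sketched as follows (a minimal sketch, assuming numpy, unit stride, and zero fill for the vacated border):

```python
import numpy as np

def rearrange(x, kh, kw):
    """x: (H, W, C) input; return the shifted copy for the group at tap (kh, kw).
    The input moves up by kh rows and left by kw columns; only the top-left
    (H-Hk+1) x (W-Wk+1) region is valid (the solid-line portion of fig. 5)."""
    h, w, _ = x.shape
    out = np.zeros_like(x)
    out[:h - kh, :w - kw, :] = x[kh:, kw:, :]
    return out

# Values 0..99 mirror the position labels 00..99 used in fig. 3.
x = np.arange(10 * 10).reshape(10, 10, 1).astype(np.float32)
r = rearrange(x, 2, 2)            # group 22: two rows up, two columns forward
assert r[0, 0, 0] == x[2, 2, 0]   # datum 22 now sits at position 00, as in fig. 5
```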
In an example implementation, to keep the matrix operation unit running efficiently, the bit width of the input data must match the size of the matrix operation. Assume the matrix operation module can output an M×N matrix in a single cycle; the bandwidth of the input data is then M×Wi, where Wi is the bit width of the representation of a single datum, e.g. Wi = 8 at INT8 precision and Wi = 16 at FP16 precision. Let C0 be the depth of the input data participating in one matrix operation, i.e. the granularity with which the input data is segmented in the depth direction, which is at least 1; C1 is the number of times the total depth C of the input data is segmented at granularity C0. The input data is stored in the buffer in C1HWC0 order. As shown in fig. 6, the M×Wi-bit-wide data is divided into M/C0 groups; each group is stored in a memory block of bit width Wi×C0 with independent address management, so that Wi×C0 bits of data can be fetched from any position in a single cycle. The data rearrangement module rearranges the data read from each buffer and then sends it to the matrix operation module for processing.
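A minimal numpy sketch of the C1HWC0 order (an illustration under the assumption that C is divisible by C0):

```python
import numpy as np

# Cut the depth C into C1 = C / C0 slices of depth C0; each H x W x C0 slice
# is stored contiguously, modelling one independently addressed memory block.
H, W, C, C0 = 10, 10, 16, 4
x = np.random.randn(H, W, C).astype(np.float32)

C1 = C // C0
c1hwc0 = x.reshape(H, W, C1, C0).transpose(2, 0, 1, 3)   # (C1, H, W, C0)
assert c1hwc0.shape == (C1, H, W, C0)
# Block c1 holds channels [c1*C0, (c1+1)*C0) for every spatial position, so a
# single cycle can fetch one C0-deep vector (Wi*C0 bits) from any address.
```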
In an example implementation, once the input data is stored in the memory blocks, an instruction issued to the matrix operation unit only needs to specify the start address of the input data, the start address of the weight data, and the sizes of the 2 matrices involved in the matrix operation, i.e. the values of M, N and K. Assuming the minimum specification supported by the matrix unit is an m×k by k×n matrix operation, the control module automatically fetches m×k data from storage unit 1 and k×n data from storage unit 2 every cycle and loads them into the matrix operation unit for the matrix operation; for the M×K and K×N data matrices, the control unit performs the segmentation and calculation automatically. The values of M, K and N must therefore be integer multiples of m, k and n; if they are not, the data needs to be padded so that the integer-multiple condition is satisfied.
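A sketch of this tiled execution (an illustration in numpy; m, k and n stand for the minimal supported specification, and M, K, N are assumed to be integer multiples of them, as padding would otherwise ensure):

```python
import numpy as np

def tiled_matmul(A, B, m, k, n):
    """Run an M x K by K x N product as a grid of m x k by k x n operations,
    mimicking the control module's automatic segmentation."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % m == 0 and K % k == 0 and N % n == 0
    out = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, m):
        for j in range(0, N, n):
            for p in range(0, K, k):                  # accumulate along K
                out[i:i+m, j:j+n] += A[i:i+m, p:p+k] @ B[p:p+k, j:j+n]
    return out

A, B = np.random.randn(8, 12), np.random.randn(12, 6)
assert np.allclose(tiled_matmul(A, B, 4, 3, 2), A @ B)
```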
In an example implementation, fig. 7 shows the convolution scheme used in the prior art, and fig. 8 shows the convolution process used in the present application, in which the input source data of every sub-convolution kernel group is identical and no data expansion occurs. The convolution calculation flow is basically the same as the traditional one: the input data and the weight data are loaded first. For a single convolution block the input data is loaded once; the weight data can be split and loaded in batches in the manner shown in fig. 8, or loaded into the weight buffer in one pass. For better parallelism, the input data and the weight data may be loaded into their respective memory units simultaneously. After the input data and the weights are loaded, matrix operations are performed multiple times according to the splitting rule described in fig. 8; the input data does not need to be changed in between, only a different start position needs to be specified each time, so the input data is highly multiplexed, and within one convolution, different weight data is fetched each time to participate in the matrix operation.
In an example implementation, assume the depth C shown in fig. 8 is 16. For the 10×10×16 input data shown in fig. 8 and the 3×3×16 convolution kernel of fig. 4 (split into the 1×1×16 sub-convolution kernel groups of fig. 8), the invalid computation of 2 output columns (the portion shown by the dashed line in fig. 5) is added. Assuming the matrix computation itself is 100% efficient, the efficiency for the 10×10×16 input with the 3×3×16 convolution kernel is 8×8/(8×10) = 80%; since it is generally difficult for a general-purpose neural network accelerator to achieve an efficiency greater than 50% in single-image mode, this computation mode does not have much influence on the overall network computation efficiency, while the input data access is only (10×10)/(9×8×8) ≈ 17.4% of that in the img2col mode. For input sizes greater than 10×10, the ratio of invalid computations, (Wk-1)/W, is even lower. If the input size is too small, the computation can be done in batches or with img2col; since in that case the input data will not usually be a bottleneck, generating the img2col data will not affect system performance.
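The two ratios can be reproduced in a few lines (same assumptions as the text: 10×10 input, 3×3 kernel, unit stride):

```python
# Efficiency and access-ratio check for the example above.
W = H = 10
Kw = Kh = 3
out_w, out_h = W - Kw + 1, H - Kh + 1               # 8 x 8 valid outputs

efficiency = (out_w * out_h) / (out_h * W)          # 64 / 80 = 0.80
access_ratio = (W * H) / (Kw * Kh * out_w * out_h)  # 100 / 576 ~= 0.174
print(efficiency, access_ratio)                     # 0.8 0.1736...
```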
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels in the group are respectively convolved with the rearranged input data corresponding to the group, yielding N sub-convolution results, and the N sub-convolution results respectively form the N layers of the convolution result corresponding to that sub-convolution kernel group.
In the embodiment of the present application, during the neural network operation, the input data and the Wk×Hk sub-convolution kernel groups of the neural network operation are acquired and the operation step is entered; the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, which are divided into Wk×Hk sub-convolution kernel groups, each containing N sub-convolution kernels of size 1×1×C, where N, Wk, Hk and C are integers greater than or equal to 1; in the case that N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group; accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein the rearranged input data corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions. By splitting the convolution, the input data is multiplexed without being reordered for scheduling and computation; the computation is performed directly on the original input data without img2col conversion, thereby eliminating the hardware design overhead, the increase in data access, and the increase in dynamic power consumption caused by img2col.
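The whole embodiment can be condensed into a self-contained numpy sketch (an illustration under the stated assumptions: unit stride, no padding, and an N×Hk×Wk×C kernel layout chosen for the example) and checked against a direct sliding-window convolution:

```python
import numpy as np

def direct_conv(x, kernels):
    """Reference: plain sliding-window convolution (unit stride, no padding)."""
    H, W, C = x.shape
    N, Kh, Kw, _ = kernels.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1, N), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.tensordot(kernels, x[i:i+Kh, j:j+Kw, :],
                                     axes=([1, 2, 3], [0, 1, 2]))
    return out

def split_conv(x, kernels):
    """Split-kernel method: shift the input per group, 1x1-convolve, accumulate."""
    H, W, C = x.shape
    N, Kh, Kw, _ = kernels.shape
    acc = np.zeros((H, W, N), dtype=x.dtype)
    for kh in range(Kh):
        for kw in range(Kw):
            shifted = np.zeros_like(x)                # rearranged input, fig. 5
            shifted[:H - kh, :W - kw, :] = x[kh:, kw:, :]
            group = kernels[:, kh, kw, :]             # N 1x1xC sub-kernels
            # a 1x1xC convolution is just a matrix product over the channel dim
            acc += (shifted.reshape(-1, C) @ group.T).reshape(H, W, N)
    return acc[:H - Kh + 1, :W - Kw + 1, :]           # keep the effective positions

x = np.random.randn(10, 10, 16).astype(np.float32)
k = np.random.randn(4, 3, 3, 16).astype(np.float32)
assert np.allclose(split_conv(x, k), direct_conv(x, k), atol=1e-3)
```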
One embodiment of the present application relates to an operation step of a neural network operation method, as shown in fig. 9, including:
Step 201, rearranging the input data according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data.
In an example implementation, the present application acquires the sub-convolution kernel groups after acquiring the input data; however, the Wk×Hk sub-convolution kernel groups are not all acquired at once. Instead, the 1st sub-convolution kernel group is acquired first; after all the operation steps have been performed on the 1st sub-convolution kernel group, the 2nd sub-convolution kernel group is acquired, and so on, until the Wk×Hk-th sub-convolution kernel group has been acquired.
In an example implementation, for the i-th sub-convolution kernel group obtained each time, the i-th data rearrangement mode corresponding to it is first determined, and the input data is rearranged according to the i-th data rearrangement mode to obtain the i-th rearranged input data; where i takes values from 1 to Wk×Hk.
In an example implementation, the i-th sub-convolution kernel group is loaded in a data-overwriting manner, for example: when the 2nd sub-convolution kernel group is loaded, it overwrites the 1st sub-convolution kernel group, which reduces the memory occupied by storing the sub-convolution kernel groups.
In an example implementation, when the value of i is 1, the data rearrangement mode corresponding to the 1st sub-convolution kernel group leaves the position of every part of the data in the input data unchanged.
Step 202, convolving the ith sub-convolution kernel group with the ith rearranged input data to obtain an ith convolution result.
In an example implementation, after the ith rearranged input data is obtained, the ith sub-convolution kernel group is convolved with the ith rearranged input data to obtain an ith convolution result.
In an example implementation, when the neural network includes X matrix operation units (take 3 as an example), the 9 sub-convolution kernel groups in fig. 4 may be divided into 3 batches: the 1st to 3rd sub-convolution kernel groups are input to the 3 matrix operation units in the first pass, the 4th to 6th in the second pass, and the 7th to 9th in the third pass, so as to obtain the convolution result corresponding to each sub-convolution kernel group.
Step 203, accumulating the i-th convolution result and the (i-1)-th accumulated result to obtain the i-th accumulated result.
In an example implementation, the generated ith convolution result is accumulated with the previous (i-1) th accumulation result to obtain the ith accumulation result.
In an example implementation, when the value of i is 1, the 0 th accumulation result corresponding to the 1 st sub-convolution kernel group is set to zero.
In an example implementation, when the neural network includes Y addition units (take 3 as an example), the 9 convolution results corresponding to the 9 sub-convolution kernel groups in fig. 4 may be divided into 3 batches: the 1st to 3rd convolution results are input to the 1st addition unit to compute their partial accumulation, the 4th to 6th convolution results to the 2nd addition unit, and the 7th to 9th convolution results to the 3rd addition unit; the three partial accumulation results are then input to any one addition unit to compute the accumulated result of all 9 convolution results.
Step 204, if i is less than Wk×Hk, updating i to i+1, and executing the operation step again; if i is equal to Wk×Hk, taking the data at the effective positions in the i-th accumulated result as the output result of the neural network operation.
In an example implementation, after the operation on the i-th sub-convolution kernel group is completed, the value of i is judged: when i is smaller than Wk×Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again; when i is equal to Wk×Hk, the operation step ends, and the data at the effective positions in the i-th accumulated result is taken as the output result of the neural network operation.
In this embodiment, on the basis of the other embodiments, the convolution of the sub-convolution kernel groups with the input data can be performed serially or in parallel, so that the neural network operation method can be applied to various types of neural networks.
One embodiment of the present application relates to a neural network operation method, as shown in fig. 10, including:
Step 301, obtaining input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering the operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, where N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
In an example implementation, the step is substantially the same as step 101 in the embodiment of the present application, and is not described here in detail.
Step 302, convolving each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group; rearranging the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group; accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and taking the data located at the effective positions in the accumulated result as the output result of the neural network operation; wherein convolving each sub-convolution kernel group with the part of the input data corresponding to it yields the effective convolution result corresponding to that group, the effective convolution results in the rearranged convolution results corresponding to the respective sub-convolution kernel groups have the same data positions, and these same data positions are the effective positions.
In an example implementation, each sub-convolution kernel group is convolved with the input data respectively to obtain the corresponding convolution result; the data rearrangement mode corresponding to each sub-convolution kernel group is then acquired, and the convolution result of each group is rearranged accordingly to obtain the rearranged convolution result of each group; the rearranged convolution results of all groups are accumulated to obtain the accumulated result, and the data at the effective positions in the accumulated result is taken as the output result of the neural network operation.
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels in the group are respectively convolved with the input data, yielding N sub-convolution results, and the N sub-convolution results respectively form the N layers of the convolution result corresponding to that sub-convolution kernel group.
In an example implementation, the data rearrangement corresponding to each sub-convolution kernel group is substantially the same as the data rearrangement mentioned in step 102 in the embodiment of the present application, which is not described herein in detail.
In an example implementation, both the serial operation and the parallel operation mentioned in steps 201 to 204 may be applied in this embodiment: one sub-convolution kernel group is convolved, its result rearranged and then accumulated, and so on, until the last sub-convolution kernel group has been convolved, rearranged and accumulated, which yields the output result of the neural network operation; the operation step may also be performed by multiple matrix operation units and/or multiple addition units.
In the embodiment of the present application, on the basis of the other embodiments, each sub-convolution kernel group can first be convolved with the input data, and the convolution results can then be rearranged and accumulated; this gives a specific rule for the order of convolution and rearrangement and improves the applicability of the present application.
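A numpy sketch of this order of operations (convolve on the raw input first, rearrange the per-group results afterwards; an illustration under the same assumptions as the earlier sketches):

```python
import numpy as np

def split_conv_rearrange_after(x, kernels):
    """Convolve each group with the raw input first, then shift the results
    so the effective convolution results line up, then accumulate."""
    H, W, C = x.shape
    N, Kh, Kw, _ = kernels.shape
    acc = np.zeros((H, W, N), dtype=x.dtype)
    for kh in range(Kh):
        for kw in range(Kw):
            group = kernels[:, kh, kw, :]                      # (N, C)
            y = (x.reshape(-1, C) @ group.T).reshape(H, W, N)  # conv on raw input
            shifted = np.zeros_like(y)                         # rearrange result
            shifted[:H - kh, :W - kw, :] = y[kh:, kw:, :]
            acc += shifted
    return acc[:H - Kh + 1, :W - Kw + 1, :]                    # effective positions

x = np.random.randn(10, 10, 16).astype(np.float32)
k = np.random.randn(4, 3, 3, 16).astype(np.float32)
ref = np.zeros((8, 8, 4), dtype=np.float32)
for i in range(8):
    for j in range(8):
        ref[i, j] = np.tensordot(k, x[i:i+3, j:j+3, :], axes=([1, 2, 3], [0, 1, 2]))
assert np.allclose(split_conv_rearrange_after(x, k), ref, atol=1e-3)
```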
One embodiment of the present application relates to a neural network operation method, as shown in fig. 11, including:
Step 401, obtaining input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering the operation step; wherein the N convolution kernels of size Wk×Hk×C used by the neural network operation are divided into N×Wk×Hk sub-convolution kernels of size 1×1×C, the sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group contains N sub-convolution kernels of size 1×1×C, where N, Wk, Hk and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and in the case that N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
In an example implementation, the step is substantially the same as step 101 in the embodiment of the present application, and is not described here in detail.
Step 402, convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result; wherein convolving the i-th sub-convolution kernel group with the part of the input data corresponding to it yields the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains this effective convolution result.
In an example implementation, the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result; in this process, convolving the i-th sub-convolution kernel group with the part of the input data corresponding to it yields its effective convolution result, i.e. the i-th convolution result contains the effective convolution result.
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels in the group are respectively convolved with the input data, yielding N sub-convolution results, and the N sub-convolution results respectively form the N layers of the convolution result corresponding to that sub-convolution kernel group.
In an example implementation, the Wk×Hk sub-convolution kernel groups may be loaded sequentially in a data-overwriting manner: when the i-th sub-convolution kernel group is loaded, it overwrites the (i-1)-th sub-convolution kernel group.
Step 403, rearranging the (i-1) th accumulated result so that the effective convolution result in the rearranged (i-1) th accumulated result and the effective convolution result in the i-th convolution result have the same data position.
In an example implementation, the (i-1)-th accumulated result is rearranged according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group, so that the effective convolution result in the rearranged (i-1)-th accumulated result and the effective convolution result in the i-th convolution result have the same data positions.
Step 404, accumulating the rearranged accumulated result (i-1) and the ith convolution result to obtain the ith accumulated result.
In an example implementation, the reordered (i-1) th accumulated result is accumulated with the i-th convolution result to obtain the i-th accumulated result.
In an example implementation, when i is 1 and i=1, the 0 th accumulated result is set to zero, and the valid convolution result in the rearranged 0 th accumulated result and the valid convolution result in the 1 st convolution result are defaulted to have the same data position.
Step 405: if i is less than Wk×Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk×Hk, taking the effective convolution result in the i-th accumulated result as the output result of the neural network operation.

In an example implementation, after the operation on the i-th sub-convolution kernel group completes, the value of i is checked: if i is less than Wk×Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again; if i is equal to Wk×Hk, the operation step ends and the data at the effective positions in the i-th accumulated result is taken as the output result of the neural network operation.
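Putting steps 401 through 405 together, the following serial-loop sketch reproduces a valid convolution from the per-group pointwise results. The row-major group order, the [H, W, C] layout, and the use of np.roll for the rearrangement are all assumptions; the final assert checks the sketch against a direct sliding-window computation.

```python
import numpy as np

def run_serial(inputs, groups, wk, hk):
    """inputs: [H, W, C]; groups: Wk*Hk arrays of shape [N, C], row-major order."""
    acc, prev = None, (0, 0)
    for i, group in enumerate(groups):
        conv_i = np.einsum('hwc,nc->hwn', inputs, group)  # i-th convolution result
        pos = divmod(i, wk)                               # (ky, kx) of this group
        if acc is None:
            acc = conv_i                                  # 0-th accumulated result is zero
        else:
            dy, dx = pos[0] - prev[0], pos[1] - prev[1]
            acc = np.roll(acc, shift=(dy, dx), axis=(0, 1)) + conv_i  # steps 403-404
        prev = pos
    return acc[hk - 1:, wk - 1:]                          # effective positions only

# Check against a direct valid (sliding-window) convolution.
kernels = np.random.randn(2, 2, 2, 3)                     # N=2, Hk=Wk=2, C=3
x = np.random.randn(5, 5, 3)
groups = [kernels[:, ky, kx, :] for ky in range(2) for kx in range(2)]
ref = np.array([[np.einsum('abc,nabc->n', x[y:y + 2, u:u + 2], kernels)
                 for u in range(4)] for y in range(4)])
assert np.allclose(run_serial(x, groups, wk=2, hk=2), ref)
```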
On the basis of the other embodiments, the convolution of the sub-convolution kernel groups with the input data may be performed serially or in parallel, which allows the neural network operation method to be applied to various types of neural networks.
The above division of the methods into steps is for clarity of description only; when implemented, steps may be combined into a single step, or a step may be split into multiple steps, and all such variants fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow likewise falls within the protection scope of this patent.
Another embodiment of the present application relates to a neural network operation device. The details of the device described below are provided to aid understanding and are not required for implementing this embodiment. Fig. 12 is a schematic diagram of the device of this embodiment, which includes: a first storage unit 1201, a second storage unit 1202, a control unit 1203, a first data rearrangement unit 1204, a convolution unit 1205, and an addition unit 1206.
The first storage unit is used for storing the input data of the neural network operation; the second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation. Here, the N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, which are grouped into Wk×Hk sub-convolution kernel groups, each comprising N 1×1×C sub-convolution kernels, where N, Wk, Hk, and C are integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
The control unit is used for acquiring input data from the first storage unit and inputting the input data into the first data rearrangement unit, and the control unit is also used for sending the data rearrangement modes corresponding to the sub convolution kernel groups to the first data rearrangement unit.
The first data rearrangement unit is used for rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group, obtaining the rearranged input data corresponding to each sub-convolution kernel group, and outputting it to the convolution unit; in the rearranged input data, the part corresponding to each sub-convolution kernel group occupies the same data positions, and these same data positions are the effective positions.
The control unit is also used for acquiring each sub-convolution kernel group from the second storage unit and sending each sub-convolution kernel group to the convolution unit;
the convolution unit is used for convolving each sub-convolution kernel group and rearranged input data corresponding to each sub-convolution kernel group to obtain convolution results corresponding to each sub-convolution kernel group, and outputting the convolution results corresponding to each sub-convolution kernel group to the addition unit.
The addition unit is used for accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulated result, and for taking the data at the effective positions in the accumulated result as the output result of the neural network operation.
In an example implementation, the neural network operation device further includes a third storage unit used for storing, when the neural network operation is performed serially, the result of the operation between the previous sub-convolution kernel group and the input data.
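A dataflow sketch of this arrangement (fig. 12), under the assumption that rearranging the input amounts to selecting, for each group, the shifted window of the input whose positions line up with the common effective positions; slicing stands in for the first data rearrangement unit, and the layout and group ordering are assumptions as before.

```python
import numpy as np

def rearrange_input_then_convolve(inputs, groups, wk, hk):
    """inputs: [H, W, C]; groups: Wk*Hk arrays of shape [N, C], row-major order."""
    h, w, _ = inputs.shape
    out_h, out_w = h - hk + 1, w - wk + 1
    acc = 0
    for i, group in enumerate(groups):
        ky, kx = divmod(i, wk)
        rearranged = inputs[ky:ky + out_h, kx:kx + out_w]        # first data rearrangement unit
        acc = acc + np.einsum('hwc,nc->hwn', rearranged, group)  # convolution + addition units
    return acc   # every position here is an effective position
```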
Another embodiment of the present application relates to a neural network operation device. The details of the device described below are provided to aid understanding and are not required for implementing this embodiment. Fig. 13 is a schematic diagram of the device of this embodiment, which includes: a first storage unit 1301, a second storage unit 1302, a control unit 1303, a second data rearrangement unit 1304, a convolution unit 1305, and an addition unit 1306.
The first storage unit is used for storing input data of the neural network operation.
The second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation. Here, the N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, which are grouped into Wk×Hk sub-convolution kernel groups, each comprising N 1×1×C sub-convolution kernels, where N, Wk, Hk, and C are integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
The control unit is used for acquiring input data from the first storage unit and inputting the input data into the convolution unit, and is also used for acquiring each sub-convolution kernel group from the second storage unit and inputting each sub-convolution kernel group into the convolution unit.
The convolution unit is used for respectively convolving each sub-convolution kernel group and the input data to obtain convolution results corresponding to each sub-convolution kernel group, and outputting the convolution results corresponding to each sub-convolution kernel group to the second data rearrangement unit.
The control unit is further used for sending the data rearrangement mode corresponding to each sub convolution kernel group to the second data rearrangement unit.
The second data rearrangement unit is used for rearranging the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that group, obtaining the rearranged convolution result corresponding to each group, and outputting it to the addition unit. Convolving each sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result for that group; the effective convolution results in the rearranged convolution results of all the groups occupy the same data positions, and these same data positions are the effective positions.
In an example implementation, the neural network operation device further includes a third storage unit used for storing, when the neural network operation is performed serially, the result of the operation between the previous sub-convolution kernel group and the input data.
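A matching sketch for this arrangement (fig. 13): the convolution unit works on the full input, and the second data rearrangement unit aligns each group's convolution result before accumulation. As before, slicing is an assumed stand-in for the rearrangement; this sketch produces the same output as the fig. 12 sketch above.

```python
import numpy as np

def convolve_then_rearrange_results(inputs, groups, wk, hk):
    """inputs: [H, W, C]; groups: Wk*Hk arrays of shape [N, C], row-major order."""
    h, w, _ = inputs.shape
    out_h, out_w = h - hk + 1, w - wk + 1
    acc = 0
    for i, group in enumerate(groups):
        conv = np.einsum('hwc,nc->hwn', inputs, group)   # convolution unit, full input
        ky, kx = divmod(i, wk)
        acc = acc + conv[ky:ky + out_h, kx:kx + out_w]   # second data rearrangement + addition
    return acc
```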
Another embodiment of the present application relates to a neural network operation device. The details of the device described below are provided to aid understanding and are not required for implementing this embodiment. Fig. 14 is a schematic diagram of the device of this embodiment, which includes: a first storage unit 1401, a second storage unit 1402, a control unit 1403, a third data rearrangement unit 1404, a convolution unit 1405, and an addition unit 1406.
The first storage unit is used for storing input data of the neural network operation.
The second storage unit is used for storing the Wk×Hk sub-convolution kernel groups of the neural network operation. Here, the N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, which are grouped into Wk×Hk sub-convolution kernel groups, each comprising N 1×1×C sub-convolution kernels, where N, Wk, Hk, and C are integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same part of the input data.
The control unit is used for acquiring input data from the first storage unit and inputting the input data into the convolution unit, and is also used for acquiring an ith sub-convolution kernel group from the second storage unit and inputting the ith sub-convolution kernel group into the convolution unit.
The convolution unit is used for convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, and for outputting the i-th convolution result to the addition unit; convolving the i-th sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result for that group, and the i-th convolution result contains this effective convolution result.
The control unit is further used for acquiring the (i-1) th accumulated result from the third storage unit and sending the (i-1) th accumulated result to the third data rearrangement unit.
The third data rearrangement unit is used for rearranging the (i-1)-th accumulated result so that the effective convolution result in the rearranged (i-1)-th accumulated result and the effective convolution result in the i-th convolution result have the same data positions, and for outputting the rearranged (i-1)-th accumulated result to the addition unit. The addition unit accumulates the rearranged (i-1)-th accumulated result with the i-th convolution result to obtain the i-th accumulated result, which is stored in the third storage unit, overwriting the (i-1)-th accumulated result.
The control unit is also used for judging the value of i: if i is less than Wk×Hk, i is updated to i+1 and the operation step is executed again; if i is equal to Wk×Hk, the effective convolution result in the i-th accumulated result is taken as the output result of the neural network operation. The initial value of i is 1; when i = 1, the 0-th accumulated result is set to zero, and the effective convolution result in the rearranged 0-th accumulated result is taken by default to have the same data positions as the effective convolution result in the 1st convolution result.
It should be noted that this embodiment is a device embodiment corresponding to the method embodiments above and may be implemented in cooperation with them. The related technical details and technical effects mentioned in the method embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment also apply to the method embodiments.

It should also be noted that each module involved in this embodiment is a logic module. In practical applications, a logic unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units less closely related to solving the technical problem presented herein are not introduced in this embodiment, which does not mean that no other units exist in this embodiment.
Another embodiment of the present application relates to a chip, as shown in fig. 6, comprising: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, the instructions being executable by the at least one processor 601 to enable the at least one processor 601 to perform the neural network operation method in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges that link the various circuits of the one or more processors and the memory. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which also receives data and forwards it to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
Another embodiment of the present application relates to an electronic device, as shown in fig. 6, comprising: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, the instructions being executable by the at least one processor 601 to enable the at least one processor 601 to perform the neural network operation method in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges that link the various circuits of the one or more processors and the memory. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which also receives data and forwards it to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the above method embodiments may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described herein. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of implementing the present application and that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (20)

1. A neural network operation method, comprising:
acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels, and N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
The operation step comprises the following steps:
rearranging the input data according to a data rearranging mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group and rearranged input data corresponding to each sub-convolution kernel group to obtain a convolution result corresponding to each sub-convolution kernel group; accumulating convolution results corresponding to the sub convolution kernel groups to obtain accumulation results, and taking data positioned at an effective position in the accumulation results as an output result of the neural network operation;
the part of the rearranged input data corresponding to each sub convolution kernel group has the same data position, and the same data position is the effective position.
2. The neural network operation method according to claim 1, characterized in that the operation step includes:
rearranging the input data according to a data rearranging mode corresponding to the ith sub convolution kernel group to obtain input data after the ith rearranging;
convolving the ith sub-convolution kernel group with the ith rearranged input data to obtain an ith convolution result;
Accumulating the ith convolution result and the (i-1) th accumulated result to obtain an ith accumulated result;
if i is smaller than wk×hk, updating i to i+1, and executing the operation step again; if i is equal to wk×hk, taking the data at the effective position in the ith accumulated result as an output result of the neural network operation;
wherein the initial value of i is 1, and when i = 1, the data rearrangement mode corresponding to the 1st sub-convolution kernel group is set such that the position of each part of data in the input data is unchanged, and the 0th accumulated result is set to zero.
3. The neural network operation method according to claim 2, wherein the acquiring the input data and wk×hk sub-convolution kernel group of the neural network operation includes:
loading the input data;
and loading the ith sub-convolution kernel group before rearranging the input data according to the data rearranging mode corresponding to the ith sub-convolution kernel group to obtain the input data after the ith rearrangement.
4. A neural network operation method according to claim 3, wherein said loading the ith sub-convolution kernel group is specifically: and loading the ith sub-convolution kernel group in a data coverage mode.
5. The neural network operation method according to any one of claims 1 to 4, wherein, in the case where N is equal to or greater than 2, the convolving each of the sub-convolution kernel groups and the rearranged input data corresponding to each of the sub-convolution kernel groups to obtain a convolution result corresponding to each of the sub-convolution kernel groups, specifically:
for each sub-convolution kernel group, N sub-convolution kernels in the sub-convolution kernel group are respectively convolved with rearranged input data corresponding to the sub-convolution kernel group to obtain N sub-convolution results, wherein the N sub-convolution results are used as N layers of data in the convolution results corresponding to the sub-convolution kernel group.
6. A neural network operation method, comprising:
acquiring input data and Wk×Hk sub-convolution kernel groups of the neural network operation, and entering an operation step; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels; N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
The operation step comprises the following steps:
respectively convolving each sub-convolution kernel group and the input data to obtain a convolution result corresponding to each sub-convolution kernel group; rearranging the convolution results corresponding to the sub-convolution kernel groups according to the data rearranging mode corresponding to the sub-convolution kernel groups to obtain rearranged convolution results corresponding to the sub-convolution kernel groups; accumulating the rearranged convolution results corresponding to each sub convolution kernel group to obtain an accumulated result, and taking the data positioned at the effective position in the accumulated result as an output result of the neural network operation;
the effective convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups have the same data positions, and the same data positions are the effective positions.
7. The neural network operation method according to claim 6, characterized in that the operation step includes:
convolving the ith sub-convolution kernel group with the input data to obtain an ith convolution result;
Rearranging the ith convolution result according to a data rearranging mode corresponding to the ith convolution result to obtain an ith rearranged convolution result;
accumulating the ith rearranged convolution result and the (i-1) th accumulated result to obtain an ith accumulated result;
if i is smaller than wk×hk, updating i to i+1, and executing the operation step again; if i is equal to wk×hk, taking the data at the effective position in the ith accumulated result as an output result of the neural network operation;
when the initial value of i is 1 and i=1, the data rearrangement mode corresponding to the 1 st convolution result is set to be that the position of each part of convolution results in the 1 st convolution result is unchanged, and the 0 th accumulation result is set to be zero.
8. The neural network operation method of claim 7, wherein the acquiring the input data and Wk xhk sub-convolution kernel groups of the neural network operation includes:
loading the input data;
and loading the ith sub-convolution kernel group before the ith sub-convolution kernel group is convolved with the input data to obtain an ith convolution result.
9. The neural network operation method according to claim 8, wherein the loading the ith sub-convolution kernel group is specifically: and loading the ith sub-convolution kernel group in a data coverage mode.
10. The neural network operation method according to any one of claims 6 to 9, wherein, in the case where N is greater than or equal to 2, the convolving each of the sub-convolution kernel groups and the input data respectively to obtain a convolution result corresponding to each of the sub-convolution kernel groups, specifically:
for each sub-convolution kernel group, N sub-convolution kernels in the sub-convolution kernel group are respectively convolved with the input data to obtain N sub-convolution results, wherein the N sub-convolution results are used as N layers of data in the convolution results corresponding to the sub-convolution kernel groups.
11. A neural network operation method, comprising:
acquiring input data and Wk×Hk sub-convolution kernel groups of a neural network operation, and entering an operation step; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, and each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels; N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
The operation steps comprise:
convolving the ith sub-convolution kernel group with the input data to obtain an ith convolution result; the ith sub-convolution kernel group and the part of input data corresponding to the ith sub-convolution kernel group are convolved to obtain an effective convolution result corresponding to the ith sub-convolution kernel group, wherein the ith convolution result comprises the effective convolution result;
rearranging the (i-1)th accumulated result so that the effective convolution result in the rearranged (i-1)th accumulated result and the effective convolution result in the i-th convolution result have the same data position;
accumulating the rearranged accumulated result (i-1) and the ith convolution result to obtain an ith accumulated result;
if i is smaller than wk×hk, updating i to i+1, and executing the operation step again; if i is equal to wk×hk, taking the effective convolution result in the ith accumulated result as an output result of the neural network operation;
when the initial value of i is 1 and i=1, the 0 th accumulated result is set to zero, and the valid convolution result in the rearranged 0 th accumulated result and the valid convolution result in the 1 st convolution result are defaulted to have the same data position.
12. The neural network operation method of claim 11, wherein the acquiring the input data and Wk xhk sub-convolution kernel group of the neural network operation includes:
loading the input data;
and loading the ith sub-convolution kernel group before the ith sub-convolution kernel group is convolved with the input data to obtain an ith convolution result.
13. The neural network operation method according to claim 12, wherein the loading the ith sub-convolution kernel group is specifically: and loading the ith sub-convolution kernel group in a data coverage mode.
14. The neural network operation method according to any one of claims 11 to 13, wherein, in the case where N is equal to or greater than 2, the convolving the ith sub-convolution kernel group with the input data, to obtain an ith convolution result, specifically:
and respectively convolving N sub-convolution kernels in the ith sub-convolution kernel group with the input data to obtain N sub-convolution results, wherein the N sub-convolution results are used as N layers of data in the convolution results corresponding to the ith sub-convolution kernel group.
15. A neural network computing device, comprising: the device comprises a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit and an addition unit;
The first storage unit is used for storing input data of the neural network operation, and the second storage unit is used for storing Wk×Hk sub-convolution kernel groups of the neural network operation; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels, and N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
the control unit is used for acquiring the input data from the first storage unit and inputting the input data into the first data rearrangement unit, and is also used for sending the data rearrangement modes corresponding to the sub convolution kernel groups to the first data rearrangement unit;
the first data rearrangement unit is configured to rearrange the input data according to a data rearrangement mode corresponding to each sub-convolution kernel group, obtain rearranged input data corresponding to each sub-convolution kernel group, and output rearranged input data corresponding to each sub-convolution kernel group to the convolution unit; the part of the rearranged input data corresponding to each sub convolution kernel group has the same data position, and the same data position is the effective position;
The control unit is further configured to obtain each of the sub-convolution kernel groups from the second storage unit, and send each of the sub-convolution kernel groups to the convolution unit;
the convolution unit is used for convolving each sub-convolution kernel group and rearranged input data corresponding to each sub-convolution kernel group to obtain a convolution result corresponding to each sub-convolution kernel group, and outputting the convolution result corresponding to each sub-convolution kernel group to the addition unit;
the adding unit is used for accumulating the convolution results corresponding to the sub convolution kernel groups to obtain an accumulated result, and taking the data at the effective position in the accumulated result as the output result of the neural network operation.
16. A neural network computing device, comprising: the device comprises a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit and an addition unit;
the first storage unit is used for storing input data of the neural network operation, and the second storage unit is used for storing Wk×Hk sub-convolution kernel groups of the neural network operation; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels, and N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
The control unit is used for acquiring the input data from the first storage unit and inputting the input data into the convolution unit, and is also used for acquiring each sub-convolution kernel group from the second storage unit and inputting each sub-convolution kernel group into the convolution unit;
the convolution unit is used for respectively convolving each sub-convolution kernel group and the input data to obtain a convolution result corresponding to each sub-convolution kernel group, and outputting the convolution result corresponding to each sub-convolution kernel group to the second data rearrangement unit;
the control unit is further configured to send a data rearrangement mode corresponding to each of the sub-convolution kernel groups to the second data rearrangement unit;
the second data rearrangement unit rearranges the convolution results corresponding to the sub-convolution kernel groups according to the data rearrangement mode corresponding to the sub-convolution kernel groups to obtain rearranged convolution results corresponding to the sub-convolution kernel groups, and outputs the rearranged convolution results corresponding to the sub-convolution kernel groups to the addition unit;
the effective convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups have the same data positions, and the same data positions are the effective positions.
17. A neural network computing device, comprising: the device comprises a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit and an addition unit;
the first storage unit is used for storing input data of the neural network operation, and the second storage unit is used for storing Wk×Hk sub-convolution kernel groups of the neural network operation; wherein N Wk×Hk×C convolution kernels of the neural network operation are divided into N×Wk×Hk 1×1×C sub-convolution kernels, the N×Wk×Hk 1×1×C sub-convolution kernels are divided into Wk×Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1×1×C sub-convolution kernels, and N, Wk, Hk, and C are integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N is greater than or equal to 2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
the control unit is used for acquiring the input data from the first storage unit and inputting the input data into the convolution unit, and is also used for acquiring an ith sub-convolution kernel group from the second storage unit and inputting the ith sub-convolution kernel group into the convolution unit;
The convolution unit is used for convolving the ith sub-convolution kernel group with the input data to obtain an ith convolution result, and outputting the ith convolution result to the addition unit; the ith sub-convolution kernel group and the part of input data corresponding to the ith sub-convolution kernel group are convolved to obtain an effective convolution result corresponding to the ith sub-convolution kernel group, wherein the ith convolution result comprises the effective convolution result;
the control unit is further configured to obtain the (i-1) -th accumulated result from the third storage unit, and send the (i-1) -th accumulated result to the third data rearrangement unit;
the third data rearrangement unit is configured to rearrange the (i-1) th accumulated result so that the effective convolution result in the rearranged (i-1) th accumulated result and the effective convolution result in the i-th convolution result have the same data position; and outputting the rearranged (i-1) th accumulated result to the adding unit;
accumulating the rearranged accumulated result (i-1) and the ith convolution result to obtain an ith accumulated result, and storing the ith accumulated result into the third storage unit and covering the (i-1) accumulated result;
The control unit is further configured to determine the value of i, update i to i+1 if i is less than wk×hk, and execute the operation step again; if i is equal to wk×hk, taking the effective convolution result in the ith accumulated result as an output result of the neural network operation;
when the initial value of i is 1 and i=1, the 0 th accumulated result is set to zero, and the valid convolution result in the rearranged 0 th accumulated result and the valid convolution result in the 1 st convolution result are defaulted to have the same data position.
18. A chip, comprising:
at least one processing module; the method comprises the steps of,
a memory module in communication with the at least one processing module; wherein,,
the storage module stores instructions executable by the at least one processing module to enable the at least one processing module to perform the method of any one of claims 1 to 5, or to perform the method of any one of claims 6 to 10, or to perform the method of any one of claims 11 to 14.
19. An electronic device, comprising:
At least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5, or to perform the method of any one of claims 6 to 10, or to perform the method of any one of claims 11 to 14.
20. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method of any one of claims 1 to 5, or performs the method of any one of claims 6 to 10, or performs the method of any one of claims 11 to 14.