WO2023098256A1 - Neural network operation method and apparatus, chip, electronic device and storage medium - Google Patents


Info

Publication number
WO2023098256A1
WO2023098256A1 · PCT/CN2022/121427 · CN2022121427W
Authority
WO
WIPO (PCT)
Prior art keywords
sub
convolution
convolution kernel
input data
result
Prior art date
Application number
PCT/CN2022/121427
Other languages
French (fr)
Chinese (zh)
Inventor
徐东 (Xu Dong)
熊先奎 (Xiong Xiankui)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023098256A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The data rearrangement mode corresponding to the first sub-convolution kernel group is set so that the position of each part of the input data remains unchanged.
  • After the operation of the i-th sub-convolution kernel group is completed, the value of i is checked: if i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again; if i equals Wk*Hk, the operation step ends, and the data at the effective positions in the i-th accumulation result is taken as the output result of the neural network operation.
  • The step division of the above methods is only for clarity of description; in implementation, steps may be combined into one, or a step may be split into multiple steps. As long as the same logical relationship is preserved, such variations fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design also falls within the protection scope of this patent.
  • The convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that group to obtain a convolution result for each group, and to output each group's convolution result to the addition unit.

Abstract

The present application relates to a neural network operation method and apparatus, a chip, an electronic device, and a storage medium. The neural network operation method comprises: acquiring input data and Wk*Hk sub-convolution kernel groups for the neural network operation, and executing an operation step, the N Wk*Hk*C convolution kernels of the neural network operation being split into N*Wk*Hk 1*1*C sub-convolution kernels, and these sub-convolution kernels being divided into the Wk*Hk sub-convolution kernel groups. The operation step comprises: rearranging the input data on the basis of a data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result of that group; and accumulating the convolution results of the sub-convolution kernel groups to obtain an accumulation result, the data located at the valid positions in the accumulation result being taken as the output result of the neural network operation.

Description

Neural network operation method and apparatus, chip, electronic device and storage medium
Related Application
This application claims priority to Chinese patent application No. 202111466758.0, filed on December 3, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of data computing, and in particular to a neural network operation method and apparatus, a chip, an electronic device, and a storage medium.
Background
Roughly 90% of the computation in a neural network lies in convolution and fully connected layers, and a fully connected layer is essentially a special kind of convolution operation. Convolution is currently implemented mostly by converting it into matrix operations, realized with systolic arrays or General Matrix Multiplication (GEMM). Existing research on neural networks focuses mainly on how to implement the multiplication and addition operations in convolution efficiently, while ignoring the impact of data access on computing efficiency and the increase in power consumption caused by memory access.
To simplify scheduling, existing neural network accelerators usually use img2col to arrange weights and activation data. After both the weights and the input data have been processed by img2col, the two matrices are fed into the matrix operation unit, and the result of multiplying the two matrices is the output of the neural network convolution. Applying img2col to the weight data does not increase its size; the data only needs to be rearranged, and since the weights can be laid out offline, weight img2col incurs no extra overhead. For the input data, however, img2col significantly increases the data volume because of the convolution sliding window. As shown in FIG. 1, for an original input image with W=10 and H=10, the total amount of data is 10*10=100, while after img2col it is 64*9=576, an expansion of nearly 6 times; for larger input sizes (W*H), the theoretical expansion approaches K_W*K_H times, where K_W and K_H are the convolution kernel dimensions. img2col can be implemented in software or hardware, but either way it increases accesses to the input data, which raises dynamic power consumption. Moreover, since neural network computation is itself memory-bound, the increased data volume also degrades performance.
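The data expansion described above can be reproduced with a short numpy sketch. This is our own illustrative code, not part of the patent (the patent writes "img2col" for the transform commonly called im2col); it assumes stride 1 and no padding:

```python
import numpy as np

H, W = 10, 10          # original input size, as in the example above
Kh, Kw = 3, 3          # a 3x3 sliding window

x = np.arange(H * W).reshape(H, W)

# img2col: one row per sliding-window position, one column per kernel element
rows = [x[i:i + Kh, j:j + Kw].ravel()
        for i in range(H - Kh + 1)
        for j in range(W - Kw + 1)]
col = np.stack(rows)

print(x.size)      # 100 elements in the original input
print(col.shape)   # (64, 9): 64 window positions x 9 kernel elements = 576
```

The 100-element input becomes 576 elements, the near-6x expansion cited above; as W*H grows, the ratio approaches Kh*Kw.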
Summary of the Invention
The main purpose of the embodiments of the present application is to provide a neural network operation method and apparatus, an electronic device, and a storage medium, aiming to eliminate the hardware design overhead, the increased data access volume, and the increased dynamic power consumption caused by img2col.
To achieve the above purpose, an embodiment of the present application provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data corresponding to that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to that group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared data positions are the effective positions.
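The operation step above can be sketched in numpy. This is a minimal illustration under assumed stride-1, valid-only convolution; the function names are our own, and the "rearrangement" of the input reduces here to shifted slicing so that each group's corresponding part of the input data sits at the same positions:

```python
import numpy as np

def conv2d_direct(x, k):
    # Reference: direct valid convolution; x: (H, W, C), k: (N, Hk, Wk, C)
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    out = np.zeros((H - Hk + 1, W - Wk + 1, N))
    for i in range(H - Hk + 1):
        for j in range(W - Wk + 1):
            out[i, j] = np.tensordot(x[i:i + Hk, j:j + Wk], k,
                                     axes=([0, 1, 2], [1, 2, 3]))
    return out

def conv2d_split(x, k):
    # Split each Wk*Hk*C kernel into Wk*Hk 1*1*C sub-kernels; the group at
    # offset (kh, kw) is convolved with the correspondingly rearranged
    # (here: shifted) input, and the group results are accumulated.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H - Hk + 1, W - Wk + 1, N))
    for kh in range(Hk):
        for kw in range(Wk):
            group = k[:, kh, kw, :]                       # N sub-kernels, (N, C)
            shifted = x[kh:kh + H - Hk + 1, kw:kw + W - Wk + 1, :]
            acc += shifted @ group.T                      # 1x1 conv == channel matmul
    return acc

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))    # N=2, Hk=Wk=3, C=4
assert np.allclose(conv2d_direct(x, k), conv2d_split(x, k))
```

Note how `conv2d_split` never materializes an img2col matrix: each 1*1*C group reads the original input directly, which is the data-reuse property the method claims.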
To achieve the above purpose, an embodiment of the present application further provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: convolving each sub-convolution kernel group with the input data to obtain the convolution result corresponding to that group; rearranging the convolution result of each group according to the data rearrangement mode corresponding to that group to obtain the rearranged convolution result of that group; and accumulating the rearranged convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation. Convolving each sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result of that group; in the rearranged convolution results, the effective convolution results of all groups occupy the same data positions, and these shared data positions are the effective positions.
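This second variant, which rearranges the convolution results rather than the input, can be sketched the same way. In this hypothetical illustration (stride 1 assumed), `np.roll` stands in for the result rearrangement and the bottom-right block plays the role of the effective positions; that layout is our illustrative choice, not one mandated by the patent:

```python
import numpy as np

def conv2d_rearranged_results(x, k):
    # Each 1*1*C sub-kernel group is first convolved with the *whole* input,
    # then its full-size result is rearranged so that every group's effective
    # convolution results land at the same data positions before accumulation.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H, W, N))
    for kh in range(Hk):
        for kw in range(Wk):
            full = x @ k[:, kh, kw, :].T                  # (H, W, N) 1x1-group conv
            acc += np.roll(full, (Hk - 1 - kh, Wk - 1 - kw), axis=(0, 1))
    return acc[Hk - 1:, Wk - 1:, :]                       # data at effective positions

rng = np.random.default_rng(1)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))
out = conv2d_rearranged_results(x, k)
# spot-check one output element against the direct window sum
assert np.allclose(out[0, 0],
                   np.tensordot(x[0:3, 0:3], k, axes=([0, 1, 2], [1, 2, 3])))
```

Wrapped-around values from `np.roll` land only outside the effective rows and columns, so the final slice discards them.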
To achieve the above purpose, an embodiment of the present application further provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, where convolving the i-th group with its corresponding part of the input data yields the effective convolution result of that group, and the i-th convolution result contains this effective convolution result; rearranging the (i-1)-th accumulation result so that its effective convolution results and those of the i-th convolution result occupy the same data positions; accumulating the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; and if i equals Wk*Hk, taking the effective convolution results in the i-th accumulation result as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the rearranged 0th accumulation result is regarded by default as having its effective convolution results at the same data positions as those of the 1st convolution result.
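The iterative variant keeps a single full-size accumulator and rearranges it before each addition. A sketch under the same assumptions as before (stride 1, our own naming, bottom-right block as the effective positions); the per-step rearrangement is modelled as an `np.roll` by the offset between consecutive sub-kernel groups:

```python
import numpy as np

def conv2d_iterative(x, k):
    # i runs over the Wk*Hk sub-kernel groups; the (i-1)-th accumulation
    # result is rearranged so its effective convolution results align with
    # those of the i-th convolution result, then the two are accumulated.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H, W, N))        # 0th accumulation result: zero
    prev = (0, 0)                    # aligned with group 1 by default
    for kh in range(Hk):
        for kw in range(Wk):
            full = x @ k[:, kh, kw, :].T                 # i-th convolution result
            dy, dx = kh - prev[0], kw - prev[1]
            acc = np.roll(acc, (dy, dx), axis=(0, 1))    # rearrange (i-1)-th result
            acc += full                                  # i-th accumulation result
            prev = (kh, kw)
    return acc[Hk - 1:, Wk - 1:, :]  # effective convolution results

rng = np.random.default_rng(2)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))
out = conv2d_iterative(x, k)
# spot-check one output element against the direct window sum
assert np.allclose(out[0, 0],
                   np.tensordot(x[0:3, 0:3], k, axes=([0, 1, 2], [1, 2, 3])))
```

Because the shifts compose, every valid contribution ends up at the same final position regardless of the order in which the groups are visited, matching the overwrite-in-place accumulation the apparatus below describes.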
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the first data rearrangement unit, and also sends the data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit. The first data rearrangement unit rearranges the input data according to the data rearrangement mode corresponding to each group, obtains the rearranged input data corresponding to each group, and outputs it to the convolution unit; in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared data positions are the effective positions. The control unit further acquires each sub-convolution kernel group from the second storage unit and sends it to the convolution unit. The convolution unit convolves each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to that group, and outputs each group's convolution result to the addition unit. The addition unit accumulates the convolution results of all groups to obtain an accumulation result, and takes the data at the effective positions in the accumulation result as the output result of the neural network operation.
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the convolution unit, and also acquires each sub-convolution kernel group from the second storage unit and inputs it into the convolution unit. The convolution unit convolves each sub-convolution kernel group with the input data to obtain the convolution result corresponding to that group, and outputs each group's convolution result to the second data rearrangement unit. The control unit further sends the data rearrangement mode corresponding to each group to the second data rearrangement unit. The second data rearrangement unit rearranges the convolution result of each group according to the data rearrangement mode corresponding to that group, obtains the rearranged convolution result of each group, and outputs it to the addition unit. Convolving each sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result of that group; in the rearranged convolution results, the effective convolution results of all groups occupy the same data positions, and these shared data positions are the effective positions.
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the convolution unit, and also acquires the i-th sub-convolution kernel group from the second storage unit and inputs it into the convolution unit. The convolution unit convolves the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result and outputs it to the addition unit; convolving the i-th group with its corresponding part of the input data yields the effective convolution result of that group, and the i-th convolution result contains this effective convolution result. The control unit further acquires the (i-1)-th accumulation result from the third storage unit and sends it to the third data rearrangement unit. The third data rearrangement unit rearranges the (i-1)-th accumulation result so that its effective convolution results and those of the i-th convolution result occupy the same data positions, and outputs the rearranged (i-1)-th accumulation result to the addition unit. The addition unit accumulates the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result, which is stored in the third storage unit, overwriting the (i-1)-th accumulation result. The control unit further judges the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i equals Wk*Hk, the effective convolution results in the i-th accumulation result are taken as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the rearranged 0th accumulation result is regarded by default as having its effective convolution results at the same data positions as those of the 1st convolution result.
To achieve the above purpose, an embodiment of the present application further provides a chip, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above neural network operation method.
To achieve the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above neural network operation method.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above neural network operation method.
In the neural network operation method proposed by the present application, the input data of the neural network operation and Wk*Hk sub-convolution kernel groups are acquired, and the operation step is entered. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data of each group; convolving each group with its rearranged input data to obtain the convolution result of each group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation, where, in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared positions are the effective positions. By splitting the convolution and reusing the input data, the data need not be rearranged by img2col to satisfy scheduling and computation; since no img2col transformation is applied to the input data and the computation is performed directly on the original input data, the hardware design overhead, the increased data access volume, and the increased dynamic power consumption caused by img2col are eliminated.
Description of Drawings
Fig. 1 is a schematic diagram of img2col processing of input data in the prior art;
Fig. 2 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 3 is a schematic diagram of input data provided by an embodiment of this application;
Fig. 4 is a schematic diagram of sub-convolution kernel groups provided by an embodiment of this application;
Fig. 5 is a schematic diagram of the valid positions of rearranged input data provided by an embodiment of this application;
Fig. 6 is a schematic diagram of splitting input data provided by an embodiment of this application;
Fig. 7 is a schematic diagram of convolving input data in the prior art;
Fig. 8 is a schematic diagram of convolving input data provided by an embodiment of this application;
Fig. 9 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 10 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 11 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 12 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 13 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 14 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 15 is a schematic structural diagram of a chip provided by an embodiment of this application;
Fig. 16 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of this application clearer, the embodiments of this application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that many technical details are given in the embodiments so that readers may better understand this application; the technical solutions claimed by this application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and does not limit the specific implementation of this application; the embodiments may be combined with and refer to one another provided they do not contradict.
An embodiment of this application relates to a neural network operation method which, as shown in Fig. 2, includes:
Step 101: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data.
In an example implementation, as shown in Fig. 3, the obtained input data has the shape W_input*H_input*C_input. When C_input is 1 the input data is two-dimensional; when C_input is greater than 1 the input data is three-dimensional.
In an example implementation, as shown in Fig. 4, the obtained sub-convolution kernel groups are produced by splitting the convolution kernels. The number of sub-convolution kernels in each group is determined by the number of convolution kernels being split: if 9 convolution kernels are split, each resulting sub-convolution kernel group contains 9 sub-convolution kernels. The number of groups is determined by the width W and height H of the convolution kernel: splitting a 3*3 convolution kernel yields 3*3 = 9 sub-convolution kernel groups.
In an example implementation, each sub-convolution kernel corresponds to a part of the input data. For example, sub-convolution kernel 00 in Fig. 4 corresponds to positions 00 to 77 of the input data in Fig. 3, sub-convolution kernel 01 corresponds to 01 to 78, sub-convolution kernel 02 corresponds to 02 to 79, ..., and sub-convolution kernel 22 corresponds to 22 to 99. When a sub-convolution kernel group contains two or more sub-convolution kernels, the N sub-convolution kernels in that group all correspond to the same part of the input data: every sub-convolution kernel 00 in the 1st group of Fig. 4 corresponds to 00 to 77 of the input data in Fig. 3, every sub-convolution kernel 01 in the 2nd group corresponds to 01 to 78, every sub-convolution kernel 02 in the 3rd group corresponds to 02 to 79, ..., and every sub-convolution kernel 22 in the 9th group corresponds to 22 to 99.
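The splitting described above can be sketched in a few lines. This is a hedged illustration, not the patented hardware: the (N, Hk, Wk, C) array layout and the dictionary of groups are assumptions made for clarity.

```python
import numpy as np

# Assumed layout: N kernels of height Hk, width Wk, depth C, stored as
# an (N, Hk, Wk, C) array. Group (h, w) collects the 1*1*C slice at
# position (h, w) from every kernel, giving Wk*Hk groups of N
# 1*1*C sub-convolution kernels each.
N, Hk, Wk, C = 9, 3, 3, 16
kernels = np.arange(N * Hk * Wk * C, dtype=np.float32).reshape(N, Hk, Wk, C)

groups = {(h, w): kernels[:, h, w, :] for h in range(Hk) for w in range(Wk)}

assert len(groups) == Wk * Hk            # 3*3 = 9 sub-convolution kernel groups
assert groups[(0, 0)].shape == (N, C)    # each group: N sub-kernels of depth C
```

Each group is thus an N*C weight matrix, which is exactly the operand shape a 1*1 convolution (a per-position matrix multiply) consumes.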
Step 102: rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data for that group; convolve each sub-convolution kernel group with its rearranged input data to obtain the convolution result for that group; accumulate the convolution results of all groups to obtain an accumulation result, and take the data at the valid positions of the accumulation result as the output of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions.
In an example implementation, the operation on each sub-convolution kernel group and the input data is performed in the matrix operation unit of the neural network. Each sub-convolution kernel group has its own data rearrangement mode. Before convolving a group with the input data, the rearrangement mode corresponding to that group is obtained and the input data is rearranged accordingly to obtain the rearranged input data for the group. Each group is then convolved with its rearranged input data to obtain the group's convolution result, the convolution results of all groups are accumulated to obtain an accumulation result, and the data at the valid positions of the accumulation result is taken as the output of the neural network operation.
In an example implementation, in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions. That is, the rearranged input data of each group and its corresponding part of the input data share the same data positions, but the data at those positions has been rearranged. The rearrangement mode of the 1st sub-convolution kernel group 00 in Fig. 4 is set to leave the positions of all parts of the input data unchanged; its rearranged input data is shown in Fig. 5. The rearrangement mode of the 2nd sub-convolution kernel group 01 in Fig. 4 shifts every column of each part of the input data forward by one column; its rearranged input data is shown in Fig. 5. By analogy, the rearrangement mode of the Wk*Hk-th sub-convolution kernel group 22 in Fig. 4 shifts every column forward by two columns and every row upward by two rows; its rearranged input data is shown in Fig. 5. The valid positions are the solid-line parts of the data in Fig. 5.
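The rearrange-convolve-accumulate scheme of step 102 can be checked numerically with a small sketch. The shapes, stride 1 and zero padding are assumptions; `np.roll` stands in for the hardware rearrangement (its wrap-around never touches the valid region), and only the valid region is compared against a reference convolution.

```python
import numpy as np

def direct_conv2d(x, w):
    """Reference: valid 2-D convolution. x: (H, W, C), w: (N, Kh, Kw, C)."""
    H, W, C = x.shape
    N, Kh, Kw, _ = w.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1, N))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for n in range(N):
                out[i, j, n] = np.sum(x[i:i + Kh, j:j + Kw, :] * w[n])
    return out

def split_conv2d(x, w):
    """Step-102 scheme: one shifted 1*1 convolution per sub-kernel group."""
    H, W, C = x.shape
    N, Kh, Kw, _ = w.shape
    acc = np.zeros((H, W, N))
    for dh in range(Kh):                               # one pass per group
        for dw in range(Kw):
            group = w[:, dh, dw, :]                    # N 1*1*C sub-kernels
            shifted = np.roll(x, (-dh, -dw), (0, 1))   # rearranged input
            # 1*1 convolution = per-position dot product over the C channels
            acc += (shifted.reshape(-1, C) @ group.T).reshape(H, W, N)
    return acc[:H - Kh + 1, :W - Kw + 1, :]            # keep valid positions

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 4))
w = rng.standard_normal((2, 3, 3, 4))
assert np.allclose(split_conv2d(x, w), direct_conv2d(x, w))
```

The check passes because shifting the input by (dh, dw) aligns x[i+dh, j+dw, :] with output position (i, j), which is exactly the term the (dh, dw) weight slice contributes in the direct convolution.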
In an example implementation, to keep the matrix operation unit running efficiently, the bit width of the input data supplied for matrix operations must match the scale of the matrix operation. Suppose the matrix operation module can output an M*N matrix per cycle; the input data bandwidth is then M*W_i, where W_i is the bit width of a single data item, e.g. W_i = 8 at INT8 precision and W_i = 16 at FP16 precision. Let C0 be the depth of the input data participating in one matrix operation, that is, the granularity at which the input data is sliced along the depth direction, with a minimum of 1, and let C1 be the number of C0-granularity slices of the total input depth C. The input data is stored in the buffer in C1HWC0 order. As shown in Fig. 6, the M*W_i-bit-wide data is divided into M*W_i/C0 groups, each stored in a memory block of bit width W_i*C0 with its own address management, so that W_i*C0 bits of data can be fetched from any position in a single cycle. The data rearrangement module rearranges the data read from each buffer and then sends it to the matrix operation module for processing.
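The C1HWC0 storage order can be illustrated with a small sketch. Modeling the buffer as a numpy transpose, and the concrete shapes, are assumptions for illustration only.

```python
import numpy as np

# Tile the channel axis C into C1 = C // C0 slices of depth C0 and store
# the data as (C1, H, W, C0), so that one pixel's C0 channels form one
# contiguous W_i*C0-bit word that can be fetched in a single access.
H, W, C, C0 = 4, 4, 8, 4
C1 = C // C0
x = np.arange(H * W * C).reshape(H, W, C)            # original HWC data
buf = x.reshape(H, W, C1, C0).transpose(2, 0, 1, 3)  # C1HWC0 order

assert buf.shape == (C1, H, W, C0)
# the word for pixel (h, w) in depth slice c1 holds channels c1*C0..(c1+1)*C0-1
h, w, c1 = 2, 3, 1
assert np.array_equal(buf[c1, h, w], x[h, w, c1 * C0:(c1 + 1) * C0])
```

With this layout, a 1*1*C0 sub-operation reads one word per pixel, which is why C0 is chosen as the depth granularity of a single matrix operation.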
In an example implementation, when the input data is stored in memory blocks, the instruction issued to the matrix operation unit only needs to specify the start address of the input data, the start address of the weight data, and the sizes m, n and k of the two matrices participating in the matrix operation. Suppose the minimum matrix operation supported by the matrix unit is M*K by K*N; the control module then automatically fetches M*K data items from storage unit 1 and K*N data items from storage unit 2 each cycle and loads them into the matrix operation unit for the matrix operation. For the two data matrices of sizes m*k and k*n, the control unit slices and computes automatically, so m, n and k must be integer multiples of M, N and K; if they are not, they must be padded at input time so that, when participating in the matrix operation, m, n and k satisfy the integer-multiple condition.
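The padding condition above is a simple round-up to the next multiple. The helper below is an illustrative assumption, not part of the described instruction set.

```python
# m, n, k must be integer multiples of the unit's native M, N, K; if they
# are not, they are padded up at input time (illustrative helper).
def pad_up(v, unit):
    """Round v up to the nearest multiple of unit."""
    return ((v + unit - 1) // unit) * unit

M, N, K = 16, 16, 8          # assumed native tile sizes
m, n, k = 100, 30, 20        # requested matrix sizes
assert (pad_up(m, M), pad_up(n, N), pad_up(k, K)) == (112, 32, 24)
```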
In an example implementation, Fig. 7 shows the convolution mode used in the prior art and Fig. 8 shows the convolution process used by this application: the input source data of all sub-convolution kernel groups is the same, and no data inflation occurs. The convolution flow is essentially the same as the traditional one: the input data and weight data are loaded first. For a single convolution block the input data is loaded once; the weight data for the convolution may be split and loaded in batches in the manner shown in Fig. 8, or loaded into the weight cache unit all at once. For better parallelism, the input data and weight data can be loaded into their respective storage units simultaneously. After the input data and weights are loaded, multiple matrix operations are performed according to the splitting rules described in Fig. 8; the input data need not be changed in between, only different start positions need to be specified. The input data is thus highly reused, while different convolution weight data is fetched for each matrix operation.
In an example implementation, suppose the depth C in Fig. 8 is 16. For the 10*10*16 input data of Fig. 8 and the 3*3*16 convolution kernel of Fig. 4 (split into the 1*1*16 sub-convolution kernel groups of Fig. 8), two extra columns of invalid output are computed (the dashed part in Fig. 5). Assuming the matrix computation itself is 100% efficient, the efficiency for 10*10*16 input data under a 3*3*16 kernel is 8*8/(8*10) = 80%. Since a general-purpose neural network accelerator rarely exceeds 50% efficiency in single-image mode, this computation mode has little impact on overall network efficiency, while the input data access is only (10*10)/(9*8*8) = 17.3% of that in img2col mode. For input sizes larger than 10*10 the invalid-computation ratio, (Wk-1)/W, is even lower. If the input size is too small, the computation can be done in batch mode or img2col mode; in that case the input data is usually not the bottleneck, so generating img2col data does not hurt system performance.
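The figures quoted above follow from simple counting, assuming stride 1 and no padding:

```python
# For W_in*H_in input and a Wk*Hk kernel, (Wk-1) of the W_in output
# columns computed by the shifted scheme are invalid.
W_in, H_in, Wk, Hk = 10, 10, 3, 3
W_out, H_out = W_in - Wk + 1, H_in - Hk + 1               # 8*8 valid outputs

efficiency = (W_out * H_out) / (H_out * W_in)             # 8*8 / (8*10)
invalid_fraction = (Wk - 1) / W_in                        # (Wk-1)/W
access_ratio = (W_in * H_in) / (Wk * Hk * W_out * H_out)  # vs. img2col reads

assert abs(efficiency - 0.80) < 1e-12
assert abs(invalid_fraction - 0.20) < 1e-12
assert round(access_ratio, 3) == 0.174                    # ~17.3% of img2col
```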
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels are each convolved with the rearranged input data corresponding to that group, yielding N sub-convolution results, which are taken as the N layers of the convolution result corresponding to that group.
In the embodiments of this application, during a neural network operation, the input data of the operation and Wk*Hk sub-convolution kernel groups are obtained, and the operation step is entered. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data for that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result for that group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the valid positions of which is taken as the output of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions. By splitting the convolution in this way the input data is reused, and no img2col rearrangement of the data is needed to satisfy scheduling and computation. Because no img2col conversion is applied to the input data and the computation runs directly on the original input data, the hardware design overhead, the increased data-access volume and the increased dynamic power consumption caused by img2col are eliminated.
An embodiment of this application relates to the operation step of a neural network operation method which, as shown in Fig. 9, includes:
Step 201: rearrange the input data according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data.
In an example implementation, this application obtains the sub-convolution kernel groups after obtaining the input data, but the Wk*Hk groups are not all obtained at once. Instead, the 1st sub-convolution kernel group is obtained first; only after all operation steps have been executed for it is the 2nd group obtained, and so on, until the Wk*Hk-th sub-convolution kernel group is obtained.
In an example implementation, for each obtained i-th sub-convolution kernel group, the i-th data rearrangement mode corresponding to that group is determined first, and the input data is then rearranged according to it to obtain the i-th rearranged input data, where i ranges from 1 to Wk*Hk.
In an example implementation, the i-th sub-convolution kernel group is loaded in data-overwrite mode: for example, when the 2nd group is loaded it overwrites the 1st group, which reduces the memory occupied by storing the sub-convolution kernel groups.
In an example implementation, when the value of i is 1, the data rearrangement mode corresponding to the 1st sub-convolution kernel group is set to leave the positions of all parts of the input data unchanged.
Step 202: convolve the i-th sub-convolution kernel group with the i-th rearranged input data to obtain the i-th convolution result.
In an example implementation, after the i-th rearranged input data is obtained, the i-th sub-convolution kernel group is convolved with it to obtain the i-th convolution result.
In an example implementation, when the neural network contains X matrix operation units (taking 3 as an example), the 9 sub-convolution kernel groups in Fig. 4 can be divided into 3 batches: groups 1-3 are first input to the 3 matrix operation units, groups 4-6 are input the second time, and groups 7-9 the third time, each producing its corresponding convolution result.
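The three-round schedule described here is simply a chunking of the group list; the dispatch below is an illustrative assumption, not a fixed hardware policy.

```python
# 9 sub-convolution kernel groups dispatched to 3 matrix operation
# units in 9 / 3 = 3 rounds, 3 groups per round.
groups = list(range(1, 10))        # sub-kernel groups 1..9
units = 3
rounds = [groups[i:i + units] for i in range(0, len(groups), units)]

assert rounds == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```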
Step 203: accumulate the i-th convolution result and the (i-1)-th accumulation result to obtain the i-th accumulation result.
In an example implementation, the generated i-th convolution result is accumulated with the previous (i-1)-th accumulation result to obtain the i-th accumulation result.
In an example implementation, when the value of i is 1, the 0-th accumulation result corresponding to the 1st sub-convolution kernel group is set to zero.
In an example implementation, when the neural network contains Y addition units (taking 3 as an example), the 9 convolution results corresponding to the 9 sub-convolution kernel groups in Fig. 4 can be divided into 3 batches: convolution results 1-3 are input to the 1st addition unit to compute their partial accumulation, results 4-6 to the 2nd addition unit, and results 7-9 to the 3rd addition unit; the three partial accumulations are then input to any one of the addition units to compute the accumulation result of all 9 convolution results.
Step 204: if i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i equals Wk*Hk, take the data at the valid positions of the i-th accumulation result as the output of the neural network operation.
In an example implementation, after the operation on the i-th sub-convolution kernel group is complete, the value of i is checked. If i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again. If i equals Wk*Hk, the operation step ends, and the data at the valid positions of the i-th accumulation result is taken as the output of the neural network operation.
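Steps 201-204 form the serial loop sketched below, under the same illustrative assumptions as before: `np.roll` models the rearrangement, each loaded group overwrites the previous one, and the 0-th accumulation result is zero.

```python
import numpy as np

def serial_operation_step(x, kernels):
    """x: (H, W, C) input; kernels: (N, Kh, Kw, C). Runs the operation
    step once per sub-kernel group i = 1..Wk*Hk, holding only a running
    accumulator and the currently loaded group."""
    H, W, C = x.shape
    N, Kh, Kw, _ = kernels.shape
    acc = np.zeros((H, W, N))                  # 0th accumulation result: zero
    positions = [(dh, dw) for dh in range(Kh) for dw in range(Kw)]
    i = 1
    while True:
        dh, dw = positions[i - 1]
        group = kernels[:, dh, dw, :]          # overwrites the previous group
        xi = np.roll(x, (-dh, -dw), (0, 1))    # step 201: i-th rearranged input
        ci = (xi.reshape(-1, C) @ group.T).reshape(H, W, N)  # step 202
        acc = acc + ci                         # step 203: i-th accumulation
        if i == Kh * Kw:                       # step 204: done, keep valid data
            return acc[:H - Kh + 1, :W - Kw + 1, :]
        i += 1                                 # step 204: i -> i+1, loop again
```

The returned valid region equals an ordinary valid convolution of x with the original Kh*Kw kernels.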
In this embodiment of this application, building on the other embodiments, the convolution of the sub-convolution kernel groups with the input data can also be performed serially or in parallel, so that the neural network operation method described in this application can be applied to all types of neural networks.
An embodiment of this application relates to a neural network operation method which, as shown in Fig. 10, includes:
Step 301: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data.
In an example implementation, this step is substantially the same as step 101 of the embodiments of this application and is not repeated here.
Step 302: convolve each sub-convolution kernel group with the input data to obtain the convolution result for that group; rearrange each group's convolution result according to the data rearrangement mode corresponding to that group to obtain the rearranged convolution result for that group; accumulate the rearranged convolution results of all groups to obtain an accumulation result, and take the data at the valid positions of the accumulation result as the output of the neural network operation. Each sub-convolution kernel group convolved with its corresponding part of the input data yields the valid convolution result for that group; in the rearranged convolution results of all groups, the valid convolution results occupy the same data positions, and these common positions are the valid positions.
In an example implementation, each sub-convolution kernel group is first convolved with the input data to obtain the group's convolution result. The data rearrangement mode corresponding to each group is then obtained, and each group's convolution result is rearranged accordingly to obtain the group's rearranged convolution result. The rearranged convolution results of all groups are then accumulated to obtain an accumulation result, and the data at the valid positions of the accumulation result is taken as the output of the neural network operation.
在一示例实施中,对于包含N个子卷积核的子卷积核组,子卷积核组中的N个子卷积核分别与该子卷积核组对应的重排后输入数据进行卷积,得到N个子卷积结果,将N个子卷积结果分别作为子卷积核组对应的卷积结果的第N层数据。在一示例实施中,各子卷积核组对应的数据重排方式与本申请实施例步骤102提及的数据重排方式大致相同,此处不一一赘述。In an exemplary implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels in the sub-convolution kernel group are respectively convolved with the rearranged input data corresponding to the sub-convolution kernel group , N sub-convolution results are obtained, and the N sub-convolution results are respectively used as the Nth layer data of the convolution results corresponding to the sub-convolution kernel group. In an exemplary implementation, the data rearrangement manners corresponding to each sub-convolution kernel group are substantially the same as the data rearrangement manners mentioned in step 102 of the embodiment of the present application, and will not be repeated here.
在一示例实施中,步骤201至步骤204提及的串行运算和并行运算均可以应用在本申请实施例中;先对一个子卷积核组进行卷积、重排后累加,直至进行到最后一个子卷积核组的卷积、重排后累加,便可以得到神经网络运算的输出结果;也可以由多个矩阵运算单元和或多个加法单元进行运算步骤。In an exemplary implementation, both the serial operations and parallel operations mentioned in steps 201 to 204 can be applied in the embodiment of the present application; first, a sub-convolution kernel group is convolved, rearranged and accumulated until the The output result of the neural network operation can be obtained by the convolution and rearrangement of the last sub-convolution kernel group and accumulation; the operation steps can also be performed by multiple matrix operation units and or multiple addition units.
本申请的实施方式,在其他实施例的基础之上还可以先对各子卷积核组和输入数据进行卷积,再对各子卷积核组和输入数据的卷积结果进行重排、累加等操作,使得本申请对卷积和重排的先后顺序有着具体的规定,提高本申请的适用性。In the embodiment of the present application, on the basis of other embodiments, each sub-convolution kernel group and input data may be convolved first, and then the convolution results of each sub-convolution kernel group and input data may be rearranged, Operations such as accumulation make this application have specific regulations on the order of convolution and rearrangement, which improves the applicability of this application.
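The convolve-first scheme above can be illustrated with a minimal sketch, assuming a single-channel input (C=1), a single kernel (N=1), and plain nested Python lists; the function name and data layout are illustrative and not taken from the patent:

```python
def conv2d_via_1x1_split(inp, kernel):
    """Valid 2-D convolution in the order of step 302: each 1x1 sub-kernel is
    applied to the full input, each per-sub-kernel result is then rearranged
    (shifted by the sub-kernel's spatial offset), and the shifted results are
    accumulated; only the valid positions form the output."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    acc = [[0.0] * W for _ in range(H)]
    for p in range(Hk):
        for q in range(Wk):
            # "1x1 convolution": scale every input element by one weight
            conv = [[kernel[p][q] * inp[y][x] for x in range(W)] for y in range(H)]
            # rearrangement: shift so all contributions to out[y][x] align at (y, x)
            for y in range(H):
                for x in range(W):
                    if y + p < H and x + q < W:
                        acc[y][x] += conv[y + p][x + q]
    # the top-left Ho x Wo block holds the valid positions
    return [row[:Wo] for row in acc[:Ho]]
```

Since the contribution of sub-kernel (p, q) to output element (y, x) is kernel[p][q]*inp[y+p][x+q], shifting each per-sub-kernel product by (p, q) lines all contributions up at the same position, so the accumulation's valid region equals the ordinary valid convolution.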
An embodiment of this application relates to a neural network operation method which, as shown in FIG. 11, includes:
Step 401: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
In an example implementation, this step is substantially the same as step 101 of the embodiments of this application and is not repeated here.
Step 402: convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result. Convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to the i-th group, and the i-th convolution result contains this valid convolution result.
In an example implementation, the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result. During this convolution, the result of convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group is called the valid convolution result; that is, the i-th convolution result contains the valid convolution result.
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, each of the N sub-convolution kernels in the group is convolved with the rearranged input data corresponding to that group to obtain N sub-convolution results, which serve, respectively, as the N layers of data of the convolution result corresponding to the group.
In an example implementation, the Wk*Hk sub-convolution kernel groups can be loaded in order. When the i-th sub-convolution kernel group is loaded, it is loaded in a data-overwrite manner; that is, the i-th sub-convolution kernel group overwrites the (i-1)-th sub-convolution kernel group.
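The kernel splitting described in step 401 can be sketched as follows, assuming the N kernels are given as nested Python lists indexed [n][p][q][c]; the representation and function name are illustrative, not taken from the patent:

```python
def split_kernels(kernels):
    """Split N kernels of shape Hk x Wk x C into Hk*Wk sub-convolution kernel
    groups. Group (p, q) holds, for each of the N kernels, the 1x1xC slice
    taken at spatial position (p, q), so every group contains N sub-kernels."""
    N = len(kernels)
    Hk, Wk = len(kernels[0]), len(kernels[0][0])
    groups = {}
    for p in range(Hk):
        for q in range(Wk):
            # N vectors of length C, one per kernel
            groups[(p, q)] = [kernels[n][p][q] for n in range(N)]
    return groups
```

All N sub-kernels in group (p, q) share the same spatial offset, which is why, as stated above, they correspond to the same part of the input data.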
Step 403: rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions.
In an example implementation, the (i-1)-th accumulation result is rearranged according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group, so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions.
Step 404: accumulate the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result.
In an example implementation, the rearranged (i-1)-th accumulation result is accumulated with the i-th convolution result to obtain the i-th accumulation result.
In an example implementation, the initial value of i is 1. When i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are regarded by default as occupying the same data positions.
Step 405: if i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i is equal to Wk*Hk, take the valid convolution result in the i-th accumulation result as the output result of the neural network operation.
In an example implementation, after the operation on the i-th sub-convolution kernel group is completed, the value of i is checked. If i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again. If i is equal to Wk*Hk, the operation step ends, and the data at the valid positions in the i-th accumulation result is taken as the output result of the neural network operation.
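The loop of steps 402 to 405 can be sketched as follows, under the simplifying assumptions of a single-channel input (C=1), a single kernel (N=1), sub-kernel groups enumerated in row-major order, and nested Python lists; the names are illustrative, not taken from the patent:

```python
def conv2d_serial_accumulate(inp, kernel):
    """Steps 402-405 sketch: for each 1x1 sub-kernel in turn, convolve it with
    the full input (step 402), rearrange (shift) the previous accumulation so
    the valid results line up (step 403), and add (step 404); the valid region
    of the final accumulation is the output (step 405)."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    offsets = [(p, q) for p in range(Hk) for q in range(Wk)]
    acc = [[0.0] * W for _ in range(H)]  # "0th accumulation result" is zero
    prev = (0, 0)                        # spatial alignment of the accumulation
    for p, q in offsets:
        dy, dx = p - prev[0], q - prev[1]
        # step 403: rearrange the previous accumulation to align with group (p, q)
        shifted = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                if 0 <= y - dy < H and 0 <= x - dx < W:
                    shifted[y][x] = acc[y - dy][x - dx]
        # step 402: "1x1 convolution" of this sub-kernel with the full input
        conv = [[kernel[p][q] * inp[y][x] for x in range(W)] for y in range(H)]
        # step 404: accumulate
        acc = [[shifted[y][x] + conv[y][x] for x in range(W)] for y in range(H)]
        prev = (p, q)
    # step 405: the valid convolution result sits at offset `prev` in acc
    return [[acc[y + prev[0]][x + prev[1]] for x in range(Wo)] for y in range(Ho)]
```

Only one accumulation buffer is kept, and each rearrangement is a shift by the offset difference between consecutive sub-kernel positions, which matches the overwrite-style storage described for the serial scheme.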
On the basis of the other embodiments, this implementation of the application may also perform the convolution of the sub-convolution kernel groups with the input data serially or in parallel, so that the neural network operation method described in this application can be applied to various types of neural networks.
The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is preserved, such variants fall within the scope of protection of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the scope of protection of this patent.
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 12 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1201, a second storage unit 1202, a control unit 1203, a first data rearrangement unit 1204, a convolution unit 1205, and an addition unit 1206.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the first data rearrangement unit; the control unit is further configured to send the data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit.
The first data rearrangement unit is configured to rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group, obtain the rearranged input data corresponding to each group, and output the rearranged input data corresponding to each group to the convolution unit. In the rearranged input data corresponding to each group, the part of the input data corresponding to that group occupies the same data positions, and those shared data positions are the valid positions.
The control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and send it to the convolution unit.
The convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that group to obtain the convolution result corresponding to each group, and output the convolution results corresponding to the groups to the addition unit.
The addition unit is configured to accumulate the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and take the data at the valid positions in the accumulation result as the output result of the neural network operation.
In an example implementation, the neural network operation apparatus provided by this application further includes a third storage unit, configured to store, when the neural network operation is performed serially, the result of the previous operation of a sub-convolution kernel group with the input data.
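The dataflow through the rearrangement, convolution, and addition units of this apparatus can be sketched as follows, again assuming C=1 and N=1 and using illustrative names; unlike the convolve-first variant, here the input is rearranged before the convolution, matching the unit order of FIG. 12:

```python
def conv2d_rearrange_input_first(inp, kernel):
    """FIG. 12 dataflow sketch: the data rearrangement unit shifts the input
    for each sub-kernel group, the convolution unit applies the 1x1 sub-kernel
    (an elementwise scale), and the addition unit accumulates; the valid
    positions of the accumulation result form the output."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    acc = [[0.0] * W for _ in range(H)]
    for p in range(Hk):
        for q in range(Wk):
            # first data rearrangement unit: shift the input for group (p, q)
            shifted = [[inp[y + p][x + q] if y + p < H and x + q < W else 0.0
                        for x in range(W)] for y in range(H)]
            # convolution unit (1x1 scale) feeding the addition unit (accumulate)
            for y in range(H):
                for x in range(W):
                    acc[y][x] += kernel[p][q] * shifted[y][x]
    return [row[:Wo] for row in acc[:Ho]]
```

Because the shift is applied to the input before the 1x1 convolution, the valid results of all groups already occupy the same positions and the addition unit can accumulate them directly, with no rearrangement needed after the convolution.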
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 13 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1301, a second storage unit 1302, a control unit 1303, a second data rearrangement unit 1304, a convolution unit 1305, and an addition unit 1306.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the convolution unit; the control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and input it to the convolution unit.
The convolution unit is configured to convolve each sub-convolution kernel group with the input data to obtain the convolution result corresponding to each group, and output the convolution results corresponding to the groups to the second data rearrangement unit.
The control unit is further configured to send the data rearrangement mode corresponding to each sub-convolution kernel group to the second data rearrangement unit.
The second data rearrangement unit is configured to rearrange the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that group, obtain the rearranged convolution result corresponding to each group, and output the rearranged convolution results corresponding to the groups to the addition unit. Convolving each sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to that group; the valid convolution results in the rearranged convolution results corresponding to the groups occupy the same data positions, and those shared data positions are the valid positions.
In an example implementation, the neural network operation apparatus provided by this application further includes a third storage unit, configured to store, when the neural network operation is performed serially, the result of the previous operation of a sub-convolution kernel group with the input data.
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 14 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1401, a second storage unit 1402, a control unit 1403, a third data rearrangement unit 1404, a convolution unit 1405, and an addition unit 1406.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the convolution unit; the control unit is further configured to obtain the i-th sub-convolution kernel group from the second storage unit and input it to the convolution unit.
The convolution unit is configured to convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, and output the i-th convolution result to the addition unit. Convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to the i-th group, and the i-th convolution result contains this valid convolution result.
The control unit is further configured to obtain the (i-1)-th accumulation result from the third storage unit and send it to the third data rearrangement unit.
The third data rearrangement unit is configured to rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions, and to output the rearranged (i-1)-th accumulation result to the addition unit. The rearranged (i-1)-th accumulation result is accumulated with the i-th convolution result to obtain the i-th accumulation result, which is stored in the third storage unit, overwriting the (i-1)-th accumulation result.
The control unit is further configured to check the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i is equal to Wk*Hk, the valid convolution result in the i-th accumulation result is taken as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are regarded by default as occupying the same data positions.
It is easy to see that this embodiment is a system embodiment corresponding to the above method embodiments, and the two can be implemented in cooperation with each other. The relevant technical details and technical effects mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of this application, units not closely related to solving the technical problem raised by this application are not introduced in this embodiment; this does not mean that no other units exist in this embodiment.
Another embodiment of this application relates to a chip which, as shown in FIG. 6, includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601. The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the neural network operation methods of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, connecting the various circuits of the one or more processors and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; furthermore, the antenna also receives data and transfers the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of this application relates to an electronic device which, as shown in FIG. 6, includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601. The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the neural network operation methods of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, connecting the various circuits of the one or more processors and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; furthermore, the antenna also receives data and transfers the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of this application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art can understand that all or some of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program. The program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above implementations are specific embodiments for realizing this application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of this application.

Claims (20)

  1. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a part of the input data, and in a case where N≥2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
    the operation step comprising:
    rearranging the input data according to a data rearrangement mode corresponding to each of the sub-convolution kernel groups to obtain rearranged input data corresponding to each of the sub-convolution kernel groups; convolving each of the sub-convolution kernel groups with the rearranged input data corresponding to that group to obtain a convolution result corresponding to each of the sub-convolution kernel groups; and accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and taking data at a valid position in the accumulation result as an output result of the neural network operation;
    wherein, in the rearranged input data corresponding to each of the sub-convolution kernel groups, the parts of the input data corresponding to the sub-convolution kernel groups have the same data position, and the same data position is the valid position.
  2. The neural network operation method according to claim 1, wherein the operation step comprises:
    rearranging the input data according to a data rearrangement mode corresponding to an i-th sub-convolution kernel group to obtain i-th rearranged input data;
    convolving the i-th sub-convolution kernel group with the i-th rearranged input data to obtain an i-th convolution result;
    accumulating the i-th convolution result and an (i-1)-th accumulation result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the data at the valid position in the i-th accumulation result as the output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the data rearrangement mode corresponding to the first sub-convolution kernel group is set such that the position of each part of the input data remains unchanged, and the 0-th accumulation result is set to zero.
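A minimal single-channel sketch of this loop (N=C=1; for illustration only, with the rearrangement realized as a spatial shift of the input, an assumption since the application does not fix a concrete rearrangement scheme here) shows that accumulating the Wk*Hk 1x1 partial products reproduces a direct valid 2D convolution:

```python
def conv2d_direct(x, k):
    """Reference: direct valid 2D convolution (cross-correlation form)."""
    hk, wk = len(k), len(k[0])
    ho, wo = len(x) - hk + 1, len(x[0]) - wk + 1
    return [[sum(k[dy][dx] * x[r + dy][c + dx]
                 for dy in range(hk) for dx in range(wk))
             for c in range(wo)] for r in range(ho)]

def conv2d_claim2(x, k):
    """Claim-2 style loop: for each of the Wk*Hk steps, rearrange (shift)
    the input, apply one 1x1 sub-kernel, and add onto the running result."""
    hk, wk = len(k), len(k[0])
    ho, wo = len(x) - hk + 1, len(x[0]) - wk + 1
    acc = [[0] * wo for _ in range(ho)]        # the 0th accumulation result
    for dy in range(hk):                       # i enumerates offsets (dy, dx)
        for dx in range(wk):
            w = k[dy][dx]                      # the i-th 1x1 sub-kernel
            for r in range(ho):
                for c in range(wo):
                    # shifted input: the partial input data this sub-kernel
                    # needs, moved to the same (valid) positions every step
                    acc[r][c] += w * x[r + dy][c + dx]
    return acc
```

Every step is a pointwise multiply-accumulate over a shifted view of the input, which is what lets hardware built for 1x1 convolutions execute an arbitrary Wk*Hk kernel.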
  3. The neural network operation method according to claim 2, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the input data is rearranged according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data.
  4. The neural network operation method according to claim 3, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
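One plausible reading of "loading in a data overwriting manner" can be sketched as follows (the buffer class and its interface are illustrative assumptions, not taken from the application): the weight buffer only needs capacity for a single group of N sub-kernels, because loading group i overwrites group i-1 in place.

```python
class KernelBuffer:
    """Sketch of an on-chip weight buffer sized for one sub-convolution
    kernel group; each load overwrites the previously loaded group."""

    def __init__(self, size):
        self.data = [0] * size

    def load(self, group):
        assert len(group) == len(self.data)
        for j, v in enumerate(group):
            self.data[j] = v   # overwrite the previous group's entry
```

Overwriting keeps the weight storage at one group (N sub-kernels) instead of all Wk*Hk groups, which is the apparent motivation for loading the groups one at a time.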
  5. The neural network operation method according to any one of claims 1 to 4, wherein, in a case where N≥2, convolving each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain the convolution result corresponding to each sub-convolution kernel group comprises:
    for each sub-convolution kernel group, convolving the N sub-convolution kernels in the sub-convolution kernel group respectively with the rearranged input data corresponding to the sub-convolution kernel group to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the sub-convolution kernel group.
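The N-layer result of one group can be illustrated with a small sketch (illustration only; the channel-last nested-list layout and function name are assumptions): each 1*1*C sub-kernel reduces the channel dimension at every spatial position, and the N resulting maps are the N layers of the group's convolution result.

```python
def apply_group(x, group):
    """Apply one sub-convolution kernel group to an H x W x C input.
    Each of the N 1*1*C sub-kernels forms a dot product over the C
    channels at every position; the N maps are the N output layers."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    return [[[sum(kern[ch] * x[r][col][ch] for ch in range(c))
              for col in range(w)] for r in range(h)]
            for kern in group]

x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]                 # H = W = C = 2
group = [[1, 0], [0, 1], [1, 1]]       # N = 3 sub-kernels, each 1*1*2
layers = apply_group(x, group)
```

With these sample weights, layer 0 selects channel 0, layer 1 selects channel 1, and layer 2 is their per-position sum.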
  6. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the operation step comprising:
    convolving each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group; rearranging the convolution result corresponding to each sub-convolution kernel group according to a data rearrangement mode corresponding to that sub-convolution kernel group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group; and accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and taking data located at a valid position in the accumulation result as an output result of the neural network operation;
    wherein each sub-convolution kernel group convolved with the partial input data corresponding to that sub-convolution kernel group yields a valid convolution result corresponding to that sub-convolution kernel group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups are located at the same data position, and the same data position is the valid position.
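Claim 6 is the dual of claim 1: the input is left in place, and the rearrangement is applied to each partial convolution result instead. A single-channel sketch (illustration only; the shift-based rearrangement is again an assumption) shows the two orderings are equivalent:

```python
def conv2d_claim6(x, k):
    """Convolve each 1x1 sub-kernel with the un-rearranged input first,
    then rearrange each partial result so valid entries align, and add."""
    hk, wk = len(k), len(k[0])
    h, w = len(x), len(x[0])
    ho, wo = h - hk + 1, w - wk + 1
    acc = [[0] * wo for _ in range(ho)]
    for dy in range(hk):
        for dx in range(wk):
            # A 1x1 convolution with the full input is a pointwise scaling.
            partial = [[k[dy][dx] * x[r][c] for c in range(w)]
                       for r in range(h)]
            # Rearrangement: select the window shifted by (dy, dx), which
            # places this group's valid convolution result at the same
            # positions as every other group's.
            for r in range(ho):
                for c in range(wo):
                    acc[r][c] += partial[r + dy][c + dx]
    return acc
```

Deferring the rearrangement means the convolution unit always consumes the input in its stored layout; only the smaller partial results are moved.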
  7. The neural network operation method according to claim 6, wherein the operation step comprises:
    convolving an i-th sub-convolution kernel group with the input data to obtain an i-th convolution result;
    rearranging the i-th convolution result according to a data rearrangement mode corresponding to the i-th convolution result to obtain an i-th rearranged convolution result;
    accumulating the i-th rearranged convolution result and an (i-1)-th accumulation result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the data at the valid position in the i-th accumulation result as the output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the data rearrangement mode corresponding to the first convolution result is set such that the position of each part of the first convolution result remains unchanged, and the 0-th accumulation result is set to zero.
  8. The neural network operation method according to claim 7, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result.
  9. The neural network operation method according to claim 8, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
  10. The neural network operation method according to any one of claims 6 to 9, wherein, in a case where N≥2, convolving each sub-convolution kernel group with the input data respectively to obtain the convolution result corresponding to each sub-convolution kernel group comprises:
    for each sub-convolution kernel group, convolving the N sub-convolution kernels in the sub-convolution kernel group respectively with the input data to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the sub-convolution kernel group.
  11. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the operation step comprising:
    convolving an i-th sub-convolution kernel group with the input data to obtain an i-th convolution result, wherein the i-th sub-convolution kernel group convolved with the partial input data corresponding to the i-th sub-convolution kernel group yields a valid convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the valid convolution result;
    rearranging an (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result are located at the same data position;
    accumulating the rearranged (i-1)-th accumulation result and the i-th convolution result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the valid convolution result in the i-th accumulation result as an output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the 0-th accumulation result is set to zero, and the valid convolution result in the rearranged 0-th accumulation result and the valid convolution result in the first convolution result are deemed by default to be located at the same data position.
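Claim 11 moves the rearrangement onto the running accumulation result: neither the input nor the partial results are shifted, only the accumulator is realigned before each addition. A single-channel sketch (illustration only; the shift-based realignment and full-size accumulator layout are assumptions made to keep the example short):

```python
def shift2d(a, dy, dx):
    """Shift a 2D array by (dy, dx), filling vacated entries with zero."""
    h, w = len(a), len(a[0])
    return [[a[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
             for c in range(w)] for r in range(h)]

def conv2d_claim11(x, k):
    """Convolve each 1x1 sub-kernel with the raw input and, before each
    addition, rearrange (shift) the previous accumulation result so its
    valid entries line up with the new partial result."""
    hk, wk = len(k), len(k[0])
    h, w = len(x), len(x[0])
    ho, wo = h - hk + 1, w - wk + 1
    offsets = [(dy, dx) for dy in range(hk) for dx in range(wk)]
    acc = [[0] * w for _ in range(h)]      # the 0th accumulation result
    prev_dy, prev_dx = offsets[0]          # i=1: no realignment needed
    for i, (dy, dx) in enumerate(offsets):
        if i > 0:
            # realign the accumulator from the previous sub-kernel's
            # frame to the current one
            acc = shift2d(acc, dy - prev_dy, dx - prev_dx)
            prev_dy, prev_dx = dy, dx
        for r in range(h):
            for c in range(w):
                acc[r][c] += k[dy][dx] * x[r][c]
    # the valid convolution result sits at offset (prev_dy, prev_dx)
    return [[acc[r + prev_dy][c + prev_dx] for c in range(wo)]
            for r in range(ho)]
```

Entries outside the valid region accumulate meaningless partial sums, but the realignment guarantees they never mix into the valid positions, so only the valid region is read out at the end.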
  12. The neural network operation method according to claim 11, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result.
  13. The neural network operation method according to claim 12, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
  14. The neural network operation method according to any one of claims 11 to 13, wherein, in a case where N≥2, convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result comprises:
    convolving the N sub-convolution kernels in the i-th sub-convolution kernel group respectively with the input data to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the i-th sub-convolution kernel group.
  15. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the first data rearrangement unit, and is further configured to send a data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit;
    the first data rearrangement unit is configured to rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group, and output the rearranged input data corresponding to each sub-convolution kernel group to the convolution unit; wherein, in the rearranged input data corresponding to each sub-convolution kernel group, the partial input data corresponding to that sub-convolution kernel group is located at the same data position, and the same data position is a valid position;
    the control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and send each sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain a convolution result corresponding to each sub-convolution kernel group, and output the convolution results corresponding to the sub-convolution kernel groups to the addition unit; and
    the addition unit is configured to accumulate the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and take data located at the valid position in the accumulation result as an output result of the neural network operation.
  16. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the convolution unit, and is further configured to obtain each sub-convolution kernel group from the second storage unit and input each sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group, and output the convolution results corresponding to the sub-convolution kernel groups to the second data rearrangement unit;
    the control unit is further configured to send a data rearrangement mode corresponding to each sub-convolution kernel group to the second data rearrangement unit;
    the second data rearrangement unit is configured to rearrange the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that sub-convolution kernel group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group, and output the rearranged convolution results corresponding to the sub-convolution kernel groups to the addition unit;
    wherein each sub-convolution kernel group convolved with the partial input data corresponding to that sub-convolution kernel group yields a valid convolution result corresponding to that sub-convolution kernel group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups are located at the same data position, and the same data position is a valid position.
  17. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the convolution unit, and is further configured to obtain an i-th sub-convolution kernel group from the second storage unit and input the i-th sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve the i-th sub-convolution kernel group with the input data to obtain an i-th convolution result, and output the i-th convolution result to the addition unit; wherein the i-th sub-convolution kernel group convolved with the partial input data corresponding to the i-th sub-convolution kernel group yields a valid convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the valid convolution result;
    the control unit is further configured to obtain an (i-1)-th accumulation result from the third storage unit and send the (i-1)-th accumulation result to the third data rearrangement unit;
    the third data rearrangement unit is configured to rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result are located at the same data position, and output the rearranged (i-1)-th accumulation result to the addition unit;
    the addition unit is configured to accumulate the rearranged (i-1)-th accumulation result and the i-th convolution result to obtain an i-th accumulation result, and store the i-th accumulation result in the third storage unit, overwriting the (i-1)-th accumulation result; and
    the control unit is further configured to judge the value of i: if i is less than Wk*Hk, i is updated to i+1 and the above operations are performed again; if i is equal to Wk*Hk, the valid convolution result in the i-th accumulation result is taken as an output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the 0-th accumulation result is set to zero, and the valid convolution result in the rearranged 0-th accumulation result and the valid convolution result in the first convolution result are deemed by default to be located at the same data position.
  18. A chip, comprising:
    at least one processing module; and
    a storage module communicatively connected to the at least one processing module; wherein
    the storage module stores instructions executable by the at least one processing module, and the instructions are executed by the at least one processing module to enable the at least one processing module to perform the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
  19. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
PCT/CN2022/121427 2021-12-03 2022-09-26 Neural network operation method and apparatus, chip, electronic device and storage medium WO2023098256A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111466758.0 2021-12-03
CN202111466758.0A CN116306840A (en) 2021-12-03 2021-12-03 Neural network operation method, device, chip, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023098256A1

Family

ID=86611499


Country Status (2)

Country Link
CN (1) CN116306840A (en)
WO (1) WO2023098256A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116861149A (en) * 2023-09-05 2023-10-10 之江实验室 Convolution operation optimization method, device and processor

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881618A (en) * 2023-08-25 2023-10-13 之江实验室 General matrix multiplication calculation optimization method, device and processor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework
US20190065896A1 (en) * 2017-08-23 2019-02-28 Samsung Electronics Co., Ltd. Neural network method and apparatus
US20190188237A1 (en) * 2017-12-18 2019-06-20 Nanjing Horizon Robotics Technology Co., Ltd. Method and electronic device for convolution calculation in neutral network
CN111260037A (en) * 2020-02-11 2020-06-09 深圳云天励飞技术有限公司 Convolution operation method and device for image data, electronic device and storage medium
CN112215745A (en) * 2020-09-30 2021-01-12 深圳云天励飞技术股份有限公司 Image processing method and device and electronic equipment


Also Published As

Publication number Publication date
CN116306840A (en) 2023-06-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900064

Country of ref document: EP

Kind code of ref document: A1