CN110826685A - Method and device for convolution calculation of neural network - Google Patents

Method and device for convolution calculation of neural network

Info

Publication number
CN110826685A
Authority
CN
China
Prior art keywords
weight
group
groups
data
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810898766.4A
Other languages
Chinese (zh)
Inventor
郭鑫
董晓文
李怀洲
林芃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810898766.4A
Publication of CN110826685A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a method and a device for convolution operation of a neural network. The ith convolutional layer of the neural network comprises M weight channels, and the M weight channels are divided into N weight channel groups. The method comprises the following steps: acquiring, from the data to be input into the ith convolutional layer, the data of the data channel corresponding to each weight channel in the nth weight channel group; inputting the acquired data into the ith convolutional layer, performing a convolution operation on the data of each data channel, and summing the convolution results of the data channels; performing inverse quantization on the summed result according to the quantization coefficient corresponding to each channel in the nth weight channel group; and adding the result of the inverse quantization to the calculation result corresponding to the (n-1)th weight channel group to obtain the calculation result corresponding to the nth weight channel group. The method can reduce the amount of data input to the convolutional neural network at a time and the amount of intermediate data generated, thereby lowering the requirements on hardware devices.

Description

Method and device for convolution calculation of neural network
Technical Field
The present application relates to the field of neural networks, and more particularly, to a method and an apparatus for performing convolution calculation in a neural network.
Background
After training is completed, a deep convolutional neural network contains millions or even tens of millions of parameters, for example the weight parameters and bias parameters included in the convolutional neural network model. At present, during the calculation of a convolutional neural network, all the data to be calculated are input into a convolutional layer, and the calculation then proceeds layer by layer. Because the number of channels in current convolutional neural networks keeps growing, inputting all data channels simultaneously leads to many parameters and a large data volume, so the whole convolution calculation consumes a large amount of storage and computing resources. As deep neural networks develop further, they consume even more memory and computing resources, which makes them difficult to port to a mobile phone or an embedded chip. Even if the calculation results are transmitted to the mobile phone or embedded chip over a network, the resulting high bandwidth occupancy is often a difficult engineering problem.
At present, in the calculation process of a convolutional neural network, the computing capability of the hardware device (for example, a computer device) is limited, while there is a large amount of input data to be calculated and each convolutional layer has many weight channels, so many intermediate results are generated, which places excessively high requirements on the hardware device. For hardware devices with insufficient performance, excessive input data and generated intermediate data may cause data overflow, resulting in calculation errors.
Disclosure of Invention
The application provides a method and a device for convolution operation of a neural network, which can reduce the amount of data input to the convolutional neural network at a time and the amount of intermediate data generated, thereby lowering the requirements on the hardware device that performs the convolution calculation.
In a first aspect, a method of convolutional operation of a neural network is provided, where the method is performed by a computer device, an i-th convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, the M is greater than or equal to 2, the N is less than or equal to M, and the i-th convolutional layer is any convolutional layer of the neural network, and the method includes: acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer; inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel; carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group; adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group; and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
According to the neural network convolution operation method, the coefficient matrixes of the weight channels of the convolution neural network are grouped and quantized, and corresponding input data are received in a grouping mode, so that the data volume input to the convolution neural network each time can be reduced, the calculation amount of convolution calculation and the data volume of obtained intermediate data are reduced, and the requirements on hardware equipment of convolution calculation are lowered.
In a possible implementation manner of the first aspect, the method further includes: taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer; determining a value domain interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value domain intervals in the computer device, wherein each value domain interval represents a continuous segment of data, and the data of each value domain interval are not overlapped; dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group; and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
In a possible implementation manner of the first aspect, the dividing the M weight channels into N groups according to the determined value range interval includes: dividing the weight channels whose determined value range intervals are the same into one group to obtain L groups of weight channels; and dividing the L groups of weight channels into N groups according to the size order of the value range intervals.
in a possible implementation manner of the first aspect, the dividing the L groups of weight channels into N groups according to the size order of the value range interval includes: determining a first N-1 group of the L groups as a first N-1 group of the N groups; and determining the groups except the first N-1 groups in the L groups as the Nth group of the N groups.
In a second aspect, a device for convolutional operation of a neural network is provided, where an ith convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, the M is greater than or equal to 2, the N is less than or equal to M, the ith convolutional layer is any convolutional layer of the neural network, and the device includes a transmission module and a calculation module:
the transmission module is used for: acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
the calculation module is configured to: carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
According to the device provided by the embodiment of the application, the coefficient matrix of the weight channel of the convolutional neural network is subjected to grouping quantization, and corresponding input data is received in a grouping mode, so that the data volume input to the convolutional neural network every time can be reduced, the calculation amount of convolutional calculation and the data volume of obtained intermediate data are reduced, and the requirement on hardware equipment of the convolutional calculation is lowered.
In a possible implementation manner of the second aspect, the apparatus further includes a grouping module, where the grouping module is configured to: take the maximum absolute value of the coefficients in the coefficient matrix of each weight channel of the ith convolutional layer; determine the value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, where a plurality of value range intervals are preset in the device, each value range interval represents a continuous segment of data, and the data of the value range intervals do not overlap; and divide the M weight channels into N groups according to the determined value range intervals and determine a quantization coefficient corresponding to each group. The calculation module is further configured to: quantize the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of that group of weight channels.
In a possible implementation manner of the second aspect, dividing the M weight channels into N groups according to the determined value range interval includes: dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels; and dividing the L groups of weight channels into N groups according to the size sequence of the value range interval.
In another possible implementation manner of the second aspect, the dividing the L groups of weight channels into N groups according to the size order of the value range intervals includes: determining the first N-1 groups of the L groups as the first N-1 groups of the N groups; and determining the groups other than the first N-1 groups in the L groups as the Nth group of the N groups.
In a third aspect, an apparatus for performing a neural network convolution operation is provided, the apparatus including: a processor (processing circuit) coupled to a memory, configured to read and execute the instructions in the memory to implement the method of the first aspect or any one of its possible implementations. Optionally, the apparatus further comprises the memory. Alternatively, the apparatus may be a chip, a system-on-chip, an integrated circuit, or the like. Alternatively, the apparatus may be integrated in a terminal device or a network device.
In a fourth aspect, a chip is provided, where the chip includes a processing unit and a storage unit, the storage unit is configured to store instructions, and the processing unit is configured to execute the instructions stored in the storage unit, so as to enable the chip to perform the method of any one of the possible implementations of the first aspect.
In a fifth aspect, a computer system is provided, where the computer system comprises a transmission module and a calculation module, and optionally further comprises a grouping module, for supporting the computer system in performing the corresponding functions of the above method.
In a sixth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program comprising instructions for performing the method of any one of the possible implementations of the first aspect.
In a seventh aspect, a computer program product is provided, which comprises instructions for carrying out the method of any one of the possible implementations of the first aspect described above.
Drawings
Fig. 1 is a schematic flowchart of grouping and quantizing a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of grouping and quantizing a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to another embodiment of the present application.
FIG. 3 is a schematic flow chart diagram of a method of convolution operation of a neural network according to an embodiment of the present application.
FIG. 4 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to an embodiment of the present application.
FIG. 5 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to another embodiment of the present application.
FIG. 6 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to another embodiment of the present application.
Fig. 7 is a schematic block diagram of an apparatus for convolution operation of a neural network according to an embodiment of the present application.
Fig. 8 is a schematic block diagram of an apparatus for convolution operation of a neural network according to another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
First, some terms related to the present application are explained.
Feature map: a feature map represents the calculation result of one convolutional layer in a convolutional neural network, and is an intermediate calculation result from the perspective of the whole convolutional neural network.
Quantization: quantization is the process of mapping a set of numbers within an original value range (value range interval) to another, target value range through a mathematical transformation. Methods such as table look-up, shifting and truncating may be employed. A linear transformation is often used, and this transformation is usually implemented as a multiplication.
Inverse quantization: the quantized numbers are transformed back into the original value range by inverting the previous linear transformation (the quantization process). Inverse quantization ensures that when the system computes with quantized data according to a given calculation rule, the result after inverse quantization still stays in a value range very close to that of the result computed with the original, unquantized data under the same rule, so that the loss of precision of the convolutional neural network is small.
Reversibility: quantization and inverse quantization are required to be mutually inverse transformations, i.e. quantized data, after inverse quantization, remain approximately equal to the original data.
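The following is an illustrative Python sketch of the quantization and inverse quantization described above (assuming NumPy, a power-of-two quantization coefficient and an 8-bit fixed-point range; the function names and values are illustrative, not taken from this application):

import numpy as np

def quantize(x, scale):
    # Linear quantization: multiply by the quantization coefficient (amplification
    # multiplier), round, and clip to the assumed 8-bit fixed-point range.
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Inverse quantization: divide by the same coefficient to return to
    # (approximately) the original value range.
    return q.astype(np.float32) / scale

w = np.array([0.31, -1.7, 0.05, 1.2], dtype=np.float32)
scale = 2 ** 5                       # assumed quantization coefficient
q = quantize(w, scale)
print(dequantize(q, scale))          # approximately equal to w, illustrating reversibility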
Reversibility of the quantized calculation: after the data are quantized, each layer of data carries an amplification multiplier, and the multiply-accumulate output of the convolution must have this same amplification multiplier (quantization parameter) removed, so that the value range of the whole calculation remains reversible and approximately preserved. This reversible calculation is premised on the convolution, as a multiply-accumulate, being a linear operation.
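The following small Python sketch illustrates why this works: because the multiply-accumulate of a convolution is linear, scaling the weights by the quantization coefficient scales the output by the same factor, so the accumulated output can simply be divided by that coefficient afterwards (illustrative NumPy code; a one-dimensional convolution stands in for the general case):

import numpy as np

x = np.random.rand(6).astype(np.float32)      # input data of one channel
w = np.random.randn(3).astype(np.float32)     # convolution kernel of that channel
scale = 2 ** 4                                # assumed quantization coefficient

ref = np.convolve(x, w, mode="valid")                              # original weights
out = np.convolve(x, np.round(w * scale), mode="valid") / scale    # quantize, convolve, dequantize

print(np.max(np.abs(ref - out)))              # small difference: the value ranges stay close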
In the calculation process of a convolutional neural network, the computing capability of the hardware device (for example, a computer device) is limited, while there is a large amount of input data to be calculated and each convolutional layer has many weight channels, so many intermediate results are generated, which places excessively high requirements on the hardware device. For hardware devices with insufficient performance, excessive input data and generated intermediate data may cause data overflow, resulting in calculation errors.
Based on the above problems, the present application provides a method and an apparatus for performing convolution calculation on a neural network, which can reduce the amount of data input to the convolutional neural network each time, reduce the amount of calculation of convolution calculation and the amount of data of obtained intermediate data, and thus reduce the requirements on hardware devices for convolution calculation, by performing grouping quantization on coefficient matrices of weight channels of the convolutional neural network and receiving corresponding input data in groups.
Fig. 1 is a flowchart illustrating a grouping quantization of a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to an embodiment of the present application. As shown in fig. 1, the method includes:
s110: the method comprises the steps of taking the maximum absolute value of a coefficient in a coefficient matrix of each weight channel of the ith convolutional layer, determining a value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous piece of data, and the data of each value range interval are not overlapped.
Specifically, in the process of grouping and quantizing the coefficient matrices of the weight channels of the ith convolutional layer of the convolutional neural network, the value with the largest absolute value among the coefficients of each coefficient matrix (the maximum absolute value of the coefficient matrix) may be determined first. The value range interval in which the coefficient matrix lies is then determined from this maximum absolute value; that is, the interval in which the maximum absolute value falls is taken as the interval of that matrix. For example, assume that the ith convolutional layer has 5 weight channels (coefficient matrices), corresponding to 5 different coefficient matrices. The maximum absolute value of each of the 5 coefficient matrices can be calculated: first compute the absolute value of every element in the matrix, and then take the largest of these absolute values as the maximum absolute value of the matrix. Suppose the maximum absolute values of the 5 coefficient matrices are: 5 for the first, 7 for the second, 6 for the third, 9 for the fourth, and 8 for the fifth. The value range interval in which each maximum absolute value falls is then determined. A plurality of value range intervals are preset in the computer device; each interval represents a continuous segment of data, and the intervals do not overlap. A value range interval can be understood as a range of values; for example, the intervals may be divided with a granularity of powers of 2. Suppose that: 2^0 <= maximum absolute value <= 2^1 is the first value range interval, 2^1 < maximum absolute value <= 2^2 is the second, 2^2 < maximum absolute value <= 2^3 is the third, 2^3 < maximum absolute value <= 2^4 is the fourth, and so on. The value range intervals may be divided in advance. According to their maximum absolute values, the intervals of the 5 coefficient matrices are determined: in this example, the first, second, third and fifth coefficient matrices fall in the third value range interval, and the fourth coefficient matrix falls in the fourth value range interval.
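The following Python sketch illustrates this step for the example above (assuming NumPy and power-of-two value range intervals with granularity 2^1; the kernels are made-up values chosen only so that their maximum absolute values match the example):

import numpy as np

def interval_index(max_abs):
    # Index k of the power-of-two value range interval 2^(k-1) < max_abs <= 2^k.
    return int(np.ceil(np.log2(max_abs)))

# 5 coefficient matrices (weight channels) whose maximum absolute values are
# 5, 7, 6, 9 and 8, as in the example above.
kernels = [np.array([[5.0, -1.0], [0.5, 2.0]]),
           np.array([[-7.0, 3.0], [1.0, 0.25]]),
           np.array([[6.0, -2.0], [0.5, 1.0]]),
           np.array([[9.0, 4.0], [-1.0, 0.5]]),
           np.array([[-8.0, 2.0], [3.0, 1.0]])]

max_abs = [np.max(np.abs(k)) for k in kernels]    # S110: maximum absolute value per channel
indices = [interval_index(m) for m in max_abs]
print(max_abs)    # [5.0, 7.0, 6.0, 9.0, 8.0]
print(indices)    # [3, 3, 3, 4, 3]: channels 1, 2, 3 and 5 in the third interval, channel 4 in the fourth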
S120: and dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group.
Specifically, in S120, the M weight channels are divided into N weight channel groups according to the value domain interval in which the maximum absolute value of each coefficient matrix is located, and the quantization coefficient corresponding to each group is determined. The quantization coefficients corresponding to each group of the N groups are different, and N is a positive integer less than or equal to M.
As an example, in a specific implementation process of step S120, all coefficient matrices may share a quantization coefficient before grouping the coefficient matrices. After all coefficient matrixes are sorted and grouped, each weight channel group (coefficient matrix group) shares one quantized coefficient, namely, each group corresponds to one quantized coefficient, and the quantized coefficients corresponding to each group can be different. Here, the quantization coefficient (quantization parameter) is used to quantize the weight, and corresponds to an amplification multiplier. The quantized coefficients may be calculated during training of the convolutional neural network.
Optionally, the quantization coefficient corresponding to each weight channel group may be determined as follows. The quantization coefficient corresponding to a weight channel group, multiplied by the maximum absolute value over all the coefficient matrices included in that group (the largest of the maximum absolute values of all the coefficient matrices in the group), should fall within the fixed-point representation range required by the convolutional neural network. Taking 8-bit fixed-point quantization as an example, the product of the quantization coefficient and the largest of the maximum absolute values of all coefficient matrices in the weight channel group must lie within the 8-bit fixed-point representation range (-128 to 127); for example, if the maximum absolute value of the coefficient matrices in a certain weight channel group is 2, the quantization coefficient corresponding to that group is 2^5. The quantization coefficient corresponding to each weight channel group can be calculated in this way.
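An illustrative Python sketch of this rule (assuming NumPy and power-of-two quantization coefficients as in the examples of this application; the function name is illustrative):

import numpy as np

def group_quant_coeff(group_max_abs, bits=8):
    # Largest power-of-two quantization coefficient whose product with the
    # group's maximum absolute value still fits in the signed fixed-point
    # range of the given bit width (-128 to 127 for 8 bits).
    limit = 2 ** (bits - 1) - 1                  # 127 for 8-bit fixed point
    return 2 ** int(np.floor(np.log2(limit / group_max_abs)))

print(group_quant_coeff(2.0))    # 32, i.e. 2^5, as in the example above
print(group_quant_coeff(0.5))    # 128, i.e. 2^7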
S130: and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
In S130, the coefficient matrix of each channel of each set of weight channels is quantized according to the quantization coefficients of each set of weight channels. That is, according to the quantization coefficient of each coefficient matrix group in the N coefficient matrix groups, the coefficient matrices included in the N coefficient matrix groups are quantized respectively. And performing convolution calculation by using the grouped and quantized data to obtain a convolution calculation result of the ith convolution layer. It should be understood that, when the coefficient matrixes included in each of the N coefficient matrix groups are quantized according to the quantization coefficients of the coefficient matrix group, the quantization may be performed by using any feasible quantization formula or quantization method. The embodiments of the present application are not limited thereto.
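An illustrative Python sketch of S130 (assuming NumPy and a simple round-and-clip quantization; as noted above, any feasible quantization formula may be used):

import numpy as np

def quantize_group(kernels, quant_coeff):
    # Quantize every coefficient matrix in a weight channel group with the
    # group's shared quantization coefficient.
    return [np.clip(np.round(k * quant_coeff), -128, 127).astype(np.int8)
            for k in kernels]

group = [np.array([[0.6, -1.3], [0.2, 1.9]]),    # coefficient matrices of one group
         np.array([[1.1, 0.4], [-0.8, 1.5]])]
q_group = quantize_group(group, 2 ** 5)          # group quantization coefficient 2^5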
The above steps S110, S120, and S130 will be described below with specific examples.
Assume that the ith convolutional layer has 8 weight channels, corresponding to 8 coefficient matrices, named coefficient matrices 1 to 8. Each coefficient matrix includes 6 elements. Table 1 divides the 8 weight channels (coefficient matrices) into 4 (N equal to 4) weight channel groups (coefficient matrix groups) according to the value range interval in which the maximum absolute value of each coefficient matrix lies. In Table 1, the value range intervals are divided in binary with a granularity of 2^1, and A~B denotes the range of one value range interval, i.e. the data greater than or equal to A and less than B. Before the weight channels are grouped, the 8 coefficient matrices share a quantization coefficient of 2^4. The numbers in the table indicate how many elements of a coefficient matrix have their absolute values in the given interval (in practice, only the interval in which the maximum absolute value of the coefficient matrix lies is of interest).
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
The numbers in Table 1 represent the number of points (elements) of each coefficient matrix that lie in the different value range intervals. Take coefficient matrix 1 as an example: it has 6 elements, of which 1 element has an absolute value in the interval 2^2~2^3 (this is the element with the largest absolute value in the matrix), 1 element has an absolute value in the interval 2^1~2^2, 2 elements have absolute values in the interval 2^0~2^1, and 2 elements have absolute values in the interval 2^-1~2^0; the value range interval of coefficient matrix 1 is therefore 2^2~2^3. The numbers corresponding to the other coefficient matrices have a similar meaning. Before grouping, these 8 coefficient matrices share the quantization coefficient 2^4. After the value range interval in which the maximum absolute value of each coefficient matrix lies has been determined, the M weight channels are divided into N groups according to these intervals and the quantization coefficient corresponding to each group is determined. As shown in Table 1, coefficient matrices 1, 7, 3 and 6 are divided into one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 2, 5, 4 and 8 are divided into one group with a corresponding quantization coefficient of 2^5. The quantization coefficient of each group, multiplied by the maximum absolute value over all coefficient matrices included in that group, should fall within the fixed-point representation range required by the convolutional neural network. Taking 8-bit fixed-point quantization as an example, the product of the quantization coefficient and the largest of the maximum absolute values of all coefficient matrices in the group must lie within the 8-bit fixed-point representation range (-128 to 127); for example, if the maximum absolute value of the coefficient matrices in a certain group is 2, the quantization coefficient corresponding to that group is 2^5. The quantization coefficient of each coefficient matrix group can be calculated in this way, and the quantization coefficients of different groups may therefore differ.
It should be understood that table 1 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, and a value range interval in which the maximum absolute value of the coefficient matrix is located may also be other value range intervals. The values of M and N may also be other values, etc. The embodiments of the present application are not limited thereto.
Fig. 2 is a schematic flow chart of grouping and quantizing the coefficient matrices of the weight channels of the ith layer of a convolutional neural network according to another embodiment of the present application. In this embodiment, the maximum absolute value of the coefficients in the coefficient matrix of each weight channel is determined, the value range interval in which this maximum lies is determined, the M weight channels are divided into L weight channel groups according to these intervals, the L weight channel groups are then divided a second time into N weight channel groups, the quantization coefficient corresponding to each group is determined, and the coefficient matrices included in each weight channel group are quantized according to that quantization coefficient. As shown in fig. 2, the flow includes the following steps.
S210: and taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer, determining a value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped.
S220: and dividing the determined weight channels with the same value range interval into a group to obtain L groups of weight channels.
Step S210 and step S220 will be described below with reference to specific examples.
Assume that the ith convolutional layer has 8 weight channels, corresponding to 8 coefficient matrices, named coefficient matrices 1 to 8. Each coefficient matrix includes 6 elements. Table 2 divides the 8 weight channels (coefficient matrices) into 4 (L equal to 4) weight channel groups (coefficient matrix groups) according to the value range interval in which the maximum absolute value of each coefficient matrix lies. In Table 2, the value range intervals are divided in binary with a granularity of 2^1, and A~B denotes the range of one value range interval, i.e. the data greater than or equal to A and less than B. Before the weight channels are grouped, the 8 coefficient matrices share a quantization coefficient of 2^4. The numbers in the table indicate how many elements of a coefficient matrix have their absolute values in the given value range interval.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
The numbers in Table 2 represent the number of points (elements) of each coefficient matrix that lie in the different value range intervals. Take coefficient matrix 1 as an example: it has 6 elements, of which 1 element has an absolute value in the interval 2^2~2^3 (this is the element with the largest absolute value in the matrix), 1 element has an absolute value in the interval 2^1~2^2, 2 elements have absolute values in the interval 2^0~2^1, and 2 elements have absolute values in the interval 2^-1~2^0; the value range interval of coefficient matrix 1 is therefore 2^2~2^3. The numbers corresponding to the other coefficient matrices have a similar meaning. Before grouping, these 8 coefficient matrices share the quantization coefficient 2^4. After the value range interval in which the maximum absolute value of each coefficient matrix lies has been determined, the M weight channels are divided into L groups according to these intervals and the quantization coefficient corresponding to each group is determined. Coefficient matrices with the same value range interval can be placed in one group, that is, the coefficient matrices are grouped according to whether the value range intervals in which their maximum absolute values lie coincide. Optionally, the coefficient matrices may be grouped in descending order of the value range interval in which their maximum absolute values lie. As shown in Table 2, coefficient matrices with the same value range interval are placed in one group: coefficient matrices 1 and 7 form one group with a corresponding quantization coefficient of 2^4; coefficient matrices 3, 6 and 2 form one group with a corresponding quantization coefficient of 2^5; coefficient matrix 5 forms a group of its own with a corresponding quantization coefficient of 2^6; and coefficient matrices 4 and 8 form one group with a corresponding quantization coefficient of 2^7. The quantization coefficient of each coefficient matrix group can be calculated by the method described above, and the quantization coefficients of different groups may therefore differ.
It should be understood that table 2 is only exemplary and should not impose any limitation on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, the value range interval in which the maximum absolute value of the coefficient matrix is located may also be other value range intervals, the values of M and L may also be other values, and so on. The embodiments of the present application are not limited thereto.
It should also be understood that, in the embodiment of the present application, in addition to determining coefficient matrices having the same value range interval as the same group, grouping may be performed by using other methods according to the value range interval. For example, the coefficient matrices having the maximum absolute value of the coefficient matrices within a preset range may be divided into a group, sharing the quantized coefficients. Alternatively, the coefficient matrices in which the difference value of the value range section in which the maximum absolute value is located is within a preset range in the coefficient matrices may be divided into one group, and quantized coefficients may be shared. The embodiments of the present application are not limited thereto.
S230: dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval, and determining a quantization coefficient corresponding to each group;
in S230, the L groups are divided into N groups, where each group in the N groups corresponds to a different second quantized coefficient, and N is a positive integer less than or equal to L. That is, in S230, after dividing the coefficient matrix of the i-th convolutional layer into L coefficient matrix groups, the second grouping is performed, where the L groups are divided into N groups, each of the N groups corresponds to one quantized coefficient, and the quantized coefficients corresponding to each group may be different. N is a positive integer less than or equal to L.
Because the precision of the convolutional model after load balancing, grouping and quantization needs to be considered, the coefficient matrices with larger absolute values must be preserved as far as possible, since they have a larger influence on the precision of the convolution result and on the quantization. Preserving the coefficient matrices with larger absolute values must also be reconciled with the requirement that the product of each group's quantization coefficient and the coefficient matrices included in that group fall within the fixed-point representation range required by the convolutional neural network. Therefore, to preserve the coefficient matrices with larger absolute values, the quantization coefficient of the group containing them should be the minimum of the quantization coefficients of the coefficient matrices included in that group; otherwise the product would exceed the fixed-point representation range required by the convolutional neural network.
For example, assume that the first of the L groups includes coefficient matrices 1 and 2 with a corresponding quantization coefficient of 2^5, and the second of the L groups includes coefficient matrix 3 with a corresponding quantization coefficient of 2^6. If the L groups are now regrouped (merged), and the first and second of the L groups are merged into the first of the N groups, the quantization coefficient of the first of the N groups becomes 2^5, i.e. the minimum of the first quantization coefficients corresponding to the coefficient matrices included in that group; the second quantization coefficient of the first of the N groups is therefore 2^5.
Optionally, as an embodiment, the number of channels of the weight channel included in each of the N groups is the same.
Specifically, in the process of dividing the L groups or the M channels into N groups, the value of N may be the load-balancing upper limit value of the ith convolutional layer. Load balancing can be understood as follows: when the matrix addition is finally performed, since the M weight channels have been divided into N groups, the convolution results of the N groups need to be added, and all groups must be calculated before the final result of the convolutional layer is obtained. If the group operations are performed in parallel, that is, the number of channels included in each group is the same or nearly the same, the calculation result can be obtained relatively quickly. Load balancing can therefore be understood as each of the N groups containing the same number of weight channels (coefficient matrices). In this implementation, taking load balancing into account improves the calculation rate and reduces the calculation time.
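An illustrative Python sketch of this load-balanced regrouping (assuming the channels are already sorted in descending order of their value range interval, and that each merged group takes the minimum quantization coefficient of the channels it absorbs, consistent with the rule discussed above; the data follow the Table 3 example with channel indices starting from 0):

import numpy as np

def load_balanced_groups(channel_order, quant_coeffs, n_groups):
    # Split the channels into n_groups with the same number of channels per
    # group; each group's quantization coefficient is the minimum over the
    # channels it contains.
    chunks = np.array_split(np.array(channel_order), n_groups)
    return [{"channels": chunk.tolist(),
             "quant_coeff": min(quant_coeffs[c] for c in chunk)}
            for chunk in chunks]

order = [0, 6, 2, 5, 1, 4, 3, 7]     # channels sorted by descending value range interval
coeffs = {0: 2**4, 6: 2**4, 2: 2**5, 5: 2**5, 1: 2**5, 4: 2**6, 3: 2**7, 7: 2**7}
print(load_balanced_groups(order, coeffs, 2))
# [{'channels': [0, 6, 2, 5], 'quant_coeff': 16}, {'channels': [1, 4, 3, 7], 'quant_coeff': 32}]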
Combining this with the example shown in Table 2: in the case where load balancing is considered, Table 3 shows the process of regrouping, with load balancing taken into account, the coefficient matrices that were grouped in Table 2. Assume that the load-balancing upper limit is 2 and that the value of N is this load-balancing upper limit, i.e. 2.
TABLE 3
(Table 3 is reproduced as an image in the original publication.)
Similar to Table 2, the value range intervals are divided in binary with a granularity of 2^1. According to the load-balancing and interval-proximity principles, coefficient matrices 1, 7, 3 and 6 are divided into one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 2, 5, 4 and 8 are divided into one group with a corresponding quantization coefficient of 2^5. It can be seen that after the L groups are merged into N groups according to load balancing, each group contains the same number of coefficient matrices.
It should be understood that table 3 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, a value range interval in which the maximum absolute value of the coefficient matrix is located may also be another value range interval, the values of L and N may also be other values, each group of corresponding quantized coefficients may also be other values, the number of coefficient matrices included in each group may also be other values, and the like. The embodiments of the present application are not limited thereto.
As another implementation manner, in step S230 the L weight channel groups may also be divided into N groups through the following steps.
Step 1: determining the first N-1 groups in the L groups as the first N-1 groups of the N groups according to the sequence of the maximum upper bound of the value range interval in which the L groups are positioned from large to small;
step 2: the groups of the L groups except the first N-1 groups are determined as the last group of the N groups.
Specifically, after the coefficient matrices have been divided into L groups, if load balancing of the accelerator in the convolutional neural network system does not need to be considered, the L groups are divided (merged) a second time according to the quantization grouping upper limit value. Therefore, in dividing the L groups into N groups, the value of N may be determined according to the quantization grouping upper limit; that is, the value of N may differ from the quantization grouping upper limit. Optionally, the value of N is the quantization grouping upper limit value. The quantization coefficients of the N groups are different from one another. After the coefficient matrices have been divided into L groups, since the value range intervals of the groups differ, the groups are sorted by the value range interval in which the maximum absolute values of their coefficient matrices lie, that is, the L groups are arranged in order from the largest value range interval to the smallest. The first N-1 of the L groups are then determined as the first N-1 of the N groups, and the groups of the L groups other than the first N-1 groups are determined as the last group of the N groups.
For example, assuming that L is equal to 4, the values are sorted from the largest to the smallest in the order of the value range interval, namely, the first group (labeled as L1), the second group (labeled as L2), the third group (labeled as L3), and the fourth group (labeled as L4). Assume that N is equal to 3 (labeled N1, N2, and N3, respectively). The first and second of the L groups are determined to be the first two of the N groups (N1 and N2), respectively, i.e., L1 is actually N1 and L2 is actually N2. The quantized coefficient of N1 is the same as the quantized coefficient corresponding to L1, and the quantized coefficient of N2 is the same as the quantized coefficient corresponding to L2. The groups of the L groups except the first N-1 groups are determined as the last group of the N groups. The groups of the L groups other than the first 2 groups were L3 and L4, and L3 and L4 were combined into the last group of the N groups, i.e., L3 plus L4 corresponded to N3. Each of the N groups corresponds to a quantized coefficient. And the quantized coefficients corresponding to each of the N groups are different.
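An illustrative Python sketch of this merging step (each group is represented as a dictionary holding its channel indices and quantization coefficient, which is an assumed representation; the group layout follows the example above, with L equal to 4 and N equal to 3):

def merge_groups(l_groups, n):
    # Keep the first n-1 of the L groups (already sorted in descending order
    # of value range interval) and merge all remaining groups into the nth
    # group, whose quantization coefficient is the minimum over the merged groups.
    kept = l_groups[:n - 1]
    rest = l_groups[n - 1:]
    last = {"channels": [c for g in rest for c in g["channels"]],
            "quant_coeff": min(g["quant_coeff"] for g in rest)}
    return kept + [last]

l_groups = [{"channels": [0, 6], "quant_coeff": 2**4},      # L1
            {"channels": [2, 5, 1], "quant_coeff": 2**5},   # L2
            {"channels": [4], "quant_coeff": 2**6},         # L3
            {"channels": [3, 7], "quant_coeff": 2**7}]      # L4
print(merge_groups(l_groups, 3))   # L1 -> N1, L2 -> N2, L3 + L4 -> N3 with quantization coefficient 2^6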
It should be understood that the L groups may be divided into N groups using other methods than the above-described method of dividing the L groups into N groups. For example, as in the above example, L1 and L2 may be combined into one of N groups, and L3 and L4 may be determined as the other two of the N groups, respectively. Alternatively, any one or more coefficient matrices in the L groups may be combined to obtain N groups. The embodiments of the present application are not limited thereto.
Because the precision of the convolutional model after grouping and quantization needs to be considered, the coefficient matrices with larger absolute values must be preserved as far as possible, since they have a larger influence on the precision of the convolution result and on the quantization. Preserving the coefficient matrices with larger absolute values must also be reconciled with the requirement that the product of each group's quantization coefficient and the coefficient matrices included in that group fall within the fixed-point representation range required by the convolutional neural network. Therefore, to preserve the coefficient matrices with larger absolute values, the quantization coefficient of the group containing them should be the minimum of the quantization coefficients of the coefficient matrices included in that group; otherwise the product would exceed the fixed-point representation range required by the convolutional neural network.
The process of dividing L groups into N groups in this implementation will be described below with a specific example.
Continuing from the example shown in Table 2, Table 4 shows the process of regrouping, without considering load balancing, the coefficient matrices that were grouped in Table 2. Assume that the quantization grouping upper limit value is 2 and that the value of N is this quantization grouping upper limit, i.e. 2.
TABLE 4
(Table 4 is reproduced as an image in the original publication.)
Similar to Table 3, the value range intervals are divided in binary with a granularity of 2^1. According to the quantization grouping upper limit, and in descending order of the value range intervals of the 4 groups, the first 1 of the 4 groups is determined as the first 1 of the 2 groups, and the remaining three groups are determined as the last of the 2 groups. That is, coefficient matrices 1 and 7 form one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 3, 6, 2, 5, 4 and 8 form one group with a corresponding quantization coefficient of 2^5.
It should be understood that table 4 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more coefficient matrices of the i-th convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, a value range interval in which the maximum absolute value of the coefficient matrix is located may also be another value range interval, the values of L and N may also be other values, each group of corresponding quantized coefficients may also be other values, the number of coefficient matrices included in each group may also be other values, and the like. The embodiments of the present application are not limited thereto.
S240: and quantizing the coefficient matrix of each weight channel included in each weight channel group according to the quantization coefficient of each weight channel group.
In S240, the coefficient matrix of each channel of each set of weight channels is quantized according to the quantization coefficients of each set of weight channels. That is, according to the quantization coefficient of each coefficient matrix group in the N coefficient matrix groups, the coefficient matrices included in the N coefficient matrix groups are quantized respectively. And performing convolution calculation by using the grouped and quantized data to obtain a convolution calculation result of the ith convolution layer. It should be understood that, when the coefficient matrixes included in each of the N coefficient matrix groups are quantized according to the quantization coefficients of the coefficient matrix group, the quantization may be performed by using any feasible quantization formula or quantization method. The embodiments of the present application are not limited thereto.
FIG. 3 is a schematic flow chart diagram of a method of convolution operation of a neural network according to an embodiment of the present application. As shown in fig. 3, the method includes:
s310: and acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolutional layer.
In S310, the data of the data channel corresponding to each weight channel in the nth weight channel group are obtained from the data to be input into the ith convolutional layer. The value of n here may be any positive integer less than or equal to N; in the example shown in Table 1, n may take any value from 1 to 4. The data of the data channel corresponding to each weight channel in the nth weight channel group are obtained from the data to be input into the ith convolutional layer (the input data of the ith convolutional layer). For example, when n is 2, the data of the data channel corresponding to each weight channel in the 2nd weight channel group are obtained from the data to be input into the ith convolutional layer. In other words, the data are input into the ith convolutional layer group by group: each time, only the input data corresponding to one weight channel group are input to the ith convolutional layer.
S320: inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel.
In S320, for example, when n is 2, the obtained input data corresponding to the 2 nd weight channel group is input to the i-th convolutional layer, convolution operation is performed on the input data to obtain a convolution calculation result corresponding to each channel in the 2 nd weight channel group, and the convolution calculation results corresponding to each channel in the 2 nd weight channel group are summed.
S330: and performing inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group.
In S330, since the coefficient matrix of the 2 nd weight channel group is quantized in the process of calculating the convolution calculation result corresponding to the 2 nd weight channel group, the quantization coefficient used for quantization is the quantization coefficient corresponding to each coefficient matrix in the 2 nd weight channel group. In order to ensure that the value range of the quantized calculation result is consistent with the value range of the calculation result which is not quantized, the reversibility of quantization is kept. The convolution calculation needs to be dequantized. Therefore, in S330, the summation result of convolution calculation corresponding to the 2 nd weight channel group needs to be subjected to inverse quantization calculation according to the quantization coefficient corresponding to each channel in the 2 nd weight channel group, so as to obtain an inverse quantization calculation result corresponding to the 2 nd weight channel group.
S340: and adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain the calculation result corresponding to the nth weight channel group.
For step S340, the result of the inverse quantization calculation is added to the calculation result of the data channels corresponding to the weight channels in the (n-1)th weight channel group (the inverse-quantized calculation result corresponding to the (n-1)th weight channel group). It should be noted that when n is 1 there is no previous weight channel group, so this operation is not required.
S350: and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
For step S350, if n is less than N, the convolution calculation continues with the next weight channel group; that is, steps S310 to S340 are repeated in turn for each weight channel group until n reaches N. In S350, the data of the data channels corresponding to the weight channels in the (n+1)th weight channel group are processed in the same way as the data corresponding to the weight channels in the nth weight channel group, until n+1 equals N, that is, until all the data to be input into the ith convolutional layer have been calculated, at which point the final calculation result of the ith convolutional layer is obtained.
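An illustrative end-to-end Python sketch of steps S310 to S350 (assuming NumPy, 1x1 kernels so that the convolution reduces to a per-pixel multiply-accumulate, and a two-group layout as in Table 3; all names and values are illustrative):

import numpy as np

def conv_layer_grouped(x, groups, height=8, width=8):
    # x: input feature map of shape (channels, height, width).
    # groups: each entry holds the channel indices, the quantized 1x1 weights
    # and the shared quantization coefficient of one weight channel group.
    acc = np.zeros((height, width), dtype=np.float32)
    for group in groups:                                   # S350: iterate over the N groups
        data = x[group["channels"]]                        # S310: fetch only this group's data channels
        summed = np.sum(data * group["q_weights"][:, None, None], axis=0)   # S320: convolve and sum
        dequant = summed / group["quant_coeff"]            # S330: inverse quantization
        acc = acc + dequant                                # S340: add to the previous groups' result
    return acc                                             # final calculation result of the ith layer

x = np.random.rand(8, 8, 8).astype(np.float32)             # 8 data channels
groups = [
    {"channels": [0, 6, 2, 5], "q_weights": np.array([16.0, -8.0, 24.0, 4.0]), "quant_coeff": 2**4},
    {"channels": [1, 4, 3, 7], "q_weights": np.array([40.0, -20.0, 12.0, 6.0]), "quant_coeff": 2**5},
]
print(conv_layer_grouped(x, groups).shape)                 # (8, 8)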
Fig. 4 is a schematic flow chart of convolution calculation performed in groups on the quantized weight channels (coefficient matrices) according to an embodiment of the present application. As shown in Fig. 4, for the ith convolutional layer, the coefficient matrices of the ith convolutional layer are quantized in groups; Fig. 4 shows this process of group quantization of the weight coefficient matrices. It should be understood that the input data of the ith convolutional layer also need to be grouped, and optionally the grouped input data may also be quantized. It should also be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The input data are grouped to obtain grouped input data, and the weights are grouped and quantized to obtain group-quantized weights. It is assumed that the weight coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, for each group, convolution calculation (which may be performed in a matrix multiplier) is carried out on the grouped input data and the group-quantized weights, so as to obtain the convolution calculation results of the three groups. Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, each group's convolution calculation result needs to be subjected to weight-group inverse quantization. After the weight-group inverse quantization, since the data must meet the fixed-point (data range) requirement of the convolutional neural network, feature map quantization (which may be performed in a feature map quantizer) is applied to the result of the weight-group inverse quantization. The results obtained by the weight-group inverse quantization of the three groups are respectively subjected to feature map quantization, so as to obtain three feature map quantization results. The three feature map quantization results are then added (which may be performed in a matrix adder) to obtain the addition result (the convolution calculation result of the ith convolutional layer). Finally, the addition result is added to the offset value of the ith convolutional layer (which may be performed in an adder), so as to obtain the final result of the ith convolutional layer (i.e., the feature map data of the ith convolutional layer).
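As an illustration, the Fig. 4 flow (bias not group quantized) might be sketched as below, reusing the conv2d_sum helper from the earlier sketch. The quantize_fm helper stands for a hypothetical feature map quantizer that maps values back into the fixed-point range, and all names are illustrative rather than taken from the patent; the defining feature of this variant is that the offset is added only once, after all groups have been accumulated.

```python
import numpy as np

def quantize_fm(y, fm_scale):
    """Hypothetical feature map quantizer: map values into the fixed-point range."""
    return np.round(y / fm_scale)

def conv_layer_fig4(grouped_inputs, grouped_quant_weights, weight_scales,
                    fm_scale, layer_bias):
    group_results = []
    for x, w_q, s in zip(grouped_inputs, grouped_quant_weights, weight_scales):
        y = conv2d_sum(x, w_q)        # matrix multiplier: convolution of one group
        y = y * s                     # weight-group inverse quantization
        y = quantize_fm(y, fm_scale)  # feature map quantization (fixed-point range)
        group_results.append(y)
    added = sum(group_results)        # matrix adder: add the group results
    return added + layer_bias         # adder: finally add the i-th layer offset
```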
Optionally, the offset of the ith convolutional layer may also be quantized, then the quantized offset is corrected (which may be performed in a quantization corrector), and the corrected offset is added to the addition result of the groups to obtain the convolution calculation result of the ith convolutional layer (i.e., the feature map data of the ith convolutional layer).
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map quantization is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the feature-map-quantized feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, convolution calculation is performed on the grouped pictures and the group-quantized weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 4 apply only to the 1st convolutional layer, where the original input picture is group quantized; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should be understood that the processing flow shown in Fig. 4 is mainly directed to the case in which the bias is not group quantized. Fig. 4 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
Fig. 5 is a schematic flow chart of convolution calculation performed on group-quantized weight channels according to another embodiment of the present application. As shown in Fig. 5, for the ith convolutional layer, the weight coefficient matrices and the offset coefficients of the ith convolutional layer are quantized in groups, and the input data of the ith convolutional layer are grouped. It should be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The offset and the weights are respectively group quantized to obtain the group-quantized offset and weights. It is assumed that the weight coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, the input data of each group are convolved with the group-quantized weights (which may be calculated in a matrix multiplier), so as to obtain three convolution calculation results (corresponding to the three groups). Next, the convolution calculation result of each group is added to the group-quantized offset of that group, so as to obtain the intermediate calculation result of each group (three intermediate calculation results in total, corresponding to the three groups). Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, the intermediate calculation result of each group needs to be subjected to weight-group inverse quantization. After the weight-group inverse quantization, since the data must meet the fixed-point (data range) requirement of the convolutional neural network, feature map quantization (which may be performed in a feature map quantizer) is applied to the result obtained by the weight-group inverse quantization of each group. The results obtained by the weight-group inverse quantization of the three groups are respectively subjected to feature map quantization, so as to obtain three feature map quantization results. Finally, the three feature map quantization results are added (which corresponds to adding the calculation results obtained by all the groups), so as to obtain the convolution calculation result of the ith convolutional layer (the feature map data of the ith convolutional layer). Optionally, the group-quantized offset of the ith convolutional layer may be corrected (which may be performed in a quantization corrector), and the calculation may be performed using the corrected offset.
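Compared with the Fig. 4 sketch, only the body of the per-group loop changes in the Fig. 5 variant: the group-quantized offset is added before the weight-group inverse quantization, and no layer-level offset is added at the end. A hedged sketch follows, reusing the hypothetical conv2d_sum and quantize_fm helpers from the earlier sketches:

```python
def conv_layer_fig5(grouped_inputs, grouped_quant_weights, grouped_quant_bias,
                    weight_scales, fm_scale):
    group_results = []
    for x, w_q, b_q, s in zip(grouped_inputs, grouped_quant_weights,
                              grouped_quant_bias, weight_scales):
        y = conv2d_sum(x, w_q) + b_q  # add the group-quantized offset first
        y = y * s                     # then weight-group inverse quantization
        y = quantize_fm(y, fm_scale)  # feature map quantization
        group_results.append(y)
    return sum(group_results)         # accumulate all groups last
```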
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map quantization is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the feature-map-quantized feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the group-quantized offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, convolution calculation is performed on the grouped pictures and the group-quantized weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 5 apply only to the 1st convolutional layer, where the original input picture is grouped; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should be understood that the processing flow shown in Fig. 5 is mainly directed to the case in which the bias also requires group quantization. Unlike the processing flow shown in Fig. 4, in the flow shown in Fig. 5 the addition of each group's convolution calculation result and the corresponding group offset is performed before the convolution calculation results of all groups are accumulated, and the addition of the convolution calculation results of all groups is performed last. In the processing flow shown in Fig. 4, by contrast, the convolution calculation results obtained by all the groups are first added together, and the sum is then added to the offset of the ith convolutional layer to obtain the convolution calculation result of the ith convolutional layer.
It should also be understood that Fig. 5 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
Fig. 6 is a schematic flow chart of convolution calculation performed on group-quantized weight channels according to another embodiment of the present application. As shown in Fig. 6, for the ith convolutional layer, the coefficient matrices of the ith convolutional layer are quantized in groups: the weight coefficient matrices and the offset coefficients are quantized, and the input data of the ith convolutional layer are grouped. It should be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The offset and the weights are respectively group quantized to obtain the group-quantized offset and weights. It is assumed that the coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, for each group, convolution calculation is performed on the group-quantized input data and the group-quantized weights, so as to obtain the group convolution results of the three groups. Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, the convolution result of each group needs to be subjected to weight-group inverse quantization, so as to obtain the weight-group dequantized result of each group. Then, the dequantized result (intermediate result) of each group is added to the quantized offset of the corresponding group, so as to obtain the convolution calculation result of each group (three convolution calculation results in total, corresponding to the three groups). Feature map quantization (which may be performed in a feature map quantizer) then needs to be applied to each group's convolution calculation result; the three groups' convolution calculation results are respectively subjected to feature map group quantization, so as to obtain three feature map quantization results. Finally, the three feature map quantization results are added (which corresponds to adding the convolution calculation results obtained by all the groups), so as to obtain the convolution calculation result of the ith convolutional layer (the feature map data of the ith convolutional layer). Optionally, the group-quantized offset of the ith convolutional layer may be corrected (which may be performed in a quantization corrector), and the calculation may be performed using the corrected offset.
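The Fig. 6 variant again changes only the ordering inside the loop: the weight-group inverse quantization is performed first, the (corrected) group offset is added to that intermediate result, and only then is the feature map quantization applied. A sketch under the same assumptions and with the same hypothetical helpers as before:

```python
def conv_layer_fig6(grouped_inputs, grouped_quant_weights, grouped_bias,
                    weight_scales, fm_scale):
    group_results = []
    for x, w_q, b, s in zip(grouped_inputs, grouped_quant_weights,
                            grouped_bias, weight_scales):
        y = conv2d_sum(x, w_q)        # grouped convolution
        y = y * s                     # weight-group inverse quantization first
        y = y + b                     # then add the corresponding group offset
        y = quantize_fm(y, fm_scale)  # feature map (group) quantization
        group_results.append(y)
    return sum(group_results)         # add the three feature map quantization results
```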
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map grouping is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the grouped feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the group-quantized offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, the grouped pictures are matrix-multiplied with the grouped weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 6 apply only to the 1st convolutional layer, where the original input picture is grouped; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should also be understood that Fig. 6 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
It should also be understood that, in the embodiments of the present application, processing flows other than the group quantization convolution calculation flows described above are also possible; for example, the weight-group inverse quantization may be placed after the feature map group quantization. The embodiments of the present application are not limited thereto.
It should also be understood that the above description is only intended to help those skilled in the art better understand the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application. Various equivalent modifications or changes will be apparent to those skilled in the art in light of the above examples; for example, some of the steps described in the method 100 and Figs. 4 to 6 above may be unnecessary, new steps may be added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations also fall within the scope of the embodiments of the present application.
It should also be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar elements that are not mentioned may be referred to one another and, for brevity, are not repeated herein.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic thereof, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The method of convolution operation of the neural network according to the embodiment of the present application is described in detail above with reference to fig. 1 to 6. Hereinafter, the neural network convolution operation device according to the present application will be described in detail with reference to fig. 7 to 8.
Fig. 7 shows a schematic block diagram of an apparatus 400 for convolution operation of a neural network according to an embodiment of the present application, where the ith convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to the coefficient matrix of each weight channel, the coefficient matrices of the weight channels in each weight channel group have the same quantization coefficient, M is greater than or equal to 2, N is less than or equal to M, and the ith convolutional layer is any convolutional layer of the neural network. The modules or units in the apparatus 400 are respectively configured to perform the actions or processes in the methods 100 to 300. As shown in Fig. 7, the apparatus 400 may include a transmission module 410, a calculation module 420, and a grouping module 430, which are communicatively connected to one another.
A transmission module 410, configured to obtain data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolutional layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
a calculating module 420, configured to perform inverse quantization calculation on the summation result according to a quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
The grouping module 430 is configured to:
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval into which the maximum absolute value of the coefficient matrix of each weight channel falls, wherein a plurality of value range intervals are preset in the computer device, each value range interval represents a continuous segment of data, and the value range intervals do not overlap one another;
dividing the M weight channels into N groups according to the determined value range intervals, and determining a quantization coefficient corresponding to each group.
The calculation module 420 is further configured to: quantize the coefficient matrix of each weight channel in each weight channel group according to the quantization coefficient of that group.
Optionally, as an embodiment, the grouping module 430 is further configured to:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value range interval.
Optionally, as an embodiment, the grouping module 430 is specifically configured to:
the first N-1 groups of the L groups are determined as the first N-1 groups of the N groups;
and determining the groups except the first N-1 groups in the L groups as the Nth group of the N groups.
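A minimal sketch of the grouping logic described above is given below; interval_edges is a hypothetical ascending array of preset value range boundaries, and the determination of the quantization coefficient for each group is omitted, since the description does not tie it to a particular formula here.

```python
import numpy as np

def group_weight_channels(weight_matrices, interval_edges, N):
    """Divide M weight channels into N groups by the value range interval into which
    the maximum absolute coefficient of each channel's matrix falls (a sketch)."""
    # Maximum absolute value of the coefficients of each weight channel's matrix
    max_abs = [np.max(np.abs(w)) for w in weight_matrices]
    # Index of the value range interval that each maximum falls into
    interval_idx = [int(np.searchsorted(interval_edges, m)) for m in max_abs]
    # Channels whose maxima fall into the same interval form one of L preliminary groups
    prelim = {}
    for ch, idx in enumerate(interval_idx):
        prelim.setdefault(idx, []).append(ch)
    # Order the L groups by interval size; keep the first N-1 as-is and merge the rest
    ordered = [prelim[k] for k in sorted(prelim)]
    groups = ordered[:N - 1] + [sum(ordered[N - 1:], [])]
    return groups
```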
The apparatus 400 provided in this embodiment of the present application can implement the method flows described in fig. 1 to fig. 6, specifically, the transmission module 410 can implement steps S310 and S320, the calculation module 420 can implement steps S130, S240, S330, S340, and S350, and the grouping module 430 can implement steps S110, S120, S210, S220, and S230, and for the description of the method that can be implemented by the above modules, reference may be made to the description in the corresponding method flows described above. To avoid repetition, further description is omitted here.
The present application also provides an apparatus for convolution operation of a neural network. As shown in Fig. 8, the apparatus 500 includes a processor 510 and a memory 520 for supporting the apparatus in performing the corresponding functions of the above methods. The processor and the memory are communicatively connected; the memory stores instructions, and the processor is configured to call the instructions to implement the neural network convolution operation methods of the above embodiments.
An embodiment of the present application also provides a computer readable medium for storing computer program code, the computer program including instructions for executing the methods 100 to 300 of the embodiments of the present application described above. The readable medium may be a read-only memory (ROM) or a random access memory (RAM), which is not limited in the embodiments of the present application.
The present application also provides a computer program product comprising instructions that, when executed, cause an apparatus to perform operations corresponding to the above-described methods.
The present application further provides a computer system including a chip or an apparatus for performing the method of neural network convolution calculation according to the embodiments of the present application. The chip or the device may be the convolutional neural network system provided in the present application.
An embodiment of the present application further provides a system chip, where the system chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, a pin or a circuit, etc. The processing unit can execute computer instructions to enable a chip in the communication device to execute any one of the methods for calculating the neural network convolution provided by the embodiments of the present application.
Optionally, the computer instructions are stored in a storage unit.
Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the terminal, such as a ROM or another type of static storage device capable of storing static information and instructions, a RAM, or the like. Any processor mentioned above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for executing programs that control the above method of quantized convolutional neural network computation. The processing unit and the storage unit may be decoupled, respectively disposed on different physical devices, and connected in a wired or wireless manner to implement their respective functions, so as to support the system chip in implementing the various functions of the foregoing embodiments. Alternatively, the processing unit and the storage unit may be coupled on the same device.
It should be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar parts that are not mentioned may be referred to one another, and thus, for brevity, will not be described again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of convolution operation of a neural network, the method being performed by a computer device, an i-th convolutional layer of the neural network including M weight channels, the M weight channels being divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group having the same quantization coefficient, the M being greater than or equal to 2, the N being less than or equal to M, the i-th convolutional layer being any convolutional layer of the neural network, the method comprising:
acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
performing inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating a calculation result corresponding to the (n + 1) th weight channel group until all data to be input into the ith convolution layer are calculated.
2. The method of claim 1, further comprising:
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped;
dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group;
and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
3. The method of claim 2, wherein the dividing the M weight channels into N groups according to the determined value range interval comprises:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval.
4. The method according to claim 3, wherein the dividing the L groups of weight channels into N groups according to the size order of the value range interval comprises:
determining a first N-1 group of the L groups as a first N-1 group of the N groups;
determining a group other than the first N-1 groups among the L groups as an Nth group of the N groups.
5. An apparatus of convolutional operation of a neural network, wherein an i-th convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, M is greater than or equal to 2, and N is less than or equal to M, the i-th convolutional layer is any convolutional layer of the neural network, the apparatus includes:
a transmission module, configured to obtain data of a data channel corresponding to each weight channel in an nth weight channel group from data to be input to the ith convolutional layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
the calculation module is used for carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating a calculation result corresponding to the (n + 1) th weight channel group until all data to be input into the ith convolution layer are calculated.
6. The apparatus of claim 5, further comprising a grouping module to,
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, wherein a plurality of value range intervals are preset in the computer device, each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped;
dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group;
the calculation module is further configured to quantize the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
7. The apparatus of claim 6, wherein the grouping the M weight channels into N groups according to the determined value range interval comprises:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval.
8. The apparatus according to claim 7, wherein the dividing the L groups of weight channels into N groups according to the size order of the value range interval comprises:
determining a first N-1 group of the L groups as a first N-1 group of the N groups;
determining a group other than the first N-1 groups among the L groups as an Nth group of the N groups.
9. An apparatus for neural network convolution operations, the apparatus comprising: a processor and a memory, the memory for storing instructions, the processor for reading and executing the instructions in the memory to perform the method of any of claims 1 to 4.
CN201810898766.4A 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network Pending CN110826685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810898766.4A CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810898766.4A CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Publications (1)

Publication Number Publication Date
CN110826685A true CN110826685A (en) 2020-02-21

Family

ID=69540751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810898766.4A Pending CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Country Status (1)

Country Link
CN (1) CN110826685A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461302A (en) * 2020-03-30 2020-07-28 杭州嘉楠耘智信息科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111723924A (en) * 2020-05-28 2020-09-29 西安交通大学 Deep neural network accelerator based on channel sharing
CN111767980A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111898081A (en) * 2020-07-09 2020-11-06 上海兆芯集成电路有限公司 Convolution operation method and convolution operation device
CN111950713A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Method and device for increasing running speed of channel random mixed operation
WO2021169914A1 (en) * 2020-02-24 2021-09-02 中科寒武纪科技股份有限公司 Data quantification processing method and apparatus, electronic device and storage medium
CN114254740A (en) * 2022-01-18 2022-03-29 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767980B (en) * 2019-04-02 2024-03-05 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111767980A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
WO2021169914A1 (en) * 2020-02-24 2021-09-02 中科寒武纪科技股份有限公司 Data quantification processing method and apparatus, electronic device and storage medium
JP2022532439A (en) * 2020-02-24 2022-07-14 中科寒武紀科技股▲分▼有限公司 Data Quantization Processing Method, Device, Electronic Equipment and Storage Medium This application was submitted to the National Knowledge and Industrial Rights Bureau of China on February 24, 2020, the application number is 201011184.3, and the title of the invention is "Data Quantization". Quantization methods, devices, electronic devices and storage media "claim the priority of the Chinese patent application, the entire contents of which are incorporated herein by reference.
JP7233636B2 (en) 2020-02-24 2023-03-07 中科寒武紀科技股▲分▼有限公司 Data quantization processing method, device, electronic device and storage medium
CN111461302A (en) * 2020-03-30 2020-07-28 杭州嘉楠耘智信息科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111461302B (en) * 2020-03-30 2024-04-19 嘉楠明芯(北京)科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111723924A (en) * 2020-05-28 2020-09-29 西安交通大学 Deep neural network accelerator based on channel sharing
CN111723924B (en) * 2020-05-28 2022-07-12 西安交通大学 Deep neural network accelerator based on channel sharing
CN111898081A (en) * 2020-07-09 2020-11-06 上海兆芯集成电路有限公司 Convolution operation method and convolution operation device
CN111898081B (en) * 2020-07-09 2024-02-27 上海兆芯集成电路股份有限公司 Convolution operation method and convolution operation device
CN111950713A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Method and device for increasing running speed of channel random mixed operation
CN114254740A (en) * 2022-01-18 2022-03-29 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver
CN114254740B (en) * 2022-01-18 2022-09-30 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver

Similar Documents

Publication Publication Date Title
CN110826685A (en) Method and device for convolution calculation of neural network
CN110363279B (en) Image processing method and device based on convolutional neural network model
US20220027717A1 (en) Convolutional Neural Network Hardware Configuration
CN111247527B (en) Method and device for determining characteristic images in convolutional neural network model
CN110109646B (en) Data processing method, data processing device, multiplier-adder and storage medium
CN110598839A (en) Convolutional neural network system and method for quantizing convolutional neural network
CN110929865B (en) Network quantification method, service processing method and related product
CN110008952B (en) Target identification method and device
CN106855952B (en) Neural network-based computing method and device
CN110390075B (en) Matrix preprocessing method, device, terminal and readable storage medium
CN112836806B (en) Data format adjustment method, device, computer equipment and storage medium
CN110647974A (en) Network layer operation method and device in deep neural network
CN111373436A (en) Image processing method, terminal device and storage medium
CN108922478B (en) Backlight brightness adjusting method and system and display device
CN112243119B (en) White balance processing method and device, electronic equipment and storage medium
CN113052290B (en) Neural network generation method, neural network data processing method, neural network generation device, neural network data processing device, electronic equipment and medium
CN111160517A (en) Convolutional layer quantization method and device of deep neural network
CN110378479B (en) Image input method and device based on deep learning and terminal equipment
CN113177634A (en) Image analysis system, method and equipment based on neural network input and output quantification
CN109375952B (en) Method and apparatus for storing data
CN111384971B (en) Method, device and decoder for processing data in finite field
CN113705791B (en) Neural network reasoning quantification method and device, electronic equipment and storage medium
CN113902928A (en) Image feature extraction method and device and electronic equipment
CN112784957A (en) Data processing apparatus, operation method thereof, and program
CN113255576B (en) Face recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination