CN110826685A - Method and device for convolution calculation of neural network - Google Patents

Method and device for convolution calculation of neural network

Info

Publication number
CN110826685A
Authority
CN
China
Prior art keywords
weight
group
groups
data
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810898766.4A
Other languages
Chinese (zh)
Inventor
郭鑫
董晓文
李怀洲
林芃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810898766.4A
Publication of CN110826685A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a method and a device for convolution operation of a neural network. The ith convolutional layer of the neural network comprises M weight channels, and the M weight channels are divided into N weight channel groups. The method comprises the following steps: acquiring, from the data to be input into the ith convolutional layer, the data of the data channel corresponding to each weight channel in the nth weight channel group; inputting the acquired data into the ith convolutional layer, performing a convolution operation on the data of each data channel, and summing the convolution results of the data channels; performing inverse quantization on the summed result according to the quantization coefficient corresponding to each channel in the nth weight channel group; and adding the result of the inverse quantization to the calculation result corresponding to the (n-1)th weight channel group to obtain the calculation result corresponding to the nth weight channel group. The method can reduce the amount of data input to the convolutional neural network at a time and the amount of intermediate data generated, thereby lowering the requirements on hardware devices.

Description

Method and device for convolution calculation of neural network
Technical Field
The present application relates to the field of neural networks, and more particularly, to a method and an apparatus for performing convolution calculation in a neural network.
Background
After training is completed, a deep convolutional neural network contains millions or even tens of millions of parameters, for example the weight parameters and bias parameters included in the convolutional neural network model. At present, during the calculation of a convolutional neural network, all the data to be calculated are input into a convolutional layer, and the calculation then proceeds layer by layer. Because the number of channels in current convolutional neural networks keeps growing, inputting all data channels simultaneously leads to many parameters and a large data volume, so the whole convolution calculation consumes a large amount of storage and computing resources. As deep neural networks develop further, they consume even more memory and computing resources, which makes them difficult to port to a mobile phone or an embedded chip. Even if the calculation results are transmitted to the mobile phone or embedded chip over a network, the resulting high bandwidth occupancy is often a difficult engineering problem.
At present, in the calculation process of a convolutional neural network, the computing capability of the hardware device (for example, a computer device) is limited, while there is a large amount of input data to be calculated and each convolutional layer has many weight channels, so many intermediate results are generated, which places excessively high requirements on the hardware device. For hardware devices with insufficient performance, excessive input data and generated intermediate data may cause data overflow, resulting in calculation errors.
Disclosure of Invention
The application provides a method and a device for convolution operation of a neural network, which can reduce the amount of data input to the convolutional neural network at a time and the amount of intermediate data generated, thereby lowering the requirements on the hardware device that performs the convolution calculation.
In a first aspect, a method of convolutional operation of a neural network is provided, where the method is performed by a computer device, an i-th convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, the M is greater than or equal to 2, the N is less than or equal to M, and the i-th convolutional layer is any convolutional layer of the neural network, and the method includes: acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer; inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel; carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group; adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group; and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
According to the neural network convolution operation method, the coefficient matrixes of the weight channels of the convolution neural network are grouped and quantized, and corresponding input data are received in a grouping mode, so that the data volume input to the convolution neural network each time can be reduced, the calculation amount of convolution calculation and the data volume of obtained intermediate data are reduced, and the requirements on hardware equipment of convolution calculation are lowered.
In a possible implementation manner of the first aspect, the method further includes: taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer; determining a value domain interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value domain intervals in the computer device, wherein each value domain interval represents a continuous segment of data, and the data of each value domain interval are not overlapped; dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group; and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
In a possible implementation manner of the first aspect, the dividing the M weight channels into N groups according to the determined value range interval includes: dividing the weight channels whose determined value range intervals are the same into one group to obtain L groups of weight channels; and dividing the L groups of weight channels into N groups according to the size order of the value range intervals.
in a possible implementation manner of the first aspect, the dividing the L groups of weight channels into N groups according to the size order of the value range interval includes: determining a first N-1 group of the L groups as a first N-1 group of the N groups; and determining the groups except the first N-1 groups in the L groups as the Nth group of the N groups.
In a second aspect, a device for convolutional operation of a neural network is provided, where an ith convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, the M is greater than or equal to 2, the N is less than or equal to M, the ith convolutional layer is any convolutional layer of the neural network, and the device includes a transmission module and a calculation module:
the transmission module is used for: acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
the calculation module is configured to: carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
According to the device provided by the embodiment of the application, the coefficient matrix of the weight channel of the convolutional neural network is subjected to grouping quantization, and corresponding input data is received in a grouping mode, so that the data volume input to the convolutional neural network every time can be reduced, the calculation amount of convolutional calculation and the data volume of obtained intermediate data are reduced, and the requirement on hardware equipment of the convolutional calculation is lowered.
In a possible implementation manner of the second aspect, the apparatus further includes a grouping module, where the grouping module is configured to: take the maximum absolute value of the coefficients in the coefficient matrix of each weight channel of the ith convolutional layer; determine the value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, where a plurality of value range intervals are preset in the device, each value range interval represents a continuous segment of data, and the data of the value range intervals do not overlap; and divide the M weight channels into N groups according to the determined value range intervals and determine a quantization coefficient corresponding to each group. The calculation module is further configured to: quantize the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of that group of weight channels.
In a possible implementation manner of the second aspect, dividing the M weight channels into N groups according to the determined value range interval includes: dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels; and dividing the L groups of weight channels into N groups according to the size sequence of the value range interval.
In another possible implementation manner of the second aspect, the dividing the L groups of weight channels into N groups according to the size order of the value range intervals includes: determining the first N-1 groups of the L groups as the first N-1 groups of the N groups; and determining the groups other than the first N-1 groups in the L groups as the Nth group of the N groups.
In a third aspect, an apparatus for performing a neural network convolution operation is provided, the apparatus including: a processor (processing circuit) coupled to a memory, configured to read and execute the instructions in the memory to implement the method of the first aspect or any one of its possible implementations. Optionally, the apparatus further comprises the memory. Alternatively, the apparatus may be a chip, a system-on-chip, an integrated circuit, or the like. Alternatively, the apparatus may be integrated in a terminal device or a network device.
In a fourth aspect, a chip is provided, where the chip includes a processing unit and a storage unit, the storage unit is configured to store instructions, and the processing unit is configured to execute the instructions stored in the storage unit, so as to enable the chip to perform the method of any one of the possible implementations of the first aspect.
In a fifth aspect, a computer system is provided, where the computer system comprises a transmission module and a calculation module, and optionally further comprises a grouping module, for supporting the computer system in performing the corresponding functions of the above method.
In a sixth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program comprising instructions for performing the method of any one of the possible implementations of the first aspect.
In a seventh aspect, a computer program product is provided, which comprises instructions for carrying out the method of any one of the possible implementations of the first aspect described above.
Drawings
Fig. 1 is a schematic flowchart of grouping and quantizing a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of grouping and quantizing a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to another embodiment of the present application.
FIG. 3 is a schematic flow chart diagram of a method of convolution operation of a neural network according to an embodiment of the present application.
FIG. 4 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to an embodiment of the present application.
FIG. 5 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to another embodiment of the present application.
FIG. 6 is a schematic flow chart diagram illustrating convolution calculation for grouping quantized weight channels according to another embodiment of the present application.
Fig. 7 is a schematic block diagram of an apparatus for convolution operation of a neural network according to an embodiment of the present application.
Fig. 8 is a schematic block diagram of an apparatus for convolution operation of a neural network according to another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
First, some terms related to the present application are explained.
Feature map: a feature map represents the calculation result of one convolutional layer in a convolutional neural network, and is an intermediate calculation result from the perspective of the whole convolutional neural network.
Quantization: quantization is the process of mapping a set of numbers within an original value range (value range interval) to another, target value range through a mathematical transformation. Methods such as table look-up, shifting and truncating may be employed. A linear transformation is often used, and this transformation is usually implemented as a multiplication.
Inverse quantization: the quantized numbers are transformed back into the original value range by inverting the previous linear transformation (the quantization process). Inverse quantization ensures that when the system computes with quantized data according to a given calculation rule, the result after inverse quantization still stays in a value range very close to that of the result computed with the original, unquantized data under the same rule, so that the loss of precision of the convolutional neural network is small.
Reversibility: quantization and inverse quantization are required to be mutually inverse transformations, i.e. quantized data, after inverse quantization, remain approximately equal to the original data.
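The following is an illustrative Python sketch of the quantization and inverse quantization described above (assuming NumPy, a power-of-two quantization coefficient and an 8-bit fixed-point range; the function names and values are illustrative, not taken from this application):

import numpy as np

def quantize(x, scale):
    # Linear quantization: multiply by the quantization coefficient (amplification
    # multiplier), round, and clip to the assumed 8-bit fixed-point range.
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Inverse quantization: divide by the same coefficient to return to
    # (approximately) the original value range.
    return q.astype(np.float32) / scale

w = np.array([0.31, -1.7, 0.05, 1.2], dtype=np.float32)
scale = 2 ** 5                       # assumed quantization coefficient
q = quantize(w, scale)
print(dequantize(q, scale))          # approximately equal to w, illustrating reversibility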
Reversibility of the quantized calculation: after the data are quantized, each layer of data carries an amplification multiplier, and the multiply-accumulate output of the convolution must have this same amplification multiplier (quantization parameter) removed, so that the value range of the whole calculation remains reversible and approximately preserved. This reversible calculation is premised on the convolution, as a multiply-accumulate, being a linear operation.
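The following small Python sketch illustrates why this works: because the multiply-accumulate of a convolution is linear, scaling the weights by the quantization coefficient scales the output by the same factor, so the accumulated output can simply be divided by that coefficient afterwards (illustrative NumPy code; a one-dimensional convolution stands in for the general case):

import numpy as np

x = np.random.rand(6).astype(np.float32)      # input data of one channel
w = np.random.randn(3).astype(np.float32)     # convolution kernel of that channel
scale = 2 ** 4                                # assumed quantization coefficient

ref = np.convolve(x, w, mode="valid")                              # original weights
out = np.convolve(x, np.round(w * scale), mode="valid") / scale    # quantize, convolve, dequantize

print(np.max(np.abs(ref - out)))              # small difference: the value ranges stay close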
In the calculation process of a convolutional neural network, the computing capability of the hardware device (for example, a computer device) is limited, while there is a large amount of input data to be calculated and each convolutional layer has many weight channels, so many intermediate results are generated, which places excessively high requirements on the hardware device. For hardware devices with insufficient performance, excessive input data and generated intermediate data may cause data overflow, resulting in calculation errors.
Based on the above problems, the present application provides a method and an apparatus for performing convolution calculation on a neural network, which can reduce the amount of data input to the convolutional neural network each time, reduce the amount of calculation of convolution calculation and the amount of data of obtained intermediate data, and thus reduce the requirements on hardware devices for convolution calculation, by performing grouping quantization on coefficient matrices of weight channels of the convolutional neural network and receiving corresponding input data in groups.
Fig. 1 is a flowchart illustrating a grouping quantization of a coefficient matrix of a weight channel of an i-th layer of a convolutional neural network according to an embodiment of the present application. As shown in fig. 1, the method includes:
s110: the method comprises the steps of taking the maximum absolute value of a coefficient in a coefficient matrix of each weight channel of the ith convolutional layer, determining a value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous piece of data, and the data of each value range interval are not overlapped.
Specifically, in the process of grouping and quantizing the coefficient matrices of the weight channels of the ith convolutional layer of the convolutional neural network, the value with the largest absolute value among the coefficients of each coefficient matrix (the maximum absolute value of the coefficient matrix) may be determined first. The value range interval in which the coefficient matrix lies is then determined from this maximum absolute value; that is, the interval in which the maximum absolute value falls is taken as the interval of that matrix. For example, assume that the ith convolutional layer has 5 weight channels (coefficient matrices), corresponding to 5 different coefficient matrices. The maximum absolute value of each of the 5 coefficient matrices can be calculated: first compute the absolute value of every element in the matrix, and then take the largest of these absolute values as the maximum absolute value of the matrix. Suppose the maximum absolute values of the 5 coefficient matrices are: 5 for the first, 7 for the second, 6 for the third, 9 for the fourth, and 8 for the fifth. The value range interval in which each maximum absolute value falls is then determined. A plurality of value range intervals are preset in the computer device; each interval represents a continuous segment of data, and the intervals do not overlap. A value range interval can be understood as a range of values; for example, the intervals may be divided with a granularity of powers of 2. Suppose that: 2^0 <= maximum absolute value <= 2^1 is the first value range interval, 2^1 < maximum absolute value <= 2^2 is the second, 2^2 < maximum absolute value <= 2^3 is the third, 2^3 < maximum absolute value <= 2^4 is the fourth, and so on. The value range intervals may be divided in advance. According to their maximum absolute values, the intervals of the 5 coefficient matrices are determined: in this example, the first, second, third and fifth coefficient matrices fall in the third value range interval, and the fourth coefficient matrix falls in the fourth value range interval.
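The following Python sketch illustrates this step for the example above (assuming NumPy and power-of-two value range intervals with granularity 2^1; the kernels are made-up values chosen only so that their maximum absolute values match the example):

import numpy as np

def interval_index(max_abs):
    # Index k of the power-of-two value range interval 2^(k-1) < max_abs <= 2^k.
    return int(np.ceil(np.log2(max_abs)))

# 5 coefficient matrices (weight channels) whose maximum absolute values are
# 5, 7, 6, 9 and 8, as in the example above.
kernels = [np.array([[5.0, -1.0], [0.5, 2.0]]),
           np.array([[-7.0, 3.0], [1.0, 0.25]]),
           np.array([[6.0, -2.0], [0.5, 1.0]]),
           np.array([[9.0, 4.0], [-1.0, 0.5]]),
           np.array([[-8.0, 2.0], [3.0, 1.0]])]

max_abs = [np.max(np.abs(k)) for k in kernels]    # S110: maximum absolute value per channel
indices = [interval_index(m) for m in max_abs]
print(max_abs)    # [5.0, 7.0, 6.0, 9.0, 8.0]
print(indices)    # [3, 3, 3, 4, 3]: channels 1, 2, 3 and 5 in the third interval, channel 4 in the fourth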
S120: and dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group.
Specifically, in S120, the M weight channels are divided into N weight channel groups according to the value domain interval in which the maximum absolute value of each coefficient matrix is located, and the quantization coefficient corresponding to each group is determined. The quantization coefficients corresponding to each group of the N groups are different, and N is a positive integer less than or equal to M.
As an example, in a specific implementation process of step S120, all coefficient matrices may share a quantization coefficient before grouping the coefficient matrices. After all coefficient matrixes are sorted and grouped, each weight channel group (coefficient matrix group) shares one quantized coefficient, namely, each group corresponds to one quantized coefficient, and the quantized coefficients corresponding to each group can be different. Here, the quantization coefficient (quantization parameter) is used to quantize the weight, and corresponds to an amplification multiplier. The quantized coefficients may be calculated during training of the convolutional neural network.
Optionally, the quantization coefficient corresponding to each weight channel group may be determined as follows. The quantization coefficient corresponding to a weight channel group, multiplied by the maximum absolute value over all the coefficient matrices included in that group (the largest of the maximum absolute values of all the coefficient matrices in the group), should fall within the fixed-point representation range required by the convolutional neural network. Taking 8-bit fixed-point quantization as an example, the product of the quantization coefficient and the largest of the maximum absolute values of all coefficient matrices in the weight channel group must lie within the 8-bit fixed-point representation range (-128 to 127); for example, if the maximum absolute value of the coefficient matrices in a certain weight channel group is 2, the quantization coefficient corresponding to that group is 2^5. The quantization coefficient corresponding to each weight channel group can be calculated in this way.
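An illustrative Python sketch of this rule (assuming NumPy and power-of-two quantization coefficients as in the examples of this application; the function name is illustrative):

import numpy as np

def group_quant_coeff(group_max_abs, bits=8):
    # Largest power-of-two quantization coefficient whose product with the
    # group's maximum absolute value still fits in the signed fixed-point
    # range of the given bit width (-128 to 127 for 8 bits).
    limit = 2 ** (bits - 1) - 1                  # 127 for 8-bit fixed point
    return 2 ** int(np.floor(np.log2(limit / group_max_abs)))

print(group_quant_coeff(2.0))    # 32, i.e. 2^5, as in the example above
print(group_quant_coeff(0.5))    # 128, i.e. 2^7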
S130: and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
In S130, the coefficient matrix of each channel of each set of weight channels is quantized according to the quantization coefficients of each set of weight channels. That is, according to the quantization coefficient of each coefficient matrix group in the N coefficient matrix groups, the coefficient matrices included in the N coefficient matrix groups are quantized respectively. And performing convolution calculation by using the grouped and quantized data to obtain a convolution calculation result of the ith convolution layer. It should be understood that, when the coefficient matrixes included in each of the N coefficient matrix groups are quantized according to the quantization coefficients of the coefficient matrix group, the quantization may be performed by using any feasible quantization formula or quantization method. The embodiments of the present application are not limited thereto.
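An illustrative Python sketch of S130 (assuming NumPy and a simple round-and-clip quantization; as noted above, any feasible quantization formula may be used):

import numpy as np

def quantize_group(kernels, quant_coeff):
    # Quantize every coefficient matrix in a weight channel group with the
    # group's shared quantization coefficient.
    return [np.clip(np.round(k * quant_coeff), -128, 127).astype(np.int8)
            for k in kernels]

group = [np.array([[0.6, -1.3], [0.2, 1.9]]),    # coefficient matrices of one group
         np.array([[1.1, 0.4], [-0.8, 1.5]])]
q_group = quantize_group(group, 2 ** 5)          # group quantization coefficient 2^5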
The above steps S110, S120, and S130 will be described below with specific examples.
Assume that the ith convolutional layer has 8 weight channels, corresponding to 8 coefficient matrices, named coefficient matrices 1 to 8. Each coefficient matrix includes 6 elements. Table 1 divides the 8 weight channels (coefficient matrices) into 4 (N equal to 4) weight channel groups (coefficient matrix groups) according to the value range interval in which the maximum absolute value of each coefficient matrix lies. In Table 1, the value range intervals are divided in binary with a granularity of 2^1, and A~B denotes the range of one value range interval, i.e. the data greater than or equal to A and less than B. Before the weight channels are grouped, the 8 coefficient matrices share a quantization coefficient of 2^4. The numbers in the table indicate how many elements of a coefficient matrix have their absolute values in the given interval (in practice, only the interval in which the maximum absolute value of the coefficient matrix lies is of interest).
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
The numbers in Table 1 represent the number of points (elements) of each coefficient matrix that lie in the different value range intervals. Take coefficient matrix 1 as an example: it has 6 elements, of which 1 element has an absolute value in the interval 2^2~2^3 (this is the element with the largest absolute value in the matrix), 1 element has an absolute value in the interval 2^1~2^2, 2 elements have absolute values in the interval 2^0~2^1, and 2 elements have absolute values in the interval 2^-1~2^0; the value range interval of coefficient matrix 1 is therefore 2^2~2^3. The numbers corresponding to the other coefficient matrices have a similar meaning. Before grouping, these 8 coefficient matrices share the quantization coefficient 2^4. After the value range interval in which the maximum absolute value of each coefficient matrix lies has been determined, the M weight channels are divided into N groups according to these intervals and the quantization coefficient corresponding to each group is determined. As shown in Table 1, coefficient matrices 1, 7, 3 and 6 are divided into one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 2, 5, 4 and 8 are divided into one group with a corresponding quantization coefficient of 2^5. The quantization coefficient of each group, multiplied by the maximum absolute value over all coefficient matrices included in that group, should fall within the fixed-point representation range required by the convolutional neural network. Taking 8-bit fixed-point quantization as an example, the product of the quantization coefficient and the largest of the maximum absolute values of all coefficient matrices in the group must lie within the 8-bit fixed-point representation range (-128 to 127); for example, if the maximum absolute value of the coefficient matrices in a certain group is 2, the quantization coefficient corresponding to that group is 2^5. The quantization coefficient of each coefficient matrix group can be calculated in this way, and the quantization coefficients of different groups may therefore differ.
It should be understood that table 1 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, and a value range interval in which the maximum absolute value of the coefficient matrix is located may also be other value range intervals. The values of M and N may also be other values, etc. The embodiments of the present application are not limited thereto.
Fig. 2 is a schematic flow chart of grouping and quantizing the coefficient matrices of the weight channels of the ith layer of a convolutional neural network according to another embodiment of the present application. In this embodiment, the maximum absolute value of the coefficients in the coefficient matrix of each weight channel is determined, the value range interval in which this maximum lies is determined, the M weight channels are divided into L weight channel groups according to these intervals, the L weight channel groups are then divided a second time into N weight channel groups, the quantization coefficient corresponding to each group is determined, and the coefficient matrices included in each weight channel group are quantized according to that quantization coefficient. As shown in fig. 2, the flow includes the following steps.
S210: and taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer, determining a value range interval in which the maximum absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped.
S220: and dividing the determined weight channels with the same value range interval into a group to obtain L groups of weight channels.
Step S210 and step S220 will be described below with reference to specific examples.
Assume that the ith convolutional layer has 8 weight channels, corresponding to 8 coefficient matrices, named coefficient matrices 1 to 8. Each coefficient matrix includes 6 elements. Table 2 divides the 8 weight channels (coefficient matrices) into 4 (L equal to 4) weight channel groups (coefficient matrix groups) according to the value range interval in which the maximum absolute value of each coefficient matrix lies. In Table 2, the value range intervals are divided in binary with a granularity of 2^1, and A~B denotes the range of one value range interval, i.e. the data greater than or equal to A and less than B. Before the weight channels are grouped, the 8 coefficient matrices share a quantization coefficient of 2^4. The numbers in the table indicate how many elements of a coefficient matrix have their absolute values in the given value range interval.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
The numbers in Table 2 represent the number of points (elements) of each coefficient matrix that lie in the different value range intervals. Take coefficient matrix 1 as an example: it has 6 elements, of which 1 element has an absolute value in the interval 2^2~2^3 (this is the element with the largest absolute value in the matrix), 1 element has an absolute value in the interval 2^1~2^2, 2 elements have absolute values in the interval 2^0~2^1, and 2 elements have absolute values in the interval 2^-1~2^0; the value range interval of coefficient matrix 1 is therefore 2^2~2^3. The numbers corresponding to the other coefficient matrices have a similar meaning. Before grouping, these 8 coefficient matrices share the quantization coefficient 2^4. After the value range interval in which the maximum absolute value of each coefficient matrix lies has been determined, the M weight channels are divided into L groups according to these intervals and the quantization coefficient corresponding to each group is determined. Coefficient matrices with the same value range interval can be placed in one group, that is, the coefficient matrices are grouped according to whether the value range intervals in which their maximum absolute values lie coincide. Optionally, the coefficient matrices may be grouped in descending order of the value range interval in which their maximum absolute values lie. As shown in Table 2, coefficient matrices with the same value range interval are placed in one group: coefficient matrices 1 and 7 form one group with a corresponding quantization coefficient of 2^4; coefficient matrices 3, 6 and 2 form one group with a corresponding quantization coefficient of 2^5; coefficient matrix 5 forms a group of its own with a corresponding quantization coefficient of 2^6; and coefficient matrices 4 and 8 form one group with a corresponding quantization coefficient of 2^7. The quantization coefficient of each coefficient matrix group can be calculated by the method described above, and the quantization coefficients of different groups may therefore differ.
It should be understood that table 2 is only exemplary and should not impose any limitation on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, the value range interval in which the maximum absolute value of the coefficient matrix is located may also be other value range intervals, the values of M and L may also be other values, and so on. The embodiments of the present application are not limited thereto.
It should also be understood that, in the embodiment of the present application, in addition to determining coefficient matrices having the same value range interval as the same group, grouping may be performed by using other methods according to the value range interval. For example, the coefficient matrices having the maximum absolute value of the coefficient matrices within a preset range may be divided into a group, sharing the quantized coefficients. Alternatively, the coefficient matrices in which the difference value of the value range section in which the maximum absolute value is located is within a preset range in the coefficient matrices may be divided into one group, and quantized coefficients may be shared. The embodiments of the present application are not limited thereto.
S230: dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval, and determining a quantization coefficient corresponding to each group;
in S230, the L groups are divided into N groups, where each group in the N groups corresponds to a different second quantized coefficient, and N is a positive integer less than or equal to L. That is, in S230, after dividing the coefficient matrix of the i-th convolutional layer into L coefficient matrix groups, the second grouping is performed, where the L groups are divided into N groups, each of the N groups corresponds to one quantized coefficient, and the quantized coefficients corresponding to each group may be different. N is a positive integer less than or equal to L.
Because the precision of the convolutional model after load balancing, grouping and quantization needs to be considered, the coefficient matrices with larger absolute values must be preserved as far as possible, since they have a larger influence on the precision of the convolution result and on the quantization. Preserving the coefficient matrices with larger absolute values must also be reconciled with the requirement that the product of each group's quantization coefficient and the coefficient matrices included in that group fall within the fixed-point representation range required by the convolutional neural network. Therefore, to preserve the coefficient matrices with larger absolute values, the quantization coefficient of the group containing them should be the minimum of the quantization coefficients of the coefficient matrices included in that group; otherwise the product would exceed the fixed-point representation range required by the convolutional neural network.
For example, assume that the first of the L groups includes coefficient matrices 1 and 2 with a corresponding quantization coefficient of 2^5, and the second of the L groups includes coefficient matrix 3 with a corresponding quantization coefficient of 2^6. If the L groups are now regrouped (merged), and the first and second of the L groups are merged into the first of the N groups, the quantization coefficient of the first of the N groups becomes 2^5, i.e. the minimum of the first quantization coefficients corresponding to the coefficient matrices included in that group; the second quantization coefficient of the first of the N groups is therefore 2^5.
Optionally, as an embodiment, the number of channels of the weight channel included in each of the N groups is the same.
Specifically, in the process of dividing the L groups or the M channels into N groups, the value of N may be the load-balancing upper limit value of the ith convolutional layer. Load balancing can be understood as follows: when the matrix addition is finally performed, since the M weight channels have been divided into N groups, the convolution results of the N groups need to be added, and all groups must be calculated before the final result of the convolutional layer is obtained. If the group operations are performed in parallel, that is, the number of channels included in each group is the same or nearly the same, the calculation result can be obtained relatively quickly. Load balancing can therefore be understood as each of the N groups containing the same number of weight channels (coefficient matrices). In this implementation, taking load balancing into account improves the calculation rate and reduces the calculation time.
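An illustrative Python sketch of this load-balanced regrouping (assuming the channels are already sorted in descending order of their value range interval, and that each merged group takes the minimum quantization coefficient of the channels it absorbs, consistent with the rule discussed above; the data follow the Table 3 example with channel indices starting from 0):

import numpy as np

def load_balanced_groups(channel_order, quant_coeffs, n_groups):
    # Split the channels into n_groups with the same number of channels per
    # group; each group's quantization coefficient is the minimum over the
    # channels it contains.
    chunks = np.array_split(np.array(channel_order), n_groups)
    return [{"channels": chunk.tolist(),
             "quant_coeff": min(quant_coeffs[c] for c in chunk)}
            for chunk in chunks]

order = [0, 6, 2, 5, 1, 4, 3, 7]     # channels sorted by descending value range interval
coeffs = {0: 2**4, 6: 2**4, 2: 2**5, 5: 2**5, 1: 2**5, 4: 2**6, 3: 2**7, 7: 2**7}
print(load_balanced_groups(order, coeffs, 2))
# [{'channels': [0, 6, 2, 5], 'quant_coeff': 16}, {'channels': [1, 4, 3, 7], 'quant_coeff': 32}]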
Combining this with the example shown in Table 2: in the case where load balancing is considered, Table 3 shows the process of regrouping, with load balancing taken into account, the coefficient matrices that were grouped in Table 2. Assume that the load-balancing upper limit is 2 and that the value of N is this load-balancing upper limit, i.e. 2.
TABLE 3
(Table 3 is reproduced as an image in the original publication.)
Similar to Table 2, the value range intervals are divided in binary with a granularity of 2^1. According to the load-balancing and interval-proximity principles, coefficient matrices 1, 7, 3 and 6 are divided into one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 2, 5, 4 and 8 are divided into one group with a corresponding quantization coefficient of 2^5. It can be seen that after the L groups are merged into N groups according to load balancing, each group contains the same number of coefficient matrices.
It should be understood that table 3 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more weight channels (coefficient matrices) in the ith convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, a value range interval in which the maximum absolute value of the coefficient matrix is located may also be another value range interval, the values of L and N may also be other values, each group of corresponding quantized coefficients may also be other values, the number of coefficient matrices included in each group may also be other values, and the like. The embodiments of the present application are not limited thereto.
As another implementation manner, in step S230 the L weight channel groups may also be divided into N groups through the following steps.
Step 1: determining the first N-1 groups in the L groups as the first N-1 groups of the N groups according to the sequence of the maximum upper bound of the value range interval in which the L groups are positioned from large to small;
step 2: the groups of the L groups except the first N-1 groups are determined as the last group of the N groups.
Specifically, after the coefficient matrices have been divided into L groups, if load balancing of the accelerator in the convolutional neural network system does not need to be considered, the L groups are divided (merged) a second time according to the quantization grouping upper limit value. Therefore, in dividing the L groups into N groups, the value of N may be determined according to the quantization grouping upper limit; that is, the value of N may differ from the quantization grouping upper limit. Optionally, the value of N is the quantization grouping upper limit value. The quantization coefficients of the N groups are different from one another. After the coefficient matrices have been divided into L groups, since the value range intervals of the groups differ, the groups are sorted by the value range interval in which the maximum absolute values of their coefficient matrices lie, that is, the L groups are arranged in order from the largest value range interval to the smallest. The first N-1 of the L groups are then determined as the first N-1 of the N groups, and the groups of the L groups other than the first N-1 groups are determined as the last group of the N groups.
For example, assuming that L is equal to 4, the values are sorted from the largest to the smallest in the order of the value range interval, namely, the first group (labeled as L1), the second group (labeled as L2), the third group (labeled as L3), and the fourth group (labeled as L4). Assume that N is equal to 3 (labeled N1, N2, and N3, respectively). The first and second of the L groups are determined to be the first two of the N groups (N1 and N2), respectively, i.e., L1 is actually N1 and L2 is actually N2. The quantized coefficient of N1 is the same as the quantized coefficient corresponding to L1, and the quantized coefficient of N2 is the same as the quantized coefficient corresponding to L2. The groups of the L groups except the first N-1 groups are determined as the last group of the N groups. The groups of the L groups other than the first 2 groups were L3 and L4, and L3 and L4 were combined into the last group of the N groups, i.e., L3 plus L4 corresponded to N3. Each of the N groups corresponds to a quantized coefficient. And the quantized coefficients corresponding to each of the N groups are different.
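An illustrative Python sketch of this merging step (each group is represented as a dictionary holding its channel indices and quantization coefficient, which is an assumed representation; the group layout follows the example above, with L equal to 4 and N equal to 3):

def merge_groups(l_groups, n):
    # Keep the first n-1 of the L groups (already sorted in descending order
    # of value range interval) and merge all remaining groups into the nth
    # group, whose quantization coefficient is the minimum over the merged groups.
    kept = l_groups[:n - 1]
    rest = l_groups[n - 1:]
    last = {"channels": [c for g in rest for c in g["channels"]],
            "quant_coeff": min(g["quant_coeff"] for g in rest)}
    return kept + [last]

l_groups = [{"channels": [0, 6], "quant_coeff": 2**4},      # L1
            {"channels": [2, 5, 1], "quant_coeff": 2**5},   # L2
            {"channels": [4], "quant_coeff": 2**6},         # L3
            {"channels": [3, 7], "quant_coeff": 2**7}]      # L4
print(merge_groups(l_groups, 3))   # L1 -> N1, L2 -> N2, L3 + L4 -> N3 with quantization coefficient 2^6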
It should be understood that the L groups may be divided into N groups using other methods than the above-described method of dividing the L groups into N groups. For example, as in the above example, L1 and L2 may be combined into one of N groups, and L3 and L4 may be determined as the other two of the N groups, respectively. Alternatively, any one or more coefficient matrices in the L groups may be combined to obtain N groups. The embodiments of the present application are not limited thereto.
Because the precision of the convolutional model after grouping and quantization needs to be considered, the coefficient matrices with larger absolute values must be preserved as far as possible, since they have a larger influence on the precision of the convolution result and on the quantization. Preserving the coefficient matrices with larger absolute values must also be reconciled with the requirement that the product of each group's quantization coefficient and the coefficient matrices included in that group fall within the fixed-point representation range required by the convolutional neural network. Therefore, to preserve the coefficient matrices with larger absolute values, the quantization coefficient of the group containing them should be the minimum of the quantization coefficients of the coefficient matrices included in that group; otherwise the product would exceed the fixed-point representation range required by the convolutional neural network.
The process of dividing L groups into N groups in this implementation will be described below with a specific example.
Continuing from the example shown in Table 2, Table 4 shows the process of regrouping, without considering load balancing, the coefficient matrices that were grouped in Table 2. Assume that the quantization grouping upper limit value is 2 and that the value of N is this quantization grouping upper limit, i.e. 2.
TABLE 4
(Table 4 is reproduced as an image in the original publication.)
Similar to Table 3, the value range intervals are divided in binary with a granularity of 2^1. According to the quantization grouping upper limit, and in descending order of the value range intervals of the 4 groups, the first 1 of the 4 groups is determined as the first 1 of the 2 groups, and the remaining three groups are determined as the last of the 2 groups. That is, coefficient matrices 1 and 7 form one group with a corresponding quantization coefficient of 2^4, and coefficient matrices 3, 6, 2, 5, 4 and 8 form one group with a corresponding quantization coefficient of 2^5.
It should be understood that table 4 is exemplary only and should not impose any limitations on the embodiments of the present application. For example, there may be more coefficient matrices of the i-th convolutional layer, each coefficient matrix may include more elements, each coefficient matrix may include different numbers of elements, a value range interval in which the maximum absolute value of the coefficient matrix is located may also be another value range interval, the values of L and N may also be other values, each group of corresponding quantized coefficients may also be other values, the number of coefficient matrices included in each group may also be other values, and the like. The embodiments of the present application are not limited thereto.
S240: and quantizing the coefficient matrix of each weight channel included in each weight channel group according to the quantization coefficient of each weight channel group.
In S240, the coefficient matrix of each channel of each set of weight channels is quantized according to the quantization coefficients of each set of weight channels. That is, according to the quantization coefficient of each coefficient matrix group in the N coefficient matrix groups, the coefficient matrices included in the N coefficient matrix groups are quantized respectively. And performing convolution calculation by using the grouped and quantized data to obtain a convolution calculation result of the ith convolution layer. It should be understood that, when the coefficient matrixes included in each of the N coefficient matrix groups are quantized according to the quantization coefficients of the coefficient matrix group, the quantization may be performed by using any feasible quantization formula or quantization method. The embodiments of the present application are not limited thereto.
FIG. 3 is a schematic flow chart diagram of a method of convolution operation of a neural network according to an embodiment of the present application. As shown in fig. 3, the method includes:
s310: and acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolutional layer.
In S310, the data of the data channel corresponding to each weight channel in the nth weight channel group are obtained from the data to be input into the ith convolutional layer. The value of n here may be any positive integer less than or equal to N; in the example shown in Table 1, n may take any value from 1 to 4. The data of the data channel corresponding to each weight channel in the nth weight channel group are obtained from the data to be input into the ith convolutional layer (the input data of the ith convolutional layer). For example, when n is 2, the data of the data channel corresponding to each weight channel in the 2nd weight channel group are obtained from the data to be input into the ith convolutional layer. In other words, the data are input into the ith convolutional layer group by group: each time, only the input data corresponding to one weight channel group are input to the ith convolutional layer.
S320: inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel.
In S320, for example, when n is 2, the obtained input data corresponding to the 2 nd weight channel group is input to the i-th convolutional layer, convolution operation is performed on the input data to obtain a convolution calculation result corresponding to each channel in the 2 nd weight channel group, and the convolution calculation results corresponding to each channel in the 2 nd weight channel group are summed.
S330: and performing inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group.
In S330, since the coefficient matrix of the 2 nd weight channel group is quantized in the process of calculating the convolution calculation result corresponding to the 2 nd weight channel group, the quantization coefficient used for quantization is the quantization coefficient corresponding to each coefficient matrix in the 2 nd weight channel group. In order to ensure that the value range of the quantized calculation result is consistent with the value range of the calculation result which is not quantized, the reversibility of quantization is kept. The convolution calculation needs to be dequantized. Therefore, in S330, the summation result of convolution calculation corresponding to the 2 nd weight channel group needs to be subjected to inverse quantization calculation according to the quantization coefficient corresponding to each channel in the 2 nd weight channel group, so as to obtain an inverse quantization calculation result corresponding to the 2 nd weight channel group.
S340: and adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain the calculation result corresponding to the nth weight channel group.
For step S340, the result of the inverse quantization calculation is added to the calculation result of the data channels corresponding to the weight channels in the (n-1)th weight channel group (the inverse-quantized calculation result corresponding to the (n-1)th weight channel group). It should be noted that when n is 1 there is no previous weight channel group, so this operation is not required.
S350: and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
For step S350, if n is less than N, the convolution calculation continues with the next weight channel group; that is, steps S310 to S340 are repeated in turn for each weight channel group until n reaches N. In S350, the data of the data channels corresponding to the weight channels in the (n+1)th weight channel group are processed in the same way as the data corresponding to the weight channels in the nth weight channel group, until n+1 equals N, that is, until all the data to be input into the ith convolutional layer have been calculated, at which point the final calculation result of the ith convolutional layer is obtained.
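An illustrative end-to-end Python sketch of steps S310 to S350 (assuming NumPy, 1x1 kernels so that the convolution reduces to a per-pixel multiply-accumulate, and a two-group layout as in Table 3; all names and values are illustrative):

import numpy as np

def conv_layer_grouped(x, groups, height=8, width=8):
    # x: input feature map of shape (channels, height, width).
    # groups: each entry holds the channel indices, the quantized 1x1 weights
    # and the shared quantization coefficient of one weight channel group.
    acc = np.zeros((height, width), dtype=np.float32)
    for group in groups:                                   # S350: iterate over the N groups
        data = x[group["channels"]]                        # S310: fetch only this group's data channels
        summed = np.sum(data * group["q_weights"][:, None, None], axis=0)   # S320: convolve and sum
        dequant = summed / group["quant_coeff"]            # S330: inverse quantization
        acc = acc + dequant                                # S340: add to the previous groups' result
    return acc                                             # final calculation result of the ith layer

x = np.random.rand(8, 8, 8).astype(np.float32)             # 8 data channels
groups = [
    {"channels": [0, 6, 2, 5], "q_weights": np.array([16.0, -8.0, 24.0, 4.0]), "quant_coeff": 2**4},
    {"channels": [1, 4, 3, 7], "q_weights": np.array([40.0, -20.0, 12.0, 6.0]), "quant_coeff": 2**5},
]
print(conv_layer_grouped(x, groups).shape)                 # (8, 8)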
Fig. 4 is a schematic flow chart of convolution calculation performed in groups on the quantized weight channels (coefficient matrices) according to an embodiment of the present application. As shown in Fig. 4, for the ith convolutional layer, the coefficient matrices of the ith convolutional layer are quantized in groups; Fig. 4 shows this process of group quantization of the weight coefficient matrices. It should be understood that the input data of the ith convolutional layer also need to be grouped, and optionally the grouped input data may also be quantized. It should also be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The input data are grouped to obtain grouped input data, and the weights are grouped and quantized to obtain group-quantized weights. It is assumed that the weight coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, for each group, convolution calculation (which may be performed in a matrix multiplier) is carried out on the grouped input data and the group-quantized weights, so as to obtain the convolution calculation results of the three groups. Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, each group's convolution calculation result needs to be subjected to weight-group inverse quantization. After the weight-group inverse quantization, since the data must meet the fixed-point (data range) requirement of the convolutional neural network, feature map quantization (which may be performed in a feature map quantizer) is applied to the result of the weight-group inverse quantization. The results obtained by the weight-group inverse quantization of the three groups are respectively subjected to feature map quantization, so as to obtain three feature map quantization results. The three feature map quantization results are then added (which may be performed in a matrix adder) to obtain the addition result (the convolution calculation result of the ith convolutional layer). Finally, the addition result is added to the offset value of the ith convolutional layer (which may be performed in an adder), so as to obtain the final result of the ith convolutional layer (i.e., the feature map data of the ith convolutional layer).
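As an illustration, the Fig. 4 flow (bias not group quantized) might be sketched as below, reusing the conv2d_sum helper from the earlier sketch. The quantize_fm helper stands for a hypothetical feature map quantizer that maps values back into the fixed-point range, and all names are illustrative rather than taken from the patent; the defining feature of this variant is that the offset is added only once, after all groups have been accumulated.

```python
import numpy as np

def quantize_fm(y, fm_scale):
    """Hypothetical feature map quantizer: map values into the fixed-point range."""
    return np.round(y / fm_scale)

def conv_layer_fig4(grouped_inputs, grouped_quant_weights, weight_scales,
                    fm_scale, layer_bias):
    group_results = []
    for x, w_q, s in zip(grouped_inputs, grouped_quant_weights, weight_scales):
        y = conv2d_sum(x, w_q)        # matrix multiplier: convolution of one group
        y = y * s                     # weight-group inverse quantization
        y = quantize_fm(y, fm_scale)  # feature map quantization (fixed-point range)
        group_results.append(y)
    added = sum(group_results)        # matrix adder: add the group results
    return added + layer_bias         # adder: finally add the i-th layer offset
```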
Optionally, the offset of the ith convolutional layer may also be quantized, then the quantized offset is corrected (which may be performed in a quantization corrector), and the corrected offset is added to the addition result of the groups to obtain the convolution calculation result of the ith convolutional layer (i.e., the feature map data of the ith convolutional layer).
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map quantization is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the feature-map-quantized feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, convolution calculation is performed on the grouped pictures and the group-quantized weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 4 apply only to the 1st convolutional layer, where the original input picture is group quantized; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should be understood that the processing flow shown in Fig. 4 is mainly directed to the case in which the bias is not group quantized. Fig. 4 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
Fig. 5 is a schematic flow chart of convolution calculation performed on group-quantized weight channels according to another embodiment of the present application. As shown in Fig. 5, for the ith convolutional layer, the weight coefficient matrices and the offset coefficients of the ith convolutional layer are quantized in groups, and the input data of the ith convolutional layer are grouped. It should be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The offset and the weights are respectively group quantized to obtain the group-quantized offset and weights. It is assumed that the weight coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, the input data of each group are convolved with the group-quantized weights (which may be calculated in a matrix multiplier), so as to obtain three convolution calculation results (corresponding to the three groups). Next, the convolution calculation result of each group is added to the group-quantized offset of that group, so as to obtain the intermediate calculation result of each group (three intermediate calculation results in total, corresponding to the three groups). Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, the intermediate calculation result of each group needs to be subjected to weight-group inverse quantization. After the weight-group inverse quantization, since the data must meet the fixed-point (data range) requirement of the convolutional neural network, feature map quantization (which may be performed in a feature map quantizer) is applied to the result obtained by the weight-group inverse quantization of each group. The results obtained by the weight-group inverse quantization of the three groups are respectively subjected to feature map quantization, so as to obtain three feature map quantization results. Finally, the three feature map quantization results are added (which corresponds to adding the calculation results obtained by all the groups), so as to obtain the convolution calculation result of the ith convolutional layer (the feature map data of the ith convolutional layer). Optionally, the group-quantized offset of the ith convolutional layer may be corrected (which may be performed in a quantization corrector), and the calculation may be performed using the corrected offset.
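Compared with the Fig. 4 sketch, only the body of the per-group loop changes in the Fig. 5 variant: the group-quantized offset is added before the weight-group inverse quantization, and no layer-level offset is added at the end. A hedged sketch follows, reusing the hypothetical conv2d_sum and quantize_fm helpers from the earlier sketches:

```python
def conv_layer_fig5(grouped_inputs, grouped_quant_weights, grouped_quant_bias,
                    weight_scales, fm_scale):
    group_results = []
    for x, w_q, b_q, s in zip(grouped_inputs, grouped_quant_weights,
                              grouped_quant_bias, weight_scales):
        y = conv2d_sum(x, w_q) + b_q  # add the group-quantized offset first
        y = y * s                     # then weight-group inverse quantization
        y = quantize_fm(y, fm_scale)  # feature map quantization
        group_results.append(y)
    return sum(group_results)         # accumulate all groups last
```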
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map quantization is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the feature-map-quantized feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the group-quantized offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, convolution calculation is performed on the grouped pictures and the group-quantized weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 5 apply only to the 1st convolutional layer, where the original input picture is grouped; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should be understood that the processing flow shown in Fig. 5 is mainly directed to the case in which the bias also requires group quantization. Unlike the processing flow shown in Fig. 4, in the flow shown in Fig. 5 the addition of each group's convolution calculation result and the corresponding group offset is performed before the convolution calculation results of all groups are accumulated, and the addition of the convolution calculation results of all groups is performed last. In the processing flow shown in Fig. 4, by contrast, the convolution calculation results obtained by all the groups are first added together, and the sum is then added to the offset of the ith convolutional layer to obtain the convolution calculation result of the ith convolutional layer.
It should also be understood that Fig. 5 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
Fig. 6 is a schematic flow chart of convolution calculation performed on group-quantized weight channels according to another embodiment of the present application. As shown in Fig. 6, for the ith convolutional layer, the coefficient matrices of the ith convolutional layer are quantized in groups: the weight coefficient matrices and the offset coefficients are quantized, and the input data of the ith convolutional layer are grouped. It should be understood that when the ith convolutional layer is the first convolutional layer, the input data of the 1st convolutional layer is the original input picture of the system, and when i is greater than 1, the input data of the ith convolutional layer is feature map data. The offset and the weights are respectively group quantized to obtain the group-quantized offset and weights. It is assumed that the coefficient matrices of the ith convolutional layer are divided into 3 groups and quantized. Then, for each group, convolution calculation is performed on the group-quantized input data and the group-quantized weights, so as to obtain the group convolution results of the three groups. Because the weights were quantized in groups beforehand, in order to ensure that the value range of the calculation result after group quantization is consistent with the value range of the calculation result without group quantization, i.e., to keep the quantization reversible, the convolution result of each group needs to be subjected to weight-group inverse quantization, so as to obtain the weight-group dequantized result of each group. Then, the dequantized result (intermediate result) of each group is added to the quantized offset of the corresponding group, so as to obtain the convolution calculation result of each group (three convolution calculation results in total, corresponding to the three groups). Feature map quantization (which may be performed in a feature map quantizer) then needs to be applied to each group's convolution calculation result; the three groups' convolution calculation results are respectively subjected to feature map group quantization, so as to obtain three feature map quantization results. Finally, the three feature map quantization results are added (which corresponds to adding the convolution calculation results obtained by all the groups), so as to obtain the convolution calculation result of the ith convolutional layer (the feature map data of the ith convolutional layer). Optionally, the group-quantized offset of the ith convolutional layer may be corrected (which may be performed in a quantization corrector), and the calculation may be performed using the corrected offset.
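The Fig. 6 variant again changes only the ordering inside the loop: the weight-group inverse quantization is performed first, the (corrected) group offset is added to that intermediate result, and only then is the feature map quantization applied. A sketch under the same assumptions and with the same hypothetical helpers as before:

```python
def conv_layer_fig6(grouped_inputs, grouped_quant_weights, grouped_bias,
                    weight_scales, fm_scale):
    group_results = []
    for x, w_q, b, s in zip(grouped_inputs, grouped_quant_weights,
                            grouped_bias, weight_scales):
        y = conv2d_sum(x, w_q)        # grouped convolution
        y = y * s                     # weight-group inverse quantization first
        y = y + b                     # then add the corresponding group offset
        y = quantize_fm(y, fm_scale)  # feature map (group) quantization
        group_results.append(y)
    return sum(group_results)         # add the three feature map quantization results
```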
When the (i+1)th convolutional layer is also a convolutional layer that requires group quantization, feature map grouping is performed on the convolution calculation result of the ith convolutional layer, and then the group quantization and convolution calculation of the (i+1)th convolutional layer are performed by using the grouped feature map data of the ith convolutional layer (the input data of the (i+1)th convolutional layer), the group-quantized weights of the (i+1)th convolutional layer, and the group-quantized offset of the (i+1)th convolutional layer; the flow is similar to the processing flow of the ith convolutional layer, and the feature map data of the (i+1)th convolutional layer are obtained.
When the (i+1)th convolutional layer is a convolutional layer that does not require group quantization, the convolution calculation result of the ith convolutional layer may be directly input into the (i+1)th convolutional layer as its input data for convolution calculation. Optionally, feature map quantization may first be performed on the convolution calculation result of the ith convolutional layer, and the feature-map-quantized result is then input into the (i+1)th convolutional layer as its input data for convolution calculation. The (i+1)th convolutional layer may be a convolutional layer that requires quantization or one that does not; the embodiments of the present application are not limited thereto.
When the ith convolutional layer is the last convolutional layer, the convolution calculation result of the ith convolutional layer is subjected to feature map inverse quantization (which can be performed in a feature map inverse quantizer), and the result obtained after the feature map inverse quantization is the output result of all convolutional layers.
For the 1st convolutional layer, the grouped pictures are matrix-multiplied with the grouped weights of the 1st convolutional layer. For the other convolutional layers, convolution calculation is performed on the grouped feature map data and the group-quantized weights of that convolutional layer. That is, for the convolution calculation of an original picture, the original picture is grouped only once, and the grouped picture is input into the 1st convolutional layer. The steps shown in the dashed box in Fig. 6 apply only to the 1st convolutional layer, where the original input picture is grouped; for the other convolutional layers, the convolution calculation results (feature maps) of the previous convolutional layer are grouped.
It should also be understood that Fig. 6 is only an example and should not impose any limitation on the embodiments of the present application; for example, certain steps may be added to or removed from the processing flow. The embodiments of the present application are not limited thereto.
It should also be understood that, in the embodiments of the present application, processing flows other than the group quantization convolution calculation flows described above are also possible; for example, the weight-group inverse quantization may be placed after the feature map group quantization. The embodiments of the present application are not limited thereto.
It should also be understood that the above description is only intended to help those skilled in the art better understand the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application. Various equivalent modifications or changes will be apparent to those skilled in the art in light of the above examples; for example, some of the steps described in the method 100 and Figs. 4 to 6 above may be unnecessary, new steps may be added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations also fall within the scope of the embodiments of the present application.
It should also be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar elements that are not mentioned may be referred to one another and, for brevity, are not repeated herein.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic thereof, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The method of convolution operation of the neural network according to the embodiment of the present application is described in detail above with reference to fig. 1 to 6. Hereinafter, the neural network convolution operation device according to the present application will be described in detail with reference to fig. 7 to 8.
Fig. 7 shows a schematic block diagram of an apparatus 400 for convolution operation of a neural network according to an embodiment of the present application, where the ith convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to the coefficient matrix of each weight channel, the coefficient matrices of the weight channels in each weight channel group have the same quantization coefficient, M is greater than or equal to 2, N is less than or equal to M, and the ith convolutional layer is any convolutional layer of the neural network. The modules or units in the apparatus 400 are respectively configured to perform the actions or processes in the methods 100 to 300. As shown in Fig. 7, the apparatus 400 may include a transmission module 410, a calculation module 420, and a grouping module 430, which are communicatively connected to one another.
A transmission module 410, configured to obtain data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolutional layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
a calculating module 420, configured to perform inverse quantization calculation on the summation result according to a quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating the calculation result corresponding to the (n + 1) th weight channel group until all the data to be input into the ith convolution layer are calculated.
The grouping module 430 is configured to:
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval into which the maximum absolute value of the coefficient matrix of each weight channel falls, wherein a plurality of value range intervals are preset in the computer device, each value range interval represents a continuous segment of data, and the value range intervals do not overlap one another;
dividing the M weight channels into N groups according to the determined value range intervals, and determining a quantization coefficient corresponding to each group.
The calculation module 420 is further configured to: quantize the coefficient matrix of each weight channel in each weight channel group according to the quantization coefficient of that group.
Optionally, as an embodiment, the grouping module 430 is further configured to:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value range interval.
Optionally, as an embodiment, the grouping module 430 is specifically configured to:
the first N-1 groups of the L groups are determined as the first N-1 groups of the N groups;
and determining the groups except the first N-1 groups in the L groups as the Nth group of the N groups.
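A minimal sketch of the grouping logic described above is given below; interval_edges is a hypothetical ascending array of preset value range boundaries, and the determination of the quantization coefficient for each group is omitted, since the description does not tie it to a particular formula here.

```python
import numpy as np

def group_weight_channels(weight_matrices, interval_edges, N):
    """Divide M weight channels into N groups by the value range interval into which
    the maximum absolute coefficient of each channel's matrix falls (a sketch)."""
    # Maximum absolute value of the coefficients of each weight channel's matrix
    max_abs = [np.max(np.abs(w)) for w in weight_matrices]
    # Index of the value range interval that each maximum falls into
    interval_idx = [int(np.searchsorted(interval_edges, m)) for m in max_abs]
    # Channels whose maxima fall into the same interval form one of L preliminary groups
    prelim = {}
    for ch, idx in enumerate(interval_idx):
        prelim.setdefault(idx, []).append(ch)
    # Order the L groups by interval size; keep the first N-1 as-is and merge the rest
    ordered = [prelim[k] for k in sorted(prelim)]
    groups = ordered[:N - 1] + [sum(ordered[N - 1:], [])]
    return groups
```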
The apparatus 400 provided in this embodiment of the present application can implement the method flows described in fig. 1 to fig. 6, specifically, the transmission module 410 can implement steps S310 and S320, the calculation module 420 can implement steps S130, S240, S330, S340, and S350, and the grouping module 430 can implement steps S110, S120, S210, S220, and S230, and for the description of the method that can be implemented by the above modules, reference may be made to the description in the corresponding method flows described above. To avoid repetition, further description is omitted here.
The present application also provides an apparatus for convolution operation of a neural network. As shown in Fig. 8, the apparatus 500 includes a processor 510 and a memory 520 for supporting the apparatus in performing the corresponding functions of the above methods. The processor and the memory are communicatively connected; the memory stores instructions, and the processor is configured to call the instructions to implement the neural network convolution operation methods of the above embodiments.
An embodiment of the present application also provides a computer readable medium for storing computer program code, the computer program including instructions for executing the methods 100 to 300 of the embodiments of the present application described above. The readable medium may be a read-only memory (ROM) or a random access memory (RAM), which is not limited in the embodiments of the present application.
The present application also provides a computer program product comprising instructions that, when executed, cause an apparatus to perform operations corresponding to the above-described methods.
The present application further provides a computer system including a chip or an apparatus for performing the method of neural network convolution calculation according to the embodiments of the present application. The chip or the device may be the convolutional neural network system provided in the present application.
An embodiment of the present application further provides a system chip, where the system chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, a pin or a circuit, etc. The processing unit can execute computer instructions to enable a chip in the communication device to execute any one of the methods for calculating the neural network convolution provided by the embodiments of the present application.
Optionally, the computer instructions are stored in a storage unit.
Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the terminal, such as a ROM or another type of static storage device capable of storing static information and instructions, a RAM, or the like. Any processor mentioned above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for executing programs that control the above method of quantized convolutional neural network computation. The processing unit and the storage unit may be decoupled, respectively disposed on different physical devices, and connected in a wired or wireless manner to implement their respective functions, so as to support the system chip in implementing the various functions of the foregoing embodiments. Alternatively, the processing unit and the storage unit may be coupled on the same device.
It should be understood that the foregoing descriptions of the embodiments of the present application focus on highlighting differences between the various embodiments, and that the same or similar parts that are not mentioned may be referred to one another, and thus, for brevity, will not be described again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of convolution operation of a neural network, the method being performed by a computer device, an i-th convolutional layer of the neural network including M weight channels, the M weight channels being divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group having the same quantization coefficient, the M being greater than or equal to 2, the N being less than or equal to M, the i-th convolutional layer being any convolutional layer of the neural network, the method comprising:
acquiring data of a data channel corresponding to each weight channel in the nth weight channel group from the data to be input into the ith convolution layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
performing inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating a calculation result corresponding to the (n + 1) th weight channel group until all data to be input into the ith convolution layer are calculated.
2. The method of claim 1, further comprising:
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, presetting a plurality of value range intervals in the computer device, wherein each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped;
dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group;
and quantizing the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
3. The method of claim 2, wherein the dividing the M weight channels into N groups according to the determined value range interval comprises:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval.
4. The method according to claim 3, wherein the dividing the L groups of weight channels into N groups according to the size order of the value range interval comprises:
determining a first N-1 group of the L groups as a first N-1 group of the N groups;
determining a group other than the first N-1 groups among the L groups as an Nth group of the N groups.
5. An apparatus of convolutional operation of a neural network, wherein an i-th convolutional layer of the neural network includes M weight channels, the M weight channels are divided into N weight channel groups according to a coefficient matrix of each weight channel, the coefficient matrix of the weight channel in each weight channel group has the same quantization coefficient, M is greater than or equal to 2, and N is less than or equal to M, the i-th convolutional layer is any convolutional layer of the neural network, the apparatus includes:
a transmission module, configured to obtain data of a data channel corresponding to each weight channel in an nth weight channel group from data to be input to the ith convolutional layer;
inputting the data of the acquired data channel into the ith convolution layer, performing convolution operation on the input data of each data channel, and summing the convolution operation results of each data channel;
the calculation module is used for carrying out inverse quantization calculation on the summation result according to the quantization coefficient corresponding to each channel in the nth weight channel group;
adding the result of the inverse quantization calculation and the calculation result corresponding to the (n-1) th weight channel group to obtain a calculation result corresponding to the nth weight channel group;
and calculating a calculation result corresponding to the (n + 1) th weight channel group until all data to be input into the ith convolution layer are calculated.
6. The apparatus of claim 5, further comprising a grouping module to,
taking the maximum absolute value of the coefficient in the coefficient matrix of each weight channel of the ith convolutional layer;
determining a value range interval in which the maximum value of the absolute value of the coefficient matrix of each weight channel falls, wherein a plurality of value range intervals are preset in the computer device, each value range interval represents a continuous segment of data, and the data of each value range interval are not overlapped;
dividing the M weight channels into N groups according to the determined value range interval, and determining a quantization coefficient corresponding to each group;
the calculation module is further configured to quantize the coefficient matrix of each weight channel of each group of weight channels according to the quantization coefficient of each group of weight channels.
7. The apparatus of claim 6, wherein the grouping the M weight channels into N groups according to the determined value range interval comprises:
dividing the determined weight channels with the same value domain interval into a group to obtain L groups of weight channels;
and dividing the L groups of weight channels into N groups according to the size sequence of the value domain interval.
8. The apparatus according to claim 7, wherein the dividing the L groups of weight channels into N groups according to the size order of the value range interval comprises:
determining a first N-1 group of the L groups as a first N-1 group of the N groups;
determining a group other than the first N-1 groups among the L groups as an Nth group of the N groups.
9. An apparatus for neural network convolution operations, the apparatus comprising: a processor and a memory, the memory for storing instructions, the processor for reading and executing the instructions in the memory to perform the method of any of claims 1 to 4.
CN201810898766.4A 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network Pending CN110826685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810898766.4A CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810898766.4A CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Publications (1)

Publication Number Publication Date
CN110826685A true CN110826685A (en) 2020-02-21

Family

ID=69540751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810898766.4A Pending CN110826685A (en) 2018-08-08 2018-08-08 Method and device for convolution calculation of neural network

Country Status (1)

Country Link
CN (1) CN110826685A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461302A (en) * 2020-03-30 2020-07-28 杭州嘉楠耘智信息科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111723924A (en) * 2020-05-28 2020-09-29 西安交通大学 Deep neural network accelerator based on channel sharing
CN111767980A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111898081A (en) * 2020-07-09 2020-11-06 上海兆芯集成电路有限公司 Convolution operation method and convolution operation device
CN111950713A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Method and device for increasing running speed of channel random mixed operation
WO2021169914A1 (en) * 2020-02-24 2021-09-02 中科寒武纪科技股份有限公司 Data quantification processing method and apparatus, electronic device and storage medium
CN114254740A (en) * 2022-01-18 2022-03-29 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767980B (en) * 2019-04-02 2024-03-05 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111767980A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
WO2021169914A1 (en) * 2020-02-24 2021-09-02 中科寒武纪科技股份有限公司 Data quantification processing method and apparatus, electronic device and storage medium
JP2022532439A (en) * 2020-02-24 2022-07-14 中科寒武紀科技股▲分▼有限公司 Data Quantization Processing Method, Device, Electronic Equipment and Storage Medium This application was submitted to the National Knowledge and Industrial Rights Bureau of China on February 24, 2020, the application number is 201011184.3, and the title of the invention is "Data Quantization". Quantization methods, devices, electronic devices and storage media "claim the priority of the Chinese patent application, the entire contents of which are incorporated herein by reference.
JP7233636B2 (en) 2020-02-24 2023-03-07 中科寒武紀科技股▲分▼有限公司 Data quantization processing method, device, electronic device and storage medium
CN111461302A (en) * 2020-03-30 2020-07-28 杭州嘉楠耘智信息科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111461302B (en) * 2020-03-30 2024-04-19 嘉楠明芯(北京)科技有限公司 Data processing method, device and storage medium based on convolutional neural network
CN111723924A (en) * 2020-05-28 2020-09-29 西安交通大学 Deep neural network accelerator based on channel sharing
CN111723924B (en) * 2020-05-28 2022-07-12 西安交通大学 Deep neural network accelerator based on channel sharing
CN111898081A (en) * 2020-07-09 2020-11-06 上海兆芯集成电路有限公司 Convolution operation method and convolution operation device
CN111898081B (en) * 2020-07-09 2024-02-27 上海兆芯集成电路股份有限公司 Convolution operation method and convolution operation device
CN111950713A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Method and device for increasing running speed of channel random mixed operation
CN114254740A (en) * 2022-01-18 2022-03-29 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver
CN114254740B (en) * 2022-01-18 2022-09-30 长沙金维信息技术有限公司 Convolution neural network accelerated calculation method, calculation system, chip and receiver

Similar Documents

Publication Publication Date Title
CN110826685A (en) Method and device for convolution calculation of neural network
CN110363279B (en) Image processing method and device based on convolutional neural network model
US20220027717A1 (en) Convolutional Neural Network Hardware Configuration
CN111247527B (en) Method and device for determining characteristic images in convolutional neural network model
CN110109646B (en) Data processing method, data processing device, multiplier-adder and storage medium
CN110598839A (en) Convolutional neural network system and method for quantizing convolutional neural network
CN110929865B (en) Network quantification method, service processing method and related product
CN110008952B (en) Target identification method and device
CN106855952B (en) Neural network-based computing method and device
CN110390075B (en) Matrix preprocessing method, device, terminal and readable storage medium
CN112836806B (en) Data format adjustment method, device, computer equipment and storage medium
CN110647974A (en) Network layer operation method and device in deep neural network
CN111373436A (en) Image processing method, terminal device and storage medium
CN108922478B (en) Backlight brightness adjusting method and system and display device
CN112243119B (en) White balance processing method and device, electronic equipment and storage medium
CN113052290B (en) Neural network generation method, neural network data processing method, neural network generation device, neural network data processing device, electronic equipment and medium
CN111160517A (en) Convolutional layer quantization method and device of deep neural network
CN110378479B (en) Image input method and device based on deep learning and terminal equipment
CN113177634A (en) Image analysis system, method and equipment based on neural network input and output quantification
CN109375952B (en) Method and apparatus for storing data
CN111384971B (en) Method, device and decoder for processing data in finite field
CN113705791B (en) Neural network reasoning quantification method and device, electronic equipment and storage medium
CN113902928A (en) Image feature extraction method and device and electronic equipment
CN112784957A (en) Data processing apparatus, operation method thereof, and program
CN113255576B (en) Face recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination