CN111339490B

CN111339490B - Matrix multiplication calculation method and device

Info

Publication number: CN111339490B
Application number: CN202010099356.0A
Authority: CN
Inventors: 党博超; 王皓
Original assignee: Samsung China Semiconductor Co Ltd; Samsung Electronics Co Ltd
Current assignee: Samsung China Semiconductor Co Ltd; Samsung Electronics Co Ltd
Priority date: 2020-02-18
Filing date: 2020-02-18
Publication date: 2024-04-19
Anticipated expiration: 2040-02-18
Also published as: KR20210105285A; CN111339490A

Abstract

A matrix multiplication method and apparatus are provided. The matrix multiplication method includes: determining a first multiplication matrix and a second multiplication matrix according to an input multiplicand matrix and an input multiplier matrix; determining a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix; determining a matrix reduction constraint value according to the determined matrix to be reduced; and determining the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced. The matrix multiplication method and apparatus can reduce calculation errors and improve the speed of matrix multiplication.

Description

Matrix multiplication calculation method and device

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a matrix multiplication calculation method and device.

Background

A matrix is an important basic concept in mathematics: an M×N matrix is a rectangular array of elements with M rows and N columns. Matrix multiplication is currently one of the most important operations in data computation, for example when a graphics processing unit (GPU) performs image processing, when input trajectories are analyzed in handwriting recognition, and/or when input audio is analyzed in speech recognition.

Matrix multiplication is commonly calculated in one of the following two ways:

(I) General matrix multiplication (GEMM).

As shown in fig. 1, in general matrix multiplication, calculating the element in the m-th row and n-th column of the multiplication result matrix C requires calculating the inner product of the row vector of the m-th row of matrix A and the column vector of the n-th column of matrix B. Taking fig. 1 as an example, the element in the 1st row and 1st column of matrix C is the inner product of the row vector of the 1st row of matrix A and the column vector of the 1st column of matrix B, that is, (0.6, 0.4, 0.5) · (0.3, 1.1, 0.0) = 0.6 × 0.3 + 0.4 × 1.1 + 0.5 × 0.0 = 0.62 ≈ 0.6. The remaining elements of the multiplication result matrix C are calculated similarly.
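The inner-product rule can be sketched in a few lines of Python. This is only an illustration: the helper names `dot` and `gemm` are hypothetical, and only the row and column vectors quoted in the text are used, because the full matrices of fig. 1 are not reproduced here.

```python
def dot(row, col):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(row, col))


def gemm(A, B):
    """Naive general matrix multiplication: C[m][n] = (row m of A) . (column n of B)."""
    cols_of_B = list(zip(*B))  # transpose B so that each column becomes a tuple
    return [[dot(row, col) for col in cols_of_B] for row in A]


# Row 1 of matrix A and column 1 of matrix B, as quoted in the text.
c11 = dot((0.6, 0.4, 0.5), (0.3, 1.1, 0.0))
print(round(c11, 2))  # 0.62
```

Each of the M×N result elements needs one inner product of length K, which is where the cubic cost of general matrix multiplication comes from.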

With general matrix multiplication, multiplying floating-point matrices (float matrices) requires hardware that supports multiplication of float-type data; the multiplication cannot be completed on a device that supports only integer (int8) data calculation.

In addition, the multiplication of float-type data used in general matrix multiplication is time-consuming on the computing device and occupies a large amount of memory.

(II) Quantized matrix multiplication.

Google implements a quantization and dequantization method in TensorFlow in which the maximum (max) and minimum (min) values used for dequantization are estimated during training. Both values then remain unchanged during the dequantization phase, so the element values of all elements of the float result matrix are dequantized back into the fixed [min, max] interval.

Because the min and max required for dequantization in quantized matrix multiplication are fixed empirical values, applying the same fixed values to every matrix multiplication causes a large calculation error.

Furthermore, because the estimated min and max must satisfy the requirements of all matrix multiplications, computing them is complicated.

Disclosure of Invention

It is an object of exemplary embodiments of the present invention to provide a matrix multiplication method and apparatus that overcomes at least one of the above-mentioned drawbacks.

In a general aspect, there is provided a matrix multiplication method, comprising: determining a first multiplication matrix and a second multiplication matrix according to an input multiplicand matrix and an input multiplier matrix; determining a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix; determining a matrix reduction constraint value according to the determined matrix to be reduced; and determining the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced. Based on the matrix multiplication calculation method, the matrix multiplication speed can be improved.

Optionally, the step of determining the first multiplication matrix and the second multiplication matrix from the input multiplicand matrix and the input multiplier matrix may comprise: each element in the input multiplicand matrix and the input multiplier matrix is quantized based on a positive number interval and a negative number interval of the multiplication matrix value range to obtain a first multiplication matrix and a second multiplication matrix, wherein the numerical range of the positive number interval and the numerical range of the negative number interval of the multiplication matrix value range are asymmetric. Based on the matrix multiplication calculation method, calculation errors can be effectively reduced, and calculation accuracy is improved.

Optionally, the step of determining a matrix reduction constraint value according to the determined matrix to be reduced may include: determining the position of a preset reference value in the matrix to be reduced; extracting, according to the position of the preset reference value, the row vector and the column vector corresponding to that position from the input multiplicand matrix and the input multiplier matrix, respectively; and determining the matrix reduction constraint value using the extracted row vector and column vector. Based on the matrix multiplication calculation method, the accuracy of matrix reduction can be effectively improved.

Optionally, the matrix reduction constraint values may include a constraint maximum value and a constraint minimum value, and the preset reference values may include an element maximum value and an element minimum value. The constraint maximum value may be determined by: determining a first position at which the element maximum value is located in the matrix to be reduced; extracting, from the input multiplicand matrix, a first row vector corresponding to the row number of the first position; extracting, from the input multiplier matrix, a first column vector corresponding to the column number of the first position; and multiplying the extracted first row vector by the first column vector to obtain the constraint maximum value. Additionally or alternatively, the constraint minimum value may be determined by: determining a second position at which the element minimum value is located in the matrix to be reduced; extracting, from the input multiplicand matrix, a second row vector corresponding to the row number of the second position; extracting, from the input multiplier matrix, a second column vector corresponding to the column number of the second position; and multiplying the extracted second row vector by the second column vector to obtain the constraint minimum value. Based on the matrix multiplication calculation method, the accuracy of determining the values of each element in the reduction matrix can be improved.

Optionally, the step of determining the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced may include: determining the position of a matrix reduction constraint value in a reduction matrix according to the position of a preset reference value in the matrix to be reduced; and reducing other elements except the position of the preset reference value in the matrix to be reduced according to the matrix reduction constraint value to obtain a reduction matrix, and determining the reduction matrix as a multiplication result of the input multiplicand matrix and the input multiplier matrix. Based on the matrix multiplication calculation method, the matrix reduction speed can be improved, and the matrix reduction accuracy can be improved.

Optionally, the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced. Based on the matrix multiplication calculation method, the matrix reduction speed is improved.

Optionally, the matrix reduction constraint value may include a constraint maximum value and a constraint minimum value, and the preset reference value may include an element maximum value and an element minimum value, where the step of reducing, according to the matrix reduction constraint value, other elements in the matrix to be reduced except for the position where the preset reference value is located, to obtain a reduction matrix may include: determining the constraint maximum value as a reduction element maximum value of a reduction matrix, and determining the position of the element maximum value in the matrix to be reduced as the position of the constraint maximum value in the reduction matrix; determining the constraint minimum value as a reduction element minimum value of a reduction matrix, and determining the position of the element minimum value in the matrix to be reduced as the position of the constraint minimum value in the reduction matrix; and reducing the other elements according to the maximum value, the minimum value, the constraint maximum value and the constraint minimum value of the reduction elements to obtain a reduction matrix. Based on the matrix multiplication calculation method, the accuracy of matrix restoration can be ensured and the speed of matrix restoration can be effectively improved by dynamically determining the constraint maximum value and the constraint minimum value.

Optionally, any other element in the matrix to be reduced may be reduced by: if the element value of any other element is a positive number, calculating the product of the element value of any other element and the maximum value of the reduced element, calculating the ratio of the product to the maximum value of the element, and determining the ratio as the element value of any other element after the reduction; if the element value of any other element is a negative number, calculating the product of the element value of any other element and the minimum value of the reduced element, calculating the ratio of the product to the minimum value of the element, and determining the ratio as the element value of any other element after the reduction; and if the element value of any other element is zero, the element value of any other element after the reduction is still zero. Based on the matrix multiplication calculation method, the accuracy of determining the values of each element in the reduction matrix is improved.
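The three-case reduction rule above can be sketched in Python as follows. This is a minimal sketch rather than the patented implementation: the function name `restore_element` and its parameter names are hypothetical, and the numbers in the usage lines are made up purely for illustration.

```python
def restore_element(q, elem_max, elem_min, constraint_max, constraint_min):
    """Reduce (dequantize) one element q of the matrix to be reduced.

    elem_max / elem_min: largest / smallest element of the matrix to be reduced.
    constraint_max / constraint_min: float values placed at those two positions
    in the reduction matrix (the "reduction element" maximum and minimum).
    """
    if q > 0:
        # q > 0 implies elem_max >= q > 0, so the division is safe.
        return q * constraint_max / elem_max
    if q < 0:
        # q < 0 implies elem_min <= q < 0, so the division is safe.
        return q * constraint_min / elem_min
    return 0.0  # zero stays zero


# Hypothetical values, chosen only to show the arithmetic.
print(restore_element(400, 1000, -500, 2.5, -1.25))   # 1.0
print(restore_element(-100, 1000, -500, 2.5, -1.25))  # -0.25
```

Positive and negative elements are scaled by different ratios, so the sign of each element is preserved and the two extreme positions map exactly onto the constraint maximum and constraint minimum.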

Optionally, the input multiplicand matrix and the input multiplier matrix may comprise floating-point matrices, and the first multiplication matrix and the second multiplication matrix may comprise integer matrices. Based on the matrix multiplication calculation method, the matrix multiplication speed of floating-point matrices can be effectively improved.

In another general aspect, there is provided a matrix multiplication computing device, comprising: a quantization module for determining a first multiplication matrix and a second multiplication matrix according to an input multiplicand matrix and an input multiplier matrix; a matrix operation module for determining a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix; a constraint value determining module for determining a matrix reduction constraint value according to the determined matrix to be reduced; and a matrix reduction module for determining the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced. Based on the matrix multiplication computing device, the speed of matrix multiplication can be improved and calculation errors can be reduced.

Optionally, the quantization module may quantize each element in the input multiplicand matrix and the input multiplier matrix based on a positive number interval and a negative number interval of the multiplication matrix value range to obtain a first multiplication matrix and a second multiplication matrix, wherein a numerical range of the positive number interval and a numerical range of the negative number interval of the multiplication matrix value range are asymmetric. Based on the matrix multiplication calculating device, calculation errors can be effectively reduced, and calculation accuracy is improved.

Optionally, the constraint value determination module may include: a position determination submodule for determining the position of a preset reference value in the matrix to be reduced; a vector extraction submodule for extracting, according to the position of the preset reference value, the row vector and the column vector corresponding to that position from the input multiplicand matrix and the input multiplier matrix, respectively; and a reduction constraint value determination submodule for determining the matrix reduction constraint value using the extracted row vector and column vector. Based on the matrix multiplication computing device, the accuracy of matrix reduction can be effectively improved.

Optionally, the matrix reduction constraint value may include a constraint maximum value and a constraint minimum value, and the preset reference value may include an element maximum value and an element minimum value, where the position determining submodule determines a first position where the element maximum value in the matrix to be reduced is located, the vector extracting submodule extracts a first row vector corresponding to a row number where the first position is located from the input multiplicand matrix, extracts a first column vector corresponding to a column number where the first position is located from the input multiplier matrix, the reduction constraint value determining submodule multiplies the extracted first row vector with the first column vector to obtain the constraint maximum value, and/or the position determining submodule determines a second position where the element minimum value in the matrix to be reduced is located, the vector extracting submodule extracts a second row vector corresponding to the row number where the second position is located from the input multiplicand matrix, and extracts a second column vector corresponding to the column number where the second position is located from the input multiplier matrix, and the reduction constraint value determining submodule multiplies the extracted second row vector with the second column vector to obtain the constraint minimum value. Based on the matrix multiplication calculating device, the accuracy of determining the values of each element in the reduction matrix can be improved.

Optionally, the matrix reduction module may determine a position of a matrix reduction constraint value in the reduction matrix according to a position of a preset reference value in the matrix to be reduced, reduce other elements in the matrix to be reduced except for the position of the preset reference value according to the matrix reduction constraint value, obtain a reduction matrix, and determine the reduction matrix as a multiplication result of the input multiplicand matrix and the input multiplier matrix. Based on the matrix multiplication calculating device, the matrix reduction speed can be improved, and the matrix reduction accuracy can be improved.

Optionally, the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced. Based on the matrix multiplication calculating device, the speed of matrix reduction is improved.

Optionally, the matrix reduction constraint value may include a constraint maximum value and a constraint minimum value, and the preset reference value may include an element maximum value and an element minimum value, where the matrix reduction module may determine the constraint maximum value as the reduction element maximum value of the reduction matrix, determine the position of the element maximum value in the matrix to be reduced as the position of the constraint maximum value in the reduction matrix, determine the constraint minimum value as the reduction element minimum value of the reduction matrix, determine the position of the element minimum value in the matrix to be reduced as the position of the constraint minimum value in the reduction matrix, and reduce the other elements according to the reduction element maximum value, the reduction element minimum value, the constraint maximum value and the constraint minimum value, to obtain the reduction matrix. Based on the matrix multiplication computing device, the accuracy of matrix reduction can be ensured and the speed of matrix reduction can be effectively improved by dynamically determining the constraint maximum value and the constraint minimum value.

Optionally, the matrix reduction module may reduce any other element in the matrix to be reduced by: if the element value of any other element is a positive number, calculating the product of the element value of any other element and the maximum value of the reduced element, calculating the ratio of the product to the maximum value of the element, and determining the ratio as the element value of any other element after the reduction; if the element value of any other element is a negative number, calculating the product of the element value of any other element and the minimum value of the reduced element, calculating the ratio of the product to the minimum value of the element, and determining the ratio as the element value of any other element after the reduction; and if the element value of any other element is zero, the element value of any other element after the reduction is still zero. Based on the matrix multiplication calculation device, the accuracy of determining the values of each element in the reduction matrix is improved.

Optionally, the input multiplicand matrix and the input multiplier matrix may comprise floating-point matrices, and the first multiplication matrix and the second multiplication matrix may comprise integer matrices. Based on the matrix multiplication computing device, the matrix multiplication speed of floating-point matrices can be effectively improved.

In another general aspect, there is provided a computing device, the computing device comprising: a processor; and a memory storing a computer program which, when executed by the processor, implements the matrix multiplication method described above.

In another general aspect, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the matrix multiplication method described above.

By adopting the matrix multiplication calculation method and the matrix multiplication calculation device, the calculation error can be reduced and the matrix multiplication speed can be improved.

Drawings

The foregoing and other objects, features, and advantages of exemplary embodiments of the invention will become more apparent from the following detailed description, taken in conjunction with the accompanying drawings that illustrate exemplary embodiments in which:

FIG. 1 shows a schematic diagram of a conventional general matrix multiplication;

FIG. 2 illustrates a flow chart of a matrix multiplication calculation method according to an exemplary embodiment of the invention;

Fig. 3 shows a schematic diagram of a quantization process according to an exemplary embodiment of the present invention;

FIG. 4 is a flowchart illustrating the steps of determining a matrix reduction constraint value according to an exemplary embodiment of the present invention;

FIG. 5 illustrates a schematic diagram of determining the locations of constraint maxima and constraint minima in a reduction matrix according to an exemplary embodiment of the present invention;

FIG. 6 is a flowchart illustrating steps for determining a constraint maximum in accordance with an exemplary embodiment of the present invention;

FIG. 7 is a flowchart illustrating the steps of determining a constraint minimum in accordance with an exemplary embodiment of the present invention;

FIG. 8 illustrates a schematic diagram of determining a constraint minimum according to an exemplary embodiment of the present invention;

FIG. 9 is a flowchart illustrating steps of determining a reduction matrix according to an exemplary embodiment of the present invention;

Fig. 10 shows a schematic diagram of obtaining a matrix to be reduced according to an exemplary embodiment of the present invention;

Fig. 11 illustrates a schematic diagram of reducing the matrix to be reduced illustrated in fig. 10 to obtain a reduction matrix according to an exemplary embodiment of the present invention;

FIG. 12 illustrates a schematic diagram of a matrix multiplication run time versus an existing generic matrix multiplication run time of a matrix multiplication calculation method according to an exemplary embodiment of the present invention;

FIG. 13 illustrates a block diagram of a matrix multiplication computing device in accordance with an exemplary embodiment of the present invention;

FIG. 14 illustrates a block diagram of a constraint value determination module in accordance with an exemplary embodiment of the present invention;

FIG. 15 shows a block diagram of a computing device according to an exemplary embodiment of the invention.

Detailed Description

Various example embodiments will now be described more fully with reference to the accompanying drawings, in which some example embodiments are shown.

Fig. 2 shows a flowchart of a matrix multiplication calculation method according to an exemplary embodiment of the present invention.

Referring to fig. 2, in step S10, a first multiplication matrix and a second multiplication matrix are determined from an input multiplicand matrix and an input multiplier matrix.

Here, it should be appreciated that the input multiplicand matrix and the input multiplier matrix may be generated from any signal that requires matrix multiplication, such as, but not limited to, image signals, handwriting input signals, and speech recognition input signals.

That is, the matrix multiplication method of the exemplary embodiment of the present invention can be used in the analysis and recognition process of various signals. For example, in an example of analyzing and identifying signals by using a neural network, the matrix multiplication method described above may be applied to matrix multiplication operations of the neural network.

In a preferred example, the first multiplication matrix and the second multiplication matrix may be determined by: determining the maximum value of absolute values in all elements of the input multiplier matrix and the input multiplicand matrix, and respectively quantizing each element in the input multiplier matrix and the input multiplicand matrix based on the determined maximum value of absolute values to obtain a first multiplication matrix and a second multiplication matrix. For example, a first multiplication matrix may be obtained by quantizing each element in the input multiplicand matrix, and a second multiplication matrix may be obtained by quantizing each element in the input multiplier matrix.

By way of example, the input multiplicand matrix and the input multiplier matrix may include, but are not limited to, a floating-point matrix (hereinafter referred to as float matrix), where the data type of each element in the floating-point matrix is a floating-point data type. As an example, the first multiplication matrix and the second multiplication matrix may include, but are not limited to, integer matrices (hereinafter referred to as int8 matrices), where the data types of the elements in the integer matrices are integer data types. That is, float type data is quantized into int8 type data in step S10.

For example, the element values of the elements in the input multiplicand matrix and the input multiplier matrix may include, but are not limited to, at least one of the following: positive numbers, negative numbers, and zero. That is, the matrix multiplication method according to the exemplary embodiment of the present invention can handle float matrices that contain positive numbers, negative numbers, and zeros at the same time.

Fig. 3 shows a schematic diagram of a quantization process according to an exemplary embodiment of the present invention.

In an exemplary embodiment of the present invention, a symmetric quantization method is employed to quantize the input multiplicand matrix and the input multiplier matrix. In the quantization process, zero is kept unchanged, and the element values of the float matrices that fall in the positive interval and in the negative interval are quantized separately.

That is, each element in the input multiplicand matrix and the input multiplier matrix is quantized based on the positive interval and the negative interval of the multiplication matrix value range to obtain the first multiplication matrix and the second multiplication matrix.

As shown in fig. 3, -float_max represents the maximum negative value of the element values in the float matrix, and float_max represents the maximum positive value of the element values in the float matrix. The positive interval of the multiplication matrix value range is (0,127) and the negative interval is [-128,0], so the quantized int8 data has a larger representable range for negative numbers. Here, the numerical ranges of the positive interval and the negative interval are asymmetric: the negative interval has one more quantization value than the positive interval, which improves quantization accuracy.

Preferably, the specific quantization process may be as follows. When the element value of an element in the input multiplicand matrix or the input multiplier matrix is positive, the element may be quantized based on the positive interval of the multiplication matrix value range and the determined maximum absolute value over all elements of the input multiplicand matrix and the input multiplier matrix.

In this case, an element whose value is positive can be quantized by: calculating the ratio of the element value to the determined maximum absolute value, calculating the product of the ratio and the upper limit value of the positive interval, and determining the calculated product as the quantized element value.

When the element value of an element in the input multiplicand matrix or the input multiplier matrix is negative, the element may be quantized based on the negative interval of the multiplication matrix value range and the determined maximum absolute value over all elements of the input multiplicand matrix and the input multiplier matrix.

In this case, an element whose value is negative can be quantized by: calculating the ratio of the element value to the determined maximum absolute value, calculating the product of the ratio and the absolute value of the lower limit value of the negative interval, and determining the calculated product as the quantized element value.

When the element value of an element in the input multiplicand matrix or the input multiplier matrix is zero, the quantized element value is still zero.

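For example, any element can be quantized using the following formula:

$$
\text{int8\_value} =
\begin{cases}
\dfrac{\text{float\_value}}{\max} \times 127, & \text{float\_value} > 0 \\[4pt]
0, & \text{float\_value} = 0 \\[4pt]
\dfrac{\text{float\_value}}{\max} \times 128, & \text{float\_value} < 0
\end{cases}
\tag{1}
$$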

In formula (1), float_value represents the element value of any element in the float matrix, max represents the maximum absolute value over all elements of the input multiplicand matrix and the input multiplier matrix, and int8_value represents the quantized value of that element in the int8 matrix.

In the quantization process described above, the maximum absolute value over all elements of the input float matrices must be calculated, and each element is then quantized according to that maximum. The time complexity of the quantization process is O(n²), where n represents the matrix dimension.
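The quantization step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the names `max_abs` and `quantize` are hypothetical, rounding to the nearest integer with Python's built-in `round` is an assumption (the text only says the product is taken as the quantized value), and `abs_max` is assumed to be greater than zero.

```python
def max_abs(*matrices):
    """Largest absolute value over all elements of the given matrices."""
    return max(abs(x) for m in matrices for row in m for x in row)


def quantize(matrix, abs_max):
    """Map float elements into the int8 range [-128, 127], keeping zero at zero."""
    def q(x):
        if x > 0:
            return round(x / abs_max * 127)  # positive interval, upper limit 127
        if x < 0:
            return round(x / abs_max * 128)  # negative interval, |lower limit| 128
        return 0
    return [[q(x) for x in row] for row in matrix]


A = [[2.0, -2.0, 0.0, 0.5]]
print(quantize(A, max_abs(A)))  # [[127, -128, 0, 32]]
```

Both helpers visit every element once, which matches the O(n²) cost stated above for an n×n matrix.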

Returning to fig. 2, in step S20, a matrix to be reduced is determined according to the determined first multiplication matrix and second multiplication matrix.

For example, the first multiplication matrix and the second multiplication matrix may be multiplied, and the multiplication result is determined as the matrix to be reduced. As an example, the matrix multiplication of the first multiplication matrix and the second multiplication matrix may be performed using general matrix multiplication (GEMM).

Here, since the first multiplication matrix and the second multiplication matrix are both int8 matrices, the above matrix multiplication is a multiplication of two int8 matrices. That is, the matrix multiplication can be performed on a device that supports int8-type data, and its time complexity is O(n³).
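The integer multiplication of step S20 can be sketched in Python as follows. The function name `int_matmul` is hypothetical. Python integers do not overflow, so the sketch sidesteps the accumulator width that a real int8 device would have to choose; that hardware detail is not specified in the text.

```python
def int_matmul(Q1, Q2):
    """Multiply two integer matrices given as lists of rows."""
    inner = len(Q2)
    assert all(len(row) == inner for row in Q1), "inner dimensions must match"
    cols = len(Q2[0])
    return [
        [sum(Q1[i][k] * Q2[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(Q1))
    ]


Q1 = [[127, -128], [0, 16]]
Q2 = [[1, 2], [3, 4]]
print(int_matmul(Q1, Q2))  # [[-257, -258], [48, 64]]
```

The three nested loops (rows, columns, inner index) give the O(n³) cost mentioned above. Note that the products already leave the int8 range, which is why the result is only a matrix to be reduced and not yet the final answer.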

In step S30, a matrix reduction constraint value is determined according to the determined matrix to be reduced.

Here, the matrix reduction constraint value may refer to a constraint value for dequantizing a matrix to be reduced. The process of determining the matrix reduction constraint values from the matrix to be reduced is described below with reference to fig. 4.

Fig. 4 shows a flowchart of the steps of determining a matrix reduction constraint value according to an exemplary embodiment of the present invention.

Referring to fig. 4, in step S301, a position where a preset reference value in a matrix to be reduced is located is determined. Here, the preset reference value is an element value of an element designated in advance in the matrix to be reduced.

In step S302, according to the position of the preset reference value, a row vector and a column vector corresponding to the position of the preset reference value in the matrix to be reduced are extracted from the input multiplicand matrix and the input multiplier matrix, respectively.

For example, a row vector corresponding to a row number of a position where a preset reference value is located is extracted from the input multiplicand matrix, and a column vector corresponding to a column number of a position where a preset reference value is located is extracted from the input multiplier matrix.

In step S303, a matrix reduction constraint value is determined using the extracted row vector and column vector.

For example, the extracted row vector may be multiplied by the column vector and the multiplication result determined as a matrix reduction constraint value.
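Steps S301 to S303 can be sketched as a single inner product. The helper name `constraint_value` and the zero-based `(row, column)` position are assumptions made for illustration.

```python
def constraint_value(multiplicand, multiplier, position):
    """Inner product of the multiplicand row and multiplier column at `position`.

    `position` is the zero-based (row, column) of the preset reference value
    in the matrix to be reduced.
    """
    row_idx, col_idx = position
    row_vec = multiplicand[row_idx]                 # row of the input multiplicand matrix
    col_vec = [r[col_idx] for r in multiplier]      # column of the input multiplier matrix
    return sum(x * y for x, y in zip(row_vec, col_vec))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 0.0], [0.0, 0.5]]
print(constraint_value(A, B, (1, 0)))
```

Because only one row and one column are touched, the cost is linear in the vector length, regardless of the matrix size.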

In a preferred example, the matrix reduction constraint values may include constraint maximum values and constraint minimum values, and the preset reference values may include element maximum values and element minimum values, where the element maximum values may refer to maximum values among the element values of all elements in the matrix to be reduced, and the element minimum values may refer to minimum values among the element values of all elements in the matrix to be reduced.

As an example, the position of the constraint maximum in the reduction matrix coincides with the position of the element maximum in the matrix to be reduced, and the position of the constraint minimum in the reduction matrix coincides with the position of the element minimum in the matrix to be reduced.

FIG. 5 illustrates a schematic diagram of determining the locations of constraint maxima and constraint minima in a reduction matrix according to an exemplary embodiment of the present invention.

As shown in fig. 5, in the process of performing matrix multiplication on the first multiplication matrix and the second multiplication matrix and generating a matrix to be reduced, the positions of the maximum value and the minimum value of the elements in the matrix to be reduced are recorded.
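Recording the two positions while the product is being generated can be sketched as below. The function name `matmul_with_extrema` is hypothetical, and positions are zero-based `(row, column)` tuples; ties keep the first occurrence, which is a choice of this sketch.

```python
def matmul_with_extrema(a, b):
    """Multiply a by b and record where the largest and smallest results occur."""
    inner, cols = len(b), len(b[0])
    result = []
    max_val = min_val = None
    max_pos = min_pos = None
    for i, row in enumerate(a):
        out_row = []
        for j in range(cols):
            acc = sum(row[t] * b[t][j] for t in range(inner))
            out_row.append(acc)
            # Update the running extrema as each element is produced,
            # so no second pass over the result is needed.
            if max_val is None or acc > max_val:
                max_val, max_pos = acc, (i, j)
            if min_val is None or acc < min_val:
                min_val, min_pos = acc, (i, j)
        result.append(out_row)
    return result, max_pos, min_pos


product, max_pos, min_pos = matmul_with_extrema([[1, -2], [3, 4]], [[5, 6], [7, -8]])
print(product, max_pos, min_pos)
```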

In the exemplary embodiment of the present invention, by analyzing the multiplication result of the int8 matrix (i.e., the matrix to be reduced) and the multiplication result of the float matrix (i.e., the reduction matrix), it is known that the multiplication results of the two types of matrices exhibit a proportional relationship, as shown in the following formula:

In formula (2), r_ij represents the element value of the element in the ith row and jth column of the multiplication result of the int8 matrices, fr_ij represents the element value of the element in the ith row and jth column of the multiplication result of the float matrices, max₁ represents the maximum of the absolute values of all elements in the input multiplicand matrix, max₂ represents the maximum of the absolute values of all elements in the input multiplier matrix, 127 is the upper limit value of the value range of the int8 type data, and 128 is the absolute value of the lower limit value (-128) of that range.
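The proportional relationship can be motivated by the following sketch. It is a hedged reconstruction, not the patent's own formula: it assumes the quantization rule described for the quantization module (positive elements scaled by 127, negative elements by 128, each relative to the maximum absolute value of its matrix) and ignores rounding. The symbols a_ik, b_kj, q, s and c are introduced here for illustration only.

```latex
q(x) = \frac{s(x)\,x}{\mathit{max}}, \qquad
s(x) = \begin{cases} 127, & x > 0 \\ 128, & x < 0 \end{cases}

r_{ij} = \sum_k q(a_{ik})\,q(b_{kj})
       = \frac{1}{\mathit{max}_1\,\mathit{max}_2} \sum_k s(a_{ik})\,s(b_{kj})\,a_{ik}\,b_{kj},
\qquad
fr_{ij} = \sum_k a_{ik}\,b_{kj}

127^2 \le s(a_{ik})\,s(b_{kj}) \le 128^2
\;\Longrightarrow\;
r_{ij} \approx \frac{c}{\mathit{max}_1\,\mathit{max}_2}\, fr_{ij},
\qquad 127^2 \le c \le 128^2
```

Since 127² = 16129 and 128² = 16384 differ by less than 2%, every summand is scaled by nearly the same factor, which is why the two results are approximately, rather than exactly, proportional.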

Based on the element value proportion relationship presented by the multiplication results of the two types of matrixes, the position of the maximum value in the multiplication result of the int8 matrix is consistent with the position of the maximum value in the multiplication result of the float matrix, and the position of the minimum value in the multiplication result of the int8 matrix is consistent with the position of the minimum value in the multiplication result of the float matrix.

It should be understood that in the above preferred embodiment, the position of the matrix reduction constraint value in the reduction matrix is determined based on the premise that the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced. It should be appreciated that there may be situations where the two locations are not identical, but even if there is an inconsistency, the two locations are relatively close, so the above-described manner of determining the locations of the matrix reduction constraint values in the reduction matrix may be considered accurate.

Here, a deviation in the position determination may occur because the negative section of the value range of the int8 type data (down to -128) is larger than the positive section (up to 127). However, the benefit of having one additional quantization value in the negative section outweighs the negative effect of the position deviation, so the calculation accuracy of the matrix multiplication calculation method is not affected, and the calculation error can be reduced while the speed of matrix multiplication is improved.

The steps of determining the constraint maximum and constraint minimum are described below with reference to fig. 6 and 7, respectively. Here, the time complexity of determining the constraint maximum and the constraint minimum is O(n²).

Fig. 6 shows a flowchart of the steps of determining a constraint maximum according to an exemplary embodiment of the present invention.

Referring to fig. 6, in step S31-1, a first position where the maximum value of the element in the matrix to be reduced is located is determined.

In step S31-2, a first row vector corresponding to the row number at the first position is extracted from the input multiplicand matrix.

In step S31-3, a first column vector corresponding to the column number at the first position is extracted from the input multiplier matrix.

In step S31-4, the extracted first row vector is multiplied by the first column vector to obtain a constraint maximum.

The step of determining the constraint minimum is described below in connection with fig. 7 and 8.

Fig. 7 shows a flowchart of the steps of determining a constraint minimum according to an exemplary embodiment of the present invention.

Referring to fig. 7, in step S32-1, a second position where the minimum value of the element in the matrix to be reduced is located is determined.

In step S32-2, a second row vector corresponding to the row number at the second position is extracted from the input multiplicand matrix.

In step S32-3, a second column vector corresponding to the column number at the second position is extracted from the input multiplier matrix.

In the example shown in fig. 8, assuming that the element minimum value in the multiplication result of the int8 matrices is located in the 5th row and the 4th column, the row vector of the 5th row is extracted from the input multiplicand matrix, and the column vector of the 4th column is extracted from the input multiplier matrix.

In step S32-4, the extracted second row vector is multiplied by the second column vector to obtain a constraint minimum.

For example, the constraint minimum may be obtained by performing a vector inner product on the second row vector and the second column vector once.
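Steps S31-1 to S31-4 and S32-1 to S32-4 can be sketched together as follows. The function name `constraint_extrema` is hypothetical, and the sample matrices are made-up values chosen only so that the arithmetic is easy to follow; they are not taken from the figures.

```python
def constraint_extrema(to_be_reduced, multiplicand, multiplier):
    """Return (constraint maximum, constraint minimum).

    The positions come from the int8 product (matrix to be reduced);
    the values come from the original float inputs.
    """
    cells = [(v, i, j) for i, row in enumerate(to_be_reduced)
             for j, v in enumerate(row)]
    _, max_i, max_j = max(cells, key=lambda c: c[0])   # first position
    _, min_i, min_j = min(cells, key=lambda c: c[0])   # second position

    def inner(i, j):
        # One vector inner product: row i of the multiplicand, column j of the multiplier.
        return sum(multiplicand[i][t] * multiplier[t][j]
                   for t in range(len(multiplier)))

    return inner(max_i, max_j), inner(min_i, min_j)


int8_product = [[64, 32], [-32, -64]]          # hypothetical matrix to be reduced
float_a = [[1.0, 0.0], [0.0, -1.0]]            # hypothetical input multiplicand matrix
float_b = [[0.5, 0.25], [0.25, 0.5]]           # hypothetical input multiplier matrix
print(constraint_extrema(int8_product, float_a, float_b))
```

Locating the two positions scans every element once, and each constraint value then costs a single inner product.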

In the matrix multiplication calculation method according to the exemplary embodiment of the present invention, the constraint maximum value and the constraint minimum value for reduction (inverse quantization) are calculated by the float matrix, that is, the constraint maximum value and the constraint minimum value are changed each time matrix multiplication is performed, and are not fixed values, so that calculation errors caused by the fixed maximum value and the fixed minimum value in the existing quantization matrix multiplication can be effectively reduced.

Returning to fig. 2, in step S40, a multiplication result of the input multiplicand matrix and the input multiplier matrix is determined according to the determined matrix reduction constraint value and the matrix to be reduced.

For example, the position of the matrix reduction constraint value in the reduction matrix can be determined according to the position of the preset reference value in the matrix to be reduced, other elements except the position of the preset reference value in the matrix to be reduced are reduced according to the matrix reduction constraint value, the reduction matrix is obtained, and the reduction matrix is determined as the multiplication result of the input multiplicand matrix and the input multiplier matrix. In a preferred example, the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced.

And for the situation that the matrix reduction constraint value comprises a constraint maximum value and a constraint minimum value, and the preset reference value comprises an element maximum value and an element minimum value, based on the element value proportion relation presented by the multiplication results of the two types of matrices, the positions of the constraint maximum value and the constraint minimum value in the reduction matrix are reversely deduced through the positions of the element maximum value and the element minimum value in the matrix to be reduced.

The process of determining the reduction matrix is described below with reference to fig. 9.

Fig. 9 shows a flowchart of the steps of determining a reduction matrix according to an exemplary embodiment of the present invention.

Referring to fig. 9, in step S401, the position of the maximum value of the reduction element in the reduction matrix is determined.

Here, the reduction element maximum value may refer to the maximum value among the element values of all elements in the reduction matrix. For example, the constraint maximum is determined as a reduction element maximum of the reduction matrix, and the position of the element maximum in the matrix to be reduced is determined as the position of the reduction element maximum in the reduction matrix.

That is, the position where the maximum value of the element values of all the elements in the reduction matrix is located is the same as the position where the maximum value of the element values of all the elements in the matrix to be reduced is located.

In step S402, the position of the reduction element minimum value in the reduction matrix is determined.

Here, the reduction element minimum value may refer to the minimum value among the element values of all elements in the reduction matrix. For example, the constraint minimum value is determined as a reduction element minimum value of the reduction matrix, and the position of the element minimum value in the matrix to be reduced is determined as the position of the reduction element minimum value in the reduction matrix.

That is, the position of the minimum value of the element values of all the elements in the reduction matrix is the same as the position of the minimum value of the element values of all the elements in the matrix to be reduced.

In step S403, according to the reduction element maximum value, the reduction element minimum value, the constraint maximum value and the constraint minimum value, the elements of the matrix to be reduced other than those at the positions of the preset reference values are reduced to obtain the reduction matrix.

For example, when the element value of any other element is positive, the product of that element value and the reduction element maximum value may be calculated, the ratio of the product to the element maximum value is calculated, and the calculated ratio is determined as the element value of that element after reduction.

When the element value of any other element is negative, the product of that element value and the reduction element minimum value may be calculated, the ratio of the product to the element minimum value is calculated, and the calculated ratio is determined as the element value of that element after reduction.

For the case that the element value of any other element is zero, the element value of any other element after the reduction is still zero.

For example, the element value of any other element after reduction may be determined using the following formula:

$$\mathit{float\_result} = \begin{cases} \mathit{int8\_result} \times \mathit{fmax} / \mathit{rmax}, & \mathit{int8\_result} > 0 \\ \mathit{int8\_result} \times \mathit{fmin} / \mathit{rmin}, & \mathit{int8\_result} < 0 \\ 0, & \mathit{int8\_result} = 0 \end{cases} \qquad (3)$$

In formula (3), int8_result represents the element value of any element of the matrix to be reduced other than those at the positions of the preset reference values, float_result represents the element value of that element after reduction, fmax represents the reduction element maximum value, fmin represents the reduction element minimum value, rmax represents the element maximum value in the matrix to be reduced, and rmin represents the element minimum value in the matrix to be reduced.

Here, the time complexity of performing the dequantization calculation for each element in the matrix to be reduced is O(n²).
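The reduction of step S403 can be sketched as follows. The function name `dequantize` and its argument layout are assumptions; the elements at the two reference positions are set directly to the constraint values, and every other element follows the positive, negative, or zero case described above.

```python
def dequantize(to_be_reduced, max_pos, min_pos, fmax, fmin):
    """Reduce (dequantize) an int8 product using the constraint maximum/minimum.

    max_pos / min_pos: zero-based (row, column) of the element maximum / minimum.
    fmax / fmin: constraint maximum / minimum computed from the float inputs.
    """
    rmax = to_be_reduced[max_pos[0]][max_pos[1]]
    rmin = to_be_reduced[min_pos[0]][min_pos[1]]
    reduced = []
    for i, row in enumerate(to_be_reduced):
        out_row = []
        for j, v in enumerate(row):
            if (i, j) == max_pos:
                out_row.append(fmax)               # exact value, no scaling needed
            elif (i, j) == min_pos:
                out_row.append(fmin)               # exact value, no scaling needed
            elif v > 0:
                out_row.append(v * fmax / rmax)    # positive case
            elif v < 0:
                out_row.append(v * fmin / rmin)    # negative case
            else:
                out_row.append(0.0)                # zero stays zero
        reduced.append(out_row)
    return reduced


print(dequantize([[100, 50], [0, -200]], (0, 0), (1, 1), 1.0, -4.0))
```

Each element is visited once, which matches the O(n²) cost stated above.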

The calculation flow of the matrix multiplication calculation method of the present invention is described below with reference to fig. 10 and 11, taking the multiplication of two 3×3 matrices as an example.

Fig. 10 shows a schematic diagram of obtaining a matrix to be reduced according to an exemplary embodiment of the present invention. Fig. 11 illustrates a schematic diagram of reducing the matrix to be reduced illustrated in fig. 10 to obtain a reduction matrix according to an exemplary embodiment of the present invention.

As shown in fig. 10, assuming that the input multiplicand matrix and the input multiplier matrix are a float matrix A and a float matrix B, respectively, an int8 matrix A is obtained by quantizing each element in the float matrix A, an int8 matrix B is obtained by quantizing each element in the float matrix B, and a matrix multiplication operation is performed on the int8 matrix A and the int8 matrix B to obtain the matrix to be reduced.

Since the element minimum value in the matrix to be reduced is located in the 3rd row and the 2nd column, the corresponding row vector and column vector can be extracted from the float matrix A and the float matrix B, respectively, and their inner product computed, for example, (0.3, -0.3, -0.5) · (0.1, 0.2, -0.7) = -0.41, so as to obtain the reduction element minimum value (constraint minimum value). The same calculation yields a reduction element maximum value (constraint maximum value) of 0.62.

As shown in fig. 11, according to the constraint maximum value and the constraint minimum value obtained by the above calculation, the elements other than those at the positions of the element maximum value and the element minimum value are dequantized. In the resulting reduction matrix, the reduction element maximum value and the reduction element minimum value are exact, and the other elements are dequantized relative to the constraint maximum value and the constraint minimum value. With the dequantized element values rounded to two decimal places, the multiplication result obtained by the matrix multiplication calculation method of the exemplary embodiment of the present invention is consistent with the result of directly multiplying the two float matrices.

Taking MNIST handwritten digit recognition in TensorFlow as an example, assuming that input trajectory analysis is performed based on a Recurrent Neural Network (RNN), a comparison of the recognition accuracy and execution speed of a conventional RNN and an RNN using the matrix multiplication calculation method of the present invention is shown in table 1:

TABLE 1

Application	TensorFlow MNIST RNN	MNIST RNN using the present invention
Recognition accuracy	98.03%	98.02%
Execution time	96.888 s	34.971 s

Taking the speech recognition system Deep Speech v1 as an example, a comparison of the word error rate (WER) of conventional Deep Speech v1 and Deep Speech v1 using the matrix multiplication calculation method of the present invention is shown in table 2:

TABLE 2

Application	Deep Speech v1	Deep Speech v1 using the present invention
WER on TensorFlow Lite	10.87%	12.33%
WER on TensorFlow	10.88%	11.65%

For matrices of different dimensions, a comparison of the matrix multiplication execution times of GEMM and of the matrix multiplication calculation method of the present invention is shown in table 3:

TABLE 3

Matrix dimension	GEMM	Matrix multiplication of the present invention
[100×100]×[100×100]	2.528 ms	1.358 ms
[1000×1000]×[1000×1000]	1241.372 ms	243.300 ms
[10000×10000]×[10000×10000]	1155.98 s	211.867 s

Fig. 12 shows a schematic diagram of a comparison of a matrix multiplication run time of a matrix multiplication calculation method according to an exemplary embodiment of the present invention with a matrix multiplication execution time of an existing general matrix multiplication.

As can be seen from fig. 12, as the matrix dimension increases, the matrix multiplication calculation method according to the present invention can effectively reduce the matrix multiplication execution time relative to the general matrix multiplication.
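The trend can be checked directly from the execution times reported in table 3. The short script below only divides the reported GEMM time by the reported time of the present method; the dictionary keys and variable names are illustrative.

```python
# Execution times from table 3, converted to seconds: (GEMM, present method).
timings = {
    "100x100": (2.528e-3, 1.358e-3),
    "1000x1000": (1241.372e-3, 243.300e-3),
    "10000x10000": (1155.98, 211.867),
}

# Speedup = GEMM time divided by the time of the present method.
speedups = {dim: gemm / ours for dim, (gemm, ours) in timings.items()}

for dim, factor in speedups.items():
    print(f"{dim}: {factor:.2f}x")
```

The ratio grows from under 2 for 100×100 matrices to more than 5 for the two larger sizes.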

Taking image recognition as an example, assuming that an input image is processed based on a Convolutional Neural Network (CNN), the comparison of the image recognition accuracy of a conventional CNN (see Google correspondence) with the image recognition accuracy of a CNN using the matrix multiplication calculation method shown in the present invention (see SRCX correspondence) is shown in table 4:

TABLE 4

In the quantization and inverse quantization processes of the present invention, the time complexity of the quantization process is O(n²), the time complexity of the matrix multiplication operation is O(n³), and the time complexity of the inverse quantization process is O(n²). Since O(n²)+O(n³)+O(n²)=O(n³), that is, the highest complexity among the processes determines the final time complexity, the overall time complexity is O(n³), which is the same as that of float matrix multiplication. Therefore, the matrix multiplication calculation process based on quantization and inverse quantization does not increase the time complexity of matrix multiplication, while improving performance by replacing float matrix multiplication with int8 matrix multiplication. Here, the time complexity may be determined in various ways, and the present invention is not limited in this regard.

In the matrix multiplication calculation method of the exemplary embodiment of the invention, a method of respectively quantizing positive and negative intervals is adopted in the process of quantizing an input float matrix, and each data representation in the int8 data type is fully utilized.

In addition, the matrix multiplication calculation method of the exemplary embodiment of the present invention performs matrix multiplication operation on the int8 matrix, and may be executed on a device supporting the int8 matrix operation.

In addition, after the int8 matrix multiplication is completed, the positions of the reduction element maximum value and the reduction element minimum value in the float multiplication result are inferred from the positions of the element maximum value and the element minimum value in the int8 matrix multiplication result, and the exact reduction element maximum value and reduction element minimum value are calculated from those positions.

In addition, other elements in the multiplication result of the int8 are restored by taking the calculated accurate constraint maximum and constraint minimum as a reference, so that the error after restoration is smaller, and the accuracy of inverse quantization is improved.

Fig. 13 shows a block diagram of a matrix multiplication computing device according to an exemplary embodiment of the present invention.

As shown in fig. 13, a matrix multiplication calculating apparatus according to an exemplary embodiment of the present invention includes: quantization module 10, matrix operation module 20, constraint value determination module 30, and matrix reduction module 40.

Specifically, the quantization module 10 determines a first multiplication matrix and a second multiplication matrix from the input multiplicand matrix and the input multiplier matrix.

As an example, the input multiplicand matrix and the input multiplier matrix may include, but are not limited to, floating point matrices, and the first multiplication matrix and the second multiplication matrix may include, but are not limited to, integer matrices.

For example, the quantization module 10 may determine a maximum value of absolute values in all elements of the input multiplicand matrix and the input multiplier matrix, and quantize each element of the input multiplicand matrix and the input multiplier matrix, respectively, based on the determined maximum value of absolute values, to obtain the first multiplication matrix and the second multiplication matrix.

As an example, the element values of each element in the input multiplicand matrix and the input multiplier matrix may include, but are not limited to, at least one of the following: positive number, negative number, zero.

For example, the quantization module 10 may quantize each element in the input multiplicand matrix and the input multiplier matrix based on the positive number interval and the negative number interval of the multiplication matrix value range to obtain the first multiplication matrix and the second multiplication matrix. Here, the numerical range of the positive number interval of the multiplication matrix value range is asymmetric to the numerical range of the negative number interval.

In a preferred example, the quantization module 10 may quantize any element of the input multiplier matrix and the input multiplicand matrix in the following manner.

For the case where the element value of any element is positive, quantization module 10 may quantize any element based on the positive interval of the multiplication matrix value range and the maximum value of the determined absolute value.

For example, quantization module 10 may quantize any element whose element value is positive by: calculating the ratio of the element value of any element to the maximum value of the determined absolute value, calculating the product of the ratio and the upper limit value of the positive number interval, and determining the calculated product as the quantized element value of any element.

For the case where the element value of any element is negative, quantization module 10 may quantize any element based on the negative interval of the multiplication matrix value range and the determined maximum value of the absolute value.

For example, quantization module 10 quantizes any element whose element value is negative by: calculating the ratio of the element value of any element to the maximum value of the determined absolute value, calculating the product of the ratio and the absolute value of the lower limit value of the negative number interval, and determining the calculated product as the quantized element value of any element.

For the case where the element value of any element is zero, the quantized element value of that element is still zero.
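The quantization performed by the quantization module 10 can be sketched as follows. The function name `quantize` is hypothetical, and rounding to the nearest integer with Python's built-in `round` is an assumption of this sketch, since the text above does not state a rounding rule.

```python
def quantize(matrix):
    """Quantize a float matrix to int8, treating positive and negative values separately."""
    max_abs = max(abs(v) for row in matrix for v in row)
    if max_abs == 0:
        # An all-zero matrix quantizes to all zeros.
        return [[0 for _ in row] for row in matrix]

    def quantize_element(v):
        if v > 0:
            return round(v / max_abs * 127)   # upper limit of the positive interval
        if v < 0:
            return round(v / max_abs * 128)   # absolute value of the lower limit
        return 0                              # zero stays zero

    return [[quantize_element(v) for v in row] for row in matrix]


print(quantize([[1.0, -1.0], [-0.5, 0.0]]))
```

Scaling negative values by 128 rather than 127 is what lets the quantized matrix use the full int8 range from -128 to 127.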

The matrix operation module 20 determines a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix.

For example, the matrix operation module 20 may multiply the first multiplication matrix by the second multiplication matrix and determine the multiplication result as the matrix to be reduced.

The constraint value determination module 30 determines a matrix reduction constraint value based on the determined matrix to be reduced.

Fig. 14 shows a block diagram of the constraint value determination module 30 according to an exemplary embodiment of the present invention.

As shown in fig. 14, the constraint value determination module 30 according to an exemplary embodiment of the present invention may include: a position determination sub-module 301, a vector extraction sub-module 302 and a restoration constraint value determination sub-module 303.

Specifically, the position determination sub-module 301 determines the position where the preset reference value in the matrix to be reduced is located.

The vector extraction sub-module 302 extracts a row vector and a column vector corresponding to the position of the preset reference value from the input multiplicand matrix and the input multiplier matrix, respectively, according to the position of the preset reference value.

The restoration constraint value determination submodule 303 determines a matrix restoration constraint value using the extracted row vector and column vector.

In a preferred example, the matrix reduction constraint values include constraint maximum values and constraint minimum values, and the preset reference values include element maximum values and element minimum values.

For example, the constraint maximum may be determined as follows: the position determination sub-module 301 determines a first position where the element maximum value in the matrix to be reduced is located; the vector extraction sub-module 302 extracts a first row vector corresponding to the row number of the first position from the input multiplicand matrix, and extracts a first column vector corresponding to the column number of the first position from the input multiplier matrix; and the restoration constraint value determination sub-module 303 multiplies the extracted first row vector by the first column vector to obtain the constraint maximum value.

For example, the constraint minimum may be determined as follows: the position determination sub-module 301 determines a second position where the element minimum value in the matrix to be reduced is located; the vector extraction sub-module 302 extracts a second row vector corresponding to the row number of the second position from the input multiplicand matrix, and extracts a second column vector corresponding to the column number of the second position from the input multiplier matrix; and the restoration constraint value determination sub-module 303 multiplies the extracted second row vector by the second column vector to obtain the constraint minimum value.

Returning to fig. 13, the matrix reduction module 40 determines a multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced.

For example, the matrix reduction module 40 may determine the position of the matrix reduction constraint value in the reduction matrix according to the position of the preset reference value in the matrix to be reduced, reduce other elements of the matrix to be reduced except for the position of the preset reference value according to the matrix reduction constraint value, obtain a reduction matrix, and determine the reduction matrix as the multiplication result of the input multiplicand matrix and the input multiplier matrix.

Preferably, the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced.

For the case that the matrix reduction constraint value includes a constraint maximum value and a constraint minimum value, and the preset reference value includes an element maximum value and an element minimum value, the matrix reduction module 40 may determine the constraint maximum value as the reduction element maximum value of the reduction matrix, determine the position of the element maximum value in the matrix to be reduced as the position of the constraint maximum value in the reduction matrix, determine the constraint minimum value as the reduction element minimum value of the reduction matrix, determine the position of the element minimum value in the matrix to be reduced as the position of the constraint minimum value in the reduction matrix, and reduce the other elements in the matrix to be reduced according to the reduction element maximum value, the reduction element minimum value, the constraint maximum value and the constraint minimum value to obtain the reduction matrix.

For the case that the element value of any other element is positive, the matrix reduction module 40 may calculate the product of that element value and the reduction element maximum value, calculate the ratio of the product to the element maximum value, and determine the ratio as the element value of that element after reduction.

For the case that the element value of any other element is negative, the matrix reduction module 40 may calculate the product of that element value and the reduction element minimum value, calculate the ratio of the product to the element minimum value, and determine the ratio as the element value of that element after reduction.
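The cooperation of the four modules can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, matrices are plain lists of rows, quantized values are rounded to the nearest integer (an assumption), and the test matrices are made-up values.

```python
def quantize(matrix):
    """Quantization module 10: float matrix to int8 matrix, asymmetric intervals."""
    max_abs = max(abs(v) for row in matrix for v in row)
    if max_abs == 0:
        return [[0 for _ in row] for row in matrix]
    # Positive values scale to [0, 127], negative values to [-128, 0].
    return [[round(v / max_abs * (127 if v > 0 else 128)) for v in row]
            for row in matrix]


def matmul(a, b):
    """Matrix operation module 20: plain matrix product."""
    return [[sum(row[t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for row in a]


def quantized_matmul(a, b):
    """Approximate the float product a x b through an int8 product."""
    r = matmul(quantize(a), quantize(b))            # matrix to be reduced
    cells = [(v, i, j) for i, row in enumerate(r) for j, v in enumerate(row)]
    rmax, max_i, max_j = max(cells, key=lambda c: c[0])
    rmin, min_i, min_j = min(cells, key=lambda c: c[0])
    # Constraint value determination module 30: exact float inner products.
    fmax = sum(a[max_i][t] * b[t][max_j] for t in range(len(b)))
    fmin = sum(a[min_i][t] * b[t][min_j] for t in range(len(b)))
    # Matrix reduction module 40: scale every other element.
    reduced = []
    for i, row in enumerate(r):
        out_row = []
        for j, v in enumerate(row):
            if (i, j) == (max_i, max_j):
                out_row.append(fmax)
            elif (i, j) == (min_i, min_j):
                out_row.append(fmin)
            elif v > 0:
                out_row.append(v * fmax / rmax)
            elif v < 0:
                out_row.append(v * fmin / rmin)
            else:
                out_row.append(0.0)
        reduced.append(out_row)
    return reduced


a = [[0.5, 1.0], [-1.0, 0.25]]   # hypothetical input multiplicand matrix
b = [[1.0, -0.5], [0.5, 1.0]]    # hypothetical input multiplier matrix
print(quantized_matmul(a, b))
print(matmul(a, b))              # direct float product for comparison
```

In this small case the two reference elements are reproduced exactly, and the remaining elements differ from the direct float product only by quantization error.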

There is also provided, in accordance with an exemplary embodiment of the present invention, a computing device. As shown in fig. 15, the computing device includes a processor 100 and a memory 200. The memory 200 is used for storing a computer program. The computer program is executed by the processor 100 to cause the processor 100 to perform the matrix multiplication calculation method as described above.

There is also provided, in accordance with an exemplary embodiment of the present invention, a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the matrix multiplication calculation method described above. The computer-readable storage medium is any data storage device that can store data readable by a computer system. Examples of the computer-readable storage medium include: read-only memory, random access memory, compact disc read-only memory (CD-ROM), magnetic tape, floppy disk, optical data storage device, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths).

The matrix multiplication calculation method and the matrix multiplication calculation device can be deployed in various servers, PC terminals and webpages, and can be applied to high-performance numerical calculation of Graphic Processors (GPU) and Tensor Processors (TPU).

According to the embodiment of the invention, a quantization and inverse quantization method capable of supporting float matrix multiplication on an int8 type data computing device is provided, and the matrix multiplication computing method and device improve the speed of float matrix multiplication while keeping small computing errors.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims

1. A matrix multiplication method, comprising:

Determining a first multiplication matrix and a second multiplication matrix according to an input multiplicand matrix and an input multiplier matrix, wherein the data types of the elements in the first multiplication matrix and the second multiplication matrix are integer data types, and the input multiplicand matrix and the input multiplier matrix are generated based on one signal of the following: image signals, handwriting input signals, voice recognition input signals;

determining a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix;

determining a matrix reduction constraint value according to the determined matrix to be reduced;

Determining the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced,

Wherein, according to the determined matrix to be reduced, the step of determining the matrix reduction constraint value comprises:

determining the position of a preset reference value in the matrix to be reduced,

extracting, according to the position of the preset reference value, a row vector and a column vector corresponding to the position from the input multiplicand matrix and the input multiplier matrix, respectively, and

determining the matrix reduction constraint value using the extracted row vector and column vector.

2. The matrix multiplication method of claim 1, wherein the step of determining the first multiplication matrix and the second multiplication matrix based on the input multiplicand matrix and the input multiplier matrix comprises:

quantizing each element in the input multiplicand matrix and the input multiplier matrix based on a positive number interval and a negative number interval of a multiplication matrix value range to obtain the first multiplication matrix and the second multiplication matrix,

wherein the range of the positive number interval of the multiplication matrix value range is asymmetric with respect to the range of the negative number interval.
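As a non-limiting illustration, the quantization of claim 2 can be sketched in Python as follows. The sketch assumes an int8 value range whose positive number interval is [0, 127] and whose negative number interval is [-128, 0], and assumes that positive elements are scaled by the largest positive element and negative elements by the smallest negative element. The function name `quantize`, its parameters, and this scaling rule are assumptions made for the example and are not recited in the claims.

```python
def quantize(matrix, pos_limit=127, neg_limit=-128):
    """Quantize a float matrix (a list of rows) to integers.

    Assumption: the largest positive element maps to pos_limit and the
    smallest negative element maps to neg_limit, so the positive and
    negative intervals of the value range are asymmetric.
    """
    flat = [x for row in matrix for x in row]
    max_pos = max((x for x in flat if x > 0), default=0.0)
    min_neg = min((x for x in flat if x < 0), default=0.0)

    def quantize_element(x):
        if x > 0:
            # max_pos > 0 whenever a positive element exists.
            return round(x / max_pos * pos_limit)
        if x < 0:
            # min_neg < 0 whenever a negative element exists.
            return round(x / min_neg * neg_limit)
        return 0

    return [[quantize_element(x) for x in row] for row in matrix]
```

Under these assumptions, `quantize([[0.5, -1.0], [2.0, -0.25]])` gives `[[32, -128], [127, -32]]`: the element 2.0 maps to 127 and the element -1.0 maps to -128.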

3. The matrix multiplication method according to claim 1, wherein the matrix reduction constraint value includes a constraint maximum value and a constraint minimum value, and the preset reference value includes an element maximum value and an element minimum value,

wherein the constraint maximum value is determined by:

determining a first position of the element maximum value in the matrix to be reduced,

extracting, from the input multiplicand matrix, a first row vector corresponding to the row number of the first position,

extracting, from the input multiplier matrix, a first column vector corresponding to the column number of the first position, and

multiplying the extracted first row vector by the first column vector to obtain the constraint maximum value,

and/or the constraint minimum value is determined by:

determining a second position of the element minimum value in the matrix to be reduced,

extracting, from the input multiplicand matrix, a second row vector corresponding to the row number of the second position,

extracting, from the input multiplier matrix, a second column vector corresponding to the column number of the second position, and

multiplying the extracted second row vector by the second column vector to obtain the constraint minimum value.
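As a non-limiting illustration, the determination of the constraint maximum value and the constraint minimum value in claim 3 can be sketched in Python as follows. The matrices are assumed to be lists of rows, and the names `constraint_values`, `a_float`, `b_float`, and `to_reduce` are hypothetical. Only two dot products are computed on the float inputs, one for the position of the element maximum value and one for the position of the element minimum value.

```python
def constraint_values(a_float, b_float, to_reduce):
    """Return ((max_position, constraint_max), (min_position, constraint_min)).

    a_float   -- input multiplicand matrix (floats)
    b_float   -- input multiplier matrix (floats)
    to_reduce -- integer product of the two quantized matrices
    """
    positions = [(i, j)
                 for i in range(len(to_reduce))
                 for j in range(len(to_reduce[0]))]
    # Positions of the element maximum and minimum in the matrix to be reduced.
    max_position = max(positions, key=lambda p: to_reduce[p[0]][p[1]])
    min_position = min(positions, key=lambda p: to_reduce[p[0]][p[1]])

    def row_times_column(position):
        # Multiply row i of the multiplicand by column j of the multiplier.
        i, j = position
        return sum(a_float[i][k] * b_float[k][j] for k in range(len(b_float)))

    return ((max_position, row_times_column(max_position)),
            (min_position, row_times_column(min_position)))
```

For example, with `a_float = [[1.0, 2.0], [3.0, 4.0]]`, `b_float = [[0.5, -1.0], [-1.5, 2.0]]`, and an assumed matrix to be reduced `[[-5000, 6000], [-9000, 10000]]`, the element maximum is at row 1, column 1 and the element minimum at row 1, column 0 (zero-based), so the constraint maximum value is 5.0 and the constraint minimum value is -4.5.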

4. The matrix multiplication method according to claim 1, wherein the step of determining the multiplication result of the input multiplicand matrix and the input multiplier matrix based on the determined matrix reduction constraint value and the matrix to be reduced comprises:

determining the position of the matrix reduction constraint value in a reduction matrix according to the position of the preset reference value in the matrix to be reduced;

and reducing the elements of the matrix to be reduced other than the element at the position of the preset reference value according to the matrix reduction constraint value to obtain the reduction matrix, and determining the reduction matrix as the multiplication result of the input multiplicand matrix and the input multiplier matrix.

5. The matrix multiplication method according to claim 4, wherein the position of the matrix reduction constraint value in the reduction matrix is the same as the position of the preset reference value in the matrix to be reduced.

6. The matrix multiplication method according to claim 5, wherein the matrix reduction constraint value includes a constraint maximum value and a constraint minimum value, and the preset reference value includes an element maximum value and an element minimum value,

wherein the step of reducing the elements of the matrix to be reduced other than the element at the position of the preset reference value according to the matrix reduction constraint value to obtain the reduction matrix comprises:

determining the constraint maximum value as a reduction element maximum value of the reduction matrix, and determining the position of the element maximum value in the matrix to be reduced as the position of the constraint maximum value in the reduction matrix;

determining the constraint minimum value as a reduction element minimum value of the reduction matrix, and determining the position of the element minimum value in the matrix to be reduced as the position of the constraint minimum value in the reduction matrix;

and reducing the other elements according to the reduction element maximum value, the reduction element minimum value, the constraint maximum value and the constraint minimum value to obtain the reduction matrix.

7. The matrix multiplication method according to claim 6, wherein any other element in the matrix to be reduced is reduced by:

if the element value of the other element is positive, calculating the product of the element value and the reduction element maximum value, calculating the ratio of the product to the element maximum value, and determining the ratio as the reduced element value of the other element;

if the element value of the other element is negative, calculating the product of the element value and the reduction element minimum value, calculating the ratio of the product to the element minimum value, and determining the ratio as the reduced element value of the other element;

and if the element value of the other element is zero, the reduced element value of the other element remains zero.
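As a non-limiting illustration, the element-wise reduction of claim 7 can be sketched in Python as follows. The names `reduce_matrix`, `to_reduce`, `constraint_max`, and `constraint_min` are hypothetical. For brevity the sketch applies the same rule to every element. When the element maximum value is positive and the element minimum value is negative, the rule also yields the constraint values at their positions, up to floating-point rounding.

```python
def reduce_matrix(to_reduce, constraint_max, constraint_min):
    """Reduce an integer matrix to floats using the two constraint values."""
    elem_max = max(max(row) for row in to_reduce)
    elem_min = min(min(row) for row in to_reduce)

    def reduce_element(value):
        if value > 0:
            # Positive: (value * reduction element maximum) / element maximum.
            return value * constraint_max / elem_max
        if value < 0:
            # Negative: (value * reduction element minimum) / element minimum.
            return value * constraint_min / elem_min
        # Zero stays zero.
        return 0.0

    return [[reduce_element(value) for value in row] for row in to_reduce]
```

For example, `reduce_matrix([[-5000, 6000], [-9000, 10000]], 5.0, -4.5)` gives `[[-2.5, 3.0], [-4.5, 5.0]]`: positive elements are rescaled by 5.0 / 10000 and negative elements by -4.5 / -9000.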

8. The matrix multiplication method of claim 1, wherein the input multiplicand matrix and the input multiplier matrix comprise floating point matrices, and the first multiplication matrix and the second multiplication matrix comprise integer matrices.

9. A matrix multiplication computing device, comprising:

a quantization module configured to determine a first multiplication matrix and a second multiplication matrix according to an input multiplicand matrix and an input multiplier matrix, wherein the data types of the elements in the first multiplication matrix and the second multiplication matrix are integer data types, and the input multiplicand matrix and the input multiplier matrix are generated based on one of the following signals: image signals, handwriting input signals, and voice recognition input signals;

a matrix operation module configured to determine a matrix to be reduced according to the determined first multiplication matrix and second multiplication matrix;

a constraint value determination module configured to determine a matrix reduction constraint value according to the determined matrix to be reduced;

a matrix reduction module configured to determine the multiplication result of the input multiplicand matrix and the input multiplier matrix according to the determined matrix reduction constraint value and the matrix to be reduced,

wherein the constraint value determination module comprises:

a position determination submodule configured to determine the position of a preset reference value in the matrix to be reduced;

a vector extraction submodule configured to extract, according to the position of the preset reference value, a row vector and a column vector corresponding to the position from the input multiplicand matrix and the input multiplier matrix, respectively; and

a reduction constraint value determination submodule configured to determine the matrix reduction constraint value using the extracted row vector and column vector.

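10. The matrix multiplication computing device of claim 9, wherein the quantization module quantizes each element of the input multiplicand matrix and the input multiplier matrix based on a positive number interval and a negative number interval of a multiplication matrix value range to obtain the first multiplication matrix and the second multiplication matrix,

wherein the range of the positive number interval of the multiplication matrix value range is asymmetric with respect to the range of the negative number interval.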

11. The matrix multiplication computing device of claim 9, wherein the matrix reduction constraint value includes a constraint maximum value and a constraint minimum value, and the preset reference value includes an element maximum value and an element minimum value,

wherein the position determination submodule determines a first position of the element maximum value in the matrix to be reduced, the vector extraction submodule extracts, from the input multiplicand matrix, a first row vector corresponding to the row number of the first position and extracts, from the input multiplier matrix, a first column vector corresponding to the column number of the first position, and the reduction constraint value determination submodule multiplies the extracted first row vector by the first column vector to obtain the constraint maximum value,

and/or the position determination submodule determines a second position of the element minimum value in the matrix to be reduced, the vector extraction submodule extracts, from the input multiplicand matrix, a second row vector corresponding to the row number of the second position and extracts, from the input multiplier matrix, a second column vector corresponding to the column number of the second position, and the reduction constraint value determination submodule multiplies the extracted second row vector by the second column vector to obtain the constraint minimum value.

12. The matrix multiplication computing device according to claim 9, wherein the matrix reduction module determines the position of the matrix reduction constraint value in a reduction matrix according to the position of the preset reference value in the matrix to be reduced, reduces the elements of the matrix to be reduced other than the element at the position of the preset reference value according to the matrix reduction constraint value to obtain the reduction matrix, and determines the reduction matrix as the multiplication result of the input multiplicand matrix and the input multiplier matrix.

13. The matrix multiplication computing device of claim 12, wherein a position of the matrix reduction constraint value in the reduction matrix is the same as a position of the preset reference value in the matrix to be reduced.

14. The matrix multiplication computing device of claim 13, wherein the matrix reduction constraint value includes a constraint maximum value and a constraint minimum value, and the preset reference value includes an element maximum value and an element minimum value,

The matrix reduction module determines the constraint maximum value as a reduction element maximum value of the reduction matrix, determines the position of the element maximum value in the matrix to be reduced as the position of the constraint maximum value in the reduction matrix, determines the constraint minimum value as a reduction element minimum value of the reduction matrix, determines the position of the element minimum value in the matrix to be reduced as the position of the constraint minimum value in the reduction matrix, and reduces the other elements according to the reduction element maximum value, the reduction element minimum value, the constraint maximum value and the constraint minimum value to obtain the reduction matrix.

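15. The matrix multiplication computing device of claim 14, wherein the matrix reduction module reduces any other element in the matrix to be reduced by:

if the element value of the other element is positive, calculating the product of the element value and the reduction element maximum value, calculating the ratio of the product to the element maximum value, and determining the ratio as the reduced element value of the other element;

if the element value of the other element is negative, calculating the product of the element value and the reduction element minimum value, calculating the ratio of the product to the element minimum value, and determining the ratio as the reduced element value of the other element;

and if the element value of the other element is zero, the reduced element value of the other element remains zero.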

16. The matrix multiplication computing device of claim 9, wherein the input multiplicand matrix and the input multiplier matrix comprise floating point matrices, and the first multiplication matrix and the second multiplication matrix comprise integer matrices.

17. A computing device, the computing device comprising:

a processor; and

a memory storing a computer program which, when executed by the processor, implements the matrix multiplication method according to any one of claims 1-8.

18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the matrix multiplication method according to any one of claims 1-8.