CN117786297A - Operator operation method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117786297A
CN117786297A (application number CN202311827823.7A)
Authority
CN
China
Prior art keywords
operator
matrix
identity matrix
tensor
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311827823.7A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bi Ren Technology Co ltd filed Critical Shanghai Bi Ren Technology Co ltd
Priority to CN202311827823.7A priority Critical patent/CN117786297A/en
Publication of CN117786297A publication Critical patent/CN117786297A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides an operator operation method, an operator operation device, an electronic device and a storage medium, wherein the method comprises the following steps: in response to the operator operation corresponding to an operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated; and performing, on a tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result. Because the identity matrix is constructed from the input tensor and the output tensor, the matrix multiplication can be performed on a tensor core, whose computing power is higher, so that the operation result is obtained rapidly and the operator operation speed is improved.

Description

Operator operation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural network inference technologies, and in particular to an operator operation method and apparatus, an electronic device, and a storage medium.
Background
Operator performance optimization refers to the targeted optimization of basic computation operators (such as convolution operators (conv operators), rearrangement operators (permute operators) and the like) in a deep learning model so as to improve their computation efficiency and performance. That is, operator performance optimization aims to enable the deep learning model to complete computation tasks more quickly during training and inference, thereby improving overall efficiency and performance.
In the related art, operator performance optimization is achieved by eliminating the operator to be operated. However, not all scenarios allow operator elimination: some operators are depended on by the overall model structure, and eliminating them may damage the integrity and accuracy of the model. That is, in some scenarios, operator elimination may degrade model performance or produce erroneous output.
Disclosure of Invention
The invention provides an operator operation method, an operator operation device, an electronic device and a storage medium, which address the defect in the prior art that operator performance optimization by means of operator elimination cannot be applied to all scenarios.
The invention provides an operator operation method, which comprises the following steps:
in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated;
and on the tensor core, performing matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result.
According to the operator operation method provided by the invention, the construction of the identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated comprises the following steps:
the identity matrix is constructed based on the shape of the input tensor and the shape of the output tensor.
According to the operator operation method provided by the invention, the shape of the input tensor comprises the number of rows of the input tensor and the number of columns of the input tensor, and the shape of the output tensor comprises the number of rows of the output tensor and the number of columns of the output tensor;
the constructing the identity matrix based on the shape of the input tensor and the shape of the output tensor includes:
constructing a first candidate identity matrix based on the number of rows of the input tensor or the number of columns of the output tensor;
constructing a second candidate identity matrix based on the number of columns of the input tensor or the number of rows of the output tensor;
and determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix.
According to the operator operation method provided by the invention, the determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix comprises the following steps:
and selecting the candidate identity matrix with the minimum operation cost from the first candidate identity matrix and the second candidate identity matrix as the identity matrix.
According to the operator operation method provided by the invention, the operation cost of the first candidate identity matrix is determined based on the following steps:
determining a first matrix multiplication operand of the transposed matrix of the input tensor and the first candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the first candidate identity matrix;
determining a first amount of occupied memory of the first candidate identity matrix based on a shape of the first candidate identity matrix;
and determining the operation cost of the first candidate identity matrix based on the first matrix multiplication operation amount and the first occupied memory amount.
According to the operator operation method provided by the invention, the operation cost of the second candidate identity matrix is determined based on the following steps:
determining a second matrix multiplication operand of the transposed matrix of the input tensor and the second candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the second candidate identity matrix;
determining a second amount of occupied memory for the second candidate identity matrix based on the shape of the second candidate identity matrix;
and determining the operation cost of the second candidate identity matrix based on the second matrix multiplication operation amount and the second occupied memory amount.
According to the operator operation method provided by the invention, the matrix multiplication operation is performed on the basis of the transposed matrix of the input tensor and the identity matrix to obtain an operation result, and the method further comprises the following steps:
and replacing the operator to be operated with a matrix multiplication operator.
According to the operator operation method provided by the invention, the operator to be operated is a rearrangement operator.
The invention also provides an operator operation device, which comprises:
the construction unit is used for constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated, in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes;
and the operation unit is used for performing, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the operator operation method as described above.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an operator operation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements an operator operation method as described in any one of the above.
According to the operator operation method, device, electronic device and storage medium provided by the invention, in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes, an identity matrix is constructed based on the input tensor and the output tensor corresponding to the operator to be operated, so that a matrix multiplication operation can be performed, on a tensor core with higher computing power, based on the transposed matrix of the input tensor and the identity matrix, the operation result is obtained rapidly, and the operator operation speed is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below illustrate some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an operator operation method provided by the invention;
FIG. 2 is a schematic diagram of a construction flow of a first candidate identity matrix according to the present invention;
FIG. 3 is a schematic diagram of a construction flow of a second candidate identity matrix according to the present invention;
FIG. 4 is a schematic flow chart of a method for determining the operation cost of the first candidate identity matrix according to the present invention;
FIG. 5 is a schematic flow chart of a method for determining the operation cost of the second candidate identity matrix according to the present invention;
FIG. 6 is a schematic diagram of an operator computing device according to the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the related art, operator performance optimization is achieved by eliminating the operator to be operated. However, not all scenarios allow operator elimination: some operators are depended on by the overall model structure, and eliminating them may damage the integrity and accuracy of the model. That is, in some scenarios, operator elimination may degrade model performance or produce erroneous output. Here, the operator to be operated may be a permute operator.
In addition, for operators that cannot be eliminated, the related art generally uses a vector core to implement the corresponding operator operation; however, the computational power of a vector core is lower, so the operator operation is slower.
In this regard, the present invention provides an operator operation method. Fig. 1 is a flow chart of an operator operation method provided by the present invention, as shown in fig. 1, the method includes the following steps:
and 110, responding to the operator operation corresponding to the operator to be operated, namely, exchanging the two axes of the innermost layer, and constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated.
Step 120, on the tensor core, performing a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result.
Here, the operator to be operated is an operator that needs to perform an operator operation; for example, the operator to be operated may be a permute operator. The input tensor corresponding to the operator to be operated refers to the original data on which the operator operates, which can generally be understood as the input of the model or the output of the previous layer. The output tensor corresponding to the operator to be operated refers to the result data obtained by applying the operator to the input tensor, which can generally be understood as the output of the model or the input of the next layer.
In addition, the operator operation corresponding to the operator to be operated on may be understood as a specific operation performed on the input tensor corresponding to the operator to be operated on, and the operation may include a mathematical operation, a vectorization operation, a convolution operation, a pooling operation, and the like. In other words, the operator operation of the operator to be operated on is for transforming and processing the input tensor to obtain the output tensor.
Considering that the computational power of a tensor core is much higher than that of a vector core, in order to increase the operation speed, the embodiment of the invention performs the operator operation on a tensor core. However, the form of the operation performed on the tensor core is A^T × E (where A is an arbitrary matrix, A^T is the transposed matrix of A, and E is the identity matrix). That is, the requirements for performing the corresponding operator operation on the tensor core are: (1) the operator operation corresponding to the operator to be operated is a tensor transposition operation; (2) a corresponding identity matrix can be constructed.
Accordingly, when the operator operation corresponding to the operator to be operated is a swap of the two innermost axes, the operator operation performed on the input tensor is a tensor transposition operation, which meets requirement (1) for performing the corresponding operator operation on the tensor core.
For example, if the shape of the input tensor is (A, B, C, D) and the shape of the output tensor is (A, B, D, C), the operator operation corresponding to the operator to be operated swaps the two innermost axes of the input tensor to obtain the output tensor, that is, it performs a transpose operation on the input tensor.
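As a brief illustration (using NumPy, which the patent does not mention, purely as a stand-in), swapping the two innermost axes of a 4-D tensor is exactly a transpose of each innermost 2-D slice:

```python
import numpy as np

# A 4-D tensor of shape (A, B, C, D); the permute operator described
# above swaps the two innermost axes, giving shape (A, B, D, C).
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
y = np.swapaxes(x, -1, -2)          # shape (2, 3, 5, 4)

# For each leading index, the innermost 2-D slice is exactly transposed.
assert y.shape == (2, 3, 5, 4)
assert np.array_equal(y[1, 2], x[1, 2].T)
```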
On the basis of meeting requirement (1), an identity matrix is constructed based on the input tensor and the output tensor corresponding to the operator to be operated, so as to meet requirement (2). In constructing the identity matrix, the following condition needs to be satisfied: when the transposed matrix of the input tensor and the identity matrix are multiplied, the shape of the matrix corresponding to the obtained operation result is the same as the shape of the output tensor. The identity matrix may be the left matrix or the right matrix in the matrix multiplication, which is not particularly limited in the embodiment of the present invention.
After the identity matrix is constructed, the matrix multiplication operation is performed on the tensor core, whose computing power is higher, based on the transposed matrix of the input tensor and the identity matrix, so that the operation result can be obtained rapidly.
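A minimal sketch of the equivalence exploited here (NumPy standing in for the tensor-core gemm, with the right-identity form): multiplying the transposed input by an identity matrix reproduces the transpose exactly, so the permute becomes a matmul a tensor core can execute:

```python
import numpy as np

A = np.arange(128 * 256, dtype=np.float32).reshape(128, 256)  # input tensor
E = np.eye(128, dtype=np.float32)  # identity sized to A's row count (right matrix)

out = A.T @ E                      # matmul form executable on a tensor core
assert out.shape == (256, 128)     # same shape as the output tensor
assert np.array_equal(out, A.T)    # numerically identical to the plain transpose
```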
According to the operator operation method provided by the embodiment of the invention, in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes, an identity matrix is constructed based on the input tensor and the output tensor corresponding to the operator to be operated, so that a matrix multiplication operation can be performed, on a tensor core with higher computing power, based on the transposed matrix of the input tensor and the identity matrix, the operation result is obtained rapidly, and the operator operation speed is improved.
As an alternative embodiment, the operator to be operated on may be an operator in a text generation model, and the input tensor may be determined based on the input text of the text generation model, i.e. the input tensor is related to the input text. Further, input text is input into the text generation model, an input tensor is extracted, and an operator operation corresponding to an operator to be operated is performed on the input tensor by adopting the method of the embodiment, so that an output tensor can be quickly obtained, and an output text corresponding to the input text is quickly obtained based on the output tensor, that is, the text generation efficiency of the text generation model is improved by adopting the method of the embodiment.
Based on the above embodiment, constructing an identity matrix based on an input tensor and an output tensor corresponding to an operator to be operated, including:
an identity matrix is constructed based on the shape of the input tensor and the shape of the output tensor.
Specifically, the shape of the input tensor is used to characterize the size of the input tensor in each dimension, such as the size of the input tensor in the row direction (i.e., the number of rows of the input tensor), and the size of the input tensor in the column direction (i.e., the number of columns of the input tensor). Similarly, the shape of the output tensor is used to characterize the size of the output tensor in each dimension, such as the size of the output tensor in the row direction (i.e., the number of rows of the output tensor), and the size of the output tensor in the column direction (i.e., the number of columns of the output tensor).
When constructing the identity matrix, in order to realize the logical equivalent operator semantic expression of the input tensor and the output tensor, the following conditions need to be satisfied: when the transposed matrix of the input tensor and the identity matrix are subjected to matrix multiplication operation, the shape of the matrix corresponding to the obtained operation result is the same as the shape of the output tensor.
Alternatively, when the identity matrix is the left matrix, the matrix corresponding to the operation result is "identity matrix × transposed matrix of the input tensor", and this matrix is required to have the same shape as the output tensor. When the identity matrix is the right matrix, the matrix corresponding to the operation result is "transposed matrix of the input tensor × identity matrix", and this matrix likewise has the same shape as the output tensor.
Based on any of the above embodiments, the shape of the input tensor includes a number of rows of the input tensor and a number of columns of the input tensor, and the shape of the output tensor includes a number of rows of the output tensor and a number of columns of the output tensor;
constructing an identity matrix based on the shape of the input tensor and the shape of the output tensor, comprising:
constructing a first candidate identity matrix based on the number of rows of the input tensor or the number of columns of the output tensor;
constructing a second candidate identity matrix based on the number of columns of the input tensor or the number of rows of the output tensor;
and determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix.
Specifically, since the operator operation corresponding to the operator to be operated swaps the two innermost axes, that is, swaps the rows and columns of the input tensor to obtain the output tensor, the number of rows of the input tensor is the same as the number of columns of the output tensor, and the number of columns of the input tensor is the same as the number of rows of the output tensor.
The first candidate identity matrix, constructed based on the number of rows of the input tensor or the number of columns of the output tensor, is a right matrix; its number of rows equals its number of columns, and both equal the number of rows of the input tensor (equivalently, the number of columns of the output tensor).
The second candidate identity matrix, constructed based on the number of columns of the input tensor or the number of rows of the output tensor, is a left matrix; its number of rows equals its number of columns, and both equal the number of columns of the input tensor (equivalently, the number of rows of the output tensor).
Fig. 2 is a schematic diagram of the construction flow of the first candidate identity matrix provided by the present invention. As shown in Fig. 2, the shape of the input tensor input0 is [128,256], the shape of the output tensor input1 is [256,128], and the operator to be operated is a rearrangement operator (permute). The shape of the transposed matrix input0^T corresponding to the input tensor input0 is [256,128]. The constructed first candidate identity matrix eye0 is a right matrix with shape [128,128]; the result matrix output obtained after the transposed matrix of the input tensor input0 and the first candidate identity matrix eye0 undergo the matrix multiplication operator operation (gemm) has shape [256,128], the same as the shape of the output tensor input1.
Fig. 3 is a schematic diagram of the construction flow of the second candidate identity matrix provided by the present invention. As shown in Fig. 3, the shape of the input tensor input0 is [128,256], the shape of the output tensor input1 is [256,128], and the operator to be operated is a rearrangement operator (permute). The shape of the transposed matrix input0^T corresponding to the input tensor input0 is [256,128]. The constructed second candidate identity matrix eye1 is a left matrix with shape [256,256]; the result matrix output obtained after the transposed matrix of the input tensor input0 and the second candidate identity matrix eye1 undergo the matrix multiplication operator operation (gemm) has shape [256,128], the same as the shape of the output tensor input1.
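The shapes in the two figures can be checked with a small NumPy sketch (NumPy is an assumption here; the patent targets a tensor-core gemm). Both the right-identity candidate eye0 and the left-identity candidate eye1 reproduce the permuted output of shape [256,128]:

```python
import numpy as np

input0 = np.arange(128 * 256, dtype=np.float32).reshape(128, 256)
input0_T = input0.T                      # shape (256, 128)

eye0 = np.eye(128, dtype=np.float32)     # first candidate (right matrix), Fig. 2
eye1 = np.eye(256, dtype=np.float32)     # second candidate (left matrix), Fig. 3

out_right = input0_T @ eye0              # (256,128) @ (128,128) -> (256,128)
out_left = eye1 @ input0_T               # (256,256) @ (256,128) -> (256,128)

assert out_right.shape == out_left.shape == (256, 128)
assert np.array_equal(out_right, out_left)   # both reproduce the permuted output
```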
Furthermore, the operation cost may generally include a computation cost and a storage cost. The computation cost characterizes the computational complexity of the corresponding operation; for example, the larger the amount of computation, the larger the computational complexity and the corresponding computation cost. The storage cost characterizes the amount of memory occupied by the corresponding operation; the larger the occupied memory amount, the larger the storage cost.
When the identity matrix is selected, the candidate identity matrix with the minimum operation cost can be selected from the first candidate identity matrix and the second candidate identity matrix to serve as the identity matrix, so that the minimum calculation complexity of operation and the minimum occupied memory amount can be ensured, and the operation speed can be improved.
Based on any of the above embodiments, the operation cost of the first candidate identity matrix is determined based on the following steps:
determining a first matrix multiplication operand of the transposed matrix of the input tensor and the first candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the first candidate identity matrix;
determining a first occupied memory quantity of the first candidate identity matrix based on the shape of the first candidate identity matrix;
and determining the operation cost of the first candidate identity matrix based on the first matrix multiplication operation amount and the first occupied memory amount.
Specifically, the matrix multiplication operation amount of the transposed matrix of the input tensor and the first candidate identity matrix can be determined from the shape of the transposed matrix of the input tensor and the shape of the first candidate identity matrix. For example, if the transposed matrix of the input tensor has shape m×n and the first candidate identity matrix has shape n×n, the first matrix multiplication operation amount is: m×n numbers are computed, and the computation count corresponding to each number is n×(n−1).
The occupied memory amount of the first candidate identity matrix may be determined based on its shape. For example, if the first candidate identity matrix has shape n×n, the first occupied memory amount is n×n.
Finally, the operation cost of the first candidate identity matrix can be determined from the first matrix multiplication operation amount and the first occupied memory amount. Alternatively, the operation cost of the first candidate identity matrix may be determined based on a functional relationship between the operation cost and the matrix multiplication operation amount and the occupied memory amount. This functional relationship can be obtained by fitting different matrix multiplication operation amounts, different occupied memory amounts, and the corresponding operation costs.
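The cost model might be sketched as follows. The weighted-sum combination and the `mem_weight` knob are hypothetical — the patent only states that the cost is a fitted function of the matrix multiplication operation amount and the occupied memory amount — but the per-number count n×(n−1) follows the text above:

```python
def candidate_cost(t_rows, t_cols, eye_n, mem_weight=1.0):
    """Cost of multiplying a (t_rows x t_cols) transposed input tensor by an
    (eye_n x eye_n) candidate identity matrix: t_rows*t_cols result numbers,
    each with a per-number computation count of eye_n*(eye_n - 1), plus the
    eye_n*eye_n memory occupied by the identity matrix."""
    matmul_ops = t_rows * t_cols * eye_n * (eye_n - 1)
    occupied_memory = eye_n * eye_n
    return matmul_ops + mem_weight * occupied_memory
```

For the running example (transposed input of shape 256×128), `candidate_cost(256, 128, 128)` is smaller than `candidate_cost(256, 128, 256)`, so the first candidate eye0 would be chosen.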
Correspondingly, the operation cost of the second candidate identity matrix is determined using the same method as for the first candidate identity matrix:
determining a second matrix multiplication operand of the transposed matrix of the input tensor and the second candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the second candidate identity matrix;
determining a second occupied memory quantity of the second candidate identity matrix based on the shape of the second candidate identity matrix;
and determining the operation cost of the second candidate identity matrix based on the second matrix multiplication operation amount and the second occupied memory amount.
Fig. 4 is a flow chart of a method for determining the operation cost of the first candidate identity matrix according to the present invention. As shown in Fig. 4, the first matrix multiplication operation amount is: 256×128 numbers are computed, and the computation count corresponding to each number is 128×(128−1). The first occupied memory amount is the memory occupied by the first candidate identity matrix eye0: 128×128.
Fig. 5 is a flow chart of a method for determining the operation cost of the second candidate identity matrix according to the present invention. As shown in Fig. 5, the second matrix multiplication operation amount is: 256×128 numbers are computed, and the computation count corresponding to each number is 256×(256−1). The second occupied memory amount is the memory occupied by the second candidate identity matrix eye1: 256×256.
As can be seen from Fig. 4 and Fig. 5, the first matrix multiplication operation amount is smaller than the second matrix multiplication operation amount, and the first occupied memory amount is smaller than the second occupied memory amount, so the operation cost corresponding to the first candidate identity matrix is smaller than that corresponding to the second candidate identity matrix, and the first candidate identity matrix, with the smaller operation cost, can be selected as the identity matrix.
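The comparison can be verified arithmetically (pure Python, following the per-number counting stated in the figures):

```python
# First candidate eye0 (128x128), per Fig. 4's counting:
ops0 = 256 * 128 * 128 * (128 - 1)   # result numbers x per-number count
mem0 = 128 * 128
# Second candidate eye1 (256x256), per Fig. 5's counting:
ops1 = 256 * 128 * 256 * (256 - 1)
mem1 = 256 * 256

assert ops0 < ops1 and mem0 < mem1   # eye0 dominates on both cost terms
```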
Based on any of the above embodiments, the method of performing the matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain the operation result further includes:
and replacing the operator to be operated with a matrix multiplication operator.
Specifically, since the matrix multiplication operation is performed on the transposed matrix of the input tensor and the identity matrix, the operator to be operated is replaced with a matrix multiplication operator (gemm), so that the matrix multiplication operation can be performed based on the transposed matrix of the input tensor and the identity matrix to obtain the operation result. Here, the operator to be operated is a rearrangement operator (permute).
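An end-to-end sketch of the replacement, under stated assumptions: NumPy stands in for the tensor-core gemm, `permute_as_gemm` is a hypothetical helper name, and a real optimization pass would rewrite a graph node rather than call a function. The candidate with the smaller identity matrix is chosen, matching the cost comparison above:

```python
import numpy as np

def permute_as_gemm(x):
    """Rewrite a 2-D permute (innermost-axes swap) as a matmul with an
    identity matrix, choosing the candidate with the smaller identity
    (hence the lower operation count and occupied memory)."""
    m, n = x.shape
    if m <= n:
        return x.T @ np.eye(m, dtype=x.dtype)   # right identity, like eye0
    return np.eye(n, dtype=x.dtype) @ x.T       # left identity, like eye1

x = np.arange(6.0).reshape(2, 3)
assert np.array_equal(permute_as_gemm(x), x.T)
```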
Table 1 compares the performance of the operator operation performed on a vector core and on a tensor core. As shown in Table 1, performance multiple = number of clock cycles on the tensor core / number of clock cycles on the vector core. As can be seen from Table 1, the number of clock cycles on the tensor core is significantly smaller than on the vector core; that is, the operation is faster on the tensor core.
TABLE 1
The operator operation device provided by the invention is described below, and the operator operation device described below and the operator operation method described above can be referred to correspondingly.
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of an operator operation device provided by the present invention. As shown in fig. 6, the device includes:
a construction unit 610, configured to construct an identity matrix based on the input tensor and the output tensor corresponding to an operator to be operated, in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes;
an operation unit 620, configured to perform, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix, to obtain an operation result.
Based on any of the above embodiments, constructing an identity matrix based on an input tensor and an output tensor corresponding to an operator to be operated, including:
an identity matrix is constructed based on the shape of the input tensor and the shape of the output tensor.
Based on any of the above embodiments, the shape of the input tensor includes a number of rows of the input tensor and a number of columns of the input tensor, and the shape of the output tensor includes a number of rows of the output tensor and a number of columns of the output tensor;
constructing an identity matrix based on the shape of the input tensor and the shape of the output tensor, comprising:
constructing a first candidate identity matrix based on the number of rows of the input tensor or the number of columns of the output tensor;
constructing a second candidate identity matrix based on the number of columns of the input tensor or the number of rows of the output tensor;
and determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix.
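As a hedged sketch of the two construction rules above (the operand order — identity on the right for the first candidate, identity on the left for the second — is an assumption filled in for illustration; the embodiment does not state it explicitly):

```python
import numpy as np

def candidate_identities(x: np.ndarray):
    """For an (m, n) input whose transposed output is (n, m), build the
    two candidate identity matrices described in the embodiment."""
    m, n = x.shape
    # First candidate: sized by the rows of the input
    # (equivalently, the columns of the output).
    first = np.eye(m, dtype=x.dtype)   # used as  x.T @ first
    # Second candidate: sized by the columns of the input
    # (equivalently, the rows of the output).
    second = np.eye(n, dtype=x.dtype)  # used as  second @ x.T
    return first, second

x = np.ones((4, 8), dtype=np.float32)
first, second = candidate_identities(x)
assert np.array_equal(x.T @ first, x.T)   # (8, 4) @ (4, 4) -> (8, 4)
assert np.array_equal(second @ x.T, x.T)  # (8, 8) @ (8, 4) -> (8, 4)
```

Either candidate reproduces the transposed output; they differ only in how much work and memory the gemm requires, which is what the cost comparison below decides.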
Based on any of the above embodiments, determining an identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix includes:
and selecting the candidate identity matrix with the minimum operation cost from the first candidate identity matrix and the second candidate identity matrix as the identity matrix.
Based on any of the above embodiments, the operation cost of the first candidate identity matrix is determined based on the following steps:
determining a first matrix multiplication operation amount of the transposed matrix of the input tensor and the first candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the first candidate identity matrix;
determining a first occupied memory amount of the first candidate identity matrix based on the shape of the first candidate identity matrix;
and determining the operation cost of the first candidate identity matrix based on the first matrix multiplication operation amount and the first occupied memory amount.
Correspondingly, the operation cost of the second candidate identity matrix is determined based on the following steps:
determining a second matrix multiplication operation amount of the transposed matrix of the input tensor and the second candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the second candidate identity matrix;
determining a second occupied memory amount of the second candidate identity matrix based on the shape of the second candidate identity matrix;
and determining the operation cost of the second candidate identity matrix based on the second matrix multiplication operation amount and the second occupied memory amount.
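The embodiment does not disclose how the matrix multiplication operation amount and the occupied memory amount are combined into a single cost; the sketch below simply adds them, purely as an illustrative weighting, to show how the cheaper candidate identity matrix would be chosen:

```python
def identity_cost(m: int, n: int, k: int) -> int:
    """Hypothetical cost of multiplying the (n, m) transposed input
    with a (k, k) identity matrix."""
    matmul_ops = n * m * k      # multiply-accumulate count of the gemm
    memory = k * k              # elements occupied by the identity matrix
    return matmul_ops + memory  # illustrative combination: a plain sum

def choose_identity_size(m: int, n: int) -> int:
    """Return the side length of the cheaper candidate identity
    for an (m, n) input tensor."""
    first_cost = identity_cost(m, n, m)   # x.T @ I_m
    second_cost = identity_cost(m, n, n)  # I_n @ x.T
    return m if first_cost <= second_cost else n

# For a 4 x 1024 input, the 4 x 4 identity is far cheaper than 1024 x 1024.
assert choose_identity_size(4, 1024) == 4
```

Whatever the actual weighting, the selection rule is the same: both the gemm work and the identity's footprint shrink with the smaller side of the input, so the candidate built from the shorter dimension wins.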
Based on any of the above embodiments, the performing a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result further includes:
and replacing the operator to be operated with a matrix multiplication operator.
Based on any of the above embodiments, the operator to be operated is a rearrangement operator.
Fig. 7 is a schematic structural diagram of an electronic device provided by the present invention. As shown in fig. 7, the electronic device may include: a processor 710, a memory 720, a communication interface (Communications Interface) 730, and a communication bus 740, wherein the processor 710, the memory 720, and the communication interface 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 720 to perform an operator operation method, the method comprising: in response to the operator operation corresponding to an operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated; and performing, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix, to obtain an operation result.
Further, the logic instructions in the memory 720 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on this understanding, the technical solution of the present invention may be embodied, in essence or in the part contributing to the prior art or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the operator operation method provided by the above methods, the method comprising: in response to the operator operation corresponding to an operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated; and performing, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix, to obtain an operation result.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the operator operation method provided above, the method comprising: in response to the operator operation corresponding to an operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated; and performing, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix, to obtain an operation result.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution may be embodied, in essence or in the part contributing to the prior art, in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. An operator operation method, comprising:
in response to the operator operation corresponding to an operator to be operated being a swap of the two innermost axes, constructing an identity matrix based on an input tensor and an output tensor corresponding to the operator to be operated;
and on the tensor core, performing matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result.
2. The operator operation method according to claim 1, wherein the constructing an identity matrix based on the input tensor and the output tensor corresponding to the operator to be operated includes:
the identity matrix is constructed based on the shape of the input tensor and the shape of the output tensor.
3. The operator operation method according to claim 2, wherein the shape of the input tensor includes a number of rows of the input tensor and a number of columns of the input tensor, and the shape of the output tensor includes a number of rows of the output tensor and a number of columns of the output tensor;
the constructing the identity matrix based on the shape of the input tensor and the shape of the output tensor includes:
constructing a first candidate identity matrix based on the number of rows of the input tensor or the number of columns of the output tensor;
constructing a second candidate identity matrix based on the number of columns of the input tensor or the number of rows of the output tensor;
and determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix.
4. The operator operation method according to claim 3, wherein the determining the identity matrix from the first candidate identity matrix and the second candidate identity matrix based on the operation cost corresponding to the first candidate identity matrix and the operation cost corresponding to the second candidate identity matrix includes:
and selecting the candidate identity matrix with the minimum operation cost from the first candidate identity matrix and the second candidate identity matrix as the identity matrix.
5. The operator operation method according to claim 3, wherein the operation cost of the first candidate identity matrix is determined based on the steps of:
determining a first matrix multiplication operation amount of the transposed matrix of the input tensor and the first candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the first candidate identity matrix;
determining a first amount of occupied memory of the first candidate identity matrix based on a shape of the first candidate identity matrix;
and determining the operation cost of the first candidate identity matrix based on the first matrix multiplication operation amount and the first occupied memory amount.
6. The operator operation method according to claim 3, wherein the operation cost of the second candidate identity matrix is determined based on the steps of:
determining a second matrix multiplication operation amount of the transposed matrix of the input tensor and the second candidate identity matrix based on the shape of the transposed matrix of the input tensor and the shape of the second candidate identity matrix;
determining a second amount of occupied memory for the second candidate identity matrix based on the shape of the second candidate identity matrix;
and determining the operation cost of the second candidate identity matrix based on the second matrix multiplication operation amount and the second occupied memory amount.
7. The operator operation method according to any one of claims 1 to 6, wherein the performing a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix to obtain an operation result further includes:
and replacing the operator to be operated with a matrix multiplication operator.
8. The operator operation method according to any one of claims 1 to 6, wherein the operator to be operated is a rearrangement operator.
9. An operator arithmetic device, comprising:
a construction unit, configured to construct an identity matrix based on the input tensor and the output tensor corresponding to an operator to be operated, in response to the operator operation corresponding to the operator to be operated being a swap of the two innermost axes;
and an operation unit, configured to perform, on the tensor core, a matrix multiplication operation based on the transposed matrix of the input tensor and the identity matrix, to obtain an operation result.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the operator operation method according to any one of claims 1 to 8 when executing the computer program.
11. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the operator operation method according to any one of claims 1 to 8.
CN202311827823.7A 2023-12-27 2023-12-27 Operator operation method, device, electronic equipment and storage medium Pending CN117786297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311827823.7A CN117786297A (en) 2023-12-27 2023-12-27 Operator operation method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117786297A true CN117786297A (en) 2024-03-29

Family

ID=90388919



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination