CN118132156B - Operator execution method, device, storage medium and program product - Google Patents
- Publication number
- CN118132156B (application number CN202410552100.9A)
- Authority
- CN
- China
- Prior art keywords
- batch
- tasks
- calculation unit
- vector
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Embodiments of the application provide an operator execution method, device, storage medium, and program product in the technical field of artificial intelligence. In the method, an artificial intelligence chip processes N batches of tasks using an attention mechanism operator that includes at least a first matrix operator, a vector operator, and a second matrix operator. While the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator, or on the (n-j)-th batch of tasks using the second matrix operator. Because the tensor calculation unit and the vector calculation unit process different batches at the same time, neither has to wait for the other, both units are fully utilized, and the time overhead of processing multiple batches of tasks with the attention mechanism operator is reduced.
Description
Technical Field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular, to an operator execution method, apparatus, storage medium, and program product.
Background
With the rapid development of artificial intelligence technology, large language models (Large Language Models, LLM) are widely used in natural language processing, computer vision, speech recognition, and other fields. Currently, large language models generally use the Transformer structure as their core infrastructure, and the core computing part of the Transformer structure is the attention mechanism (Attention) operator.
When a large language model performs inference on multiple batches of tasks, the Attention operator performs matrix calculation and vector calculation for each batch in sequence. When the number of batches to be inferred is large, this incurs a large time overhead. How to reduce the time overhead of inference on a large number of batches is a technical problem that currently needs to be solved.
Disclosure of Invention
Embodiments of the application provide an operator execution method, device, storage medium, and program product for reducing the time overhead of inference on a large number of batches of tasks.
In one aspect, an embodiment of the present application provides an operator execution method applied to an artificial intelligence chip that includes a tensor calculation unit and a vector calculation unit. The artificial intelligence chip processes N batches of tasks using an attention mechanism operator; the attention mechanism operator includes at least a first matrix operator, a vector operator, and a second matrix operator, which are executed in sequence on any one of the N batches of tasks; N is an integer greater than 1. The method includes:
While the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator, or the tensor calculation unit performs matrix calculation on the (n-j)-th batch of tasks using the second matrix operator; n takes values in [1, N], and i and j are integers greater than 0.
Optionally, the vector calculation unit performing vector calculation on the n-th batch of tasks using the vector operator includes:
When the vector calculation unit has received the first instruction corresponding to the n-th batch of tasks sent by the tensor calculation unit, and the vector calculation unit has finished the vector calculation of the (n-1)-th batch of tasks, the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator; the first instruction corresponding to the n-th batch of tasks indicates that the tensor calculation unit has finished the matrix calculation of the n-th batch of tasks using the first matrix operator.
Optionally, the tensor calculation unit performing matrix calculation on the (n+i)-th batch of tasks using the first matrix operator includes:
After the tensor calculation unit finishes the matrix calculation of the (n+i-x)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator; x is an integer greater than 0.
Optionally, the tensor calculation unit performing matrix calculation on the (n+i)-th batch of tasks using the first matrix operator includes:
After the tensor calculation unit finishes the matrix calculation of the (n+i-1)-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator.
Optionally, the tensor calculation unit performing matrix calculation on the (n-j)-th batch of tasks using the second matrix operator includes:
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (n-j+y)-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)-th batch of tasks using the second matrix operator; the second instruction corresponding to the (n-j)-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)-th batch of tasks using the vector operator; y is an integer greater than 0.
Optionally, the tensor calculation unit performing matrix calculation on the (n-j)-th batch of tasks using the second matrix operator includes:
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the n-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the first batch of tasks using the second matrix operator; or
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (n-j-1)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)-th batch of tasks using the second matrix operator;
The second instruction corresponding to the (n-j)-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)-th batch of tasks using the vector operator.
Optionally, before the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the method further includes:
The tensor calculation unit finishes the matrix calculation of the n-th batch of tasks using the first matrix operator, and sends the first instruction corresponding to the n-th batch of tasks to the vector calculation unit.
Optionally, n is equal to N, and after the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the method further includes:
When the tensor calculation unit has received the second instruction corresponding to the N-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (N-1)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the N-th batch of tasks using the second matrix operator; the second instruction corresponding to the N-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the N-th batch of tasks using the vector operator.
In one aspect, an embodiment of the present application provides an operator execution apparatus, including:
A tensor calculation module, configured to, while the vector calculation unit performs vector calculation on the n-th batch of tasks using a vector operator, perform matrix calculation on the (n+i)-th batch of tasks using a first matrix operator, or perform matrix calculation on the (n-j)-th batch of tasks using a second matrix operator; n takes values in [1, N], i and j are integers greater than 0, and N is an integer greater than 1.
Optionally, the apparatus further includes a vector calculation module, which is specifically configured so that: when the vector calculation unit has received the first instruction corresponding to the n-th batch of tasks sent by the tensor calculation unit, and the vector calculation unit has finished the vector calculation of the (n-1)-th batch of tasks, the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator; the first instruction corresponding to the n-th batch of tasks indicates that the tensor calculation unit has finished the matrix calculation of the n-th batch of tasks using the first matrix operator.
Optionally, the tensor calculation module is specifically configured to:
After the tensor calculation unit finishes the matrix calculation of the (n+i-x)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator; x is an integer greater than 0.
Optionally, the tensor calculation module is specifically configured to:
After the tensor calculation unit finishes the matrix calculation of the (n+i-1)-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator.
Optionally, the tensor calculation module is specifically configured to:
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (n-j+y)-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)-th batch of tasks using the second matrix operator; the second instruction corresponding to the (n-j)-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)-th batch of tasks using the vector operator; y is an integer greater than 0.
Optionally, the tensor calculation module is specifically configured to:
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the n-th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the first batch of tasks using the second matrix operator; or
When the tensor calculation unit has received the second instruction corresponding to the (n-j)-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (n-j-1)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)-th batch of tasks using the second matrix operator;
The second instruction corresponding to the (n-j)-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)-th batch of tasks using the vector operator.
Optionally, before the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the tensor calculation module is further configured so that:
The tensor calculation unit finishes the matrix calculation of the n-th batch of tasks using the first matrix operator, and sends the first instruction corresponding to the n-th batch of tasks to the vector calculation unit.
Optionally, n is equal to N, and after the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the tensor calculation module is further configured so that:
When the tensor calculation unit has received the second instruction corresponding to the N-th batch of tasks sent by the vector calculation unit, and the tensor calculation unit has finished the matrix calculation of the (N-1)-th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the N-th batch of tasks using the second matrix operator; the second instruction corresponding to the N-th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the N-th batch of tasks using the vector operator.
In one aspect, an embodiment of the present application provides a computer device, including a memory, an artificial intelligence chip, and a computer program stored in the memory and capable of running on the artificial intelligence chip, where the steps of the operator execution method described above are implemented when the computer program is executed by the artificial intelligence chip.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device causes the computer device to perform the steps of the operator execution method described above.
In one aspect, embodiments of the present application provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the steps of the operator execution method described above.
In the embodiments of the application, while N batches of tasks are processed using the attention mechanism operator and the vector calculation unit performs vector calculation on the n-th batch of tasks using the vector operator, the tensor calculation unit does not sit idle waiting for the vector calculation result of the n-th batch. Instead, it performs matrix calculation on the (n+i)-th batch of tasks using the first matrix operator, or on the (n-j)-th batch of tasks using the second matrix operator. Because the tensor calculation unit and the vector calculation unit process different batches at the same time, neither has to wait for the other, both units are fully utilized, and the time overhead of processing multiple batches of tasks with the attention mechanism operator is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic structural diagram of an artificial intelligence chip according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an Attention operator according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computing process of an Attention operator according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an operator execution apparatus according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, which is a structural diagram of an artificial intelligence chip to which an embodiment of the present application applies, the artificial intelligence chip 100 includes at least a video memory 101 and a plurality of execution units 102. Each execution unit 102 includes an on-chip cache 103, a register 104, a tensor calculation unit 105, and a vector calculation unit 106, where the tensor calculation unit 105 and the vector calculation unit 106 are heterogeneous cores.
The video memory 101 may be a high bandwidth memory (High Bandwidth Memory, abbreviated as HBM) or another type of memory. The on-chip cache 103 is a temporary memory that has a smaller capacity than the video memory 101 but a faster data exchange speed.
The on-chip cache 103 includes a first buffer shared by the tensor calculation unit 105 and the vector calculation unit 106, namely a general matrix multiplication main buffer (Gemm Main Buffer, abbreviated as GMB), as well as a second buffer (bufA) and a third buffer (bufB) for buffering the input data of the tensor calculation unit 105.
The register 104 is shared by the tensor calculation unit 105 and the vector calculation unit 106, and the two units can communicate through it. The register 104 has a smaller capacity than the on-chip cache 103 but a faster data exchange speed.
If the capacity of the register 104 can satisfy the data transfer between the tensor calculation unit 105 and the vector calculation unit 106, the data between the tensor calculation unit 105 and the vector calculation unit 106 is transferred through the register 104.
If the capacity of the register 104 cannot satisfy the data transfer between the tensor calculation unit 105 and the vector calculation unit 106, the data between the tensor calculation unit 105 and the vector calculation unit 106 is transferred through the on-chip cache 103.
If the capacity of the on-chip cache 103 cannot satisfy the data transfer between the tensor calculation unit 105 and the vector calculation unit 106, the data between the tensor calculation unit 105 and the vector calculation unit 106 is transferred through the video memory 101.
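The three-level fallback described above can be sketched as a simple capacity check. This is only an illustration: the function name, the byte-based capacity parameters, and the returned labels are hypothetical, and the patent does not specify how the capacity test is actually made.

```python
def choose_transfer_medium(payload_bytes, register_capacity, cache_capacity):
    """Return the fastest storage level that can hold the payload.

    All names and units here are illustrative assumptions; the source only
    states the order of preference: register, on-chip cache, video memory.
    """
    if payload_bytes <= register_capacity:
        return "register"          # register 104: smallest, fastest
    if payload_bytes <= cache_capacity:
        return "on_chip_cache"     # on-chip cache 103: larger, slower
    return "video_memory"          # video memory 101: largest, slowest


# Hypothetical capacities: 256 B of registers, 64 KiB of on-chip cache.
medium = choose_transfer_medium(4096, register_capacity=256, cache_capacity=65536)
```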
The artificial intelligence chip 100 of the present application may include other structures in addition to the above-described structures, and the present application is not particularly limited thereto.
The artificial intelligence chip 100 may be a central processing unit (Central Processing Unit, CPU for short), a graphics processing unit (Graphics Processing Unit, GPU for short), a general-purpose graphics processing unit (General-purpose Computing on Graphics Processing Units, GPGPU for short), a domain specific architecture (Domain Specific Architecture, DSA for short), etc.
In practical application, one of the structures of the Attention operator is shown in fig. 2, and the Attention operator includes a first matrix operator, a scaling operator (Scale), a Mask operator (Mask), a normalization operator (SoftMax) and a second matrix operator. Wherein the first matrix operator and the second matrix operator are both matrix operators (MatMul), and the matrix operators may include at least one matrix multiply-accumulate operation (Matrix Multiply Accumulate, abbreviated as MMA). The scaling operator, the mask operator and the normalization operator are vector operators.
When the Attention operator is used for training or inference on any batch of tasks, its calculation process can follow the steps shown in FIG. 3. The tensor calculation unit first uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K, obtaining the correlation between Q and K. The vector calculation unit then applies the scaling operator, the mask operator, and the normalization operator in sequence to this correlation to obtain the weights. Finally, the tensor calculation unit uses the second matrix operator to perform matrix calculation on the weights and the value vector V to obtain the final output vector.
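The single-batch flow of FIG. 3 can be sketched in dependency-free Python as follows. This is a minimal illustration rather than the patent's implementation: the helper names are hypothetical, the scaling factor 1/sqrt(d_k) is the conventional Transformer choice and is assumed here because the source does not state it, and every row is assumed to keep at least one unmasked position.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row (the normalization operator)."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)                                  # first matrix operator
    scores = [[s / math.sqrt(d_k) for s in row] for row in scores]  # scaling operator (assumed factor)
    if mask is not None:                                     # mask operator: True keeps a position
        scores = [[s if keep else -math.inf for s, keep in zip(row, mrow)]
                  for row, mrow in zip(scores, mask)]
    weights = [softmax(row) for row in scores]               # normalization operator
    return matmul(weights, v)                                # second matrix operator


q = [[0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(q, k, v, mask=[[True, False, False], [True, True, True]])
```

In terms of the chip in FIG. 1, the two `matmul` calls correspond to work for the tensor calculation unit, and the scaling, masking, and softmax steps correspond to work for the vector calculation unit.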
If there are multiple batches of tasks to be trained or inferred, the Attention operator may perform the steps shown in fig. 3 in sequence for each batch of tasks.
For ease of understanding, the following describes in detail the calculation process for multiple batches of tasks to be inferred. The calculation process for multiple batches of tasks to be trained is analogous and is not repeated here.
For example, when there are two batches of tasks to be inferred, the Attention operator may process the two batches one after the other, as shown in FIG. 4. As FIG. 4 shows, for either batch, while the tensor calculation unit performs matrix calculation using the first matrix operator, the vector calculation unit waits for the result of the first matrix operator from the tensor calculation unit and is therefore idle; while the vector calculation unit performs vector calculation using the vector operator, the tensor calculation unit waits for the result of the normalization operator from the vector calculation unit and is therefore idle. Consequently, neither the tensor calculation unit nor the vector calculation unit is fully utilized while the Attention operator processes the two batches, and the time overhead of inference on the two batches is relatively large.
In view of this, for the case where the Attention operator processes N batches of tasks (N being an integer greater than 1) and includes at least a first matrix operator, a vector operator, and a second matrix operator that are executed in sequence on any batch of tasks, the application proposes an operator execution method that increases the speed at which the Attention operator processes the N batches and reduces the time overhead. It should be understood that the Attention operator in the present application may include one vector operator or a plurality of vector operators, which is not limited herein. The vector operator may include at least one of a scaling operator, a mask operator, a normalization operator, and the like, which is likewise not limited herein.
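The cost of this strictly sequential execution can be made concrete with a small timing model. The sketch below is an assumption-laden illustration: the function name and the three per-batch durations are hypothetical, every batch is assumed to take the same time in each stage, and data transfer time is ignored.

```python
def sequential_timeline(num_batches, t_mm1, t_vec, t_mm2):
    """Total time and per-unit idle time when batches run strictly one after another.

    t_mm1, t_vec, t_mm2 are hypothetical durations of the first matrix operator,
    the vector operator(s), and the second matrix operator for one batch.
    """
    total = num_batches * (t_mm1 + t_vec + t_mm2)
    tensor_idle = num_batches * t_vec             # tensor unit waits during vector calculation
    vector_idle = num_batches * (t_mm1 + t_mm2)   # vector unit waits during matrix calculation
    return total, tensor_idle, vector_idle


# Two batches, as in FIG. 4, with made-up durations of 3, 4, and 3 time units.
total, tensor_idle, vector_idle = sequential_timeline(2, 3, 4, 3)
```

With these made-up durations the two batches take 20 time units, of which the tensor calculation unit is idle for 8 and the vector calculation unit for 12.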
With reference to the system architecture diagram shown in fig. 1, the present application proposes an operator execution method whose flow is executed by the vector calculation unit and the tensor calculation unit in the artificial intelligence chip of fig. 1. The operator execution method may include the following steps:
When the vector calculation unit performs vector calculation on the nth batch of tasks using a vector operator, the tensor calculation unit may perform matrix calculation on the (n+i)th batch of tasks using the first matrix operator, or may perform matrix calculation on the (n-j)th batch of tasks using the second matrix operator; n takes values in [1, N], and i and j are integers greater than 0.
Specifically, vector calculation on the nth batch of tasks depends on the tensor calculation unit having performed matrix calculation on the nth batch of tasks using the first matrix operator. Therefore, before the vector calculation unit starts vector calculation on the nth batch of tasks, the first matrix operator has already been applied to the nth batch of tasks and its calculation result has been obtained. While the vector calculation unit performs vector calculation on the nth batch of tasks, the tensor calculation unit can therefore use the first matrix operator to perform matrix calculation on a batch after the nth batch, for example the (n+i)th batch of tasks; or it can use the second matrix operator to perform matrix calculation on a batch before the nth batch, for example the (n-j)th batch of tasks.
It should be understood that the calculation result of the tensor calculation unit performing tensor calculation on the nth batch of tasks by using the first matrix operator, the calculation result of the vector calculation unit performing vector calculation on the nth batch of tasks by using the vector operator, and the calculation result of the tensor calculation unit performing tensor calculation on the nth batch of tasks by using the second matrix operator may be stored in the register 104 or the on-chip cache 103 or the display memory 101 shown in fig. 1.
In the above method, while the Attention operator processes N batches of tasks, when the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, the tensor calculation unit does not sit idle waiting for the vector calculation result of the nth batch of tasks; instead, it performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, or on the (n-j)th batch of tasks using the second matrix operator. In this process, the tensor calculation unit and the vector calculation unit process tasks of different batches at the same time, which avoids each unit waiting for the other, makes full use of both units, and reduces the time overhead of processing multiple batches of tasks with the Attention operator.
The processing of the vector calculation unit and the tensor calculation unit is described in detail below for each batch task, including the following two possible implementations.
A first possible embodiment is as follows:
When the tensor calculation unit is in an idle state, the tensor calculation unit can perform matrix calculation on the nth batch of tasks using the first matrix operator.
After the tensor calculation unit finishes matrix calculation on the nth batch of tasks using the first matrix operator, it sends a first instruction corresponding to the nth batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the nth batch of tasks using the first matrix operator. Once the vector calculation unit has received the first instruction corresponding to the nth batch of tasks and has finished vector calculation on the (n-1)th batch of tasks, it performs vector calculation on the nth batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, if the tensor calculation unit has finished matrix calculation on the (n+i-x)th batch of tasks using the second matrix operator, the tensor calculation unit may perform matrix calculation on the (n+i)th batch of tasks using the first matrix operator, where x is an integer greater than 0. For how the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, refer to the description above of matrix calculation on the nth batch of tasks using the first matrix operator; details are not repeated here.
Alternatively, while the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, if the tensor calculation unit has received a second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has finished matrix calculation on the (n-j+y)th batch of tasks using the first matrix operator, the tensor calculation unit may perform matrix calculation on the (n-j)th batch of tasks using the second matrix operator, where the second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished vector calculation on the (n-j)th batch of tasks using the vector operator, and y is an integer greater than 0. For how the tensor calculation unit performs matrix calculation on the (n-j+y)th batch of tasks using the first matrix operator, refer to the description above of matrix calculation on the nth batch of tasks using the first matrix operator; details are not repeated here.
It should be understood that, in the first possible implementation, while the vector calculation unit performs vector calculation on the N batches of tasks using the vector operator, the tensor calculation unit alternates between the first matrix operator and the second matrix operator when performing matrix calculation.
After the vector calculation unit finishes vector calculation on the nth batch of tasks using the vector operator, it sends a second instruction corresponding to the nth batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the nth batch of tasks using the vector operator. The tensor calculation unit may determine whether to perform matrix calculation on the nth batch of tasks using the second matrix operator according to the second instruction corresponding to the nth batch of tasks and whether the tensor calculation unit is in an idle state.
Specifically, whether the tensor calculation unit performs matrix calculation on the nth batch of tasks using the second matrix operator is determined according to the following two cases:
When n takes any value in [1, N-1], if the tensor calculation unit has received the second instruction corresponding to the nth batch of tasks and has finished matrix calculation on the (n+y)th batch of tasks using the first matrix operator, the tensor calculation unit may perform matrix calculation on the nth batch of tasks using the second matrix operator.
When n equals N, if the tensor calculation unit has received the second instruction corresponding to the Nth batch of tasks and has finished matrix calculation on the (N-1)th batch of tasks using the second matrix operator, the tensor calculation unit may perform matrix calculation on the Nth batch of tasks using the second matrix operator.
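The start conditions of the first possible implementation can be sketched as a small timeline simulation. This is an illustrative sketch rather than the claimed method itself: the function name simulate_alternating and the operator labels are hypothetical, x = 2 and y = 1 are assumed, and instruction transmission between the two units is assumed to take no time.

```python
def simulate_alternating(n_batches, m1, v, m2):
    """Return end times of the first matrix op, vector op and second matrix op per batch.

    Assumed tensor-unit order (x = 2, y = 1):
    M1(1), then M1(n+1), M2(n) for n = 1..N-1, and finally M2(N).
    """
    order = [("matmul1", 1)]
    for n in range(1, n_batches):
        order += [("matmul1", n + 1), ("matmul2", n)]
    order.append(("matmul2", n_batches))

    m1_end, vec_end, m2_end = {}, {}, {}
    tensor_free = 0  # time at which the tensor calculation unit becomes idle
    vector_free = 0  # time at which the vector calculation unit becomes idle
    for op, n in order:
        if op == "matmul1":
            m1_end[n] = tensor_free + m1
            tensor_free = m1_end[n]
            # First instruction sent at m1_end[n]; the vector unit starts batch n
            # once it has also finished batch n-1.
            vec_start = max(m1_end[n], vector_free)
            vec_end[n] = vec_start + v
            vector_free = vec_end[n]
        else:
            # Second matrix op needs the second instruction for batch n
            # (sent at vec_end[n]) and an idle tensor unit.
            start = max(tensor_free, vec_end[n])
            m2_end[n] = start + m2
            tensor_free = m2_end[n]
    return m1_end, vec_end, m2_end


# Assumed durations: matrix operators 3 units each, vector operator 2 units.
m1_end, vec_end, m2_end = simulate_alternating(n_batches=3, m1=3, v=2, m2=3)
print(m1_end, vec_end, m2_end)
```

Because the tensor-unit order is fixed in advance, the sketch only has to take, for each operator, the later of "unit idle" and "instruction received", which mirrors the two conditions stated above.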
To facilitate understanding, the operator execution method of the present application is discussed below using inference by the Attention operator on 3 batches of tasks as an example. The computation durations of the first matrix operator, the second matrix operator, and the vector operator are not necessarily the same; for convenience of discussion, the computation duration of the first matrix operator and that of the second matrix operator are assumed to be the same in the present application. Depending on the relationship between the computation durations of the first matrix operator, the second matrix operator, and the vector operator, there are the following three possible embodiments.
In embodiment A1, the computation time of the vector operator is less than or equal to the computation time of the first matrix operator (or the second matrix operator). The Attention operator sequentially calculates for three batches of tasks, and the specific calculation process can refer to the steps shown in fig. 5.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit does not take time.
At time t0, the tensor calculation unit uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the first batch of tasks; at time t1 it obtains the correlation between the query vector Q and the key vector K of the first batch of tasks and sends a first instruction corresponding to the first batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1 and, at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator; at time t2 it obtains the weight corresponding to the first batch of tasks and sends a second instruction corresponding to the first batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks, the tensor calculation unit, at time t1, uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the second batch of tasks; at time t3 it obtains the correlation between the query vector Q and the key vector K of the second batch of tasks and sends a first instruction corresponding to the second batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t3 and finished vector calculation on the first batch of tasks at time t2. Because time t3 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks at time t3; at time t4 it obtains the weight corresponding to the second batch of tasks and sends a second instruction corresponding to the second batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks, the tensor calculation unit, at time t3, uses the second matrix operator to perform matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks, and obtains the output vector of the first batch of tasks at time t5.
The tensor calculation unit is idle at time t5, so at time t5 it can use the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the third batch of tasks; at time t6 it obtains the correlation between the query vector Q and the key vector K of the third batch of tasks and sends a first instruction corresponding to the third batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t6 and finished vector calculation on the second batch of tasks at time t4. Because time t6 is later than time t4, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks at time t6; at time t7 it obtains the weight corresponding to the third batch of tasks and sends a second instruction corresponding to the third batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the third batch of tasks, the tensor calculation unit, at time t6, uses the second matrix operator to perform matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks, and obtains the output vector of the second batch of tasks at time t8.
The tensor calculation unit receives the second instruction corresponding to the third batch of tasks at time t7 and finishes matrix calculation on the second batch of tasks using the second matrix operator at time t8. Because time t8 is later than time t7, the tensor calculation unit starts matrix calculation on the third batch of tasks using the second matrix operator at time t8 and obtains the output vector of the third batch of tasks at time t9.
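The times t0 to t9 of embodiment A1 can be worked through numerically. The durations below (3 time units for each matrix operator, 2 for the vector operator) are assumed values chosen only to satisfy the condition of embodiment A1; they do not come from fig. 5.

```python
m, v = 3, 2  # assumed durations: matrix operators m, vector operator v
assert v <= m  # condition of embodiment A1

t0 = 0
t1 = t0 + m  # first matrix operator, batch 1 (t0 -> t1)
t2 = t1 + v  # vector operator, batch 1 (t1 -> t2)
t3 = t1 + m  # first matrix operator, batch 2 (t1 -> t3)
t4 = t3 + v  # vector operator, batch 2 (t3 -> t4)
t5 = t3 + m  # second matrix operator, batch 1 (t3 -> t5)
t6 = t5 + m  # first matrix operator, batch 3 (t5 -> t6)
t7 = t6 + v  # vector operator, batch 3 (t6 -> t7)
t8 = t6 + m  # second matrix operator, batch 2 (t6 -> t8)
t9 = t8 + m  # second matrix operator, batch 3 (t8 -> t9)

timeline = [t0, t1, t2, t3, t4, t5, t6, t7, t8, t9]
sequential = 3 * (2 * m + v)  # three batches processed one after another
print(timeline, sequential)
```

In this sketch the tensor calculation unit is busy from t0 to t9 without interruption, so the total time equals the sum of the six matrix calculations, which is shorter than the assumed sequential total.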
In embodiment A2, the computation time of the vector operator is longer than the computation time of the first matrix operator (or the second matrix operator) and is smaller than the sum of the computation time of the first matrix operator and the computation time of the second matrix operator. The Attention operator sequentially calculates for three batches of tasks, and the specific calculation process can refer to the steps shown in fig. 6.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit does not take time.
At time t0, the tensor calculation unit uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the first batch of tasks; at time t1 it obtains the correlation between the query vector Q and the key vector K of the first batch of tasks and sends a first instruction corresponding to the first batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1 and, at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator; at time t3 it obtains the weight corresponding to the first batch of tasks and sends a second instruction corresponding to the first batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks, the tensor calculation unit, at time t1, uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the second batch of tasks; at time t2 it obtains the correlation between the query vector Q and the key vector K of the second batch of tasks and sends a first instruction corresponding to the second batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t2 and finishes vector calculation on the first batch of tasks at time t3. Because time t3 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks at time t3; at time t5 it obtains the weight corresponding to the second batch of tasks and sends a second instruction corresponding to the second batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks, the tensor calculation unit receives the second instruction corresponding to the first batch of tasks at time t3 and finished matrix calculation on the second batch of tasks using the first matrix operator at time t2. Because time t3 is later than time t2, the tensor calculation unit, at time t3, uses the second matrix operator to perform matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks, and obtains the output vector of the first batch of tasks at time t4.
The tensor calculation unit is idle at time t4, so at time t4 it can use the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the third batch of tasks; at time t6 it obtains the correlation between the query vector Q and the key vector K of the third batch of tasks and sends a first instruction corresponding to the third batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t6 and finished vector calculation on the second batch of tasks at time t5. Because time t6 is later than time t5, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks at time t6; at time t8 it obtains the weight corresponding to the third batch of tasks and sends a second instruction corresponding to the third batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the third batch of tasks, the tensor calculation unit received the second instruction corresponding to the second batch of tasks at time t5 and finishes matrix calculation on the third batch of tasks using the first matrix operator at time t6. Because time t6 is later than time t5, the tensor calculation unit, at time t6, uses the second matrix operator to perform matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks, and obtains the output vector of the second batch of tasks at time t7.
The tensor calculation unit receives the second instruction corresponding to the third batch of tasks at time t8 and finished matrix calculation on the second batch of tasks using the second matrix operator at time t7. Because time t8 is later than time t7, the tensor calculation unit starts matrix calculation on the third batch of tasks using the second matrix operator at time t8 and obtains the output vector of the third batch of tasks at time t9.
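The times t0 to t9 of embodiment A2 can likewise be worked through numerically. The durations below (2 time units for each matrix operator, 3 for the vector operator) are assumed values chosen only to satisfy the condition of embodiment A2; they do not come from fig. 6.

```python
m, v = 2, 3  # assumed durations: matrix operators m, vector operator v
assert m < v < 2 * m  # condition of embodiment A2

t0 = 0
t1 = t0 + m  # first matrix operator, batch 1 (t0 -> t1)
t2 = t1 + m  # first matrix operator, batch 2 (t1 -> t2)
t3 = t1 + v  # vector operator, batch 1 (t1 -> t3)
t4 = t3 + m  # second matrix operator, batch 1 (t3 -> t4)
t5 = t3 + v  # vector operator, batch 2 (t3 -> t5)
t6 = t4 + m  # first matrix operator, batch 3 (t4 -> t6)
t7 = t6 + m  # second matrix operator, batch 2 (t6 -> t7)
t8 = t6 + v  # vector operator, batch 3 (t6 -> t8)
t9 = t8 + m  # second matrix operator, batch 3 (t8 -> t9)

timeline = [t0, t1, t2, t3, t4, t5, t6, t7, t8, t9]
tensor_idle = (t3 - t2) + (t8 - t7)  # gaps in which the tensor unit waits
vector_idle = t6 - t5                # gap in which the vector unit waits
sequential = 3 * (2 * m + v)         # three batches processed one after another
print(timeline, tensor_idle, vector_idle, sequential)
```

Under these assumed durations both units still wait briefly, but the total time remains well below the assumed sequential total.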
In embodiment A3, the computation time period of the vector operator is greater than or equal to the sum of the computation time period of the first matrix operator and the computation time period of the second matrix operator. The Attention operator sequentially calculates for three batches of tasks, and the specific calculation process can refer to the steps shown in fig. 7.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit does not take time.
At time t0, the tensor calculation unit uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the first batch of tasks; at time t1 it obtains the correlation between the query vector Q and the key vector K of the first batch of tasks and sends a first instruction corresponding to the first batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1 and, at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator; at time t3 it obtains the weight corresponding to the first batch of tasks and sends a second instruction corresponding to the first batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks, the tensor calculation unit, at time t1, uses the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the second batch of tasks; at time t2 it obtains the correlation between the query vector Q and the key vector K of the second batch of tasks and sends a first instruction corresponding to the second batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t2 and finishes vector calculation on the first batch of tasks at time t3. Because time t3 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks at time t3; at time t6 it obtains the weight corresponding to the second batch of tasks and sends a second instruction corresponding to the second batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks, the tensor calculation unit receives the second instruction corresponding to the first batch of tasks at time t3 and finished matrix calculation on the second batch of tasks using the first matrix operator at time t2. Because time t3 is later than time t2, the tensor calculation unit, at time t3, uses the second matrix operator to perform matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks, and obtains the output vector of the first batch of tasks at time t4.
The tensor calculation unit is idle at time t4, so at time t4 it can use the first matrix operator to perform matrix calculation on the query vector Q and the key vector K of the third batch of tasks; at time t5 it obtains the correlation between the query vector Q and the key vector K of the third batch of tasks and sends a first instruction corresponding to the third batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t5 and finishes vector calculation on the second batch of tasks at time t6. Because time t6 is later than time t5, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks at time t6; at time t8 it obtains the weight corresponding to the third batch of tasks and sends a second instruction corresponding to the third batch of tasks to the tensor calculation unit, where the second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the third batch of tasks, the tensor calculation unit receives the second instruction corresponding to the second batch of tasks at time t6 and finished matrix calculation on the third batch of tasks using the first matrix operator at time t5. Because time t6 is later than time t5, the tensor calculation unit, at time t6, uses the second matrix operator to perform matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks, and obtains the output vector of the second batch of tasks at time t7.
The tensor calculation unit receives the second instruction corresponding to the third batch of tasks at time t8 and finished matrix calculation on the second batch of tasks using the second matrix operator at time t7. Because time t8 is later than time t7, the tensor calculation unit starts matrix calculation on the third batch of tasks using the second matrix operator at time t8 and obtains the output vector of the third batch of tasks at time t9.
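The times t0 to t9 of embodiment A3 can be worked through in the same way. The durations below (2 time units for each matrix operator, 5 for the vector operator) are assumed values chosen only to satisfy the condition of embodiment A3; they do not come from fig. 7.

```python
m, v = 2, 5  # assumed durations: matrix operators m, vector operator v
assert v >= 2 * m  # condition of embodiment A3

t0 = 0
t1 = t0 + m  # first matrix operator, batch 1 (t0 -> t1)
t2 = t1 + m  # first matrix operator, batch 2 (t1 -> t2)
t3 = t1 + v  # vector operator, batch 1 (t1 -> t3)
t4 = t3 + m  # second matrix operator, batch 1 (t3 -> t4)
t5 = t4 + m  # first matrix operator, batch 3 (t4 -> t5)
t6 = t3 + v  # vector operator, batch 2 (t3 -> t6)
t7 = t6 + m  # second matrix operator, batch 2 (t6 -> t7)
t8 = t6 + v  # vector operator, batch 3 (t6 -> t8)
t9 = t8 + m  # second matrix operator, batch 3 (t8 -> t9)

timeline = [t0, t1, t2, t3, t4, t5, t6, t7, t8, t9]
vector_busy = 3 * v           # the vector unit runs t1 -> t8 without a gap
sequential = 3 * (2 * m + v)  # three batches processed one after another
print(timeline, vector_busy, sequential)
```

Here the vector calculation unit is the bottleneck: in this sketch it is busy continuously from t1 to t8, and the total time is the three vector calculations plus one matrix calculation at each end.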
In embodiments A1 to A3 above, a given time label among t0 to t9 may denote the same time or different times in the different embodiments.
A second possible embodiment is as follows:
When the tensor calculation unit is in an idle state, the tensor calculation unit can perform matrix calculation on the nth batch of tasks using the first matrix operator.
After the tensor calculation unit finishes matrix calculation on the nth batch of tasks using the first matrix operator, it sends a first instruction corresponding to the nth batch of tasks to the vector calculation unit, where the first instruction indicates that the tensor calculation unit has finished matrix calculation on the nth batch of tasks using the first matrix operator. Once the vector calculation unit has received the first instruction corresponding to the nth batch of tasks and has finished vector calculation on the (n-1)th batch of tasks, it performs vector calculation on the nth batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, if the tensor calculation unit has finished matrix calculation on the (n+i-1)th batch of tasks using the first matrix operator, the tensor calculation unit may perform matrix calculation on the (n+i)th batch of tasks using the first matrix operator. For how the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, refer to the description above of matrix calculation on the nth batch of tasks using the first matrix operator; details are not repeated here.
Alternatively, while the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, the tensor calculation unit may perform matrix calculation on the (n-j)th batch of tasks using the second matrix operator if either of the following holds: the tensor calculation unit has received the second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has finished matrix calculation on the Nth batch of tasks using the first matrix operator; or the tensor calculation unit has received the second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has finished matrix calculation on the (n-j-1)th batch of tasks using the second matrix operator. The second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished vector calculation on the (n-j)th batch of tasks using the vector operator.
It should be understood that in a second possible implementation, the tensor calculation unit performs matrix calculation on the N batches of tasks by using the first matrix operator, and then performs matrix calculation on the N batches of tasks by using the second matrix operator.
After the vector calculation unit finishes vector calculation on the nth batch of tasks using the vector operator, the vector calculation unit sends a second instruction corresponding to the nth batch of tasks to the tensor calculation unit, where the second instruction corresponding to the nth batch of tasks indicates that the vector calculation unit has finished vector calculation on the nth batch of tasks using the vector operator. The tensor calculation unit may determine whether to perform matrix calculation on the nth batch of tasks using the second matrix operator according to the second instruction corresponding to the nth batch of tasks and whether the tensor calculation unit is in an idle state.
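As an illustration only, the exchange of first and second instructions can be sketched in software. The following is a minimal sketch under stated assumptions, not the chip's actual mechanism: two Python threads stand in for the tensor calculation unit and the vector calculation unit, two queues stand in for the instruction channels, and all names (`run_pipeline`, `first_instr`, `second_instr`, `matmul`, `softmax_rows`) are hypothetical. The tensor side follows the second possible implementation, that is, the first matrix operator runs on every batch before the second matrix operator runs on any batch.

```python
import math
import queue
import threading


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Row-wise normalization (the vector operator in this sketch)."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def run_pipeline(batches):
    """batches is a list of (Q, K, V) triples; returns outputs in batch order."""
    first_instr = queue.Queue()   # tensor -> vector: first matrix operator done for batch n
    second_instr = queue.Queue()  # vector -> tensor: vector operator done for batch n
    scores, weights, outputs = {}, {}, {}

    def vector_unit():
        # Batches are handled in arrival order, so batch n starts only after
        # its first instruction arrived and batch n-1 has been finished.
        for _ in batches:
            n = first_instr.get()
            weights[n] = softmax_rows(scores[n])
            second_instr.put(n)

    worker = threading.Thread(target=vector_unit)
    worker.start()

    # Tensor unit, phase 1: first matrix operator (Q x K^T) on every batch.
    for n, (q, k, _) in enumerate(batches, start=1):
        scores[n] = matmul(q, transpose(k))
        first_instr.put(n)

    # Tensor unit, phase 2: second matrix operator (weights x V), gated on
    # the second instruction of each batch.
    for _ in batches:
        n = second_instr.get()
        outputs[n] = matmul(weights[n], batches[n - 1][2])

    worker.join()
    return [outputs[n] for n in range(1, len(batches) + 1)]
```

Because each queue is first-in first-out, the batches are processed in order on both sides, and the result matches what a purely serial evaluation of the three operators would produce.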
Specifically, whether the tensor calculation unit performs matrix calculation on the nth batch of tasks using the second matrix operator is determined according to the following two possible cases:
In the case where the value of n is 1, if the tensor calculation unit has received the second instruction corresponding to the first batch of tasks and has finished matrix calculation on the Nth batch of tasks using the first matrix operator, the tensor calculation unit may perform matrix calculation on the first batch of tasks using the second matrix operator.
In the case where the value of n is any value in [2, N], if the tensor calculation unit has received the second instruction corresponding to the nth batch of tasks and has finished matrix calculation on the (n-1)th batch of tasks using the second matrix operator, the tensor calculation unit may perform matrix calculation on the nth batch of tasks using the second matrix operator.
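The two cases above amount to a simple start condition for the second matrix operator. A minimal sketch follows; the function name, the parameter names, and the use of sets to record received instructions and finished batches are assumptions made for illustration, and batch numbers are 1-indexed as in the text.

```python
def can_start_second_matmul(n, total_batches, second_instr_received,
                            first_done, second_done):
    """Return True if the tensor unit may run the second matrix operator on batch n.

    second_instr_received: batches whose second instruction has arrived
    first_done:  batches on which the first matrix operator has finished
    second_done: batches on which the second matrix operator has finished
    """
    if n not in second_instr_received:
        return False  # the vector calculation of batch n has not finished yet
    if n == 1:
        # Case n = 1: the first matrix operator must have finished batch N.
        return total_batches in first_done
    # Case n in [2, N]: the second matrix operator must have finished batch n-1.
    return (n - 1) in second_done
```

For example, with N = 3, batch 1 becomes eligible only once its second instruction has arrived and the first matrix operator has finished batch 3; batch 2 then additionally waits for the second matrix operator to finish batch 1.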
To facilitate understanding, the operator execution method of the present application is discussed below using inference by the Attention operator on three batches of tasks as an example. The computation durations of the first matrix operator, the second matrix operator, and the vector operator are generally not the same; for convenience of discussion, the present application assumes that the computation duration of the first matrix operator is the same as that of the second matrix operator. Based on the relationship among the computation durations of the first matrix operator, the second matrix operator, and the vector operator, the following three possible embodiments are discussed.
In embodiment B1, the computation duration of the vector operator is less than or equal to the computation duration of the first matrix operator (or the second matrix operator). The Attention operator processes the three batches of tasks in sequence; for the specific calculation process, refer to the steps shown in fig. 8.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit takes no time.
The tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the first batch of tasks using the first matrix operator at time t0, obtains the correlation between the query vector Q and the key vector K of the first batch of tasks at time t1, and sends the first instruction corresponding to the first batch of tasks to the vector calculation unit at time t1, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator at time t1, obtains the weight corresponding to the first batch of tasks at time t2, and sends the second instruction corresponding to the first batch of tasks to the tensor calculation unit at time t2, where this second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks using the vector operator, the tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the second batch of tasks using the first matrix operator at time t1, obtains the correlation between the query vector Q and the key vector K of the second batch of tasks at time t3, and sends the first instruction corresponding to the second batch of tasks to the vector calculation unit at time t3, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t3 and finishes vector calculation on the first batch of tasks at time t2. Because time t3 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks using the vector operator at time t3, obtains the weight corresponding to the second batch of tasks at time t4, and sends the second instruction corresponding to the second batch of tasks to the tensor calculation unit at time t4, where this second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks using the vector operator, the tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the third batch of tasks using the first matrix operator at time t3, obtains the correlation between the query vector Q and the key vector K of the third batch of tasks at time t5, and sends the first instruction corresponding to the third batch of tasks to the vector calculation unit at time t5, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t5 and finishes vector calculation on the second batch of tasks at time t4. Because time t5 is later than time t4, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks using the vector operator at time t5, obtains the weight corresponding to the third batch of tasks at time t6, and sends the second instruction corresponding to the third batch of tasks to the tensor calculation unit at time t6, where this second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
The tensor calculation unit finishes matrix calculation on the third batch of tasks using the first matrix operator at time t5 and receives the second instruction corresponding to the first batch of tasks at time t2. Because time t5 is later than time t2, while the vector calculation unit performs vector calculation on the correlation corresponding to the third batch of tasks, the tensor calculation unit starts matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks using the second matrix operator at time t5, and obtains the output vector corresponding to the first batch of tasks at time t7.
The tensor calculation unit finishes matrix calculation on the first batch of tasks using the second matrix operator at time t7 and receives the second instruction corresponding to the second batch of tasks at time t4. Because time t7 is later than time t4, the tensor calculation unit may start matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks using the second matrix operator at time t7, and obtains the output vector corresponding to the second batch of tasks at time t8.
The tensor calculation unit finishes matrix calculation on the second batch of tasks using the second matrix operator at time t8 and receives the second instruction corresponding to the third batch of tasks at time t6. Because time t8 is later than time t6, the tensor calculation unit may start matrix calculation on the weight corresponding to the third batch of tasks and the value vector V of the third batch of tasks using the second matrix operator at time t8, and obtains the output vector corresponding to the third batch of tasks at time t9.
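The timeline of embodiment B1 can be reproduced with a small event simulation. The sketch below rests on assumptions: the durations (2 time units for each matrix operator, 1 for the vector operator) are made-up values that satisfy the B1 condition, instruction transmission takes no time, and the names `simulate`, `matmul1`, `vector`, and `matmul2` are hypothetical labels for the first matrix operator, the vector operator, and the second matrix operator.

```python
def simulate(num_batches, matmul_time, vector_time):
    """Return {(operator, batch): (start, end)} for the schedule in which the
    tensor unit runs the first matrix operator on all batches, then the second."""
    events = {}
    tensor_free = 0  # time at which the tensor calculation unit becomes idle
    vector_free = 0  # time at which the vector calculation unit becomes idle

    for n in range(1, num_batches + 1):
        # First matrix operator runs back to back on the tensor unit.
        start = tensor_free
        tensor_free = start + matmul_time
        events[("matmul1", n)] = (start, tensor_free)
        # Vector operator on batch n waits for the first instruction of batch n
        # and for the vector calculation of batch n-1 to finish.
        v_start = max(tensor_free, vector_free)
        vector_free = v_start + vector_time
        events[("vector", n)] = (v_start, vector_free)

    for n in range(1, num_batches + 1):
        # Second matrix operator on batch n waits for the tensor unit to be
        # idle and for the second instruction of batch n.
        start = max(tensor_free, events[("vector", n)][1])
        tensor_free = start + matmul_time
        events[("matmul2", n)] = (start, tensor_free)
    return events


if __name__ == "__main__":
    for (op, n), (start, end) in simulate(3, matmul_time=2, vector_time=1).items():
        print(f"{op:8s} batch {n}: {start} -> {end}")
```

With these assumed durations, the distinct event times in increasing order play the roles of t1 to t9 in the description above, and the tensor calculation unit is never idle between time t0 and time t9.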
In embodiment B2, the computation duration of the vector operator is greater than the computation duration of the first matrix operator (or the second matrix operator) and less than the sum of the computation duration of the first matrix operator and the computation duration of the second matrix operator. The Attention operator processes the three batches of tasks in sequence; for the specific calculation process, refer to the steps shown in fig. 9.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit takes no time.
The tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the first batch of tasks using the first matrix operator at time t0, obtains the correlation between the query vector Q and the key vector K of the first batch of tasks at time t1, and sends the first instruction corresponding to the first batch of tasks to the vector calculation unit at time t1, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator at time t1, obtains the weight corresponding to the first batch of tasks at time t3, and sends the second instruction corresponding to the first batch of tasks to the tensor calculation unit at time t3, where this second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks using the vector operator, the tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the second batch of tasks using the first matrix operator at time t1, obtains the correlation between the query vector Q and the key vector K of the second batch of tasks at time t2, and sends the first instruction corresponding to the second batch of tasks to the vector calculation unit at time t2, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t2 and finishes vector calculation on the first batch of tasks at time t3. Because time t3 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks using the vector operator at time t3, obtains the weight corresponding to the second batch of tasks at time t5, and sends the second instruction corresponding to the second batch of tasks to the tensor calculation unit at time t5, where this second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks using the vector operator, the tensor calculation unit, which started matrix calculation on the query vector Q and the key vector K of the third batch of tasks using the first matrix operator at time t2, obtains the correlation between the query vector Q and the key vector K of the third batch of tasks at time t4, and sends the first instruction corresponding to the third batch of tasks to the vector calculation unit at time t4, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t4 and finishes vector calculation on the second batch of tasks at time t5. Because time t5 is later than time t4, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks using the vector operator at time t5, obtains the weight corresponding to the third batch of tasks at time t7, and sends the second instruction corresponding to the third batch of tasks to the tensor calculation unit at time t7, where this second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
The tensor calculation unit finishes matrix calculation on the third batch of tasks using the first matrix operator at time t4 and receives the second instruction corresponding to the first batch of tasks at time t3. Because time t4 is later than time t3, the tensor calculation unit starts matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks using the second matrix operator at time t4, and obtains the output vector corresponding to the first batch of tasks at time t6; this calculation overlaps with the vector calculation that the vector calculation unit performs on the correlation corresponding to the third batch of tasks.
The tensor calculation unit finishes matrix calculation on the first batch of tasks using the second matrix operator at time t6 and receives the second instruction corresponding to the second batch of tasks at time t5. Because time t6 is later than time t5, the tensor calculation unit may start matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks using the second matrix operator at time t6, and obtains the output vector corresponding to the second batch of tasks at time t8.
The tensor calculation unit finishes matrix calculation on the second batch of tasks using the second matrix operator at time t8 and receives the second instruction corresponding to the third batch of tasks at time t7. Because time t8 is later than time t7, the tensor calculation unit may start matrix calculation on the weight corresponding to the third batch of tasks and the value vector V of the third batch of tasks using the second matrix operator at time t8, and obtains the output vector corresponding to the third batch of tasks at time t9.
In embodiment B3, the computation duration of the vector operator is greater than or equal to the sum of the computation duration of the first matrix operator and the computation duration of the second matrix operator. The Attention operator processes the three batches of tasks in sequence; for the specific calculation process, refer to the steps shown in fig. 10.
It is assumed that the tensor calculation unit and the vector calculation unit are both in an idle state at time t0, and that instruction transmission between the tensor calculation unit and the vector calculation unit takes no time.
The tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the first batch of tasks using the first matrix operator at time t0, obtains the correlation between the query vector Q and the key vector K of the first batch of tasks at time t1, and sends the first instruction corresponding to the first batch of tasks to the vector calculation unit at time t1, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the first batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the first batch of tasks at time t1, starts vector calculation on the correlation corresponding to the first batch of tasks using the vector operator at time t1, obtains the weight corresponding to the first batch of tasks at time t4, and sends the second instruction corresponding to the first batch of tasks to the tensor calculation unit at time t4, where this second instruction indicates that the vector calculation unit has finished vector calculation on the first batch of tasks using the vector operator.
While the vector calculation unit performs vector calculation on the correlation corresponding to the first batch of tasks using the vector operator, the tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the second batch of tasks using the first matrix operator at time t1, obtains the correlation between the query vector Q and the key vector K of the second batch of tasks at time t2, and sends the first instruction corresponding to the second batch of tasks to the vector calculation unit at time t2, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the second batch of tasks using the first matrix operator. In addition, the tensor calculation unit starts matrix calculation on the query vector Q and the key vector K of the third batch of tasks using the first matrix operator at time t2, obtains the correlation between the query vector Q and the key vector K of the third batch of tasks at time t3, and sends the first instruction corresponding to the third batch of tasks to the vector calculation unit at time t3, where this first instruction indicates that the tensor calculation unit has finished matrix calculation on the third batch of tasks using the first matrix operator.
The vector calculation unit receives the first instruction corresponding to the second batch of tasks at time t2 and finishes vector calculation on the first batch of tasks at time t4. Because time t4 is later than time t2, the vector calculation unit starts vector calculation on the correlation corresponding to the second batch of tasks using the vector operator at time t4, obtains the weight corresponding to the second batch of tasks at time t6, and sends the second instruction corresponding to the second batch of tasks to the tensor calculation unit at time t6, where this second instruction indicates that the vector calculation unit has finished vector calculation on the second batch of tasks using the vector operator.
The tensor calculation unit finishes matrix calculation on the third batch of tasks using the first matrix operator at time t3 and receives the second instruction corresponding to the first batch of tasks at time t4. Because time t4 is later than time t3, while the vector calculation unit performs vector calculation on the correlation corresponding to the second batch of tasks, the tensor calculation unit may start matrix calculation on the weight corresponding to the first batch of tasks and the value vector V of the first batch of tasks using the second matrix operator at time t4, and obtains the output vector corresponding to the first batch of tasks at time t5.
The vector calculation unit receives the first instruction corresponding to the third batch of tasks at time t3 and finishes vector calculation on the second batch of tasks at time t6. Because time t6 is later than time t3, the vector calculation unit starts vector calculation on the correlation corresponding to the third batch of tasks using the vector operator at time t6, obtains the weight corresponding to the third batch of tasks at time t8, and sends the second instruction corresponding to the third batch of tasks to the tensor calculation unit at time t8, where this second instruction indicates that the vector calculation unit has finished vector calculation on the third batch of tasks using the vector operator.
The tensor calculation unit finishes matrix calculation on the first batch of tasks using the second matrix operator at time t5 and receives the second instruction corresponding to the second batch of tasks at time t6. Because time t6 is later than time t5, while the vector calculation unit performs vector calculation on the correlation corresponding to the third batch of tasks, the tensor calculation unit may start matrix calculation on the weight corresponding to the second batch of tasks and the value vector V of the second batch of tasks using the second matrix operator at time t6, and obtains the output vector corresponding to the second batch of tasks at time t7.
The tensor calculation unit finishes matrix calculation on the second batch of tasks using the second matrix operator at time t7 and receives the second instruction corresponding to the third batch of tasks at time t8. Because time t8 is later than time t7, the tensor calculation unit may start matrix calculation on the weight corresponding to the third batch of tasks and the value vector V of the third batch of tasks using the second matrix operator at time t8, and obtains the output vector corresponding to the third batch of tasks at time t9.
In the above embodiments B1 to B3, the times t0 to t9 are defined separately for each embodiment; a time with a given label in one embodiment may be the same as or different from the time with that label in another embodiment.
In the embodiment of the present application, for each batch of tasks, the first matrix operator, the second matrix operator, and the vector operator may also be arranged in ways other than the two possible implementations of the processing performed by the vector calculation unit and the tensor calculation unit described above; these are not listed one by one here.
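To compare the three embodiments, the sketch below classifies a pair of durations into B1, B2, or B3 and computes the completion time of the pipelined schedule against a fully serial execution. It is an illustration under assumptions: both matrix operators share one duration `m`, the vector operator takes `v`, instruction transmission takes no time, serial execution is assumed to cost N x (2m + v), and the function names are hypothetical.

```python
def regime(m, v):
    """Classify durations: m per matrix operator, v for the vector operator."""
    if v <= m:
        return "B1"
    if v < 2 * m:
        return "B2"
    return "B3"


def pipelined_makespan(num_batches, m, v):
    """Finish time when the tensor and vector units overlap as described."""
    tensor_free = 0
    vector_free = 0
    vector_done = []
    for _ in range(num_batches):
        tensor_free += m                               # first matrix operator
        vector_free = max(tensor_free, vector_free) + v  # vector operator
        vector_done.append(vector_free)
    for done in vector_done:
        tensor_free = max(tensor_free, done) + m       # second matrix operator
    return tensor_free


def serial_makespan(num_batches, m, v):
    """Finish time when every operator of every batch runs one after another."""
    return num_batches * (2 * m + v)


if __name__ == "__main__":
    for m, v in [(4, 2), (4, 5), (4, 10)]:
        print(regime(m, v), pipelined_makespan(3, m, v), serial_makespan(3, m, v))
```

In B1 and in the B2 example the tensor calculation unit is the bottleneck and stays busy throughout; in B3 the vector calculation unit is the bottleneck, so the tensor calculation unit idles while it waits for second instructions.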
In the embodiment of the application, when the vector operator in the Attention operator is a normalization operator, the calculation formula of the normalization operator is shown as a formula (1):
$$y_i = \frac{e^{x_i - \max(x)}}{\sum_{j=1}^{n} e^{x_j - \max(x)}} \qquad (1)$$
In the above formula (1), n represents the vector dimension of the correlation corresponding to any batch of tasks, $x_i$ represents the ith element of the correlation vector corresponding to any batch of tasks, max(x) represents the maximum value in the n-dimensional correlation vector corresponding to any batch of tasks, and $y_i$ represents the normalized result of $x_i$.
It can be seen from the calculation formula of the normalization operator that implementing the normalization operator comprises three steps: 1. calculating the maximum value max(x); 2. calculating the sum $\sum_{j=1}^{n} e^{x_j - \max(x)}$; 3. calculating the division $e^{x_i - \max(x)} / \sum_{j=1}^{n} e^{x_j - \max(x)}$.
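The three steps listed above can be written out directly. The following is a minimal sketch that assumes the normalization operator is the usual maximum-subtracted softmax, which is consistent with the maximum, sum, and division steps; the function name `normalize` is hypothetical.

```python
import math


def normalize(x):
    """Normalize one correlation vector in three steps."""
    # Step 1: maximum value max(x).
    m = max(x)
    # Step 2: sum of the exponentials of the shifted elements.
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    # Step 3: division of each exponential by the sum.
    return [e / total for e in exps]
```

Subtracting max(x) before exponentiation keeps every exponent at or below zero, so large correlation values do not overflow; this is also why step 1 has to complete before steps 2 and 3 can begin.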
For any batch of tasks, the maximum value calculation in the normalization operator does not need to wait until the first matrix operator has finished executing on the whole batch. Therefore, to improve the calculation efficiency of the Attention operator, for any batch of tasks, the vector calculation unit may execute the maximum value calculation in the normalization operator while the tensor calculation unit executes the first matrix operator, as shown in fig. 11.
The operator execution scheme of the present application may combine the calculation process of the Attention operator shown in fig. 11 with the two possible implementations of the processing performed by the vector calculation unit and the tensor calculation unit, so that the time overhead of processing a plurality of batches of tasks with the Attention operator can be further reduced. The detailed combinations are not described further in the present application.
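One way to picture the overlap of the maximum value calculation with the first matrix operator is a running maximum that is updated as partial results arrive. The sketch below is only an assumption about how such an overlap could be organized: the first matrix operator is modeled as a generator that yields the correlation values tile by tile, the tile size and all names are hypothetical, and the sequential Python code shows the data dependency rather than true hardware concurrency.

```python
import math


def score_tiles(q, keys, tile):
    """Yield the correlations q . k in tiles, as the first matrix operator
    might emit partial results (assumed behavior)."""
    for start in range(0, len(keys), tile):
        yield [sum(a * b for a, b in zip(q, k)) for k in keys[start:start + tile]]


def normalize_with_overlapped_max(q, keys, tile=2):
    running_max = -math.inf
    scores = []
    for part in score_tiles(q, keys, tile):
        # Step 1 of the normalization is folded into the arrival of each tile,
        # so it is complete as soon as the last tile has been produced.
        running_max = max(running_max, max(part))
        scores.extend(part)
    # Steps 2 and 3 run once all correlations and the maximum are available.
    exps = [math.exp(s - running_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

When the last tile arrives, the maximum is already known, so only the sum and division steps remain for the vector calculation unit after it receives the first instruction.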
Based on the same technical concept, an embodiment of the present application provides a schematic structural diagram of an operator executing apparatus, as shown in fig. 12, the operator executing apparatus 1200 includes:
The tensor calculation module 1201 is configured to: while the vector calculation unit performs vector calculation on the nth batch of tasks using a vector operator, perform matrix calculation on the (n+i)th batch of tasks using a first matrix operator, or perform matrix calculation on the (n-j)th batch of tasks using a second matrix operator; where the value range of n is [1, N], i and j are integers greater than 0, and N is an integer greater than 1.
Optionally, the apparatus further includes a vector calculation module 1202, which is specifically configured to: after the vector calculation unit receives the first instruction corresponding to the nth batch of tasks sent by the tensor calculation unit and the vector calculation unit finishes vector calculation on the (n-1)th batch of tasks, perform vector calculation on the nth batch of tasks using the vector operator; where the first instruction corresponding to the nth batch of tasks indicates that the tensor calculation unit has finished matrix calculation on the nth batch of tasks using the first matrix operator.
Optionally, the tensor calculation module 1201 is specifically configured to:
After the tensor calculation unit finishes matrix calculation on the (n+i-x)th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator; x is an integer greater than 0.
Optionally, the tensor calculation module 1201 is specifically configured to:
After the tensor calculation unit finishes matrix calculation on the (n+i-1)th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator.
Optionally, the tensor calculation module 1201 is specifically configured to:
After the tensor calculation unit receives the second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and finishes matrix calculation on the (n-j+y)th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator; where the second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished vector calculation on the (n-j)th batch of tasks using the vector operator; y is an integer greater than 0.
Optionally, the tensor calculation module 1201 is specifically configured to:
After the tensor calculation unit receives the second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and finishes matrix calculation on the Nth batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator; or
after the tensor calculation unit receives the second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and finishes matrix calculation on the (n-j-1)th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator;
where the second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished vector calculation on the (n-j)th batch of tasks using the vector operator.
Optionally, before the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, the tensor calculation module 1201 is further configured to:
The tensor calculation unit finishes matrix calculation on the nth batch of tasks using the first matrix operator and sends the first instruction corresponding to the nth batch of tasks to the vector calculation unit.
Optionally, after the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, where n is equal to N, the tensor calculation module 1201 is further configured to:
After the tensor calculation unit receives the second instruction corresponding to the Nth batch of tasks sent by the vector calculation unit and finishes matrix calculation on the (N-1)th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the Nth batch of tasks using the second matrix operator; where the second instruction corresponding to the Nth batch of tasks indicates that the vector calculation unit has finished vector calculation on the Nth batch of tasks using the vector operator.
Based on the same technical concept, the embodiment of the present application provides a computer device, as shown in fig. 13, including at least one artificial intelligence chip 1301 and a memory 1302 connected with the at least one artificial intelligence chip, where a specific connection medium between the artificial intelligence chip 1301 and the memory 1302 is not limited in the embodiment of the present application, and in fig. 13, the artificial intelligence chip 1301 and the memory 1302 are connected by a bus as an example. The buses may be divided into address buses, data buses, control buses, etc.
In the embodiment of the present application, the memory 1302 stores instructions executable by the at least one artificial intelligence chip 1301, and the at least one artificial intelligence chip 1301 can execute the steps of the operator execution method by executing the instructions stored in the memory 1302.
The artificial intelligence chip 1301 is the control center of the computer device. It may use various interfaces and lines to connect the parts of the computer device, and implements operator execution by running or executing the instructions stored in the memory 1302 and invoking the data stored in the memory 1302. Optionally, the artificial intelligence chip 1301 may include one or more processing units, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the artificial intelligence chip 1301. In some embodiments, the artificial intelligence chip 1301 and the memory 1302 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The artificial intelligence chip 1301 may be a general purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 1302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1302 may include at least one type of storage medium, for example, flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory 1302 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer device, but is not limited thereto. The memory 1302 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, an embodiment of the present application provides a computer-readable storage medium storing a computer program executable by a computer device, which, when run on the computer device, causes the computer device to perform the steps of the operator execution method described above.
Based on the same inventive concept, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the steps of the operator execution method described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (11)
1. An operator execution method, characterized in that it is applied to an artificial intelligence chip comprising a tensor calculation unit and a vector calculation unit; the artificial intelligence chip is configured to process N batches of tasks using an attention mechanism operator; for any batch of tasks in the N batches of tasks, the attention mechanism operator comprises at least a first matrix operator, a vector operator and a second matrix operator that are executed sequentially; N is an integer greater than 1;
At the time when the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, or the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator; n takes values in the range [1, N], and i and j are integers greater than 0.
2. The method of claim 1, wherein the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, comprising:
When the vector calculation unit has received a first instruction corresponding to the nth batch of tasks sent by the tensor calculation unit and has finished the vector calculation of the (n-1)th batch of tasks, the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator; the first instruction corresponding to the nth batch of tasks indicates that the tensor calculation unit has finished the matrix calculation of the nth batch of tasks using the first matrix operator.
3. The method of claim 1, wherein the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, comprising:
After performing matrix calculation on the (n+i-x)th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator; x is an integer greater than 0.
4. The method of claim 1, wherein the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator, comprising:
After finishing the matrix calculation of the (n+i-1)th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n+i)th batch of tasks using the first matrix operator.
5. The method of claim 1, wherein the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator, comprising:
When the tensor calculation unit has received a second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has performed matrix calculation on the (n-j+y)th batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator; the second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)th batch of tasks using the vector operator; y is an integer greater than 0.
6. The method of claim 1, wherein the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator, comprising:
When the tensor calculation unit has received a second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has performed matrix calculation on the nth batch of tasks using the first matrix operator, the tensor calculation unit performs matrix calculation on the first batch of tasks using the second matrix operator; or
When the tensor calculation unit has received a second instruction corresponding to the (n-j)th batch of tasks sent by the vector calculation unit and has finished the matrix calculation of the (n-j-1)th batch of tasks using the second matrix operator, the tensor calculation unit performs matrix calculation on the (n-j)th batch of tasks using the second matrix operator;
The second instruction corresponding to the (n-j)th batch of tasks indicates that the vector calculation unit has finished the vector calculation of the (n-j)th batch of tasks using the vector operator.
7. The method according to any one of claims 1-6, wherein before the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, the method further comprises:
The tensor calculation unit finishes the matrix calculation of the nth batch of tasks using the first matrix operator, and sends a first instruction corresponding to the nth batch of tasks to the vector calculation unit.
8. The method of claim 7, wherein after the vector calculation unit performs vector calculation on the nth batch of tasks using the vector operator, wherein n is equal to N, the method further comprises:
The tensor calculation unit receives a second instruction corresponding to the Nth batch of tasks sent by the vector calculation unit and, after the matrix calculation of the (N-1)th batch of tasks is finished, performs matrix calculation on the Nth batch of tasks using the second matrix operator; the second instruction corresponding to the Nth batch of tasks indicates that the vector calculation unit has finished the vector calculation of the Nth batch of tasks using the vector operator.
9. A computer device comprising a memory, said artificial intelligence chip and a computer program stored on the memory and running on said artificial intelligence chip, characterized in that the artificial intelligence chip implements the steps of the method according to any one of claims 1-8 when said computer program is executed.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer device, which computer program, when run on the computer device, causes the computer device to perform the steps of the method according to any of claims 1-8.
11. A computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to carry out the steps of the method according to any one of claims 1-8.
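The overlap recited in claim 1 can be illustrated with a simple timing model. The following is a minimal sketch under stated assumptions, not the claimed method itself: the operator durations `t_first`, `t_vec` and `t_second` are hypothetical values chosen for illustration, each unit is assumed to process one operator at a time, and the tensor calculation unit is assumed to prefer a second matrix calculation whenever the corresponding vector calculation has already finished. The function names `simulate` and `serial_makespan` are likewise hypothetical.

```python
def simulate(num_batches, t_first=2, t_vec=3, t_second=2):
    """Return finish times of each operator per batch under an overlapped schedule."""
    first_done, vec_done, second_done = {}, {}, {}
    tensor_time = 0   # current time on the tensor calculation unit
    vec_free = 0      # time at which the vector calculation unit becomes free
    next_first, next_second = 1, 1
    while next_second <= num_batches:
        ready = (next_second in vec_done
                 and vec_done[next_second] <= tensor_time)
        if ready:
            # Second matrix operator on a batch whose vector calculation is done.
            tensor_time += t_second
            second_done[next_second] = tensor_time
            next_second += 1
        elif next_first <= num_batches:
            # First matrix operator on the next batch, overlapping vector work.
            tensor_time += t_first
            first_done[next_first] = tensor_time
            # The vector unit starts once its input is ready and it is free.
            start = max(tensor_time, vec_free)
            vec_free = start + t_vec
            vec_done[next_first] = vec_free
            next_first += 1
        else:
            # Nothing else to do: idle until the vector unit finishes.
            tensor_time = vec_done[next_second]
    return first_done, vec_done, second_done


def serial_makespan(num_batches, t_first=2, t_vec=3, t_second=2):
    """Total time if the three operators of every batch run strictly in sequence."""
    return num_batches * (t_first + t_vec + t_second)
```

Under this model, the finish time of the last second matrix calculation, `max(second_done.values())`, can be compared with `serial_makespan` for the same durations to see how much time the overlap between the two calculation units saves.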
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410552100.9A CN118132156B (en) | 2024-05-07 | 2024-05-07 | Operator execution method, device, storage medium and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118132156A CN118132156A (en) | 2024-06-04 |
CN118132156B CN118132156B (en) | 2024-08-09 |
Family
ID=91234026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410552100.9A Active CN118132156B (en) | 2024-05-07 | 2024-05-07 | Operator execution method, device, storage medium and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118132156B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118394415B (en) * | 2024-06-25 | 2024-11-01 | 北京壁仞科技开发有限公司 | Operator execution method, device, storage medium and program product |
CN118708360A (en) * | 2024-08-26 | 2024-09-27 | 北京壁仞科技开发有限公司 | Operator execution method, device, storage medium and program product |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116451174A (en) * | 2023-04-17 | 2023-07-18 | 昆仑芯(北京)科技有限公司 | Task execution device, method, electronic device, and storage medium |
CN117827463A (en) * | 2024-02-02 | 2024-04-05 | 上海壁仞科技股份有限公司 | Method, apparatus and storage medium for performing attention calculations |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230177351A1 (en) * | 2021-12-06 | 2023-06-08 | International Business Machines Corporation | Accelerating decision tree inferences based on tensor operations |
CN114491399A (en) * | 2021-12-30 | 2022-05-13 | 深圳云天励飞技术股份有限公司 | Data processing method and device, terminal equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN118132156A (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN118132156B (en) | Operator execution method, device, storage medium and program product | |
CN108416422B (en) | FPGA-based convolutional neural network implementation method and device | |
US20180260710A1 (en) | Calculating device and method for a sparsely connected artificial neural network | |
CN111338695B (en) | Data processing method based on pipeline technology and related product | |
CN110738308B (en) | Neural network accelerator | |
JP2023506343A (en) | Vector reduction using shared scratchpad memory | |
CN117574970A (en) | Inference acceleration method, system, terminal and medium for large-scale language model | |
EP3836031A2 (en) | Neural network processor, chip and electronic device | |
US20220043770A1 (en) | Neural network processor, chip and electronic device | |
WO2022001301A1 (en) | Neural network operation method and related device | |
CN114556260A (en) | Apparatus and system for performing neural networks | |
CN110825514A (en) | Artificial intelligence chip and instruction execution method for artificial intelligence chip | |
CN114298329A (en) | Model training method, device, equipment and storage medium | |
CN118394415B (en) | Operator execution method, device, storage medium and program product | |
CN118114737A (en) | Method and equipment for optimizing back propagation of Attention operator | |
Lee et al. | Accelerating deep neural networks using FPGAs and ZYNQ | |
CN113780539A (en) | Neural network data processing method, device, equipment and storage medium | |
CN113722668B (en) | Processing unit, correlation device and tensor operation method | |
CN111026258B (en) | Processor and method for reducing power supply ripple | |
US20230289580A1 (en) | Neural network circuit and neural network circuit control method | |
Bai et al. | An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks | |
CN117940934A (en) | Data processing apparatus and method | |
Chiu et al. | Design and implementation of the CNN accelator based on multi-streaming SIMD mechanisms | |
US12073317B2 (en) | Method and system for processing a neural network | |
CN118093452B (en) | Memory architecture mapping method, device, storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||