CN114816526B - Operand domain multiplexing-based multi-operand instruction processing method and device - Google Patents

Info

Publication number
CN114816526B
Authority
CN
China
Prior art keywords
operand
instruction
target operation
execution result
ready
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210409706.8A
Other languages
Chinese (zh)
Other versions
CN114816526A (en)
Inventor
郇丹丹
李祖松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202210409706.8A
Publication of CN114816526A
Application granted
Publication of CN114816526B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G06F9/30109 Register structure having multiple operands in a single register
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking

Abstract

The application provides a method and a device for processing a multi-operand instruction based on operand domain multiplexing, relating to the technical field of computers. In the method, the N candidate operations of a multi-operand instruction comprise a first operation and target operations; the operation object of the first operation is an operand, and the operation objects of a target operation are an operand and an intermediate execution result. The multi-operand instruction is executed in the order of the candidate operations. For each target operation, while the intermediate execution result corresponding to the target operation is being generated, the method monitors whether the corresponding operand is ready. If the operand is ready and the intermediate execution result has not yet been generated, the operand is sent to the pipeline of the current execution stage and passed along it. If the operand is not ready and the intermediate execution result has been generated, the intermediate execution result is stored in an idle operand domain and monitoring continues, and the target operation is executed once the operand is ready. This reduces instruction processing delay and improves processing efficiency.

Description

Operand domain multiplexing-based multi-operand instruction processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a multi-operand instruction based on operand domain multiplexing.
Background
A multi-operand instruction is commonly used to perform one or more types of operations on multiple operands. One example is an operation instruction, whose main function is to perform arithmetic such as addition, subtraction, multiplication, and division; its execution result is obtained by performing the corresponding operation on the source operands (src). The floating-point fused multiply-add (FMA) instruction combines a separate floating-point multiplication and floating-point addition into one operation, and includes multiply-add, multiply-subtract, negative multiply-add, and negative multiply-subtract operations, such as src1 × src2 + src3, src1 × src2 - src3, -src1 × src2 + src3, and -src1 × src2 - src3.
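As a minimal illustration of the four variants listed above, the sketch below writes each one as a plain Python expression. The function names are hypothetical, and ordinary Python floats round once after the multiplication and again after the addition, so the sketch shows only the algebraic form and does not model the single rounding of a fused hardware unit.

```python
def multiply_add(src1: float, src2: float, src3: float) -> float:
    """Multiply-add: src1 * src2 + src3."""
    return src1 * src2 + src3


def multiply_sub(src1: float, src2: float, src3: float) -> float:
    """Multiply-subtract: src1 * src2 - src3."""
    return src1 * src2 - src3


def neg_multiply_add(src1: float, src2: float, src3: float) -> float:
    """Negative multiply-add: -(src1 * src2) + src3."""
    return -(src1 * src2) + src3


def neg_multiply_sub(src1: float, src2: float, src3: float) -> float:
    """Negative multiply-subtract: -(src1 * src2) - src3."""
    return -(src1 * src2) - src3
```

In every variant the multiplication needs only src1 and src2, while src3 is needed only by the later addition or subtraction.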
At present, the efficiency of processing multi-operand instructions has become an important indicator of processor performance, and how to reduce the instruction execution delay and improve the processing efficiency of multi-operand instructions is a problem to be solved urgently.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, a first objective of the present application is to propose a method for processing a multi-operand instruction based on operand domain multiplexing.
A second object of the present application is to provide a processing apparatus for multi-operand instructions based on operand domain multiplexing.
A third object of the present application is to provide an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
To achieve the above object, a first aspect of the present application provides a method for processing a multi-operand instruction based on operand domain multiplexing, including: obtaining a multi-operand instruction, wherein the multi-operand instruction includes N candidate operations, N is a positive integer greater than or equal to 2, the N candidate operations include a first operation and subsequent target operations, the operation object corresponding to the first operation is an operand in the multi-operand instruction, the operation objects corresponding to a target operation are an operand in the multi-operand instruction and an intermediate execution result, and the intermediate execution result is generated by a candidate operation executed before the target operation; executing the multi-operand instruction in the order of the candidate operations, and, for each target operation, monitoring whether the operand corresponding to the target operation is ready while the intermediate execution result corresponding to the target operation is being generated; if the operand corresponding to the target operation is ready and the intermediate execution result has not yet been generated, sending the operand to the pipeline of the current execution stage and passing it along the pipeline with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand; if the operand corresponding to the target operation is not ready and the intermediate execution result has been generated, storing the intermediate execution result in an idle operand domain and continuing to monitor the operand, so that when the operand is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand.
According to an embodiment of the present application, the monitoring of whether an operand corresponding to the target operation is ready includes: monitoring by the functional component that generates the intermediate execution result, wherein the monitoring ends at or before the last pipeline beat in which the functional component executes the corresponding candidate operation, that candidate operation being the one that generates the intermediate execution result.
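The monitoring window described above can be pictured with a small sketch. It assumes, purely for illustration, that beats are numbered from the beat in which the functional component starts the candidate operation and that the operation takes a fixed number of beats; the function names and the fixed latency are hypothetical and do not come from the application.

```python
def last_monitor_beat(start_beat: int, latency: int) -> int:
    """Last pipeline beat of the candidate operation (assumed fixed latency)."""
    return start_beat + latency - 1


def caught_while_executing(ready_beat: int, start_beat: int, latency: int) -> bool:
    """True if the operand becomes ready no later than the last monitored beat,
    i.e. before the intermediate execution result leaves the functional component."""
    return ready_beat <= last_monitor_beat(start_beat, latency)
```

With a four-beat multiply starting at beat 0, an operand that becomes ready at beat 3 is still caught by the functional component, while one that becomes ready at beat 4 is not.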
According to an embodiment of the present application, before the multi-operand instruction is executed in the order of the candidate operations, the method further includes: monitoring whether the operand corresponding to the first operation is ready; and, in response to the operand corresponding to the first operation being ready, issuing the multi-operand instruction to an instruction execution pipeline to perform the first operation.
According to an embodiment of the present application, in a case that an operand corresponding to the target operation is not ready and the intermediate execution result is generated, before the target operation is executed on the intermediate execution result and the operand corresponding to the target operation by the functional unit corresponding to the target operation, the method further includes: in response to the condition that the operand corresponding to the target operation is monitored to be ready, the intermediate execution result is taken out from the idle operand domain, and the intermediate execution result and the operand corresponding to the target operation are sent to the functional unit corresponding to the target operation.
According to an embodiment of the present application, the sending of the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation includes: identifying whether the functional unit corresponding to the target operation is in an idle state, and, if it is, sending the intermediate execution result and the operand corresponding to the target operation to that functional unit.
According to one embodiment of the application, the idle operand domain is located in a reservation station or register file.
According to one embodiment of the present application, snooping whether any operand within the multi-operand instruction is ready comprises: monitoring a flag bit corresponding to any operand, wherein the flag bit exists in a state flag bit field of the multi-operand instruction; and if the flag bit corresponding to any operand is a first set value, determining that the operand is ready.
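A minimal sketch of the flag-bit check follows. The layout is an assumption made only for illustration: the status flag field is modeled as an integer with one bit per operand, and the "first set value" is taken to be 1. Neither detail is specified in the text above.

```python
READY = 1  # assumed "first set value" meaning the operand is ready


def operand_ready(status_flags: int, operand_index: int) -> bool:
    """Check the flag bit of one operand in a hypothetical status flag field.

    Bit i of status_flags is assumed to hold the ready flag of operand i.
    """
    return (status_flags >> operand_index) & 1 == READY
```

For example, with `status_flags = 0b011`, operands 0 and 1 are reported ready and operand 2 is not.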
According to one embodiment of the present application, monitoring whether any operand within the multi-operand instruction is ready comprises: monitoring the write-back bus for instructions; judging whether a monitored instruction is the related instruction of the multi-operand instruction, wherein the related instruction is, among a plurality of candidate instructions, the candidate instruction closest to the multi-operand instruction, each candidate instruction has a destination register number equal to the register number of the operand and precedes the multi-operand instruction in serial program order; and determining that the operand is ready if the monitored instruction is the related instruction.
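The selection of the related instruction can be sketched as follows. The `Instr` record, its `seq` field (position in serial program order), and the function names are hypothetical; the sketch only mirrors the rule stated above: among earlier instructions writing the operand's register, the closest one is the producer to wait for.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Instr:
    seq: int       # position in serial program order (smaller means earlier)
    dest_reg: int  # destination register number


def related_instruction(
    in_flight: Iterable[Instr], consumer_seq: int, operand_reg: int
) -> Optional[Instr]:
    """Closest earlier instruction whose destination register matches the operand."""
    candidates = [
        i for i in in_flight if i.seq < consumer_seq and i.dest_reg == operand_reg
    ]
    return max(candidates, key=lambda i: i.seq, default=None)


def ready_on_writeback(
    written_back: Instr, in_flight: Iterable[Instr], consumer_seq: int, operand_reg: int
) -> bool:
    """The operand is ready when the instruction seen on the write-back bus
    is the related instruction."""
    return written_back == related_instruction(in_flight, consumer_seq, operand_reg)
```

Matching on the closest earlier writer, rather than any writer of the register, keeps an older instruction that writes the same register from being mistaken for the producer.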
To achieve the above object, a second aspect of the present application provides an apparatus for processing a multi-operand instruction based on operand domain multiplexing, comprising: an obtaining module, configured to obtain a multi-operand instruction, where the multi-operand instruction includes N candidate operations, N is a positive integer greater than or equal to 2, the N candidate operations include a first operation and subsequent target operations, the operation object corresponding to the first operation is an operand in the multi-operand instruction, the operation objects corresponding to a target operation are an operand in the multi-operand instruction and an intermediate execution result, and the intermediate execution result is generated by a candidate operation executed before the target operation; a monitoring module, configured to execute the multi-operand instruction in the order of the candidate operations and, for each target operation, to monitor whether the operand corresponding to the target operation is ready while the intermediate execution result corresponding to the target operation is being generated; a sending module, configured to, if the operand corresponding to the target operation is ready and the intermediate execution result has not yet been generated, send the operand to the pipeline of the current execution stage and pass it along the pipeline with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand; and a storage module, configured to, if the operand corresponding to the target operation is not ready and the intermediate execution result has been generated, store the intermediate execution result in an idle operand domain and continue to monitor the operand, so that when the operand is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand.
According to an embodiment of the present application, the monitoring module is further configured so that the monitoring is performed by the functional component that generates the intermediate execution result and ends at or before the last pipeline beat in which the functional component executes the corresponding candidate operation, that candidate operation being the one that generates the intermediate execution result.
According to an embodiment of the present application, the monitoring module is further configured to: monitor whether the operand corresponding to the first operation is ready; and, in response to the operand corresponding to the first operation being ready, issue the multi-operand instruction to an instruction execution pipeline to perform the first operation.
According to an embodiment of the present application, the sending module is further configured to: in response to the operand corresponding to the target operation being ready, take the intermediate execution result out of the idle operand domain, and send the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
According to an embodiment of the present application, the sending module is further configured to: identify whether the functional unit corresponding to the target operation is in an idle state, and, if it is, send the intermediate execution result and the operand corresponding to the target operation to that functional unit.
According to one embodiment of the application, the idle operand domain is located in a reservation station or register file.
According to an embodiment of the present application, the monitoring module is further configured to: monitor the flag bit corresponding to any operand, wherein the flag bit is located in a status flag field of the multi-operand instruction; and, if the flag bit corresponding to the operand is a first set value, determine that the operand is ready.
According to an embodiment of the present application, the monitoring module is further configured to: monitor the write-back bus for instructions; judge whether a monitored instruction is the related instruction of the multi-operand instruction, wherein the related instruction is, among a plurality of candidate instructions, the candidate instruction closest to the multi-operand instruction, each candidate instruction has a destination register number equal to the register number of the operand and precedes the multi-operand instruction in serial program order; and determine that the operand is ready if the monitored instruction is the related instruction.
To achieve the above object, a third aspect of the present application provides an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for processing a multi-operand instruction based on operand domain multiplexing according to the first aspect of the present application.
To achieve the above object, a fourth aspect of the present application proposes a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a method for processing an operand domain multiplexing-based multi-operand instruction according to the first aspect of the present application.
In the method for processing a multi-operand instruction based on operand domain multiplexing according to the embodiments of the present application, a multi-operand instruction is obtained, wherein the multi-operand instruction includes N candidate operations, N is a positive integer greater than or equal to 2, the N candidate operations include a first operation and subsequent target operations, the operation object corresponding to the first operation is an operand in the multi-operand instruction, the operation objects corresponding to a target operation are an operand in the multi-operand instruction and an intermediate execution result, and the intermediate execution result is generated by a candidate operation executed before the target operation; the multi-operand instruction is executed in the order of the candidate operations, and, for each target operation, whether the operand corresponding to the target operation is ready is monitored while the intermediate execution result corresponding to the target operation is being generated; if the operand is ready and the intermediate execution result has not yet been generated, the operand is sent to the pipeline of the current execution stage and passed along the pipeline with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand; if the operand is not ready and the intermediate execution result has been generated, the intermediate execution result is stored in an idle operand domain and monitoring of the operand continues, so that when the operand is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand. The method can start executing the instruction without waiting for all operands of the multi-operand instruction to be ready. For each target operation, while one operation object (the intermediate execution result generated by another candidate operation) is being generated, the method monitors whether the other operation object (the operand corresponding to the target operation) is ready. If that operand is still not ready after the intermediate execution result has been generated, the intermediate execution result is stored in an idle operand domain, which requires no additional storage space, and once the operand is ready the functional unit corresponding to the target operation executes the target operation on both operation objects. This avoids occupying the pipeline while waiting for operands, reduces instruction execution delay, and improves instruction processing efficiency.
Drawings
FIG. 1 is a flow diagram illustrating a method of processing a multi-operand instruction based on operand domain multiplexing in accordance with an illustrative embodiment of the present application;
FIG. 2 is a flow diagram illustrating a method of processing a multi-operand instruction based on operand domain multiplexing in accordance with another illustrative embodiment of the present application;
FIG. 3 is a flow diagram illustrating a method of processing a multi-operand instruction based on operand domain multiplexing in accordance with another illustrative embodiment of the present application;
FIG. 4 is a diagram illustrating an issue queue structure in a method for processing an operand domain multiplexing based multi-operand instruction according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a reservation station architecture in a method for processing operand domain multiplexing based multi-operand instructions according to an illustrative embodiment of the present application;
FIG. 6 is a block diagram illustrating an operand domain multiplexing based multiple operand instruction processing apparatus in accordance with an illustrative embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
Fig. 1 is a flowchart illustrating a processing method of an operand domain multiplexing based multi-operand instruction according to an exemplary embodiment of the present application, where the processing method of the instruction includes the following steps, as shown in fig. 1:
S101, a multi-operand instruction is obtained, wherein the multi-operand instruction comprises N candidate operations, N is a positive integer greater than or equal to 2, and the N candidate operations comprise a first operation and subsequent target operations.
The processing method of the operand domain multiplexing-based multi-operand instruction can be applied to multi-operand instructions with a plurality of source operands, such as floating point operation instructions, fixed point operation instructions, vector floating point instructions, vector fixed point instructions and the like which comprise three or more operands.
The multi-operand instruction is obtained from the instruction register, and the instruction is executed by performing its corresponding operations on its multiple operands. The method divides the execution of the instruction into N candidate operations according to the operation steps, wherein N is a positive integer greater than or equal to 2.
In one implementation, the candidate operation that is executed first in the pipeline among the N candidate operations is determined as the first operation, and the remaining candidate operations are determined as target operations. It should be noted that the candidate operations in the embodiments of the present application are divided according to the execution steps, and the operation types of the candidate operations may be the same or different; for example, each candidate operation may be addition, subtraction, multiplication, division, or any other type of operation.
The operation object of the first operation is an operand of the multi-operand instruction, and the operation objects of a target operation are an operand of the multi-operand instruction and an intermediate execution result, wherein the intermediate execution result is generated by a candidate operation other than the target operation. It is understood that the first operation has at least one operation object, as in a shift operation, and that a target operation has at least two operation objects, such as an intermediate execution result generated by a candidate operation executed before the target operation and an operand of the multi-operand instruction, where the operand refers to a source operand (src) of the multi-operand instruction.
For example, for a multi-operand instruction that implements src1 × src2 + src3, the multiplication src1 × src2 is determined as the first operation, and the addition of the product src1 × src2 and the operand src3 is determined as the target operation. The operation objects of the first operation are the operands src1 and src2, and the operation objects of the target operation are the intermediate execution result (src1 × src2) generated by the first operation and the operand src3.
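The split into a first operation and target operations can be sketched as follows. The function `run_candidates` and its argument layout are hypothetical and only illustrate the data flow: the first operation consumes source operands, and each target operation consumes the previous intermediate execution result plus one further source operand.

```python
import operator
from typing import Callable, Dict, Sequence, Tuple


def run_candidates(
    first_op: Callable[..., float],
    first_srcs: Sequence[str],
    target_ops: Sequence[Tuple[Callable[[float, float], float], str]],
    operands: Dict[str, float],
) -> float:
    """Execute the first operation, then each target operation in order."""
    # First operation: its operation objects are source operands only.
    result = first_op(*(operands[name] for name in first_srcs))
    # Target operations: intermediate execution result plus one source operand.
    for op, src in target_ops:
        result = op(result, operands[src])
    return result


# src1 * src2 + src3: the multiply is the first operation, the add is the target.
value = run_candidates(
    operator.mul,
    ("src1", "src2"),
    [(operator.add, "src3")],
    {"src1": 2.0, "src2": 3.0, "src3": 4.0},
)
```

Because src3 is read only inside the loop, nothing in the first operation depends on it, which is what allows execution to begin before src3 is ready.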
S102, executing the multi-operand instruction according to the sequence of the candidate operation, and monitoring whether the operand corresponding to the target operation is ready or not in the process of generating the intermediate execution result corresponding to the target operation aiming at each target operation.
The instruction is executed according to the logical order or a specified order of the candidate operations. For each target operation, while the operation object that is an intermediate execution result is being generated, it is necessary to monitor whether the operation object that is an operand is ready.
In the previous example, src1 × src2 + src3, the intermediate execution result of the multiplication src1 × src2 is an operation object of the target operation, and whether the operand src3 is ready is monitored while that intermediate execution result is being generated (i.e., during the multiplication in this example).
Take as an example a floating-point multiply-add (FMA) instruction with four operands, which may also be referred to as a floating-point multiply-accumulate instruction and performs a multiply-add operation such as src1 × src2 + src3 + src4. Executing the instruction requires three candidate operations, namely one multiply operation and two add operations. The multiply operation is determined as the first operation and the two add operations are determined as target operations. The operand src1 is the floating-point multiplier, the operand src2 is the floating-point multiplicand, and the operands src3 and src4 are floating-point addends. The FMA instruction is completed by executing the corresponding candidate operations on the four operands.
Once the operands src1 and src2 enter the instruction execution pipeline, whether the operand src3 is ready may be monitored while the functional unit corresponding to the first operation (e.g., a multiplication unit) executes the multiply operation; while the functional unit corresponding to the first target operation (e.g., an addition unit) performs the add operation on the intermediate execution result of the multiply operation and the operand src3, whether the operand src4 is ready is monitored.
It will be understood that any operand may be the execution result of the instruction immediately preceding the current multi-operand instruction or of an earlier instruction. For example, the immediately preceding instruction may be an accumulation instruction, and the operand is obtained by performing the accumulation operation on that instruction's operands. As another example, the operand may be the execution result of another instruction earlier than the multi-operand instruction (e.g., a load instruction or a compute instruction): if the register number of the operand is the same as the destination register number of such an earlier instruction, there is a dependency, and the multi-operand instruction must wait until the result of that instruction is available. It is therefore necessary to determine whether an operand is ready, so that the operand is fetched only once it is ready when the current multi-operand instruction operates on it.
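The overlap between executing one candidate operation and monitoring the operand of the next can be listed with a short sketch. The operation labels (`mul`, `add1`, `add2`) and the function name are hypothetical and used only to make the pairing explicit.

```python
from typing import List, Sequence, Tuple


def monitoring_schedule(
    first_op: str, target_ops: Sequence[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Pair each executing operation with the operand monitored meanwhile.

    target_ops holds (operation label, source operand it needs) in execution order.
    While one operation runs, the operand of the next target operation is monitored.
    """
    executing = [first_op] + [label for label, _ in target_ops[:-1]]
    monitored = [operand for _, operand in target_ops]
    return list(zip(executing, monitored))


# src1 * src2 + src3 + src4: one multiply followed by two adds.
schedule = monitoring_schedule("mul", [("add1", "src3"), ("add2", "src4")])
```

For the four-operand example this yields the pairs `("mul", "src3")` and `("add1", "src4")`, matching the two monitoring phases described above.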
S103, if the operand corresponding to the target operation is ready and the intermediate execution result is not generated, the operand corresponding to the target operation is sent to the pipeline where the current execution stage is located, and the operand is transmitted in the pipeline along with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation.
In the embodiment of the present application, if the operand corresponding to a target operation becomes ready while the other operation object of the target operation (the intermediate execution result generated by other candidate operations) is still being generated, the operand may be sent to the pipeline in which the current execution stage is located. For example, when an FMA instruction is processed, if the operand src3 is ready while the floating-point multiply operation is being performed on the operands src1 and src2, the operand src3 is sent to the pipeline in which that stage is located and is passed down the pipeline along with the operation of the current execution stage; after the intermediate execution result is generated, the operand src3 and the intermediate execution result enter the functional unit corresponding to the target operation, which executes the target operation.
S104, if the operand corresponding to the target operation is not ready and the intermediate execution result is generated, storing the intermediate execution result into an idle operand domain, and continuing to monitor the operand corresponding to the target operation, so that when the operand corresponding to the target operation is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation.
In the embodiment of the present application, if the operand corresponding to the target operation is not ready when the other operation object of the target operation (the intermediate execution result generated by the other candidate operations) has already been generated, the intermediate execution result may be stored in an idle operand domain, to avoid occupying the pipeline while waiting for the operand corresponding to the target operation to become ready. Whether the operand corresponding to the target operation is ready continues to be monitored, and when it is ready, the operand and the intermediate execution result are sent to the functional unit corresponding to the target operation to execute the target operation.
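The two branches S103 and S104 can be illustrated with a small cycle-by-cycle model. This is only a sketch under stated assumptions: the names decide and run_fma, the action labels, and the fixed multiply latency are hypothetical and are not taken from the application, and the model covers only a three-operand multiply-add src1 * src2 + src3.

```python
FORWARD_OPERAND = "forward_operand"      # S103: operand travels with the current stage
PARK_INTERMEDIATE = "park_intermediate"  # S104: result waits in an idle operand field
EXECUTE_TARGET = "execute_target"        # both operation objects are available
KEEP_MONITORING = "keep_monitoring"      # neither is available yet


def decide(operand_ready: bool, intermediate_generated: bool) -> str:
    """Pick the action for one target operation in one cycle."""
    if operand_ready and intermediate_generated:
        return EXECUTE_TARGET
    if operand_ready:
        return FORWARD_OPERAND
    if intermediate_generated:
        return PARK_INTERMEDIATE
    return KEEP_MONITORING


def run_fma(src1, src2, src3, src3_ready_cycle, mul_latency=3):
    """Model src1 * src2 + src3 and return (result, per-cycle actions).

    Assumption: the multiply finishes in cycle `mul_latency` and src3
    becomes ready in cycle `src3_ready_cycle` (both counted from 1).
    """
    trace = []
    forwarded = None  # src3 carried along the pipeline (S103)
    parked = None     # product held in an idle operand field (S104)
    cycle = 0
    while True:
        cycle += 1
        action = decide(cycle >= src3_ready_cycle, cycle >= mul_latency)
        trace.append(action)
        if action == FORWARD_OPERAND:
            forwarded = src3
        elif action == PARK_INTERMEDIATE and parked is None:
            parked = src1 * src2
        elif action == EXECUTE_TARGET:
            product = parked if parked is not None else src1 * src2
            addend = forwarded if forwarded is not None else src3
            return product + addend, trace


early = run_fma(2.0, 3.0, 4.0, src3_ready_cycle=1)  # src3 ready before the product
late = run_fma(2.0, 3.0, 4.0, src3_ready_cycle=5)   # product ready before src3
```

In the first call the operand is forwarded while the multiply is still running; in the second the product is parked until src3 arrives. Both calls produce the same arithmetic result, and only the waiting location differs.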
In the method for processing a multi-operand instruction based on operand domain multiplexing according to the embodiment of the present application, the N candidate operations of the multi-operand instruction include a first operation and subsequent target operations. The operation objects of the first operation are operands in the multi-operand instruction; the operation objects of each target operation are an operand in the multi-operand instruction and an intermediate execution result, where the intermediate execution result is generated by the candidate operations executed before the target operation. The multi-operand instruction is executed in the order of the candidate operations, and for each target operation, whether the operand corresponding to the target operation is ready is monitored while the intermediate execution result corresponding to the target operation is being generated. If the operand is ready and the intermediate execution result has not been generated, the operand is sent to the pipeline where the current execution stage is located and is transmitted in the pipeline along with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand. If the operand is not ready and the intermediate execution result has been generated, the intermediate execution result is stored into an idle operand domain and the operand continues to be monitored, so that when the operand is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand.
This approach does not need to wait for all operands to be ready before beginning to execute the instruction. For each target operation, while one operation object of the operation (the intermediate execution result generated by other candidate operations) is being generated, whether the other operation object (the operand corresponding to the operation) is ready is monitored. If that operand is still not ready after the intermediate execution result has been generated, the intermediate execution result is stored into an idle operand domain, which occupies no additional storage space; once the operand is ready, the functional unit corresponding to the target operation executes the target operation on its operation objects. This avoids the pipeline occupation caused by waiting for operands, reduces the delay of instruction execution, and improves instruction processing efficiency.
In some embodiments, the candidate operations of the multi-operand instruction may include a plurality of candidate operations whose operation objects are all operands, together with candidate operations whose operation objects are intermediate execution results. For example, for a multi-operand instruction including four operands, the candidate operation A corresponds to two operands, the candidate operation B corresponds to the other two operands, and the operation objects of the candidate operation C are the intermediate execution result obtained by the candidate operation A and the intermediate execution result obtained by the candidate operation B. When the instruction is executed, it can be issued into the pipeline to execute the candidate operation A or the candidate operation B as soon as the operands corresponding to that operation are found to be ready, while the operands of the other operation continue to be monitored and are sent to the pipeline to execute the corresponding operation once they are ready. The intermediate execution result generated first can be stored into a free operand domain to wait for the operands that are not yet ready, or can enter the pipeline and be transmitted along with the operation being executed, until both operation objects of the candidate operation C have been generated and the candidate operation C is executed by its corresponding functional unit.
In some embodiments, the reservation station, the register file, and the functional units may each implement the process of monitoring whether an operand corresponding to a target operation is ready. To avoid adding delay, when the monitoring is implemented by a functional unit (e.g., the functional unit that generates the intermediate execution result corresponding to the target operation), the monitoring may be set to end at or before the last pipeline cycle in which the functional unit executes the corresponding candidate operation. For example, if an operation object of the target operation is the intermediate execution result obtained by the candidate operation A, the functional unit's monitoring of the operand may end at the last pipeline cycle of executing the candidate operation A or before entering that last cycle; the specific time at which monitoring ends may be determined according to the timing situation.
On the basis of the embodiment shown in fig. 1, as shown in fig. 2, the embodiment of the present application further includes the following steps before "executing the multi-operand instruction in the order of the candidate operations" in step S102:
S201, monitoring whether an operand corresponding to the first operation is ready.
In the embodiment of the application, before sending the obtained multi-operand instruction to the instruction execution pipeline, the state of the operand corresponding to the first operation is monitored, and whether the operand corresponding to the first operation is ready or not is judged.
S202, in response to the operand corresponding to the first operation being ready, sending the multi-operand instruction to the instruction execution pipeline to execute the first operation.
In the embodiment of the application, the multi-operand instruction is sent to the pipeline for execution in response to the operand corresponding to the first operation being ready, so the first operation can be executed first on its own operands without waiting for all operands to be ready. Whether the operand corresponding to the next target operation is ready is monitored while the first operation is being executed.
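The difference between the issue condition of S201/S202 and a conventional all-operands-ready condition can be shown in a few lines. This is a minimal sketch; the function names and the dictionary of ready states are hypothetical and chosen only for illustration.

```python
def can_issue_conventional(ready: dict) -> bool:
    """Baseline for comparison: issue only when every operand is ready."""
    return all(ready.values())


def can_issue_early(ready: dict, first_op_operands: tuple) -> bool:
    """S201/S202: issue as soon as the first operation's operands are ready."""
    return all(ready[name] for name in first_op_operands)


# FMA-style instruction: the first operation (multiply) uses src1 and src2,
# while src3 is still being produced by an earlier instruction.
ready = {"src1": True, "src2": True, "src3": False}
conventional = can_issue_conventional(ready)
early = can_issue_early(ready, ("src1", "src2"))
```

With src3 still pending, the conventional condition holds the instruction back, while the early condition lets the multiply start.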
In some embodiments, the location from which the operands are obtained may differ depending on how the instruction reads its registers. For example, after the instruction is issued, the operands corresponding to the first operation are obtained from the register file according to their register numbers and sent to the pipeline for execution; or, if the operands corresponding to the first operation are ready, they are obtained according to the register numbers in the issue queue and stored into the corresponding data fields in the reservation station, in which case the operands can be obtained from the reservation station and sent to the pipeline for execution. The specific location from which operands are obtained is not limited herein.
In the embodiment of the application, after the operands corresponding to the first operation are sent to the pipeline for execution, the operand domains of those operands become idle, and the idle operand domains are multiplexed to store the intermediate execution result of the multi-operand instruction, so that no cache resources are occupied and the pipeline is not held up in a way that would affect the execution of other instructions in the pipeline. The idle operand domain may be located in the reservation station or the register file, depending on where the operands were obtained.
The number of operand fields to multiplex can be selected according to the size of an operand and the size of the intermediate execution result. Taking a floating-point multiply-add instruction with three operands as an example, one operand field is 64 bits and two operand fields together are 128 bits; that is, the data fields of the two operands corresponding to the first operation provide 128 bits of storage space, which is used to store the intermediate execution result obtained by the first operation of the floating-point multiply-add instruction.
In some embodiments, when a floating-point operation instruction is processed, the storage space required by the intermediate execution result may be determined according to whether the intermediate execution result has been normalized and rounded, and hence whether the intermediate execution result is stored into one idle operand domain or into several. For example, when an FMA instruction is processed and the result of the multiply operation is not normalized and rounded, the intermediate execution result of the FMA instruction may include the result of the multiply operation and rounding information, which are stored in two idle operand fields in the reservation station or the register file. Alternatively, the intermediate execution result may be the data obtained by normalizing and rounding the result of the multiply operation; in that case a single idle operand field suffices, and either idle operand field may be selected.
Taking a vector floating-point instruction as an example, one operand domain is 512 bits and two operand domains together are 1024 bits. One operand includes eight 64-bit elements; the eight 64-bit elements of the two operands are processed in parallel to obtain the intermediate execution result, which can be stored in the two idle operand domains.
When a fixed-point operation instruction is processed, one operand field is 64 bits, and an intermediate execution result obtained by executing a candidate operation on two operands is also 64 bits, and the intermediate execution result can be stored in any of the free operand fields.
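The sizing rule in the preceding paragraphs amounts to a ceiling division. The sketch below is illustrative only: the helper names are hypothetical, and the 128-bit and 1024-bit result sizes are taken as upper bounds from the field sizes given in the text rather than as exact hardware widths.

```python
def fields_needed(result_bits: int, field_bits: int) -> int:
    """Number of operand fields needed to hold an intermediate result."""
    if result_bits <= 0 or field_bits <= 0:
        raise ValueError("sizes must be positive")
    return -(-result_bits // field_bits)  # ceiling division on integers


def fits_in_freed_fields(result_bits: int, field_bits: int, freed_fields: int) -> bool:
    """True if the result fits in the fields freed by the first operation."""
    return fields_needed(result_bits, field_bits) <= freed_fields


# Scalar FMA: unrounded product plus rounding information (assumed <= 128 bits).
scalar_unrounded = fields_needed(128, 64)
# Scalar FMA with a normalized and rounded product, or a fixed-point result.
scalar_rounded = fields_needed(64, 64)
# Vector case: 512-bit fields, result assumed to need up to 1024 bits.
vector_unrounded = fields_needed(1024, 512)
```

Because the first operation frees two operand fields, every case above fits without extra storage; a result wider than two fields would not.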
As another example, consider the processing of masked vector instructions. The vector instruction may be a 3-operand instruction, including two vector operands and one mask operand, or a 4-operand instruction, including three vector operands and one mask operand. The mask operations of the RISC-V vector instruction set perform logical operations (AND, OR, NOT, and the like) on data held in vector registers as mask data. Vector instructions implement conditional execution through masking, and the masks are stored in vector registers. The mask determines whether a given operation is executed, which gives the vector processing unit the capability of conditional execution. The mask data can be produced by memory access instructions, compare instructions, and the like, and is stored in a vector register. Mask data differs from normal vector data in that only its low-order bits (least significant bits) are valid. With the present application, the arithmetic operation can be completed first while the mask register of the RISC-V vector instruction is not yet ready, and after the mask is ready, the conditional operation is executed according to the value of the mask. That is, one candidate operation is the arithmetic operation and the other candidate operation is the conditional operation, the mask vector register serves as the third operand, and the vector instruction can be issued even though the mask register storing the mask is not ready.
In some embodiments, a vector-vector operation is computed using the two vector operands in the vector registers specified by vs2 and vs1, and the vector arithmetic instruction is masked under control of the vm field. If vs2 and vs1 are 512-bit vector operands, each element of a vector is 64 bits and each vector has 8 elements; the mask register operand vm is 512 bits, of which the lower 8 bits are valid and represent the mask of the 8 elements, and the remaining upper bits of vm are 0; vd is a 512-bit vector destination register.
For example, the instruction vop.vv vd, vs2, vs1, vm determines, according to the value of the mask vm, whether each element of the destination takes the result of the operation or retains its original value.
The vop operation may be an arithmetic or logical operation, such as vadd (vector addition), vsub (vector subtraction), vmadd (vector multiply-add), vfadd (vector floating-point add), vfmadd (vector floating-point multiply-add), vand (vector AND), vxor (vector XOR), vsll (vector shift left), vsrl (vector shift right), etc.
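The split of a masked vector instruction into an arithmetic candidate operation and a conditional candidate operation can be sketched as two functions. This is a hedged illustration, not the application's hardware: compute_unmasked and apply_mask are hypothetical names, vectors are modelled as Python lists of 8 elements, and bit i of vm is assumed to control element i.

```python
import operator


def compute_unmasked(op, vs2, vs1):
    """Candidate operation 1: element-wise vop, which does not need the mask."""
    return [op(a, b) for a, b in zip(vs2, vs1)]


def apply_mask(unmasked, vd_old, vm):
    """Candidate operation 2: keep the new value where the mask bit is 1,
    otherwise retain the original destination element."""
    return [
        new if (vm >> i) & 1 else old
        for i, (new, old) in enumerate(zip(unmasked, vd_old))
    ]


vs2 = [1, 2, 3, 4, 5, 6, 7, 8]
vs1 = [10] * 8
vd_old = [0] * 8

# The addition can run while vm is still being produced; the result is the
# intermediate execution result that would wait in idle operand fields.
intermediate = compute_unmasked(operator.add, vs2, vs1)

# Once the mask is ready (here elements 0 and 2 are enabled), apply it.
vd = apply_mask(intermediate, vd_old, 0b00000101)
```

Only the second function depends on vm, which is why the instruction can be issued before the mask register is ready.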
Therefore, the multi-operand instruction processing method based on operand domain multiplexing can be applied not only to multi-operand operation instructions but also to multi-operand non-operation instructions.
In the embodiment of the application, it is not necessary to wait for all operands to be ready: the operands that are ready are sent to the instruction execution pipeline first to execute the corresponding candidate operation, and while that operation executes, the operand corresponding to the next target operation is monitored. During this process, the intermediate execution result obtained by the candidate operation can be temporarily stored in an idle operand domain to wait for the operand corresponding to the next target operation to become ready, thereby improving the processing efficiency of the instruction.
On the basis of the above embodiment, before "the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation" in step S104, the embodiment of the present application further includes: in response to the operand corresponding to the target operation being ready, taking out the intermediate execution result from the idle operand domain, and sending the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
In the embodiment of the application, when it is monitored that the operand corresponding to the target operation is ready, the operand corresponding to the target operation is obtained, and an intermediate execution result corresponding to the target operation is called from an operand field in which the intermediate execution result is stored.
The operand corresponding to the target operation may be obtained from the register file or the reservation station, depending on how the instruction reads its registers, which is not described herein again.
In some embodiments, the "sending the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation" may be implemented by: identifying whether the functional unit corresponding to the target operation is in an idle state, and sending an intermediate execution result and an operand corresponding to the target operation to the functional unit corresponding to the target operation under the condition that the functional unit corresponding to the target operation is in the idle state; when the functional unit corresponding to the target operation is in an occupied state, the operand and the intermediate execution result are not acquired, and when the functional unit is in an idle state, the operand and the intermediate execution result are acquired again and sent to the functional unit.
For example, if the operand corresponding to the target operation is ready and the functional unit corresponding to the target operation is idle, the intermediate execution result corresponding to the target operation may be retrieved directly and sent to the functional unit together with the operand corresponding to the target operation, so that the functional unit executes the target operation on the intermediate execution result and the operand.
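The idle check before dispatch can be sketched as follows. The classes and the try_dispatch helper are hypothetical and exist only to illustrate the described behaviour: nothing is read out of the operand field while the functional unit is occupied.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FunctionalUnit:
    busy: bool = False


@dataclass
class ReservationEntry:
    idle_field: Optional[float] = None  # parked intermediate execution result
    operand: float = 0.0                # operand of the target operation
    operand_ready: bool = False


def try_dispatch(entry: ReservationEntry,
                 unit: FunctionalUnit) -> Optional[Tuple[float, float]]:
    """Return (intermediate, operand) if the target operation can start now."""
    if not entry.operand_ready or entry.idle_field is None:
        return None  # still waiting for one of the operation objects
    if unit.busy:
        return None  # leave both values in place and retry later
    # Take the intermediate result out of the operand field and claim the unit.
    intermediate, entry.idle_field = entry.idle_field, None
    unit.busy = True
    return intermediate, entry.operand


entry = ReservationEntry(idle_field=6.0, operand=4.0, operand_ready=True)
unit = FunctionalUnit(busy=True)
first_try = try_dispatch(entry, unit)   # unit occupied, nothing is fetched
unit.busy = False
second_try = try_dispatch(entry, unit)  # unit idle, values are sent
```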
On the basis of the above embodiments, as shown in fig. 3, the operand domain multiplexing-based multi-operand instruction processing method of the present application further includes a snoop procedure for determining whether operands are ready, including the following steps:
S301, monitoring a flag bit corresponding to any operand, wherein the flag bit exists in a status flag bit field of the multi-operand instruction.
As shown in fig. 4, when a multi-operand instruction including three operands is stored in an issue queue, the issue queue entry includes register number fields corresponding to the operands (e.g., a register number field for the operand src1, one for the operand src2, and one for the operand src3) and a status flag bit field (e.g., a ready field). Three bits in the status flag bit field (i.e., three flag bits corresponding to the three operands) are used to flag whether each operand is ready, one flag bit per operand. The present application monitors the flag bit corresponding to an operand in the status flag bit field to judge whether that operand is ready.
Fig. 5 is a schematic structural diagram of the reservation station. In addition to the same register number fields and status flag bit field as in the issue queue, the reservation station includes a data field (i.e., an operand field) for each operand: the data-src1 field for the operand src1, the data-src2 field for the operand src2, and the data-src3 field for the operand src3.
It should be noted that the value of the bit corresponding to an operand in the status flag bit field is updated through the pipeline according to the processing status of the instruction that produces the operand. For example, if a shift-left instruction is executed on the operand A to obtain the operand B, the value in the ready field is updated according to whether the shift-left instruction has been executed (i.e., whether the operand B has been obtained).
S302, if the flag bit corresponding to the operand is a first set value, determining that the operand is ready.
In the embodiment of the present application, a flag bit equal to the first set value indicates that the operand is ready. As shown in fig. 4 or 5, with the first set value being 1, a flag bit of 1 for the operand src1 indicates that src1 is ready, a flag bit of 1 for the operand src2 indicates that src2 is ready, and a flag bit of 1 for the operand src3 indicates that src3 is ready.
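An issue queue entry with register number fields and a status flag bit field might be modelled as below. This is a sketch under assumptions: the class name, the bit ordering (bit 0 for src1, bit 1 for src2, bit 2 for src3) and the register numbers are hypothetical, and the first set value is taken to be 1 as in the figures.

```python
from dataclasses import dataclass
from typing import Tuple

READY = 1  # the "first set value" in this sketch


@dataclass
class IssueQueueEntry:
    src_regs: Tuple[int, ...]  # register number fields for src1, src2, src3
    ready_bits: int = 0        # status flag bit field, bit i <-> operand i

    def mark_ready(self, index: int) -> None:
        """Set the flag bit of operand `index` when its producer completes."""
        self.ready_bits |= 1 << index

    def is_ready(self, index: int) -> bool:
        """S301/S302: read the flag bit and compare it with the set value."""
        return ((self.ready_bits >> index) & 1) == READY


entry = IssueQueueEntry(src_regs=(5, 6, 7))
entry.mark_ready(0)  # src1 ready
entry.mark_ready(1)  # src2 ready
```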
As a possible implementation manner, the state of any operand can be snooped by means of writeback bus snooping to determine whether it is ready, and the specific process is as follows:
The write-back bus of the instructions is monitored, and whether a monitored instruction is a related instruction of the multi-operand instruction is judged, where the related instruction is the candidate instruction closest to the multi-operand instruction among a plurality of candidate instructions. A candidate instruction is an instruction whose destination register number is the same as the register number of an operand of the multi-operand instruction (for example, for an instruction containing three operands, the same as the register number of the operand src1, of the operand src2, or of the operand src3) and which precedes the multi-operand instruction in program execution order. If the monitored instruction is the related instruction, it is determined that the operand is ready.
For example, when a plurality of instructions corresponding to a program are executed and the instructions include a multi-operand instruction, the related instruction of the multi-operand instruction is determined from among them. Suppose instruction 1, instruction 2, instruction 3 and instruction 4 precede the multi-operand instruction in the program serial execution sequence. If the destination register numbers of instruction 1 and instruction 3 are the same as the register number of the operand src1 of the multi-operand instruction, instruction 1 and instruction 3 are candidate instructions; of the two, instruction 3 is closest to the multi-operand instruction, so instruction 3 is determined as the related instruction. While the instructions of the program are executed, the write-back bus is monitored, and the one or more instructions observed writing back are checked for instruction 3; if instruction 3 is among them, the operand src1 can be considered ready. Here the program serial execution sequence means program order, and accordingly the instruction closest to the multi-operand instruction is the closest one in program order.
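The selection of the related instruction and the write-back check in this example can be sketched as follows. The function names and the register numbers are hypothetical; earlier instructions are modelled as (instruction id, destination register) pairs listed in program order.

```python
from typing import List, Optional, Set, Tuple


def find_related(earlier: List[Tuple[int, int]], src_reg: int) -> Optional[int]:
    """Return the id of the nearest earlier instruction that writes src_reg.

    `earlier` holds (instruction id, destination register) pairs in program
    order, all preceding the multi-operand instruction.
    """
    for instr_id, dest_reg in reversed(earlier):
        if dest_reg == src_reg:
            return instr_id
    return None


def operand_ready(writeback_ids: Set[int], related_id: int) -> bool:
    """The operand is ready once the related instruction is seen writing back."""
    return related_id in writeback_ids


# Instructions 1 and 3 both write register 5, which holds src1 in this sketch.
earlier = [(1, 5), (2, 7), (3, 5), (4, 9)]
related = find_related(earlier, src_reg=5)
```

Seeing instruction 1 on the write-back bus is not enough, because instruction 3 overwrites the same register later in program order.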
In addition, in implementation, if the operands corresponding to one candidate operation are ready and the operands corresponding to other candidate operations are also ready, the operands can enter the pipeline and be transmitted together; or, after the execution of one candidate operation is finished, the operand corresponding to the next candidate operation is sent to the corresponding functional unit to execute that operation together with its other operation object (the intermediate execution result generated by other operations); or the operand corresponding to the next candidate operation is kept in the register file or the reservation station to wait for the intermediate execution result corresponding to the operation, and the operation is executed after that intermediate execution result is generated. In summary, storing the intermediate execution result of the instruction in the reservation station or the register file does not occupy the pipeline and does not affect the normal operation of subsequent instructions in the pipeline. Execution continues once the operand corresponding to the next candidate operation is ready, which avoids the delay caused by waiting for all operands to be ready, effectively reduces the processing time of the instruction in the processor, and improves the performance of the processor.
Fig. 6 is a block diagram illustrating a processing apparatus for multi-operand instructions based on operand domain multiplexing according to an exemplary embodiment of the present application. As shown in fig. 6, the processing apparatus 500 includes an obtaining module 501, a monitoring module 502, a sending module 503 and a storage module 504.
The obtaining module 501 is configured to obtain a multi-operand instruction, where the multi-operand instruction includes N candidate operations, where N is a positive integer greater than or equal to 2, and the N candidate operations include a first operation and a subsequent target operation, where an operation object corresponding to the first operation is an operand in the multi-operand instruction, an operation object corresponding to the target operation is an operand and an intermediate execution result in the multi-operand instruction, and the intermediate execution result is generated by a candidate operation executed before the target operation.
The monitoring module 502 is configured to execute the multi-operand instruction according to the order of the candidate operations, and for each target operation, monitor whether an operand corresponding to the target operation is ready in the process of generating an intermediate execution result corresponding to the target operation.
The sending module 503 is configured to send the operand corresponding to the target operation to the pipeline where the current execution stage is located if the operand corresponding to the target operation is ready and the intermediate execution result is not generated, and transmit the operand along with the operation of the current execution stage in the pipeline, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation.
The storage module 504 is configured to store the intermediate execution result into an idle operand domain if the operand corresponding to the target operation is not ready and the intermediate execution result is generated, and continue to monitor the operand corresponding to the target operation so that the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation when the operand corresponding to the target operation is ready.
In this embodiment of the present application, the monitoring module 502 is further configured to: when the monitoring is performed by the functional unit that generates the intermediate execution result, end the monitoring at or before the last pipeline cycle in which the functional unit executes the corresponding candidate operation, the candidate operation being the one that generates the intermediate execution result.
In this embodiment of the present application, the monitoring module 502 is further configured to: monitor whether the operand corresponding to the first operation is ready; and, in response to the operand corresponding to the first operation being ready, send the multi-operand instruction to the instruction execution pipeline to execute the first operation.
In this embodiment of the present application, the sending module 503 may further be configured to: in response to the operand corresponding to the target operation being ready, take out the intermediate execution result from the idle operand domain, and send the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
In this embodiment of the application, the sending module 503 is further configured to: identify whether the functional unit corresponding to the target operation is in an idle state, and, if the functional unit corresponding to the target operation is in the idle state, send the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
In the embodiment of the present application, the free operand field is located in the reservation station or register file.
In this embodiment of the present application, the monitoring module 502 is further configured to: monitor a flag bit corresponding to any operand, wherein the flag bit exists in a status flag bit field of the multi-operand instruction; and, if the flag bit corresponding to the operand is the first set value, determine that the operand is ready.
In this embodiment of the application, the monitoring module 502 is further configured to: monitor the write-back bus of the instructions; judge whether a monitored instruction is a related instruction of the multi-operand instruction, wherein the related instruction is the candidate instruction closest to the multi-operand instruction among a plurality of candidate instructions, the destination register number corresponding to a candidate instruction is the same as the register number of an operand of the multi-operand instruction, and the candidate instruction precedes the multi-operand instruction in the program serial execution sequence; and, if the monitored instruction is the related instruction, determine that the operand is ready.
It should be noted that the above explanation of the embodiment of the method for processing a multi-operand instruction based on operand domain multiplexing is also applicable to the apparatus for processing a multi-operand instruction based on operand domain multiplexing in the embodiment of the present application, and the specific process is not described herein again.
The processing apparatus of the multi-operand instruction based on operand domain multiplexing according to the embodiment of the application, N candidate operations of the multi-operand instruction include a first operation and a subsequent target operation, an operand corresponding to the first operation is an operand in the multi-operand instruction, an operand corresponding to the target operation is an internal operand and an intermediate execution result of the multi-operand instruction, the intermediate execution result is generated by the candidate operation executed before the target operation, the multi-operand instruction is executed according to the order of the candidate operations, for each target operation, whether the operand corresponding to the target operation is ready or not is monitored in the process of generating the intermediate execution result corresponding to the target operation, if the operand corresponding to the target operation is ready and the intermediate execution result is not generated, the operand corresponding to the target operation is sent to a pipeline where a current execution stage is located, the operand corresponding to the target operation is transmitted in the pipeline along with the operation of the current execution stage, after the intermediate execution result is generated, the function unit corresponding to the intermediate execution result and the operand corresponding to the target operation are executed by the target operation, if the operand corresponding to the target operation is not ready and the intermediate execution result is transmitted in the pipeline, the intermediate execution result is stored in the idle execution result, and the operand corresponding to the target operation is monitored when the operand corresponding to the target operation is continuously executed. This approach does not need to wait for the operands to be fully ready to begin executing the instruction. 
For each target operation, while one operation object of that operation (the intermediate execution result generated by another candidate operation) is being generated, whether the other operation object (the operand corresponding to the operation) is ready is monitored. If the operand is still not ready after the intermediate execution result corresponding to the target operation has been generated, the intermediate execution result is stored in an idle operand domain, so that no additional storage space is occupied; once the operand is ready, the functional unit corresponding to the target operation executes the target operation on the operation objects corresponding to the target operation. This avoids the pipeline occupation caused by waiting for operands, reduces the delay of instruction execution, and improves instruction processing efficiency.
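The handling of a single target operation described above can be illustrated with a small cycle-level sketch in Python. This is only a simplified model under stated assumptions, not the patented hardware: the function name `run_target_operation`, the event labels, and the idea of expressing readiness as cycle numbers (`result_cycle`, `operand_cycle`) are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (cycle number, event label); labels are illustrative only


def run_target_operation(result_cycle: int, operand_cycle: int) -> List[Event]:
    """Trace how one target operation is handled.

    result_cycle  -- cycle at which the preceding candidate operation has
                     generated the intermediate execution result (assumed input)
    operand_cycle -- cycle at which the operand of the target operation
                     becomes ready (assumed input)
    """
    events: List[Event] = []
    operand_in_pipeline = False  # operand already travelling with the current stage
    result_parked = False        # result already stored in an idle operand domain
    cycle = 0
    while True:
        operand_ready = cycle >= operand_cycle
        result_generated = cycle >= result_cycle

        if operand_ready and result_generated:
            # Both operation objects are available: the functional unit can execute.
            source = "idle_operand_domain" if result_parked else "pipeline"
            events.append((cycle, f"execute_from_{source}"))
            return events

        if operand_ready and not operand_in_pipeline:
            # Operand ready, result not yet generated: send the operand into the
            # pipeline so that it moves along with the current execution stage.
            operand_in_pipeline = True
            events.append((cycle, "send_operand_to_pipeline"))
        elif result_generated and not result_parked:
            # Result generated, operand not ready: park the result in an idle
            # operand domain and keep monitoring the operand.
            result_parked = True
            events.append((cycle, "park_result_in_idle_operand_domain"))

        cycle += 1
```

With `result_cycle=3` and `operand_cycle=1`, the trace shows the operand entering the pipeline at cycle 1 and the target operation executing at cycle 3. With `result_cycle=2` and `operand_cycle=5`, the intermediate result is parked at cycle 2 and the target operation executes at cycle 5, once the operand is ready.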
In order to implement the foregoing embodiments, an embodiment of the present application further provides an electronic device 600. As shown in fig. 7, the electronic device 600 includes a memory 601, a processor 602, and a computer program stored in the memory 601 and executable on the processor 602; when the processor 602 executes the program, the method for processing a multi-operand instruction based on operand domain multiplexing described in the above embodiments is implemented.
In order to implement the foregoing embodiments, the present application further proposes a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for processing a multi-operand instruction based on operand domain multiplexing described in the foregoing embodiments.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of those embodiments or examples, provided they are not mutually inconsistent.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that changes, modifications, substitutions and alterations to the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (16)

1. A method for processing a multi-operand instruction based on operand domain multiplexing, comprising:
obtaining a multi-operand instruction, wherein the multi-operand instruction includes N candidate operations, where N is a positive integer greater than or equal to 2, and the N candidate operations include a first operation and a subsequent target operation, where an operand corresponding to the first operation is an operand in the multi-operand instruction, and operands corresponding to the target operation are an operand in the multi-operand instruction and an intermediate execution result, where the intermediate execution result is generated by a candidate operation executed before the target operation;
executing the multi-operand instruction according to the sequence of the candidate operation, and monitoring whether the operand corresponding to the target operation is ready or not in the process of generating an intermediate execution result corresponding to the target operation for each target operation;
if the operand corresponding to the target operation is ready and the intermediate execution result is not generated, the operand corresponding to the target operation is sent to a pipeline where the current execution stage is located, and the operand is transmitted in the pipeline along with the operation of the current execution stage, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation;
if the operand corresponding to the target operation is not ready and the intermediate execution result is generated, storing the intermediate execution result to an idle operand domain, and continuing to monitor the operand corresponding to the target operation so that when the operand corresponding to the target operation is ready, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation;
before the executing the multi-operand instruction according to the order of the candidate operation, the method further comprises the following steps:
monitoring whether an operand corresponding to the first operation is ready;
responsive to an operand corresponding to the first operation being ready, the multi-operand instruction is issued to an instruction execution pipeline to execute the first operation.
2. The processing method of claim 1, wherein said monitoring whether the operand corresponding to the target operation is ready comprises:
when the monitoring is performed by the functional unit that generates the intermediate execution result, ending the monitoring at or before the last pipeline beat in which that functional unit executes the corresponding candidate operation, the candidate operation being used to generate the intermediate execution result.
3. The processing method according to claim 1, wherein, in a case where the operand corresponding to the target operation is not ready and the intermediate execution result is generated, before the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation, the method further comprises:
in response to monitoring that the operand corresponding to the target operation is ready, taking the intermediate execution result out of the idle operand domain, and sending the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
4. The processing method according to claim 3, wherein said sending the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation comprises:
identifying whether the functional unit corresponding to the target operation is in an idle state, and if the functional unit corresponding to the target operation is in the idle state, sending the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
5. The processing method according to any of claims 1 to 4, wherein the free operand domain is located in a reservation station or a register file.
6. The processing method of any of claims 1-4, wherein monitoring whether any operand in the multi-operand instruction is ready comprises:
monitoring a flag bit corresponding to the operand, wherein the flag bit is located in a state flag bit field of the multi-operand instruction;
if the flag bit corresponding to the operand is a first set value, determining that the operand is ready.
7. The processing method of any of claims 1-4, wherein monitoring whether any operand in the multi-operand instruction is ready comprises:
monitoring an instruction write-back bus;
judging whether the monitored instruction is a related instruction of the multi-operand instruction, wherein the related instruction is the candidate instruction closest to the multi-operand instruction among a plurality of candidate instructions, the destination register number corresponding to each candidate instruction is the same as the register number of the operand, and each candidate instruction precedes the multi-operand instruction in the serial program execution order;
determining that the operand is ready if the monitored instruction is the related instruction.
8. An operand domain multiplexing based multi-operand instruction processing apparatus, comprising:
an obtaining module, configured to obtain a multi-operand instruction, where the multi-operand instruction includes N candidate operations, where N is a positive integer greater than or equal to 2, and the N candidate operations include a first operation and a subsequent target operation, where an operation object corresponding to the first operation is an operand in the multi-operand instruction, and an operation object corresponding to the target operation is an operand and an intermediate execution result in the multi-operand instruction, where the intermediate execution result is generated by a candidate operation that is executed before the target operation;
a monitoring module, configured to execute the multi-operand instruction according to the order of the candidate operations, and monitor, for each target operation, whether an operand corresponding to the target operation is ready in a process of generating an intermediate execution result corresponding to the target operation;
a sending module, configured to send the operand corresponding to the target operation to a pipeline where a current execution stage is located if the operand corresponding to the target operation is ready and the intermediate execution result is not generated, and transmit the operand along with the operation of the current execution stage in the pipeline, so that after the intermediate execution result is generated, the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation;
a storage module, configured to store the intermediate execution result in an idle operand domain if the operand corresponding to the target operation is not ready and the intermediate execution result is generated, and to continue monitoring the operand corresponding to the target operation, so that the functional unit corresponding to the target operation executes the target operation on the intermediate execution result and the operand corresponding to the target operation when the operand corresponding to the target operation is ready;
the monitoring module is further configured to:
monitoring whether an operand corresponding to the first operation is ready;
in response to the operand corresponding to the first operation being ready, the multi-operand instruction is issued to an instruction execution pipeline to perform the first operation.
9. The processing apparatus of claim 8, wherein the monitoring module is further configured to:
when the monitoring is performed by the functional unit that generates the intermediate execution result, end the monitoring at or before the last pipeline beat in which that functional unit executes the corresponding candidate operation, the candidate operation being used to generate the intermediate execution result.
10. The processing apparatus as claimed in claim 8, wherein the sending module is further configured to:
in response to monitoring that the operand corresponding to the target operation is ready, take the intermediate execution result out of the idle operand domain, and send the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
11. The processing apparatus as claimed in claim 8, wherein the sending module is further configured to:
identify whether the functional unit corresponding to the target operation is in an idle state, and if the functional unit corresponding to the target operation is in the idle state, send the intermediate execution result and the operand corresponding to the target operation to the functional unit corresponding to the target operation.
12. A processing apparatus according to any of claims 8-11, wherein the free operand domain is located in a reservation station or a register file.
13. The processing apparatus according to any of claims 8-11, wherein the monitoring module is further configured to:
monitor a flag bit corresponding to any operand in the multi-operand instruction, wherein the flag bit is located in a state flag bit field of the multi-operand instruction;
if the flag bit corresponding to the operand is a first set value, determine that the operand is ready.
14. The processing apparatus according to any one of claims 8 to 11, wherein the monitoring module is further configured to:
monitor an instruction write-back bus;
judge whether the monitored instruction is a related instruction of the multi-operand instruction, wherein the related instruction is the candidate instruction closest to the multi-operand instruction among a plurality of candidate instructions, the destination register number corresponding to each candidate instruction is the same as the register number of any operand in the multi-operand instruction, and each candidate instruction precedes the multi-operand instruction in the serial program execution order;
determine that the operand is ready if the monitored instruction is the related instruction.
15. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 7.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
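The two readiness checks recited in claims 6 and 7 (a per-operand flag bit, and write-back bus monitoring for the related instruction) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the claimed circuit: the names `InFlight`, `READY_VALUE`, `ready_by_flag`, `related_instruction`, and `ready_by_writeback`, the choice of 1 as the "first set value", and the use of an integer sequence number to express program order are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

READY_VALUE = 1  # assumed encoding of the "first set value"


@dataclass(frozen=True)
class InFlight:
    """An earlier instruction that has not yet written back (illustrative model)."""
    seq: int   # position in serial program order; smaller means earlier
    dest: int  # destination register number


def ready_by_flag(flag_field: int, operand_index: int) -> bool:
    """Claim 6 style check: read the operand's bit in the state flag bit field."""
    return ((flag_field >> operand_index) & 1) == READY_VALUE


def related_instruction(
    consumer_seq: int, operand_reg: int, candidates: Iterable[InFlight]
) -> Optional[InFlight]:
    """Return the closest earlier instruction that writes the operand's register."""
    producers = [
        c for c in candidates
        if c.dest == operand_reg and c.seq < consumer_seq
    ]
    # "Closest" is modelled as the largest sequence number below the consumer's.
    return max(producers, key=lambda c: c.seq, default=None)


def ready_by_writeback(
    written_back: InFlight,
    consumer_seq: int,
    operand_reg: int,
    candidates: Iterable[InFlight],
) -> bool:
    """Claim 7 style check: the operand is ready once the instruction observed on
    the write-back bus is the related instruction of the multi-operand instruction."""
    return written_back == related_instruction(consumer_seq, operand_reg, candidates)
```

In this model, an older instruction that writes the same register but is not the closest predecessor does not mark the operand as ready; only the write-back of the closest one does. The case where no earlier instruction writes the register (the operand would simply be read from the register file) is outside the scope of the sketch.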
CN202210409706.8A 2022-04-19 2022-04-19 Operand domain multiplexing-based multi-operand instruction processing method and device Active CN114816526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210409706.8A CN114816526B (en) 2022-04-19 2022-04-19 Operand domain multiplexing-based multi-operand instruction processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210409706.8A CN114816526B (en) 2022-04-19 2022-04-19 Operand domain multiplexing-based multi-operand instruction processing method and device

Publications (2)

Publication Number Publication Date
CN114816526A CN114816526A (en) 2022-07-29
CN114816526B CN114816526B (en) 2022-11-11

Family

ID=82506285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210409706.8A Active CN114816526B (en) 2022-04-19 2022-04-19 Operand domain multiplexing-based multi-operand instruction processing method and device

Country Status (1)

Country Link
CN (1) CN114816526B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5553256A (en) * 1994-02-28 1996-09-03 Intel Corporation Apparatus for pipeline streamlining where resources are immediate or certainly retired
US5761476A (en) * 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
US6101597A (en) * 1993-12-30 2000-08-08 Intel Corporation Method and apparatus for maximum throughput scheduling of dependent operations in a pipelined processor
US6553484B1 (en) * 1998-11-30 2003-04-22 Nec Corporation Instruction-issuing circuit that sets reference dependency information in a preceding instruction when a succeeding instruction is stored in an instruction out-of-order buffer

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7398375B2 (en) * 2002-04-04 2008-07-08 The Regents Of The University Of Michigan Technique for reduced-tag dynamic scheduling and reduced-tag prediction
US8356160B2 (en) * 2008-01-15 2013-01-15 International Business Machines Corporation Pipelined multiple operand minimum and maximum function
US20110314263A1 (en) * 2010-06-22 2011-12-22 International Business Machines Corporation Instructions for performing an operation on two operands and subsequently storing an original value of operand
WO2013147852A1 (en) * 2012-03-30 2013-10-03 Intel Corporation Instruction scheduling for a multi-strand out-of-order processor
US9582286B2 (en) * 2012-11-09 2017-02-28 Advanced Micro Devices, Inc. Register file management for operations using a single physical register for both source and result
US20160011876A1 (en) * 2014-07-11 2016-01-14 Cavium, Inc. Managing instruction order in a processor pipeline

Also Published As

Publication number Publication date
CN114816526A (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant