CN110515656B - CASP instruction execution method, microprocessor and computer equipment

Info

Publication number: CN110515656B
Application number: CN201910803055.9A
Authority: CN
Inventors: 郑重; 孙彩霞; 王永文; 黄立波; 隋兵才; 倪晓强; 王俊辉; 雷国庆; 郭维; 郭辉
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2019-08-28
Filing date: 2019-08-28
Publication date: 2021-07-16
Anticipated expiration: 2039-08-28
Also published as: CN110515656A

Abstract

The invention discloses a method for executing a CASP instruction, a microprocessor and computer equipment. The CASP instruction is fetched from an instruction buffer and split into two micro-operations, each having one destination operand; the first micro-operation has three source operands and the second has two. Decoding, operand renaming, dispatch and execution are then performed in units of micro-operations. When executed, the first micro-operation compares the operand value it carries with the value in the storage area; if they are not equal, execution of the CASP instruction ends and the result is written back. Otherwise, the second micro-operation is executed: write permission for the data is obtained, the storage area is compared and written, and the result is written back. The invention can reduce the number of source-register and destination-register channels on the instruction execution path, reduce the data storage width of the CASP instruction execution path, and accelerate the execution of the instruction.

Description

CASP instruction execution method, microprocessor and computer equipment

Technical Field

The invention relates to the technical field of microprocessor design, in particular to an execution method of a CASP instruction (compare-and-swap pair atomic instruction), a microprocessor and computer equipment.

Background

In reduced instruction set computer (RISC) instruction sets, the vast majority of instructions use three registers (two source registers and one destination register, or three source registers). Some instruction set architectures provide a compare-and-swap pair atomic instruction (CASP instruction), written with the mnemonic CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset. The instruction has 5 source operands and 2 destination registers. Its function is to concatenate the Rs1 and Rs2 register values into a compare value, concatenate the Rt1 and Rt2 register values into a new value, and read a value two registers wide from the address [ Rn + offset ] as the return value. If the compare value and the return value are the same, the new value is stored at [ Rn + offset ]; otherwise no data store is performed. Finally, the return value is written into the Rs1 and Rs2 registers. The instruction implements an atomic operation on data twice the register width, which facilitates exclusive access to certain shared data that exceeds the register width.
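The architectural behaviour described above can be sketched as a small reference model. This is an illustration only: the 64-bit register width, the dictionary-based register file and memory, and the choice of Rs1 and Rt1 as the low half of each pair are assumptions that the text does not state.

```python
REG_BITS = 64                      # assumed register width
MASK = (1 << REG_BITS) - 1


def concat(lo, hi):
    """Concatenate two register values into one double-width value.

    Which register forms the low half is an assumption of this sketch.
    """
    return ((hi & MASK) << REG_BITS) | (lo & MASK)


def casp(regs, mem, rs1, rs2, rt1, rt2, rn, offset):
    """Reference semantics of CASP Rs1, Rs2, Rt1, Rt2, Rn, #offset.

    regs: dict of register name -> value (hypothetical register file)
    mem:  dict of address -> double-width value (hypothetical memory)
    Returns True if the new value was stored.
    """
    addr = regs[rn] + offset
    compare = concat(regs[rs1], regs[rs2])
    new = concat(regs[rt1], regs[rt2])
    returned = mem[addr]               # value read from [Rn + offset]
    if returned == compare:
        mem[addr] = new                # store only when the values match
    # The return value is always written back to Rs1 and Rs2.
    regs[rs1] = returned & MASK
    regs[rs2] = returned >> REG_BITS
    return returned == compare
```

In hardware the read, compare and store form one atomic operation; the sequential Python code only mirrors the resulting register and memory state.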

As shown in fig. 1, since the general instructions have only 3 source operands and 1 destination operand, the following problems occur when implementing the CASP instruction: 1) the instruction propagation path must be widened to accommodate 5 source operands and 2 destination operands; 2) the data storage width must be widened because the CASP instruction needs to carry data of 4 register widths.

Chinese patent application No. 201810718968.6 provides a method for implementing a load-pair instruction, which solves the problem of storing 2 destination operands. However, the CASP instruction additionally has 5 source operands to handle, and the 5 operand values cannot be merged after dispatch. In addition, executing an atomic instruction requires obtaining write permission for the data before operating on it, and the CASP instruction takes a long time to write back the Rs1 and Rs2 registers, which affects program performance.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a CASP instruction execution method, a microprocessor and computer equipment that reduce the number of source-register and destination-register channels on the instruction execution path, reduce the data storage width of the CASP instruction execution path, and accelerate the execution of the instruction.

In order to solve the technical problems, the invention adopts the technical scheme that:

a method for executing a CASP instruction, the implementation steps comprising:

1) fetching a CASP instruction, wherein the instruction format of the CASP instruction is as follows: CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset; wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands and Rs1, Rs2 are also destination operands, and # offset is the address offset;

2) splitting the CASP instruction into a micro-operation CASP0 and a micro-operation CASP1 which are executed sequentially;

3) decoding, renaming operands, dispatching and executing by taking micro-operation as a unit;

4) executing the first micro-operation CASP0 to concatenate the Rs1 and Rs2 register values into a comparison value and to read a value two registers wide from the address [ Rn + offset ] as a return value;

5) comparing the comparison value with the return value, and jumping to step 8) if the comparison value is not equal to the return value; otherwise, proceeding to step 6);

6) executing the second micro-operation CASP1 to concatenate the register values of Rt1 and Rt2 into a new value;

7) writing the new value as the final return value of the CASP instruction at address [ Rn + offset ];

8) the data at address [ Rn + offset ] is written back to the result bus.
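Steps 4) to 8) can be sketched as two small functions, one per micro-operation, plus a driver that records which steps are taken. The sketch is illustrative only: the 64-bit width, the dictionary-based register file and memory, the use of Rs1 and Rt1 as the low halves, and the choice to write back the value read in step 4) (following the description in the Background) are all assumptions.

```python
REG_BITS = 64                      # assumed register width
MASK = (1 << REG_BITS) - 1


def casp0(regs, mem, rs1, rs2, rn, offset):
    """Step 4): build the comparison value and read the return value."""
    addr = regs[rn] + offset
    compare = ((regs[rs2] & MASK) << REG_BITS) | (regs[rs1] & MASK)
    return addr, compare, mem[addr]


def casp1(regs, rt1, rt2):
    """Step 6): build the new value from Rt1 and Rt2."""
    return ((regs[rt2] & MASK) << REG_BITS) | (regs[rt1] & MASK)


def execute_casp(regs, mem, rs1, rs2, rt1, rt2, rn, offset):
    """Run steps 4) to 8) and return the list of step numbers taken."""
    trace = [4]
    addr, compare, returned = casp0(regs, mem, rs1, rs2, rn, offset)
    trace.append(5)
    if compare == returned:            # step 5): equal, so continue with 6) and 7)
        trace.append(6)
        new = casp1(regs, rt1, rt2)
        trace.append(7)
        mem[addr] = new
    trace.append(8)
    # Step 8): write-back, assumed here to deliver the value read in step 4).
    regs[rs1] = returned & MASK
    regs[rs2] = returned >> REG_BITS
    return trace
```

On a mismatch the trace is 4, 5, 8: the second micro-operation and the store are skipped, which is the early exit the method relies on.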

Optionally, when the CASP instruction is split into the sequentially executed micro-operations CASP0 and CASP1 in step 2), the instruction format of the split micro-operation CASP0 is as follows:

CASP0 Rs1, Rs1, Rs2, Rn, #offset;

the instruction format of the micro-operation CASP1 is:

CASP1 Rs2, Rt1, Rt2;

wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands, Rs1, Rs2 are also destination operands, and # offset is the address offset.
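The split can be pictured as producing two records, each with one destination register and at most three source registers. The following is a minimal sketch; the `MicroOp` class and the `split_casp` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical record for one micro-operation."""
    name: str
    dest: str                      # single destination register
    sources: Tuple[str, ...]       # at most three source registers
    offset: Optional[int] = None   # address offset, carried by CASP0 only


def split_casp(rs1, rs2, rt1, rt2, rn, offset):
    """Split CASP Rs1, Rs2, Rt1, Rt2, Rn, #offset into CASP0 and CASP1."""
    casp0 = MicroOp("CASP0", dest=rs1, sources=(rs1, rs2, rn), offset=offset)
    casp1 = MicroOp("CASP1", dest=rs2, sources=(rt1, rt2))
    return [casp0, casp1]
```

Because neither record exceeds three sources or one destination, both fit the register channels already provided for general instructions.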

Optionally, the detailed steps of step 3) include:

3.1) resolving operand information in units of micro-operations: the first micro-operation CASP0 is decoded as a micro-operation with three source registers Rs1, Rs2 and Rn and one destination register Rs1; the second micro-operation CASP1 is decoded as a micro-operation with two source registers Rt1 and Rt2 and one destination register Rs2;

3.2) reading the mapping table and allocating a new rename entry for each destination register, so that the source registers of the second micro-operation CASP1 do not depend on the destination register of the first micro-operation CASP0, thereby completing register renaming in units of micro-operations;

3.3) dispatching the micro-operation CASP0 and the micro-operation CASP1 to the execution units in sequence;

3.4) issuing the micro-operation CASP0 and the micro-operation CASP1 in order.
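The renaming in step 3.2) can be sketched as follows. The text does not say how the independence between the CASP1 sources and the CASP0 destination is achieved; this sketch assumes one possible approach, namely looking up all source mappings before allocating any destination. The mapping table, the free list and the tuple layout are hypothetical.

```python
def rename_pair(map_table, free_list, uops):
    """Rename two micro-operations given as (name, dest, sources) tuples.

    map_table: dict of architectural register -> physical register
    free_list: list of unused physical registers
    Returns a list of (name, new physical dest, physical sources).
    """
    # Assumption of this sketch: every source is looked up before any
    # destination is allocated, so CASP1 never sees CASP0's new entry.
    phys_sources = [[map_table[s] for s in srcs] for _, _, srcs in uops]

    renamed = []
    for (name, dest, _), srcs in zip(uops, phys_sources):
        new_phys = free_list.pop(0)    # allocate a new rename entry
        map_table[dest] = new_phys     # later instructions see the new mapping
        renamed.append((name, new_phys, srcs))
    return renamed
```

With this ordering CASP1 reads the pre-split mappings of Rt1 and Rt2 even when one of them names the same architectural register as Rs1.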

Optionally, when the micro-operation CASP0 is executed in step 4), the values of the source registers Rs1, Rs2 and Rn are obtained either by reading the registers or from the bypass of the processor core result bus.

Optionally, when the micro-operation CASP1 is executed in step 6), the values of the source registers Rt1 and Rt2 are obtained either by reading the registers or from the bypass of the processor core result bus.

Optionally, the detailed steps of step 7) include:

7.1) determining whether the data of the target storage area corresponding to the current address [ Rn + offset ] is in a writable state; if so, jumping to step 7.4); otherwise, proceeding to the next step;

7.2) acquiring the write permission of the data of the target storage area corresponding to the current address [ Rn + offset ]; if the acquisition succeeds, proceeding to the next step; otherwise, repeating step 7.2);

7.3) comparing the current data of the target storage area corresponding to the current address [ Rn + offset ] with the comparison value obtained by executing the micro-operation CASP0; if the two are equal, proceeding to step 7.4); otherwise, jumping to step 8);

7.4) overwriting the data of the target storage area with the new value obtained by executing the micro-operation CASP1.

Optionally, acquiring the write permission of the data in the target storage area corresponding to the current address [ Rn + offset ] in step 7.2) specifically refers to sending a corresponding storage coherence request to the next-level storage and waiting for the next-level storage to return the write permission of the data.
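Steps 7.1) to 7.4) can be sketched as a single function. The `Line` class and the `request_write_permission` callback are hypothetical stand-ins for the target storage area and for the coherence request sent to the next-level storage; the callback is assumed to return True once the permission has been granted.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical model of the target storage area."""
    data: int
    writable: bool = False


def store_phase(line, compare, new, request_write_permission):
    """Steps 7.1) to 7.4); returns True if the new value was stored."""
    if not line.writable:                             # 7.1) not yet writable
        while not request_write_permission(line):    # 7.2) retry until granted
            pass
        if line.data != compare:                      # 7.3) data may have changed
            return False                              # go to step 8) without storing
    line.data = new                                   # 7.4) overwrite with new value
    return True
```

The second comparison in step 7.3) matters because another core may modify the data while the write permission is being acquired.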

Optionally, writing back the data at the address [ Rn + offset ] to the result bus in step 8) specifically means: for a processor having multiple register write paths, writing the data at the address [ Rn + offset ] to the two destination registers Rs1, Rs2 at once through two register write paths; or, for a processor without multiple register write paths, writing the data at the address [ Rn + offset ] to the two destination registers Rs1, Rs2 in two cycles.
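The two write-back options can be sketched as a schedule of register writes per cycle. The function name, the 64-bit default width, the assignment of the low half to Rs1, and the order of the two cycles are assumptions made for illustration.

```python
def writeback_schedule(returned, rs1, rs2, write_ports, reg_bits=64):
    """Return a list of cycles, each a list of (register, value) writes.

    returned:    double-width value to be written back
    write_ports: number of register write paths available
    """
    mask = (1 << reg_bits) - 1
    writes = [(rs1, returned & mask), (rs2, returned >> reg_bits)]
    if write_ports >= 2:
        return [writes]                # both registers written in one cycle
    return [[w] for w in writes]       # one register per cycle, two cycles
```

For example, `writeback_schedule((2 << 64) | 1, "x0", "x1", 1)` yields two cycles, writing x0 first and x1 second.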

The present invention also provides a microprocessor programmed to perform the steps of the method for executing the CASP instruction of the present invention.

The invention also provides a computer device having a microprocessor programmed to perform the steps of the method of execution of the aforementioned CASP instructions of the invention.

Compared with the prior art, the invention has the following advantages:

1. The invention reduces the number of source-register and destination-register channels. The CASP instruction has 5 source operands and 2 destination operands. The invention reduces the execution path of the CASP instruction to the 3 source operands and 1 destination operand of a general instruction. After being fetched, the CASP instruction is first split into two micro-operations, each with one destination register; decoding, renaming and dispatch are then performed at micro-operation granularity, so that only three source register channels and one destination register channel need to be provided on the whole path.

2. The width of the stored data is reduced. On the instruction execution path, an unsplit CASP instruction requires a storage space of 4 register widths for its data, whereas each split micro-operation requires a data storage space of only 2 register widths. The data storage space of the instruction execution path is thus reduced by 50%.

3. Execution of the CASP instruction is accelerated. When the first micro-operation executes, the writable state of the data does not need to be acquired if the data is already in the processor core. If the comparison finds the values unequal, the data can be written back to the registers directly and execution of the whole CASP instruction ends. This saves the time needed to acquire the writable state of the data, which generally requires accessing the next-level storage and therefore takes a long time.

Drawings

FIG. 1 is a diagram comparing the numbers of registers used by the CASP instruction and by general instructions.

FIG. 2 is a schematic flow chart of the basic implementation of the embodiment of the invention.

Detailed Description

As shown in fig. 2, the implementation steps of the method for executing the CASP instruction in this embodiment include:

8) the data at address [ Rn + offset ] is written back to the result bus.

In this embodiment, the fetching of the CASP instruction in step 1) specifically refers to fetching the CASP instruction from the instruction Cache.

In this embodiment, when the CASP instruction is split into the sequentially executed micro-operation CASP0 and the micro-operation CASP1 in step 2), the instruction format of the split micro-operation CASP0 is as follows:

CASP0 Rs1, Rs1, Rs2, Rn, #offset;

the instruction format of the micro-operation CASP1 is:

CASP1 Rs2, Rt1, Rt2;

The CASP instruction is split into two micro-operations, each with one destination register. After the CASP instruction is split, decoding is performed in units of micro-operations, and no micro-operation has more than one destination register, so only one destination register channel needs to be provided. After splitting, the micro-operation CASP0 carries three source operands, Rs1, Rs2 and Rn, and one destination operand, Rs1; that is, CASP0 carries the values that the CASP instruction needs to compare. The micro-operation CASP1 carries the source operands Rt1 and Rt2 and the destination operand Rs2; that is, CASP1 carries the value that needs to be stored.

As shown in fig. 2, the detailed steps of step 3) include:

3.3) dispatching the micro-operation CASP0 and the micro-operation CASP1 to the execution units in sequence; even in an out-of-order issue processor, the dispatch of instructions is sequential, so it is consistent with a general-purpose processor implementation here;

The micro-operations CASP0 and CASP1 need to be executed in order, with CASP0 first and CASP1 second. There are two reasons why in-order issue is required here: (I) atomic instructions have store semantics, processors generally execute them in order, and such an instruction can write to storage only after it has been committed; (II) since the micro-operation CASP0 carries the data to be compared and the micro-operation CASP1 carries the data to be stored, and the CASP instruction must compare the data before deciding whether to write to the storage area, executing CASP1 before CASP0 would be meaningless.

When the first micro-operation CASP0 executes, it reads the source operands it carries, i.e., the values in the Rs1 and Rs2 registers, and concatenates them into a comparison value twice the width of a register. The value of the Rn register is also read, and the address of the storage area to be operated on is calculated. In this embodiment, when the micro-operation CASP0 is executed in step 4), the values of the source registers Rs1, Rs2 and Rn are obtained either by reading the registers or from the bypass of the processor core result bus.

When the second micro-operation CASP1 executes, it concatenates the Rt1 and Rt2 register values into a new value. The value of the Rn register is also read, and the address of the storage area to be operated on is calculated. In this embodiment, when the micro-operation CASP1 is executed in step 6), the values of the source registers Rt1 and Rt2 are obtained either by reading the registers or from the bypass of the processor core result bus.

As shown in fig. 2, the detailed steps of step 7) include:

In this embodiment, acquiring the write permission of the data in the target storage area corresponding to the current address [ Rn + offset ] in step 7.2) specifically refers to sending a corresponding storage coherence request to the next-level storage and waiting for the next-level storage to return the write permission of the data.

In this embodiment, writing back the data at the address [ Rn + offset ] to the result bus in step 8) specifically means: for a processor having multiple register write paths, writing the data at the address [ Rn + offset ] to the two destination registers Rs1, Rs2 at once through two register write paths; or, for a processor without multiple register write paths, writing the data at the address [ Rn + offset ] to the two destination registers Rs1, Rs2 in two cycles.

In addition, this embodiment further provides a microprocessor that supports an instruction set containing the CASP instruction and is programmed to perform the steps of the method for executing the CASP instruction according to this embodiment. This embodiment also provides a computer device whose microprocessor is programmed to perform the steps of the method for executing the CASP instruction according to this embodiment; the computer device may be a mainframe computer, a minicomputer, a personal computer, an industrial computer, a mobile computing device, etc.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions belonging to the idea of the present invention fall within the protection scope of the present invention. It should be noted that, for those skilled in the art, improvements and refinements made without departing from the principle of the invention are also considered to be within the protection scope of the invention.

Claims

1. A method of executing a CASP instruction, the method comprising the steps of:

8) the data at address [ Rn + offset ] is written back to the result bus.

2. The method of claim 1, wherein when the CASP instruction is split into the sequentially executed micro-operations CASP0 and CASP1 in step 2), the split micro-operation CASP0 has the following instruction format:

CASP0 Rs1, Rs1, Rs2, Rn, #offset;

the instruction format of the micro-operation CASP1 is:

CASP1 Rs2, Rt1, Rt2;

3. The method of claim 2, wherein the detailed step of step 3) comprises:

4. The method of claim 2, wherein the values of the source registers Rs1, Rs2 and Rn are obtained by reading the registers or from the bypass of the processor core result bus when the micro-operation CASP0 is executed in step 4).

5. The method of claim 2, wherein the values of the source registers Rt1 and Rt2 are obtained by reading the registers or from the bypass of the processor core result bus when the micro-operation CASP1 is executed in step 6).

6. The method of claim 2, wherein the detailed step of step 7) comprises:

7.1) determining whether the data of the target storage area corresponding to the current address [ Rn + offset ] is in a writable state; if so, jumping to step 7.4); otherwise, proceeding to the next step;

8. The method of any one of claims 1 to 7, wherein the writing back of the data at the address [ Rn + offset ] to the result bus in step 8) is performed by writing the data at the address [ Rn + offset ] to two destination registers Rs1, Rs2 at a time through two lanes of write registers for a processor with multiple lanes of write registers or writing the data at the address [ Rn + offset ] to two destination registers Rs1, Rs2 in two cycles for a processor without multiple lanes of write registers.

9. A microprocessor programmed to perform the steps of the method of executing the CASP instruction of any one of claims 1 to 8.

10. A computer device, characterized in that a microprocessor of the computer device is programmed to execute the steps of the execution method of CASP instructions according to any one of claims 1 to 8.