CN110515656B - CASP instruction execution method, microprocessor and computer equipment - Google Patents

Publication number
CN110515656B
CN110515656B CN201910803055.9A CN201910803055A CN110515656B CN 110515656 B CN110515656 B CN 110515656B CN 201910803055 A CN201910803055 A CN 201910803055A CN 110515656 B CN110515656 B CN 110515656B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910803055.9A
Other languages
Chinese (zh)
Other versions
CN110515656A (en)
Inventor
郑重
孙彩霞
王永文
黄立波
隋兵才
倪晓强
王俊辉
雷国庆
郭维
郭辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN201910803055.9A
Publication of CN110515656A
Application granted
Publication of CN110515656B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22: Microcontrol or microprogram arrangements
    • G06F9/26: Address formation of the next micro-instruction; Microprogram storage or retrieval arrangements
    • G06F9/262: Arrangements for next microinstruction selection
    • G06F9/264: Microinstruction selection based on results of processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22: Microcontrol or microprogram arrangements
    • G06F9/28: Enhancement of operational speed, e.g. by using several microcontrol devices operating in parallel

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a method for executing a CASP instruction, a microprocessor and computer equipment. The CASP instruction is fetched from an instruction buffer and split into two micro-operations, each with one destination operand; the first micro-operation has three source operands and the second has two. Decoding, operand renaming, dispatch and execution are then performed in units of micro-operations. When the first micro-operation executes, the operand it carries is compared with the value in the storage area; if they are not equal, execution of the CASP instruction ends and the result is written back. Otherwise, the second micro-operation is executed: write permission for the data is obtained, the storage area is compared and written, and the result is written back. The invention can reduce the number of source-register and destination-register channels on the instruction execution path, reduce the data storage width of the CASP instruction execution path, and accelerate execution of the instruction.

Description

CASP instruction execution method, microprocessor and computer equipment
Technical Field
The invention relates to the technical field of microprocessor design, and in particular to a method for executing a CASP instruction (a compare-and-swap pair atomic instruction), a microprocessor and computer equipment.
Background
In reduced instruction set computer (RISC) instruction sets, the vast majority of instructions use 3 registers (two source registers and one destination register, or three source registers). Some instruction set architectures provide a compare-and-swap pair atomic instruction (the CASP instruction), written with the mnemonic CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset. The instruction has 5 source operands and 2 destination registers. It concatenates the Rs1 and Rs2 register values into a compare value and the Rt1 and Rt2 register values into a new value, and reads a value two registers wide from the address [ Rn + offset ] as the return value. If the compare value and the return value are the same, the new value is stored at [ Rn + offset ]; otherwise no store is performed. Finally, the return value is written into the Rs1 and Rs2 registers. The instruction thus implements an atomic operation on data twice the register width, which facilitates exclusive access to shared data that is wider than a register.
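The architectural behaviour described in this paragraph can be illustrated with a small software model. This is only a sketch: the function name `casp`, the dict-based register file and the representation of memory as a dict from address to a pair of register-width words are assumptions made for illustration, and the model ignores atomicity and the pipeline entirely.

```python
# Reference model of the CASP behaviour described in the Background.
# Assumptions (not from the patent): registers are a dict of name -> int,
# and memory is a dict of address -> (first_word, second_word).

def casp(regs, mem, rs1, rs2, rt1, rt2, rn, offset):
    """Compare-and-swap a register pair at [Rn + offset]; True if stored."""
    addr = regs[rn] + offset
    compare = (regs[rs1], regs[rs2])   # compare value built from Rs1, Rs2
    new = (regs[rt1], regs[rt2])       # new value built from Rt1, Rt2
    old = mem[addr]                    # return value read from memory
    swapped = old == compare
    if swapped:
        mem[addr] = new                # store only when the comparison succeeds
    regs[rs1], regs[rs2] = old         # the return value always goes to Rs1, Rs2
    return swapped


regs = {"x0": 1, "x1": 2, "x2": 7, "x3": 8, "x4": 0x1000}
mem = {0x1010: (1, 2)}
swapped = casp(regs, mem, "x0", "x1", "x2", "x3", "x4", 0x10)
```

In this usage the memory pair matches (Rs1, Rs2), so the pair from (Rt1, Rt2) is stored; a second call with unchanged register operands would fail the comparison and only update Rs1 and Rs2 with the value read.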
As shown in fig. 1, since the general instructions have only 3 source operands and 1 destination operand, the following problems occur when implementing the CASP instruction: 1) the instruction propagation path must be widened to accommodate 5 source operands and 2 destination operands; 2) the data storage width must be widened because the CASP instruction needs to carry data of 4 register widths.
Chinese patent application No. 201810718968.6 provides a method for executing a load-pair instruction, which solves the problem of handling 2 destination operands. However, the CASP instruction additionally has 5 source operands to handle, and the 5 operand values cannot be merged after dispatch. In addition, an atomic instruction can only operate after write permission for the data has been obtained, so the CASP instruction takes a long time to write back the Rs1 and Rs2 registers, which affects program performance.
Disclosure of Invention
The technical problem to be solved by the invention is to reduce the number of source-register and destination-register channels on the instruction execution path, reduce the data storage width of the CASP instruction execution path, and accelerate execution of the instruction.
In order to solve the technical problems, the invention adopts the technical scheme that:
a method for executing a CASP instruction, the implementation steps comprising:
1) fetching a CASP instruction, wherein the instruction format of the CASP instruction is as follows: CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset; wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands and Rs1, Rs2 are also destination operands, and # offset is the address offset;
2) splitting the CASP instruction into a micro-operation CASP0 and a micro-operation CASP1 which are executed sequentially;
3) performing decoding, operand renaming, dispatch and execution in units of micro-operations;
4) executing the first micro-operation CASP0 to concatenate the Rs1 and Rs2 register values into a comparison value and to read a value two registers wide from the address [ Rn + offset ] as a return value;
5) comparing the comparison value with the return value; if they are not equal, jumping to step 8); otherwise, proceeding to step 6);
6) executing the second micro-operation CASP1 to concatenate the Rt1 and Rt2 register values into a new value;
7) writing the new value to the address [ Rn + offset ] as the final return value of the CASP instruction;
8) writing the data at the address [ Rn + offset ] back to the result bus.
Optionally, when the CASP instruction is split into the sequentially executed micro-operations CASP0 and CASP1 in step 2), the instruction format of the split micro-operation CASP0 is as follows:
CASP0 Rs1, Rs1, Rs2, Rn, #offset;
the instruction format of the micro-operation CASP1 is:
CASP1 Rs2, Rt1, Rt2;
wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands, Rs1, Rs2 are also destination operands, and # offset is the address offset.
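The split into CASP0 and CASP1 can be sketched as a small data structure. The `MicroOp` class and the `split_casp` function are hypothetical names introduced only for illustration; the operand assignment follows the two micro-operation formats given above (first operand is the destination, the rest are sources).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroOp:
    """Illustrative micro-operation record: one destination, up to three sources."""
    name: str
    dest: str
    srcs: Tuple[str, ...]
    offset: Optional[int] = None


def split_casp(rs1, rs2, rt1, rt2, rn, offset):
    """Split CASP Rs1, Rs2, Rt1, Rt2, Rn, #offset into CASP0 and CASP1."""
    casp0 = MicroOp("CASP0", dest=rs1, srcs=(rs1, rs2, rn), offset=offset)
    casp1 = MicroOp("CASP1", dest=rs2, srcs=(rt1, rt2))
    return [casp0, casp1]


uops = split_casp("x0", "x1", "x2", "x3", "x4", 0)
```

Neither micro-operation needs more than three source fields or more than one destination field, which is the property the method relies on.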
Optionally, the detailed steps of step 3) include:
3.1) parsing operand information in units of micro-operations: the first micro-operation CASP0 is decoded as a micro-operation with three source registers Rs1, Rs2 and Rn and one destination register Rs1; the second micro-operation CASP1 is decoded as a micro-operation with two source registers Rt1 and Rt2 and one destination register Rs2;
3.2) reading the mapping table and allocating a new rename entry for each destination register, so that the source registers of the second micro-operation CASP1 do not depend on the destination register of the first micro-operation CASP0, register renaming being completed in units of micro-operations;
3.3) dispatching the micro-operation CASP0 and the micro-operation CASP1 to the execution unit in order;
3.4) issuing the micro-operation CASP0 and the micro-operation CASP1 in order.
Optionally, when the micro-operation CASP0 is executed in step 4), the values of the source registers Rs1, Rs2 and Rn come from a register read or from a bypass of the processor core result bus.
Optionally, when the micro-operation CASP1 is executed in step 6), the values of the source registers Rt1 and Rt2 come from a register read or from a bypass of the processor core result bus.
Optionally, the detailed steps of step 7) include:
7.1) judging whether the data of the target storage area corresponding to the current address [ Rn + offset ] is in a writable state; if so, jumping to step 7.4); otherwise, proceeding to the next step;
7.2) acquiring write permission for the data of the target storage area corresponding to the current address [ Rn + offset ]; if the acquisition succeeds, proceeding to the next step; otherwise, repeating step 7.2);
7.3) comparing the current data of the target storage area corresponding to the current address [ Rn + offset ] with the comparison value obtained by executing the micro-operation CASP0; if the two are equal, proceeding to step 7.4); otherwise, jumping to step 8);
7.4) overwriting the data of the target storage area with the new value obtained by executing the micro-operation CASP1.
Optionally, obtaining the write permission for the data of the target storage area corresponding to the current address [ Rn + offset ] in step 7.2) specifically means sending a corresponding storage coherency request to the next-level storage and waiting for the next-level storage to return the write permission for the data.
Optionally, writing the data at the address [ Rn + offset ] back to the result bus in step 8) specifically means that a processor with multiple register write paths writes the data at the address [ Rn + offset ] to the two destination registers Rs1 and Rs2 at one time through two register write paths, whereas a processor without multiple register write paths writes the data at the address [ Rn + offset ] to the two destination registers Rs1 and Rs2 in two cycles.
The present invention also provides a microprocessor programmed to perform the steps of the method for executing the CASP instruction of the present invention.
The invention also provides a computer device having a microprocessor programmed to perform the steps of the method of execution of the aforementioned CASP instructions of the invention.
Compared with the prior art, the invention has the following advantages:
1. The invention reduces the number of source-register and destination-register channels. The CASP instruction has 5 source operands and 2 destination operands. The invention reduces the execution path of the CASP instruction to the 3 source operands and 1 destination operand of ordinary instructions. After being fetched, the CASP instruction is first split into two micro-operations, each with one destination register; decoding, renaming and dispatch are then performed at micro-operation granularity, so only three source-register channels and one destination-register channel are needed along the whole path.
2. The width of the stored data is reduced. On the instruction execution path, the unsplit CASP instruction requires storage space 4 registers wide for its data, whereas each split micro-operation requires only 2 register widths along the whole path. The data storage space of the instruction execution path is therefore reduced by 50%.
3. Execution of the CASP instruction is accelerated. When the first micro-operation executes, the writable state of the data need not be acquired if the data is already in the processor core. If the comparison finds the values unequal, the data can be written back to the registers directly and execution of the whole CASP instruction ends. This saves the time needed to acquire the writable state of the data, which generally requires an access to the next-level storage and therefore takes a long time.
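The channel and width figures in advantages 1 and 2 can be checked with a short calculation. The dictionaries below simply restate the operand counts given in the text; the variable names are illustrative only.

```python
# Operand and data requirements restated from the text above.
unsplit = {"sources": 5, "dests": 2, "data_regs": 4}
micro_ops = {
    "CASP0": {"sources": 3, "dests": 1, "data_regs": 2},  # carries the compare value
    "CASP1": {"sources": 2, "dests": 1, "data_regs": 2},  # carries the new value
}

# The execution path must be sized for the widest micro-operation.
path_sources = max(u["sources"] for u in micro_ops.values())
path_dests = max(u["dests"] for u in micro_ops.values())
path_data_regs = max(u["data_regs"] for u in micro_ops.values())

# Fraction of per-entry data storage saved relative to the unsplit instruction.
saving = 1 - path_data_regs / unsplit["data_regs"]
```

With these counts the path needs 3 source channels, 1 destination channel and 2 register widths of data, and `saving` evaluates to 0.5, matching the 50% stated in advantage 2.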
Drawings
Fig. 1 is a diagram comparing the register counts of the CASP instruction and ordinary instructions.
Fig. 2 is a schematic flow chart of the basic implementation of the embodiment of the invention.
Detailed Description
As shown in fig. 2, the implementation steps of the method for executing the CASP instruction in this embodiment include:
1) fetching a CASP instruction, wherein the instruction format of the CASP instruction is as follows: CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset; wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands and Rs1, Rs2 are also destination operands, and # offset is the address offset;
2) splitting the CASP instruction into a micro-operation CASP0 and a micro-operation CASP1 which are executed sequentially;
3) performing decoding, operand renaming, dispatch and execution in units of micro-operations;
4) executing the first micro-operation CASP0 to concatenate the Rs1 and Rs2 register values into a comparison value and to read a value two registers wide from the address [ Rn + offset ] as a return value;
5) comparing the comparison value with the return value; if they are not equal, jumping to step 8); otherwise, proceeding to step 6);
6) executing the second micro-operation CASP1 to concatenate the Rt1 and Rt2 register values into a new value;
7) writing the new value to the address [ Rn + offset ] as the final return value of the CASP instruction;
8) writing the data at the address [ Rn + offset ] back to the result bus.
In this embodiment, the fetching of the CASP instruction in step 1) specifically refers to fetching the CASP instruction from the instruction Cache.
In this embodiment, when the CASP instruction is split into the sequentially executed micro-operation CASP0 and the micro-operation CASP1 in step 2), the instruction format of the split micro-operation CASP0 is as follows:
CASP0 Rs1, Rs1, Rs2, Rn, #offset;
the instruction format of the micro-operation CASP1 is:
CASP1 Rs2, Rt1, Rt2;
wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands, Rs1, Rs2 are also destination operands, and # offset is the address offset.
The CASP instruction is split into two micro-operations, each with one destination register. After the CASP instruction is split, decoding is performed in units of micro-operations, and no micro-operation has more than 1 destination register, so only one destination-register channel needs to be provided. After splitting, the micro-operation CASP0 carries three source operands, Rs1, Rs2 and Rn, and one destination operand, Rs1; that is, the micro-operation CASP0 carries the values that the CASP instruction needs to compare. The micro-operation CASP1 carries the source operands Rt1 and Rt2 and the destination operand Rs2; that is, the micro-operation CASP1 carries the value that needs to be stored.
As shown in fig. 2, the detailed steps of step 3) include:
3.1) parsing operand information in units of micro-operations: the first micro-operation CASP0 is decoded as a micro-operation with three source registers Rs1, Rs2 and Rn and one destination register Rs1; the second micro-operation CASP1 is decoded as a micro-operation with two source registers Rt1 and Rt2 and one destination register Rs2;
3.2) reading the mapping table and allocating a new rename entry for each destination register, so that the source registers of the second micro-operation CASP1 do not depend on the destination register of the first micro-operation CASP0, register renaming being completed in units of micro-operations;
3.3) dispatching the micro-operation CASP0 and the micro-operation CASP1 to the execution unit in order; even in an out-of-order issue processor, instructions are dispatched in order, so this is consistent with a general-purpose processor implementation;
3.4) issuing the micro-operation CASP0 and the micro-operation CASP1 in order.
The micro-operation CASP0 and the micro-operation CASP1 must be executed in order, CASP0 first and CASP1 second. In-order issue is required here for two reasons: (I) atomic instructions have store (write) semantics, which processors generally implement in order, and a store can only be performed after the instruction has been committed; (II) the micro-operation CASP0 carries the data to be compared and the micro-operation CASP1 carries the data to be stored, and the CASP instruction must compare the data before deciding whether to write the storage area, so executing CASP1 before CASP0 would be meaningless.
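The renaming in step 3.2) can be sketched as follows. The map table, free list and physical register names (`p0`, `p10`, and so on) are hypothetical and stand in for whatever structures a real core uses; the example also assumes that Rt1 and Rt2 are different architectural registers from Rs1, since otherwise CASP1 would read the entry just allocated for CASP0.

```python
def rename(uops, map_table, free_list):
    """Rename micro-operations one at a time, in program order.

    uops: list of (name, dest, srcs) with architectural register names.
    map_table: dict mapping architectural -> physical register (updated in place).
    free_list: list of unused physical registers (consumed from the front).
    """
    renamed = []
    for name, dest, srcs in uops:
        phys_srcs = tuple(map_table[s] for s in srcs)  # read mappings first
        phys_dest = free_list.pop(0)                   # new rename entry for dest
        map_table[dest] = phys_dest
        renamed.append((name, phys_dest, phys_srcs))
    return renamed


map_table = {"x0": "p0", "x1": "p1", "x2": "p2", "x3": "p3", "x4": "p4"}
free_list = ["p10", "p11"]
uops = [
    ("CASP0", "x0", ("x0", "x1", "x4")),  # dest Rs1; sources Rs1, Rs2, Rn
    ("CASP1", "x1", ("x2", "x3")),        # dest Rs2; sources Rt1, Rt2
]
renamed = rename(uops, map_table, free_list)
```

Here CASP0 receives the new destination `p10` while CASP1 reads `p2` and `p3`, so none of CASP1's sources is the register that CASP0 writes.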
When the first micro-operation CASP0 executes, the source operands it carries, namely the values in the Rs1 and Rs2 registers, are read and concatenated into a comparison value twice the register width. The value of the Rn register is read and the address of the storage area to be operated on is calculated. In this embodiment, when the micro-operation CASP0 is executed in step 4), the values of the source registers Rs1, Rs2 and Rn come from a register read or from a bypass of the processor core result bus.
When the second micro-operation CASP1 executes, the Rt1 and Rt2 register values are concatenated into a new value. The value of the Rn register is read and the address of the storage area to be operated on is calculated. In this embodiment, when the micro-operation CASP1 is executed in step 6), the values of the source registers Rt1 and Rt2 come from a register read or from a bypass of the processor core result bus.
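The concatenation and address calculation performed by CASP0 can be sketched with integer arithmetic. The 64-bit register width and the choice to place the first register in the low half of the double-width value are assumptions made for this example; the text does not specify either.

```python
REG_BITS = 64                      # assumed register width
REG_MASK = (1 << REG_BITS) - 1


def concat_pair(first, second):
    """Concatenate two register values into one double-width value.

    Assumption: the first register occupies the low half.
    """
    return ((second & REG_MASK) << REG_BITS) | (first & REG_MASK)


def casp0_prepare(regs, rs1, rs2, rn, offset):
    """Build the comparison value and the target address for CASP0."""
    compare_value = concat_pair(regs[rs1], regs[rs2])
    address = regs[rn] + offset
    return compare_value, address


regs = {"x0": 0x1, "x1": 0x2, "x4": 0x1000}
compare_value, address = casp0_prepare(regs, "x0", "x1", "x4", 8)
```

The same `concat_pair` helper would serve for the new value built from Rt1 and Rt2 in CASP1.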
As shown in fig. 2, the detailed steps of step 7) include:
7.1) judging whether the data of the target storage area corresponding to the current address [ Rn + offset ] is in a writable state; if so, jumping to step 7.4); otherwise, proceeding to the next step;
7.2) acquiring write permission for the data of the target storage area corresponding to the current address [ Rn + offset ]; if the acquisition succeeds, proceeding to the next step; otherwise, repeating step 7.2);
7.3) comparing the current data of the target storage area corresponding to the current address [ Rn + offset ] with the comparison value obtained by executing the micro-operation CASP0; if the two are equal, proceeding to step 7.4); otherwise, jumping to step 8);
7.4) overwriting the data of the target storage area with the new value obtained by executing the micro-operation CASP1.
In this embodiment, obtaining the write permission for the data of the target storage area corresponding to the current address [ Rn + offset ] in step 7.2) specifically means sending a corresponding storage coherency request to the next-level storage and waiting for the next-level storage to return the write permission for the data.
In this embodiment, writing the data at the address [ Rn + offset ] back to the result bus in step 8) specifically means that a processor with multiple register write paths writes the data at the address [ Rn + offset ] to the two destination registers Rs1 and Rs2 at one time through two register write paths, whereas a processor without multiple register write paths writes the data at the address [ Rn + offset ] to the two destination registers Rs1 and Rs2 in two cycles.
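Steps 7.1) to 7.4) can be sketched as a small function. The `Line` class and the `request_write_permission` callback are hypothetical stand-ins for a cache line and for the coherency request sent to the next-level storage; the callback is assumed to return True once permission is granted and may update the line's data if another agent wrote it in the meantime.

```python
class Line:
    """Illustrative model of the target storage area and its write state."""
    def __init__(self, data, writable):
        self.data = data
        self.writable = writable


def step7_store(line, compare_value, new_value, request_write_permission):
    """Return True if the new value was stored, False if step 8) is taken instead."""
    if not line.writable:                       # 7.1) not yet writable
        while not request_write_permission(line):
            pass                                # 7.2) repeat until granted
        if line.data != compare_value:          # 7.3) data may have changed
            return False
    line.data = new_value                       # 7.4) overwrite with the new value
    return True


def grant_on_second_try():
    """Hypothetical next-level storage that grants permission on the second request."""
    attempts = {"count": 0}

    def request(line):
        attempts["count"] += 1
        if attempts["count"] < 2:
            return False
        line.writable = True
        return True

    return request


line = Line(data=(1, 2), writable=False)
stored = step7_store(line, (1, 2), (7, 8), grant_on_second_try())
```

The re-comparison in step 7.3) matters only on the path where permission had to be requested, because the data may have been modified while the request was outstanding.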
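The two write-back variants can be sketched as a schedule of register writes per cycle. The function name and the `write_ports` parameter are illustrative assumptions; the sketch only shows that two ports finish in one cycle and one port needs two.

```python
def writeback_schedule(dest_regs, values, write_ports):
    """Group register writes into cycles, at most write_ports writes per cycle."""
    writes = list(zip(dest_regs, values))
    return [writes[i:i + write_ports] for i in range(0, len(writes), write_ports)]


two_ports = writeback_schedule(("Rs1", "Rs2"), (1, 2), write_ports=2)
one_port = writeback_schedule(("Rs1", "Rs2"), (1, 2), write_ports=1)
```

With two write ports both destination registers are written in a single cycle; with one port the same data takes two cycles.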
In addition, this embodiment further provides a microprocessor that supports an instruction set containing the CASP instruction and is programmed to execute the steps of the method for executing the CASP instruction of this embodiment. This embodiment also provides a computer device whose microprocessor is programmed to execute the steps of the method for executing the CASP instruction of this embodiment; the computer device may be a mainframe computer, a minicomputer, a personal computer, an industrial computer, a mobile computing device, etc.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A method of executing a CASP instruction, the method comprising the steps of:
1) fetching a CASP instruction, wherein the instruction format of the CASP instruction is as follows: CASP Rs1, Rs2, Rt1, Rt2, Rn, # offset; wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands and Rs1, Rs2 are also destination operands, and # offset is the address offset;
2) splitting the CASP instruction into a micro-operation CASP0 and a micro-operation CASP1 which are executed sequentially;
3) performing decoding, operand renaming, dispatch and execution in units of micro-operations;
4) executing the first micro-operation CASP0 to concatenate the Rs1 and Rs2 register values into a comparison value and to read a value two registers wide from the address [ Rn + offset ] as a return value;
5) comparing the comparison value with the return value; if they are not equal, jumping to step 8); otherwise, proceeding to step 6);
6) executing the second micro-operation CASP1 to concatenate the Rt1 and Rt2 register values into a new value;
7) writing the new value to the address [ Rn + offset ] as the final return value of the CASP instruction;
8) writing the data at the address [ Rn + offset ] back to the result bus.
2. The method of claim 1, wherein when the CASP instruction is split into the sequentially executed micro-operations CASP0 and CASP1 in step 2), the split micro-operation CASP0 has the following instruction format:
CASP0 Rs1, Rs1, Rs2, Rn, #offset;
the instruction format of the micro-operation CASP1 is:
CASP1 Rs2, Rt1, Rt2;
wherein Rs1, Rs2, Rt1, Rt2, Rn are 5 source operands, Rs1, Rs2 are also destination operands, and # offset is the address offset.
3. The method of claim 2, wherein the detailed step of step 3) comprises:
3.1) parsing operand information in units of micro-operations: the first micro-operation CASP0 is decoded as a micro-operation with three source registers Rs1, Rs2 and Rn and one destination register Rs1; the second micro-operation CASP1 is decoded as a micro-operation with two source registers Rt1 and Rt2 and one destination register Rs2;
3.2) reading the mapping table and allocating a new rename entry for each destination register, so that the source registers of the second micro-operation CASP1 do not depend on the destination register of the first micro-operation CASP0, register renaming being completed in units of micro-operations;
3.3) dispatching the micro-operation CASP0 and the micro-operation CASP1 to the execution unit in order;
3.4) issuing the micro-operation CASP0 and the micro-operation CASP1 in order.
4. The method of claim 2, wherein the value of the source registers Rs1, Rs2, Rn is derived from register read or bypass from the processor core result bus when the micro-operation CASP0 is executed in step 4).
5. The method of claim 2, wherein the value of the source registers Rt1 and Rt2 is derived from register read or bypass from the processor core result bus when executing the micro-operation CASP1 in step 6).
6. The method of claim 2, wherein the detailed step of step 7) comprises:
7.1) judging whether the data of the target storage area corresponding to the current address [ Rn + offset ] is in a writable state; if so, jumping to step 7.4); otherwise, proceeding to the next step;
7.2) acquiring write permission for the data of the target storage area corresponding to the current address [ Rn + offset ]; if the acquisition succeeds, proceeding to the next step; otherwise, repeating step 7.2);
7.3) comparing the current data of the target storage area corresponding to the current address [ Rn + offset ] with the comparison value obtained by executing the micro-operation CASP0; if the two are equal, proceeding to step 7.4); otherwise, jumping to step 8);
7.4) overwriting the data of the target storage area with the new value obtained by executing the micro-operation CASP1.
7. The method of claim 6, wherein the step 7.2) of obtaining the write permission of the data in the target storage area corresponding to the current address [ Rn + offset ] is to send a corresponding storage coherency request to the next-level storage, and wait for the next-level storage to return the write permission of the data.
8. The method of any one of claims 1 to 7, wherein the writing back of the data at the address [ Rn + offset ] to the result bus in step 8) is performed by writing the data at the address [ Rn + offset ] to two destination registers Rs1, Rs2 at a time through two lanes of write registers for a processor with multiple lanes of write registers or writing the data at the address [ Rn + offset ] to two destination registers Rs1, Rs2 in two cycles for a processor without multiple lanes of write registers.
9. A microprocessor programmed to perform the steps of the method of executing the CASP instruction of any one of claims 1 to 8.
10. A computer device, characterized in that a microprocessor of the computer device is programmed to execute the steps of the execution method of CASP instructions according to any one of claims 1 to 8.
Application CN201910803055.9A, priority date 2019-08-28, filing date 2019-08-28: CASP instruction execution method, microprocessor and computer equipment. Status: Active. Publication: CN110515656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910803055.9A CN110515656B (en) 2019-08-28 2019-08-28 CASP instruction execution method, microprocessor and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910803055.9A CN110515656B (en) 2019-08-28 2019-08-28 CASP instruction execution method, microprocessor and computer equipment

Publications (2)

Publication Number Publication Date
CN110515656A (en) 2019-11-29
CN110515656B (en) 2021-07-16

Family

ID=68628496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910803055.9A Active CN110515656B (en) 2019-08-28 2019-08-28 CASP instruction execution method, microprocessor and computer equipment

Country Status (1)

Country Link
CN (1) CN110515656B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181495B (en) * 2020-09-28 2022-10-18 中国人民解放军国防科技大学 Method and device for realizing operand instruction of predicate register
CN114675890B (en) * 2022-05-26 2022-09-23 飞腾信息技术有限公司 Instruction execution method, device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102096609A (en) * 2009-12-10 2011-06-15 英特尔公司 Instruction-set architecture for programmable cyclic redundancy check (CRC) computations
CN102103482A (en) * 2009-12-18 2011-06-22 英特尔公司 Adaptive optimized compare-exchange operation
CN108845830A (en) * 2018-07-03 2018-11-20 中国人民解放军国防科技大学 Execution method of one-to-one loading instruction
CN109062604A (en) * 2018-06-26 2018-12-21 天津飞腾信息技术有限公司 A kind of launching technique and device towards the mixing execution of scalar sum vector instruction
DE102018132521A1 (en) * 2017-12-29 2019-07-04 Intel Corporation DEVICE AND METHOD FOR LOADING AND REDUCING LOOPS IN A SINGLE INSTRUCTION, MULTIPLE DATA (SIMD) PIPELINE

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9152382B2 (en) * 2012-10-31 2015-10-06 Intel Corporation Reducing power consumption in a fused multiply-add (FMA) unit responsive to input data values
CN108845829B (en) * 2018-07-03 2021-06-25 中国人民解放军国防科技大学 Method for executing system register access instruction

Non-Patent Citations (2)

Title
Operand-Load-Based Split Pipeline Architecture for High Clock Rate and Commensurable IPC;Rama Sangireddy;《 IEEE Transactions on Parallel and Distributed Systems》;20080430;第19卷(第4期);529-544 *
Pentium指令集微操作设计;郭鹏;《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》;20060715(第07期);I137-18 *

Also Published As

Publication number Publication date
CN110515656A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN108845830B (en) Execution method of one-to-one loading instruction
US7228402B2 (en) Predicate register file write by an instruction with a pending instruction having data dependency
US7428631B2 (en) Apparatus and method using different size rename registers for partial-bit and bulk-bit writes
JP2012043443A (en) Continuel flow processor pipeline
KR102524565B1 (en) Store and load tracking by bypassing load store units
US20160011876A1 (en) Managing instruction order in a processor pipeline
US20120173848A1 (en) Pipeline flush for processor that may execute instructions out of order
EP1768020A2 (en) Arithmetic operation apparatus, information processing apparatus and register file control method
CN110515656B (en) CASP instruction execution method, microprocessor and computer equipment
US6266763B1 (en) Physical rename register for efficiently storing floating point, integer, condition code, and multimedia values
JP2001209535A (en) Command scheduling device for processors
US20160011877A1 (en) Managing instruction order in a processor pipeline
GB2540940A (en) An apparatus and method for transferring a plurality of data structures between memory and one or more vectors of data elements stored in a register bank
US8127114B2 (en) System and method for executing instructions prior to an execution stage in a processor
US10474469B2 (en) Apparatus and method for determining a recovery point from which to resume instruction execution following handling of an unexpected change in instruction flow
EP2972791B1 (en) Method and apparatus for forwarding literal generated data to dependent instructions more efficiently using a constant cache
CN110928577A (en) Execution method of vector storage instruction with exception return
EP3260978A1 (en) System and method of merging partial write result during retire phase
JP2022549493A (en) Compressing the Retirement Queue
JP3816845B2 (en) Processor and instruction control method
US7191315B2 (en) Method and system for tracking and recycling physical register assignment
US6959377B2 (en) Method and system for managing registers
US20100100709A1 (en) Instruction control apparatus and instruction control method
US5850563A (en) Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
CN101706715B (en) Device and method for scheduling instruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant