CN116954722B - Method for transferring data between registers - Google Patents


Info

Publication number
CN116954722B
Authority
CN
China
Prior art keywords
register
instruction
renaming
source
pass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311224785.6A
Other languages
Chinese (zh)
Other versions
CN116954722A (en
Inventor
张立新
黄静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shudu Information Technology Co ltd
Original Assignee
Beijing Shudu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shudu Information Technology Co ltd filed Critical Beijing Shudu Information Technology Co ltd
Priority to CN202311224785.6A priority Critical patent/CN116954722B/en
Publication of CN116954722A publication Critical patent/CN116954722A/en
Application granted granted Critical
Publication of CN116954722B publication Critical patent/CN116954722B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F9/384 Register renaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a method for transferring data between registers. Inter-register data transfer instructions are divided into two types, data copy and data transfer: data copy is defined as the MOV instruction and data transfer as the PASS instruction. After a PASS instruction moves the source register's data to the destination register, the source register no longer holds the original data and is in an idle state. In addition to the new PASS instruction, the invention provides two specific PASS implementations that build on register renaming, as well as a way to handle PASS instructions when register renaming is not available. Because the source register no longer holds its value after the transfer and becomes idle, invalid occupation of hardware resources is reduced.

Description

Method for transferring data between registers
Technical Field
The present invention relates to processor registers, and more particularly to a method for transferring data between registers.
Background
1. Inter-register data transfer instructions
Mainstream architectures support instructions that transfer data between registers. Depending on the types of the source and destination registers, such instructions can be broadly divided into the following categories (the list is not exhaustive):
1. General-purpose register contents are transferred to another general-purpose register. For example, x86 and ARM pass the contents of the source general-purpose register (ebx, x2) to the destination general-purpose register (eax, x1) with "mov eax, ebx" and "mov x1, x2", respectively.
2. General-purpose register contents are transferred to a system register. For example, x86 and ARM pass the contents of the source general-purpose register (ebx, x2) to the system register (cr0, sp_el0) with "mov cr0, ebx" and "msr sp_el0, x2", respectively. msr is ARM's "move to system register" instruction.
3. System register contents are transferred to a general-purpose register. For example, x86 and ARM pass the contents of the system register (cr0, sp_el0) to the destination general-purpose register (eax, x1) with "mov eax, cr0" and "mrs x1, sp_el0", respectively. mrs is ARM's "move from system register" instruction.
4. General-purpose register contents are transferred to a floating-point register. For example, ARM uses "fmov d1, x2" to transfer the contents of the source general-purpose register (x2) to the floating-point register (d1).
5. Floating-point register contents are transferred to a general-purpose register. For example, ARM uses "fmov x1, d2" to transfer the contents of the floating-point register (d2) to the destination general-purpose register (x1).
6. Floating-point register contents are transferred to another floating-point register. For example, ARM uses "fmov d1, d2" to transfer the contents of the floating-point register (d2) to the floating-point register (d1).
A practical system has more register types and finer classifications, e.g. vector registers for vector units, and floating-point control registers as a subdivision of the floating-point registers. Conceptually, however, these do not differ from the classification above.
The microarchitectural implementation of such instructions varies. For example, in category 1 ARM implements mov between general-purpose registers as an alias of the logical orr instruction, sharing its encoding and implementation: "mov x1, x2" is equivalent to "orr x1, xzr, x2". In categories 2 and 3, ARM provides dedicated data paths and instruction encodings (the msr and mrs instructions) for transfers between general-purpose registers and system registers.
2. Register renaming
The principle of register renaming is to dynamically map a small number of programmer-visible architectural registers (also called logical registers) onto a larger number of programmer-invisible physical registers during execution, thereby resolving the write-after-write (WAW) and write-after-read (WAR) data hazards.
For example, consider the following code sequence:
1 mov r1, r3 // copy data from register r3 to register r1
2 add r1, 2 // add 2 to register r1
3 mov r4, r1 // copy data from register r1 to register r4
4 mov r1, r5 // copy data from register r5 to register r1
5 add r1, 2 // add 2 to register r1
6 mov r6, r1 // copy data from register r1 to register r6
Rows 4-6 are independent of rows 1-3. However, because both parts refer to the same register r1, write-after-write and write-after-read dependencies arise and prevent parallel execution. With register renaming, the logical register r1 in rows 1-3 and in rows 4-6 can be dynamically mapped to the physical registers p1 and p2 during execution, so that the two code segments can execute simultaneously and instruction-level parallelism improves.
1 mov p1, r3
2 add p1, 2
3 mov r4, p1
4 mov p2, r5
5 add p2, 2
6 mov r6, p2
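The renaming step illustrated above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's mechanism: it renames every destination write (the listing above renames only r1), it omits immediates, and the names `rename`, `program`, and the `p0`, `p1`, ... numbering are hypothetical.

```python
from itertools import count


def rename(instrs):
    """Give every destination write a fresh physical register.

    instrs: list of (op, dst, srcs) using architectural register names.
    Returns the same instructions with physical register names.
    """
    fresh = count()   # hypothetical unbounded free list
    mapping = {}      # rename map: architectural -> physical

    def lookup(reg):
        # A register that was never written in the sequence is mapped on first read.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    out = []
    for op, dst, srcs in instrs:
        phys_srcs = tuple(lookup(s) for s in srcs)  # sources use the current map
        mapping[dst] = f"p{next(fresh)}"            # each write gets a new register
        out.append((op, mapping[dst], phys_srcs))
    return out


program = [
    ("mov", "r1", ("r3",)),
    ("add", "r1", ("r1",)),   # immediate operand omitted in this sketch
    ("mov", "r4", ("r1",)),
    ("mov", "r1", ("r5",)),
    ("add", "r1", ("r1",)),
    ("mov", "r6", ("r1",)),
]
renamed = rename(program)
```

After renaming, the first three and the last three instructions touch disjoint sets of physical registers, which is what allows the two groups to execute in parallel.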
In current high-performance processor microarchitectures, general-purpose registers are renamed, some designs also rename floating-point registers, and system registers are generally not renamed.
In the prior art, there are two typical implementations of register renaming:
1. Register renaming based on a unified register file. The physical registers are centralized and managed uniformly as a unified register file, and a renaming mapping table provides the mapping from logical registers to physical registers.
2. Register renaming based on a reservation stack (commonly called a reservation station). There are no explicit physical registers; instead, renaming is implemented through the source/destination data held in reservation stack entries and the dependencies between those entries.
Existing inter-register data transfer instructions provide only "copy" semantics: after the source register value is transferred to the destination register, the source register remains implicitly occupied and is not released until it is subsequently reassigned. The lack of "transfer" semantics is particularly costly under register renaming. Because the source register cannot be put into an invalid state, the resources it occupies, such as physical register entries, read/write ports, and data paths, cannot be released, which increases the likelihood of pipeline stalls and reduces processor performance.
The prior art has the following defects:
Defect 1: register data transfer instructions cannot explicitly distinguish the two semantics of data copy and data transfer at the instruction level, which limits the room for software and hardware optimization;
Defect 2: in contexts that express data transfer, the source logical register cannot be put into an invalid state, so logical register resources are wasted;
Defect 3: in contexts that express data transfer, combined with register renaming, physical register entries are wasted and processor performance is reduced;
Defect 4: in contexts that express data transfer, combined with register renaming, the contents must be moved from the source physical register to the destination physical register, which wastes read/write ports and degrades processor performance;
Defect 5: in contexts that express data transfer, combined with register renaming, the data movement usually has to pass through a functional unit (typically a simple arithmetic logic unit), which costs the functional unit the opportunity to execute other instructions and reduces processor performance.
Disclosure of Invention
The invention provides a method for transferring data between registers that lets an instruction express data transfer semantics in a given context: after the source register transfers its data to the destination register, the source register no longer holds the value and is in an idle state, which reduces invalid occupation of hardware resources. The technical scheme is as follows:
A method for transferring data between registers divides inter-register data transfer instructions into two types, data copy and data transfer, defining data copy as MOV instructions and data transfer as PASS instructions, wherein after a PASS instruction moves the source register data to the destination register, the source register no longer holds the original data and is in an idle state.
Further, when PASS class instructions are implemented with register renaming based on a unified register file, a valid field is added to the renaming mapping table to support the semantics that the source register becomes invalid after transferring its contents. For the instruction PASS X1, X2, the flow is as follows:
S1: for the source register X2, look up the renaming mapping table to obtain the corresponding physical register number Pa;
S2: update the entry for the destination register X1 in the renaming mapping table with Pa, that is, fill Pa into the entry for X1;
S3: update the entry corresponding to the instruction in the reorder buffer with Pa;
S4: update the destination physical register field corresponding to the instruction in the reservation stack with Pa;
S5: set the valid field of the renaming mapping table entry for the destination register X1 to the valid state;
S6: change the valid field of the renaming mapping table entry for the source register X2 from the valid state to the invalid state;
S7: immediately submit the result directly to the reorder buffer;
S8: when the PASS instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
Further, when PASS class instructions are implemented with register renaming based on a unified register file, the flow for the instruction PASS X1, X2 is as follows:
S1: for the source register X2, look up the renaming mapping table to obtain the corresponding physical register number Pa;
S2: update the entry for the destination register X1 in the renaming mapping table with Pa, that is, fill Pa into the entry for X1;
S3: update the entry corresponding to the instruction in the reorder buffer with Pa;
S4: update the destination physical register field corresponding to the instruction in the reservation stack with Pa;
S5: when Pa in the T+ field of the renaming mapping table is a valid pointer into the register file, this implicitly indicates that the entry is valid;
S6: replace the register file index of the entry for the source register X2, changing it from Pa to Null;
S7: immediately submit the result directly to the reorder buffer;
S8: when the PASS instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
Further, when PASS class instructions are implemented with register renaming based on a unified register file, the flow for the instruction PASS X1, X2 is as follows:
S1: for the source register X2, look up the renaming mapping table to obtain the corresponding physical register number Pa;
S2: update the entry for the destination register X1 in the renaming mapping table with Pa, that is, fill Pa into the entry for X1;
S3: update the entry corresponding to the instruction in the reorder buffer with Pa;
S4: update the destination physical register field corresponding to the instruction in the reservation stack with Pa;
S5: when Pa in the T+ field of the renaming mapping table is a valid pointer into the register file, this implicitly indicates that the entry is valid;
S6: point the register file index of the entry for the source register X2 to a special invalid register number instead of Pa;
S7: immediately submit the result directly to the reorder buffer;
S8: when the PASS instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
The special invalid register is a specific physical register whose value is constantly 0.
Further, when PASS class instructions are implemented with register renaming based on a reservation stack, an ARF pointer field is added to the renaming mapping table so that the correspondence between renaming table entries and registers can change dynamically; exchanging the index values of the source and destination logical registers preserves the one-to-one relation between renaming mapping table entries and register file indices.
For the instruction PASS X1, X2, the flow is as follows:
S1: for the destination register X1 and the source register X2, index out the logical register (ARF) pointers corresponding to X1 and X2 in the renaming mapping table;
S2: look up the source of X2 in the ARF/ROB field of the renaming mapping table; if the source is an instruction that has completed, go to step S3; if the source is the result of an instruction that has not yet completed, look up the reservation stack number of the instruction that produces X2 and go to step S4;
S3: read the data corresponding to the source register X2, fill it into V2 of the reservation stack, and go to step S6;
S4: for the result of step S1, if X2 depends on an instruction in another reservation stack entry, copy the ROB pointer of the renaming mapping table entry for the source register X2 into the valid source register dependency field of the reservation stack, and wait for the reservation stack instruction referenced by that field to finish;
S5: when the reservation stack instruction referenced by the valid source register dependency field finishes, obtain the value of X2 and fill it into V2 of the reservation stack;
S6: exchange the logical register (ARF) pointers of X1 and X2 found in step S1;
S7: submit the value of V2 to the reorder buffer and at the same time bypass it to the reservation stack;
S8: delete the entry corresponding to this instruction in the reservation stack;
S9: the result bypassed to the reservation stack at the same time as it is written to the reorder buffer is used by other instructions that depend on register X1;
S10: when the PASS instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
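The pointer exchange at the core of this flow can be sketched as follows. This is a simplified illustration that assumes the source value is already available in the logical register file (the S1, S3, S6, S7 path); the wait on an unfinished producer (S4, S5), the reorder buffer, and the reservation stack entries are not modeled. The names `pass_swap`, `arf`, and `arf_ptr` are hypothetical.

```python
def pass_swap(arf, arf_ptr, dst, src):
    """Sketch of PASS dst, src using an ARF pointer per rename-map entry.

    arf:     list of values, the logical register file storage
    arf_ptr: dict mapping logical register name -> index into arf
    Returns the transferred value (V2), which would be submitted to the
    reorder buffer and bypassed to dependent instructions.
    """
    # S1: index out the ARF pointers of destination and source.
    dst_ptr, src_ptr = arf_ptr[dst], arf_ptr[src]
    # S3: read the source value into V2 (source assumed complete).
    v2 = arf[src_ptr]
    # S6: exchange the two pointers; no data moves inside the ARF, and the
    # mapping between entries and ARF slots stays one-to-one.
    arf_ptr[dst], arf_ptr[src] = src_ptr, dst_ptr
    # S7: the value is handed on to the reorder buffer / bypass network.
    return v2


arf = [0, 0x11, 0x12345678]
arf_ptr = {"X1": 1, "X2": 2}
value = pass_swap(arf, arf_ptr, "X1", "X2")
```

After the call, X1's entry points at the slot that holds the transferred value, while X2's entry points at X1's former slot, whose stale contents are treated as idle.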
Further, in a processor design that does not support register renaming, each logical register maps directly and deterministically to one physical register. In that case the PASS class instruction degenerates into an ordinary MOV class instruction, and the source register automatically returns to the occupied state the next time it is assigned.
The method provided by the invention distinguishes the two semantics of data copy and data transfer and introduces a new PASS instruction accordingly. It also provides two specific PASS implementations that build on register renaming, together with a way to handle PASS instructions when no register renaming is available.
Drawings
FIG. 1 is a schematic diagram of a prior-art implementation of MOV class instructions with register renaming based on a unified register file;
FIG. 2 is a schematic diagram of the implementation of PASS class instructions with register renaming based on a unified register file according to the present invention;
FIG. 3 is a schematic diagram of the first improvement of the implementation of PASS class instructions with register renaming based on a unified register file;
FIG. 4 is a schematic diagram of the second improvement of the implementation of PASS class instructions with register renaming based on a unified register file;
FIG. 5 is a schematic diagram of a prior-art implementation of MOV class instructions with register renaming based on a reservation stack;
FIG. 6 is a schematic diagram of the implementation of PASS class instructions with register renaming based on a reservation stack according to the present invention.
Detailed Description
1. Instruction semantics differentiation and scenario analysis
According to the invention, a data transfer instruction can express two different semantics depending on context. The first is data copy: the data is copied to the destination register without destroying the source register. The second is data transfer: after the source register data is moved to the destination register, the source register no longer needs to hold the original data and is in an idle state.
Some scenarios that exhibit the second semantics are as follows (the list is not exhaustive):
1. Compilers follow an application binary interface (ABI) that passes function parameters or return values through specific registers. ARM64, for example, typically uses the x0 register for parameter passing, so the source register holding the parameter is often "transferred" to x0;
2. When a program modifies a system register, the system register value is temporarily held in a general-purpose register, a new value is formed by some logic operations, and the new value is finally transferred back to the system register to complete the modification;
3. Data is "transferred" from a general-purpose register to a floating-point register for floating-point operations. After the transfer, the source general-purpose register is typically semantically "idle";
4. Data is "transferred" from a floating-point register to a general-purpose register. After the transfer, the source floating-point register may be semantically "idle";
5. Data is "transferred" from one floating-point register to another. After the transfer, the source floating-point register may be semantically "idle".
The invention provides a new data transfer instruction to distinguish the two semantics of data copy and data transfer. For convenience, the original copy-type transfer instructions are hereinafter collectively called MOV class instructions, and the new transfer-type instructions are collectively called PASS class instructions.
Building on high-performance processor design practice and taking transfers between general-purpose registers as the example, the invention further provides: an implementation based on a unified register file, together with two improvements on it; an implementation based on a reservation stack; and a processing scheme for designs without renaming.
2. First embodiment of the invention: register renaming based on a unified register file and the implementation of PASS class instructions.
FIG. 1 shows the prior-art process for implementing MOV instructions with register renaming based on a unified register file. The flow is as follows:
first, for the source register X2, look up the renaming mapping table to obtain the corresponding physical register number Pa;
second, put Pa into the source physical register field corresponding to the MOV instruction in the reservation stack (the reservation stack in FIG. 1 has two source register fields, of which only one, e.g. T2+, is valid for this instruction);
third, for the destination register X1, allocate a physical register Pb from the idle table;
fourth, update the entry for X1 in the renaming mapping table with Pb;
fifth, update the entry corresponding to the instruction in the reorder buffer with Pb;
sixth, update the destination physical register field corresponding to the instruction in the reservation stack with Pb;
seventh, when the source and destination registers are ready and a functional unit (FU) is available, issue the instruction to the functional unit for execution;
eighth, the functional unit reads the value corresponding to Pa from the register file;
ninth, the functional unit writes the value back to the location corresponding to Pb;
tenth, reclaim the idle table entry: the idle table entry corresponding to Told in the reorder buffer is changed from the occupied state to the idle state;
eleventh, when the MOV instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
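The cost of this prior-art MOV flow can be made concrete with a small model. The sketch below is illustrative only: the reorder buffer and reservation stack are not modeled, so the steps that update them are omitted, and the class `Core`, its `define` helper, and the port counters are hypothetical names introduced for the example.

```python
class Core:
    """Toy unified-register-file renamer (ROB and reservation stack omitted)."""

    def __init__(self, n_phys=8):
        self.regfile = [0] * n_phys       # physical register file
        self.free = list(range(n_phys))   # idle table (free list)
        self.rmap = {}                    # renaming map: logical -> physical
        self.reads = 0                    # register-file read port uses
        self.writes = 0                   # register-file write port uses

    def define(self, logical, value):
        """Helper for the example: give a logical register an initial value."""
        p = self.free.pop(0)
        self.regfile[p] = value
        self.rmap[logical] = p

    def mov(self, dst, src):
        pa = self.rmap[src]               # first step: look up Pa
        pb = self.free.pop(0)             # third step: allocate Pb
        told = self.rmap.get(dst)         # previous mapping of dst, if any
        self.rmap[dst] = pb               # fourth step: update the map
        value = self.regfile[pa]          # eighth step: FU reads Pa
        self.reads += 1
        self.regfile[pb] = value          # ninth step: FU writes Pb
        self.writes += 1
        if told is not None:
            self.free.append(told)        # tenth step: reclaim Told


core = Core()
core.define("X2", 0x12345678)
core.mov("X1", "X2")
```

In this model a single MOV consumes one free physical register, one register-file read, and one register-file write, which are the resources the PASS flow aims to save.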
FIG. 2 shows the process for implementing PASS class instructions with register renaming based on a unified register file. The flow is as follows:
first, for the source register X2, look up the renaming mapping table to obtain the corresponding physical register number Pa. Compared with the prior-art renaming mapping table, the renaming mapping table of the invention adds a valid field alongside the T+ field that holds the register number. The valid field reduces the demand for physical register allocation in subsequent operations;
existing instruction designs provide no way to invalidate a register. The valid field added by the invention supplies that meaning, where invalid means that the value of the register is no longer of interest.
second, update the entry for the destination register X1 in the renaming mapping table with Pa, that is, fill Pa into the entry for X1. Compared with the prior-art MOV implementation (see FIG. 1), which must allocate a new physical register Pb from the idle table, this greatly reduces the complexity of the operation;
an example of the update: X1 is the destination register of the PASS instruction. The instruction PASS X1, X2 transfers the contents of logical register X2 into logical register X1 and invalidates the contents of X2. That is, if X2 initially holds 0x12345678, then after PASS X1, X2 the contents of X1 are 0x12345678 and the contents of X2 are no longer meaningful.
third, update the entry corresponding to the instruction in the reorder buffer with Pa;
fourth, update the destination physical register field corresponding to the instruction in the reservation stack with Pa;
fifth, set the valid field of the renaming mapping table entry for the destination register X1 to the valid (V) state;
sixth, change the valid field of the renaming mapping table entry for the source register X2 from the valid (V) state to the invalid (I) state;
seventh, immediately submit the result directly to the reorder buffer;
eighth, when the PASS instruction reaches the tail of the reorder buffer, commit and delete the entry corresponding to the instruction.
Compared with the implementation of MOV instructions, the implementation of PASS instructions in this scheme has the following characteristics:
(1) A valid field is added to the renaming mapping table to support the semantics that the source register becomes invalid after transferring its contents;
(2) No new physical register needs to be allocated from the idle table, which reduces physical register occupation;
(3) The instruction does not pass through a functional unit (FU), which frees the functional unit for other operations and improves its utilization;
(4) No data needs to be moved from one physical register to another within the physical register file, which reduces read/write demand on the physical register file and improves its efficiency.
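The renaming-map side of this PASS flow can be sketched as below. It is a minimal illustration under stated assumptions: only the map update (first, second, fifth, and sixth steps) is modeled, the reorder buffer and reservation stack updates are represented by the returned Pa, and recycling of the destination's previous mapping is not shown. The names `Entry` and `pass_valid_bit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    phys: int            # T+ field: physical register number
    valid: bool = True   # valid field added to the renaming mapping table


def pass_valid_bit(rename_map, dst, src):
    """Sketch of PASS dst, src on a renaming map with a valid field."""
    entry = rename_map[src]
    if not entry.valid:
        raise ValueError(f"{src} holds no valid value to transfer")
    pa = entry.phys                      # first step: look up Pa
    rename_map[dst] = Entry(pa, True)    # second and fifth steps: dst -> Pa, valid
    entry.valid = False                  # sixth step: source becomes invalid
    # Pa is what would be written to the ROB entry and to the reservation
    # stack destination field (third and fourth steps).
    return pa


regfile = {3: 0x12345678}                # physical register 3 holds X2's value
rename_map = {"X2": Entry(3)}
pa = pass_valid_bit(rename_map, "X1", "X2")
```

Note that the register file is never read or written and no entry is taken from the idle table: the transfer is completed purely by editing the map.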
In the first improvement of this embodiment, no valid field is added to the renaming mapping table; instead, an invalid (Null) pointer indexes an invalid location of the register file, as shown in FIG. 3. The first improvement differs from the original embodiment shown in FIG. 2 in the fifth and sixth steps.
In the fifth step, the valid field of the renaming mapping table entry for the destination register X1 no longer needs to be set to the valid (V) state. When Pa in the T+ field is a valid pointer into the register file, this implicitly indicates that the entry is valid;
in the sixth step, the valid field of the entry for the source register X2 no longer needs to be changed from valid (V) to invalid (I). Instead, the register file index of that entry is replaced, changing from Pa to Null.
The register file index is the T+ field of the renaming mapping table. In FIG. 3, this means that after the instruction executes, the value for X2 in the renaming mapping table is no longer Pa but Null. Note that it is not the Pa corresponding to X2 that points to Null; rather, Pa is replaced by Null.
The first improvement inherits most of the characteristics of the original embodiment and differs as follows:
(1) No valid field needs to be added explicitly to the renaming mapping table; whether a mapping table entry is valid is determined by whether its register file index is valid;
(2) Because the number of valid physical register file indices is typically not an integer power of 2, a physical register number that does not exist can serve as the required special tag without increasing the number of bits in the renaming mapping table.
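The Null-index variant can be illustrated with a short sketch. The register count of 96 is an assumed example size, and the choice of the all-ones index as Null is an assumption made for this example; the source only requires that Null be an index with no physical register behind it. The names `pass_null`, `is_valid`, `N_PHYS`, and `NULL` are hypothetical.

```python
N_PHYS = 96                               # example size, not a power of two
INDEX_BITS = (N_PHYS - 1).bit_length()    # bits needed for a register index
NULL = (1 << INDEX_BITS) - 1              # assumed encoding: an unused index


def is_valid(index):
    """An entry is valid exactly when its index names a real register."""
    return 0 <= index < N_PHYS


def pass_null(rename_map, dst, src):
    """Sketch of PASS dst, src where invalidity is encoded in the index."""
    pa = rename_map[src]
    if not is_valid(pa):
        raise ValueError(f"{src} holds no valid value to transfer")
    rename_map[dst] = pa      # destination now names Pa, implicitly valid
    rename_map[src] = NULL    # source index is replaced by Null
    return pa


rename_map = {"X2": 42}
pa = pass_null(rename_map, "X1", "X2")
```

With 96 registers the index already needs 7 bits, so the codes 96 to 127 are spare and one of them can serve as Null without widening the T+ field.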
In the second modification of the present embodiment, the Valid field of the renaming map is not added, and a special invalid register code is used. For example, in many systems, logical register 0 is always '0' and always maps to a particular physical register (hardwired to 0), which is never put back into the free table and to which the source logical register of the PASS instruction may be mapped. As shown in fig. 4. The second modification is different from the first modification only in the sixth step.
In the sixth step, the register file index corresponding to the entry is not required to point to the invalid location, but to the special invalid register code.
The invalid bit refers to null shown in modification one, and the special invalid register refers to a specific physical register with constant 0 in modification two. The difference between them is that:
the former only has the physical register number and no actual physical register entity. The latter exists as an actual physical register entity, except that the register is constantly 0.
The former scenario where the first modification exists and is used is where the number of effective indices of the physical register file is not an integer power of 2, i.e. not a number such as 2,4,8, 16, 32, 64, 128, for example 96. The latter scenario of existence and use of modification two is the existence of a constant 0 physical register in the system.
The existence and use of the former is premised on the implementation of physical registers in the system microarchitecture, independent of the architecture-defined logical registers, e.g., 128 physical registers in the microarchitecture are unusable, and 96 physical registers are usable. The latter has a relation with the architecture-specified registers, and can be used when the architecture does not specify a constant 0 register, and cannot be used when the architecture does not specify a constant 0 register.
The second modification inherits most of the characteristics of the first modification; its distinguishing characteristics are as follows:
(1) Register invalidity is identified in the renaming mapping table by a special invalid register code;
(2) It is particularly suited to systems in which such a special invalid register code exists.
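The renaming-map update of the second modification can be sketched as below. This is a simplified Python model under stated assumptions: the names RenameMap, pass_ and ZERO_PREG are hypothetical, physical register 0 is assumed to be the hardwired-zero register, and the reorder buffer and reservation stack updates, as well as the release of the destination's previous physical register, are left out.

```python
from dataclasses import dataclass, field

# Assumption: physical register 0 is hardwired to 0 and is never returned to the free table.
ZERO_PREG = 0


@dataclass
class RenameMap:
    # logical register name -> physical register number
    table: dict = field(default_factory=dict)

    def pass_(self, dst: str, src: str) -> int:
        """Model PASS dst, src on the renaming map only."""
        pa = self.table[src]          # look up Pa for the source register
        self.table[dst] = pa          # the destination entry now holds Pa
        self.table[src] = ZERO_PREG   # sixth step: source points to the special invalid code
        return pa

    def holds_invalid_code(self, logical: str) -> bool:
        """True if the entry points to the special invalid (constant-0) register."""
        return self.table[logical] == ZERO_PREG


rmap = RenameMap({"X0": ZERO_PREG, "X1": 5, "X2": 9})
moved = rmap.pass_("X1", "X2")  # X1 takes over physical register 9, X2 is invalidated
```

No data is copied between physical registers in this sketch; only the map entries change, which is the point of the scheme.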
The second embodiment of the invention: register renaming based on a reservation stack (reservation station) and implementation of PASS class instructions. Fig. 5 shows the prior-art process for implementing a MOV instruction with reservation-stack-based register renaming. The flow is as follows:
First, the source of the source register X2 is looked up in the renaming map. The source may be an instruction that has completed, in which case its value is read from the logical register; it may also be the result of an instruction that has not yet completed, in which case the number, in the reservation stack, of the instruction that produces X2 is looked up;
Second, for the result of the first step, if the value of X2 is read from the logical register, it is put into V2 (the reservation stack has two source register value fields, V1/V2, of which only one, e.g. V2, is valid for a MOV instruction), and the flow proceeds to the fifth step; otherwise, it proceeds to the third step;
Here V1 and V2 refer to data stored in a register, and T1+ and T2+ refer to entry numbers of the reservation stack. For example, the MOV X1, X2 instruction copies the value of the X2 register into the X1 register, and there are two scenarios. First, the ARF/ROB field in the renaming map indicates ARF, i.e., the value of the X2 register is stored in the logical registers (ARF), so the value of X2 is read from the ARF and placed in the V2 field of the reservation stack. Second, the ARF/ROB field in the renaming map indicates ROB, i.e., the value of the X2 register is held by an entry in the reservation stack; assuming that entry is numbered Tx, Tx is placed in the T2+ field corresponding to the MOV instruction.
Third, for the result of the first step, if X2 depends on an instruction in another reservation stack entry, that entry's number is filled into T2+ (the reservation stack in the figure has two source register dependency fields, of which only one, e.g. T2+, is valid for this instruction), and the MOV instruction waits for the reservation stack instruction on which T2+ depends to finish;
Fourth, when the reservation stack instruction on which T2+ depends finishes, the value of X2 is obtained and filled into V2;
Fifth, the reservation stack entry corresponding to the MOV instruction is issued to the functional unit for execution;
Sixth, the result of the functional unit is written into the reorder buffer and simultaneously bypassed (forwarded) to the reservation stack for use by other instructions that depend on the X1 register;
Seventh, the entry corresponding to the MOV instruction is deleted from the reservation stack;
Eighth, when the instruction is committed in the reorder buffer, its result is written into the logical register and simultaneously forwarded to the reservation stack for use by other instructions that depend on the X1 register;
Ninth, the instruction is committed and its entry in the reorder buffer is deleted.
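The operand-capture part of the prior-art MOV flow above (first to fifth steps) can be sketched as follows. This is an illustrative Python model under stated assumptions, not the patent's hardware: RSEntry, dispatch_mov, wakeup and execute are hypothetical names, the renaming map is modelled as a dictionary of ("ARF", None) or ("ROB", tag) pairs, and the reorder buffer and commit steps are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSEntry:
    op: str
    dst: str
    v2: Optional[int] = None  # V2: captured source value
    t2: Optional[int] = None  # T2+: entry number of the producing instruction


def dispatch_mov(dst: str, src: str, arf: dict, rename_map: dict) -> RSEntry:
    """First to third steps: fill V2 from the ARF, or T2+ with the producer's number."""
    where, tag = rename_map[src]
    entry = RSEntry("MOV", dst)
    if where == "ARF":
        entry.v2 = arf[src]   # value already available in the logical registers
    else:
        entry.t2 = tag        # value still being produced; remember whom to wait for
    return entry


def wakeup(entry: RSEntry, tag: int, value: int) -> None:
    """Fourth step: the producer finished; capture its result if the tag matches."""
    if entry.t2 == tag:
        entry.v2, entry.t2 = value, None


def execute(entry: RSEntry) -> int:
    """Fifth step: the MOV occupies a functional unit that passes the value through."""
    if entry.v2 is None:
        raise RuntimeError("operand not ready")
    return entry.v2


ready = dispatch_mov("X1", "X2", {"X2": 7}, {"X2": ("ARF", None)})
waiting = dispatch_mov("X1", "X2", {"X2": 7}, {"X2": ("ROB", 3)})
wakeup(waiting, 3, 42)
```

Even in this reduced model the MOV still passes through execute, i.e., it consumes a functional-unit slot merely to copy a value, which is the cost the PASS scheme aims to avoid.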
As shown in fig. 6, implementing a PASS class instruction with the reservation-stack-based register renaming of the present invention includes the following steps:
First, for the destination register X1 and the source register X2, the logical register ARF pointers corresponding to X1 and X2 are indexed in the renaming map.
Second, the source is looked up in the ARF/ROB field of the renaming map. If the source is an instruction that has completed, the flow proceeds to the third step; if the source is the result of an instruction that has not yet completed, the number, in the reservation stack, of the instruction that produces X2 is looked up, and the flow proceeds to the fourth step;
Here the renaming map has an additional ARF pointer field. In fig. 5, X1 and X2 are logical register numbers whose mapping to the logical registers (ARF) is one-to-one and fixed, so they can index the ARF directly and no ARF pointer is needed in the renaming map. In fig. 6, X1 and X2 are also logical register numbers, but their relation to the register set may be many-to-one and may change, so the ARF pointer is needed to remap them. For example, let X1 be 5, representing logical register number 5, and X2 be 6, representing logical register number 6. In the initial state, the ARF pointer of X1 in the renaming map points to register number 4 of the register set and the ARF pointer of X2 points to register number 3; when the PASS class instruction is executed, these two pointers are swapped. The value of X2 is thus passed to X1, and where X2 points afterwards no longer matters.
Third, the data corresponding to the source register X2 is read out and filled into V2 (the reservation stack in the figure has two source register value fields, V1/V2, of which only one, e.g. V2, is valid for a PASS instruction), and the flow proceeds to the sixth step;
Fourth, if the lookup shows that X2 depends on an instruction in another reservation stack entry, the ROB pointer in the renaming map entry of the source register X2 is copied to T2+ (the reservation stack in the figure has two source register dependency fields, of which only one, e.g. T2+, is valid for this instruction; this operation indicates that the current PASS instruction waits for the instruction that previously updated the source register in program order to finish), and the PASS instruction waits for the reservation stack instruction on which T2+ depends to finish;
Fifth, when the reservation stack instruction on which T2+ depends finishes, the value of X2 is obtained and filled into V2;
Sixth, the logical register ARF pointers corresponding to X1 and X2 found in the first step are exchanged;
Seventh, the value of V2 is submitted to the reorder buffer and simultaneously forwarded to the reservation stack;
Eighth, the entry corresponding to the PASS instruction is deleted from the reservation stack;
Ninth, the value held in the reorder buffer and forwarded to the reservation stack is used by other instructions that depend on the X1 register;
Tenth, when the PASS instruction is at the end of the reorder buffer, the instruction is committed and its entry is deleted.
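The pointer exchange of the sixth step can be sketched as follows, using the numbers from the example above (X1 is logical register 5 pointing to slot 4, X2 is logical register 6 pointing to slot 3). This is a minimal Python sketch under stated assumptions: PointerRenameMap, read and pass_ are hypothetical names, only the case where the source value is already in the register set is modelled, and the reorder buffer and forwarding paths are omitted.

```python
from dataclasses import dataclass


@dataclass
class PointerRenameMap:
    arf_ptr: dict  # logical register number -> slot in the register set
    regs: list     # contents of the register set

    def read(self, logical: int) -> int:
        """Read a logical register through its ARF pointer."""
        return self.regs[self.arf_ptr[logical]]

    def pass_(self, dst: int, src: int) -> None:
        """Sixth step: exchange the ARF pointers of destination and source.

        Swapping (rather than overwriting) keeps the pointer map one-to-one,
        so no slot of the register set is lost or shared.
        """
        self.arf_ptr[dst], self.arf_ptr[src] = self.arf_ptr[src], self.arf_ptr[dst]


# Slot 3 holds the value of X2 (77), slot 4 holds the old value of X1 (11).
rmap = PointerRenameMap(arf_ptr={5: 4, 6: 3}, regs=[0, 0, 0, 77, 11])
rmap.pass_(dst=5, src=6)
x1_value = rmap.read(5)  # X1 now reads slot 3
```

The register set itself is never written in this sketch; the data "moves" only because X1's pointer now refers to the slot that held X2's value.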
Compared with the implementation of the MOV instruction, the implementation of the PASS instruction in this scheme has the following characteristics:
(1) An ARF pointer field is added to the register renaming mapping table. Compared with the static one-to-one relation between renaming table entries and ARF registers in the prior art, the new structure dynamically changes the correspondence between renaming table entries and registers, so the registers are no longer called ARF registers and are instead referred to simply as the register set;
(2) The index values of the source and destination logical registers are exchanged in order to preserve the one-to-one relation between renaming mapping table entries and register set indices, so that subsequent logical relations are not disturbed;
(3) The functional unit is bypassed: after the reservation stack acquires the source operand, the operand is forwarded directly to the reservation stack logic, freeing the functional unit for other types of operation and improving its efficiency;
(4) An ARF pointer field is added to the reorder buffer to track the dynamically changing logical register index values and ensure that the correct register is updated;
(5) The reorder buffer no longer writes the result to the register set and retains only the forwarding data path to the reservation stack. Note that when the source is found to be an uncompleted instruction, there is a predecessor instruction that will update the source register; that update is performed by the predecessor instruction when it commits, so the current PASS instruction does not need to update the register again when it commits. This reduces the read and write demands on the register set and improves its efficiency.
The third embodiment of the invention: implementation without a register renaming mechanism.
In a processor design that does not support register renaming, each logical register maps directly and deterministically one-to-one to a physical register. To remain compatible with the PASS instruction, the PASS instruction can simply be degenerated into an ordinary MOV instruction for processing. Whether or not the source register of a PASS class instruction is explicitly placed in the "idle" state does not affect the correctness of the program logic. The source register is automatically updated to the occupied state the next time it is assigned.
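A minimal sketch of this fallback, assuming a register file with an optional per-register idle flag, is given below. The class SimpleRegFile and its methods are hypothetical names introduced for illustration; the idle flag is an assumed bookkeeping aid and is not required for correctness.

```python
class SimpleRegFile:
    """Register file without renaming: logical register i is physical register i."""

    def __init__(self, count: int) -> None:
        self.values = [0] * count
        self.idle = [True] * count  # assumed optional flag: True until first assignment

    def write(self, reg: int, value: int) -> None:
        self.values[reg] = value
        self.idle[reg] = False      # any assignment returns the register to "occupied"

    def mov(self, dst: int, src: int) -> None:
        self.write(dst, self.values[src])

    def pass_(self, dst: int, src: int) -> None:
        """PASS degenerates into MOV; marking the source idle is optional."""
        self.mov(dst, src)
        if dst != src:
            self.idle[src] = True


rf = SimpleRegFile(4)
rf.write(2, 11)
rf.pass_(1, 2)  # register 1 receives 11; register 2 is marked idle
```

The data path is exactly that of MOV, so only the decoder needs to recognise the new opcode in this model.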
The invention is based on adding an invalid state to the logical register. This state can be exploited further; for example, an invalid register need not be operated on when the context is saved. For the context-save instruction sw x1, 0(sp), if the x1 register is in the invalid state, the instruction can be treated as a nop, avoiding the memory access. A further direction for optimization in this example is how, when the context is restored with lw x1, 0(sp), to know that x1 was not saved in the first place.
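The context-save optimization can be illustrated with the short Python sketch below. It is an assumption-laden model rather than the patent's implementation: save_register is a hypothetical helper, memory is modelled as a dictionary, and the valid flags are assumed to be visible to the store logic.

```python
def save_register(valid: dict, regs: dict, memory: dict, reg: str, addr: int) -> bool:
    """Model sw reg, addr: skip the store when the register is in the invalid state.

    Returns True if a memory access was performed, False if the store became a nop.
    """
    if not valid[reg]:
        return False            # treated as nop, no memory access
    memory[addr] = regs[reg]
    return True


valid = {"x1": False, "x2": True}
regs = {"x1": 123, "x2": 456}
stack = {}
stored_x1 = save_register(valid, regs, stack, "x1", 0)  # skipped
stored_x2 = save_register(valid, regs, stack, "x2", 4)  # performed
```

The open question noted above remains in this model: the matching restore has no record that address 0 was never written.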
For arithmetic instructions, the source registers may also be placed in the invalid state after execution, depending on context. Although data cannot be transferred by moving a pointer as with a PASS instruction, more logical registers can be put in the invalid state, reducing the occupancy of physical registers. The difficulty lies in handling multiple operands and in the instruction encoding space.
The present invention takes general-purpose-register-to-general-purpose-register data transfer as its example. The concepts and methods may be extended to other types of register-to-register data transfer, such as general purpose register to system register and, in systems that implement floating point register renaming, general purpose register to floating point register, floating point register to general purpose register, and floating point register to floating point register. Since system registers generally do not use a renaming mechanism, the method is not suitable for transfers from system registers to general purpose registers.
The technical scheme of the invention has the following beneficial effects:
(1) The invention provides a method and implementation for transferring data between registers in a high-performance processor. It subdivides the traditional inter-register data transfer instruction into two types, inter-register data copying and inter-register data transfer, introduces a novel instruction semantic for the latter type, and provides a system implementation scheme.
(2) The instruction provides finer instruction-level semantics and more room for software and hardware optimization. Taking general-purpose-register-to-general-purpose-register data transfer as an example and combining it with register renaming, the invention provides two solutions. These solutions save physical register entries, save physical register ports, and improve the efficiency of the arithmetic units. They are easy to implement and convenient to retrofit into existing architectures. The invention also provides a solution for the PASS instruction without a register renaming mechanism.
The invention has the following characteristics:
(1) A novel PASS instruction that implements the semantics of transferring data between registers and placing the source register in an invalid state. It is characterized by transferring the source register data to the destination register without retaining the value in the source register. The source and destination registers are logical registers, i.e., registers defined in the architecture. The unified register file refers to the physical registers, which are one way of implementing the logical registers: the number of logical registers is the same in all processors conforming to a given architecture (e.g., ARM, MIPS), whereas the number of physical registers may vary with the particular implementation;
(2) An implementation device for PASS class instructions based on unified-register-file register renaming, characterized in that: a Valid field is added to the register renaming mapping table to support the semantics that the source register becomes invalid after its content is transferred; no new physical register needs to be allocated from the free table, which reduces physical register occupancy; the functional unit (FU) is bypassed, freeing it for other types of operation and improving its efficiency; and data does not need to be moved from one physical register to another within the physical register file, which reduces the read and write demands on the physical register file and improves its efficiency;
(3) A first modification of the implementation device for PASS class instructions based on unified-register-file register renaming, characterized in that: it inherits most of the characteristics of the original embodiment; no Valid field needs to be explicitly added to the register renaming mapping table, and whether a mapping table entry is valid is distinguished by whether its register file index is valid; in particular, since the number of valid indices of the physical register file is typically not an integer power of 2, a nonexistent physical register number can be set as the required special tag without increasing the number of bits of the mapping table;
(4) A second modification of the implementation device for PASS instructions based on unified-register-file register renaming, characterized in that: it inherits most of the characteristics of the first modification; register invalidity is identified in the renaming mapping table by a special invalid register code; and it is particularly suited to systems in which such a special invalid register code exists;
(5) An implementation device for PASS instructions based on reservation-stack register renaming, characterized in that: an ARF pointer field is added to the register renaming mapping table, so that the new structure can dynamically change the correspondence between renaming table entries and registers; the one-to-one relation between renaming mapping table entries and register set indices is preserved by exchanging the index values of the source and destination logical registers; the functional unit is bypassed, in that after the reservation stack acquires the source operand it is forwarded to the reservation stack logic, freeing the functional unit for other types of operation and improving its efficiency; an ARF pointer field is added to the reorder buffer to track the dynamically changing logical register index values; and the PASS instruction in the reorder buffer no longer updates the register set, retaining only the forwarding data path to the reservation stack, which reduces the read and write demands on the register set and improves its efficiency;
(6) An implementation device for PASS instructions without a register renaming mechanism, which degenerates PASS instructions into MOV instructions for processing. It is characterized in that the microarchitecture does not need to be changed and the existing pipeline stages are reused; only decoding support for the PASS instruction needs to be added.

Claims (8)

1. A method for transferring data between registers, comprising: dividing the inter-register data transfer instruction into two types, data copying and data transfer, wherein data copying is defined as a MOV instruction and data transfer is defined as a PASS instruction, and after the PASS instruction moves the source register data to the destination register, the source register no longer needs to hold the original data and is in an idle state.
2. The method of inter-register data transfer of claim 1, wherein: when the PASS instruction is realized based on the register renaming of the unified register file, valid domain is added in the register renaming mapping table to support the invalid semantics after the source register transfers the content, and the flow of the PASS (X1, X2) instruction is as follows:
S1: for a source register X2, searching a renaming mapping table to obtain a corresponding physical register number Pa;
S2: updating the item corresponding to the target register X1 in the renaming mapping table by Pa, namely filling the item corresponding to the X1 in the renaming mapping table by Pa;
S3: updating an item corresponding to the instruction in the reorder buffer by Pa;
S4: updating a target physical register domain corresponding to the instruction in the reserved stack by Pa;
S5: setting the valid field of the renaming mapping table corresponding to the destination register X1 to be in a valid state;
S6: the valid field of the renaming mapping table item corresponding to the source register X2 is set to be in an invalid state from an effective state;
S7: immediately submitting the result to a reorder buffer directly;
S8: when the PASS instruction is at the tail of the reorder buffer, the entry corresponding to the instruction is submitted and deleted.
3. The method of inter-register data transfer of claim 1, wherein: when implementing PASS class instructions based on register renaming of a unified register file, the flow of PASS (X1, X2) instructions is as follows:
S1: for a source register X2, searching a renaming mapping table to obtain a corresponding physical register number Pa;
S2: updating the item corresponding to the target register X1 in the renaming mapping table by Pa, namely filling the item corresponding to the X1 in the renaming mapping table by Pa;
S3: updating an item corresponding to the instruction in the reorder buffer by Pa;
S4: updating a target physical register domain corresponding to the instruction in the reserved stack by Pa;
S5: when Pa in the renaming map T+ field is a valid pointer to the register file, i.e., implicitly indicates that the entry is valid;
S6: replacing the register file index corresponding to the entry from Pa to Null space Null;
S7: immediately submitting the result to a reorder buffer directly;
S8: when the PASS instruction is at the tail of the reorder buffer, the entry corresponding to the instruction is submitted and deleted.
4. The method of inter-register data transfer of claim 1, wherein:
when implementing PASS class instructions based on register renaming of a unified register file, the flow of PASS (X1, X2) instructions is as follows:
S1: for a source register X2, searching a renaming mapping table to obtain a corresponding physical register number Pa;
S2: updating the item corresponding to the target register X1 in the renaming mapping table by Pa, namely filling the item corresponding to the X1 in the renaming mapping table by Pa;
S3: updating an item corresponding to the instruction in the reorder buffer by Pa;
S4: updating a target physical register domain corresponding to the instruction in the reserved stack by Pa;
S5: when Pa in the renaming map T+ field is a valid pointer to the register file, i.e., implicitly indicates that the entry is valid;
S6: the register file index corresponding to the entry is pointed to a special invalid register code number from Pa;
S7: immediately submitting the result to a reorder buffer directly;
S8: when the PASS instruction is at the tail of the reorder buffer, the entry corresponding to the instruction is submitted and deleted.
5. The method of inter-register data transfer of claim 4, wherein: the special invalid register refers to a specific physical register with constant 0.
6. The method of inter-register data transfer of claim 1, wherein: when the register renaming and PASS instructions are realized based on the reserved stack, the ARF pointer field is added in the register renaming mapping table, the corresponding relation between the register renaming table item and the register can be dynamically changed, and the one-to-one relation between the renaming mapping table item and the register group index is reserved by exchanging the index values of the source logic register and the destination logic register.
7. The method of inter-register data transfer of claim 6, wherein: the flow of instructions for PASS (X1, X2) is as follows:
S1: for a destination register X1 and a source register X2, indexing out logical register ARF pointers corresponding to the X1 and the X2 in a renaming mapping table;
S2: searching a source in ARF/ROB of the renaming mapping table, and if the source is an instruction which is completed, entering step S3; if the source is the result of the instruction which is not completed yet, the number of the instruction which is depended on by the generation X2 in the reserved stack is searched at the moment, and the step S4 is entered;
S3: reading out data corresponding to the source register X2, filling the data into the V2 of the reserved stack, and entering step S6;
S4: for the result of step S1, if dependent on the instruction in the other reservation stack, copying the ROB pointer in the renaming mapping table corresponding to the source register X2 to the active source register dependency field of the reservation stack, and waiting for the end of the reservation stack instruction on which the active source register dependency field depends;
S5: ending the reservation stack instruction depending on the effective source register domain, obtaining the value of X2, and filling the value into V2 of the reservation stack;
S6: exchanging the logical register ARF pointers corresponding to the X1 and the X2 searched in the step S1;
S7: submitting the value of V2 to a reorder buffer while bypassing into a reservation stack component;
S8: deleting an entry corresponding to the MOV instruction in the reservation stack;
S9: when the instruction is bypassed to the reserved stack in the reordering buffer at the same time, the instruction is used for other instructions which depend on the X1 register;
S10: when the PASS instruction is at the tail of the reorder buffer, the entry corresponding to the instruction is submitted and deleted.
8. The method of inter-register data transfer of claim 1, wherein: in a processor design that does not support a register renaming mechanism, each logical register is mapped directly to a physical register in a deterministic one-to-one manner, the PASS class instructions are degenerated into MOV class instructions for processing, and the source registers are automatically updated to an occupied state when assigned next time.
CN202311224785.6A 2023-09-21 2023-09-21 Method for transferring data between registers Active CN116954722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311224785.6A CN116954722B (en) 2023-09-21 2023-09-21 Method for transferring data between registers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311224785.6A CN116954722B (en) 2023-09-21 2023-09-21 Method for transferring data between registers

Publications (2)

Publication Number Publication Date
CN116954722A CN116954722A (en) 2023-10-27
CN116954722B true CN116954722B (en) 2024-01-16

Family

ID=88449713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311224785.6A Active CN116954722B (en) 2023-09-21 2023-09-21 Method for transferring data between registers

Country Status (1)

Country Link
CN (1) CN116954722B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2409060A (en) * 2003-12-09 2005-06-15 Advanced Risc Mach Ltd Moving data between registers of different register data stores
CN113703832A (en) * 2021-09-10 2021-11-26 中国人民解放军国防科技大学 Method, device and medium for executing immediate data transfer instruction
CN113849224A (en) * 2020-06-27 2021-12-28 英特尔公司 Apparatus, method and system for instructions to move data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528355B2 (en) * 2015-12-24 2020-01-07 Arm Limited Handling move instructions via register renaming or writing to a different physical register using control flags

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant