CN117806706A - Storage order violation processing method, storage order violation processing device, electronic equipment and medium - Google Patents

Info

Publication number
CN117806706A
Authority
CN
China
Prior art keywords
instruction
executed
violation
load
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311872930.1A
Other languages
Chinese (zh)
Inventor
翟少敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Original Assignee
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hexin Technology Co ltd, Shanghai Hexin Digital Technology Co ltd filed Critical Hexin Technology Co ltd
Priority to CN202311872930.1A priority Critical patent/CN117806706A/en
Publication of CN117806706A publication Critical patent/CN117806706A/en
Pending legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

The application provides a storage order violation processing method, a storage order violation processing device, an electronic device and a medium. The method comprises: during out-of-order execution of the instructions of a program to be executed, for each current instruction to be executed, determining all violation instructions associated with the instruction to be executed according to a load instruction reorder queue; and determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction, which is the one located earliest in the program to be executed, wherein the first violation instruction is a load instruction. The load instruction reorder queue comprises a plurality of items, different items correspond to different issued load instructions, and each item stores the load address, thread information, execution state, snooped value and sequence identifier of the load instruction corresponding to the item. The first violation instruction and the instructions that follow it in the program to be executed are then executed out of order again. The method reduces wasted power consumption and improves processor performance when a storage order violation is handled.

Description

Storage order violation processing method, storage order violation processing device, electronic equipment and medium
Technical Field
The present disclosure relates to computer technologies, and in particular, to a storage order violation processing method, apparatus, electronic device, and medium.
Background
Processors execute instructions in one of two modes: sequential (in-order) execution and out-of-order execution. Sequential execution follows the instruction fetch order given by the program counter (PC). With sequential execution, however, the pipeline stalls as soon as an instruction dependency is encountered, which wastes resources. Modern processors therefore often employ out-of-order execution to exploit the instruction-level parallelism in programs and improve performance. Out-of-order execution lets independent instructions bypass long-latency events, which increases instruction throughput, but it also makes storage order violations (memory order violations) possible. For example, suppose a store instruction is followed in program order by a load instruction that accesses the same address. Because the instructions execute out of order, the load instruction may execute before the store instruction has written its data, obtain the older value from memory, and complete and write back that value; other instructions that depend on the load instruction then execute with the old value. This is a storage order violation, and it makes the result of out-of-order execution inconsistent with the result of executing the instructions in program order.
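The hazard described above can be illustrated with a minimal sketch. It assumes a toy memory modeled as a Python dictionary; the address `0x100`, the stored values and the function name `run` are hypothetical and serve only to show how an early load observes the old value.

```python
# Toy model (all names and values are illustrative assumptions):
# a store and a younger load that access the same address.
def run(order):
    """Execute the two memory operations in the given order.

    Returns the value observed by the load.
    """
    memory = {0x100: 1}              # old value at address 0x100
    loaded = None
    for op in order:
        if op == "store":
            memory[0x100] = 2        # older instruction in program order
        elif op == "load":
            loaded = memory[0x100]   # younger instruction in program order
    return loaded

in_order = run(["store", "load"])      # program order: load sees the new value
out_of_order = run(["load", "store"])  # load executed early: sees the old value
print(in_order, out_of_order)
```

The two results differ, which is exactly the inconsistency between out-of-order execution and program-order execution that the violation handling has to repair.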
In the prior art, when a storage order violation is detected, out-of-order execution is restarted from the instruction that follows the store instruction in the program. However, during out-of-order execution the store instruction and the load instruction may be separated by many instructions. Restarting directly from the instruction after the store instruction therefore discards instructions between the store instruction and the load instruction that had already executed correctly, which wastes power and degrades processor performance.
Disclosure of Invention
The application provides a storage order violation processing method, a storage order violation processing device, an electronic device and a medium, which reduce wasted power consumption and improve processor performance when a storage order violation is handled.
In one aspect, the present application provides a storage order violation processing method, including:
during out-of-order execution of the instructions of a program to be executed, for each current instruction to be executed, determining all violation instructions associated with the instruction to be executed according to a load instruction reorder queue;
determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction, which is located earliest in the program to be executed; wherein the first violation instruction is a load instruction; the load instruction reorder queue comprises a plurality of items, different items correspond to different issued load instructions, and each item stores the load address, the thread information, the execution state, the snooped value and the sequence identifier of the load instruction corresponding to the item;
and performing out-of-order execution of the first violation instruction and the instructions that follow it in the program to be executed.
Optionally, the determining all violation instructions associated with the instruction to be executed according to the load instruction reorder queue includes:
if the instruction to be executed is a store instruction, taking each load instruction in the load instruction reorder queue that meets a first violation condition as a violation instruction associated with the instruction to be executed; the first violation condition comprises: thread information identical to the thread information of the instruction to be executed, a load address identical to the store address of the instruction to be executed, an execution state indicating that execution has completed, and a sequence identifier larger than the sequence identifier of the instruction to be executed;
if the instruction to be executed is a load instruction, taking each load instruction in the load instruction reorder queue that meets a second violation condition as a violation instruction associated with the instruction to be executed; the second violation condition comprises: thread information identical to the thread information of the instruction to be executed, a load address identical to the load address of the instruction to be executed, an execution state indicating that execution has completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snooped value of 1;
if the instruction to be executed is a synchronization instruction, taking each load instruction in the load instruction reorder queue that meets a third violation condition as a violation instruction associated with the instruction to be executed; the third violation condition comprises: thread information identical to the thread information of the instruction to be executed, an execution state indicating that execution has completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snooped value of 1.
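The three violation conditions can be sketched as a single scan over the load instruction reorder queue. This is a minimal software model under stated assumptions, not the hardware described in the application: the class and field names (`LrqEntry`, `Instr`, `done`, `seq`) are hypothetical, sequence identifiers are assumed not to wrap around, and the third condition is read as the same "execution completed" test used by the other two.

```python
from dataclasses import dataclass

@dataclass
class LrqEntry:
    address: int   # load address
    thread: int    # thread information
    done: bool     # execution state: True once the load has finished executing
    snooped: int   # snooped value (0 or 1)
    seq: int       # sequence identifier; larger means later in program order

@dataclass
class Instr:
    kind: str        # "store", "load" or "sync"
    thread: int
    seq: int
    address: int = 0  # not used for "sync"

def find_violations(instr, lrq):
    """Return the queue entries that are violation instructions for instr."""
    hits = []
    for e in lrq:
        # Requirements shared by all three conditions.
        if not (e.thread == instr.thread and e.done and e.seq > instr.seq):
            continue
        if instr.kind == "store" and e.address == instr.address:
            hits.append(e)   # first violation condition
        elif instr.kind == "load" and e.address == instr.address and e.snooped == 1:
            hits.append(e)   # second violation condition
        elif instr.kind == "sync" and e.snooped == 1:
            hits.append(e)   # third violation condition
    return hits

lrq = [
    LrqEntry(address=0x100, thread=0, done=True,  snooped=0, seq=7),
    LrqEntry(address=0x100, thread=0, done=True,  snooped=1, seq=9),
    LrqEntry(address=0x200, thread=0, done=True,  snooped=1, seq=5),
    LrqEntry(address=0x100, thread=1, done=True,  snooped=1, seq=8),
    LrqEntry(address=0x100, thread=0, done=False, snooped=0, seq=6),
]
store_hits = [e.seq for e in find_violations(Instr("store", 0, 4, 0x100), lrq)]
load_hits = [e.seq for e in find_violations(Instr("load", 0, 4, 0x100), lrq)]
sync_hits = [e.seq for e in find_violations(Instr("sync", 0, 4), lrq)]
```

In this example the store matches the two completed same-thread loads to address 0x100, the load additionally requires the snooped value to be 1, and the synchronization instruction ignores the address altogether.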
Optionally, the determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction located earliest in the program to be executed includes:
if there is exactly one violation instruction associated with the instruction to be executed, taking that violation instruction as the first violation instruction;
and if there are a plurality of violation instructions associated with the instruction to be executed, taking, according to the sequence identifiers of all the violation instructions associated with the instruction to be executed, the violation instruction with the smallest sequence identifier as the first violation instruction.
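The selection rule above reduces to picking the minimum sequence identifier. A minimal sketch follows; the function name `first_violation` and the use of `None` for "no violation" are assumptions made for illustration.

```python
def first_violation(seqs):
    """Return the sequence identifier of the first violation instruction.

    seqs holds the sequence identifiers of all violation instructions
    associated with one instruction to be executed.
    """
    if not seqs:
        return None        # no violation was found
    if len(seqs) == 1:
        return seqs[0]     # a single violation instruction is taken directly
    return min(seqs)       # several: the smallest identifier is earliest in the program

single = first_violation([8])
several = first_violation([9, 5, 7])
```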
Optionally, before the determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction located earliest in the program to be executed, the method further includes:
if there are a plurality of violation instructions associated with the instruction to be executed, generating a check request corresponding to the instruction to be executed, wherein the check request comprises the instruction identifier of the instruction to be executed and the sequence identifiers of all the violation instructions associated with the instruction to be executed; and adding the check request to a check request queue;
the determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction located earliest in the program to be executed includes:
for the instruction to be executed that corresponds to the first check request in the check request queue, determining, from all the violation instructions associated with that instruction, the first violation instruction located earliest in the program to be executed.
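The check request queue can be sketched as follows. The sketch assumes the queue is served first-in, first-out, so that "the first check request" is the one at the head; the names `CheckRequest`, `report` and `service_head` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class CheckRequest:
    instr_id: int          # instruction identifier of the instruction to be executed
    violation_seqs: tuple  # sequence identifiers of all its violation instructions

check_queue = deque()

def report(instr_id, violation_seqs):
    """Queue a check request only when several violation instructions exist.

    With zero or one violation instruction nothing is queued and the
    result is returned directly.
    """
    if len(violation_seqs) > 1:
        check_queue.append(CheckRequest(instr_id, tuple(violation_seqs)))
        return None
    return violation_seqs[0] if violation_seqs else None

def service_head():
    """Resolve the request at the head of the queue to its first violation instruction."""
    req = check_queue.popleft()
    return req.instr_id, min(req.violation_seqs)

direct = report(3, [11])     # single violation: resolved without queuing
report(4, [9, 7])            # several violations: queued
report(6, [12, 10, 15])      # queued behind the previous request
head = service_head()
```

Queuing only the multi-candidate cases keeps the comparison logic busy with the requests that actually need it, which is one plausible reason for the split described above.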
Optionally, the performing out-of-order execution of the first violation instruction and the instructions that follow it in the program to be executed includes:
ending the current out-of-order execution and acquiring the instruction identifier of the first violation instruction; reading the program counter value corresponding to the first violation instruction from an instruction reorder buffer according to that instruction identifier;
and, according to the program counter value corresponding to the first violation instruction, performing out-of-order execution of the instruction whose instruction address is that program counter value and of the instructions that follow it in the program to be executed.
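The restart step can be sketched with a dictionary standing in for the instruction reorder buffer. All contents here are hypothetical: `rob_pc` maps instruction identifiers to program counter values, and `program` is a short straight-line program in which ascending addresses are assumed to match program order.

```python
# Hypothetical reorder buffer contents: instruction identifier -> program counter value.
rob_pc = {10: 0x4000, 11: 0x4004, 12: 0x4008, 13: 0x400C}

# Hypothetical straight-line program: instruction address -> mnemonic.
program = {0x4000: "store", 0x4004: "add", 0x4008: "load", 0x400C: "sub"}

def restart_point(first_violation_id):
    """Read the program counter value of the first violation instruction."""
    return rob_pc[first_violation_id]

def instructions_to_reexecute(first_violation_id):
    """List the instructions from the restart point to the end of the program."""
    pc = restart_point(first_violation_id)
    return [program[addr] for addr in sorted(program) if addr >= pc]

redo = instructions_to_reexecute(12)
```

Here the `add` between the store and the load is not executed again, which is the saving over restarting from the instruction after the store.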
Optionally, after the ending of the current out-of-order execution, the method further includes:
determining a second check request in the check request queue, wherein the instruction identifier in the second check request is larger than the instruction identifier of the first violation instruction;
and clearing the second check request from the check request queue.
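Clearing the stale check requests can be sketched as a filter over the queue. The dictionary-based request format and the name `flush_younger` are assumptions; only the comparison rule (a larger instruction identifier is cleared) comes from the text above.

```python
from collections import deque

def flush_younger(check_queue, first_violation_id):
    """Return the queue without requests issued by instructions younger than the restart point.

    A request is cleared when its instruction identifier is larger than the
    instruction identifier of the first violation instruction.
    """
    return deque(req for req in check_queue if req["instr_id"] <= first_violation_id)

queue = deque([{"instr_id": 4}, {"instr_id": 12}, {"instr_id": 8}, {"instr_id": 20}])
remaining = [req["instr_id"] for req in flush_younger(queue, 10)]
```

The cleared requests belong to instructions that will be executed again after the restart, so any violation they reported will be detected afresh.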
Optionally, the determining, from all the violation instructions associated with the instruction to be executed, the first violation instruction located earliest in the program to be executed includes:
generating a violation signal corresponding to the instruction to be executed according to all the violation instructions associated with the instruction to be executed, wherein the violation signal comprises a plurality of bits in one-to-one correspondence with the items of the load instruction reorder queue; a bit having a first value indicates that the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, and a bit having a second value indicates that the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed;
and inputting the violation signal into a checking unit to obtain the sequence identifier output by the checking unit, and taking the load instruction in the load instruction reorder queue that corresponds to that sequence identifier as the first violation instruction.
Optionally, the checking unit includes a selector corresponding to each item in the load instruction reorder queue and a binary tree comparator; the inputting the violation signal into the checking unit to obtain the sequence identifier output by the checking unit includes:
inputting each bit of the violation signal corresponding to the instruction to be executed, as an enable signal, to the enable terminal of the selector corresponding to the respective item; the selector corresponding to each item receives a preset value and the sequence identifier of the load instruction corresponding to the item, outputs the sequence identifier when the enable signal has the first value, and outputs the preset value when the enable signal has the second value; the preset value is larger than the sequence identifiers of all the load instructions in the load instruction reorder queue;
and inputting the results output by the selectors corresponding to the items into the binary tree comparator to obtain the sequence identifier output by the binary tree comparator; the binary tree comparator compares the results output by all the selectors and outputs the smallest result.
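The violation signal, the per-item selectors and the binary tree comparator can be modeled in software as follows. This is a behavioral sketch, not a hardware description: the first value is taken to be 1 and the second value 0, the preset value `0xFF` is an assumed constant larger than every sequence identifier in the example, and all function names are hypothetical.

```python
PRESET = 0xFF  # assumed to exceed every sequence identifier in the queue

def violation_signal(lrq_seqs, violating):
    """One bit per queue item: 1 if that item is a violation instruction, else 0."""
    return [1 if seq in violating else 0 for seq in lrq_seqs]

def selector(enable, seq, preset=PRESET):
    """Pass the sequence identifier when enabled, otherwise the preset value."""
    return seq if enable == 1 else preset

def tree_min(values):
    """Pairwise comparator tree: every level keeps the smaller of two neighbours."""
    level = list(values)
    while len(level) > 1:
        nxt = [min(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # an unpaired element passes to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

lrq_seqs = [12, 5, 9, 7, 3, 14, 6, 10]        # sequence identifiers of the queue items
signal = violation_signal(lrq_seqs, violating={9, 7, 14})
selected = [selector(bit, seq) for bit, seq in zip(signal, lrq_seqs)]
first = tree_min(selected)
```

Because items that are not violation instructions contribute the preset value, they can never win the comparison; if no bit is set, the output equals the preset value, which a consumer could treat as "no violation". For n queue items the tree needs about log2(n) comparison levels rather than n - 1 sequential comparisons.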
In another aspect, the present application provides a storage order violation processing device, including:
a first determining module, configured to determine, during out-of-order execution of the instructions of a program to be executed and for each current instruction to be executed, all violation instructions associated with the instruction to be executed according to a load instruction reorder queue;
a second determining module, configured to determine, from all the violation instructions associated with the instruction to be executed, the first violation instruction, which is located earliest in the program to be executed; wherein the first violation instruction is a load instruction; the load instruction reorder queue comprises a plurality of items, different items correspond to different issued load instructions, and each item stores the load address, the thread information, the execution state, the snooped value and the sequence identifier of the load instruction corresponding to the item;
and an execution module, configured to perform out-of-order execution of the first violation instruction and the instructions that follow it in the program to be executed.
Optionally, the first determining module is specifically configured to:
if the instruction to be executed is a store instruction, take each load instruction in the load instruction reorder queue that meets a first violation condition as a violation instruction associated with the instruction to be executed; the first violation condition comprises: thread information identical to the thread information of the instruction to be executed, a load address identical to the store address of the instruction to be executed, an execution state indicating that execution has completed, and a sequence identifier larger than the sequence identifier of the instruction to be executed;
if the instruction to be executed is a load instruction, take each load instruction in the load instruction reorder queue that meets a second violation condition as a violation instruction associated with the instruction to be executed; the second violation condition comprises: thread information identical to the thread information of the instruction to be executed, a load address identical to the load address of the instruction to be executed, an execution state indicating that execution has completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snooped value of 1;
if the instruction to be executed is a synchronization instruction, take each load instruction in the load instruction reorder queue that meets a third violation condition as a violation instruction associated with the instruction to be executed; the third violation condition comprises: thread information identical to the thread information of the instruction to be executed, an execution state indicating that execution has completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snooped value of 1.
Optionally, the second determining module is specifically configured to:
if there is exactly one violation instruction associated with the instruction to be executed, take that violation instruction as the first violation instruction;
and if there are a plurality of violation instructions associated with the instruction to be executed, take, according to the sequence identifiers of all the violation instructions associated with the instruction to be executed, the violation instruction with the smallest sequence identifier as the first violation instruction.
Optionally, the second determining module is further configured to:
if there are a plurality of violation instructions associated with the instruction to be executed, generate a check request corresponding to the instruction to be executed, wherein the check request comprises the instruction identifier of the instruction to be executed and the sequence identifiers of all the violation instructions associated with the instruction to be executed; and add the check request to a check request queue;
the second determining module is further specifically configured to:
for the instruction to be executed that corresponds to the first check request in the check request queue, determine, from all the violation instructions associated with that instruction, the first violation instruction located earliest in the program to be executed.
Optionally, the execution module is specifically configured to:
end the current out-of-order execution and acquire the instruction identifier of the first violation instruction; read the program counter value corresponding to the first violation instruction from an instruction reorder buffer according to that instruction identifier;
and, according to the program counter value corresponding to the first violation instruction, perform out-of-order execution of the instruction whose instruction address is that program counter value and of the instructions that follow it in the program to be executed.
Optionally, the device further includes a clearing module, configured to:
determine a second check request in the check request queue, wherein the instruction identifier in the second check request is larger than the instruction identifier of the first violation instruction;
and clear the second check request from the check request queue.
Optionally, the second determining module is further configured to:
generate a violation signal corresponding to the instruction to be executed according to all the violation instructions associated with the instruction to be executed, wherein the violation signal comprises a plurality of bits in one-to-one correspondence with the items of the load instruction reorder queue; a bit having a first value indicates that the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, and a bit having a second value indicates that the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed;
and input the violation signal into a checking unit to obtain the sequence identifier output by the checking unit, and take the load instruction in the load instruction reorder queue that corresponds to that sequence identifier as the first violation instruction.
Optionally, the checking unit includes a selector corresponding to each item in the load instruction reorder queue and a binary tree comparator; the second determining module is further configured to:
input each bit of the violation signal corresponding to the instruction to be executed, as an enable signal, to the enable terminal of the selector corresponding to the respective item; the selector corresponding to each item receives a preset value and the sequence identifier of the load instruction corresponding to the item, outputs the sequence identifier when the enable signal has the first value, and outputs the preset value when the enable signal has the second value; the preset value is larger than the sequence identifiers of all the load instructions in the load instruction reorder queue;
and input the results output by the selectors corresponding to the items into the binary tree comparator to obtain the sequence identifier output by the binary tree comparator; the binary tree comparator compares the results output by all the selectors and outputs the smallest result.
In yet another aspect, the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory to implement the method as described above.
In yet another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement the method as described above.
In the storage order violation processing method, device, electronic device and medium provided by the present application, during out-of-order execution of the instructions of a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reorder queue according to the load addresses, thread information, execution states, snooped values and sequence identifiers of the different issued load instructions stored in that queue; the first violation instruction, which is located earliest in the program to be executed, is determined from all the violation instructions associated with the instruction to be executed; and the first violation instruction and the instructions that follow it in the program to be executed are executed out of order. When a storage order violation makes the execution results of part of the program wrong, the violation instruction at which the results first become wrong is determined precisely, and re-execution starts from that instruction. This avoids discarding instructions that had already executed correctly, so the storage order violation is handled while wasted power consumption is reduced and processor performance is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of the architecture of a processor core;
FIG. 2 is a schematic flow chart of a storage order violation processing method according to a first embodiment of the present application;
FIG. 3 is a schematic structural diagram of a load instruction reorder queue according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a check request queue according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of determining a first violation instruction according to a first embodiment of the present application;
FIG. 6 is a schematic structural diagram of a selector according to a first embodiment of the present application;
FIG. 7 is a schematic structural diagram of a checking unit according to a first embodiment of the present application;
FIG. 8 is a schematic structural diagram of a storage order violation processing device according to a second embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device for storage order violation processing according to a third embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The modules in this application refer to functional modules or logic modules. A module may be implemented in software, in which case its functions are realized by a processor executing program code, or in hardware. The term "and/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The terms referred to in this application are explained first:
Multi-hot encoding: a feature is converted into an n-dimensional numerical vector that holds 1 in every dimension in which the feature is present and 0 in all other dimensions; several dimensions may be 1 at the same time.
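A minimal sketch of a multi-hot vector, assuming dimensions are numbered from 0; the function name `multi_hot` and the example indices are hypothetical. The violation signal described in this application, with one bit per item of the load instruction reorder queue, has this shape.

```python
def multi_hot(active, n):
    """Return an n-dimensional 0/1 vector with 1 at every index listed in active."""
    vec = [0] * n
    for i in active:
        vec[i] = 1
    return vec

v = multi_hot([1, 4, 5], 8)
print(v)
```

Unlike one-hot encoding, more than one position may be set, so the vector can mark several queue items at once.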
Processors execute instructions in one of two modes: sequential (in-order) execution and out-of-order execution. Sequential execution follows the instruction fetch order given by the program counter. With sequential execution, however, the pipeline stalls as soon as an instruction dependency is encountered, which wastes resources. Modern processors therefore often employ out-of-order execution to exploit the instruction-level parallelism in programs and improve performance. Out-of-order execution lets independent instructions bypass long-latency events, which increases instruction throughput, but it also makes storage order violations possible. For example, suppose a store instruction and a load instruction in a program access the same address one after the other. Because the instructions execute out of order, the load instruction may obtain the older value from memory before the store instruction writes its data, and then complete and write back that value; other instructions that depend on the load instruction then execute with the old value. This is a storage order violation, and it makes the result of out-of-order execution inconsistent with the result of executing the instructions in program order.
In the prior art, when a storage order violation is detected, out-of-order execution is restarted from the instruction that follows the store instruction in the program. However, during out-of-order execution the store instruction and the load instruction may be separated by many instructions. Restarting directly from the instruction after the store instruction therefore discards instructions between the store instruction and the load instruction that had already executed correctly, which wastes power and degrades processor performance.
The storage order violation processing method, device, electronic device and medium provided by the present application are intended to solve the above technical problems.
In the storage order violation processing method, device, electronic device and medium provided by the present application, during out-of-order execution of the instructions of a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reorder queue according to the load addresses, thread information, execution states, snooped values and sequence identifiers of the different issued load instructions stored in that queue; the first violation instruction, which is located earliest in the program to be executed, is determined from all the violation instructions associated with the instruction to be executed; and the first violation instruction and the instructions that follow it in the program to be executed are executed out of order. When a storage order violation makes the execution results of part of the program wrong, the violation instruction at which the results first become wrong is determined precisely, and re-execution starts from that instruction. This avoids discarding instructions that had already executed correctly, so the storage order violation is handled while wasted power consumption is reduced and processor performance is improved.
FIG. 1 is a schematic diagram of a processor core. As shown in FIG. 1, it is a simple example of a high-performance out-of-order processor core, which may be a hardware single-threaded or hardware multi-threaded processor core. From front to back, the processor core pipeline comprises an Instruction Cache (Instruction-Cache) 101, an Instruction fetch Unit (Instruction fetch Unit, IFU) 102, a branch prediction Unit (Branch Prediction Unit, BPU) 103, an Instruction decode Unit (Instruction Decode Unit, IDU) 104, a register renaming Unit (register renaming) 105, an Instruction issue queue (Instruction Issue Queue) 106, a physical register file 107, an integer execution Unit (integer Unit) 108, a Floating Point/Vector execution Unit (Floating Point/Vector Unit) 109, a memory execution Unit (Load-Store Unit, LSU) 110, a data Cache 111, an Instruction reorder buffer (Instruction ReOrder Buffer, ROB/inst-ROB) 112, an address resolution completion table 113, a completion Unit 114, and an architectural register map table (arch-state rename mapper) 115.
Instruction cache 101 is responsible for storing recently used instructions, instructions expected to be used soon, and ancillary information of the instructions (e.g., pre-decode information).
Instruction fetch unit 102 is responsible for fetching one or more instructions from instruction cache 101 per clock cycle.
Branch prediction unit 103 may be coupled to instruction fetch unit 102 or decoupled from it. Branch prediction unit 103 predicts the path of subsequent instruction fetches for instruction fetch unit 102, i.e., it predicts the jump direction and jump address of the current instruction stream.
The decode unit 104 receives instructions from the instruction fetch unit 102 and decodes them, typically translating the instruction words defined by the instruction set into a sequence of micro-instructions suitable for hardware execution. Depending on its complexity, an instruction is decoded into one or more micro-instructions or internal operations (internal operation, IOP for short). IOPs can be processed and executed out of order by the subsequent pipeline stages, but an instruction can usually be committed only after all internal operations belonging to it have executed correctly.
Register renaming unit 105 is typically used to eliminate false dependencies (e.g., write-after-write, WAW; write-after-read, WAR) between instructions/internal operations, so that unnecessary serial execution of instructions is avoided. In particular implementations, the renaming unit maps a logical (instruction set architecture-level) target register number to a number in the physical register space, and eliminates false dependencies (e.g., WAW, WAR) during the mapping process. Once decoded and renamed, an IOP may be allocated pipeline back-end resources during the dispatch stage, such as the instruction reorder buffer 112, the instruction issue queue (IssueQueue), the load instruction reorder queue (Load Reorder Queue, LRQ) and the store instruction reorder queue (Store Reorder Queue, SRQ). According to the type of the IOP, its information is stored in the corresponding resources in preparation for execution.
Instruction reorder buffer 112 allocates an entry for each instruction during the dispatch stage and holds the instruction information from the dispatch stage until the instruction is committed from the instruction reorder buffer 112, at which point the architectural register map table 115 is updated and the entry in the instruction reorder buffer 112 is released. The instruction reorder buffer 112 mainly stores information such as the instruction completion status, instruction exception type, instruction register mapping history and instruction address. This information supports normal instruction commit and, when instruction speculation fails (exceptions, branch mispredictions, storage order violations, etc.), allows the related microarchitectural state of the processor core to be restored and instructions to be re-fetched and re-executed.
The instruction issue queue 106 is responsible for storing the renamed IOPs and for tracking the operand readiness of each instruction, the running state of the execution units and the state of the resources required by the instruction pipeline; it also wakes up instructions in the queue that depend on operands about to be produced by IOPs that have already been issued. The instruction issue queue is further responsible for selecting IOPs for the downstream execution units, arbitrating so that the oldest or most critical instruction is issued for execution. After an IOP is issued, it passes through the physical register file 107, reads its source operands or takes the data from the bypass path, and then goes to the corresponding execution unit, such as the branch execution unit (Branch Execution Unit), integer execution unit 108, floating point/vector execution unit 109 or memory execution unit 110.
Completion unit 114 is responsible for checking instruction completion and determining whether an exception has occurred. If no exception has occurred, it updates the architectural register map table 115 according to the instruction type. The map table stores the mapping from the logical destination register index (logical GPR index) to the physical register index (physical GPR index) of committed instructions. If an exception has occurred, execution enters the corresponding exception handling routine according to the exception information in the ROB.
The technical solutions of the present application are illustrated in the following specific examples. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Example 1
Fig. 2 is a flow chart of a storage order violation processing method according to an embodiment of the present application. As shown in fig. 2, the storage order violation processing method provided in this embodiment may include:
S201, in the process of out-of-order instruction execution of a program to be executed, for each current instruction to be executed, determining all violation instructions associated with the instruction to be executed according to the load instruction reorder queue;
S202, determining, from all violation instructions associated with the instruction to be executed, the first violation instruction, which is the foremost in the program to be executed; wherein the first violation instruction is a load instruction; the load instruction reorder queue comprises a plurality of entries, different entries correspond to different issued load instructions, and each entry stores the load address, thread information, execution state, snooped value and sequence identifier of the load instruction corresponding to the entry;
S203, performing out-of-order instruction execution on the first violation instruction in the program to be executed and the program after the first violation instruction.
In practical applications, the execution body of this embodiment may be a storage order violation processing device. The device may be implemented by a computer program, for example application software; alternatively, it may be implemented as a medium storing the related computer program, for example a USB drive or a cloud disk; still alternatively, it may be implemented by a physical device integrated with or installed with the related computer program, for example a chip or a server.
The load instruction reorder queue is used to maintain the state of load instructions that have entered the out-of-order window and to monitor potential semantic hazards. FIG. 3 is a schematic diagram of the load instruction reorder queue provided in an embodiment of the present application. Each load instruction is allocated an entry of the load instruction reorder queue in the dispatch stage or the issue stage. As shown in FIG. 3, different entries of the load instruction reorder queue store the sequence identifier, load address, thread information (tid), execution state (final) and snooped value of different load instructions. After a load instruction finishes executing, its final is set to 1. In a processor core that implements hardware simultaneous multi-threading (SMT), entries of the load instruction reorder queue may be allocated to different hardware threads on demand so that the threads flexibly share the same set of LRQ resources; consequently, the physical order of the allocated entries in the load instruction reorder queue does not represent the program order of the load instructions.
The sequence identifier of a load instruction comprises a load identifier (load tag) or an instruction identifier (iop-tag). The load identifier is assigned by the processor core before the load instruction is dispatched to the issue queue and characterizes the order of the load instruction among all load instructions in the program; the instruction identifier is assigned by the processor core before the load instruction is dispatched to the issue queue and characterizes the order of the load instruction among all instructions in the program.
The snooped value characterizes whether the load instruction has been snooped. At some point, an invalidate request from the external storage system may be sent to the load instruction reorder queue. The invalidate request carries address information, which is compared with the load addresses of all load instructions with final=1 in the load instruction reorder queue; in each entry whose load address matches, the snooped value is set to 1, indicating that the load instruction has been snooped by another potential write request and that the data it obtained from the load address may no longer be up to date.
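The entry layout and the invalidate handling described above can be modelled in software as follows. This is only an illustrative sketch in Python, not the hardware of the present application: the names `LrqEntry` and `apply_invalidate` are hypothetical, and comparing addresses for exact equality (rather than, for example, at cache-line granularity) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LrqEntry:
    """One entry of the load instruction reorder queue (field names are illustrative)."""
    seq_id: int       # sequence identifier (load tag or iop-tag)
    load_addr: int    # load address
    tid: int          # thread information
    final: int = 0    # execution state: 1 once the load has executed
    snooped: int = 0  # snooped value: 1 once an invalidate request hit this entry


def apply_invalidate(lrq: List[LrqEntry], inval_addr: int) -> None:
    """Set snooped=1 in every completed entry whose load address matches the request.

    Assumption: addresses match on exact equality.
    """
    for entry in lrq:
        if entry.final == 1 and entry.load_addr == inval_addr:
            entry.snooped = 1


# Entries are stored in allocation order, not program order.
lrq = [
    LrqEntry(seq_id=7, load_addr=0x100, tid=0, final=1),
    LrqEntry(seq_id=3, load_addr=0x100, tid=1, final=0),  # not yet executed
    LrqEntry(seq_id=5, load_addr=0x200, tid=0, final=1),  # different address
]
apply_invalidate(lrq, 0x100)
```

Only the first entry is marked here, because the second has not finished executing and the third holds a different address.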
In the process of out-of-order instruction execution of the program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined according to the load instruction information stored in the load instruction reorder queue, and the first violation instruction, which is the foremost in the program to be executed, is determined from them. The first violation instruction can be determined by comparing the sequence identifiers of all the violation instructions: the larger the sequence identifier, the later the corresponding violation instruction is in program order. Out-of-order instruction execution is then performed on the first violation instruction in the program to be executed and the program after it.
In this example, in the process of out-of-order instruction execution of a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reorder queue according to the load addresses, thread information, execution states, snooped values and sequence identifiers of the different issued load instructions stored in the load instruction reorder queue; the first violation instruction, which is the foremost in the program to be executed, is determined from all violation instructions associated with the instruction to be executed; and out-of-order instruction execution is performed on the first violation instruction in the program to be executed and the program after it. When a storage order violation causes the execution results of part of the program to be wrong, the first instruction in the program whose execution result is wrong is accurately determined and re-execution starts from that instruction, so that correctly executed parts of the program are not discarded. The storage order violation is thereby handled while power consumption waste is reduced and processor performance is improved.
Storage order violations include several cases, such as Store-Hit-Load, Load-Hit-Load and Sync-Hit-Load, and the violation instructions associated with the instruction to be executed are determined differently for each case. In one example, determining all violation instructions associated with the instruction to be executed according to the load instruction reorder queue includes:
if the instruction to be executed is a store instruction, taking the load instructions in the load instruction reorder queue that satisfy a first violation condition as the violation instructions associated with the instruction to be executed; the first violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the store address of the instruction to be executed, the execution state indicates that execution is completed, and the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed;
if the instruction to be executed is a load instruction, taking the load instructions in the load instruction reorder queue that satisfy a second violation condition as the violation instructions associated with the instruction to be executed; the second violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the load address of the instruction to be executed, the execution state indicates that execution is completed, the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1;
if the instruction to be executed is a synchronization instruction, taking the load instructions in the load instruction reorder queue that satisfy a third violation condition as the violation instructions associated with the instruction to be executed; the third violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the execution state indicates that execution is completed, the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1.
Specifically, Store-Hit-Load (SHL) involves a store (Store) instruction and a later load (Load) instruction in the program that access the same address. Because instructions are executed out of order, the load instruction may execute before the store instruction writes its data, so that the load instruction obtains a stale value from memory, and instructions that depend on the load instruction may already have executed with the stale value, which causes a storage order violation. If the instruction to be executed is a store instruction, it is necessary to detect whether Store-Hit-Load has occurred. This is done by judging whether a load instruction in the load instruction reorder queue satisfies the first violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the store address of the instruction to be executed, the execution state indicates that execution is completed, and the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed. According to the load instruction reorder queue and the sequence identifier, store address and thread information of the instruction to be executed, the load instructions in the load instruction reorder queue that satisfy the first violation condition are taken as the violation instructions associated with the instruction to be executed.
Load-Hit-Load involves a first load (Load1) instruction and a second load (Load2) instruction that access the same address one after the other in the program. Because instructions are executed out of order, the second load instruction may execute before the first load instruction; after it has obtained the value from memory, another program in the storage system updates the value at that address, and the first load instruction then executes and obtains the newest value from memory. The older load thus observes a newer value than the younger load, which violates the sequential consistency of the program and causes a storage order violation. In this case the second load instruction has been snooped by another potential write request, indicating that the data corresponding to the address may no longer be the latest. If the instruction to be executed is a load instruction, it is necessary to detect whether Load-Hit-Load has occurred. This is done by judging whether a load instruction in the load instruction reorder queue satisfies the second violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the load address of the instruction to be executed, the execution state indicates that execution is completed, the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1. According to the load instruction reorder queue and the sequence identifier, load address and thread information of the instruction to be executed, the load instructions in the load instruction reorder queue that satisfy the second violation condition are taken as the violation instructions associated with the instruction to be executed.
Sync-Hit-Load involves a synchronization (Sync) instruction and a later load (Load) instruction in the program. The synchronization instruction must be ordered before the subsequent load operations, i.e., a load instruction accessing an address may obtain the value at that address only after the synchronization instruction has completed global memory ordering. Because of out-of-order execution, the load instruction may execute first and obtain the value from memory; another program in the storage system then updates the value at that address, after which the synchronization instruction executes, enters the storage subsystem to complete the ordering operation, and returns an ordering-complete acknowledgement sync_ack. After receiving sync_ack, the processor core needs to query all load instructions that have executed but not yet committed. If a load instruction has been invalidated by an external snoop request (the load instruction is marked as invalid or marked as snooped), the load instruction executed too early and obtained an old value; since a synchronization instruction that orders memory instructions precedes the load instruction, a storage order violation has occurred. If the instruction to be executed is a synchronization instruction, it is necessary to detect whether Sync-Hit-Load has occurred. This is done by judging whether a load instruction in the load instruction reorder queue satisfies the third violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the execution state indicates that execution is completed, the sequence identifier is greater than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1.
According to the load instruction reorder queue and the sequence identifier and thread information of the instruction to be executed, the load instructions in the load instruction reorder queue that satisfy the third violation condition are taken as the violation instructions associated with the instruction to be executed.
When the sequence identifier of the instruction to be executed is compared with the sequence identifier of a load instruction, the two identifiers are always of the same type: when the sequence identifier of the instruction to be executed is the load identifier corresponding to the instruction to be executed, the sequence identifier of the load instruction is its load identifier; when the sequence identifier of the instruction to be executed is its instruction identifier, the sequence identifier of the load instruction is its instruction identifier. The load identifier corresponding to the instruction to be executed refers to the load identifier recorded by the processor core at the moment before the instruction to be executed is dispatched to the issue queue; the instruction identifier of the instruction to be executed refers to the instruction identifier assigned by the processor core before the instruction to be executed is dispatched to the issue queue, and characterizes the order of each instruction in the program.
In this example, for various storage order violations, the violating instructions associated with the instructions to be executed can be accurately acquired.
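The three violation conditions can be summarised by a small software model. The following Python sketch is illustrative only: `LrqEntry`, `find_violations` and the `kind` strings are hypothetical names, addresses are assumed to match on exact equality, and sequence identifiers are assumed not to wrap around.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LrqEntry:
    seq_id: int     # sequence identifier, same type as that of the instruction to be executed
    load_addr: int  # load address
    tid: int        # thread information
    final: int      # 1 = execution completed
    snooped: int    # 1 = hit by an invalidate request


def find_violations(lrq: List[LrqEntry], kind: str, tid: int, seq_id: int,
                    addr: Optional[int] = None) -> List[LrqEntry]:
    """Return the LRQ entries that violate storage order against one instruction to be executed.

    kind is "store" (first condition), "load" (second) or "sync" (third).
    """
    hits = []
    for e in lrq:
        # Common to all three conditions: same thread, completed, younger in program order.
        if not (e.tid == tid and e.final == 1 and e.seq_id > seq_id):
            continue
        if kind == "store":    # Store-Hit-Load: address match
            ok = e.load_addr == addr
        elif kind == "load":   # Load-Hit-Load: address match and snooped
            ok = e.load_addr == addr and e.snooped == 1
        elif kind == "sync":   # Sync-Hit-Load: snooped, no address compare
            ok = e.snooped == 1
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
        if ok:
            hits.append(e)
    return hits


lrq = [
    LrqEntry(seq_id=4, load_addr=0x40, tid=0, final=1, snooped=0),
    LrqEntry(seq_id=6, load_addr=0x40, tid=0, final=1, snooped=1),
    LrqEntry(seq_id=8, load_addr=0x80, tid=0, final=1, snooped=1),
    LrqEntry(seq_id=9, load_addr=0x40, tid=1, final=1, snooped=1),   # other thread
    LrqEntry(seq_id=2, load_addr=0x40, tid=0, final=1, snooped=1),   # older than seq 3
    LrqEntry(seq_id=10, load_addr=0x40, tid=0, final=0, snooped=0),  # not executed yet
]
shl = find_violations(lrq, "store", tid=0, seq_id=3, addr=0x40)
lhl = find_violations(lrq, "load", tid=0, seq_id=3, addr=0x40)
shs = find_violations(lrq, "sync", tid=0, seq_id=3)
```

For the same queue contents, the store check is the most permissive on the snooped value and the synchronization check is the most permissive on the address, which mirrors the wording of the three conditions.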
In practical applications, there may be multiple violation instructions associated with the instruction to be executed. In this case, the oldest of these violation instructions needs to be identified, and re-execution starts from it. In one example, determining, from all violation instructions associated with the instruction to be executed, the first violation instruction that is the foremost in the program to be executed includes:
If the number of the violation instructions associated with the instruction to be executed is one, the violation instructions are used as first violation instructions;
if the number of the violation instructions associated with the instructions to be executed is multiple, taking the violation instruction with the smallest sequence identifier as a first violation instruction according to the sequence identifiers of all the violation instructions associated with the instructions to be executed.
Specifically, when the number of the violating instructions associated with the instruction to be executed is one, the violating instruction can be directly determined as a first violating instruction; when the number of the violating instructions associated with the instructions to be executed is a plurality of, the oldest violating instruction can be determined through the sequence identification, namely, the violating instruction with the minimum sequence identification is used as the first violating instruction.
In this example, the first violation instruction can be accurately determined by means of the sequence identifiers among the plurality of violation instructions associated with the instruction to be executed.
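The selection rule above amounts to taking a minimum over sequence identifiers. A minimal Python sketch follows, assuming that sequence identifiers increase monotonically in program order and do not wrap around; the function name `first_violation` is hypothetical.

```python
from typing import List, Optional


def first_violation(violation_seq_ids: List[int]) -> Optional[int]:
    """Return the sequence identifier of the oldest violation instruction, if any."""
    if not violation_seq_ids:
        return None                  # no storage order violation detected
    if len(violation_seq_ids) == 1:
        return violation_seq_ids[0]  # a single violation instruction is the first one
    return min(violation_seq_ids)    # smallest identifier = earliest in program order
```

In hardware, identifiers of finite width would need a wrap-aware comparison; that detail is outside what the text describes and is not modelled here.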
In practical applications, storage order violations may occur for several instructions to be executed, each of which has multiple associated violation instructions. A check request queue (Check Request Queue, abbreviated as CRQ) may therefore be established to hold the requests of these instructions to be executed for determining their first violation instructions. In one example, before determining, from all violation instructions associated with the instruction to be executed, the first violation instruction that is the foremost in the program to be executed, the method further includes:
if the number of violation instructions associated with the instruction to be executed is more than one, generating a check request corresponding to the instruction to be executed, wherein the check request comprises the instruction identifier of the instruction to be executed and the sequence identifiers of all violation instructions associated with the instruction to be executed; and adding the check request corresponding to the instruction to be executed to the check request queue;
determining, from all violation instructions associated with the instruction to be executed, the first violation instruction that is the foremost in the program to be executed includes:
for the instruction to be executed corresponding to the first check request in the check request queue, determining, from all violation instructions associated with that instruction to be executed, the first violation instruction that is the foremost in the program to be executed.
Fig. 4 is a schematic structural diagram of the check request queue provided in an embodiment of the present application. As shown in Fig. 4, for an instruction to be executed that has multiple associated violation instructions, a check request (Check Request) corresponding to the instruction to be executed is generated, which includes the instruction identifier of the instruction to be executed and the sequence identifiers of all violation instructions associated with it. The check request is added to the check request queue, which saves it in a free slot. The check request queue can hold multiple check requests and works in a first-in first-out manner: the first, i.e. oldest, check request in the queue is sent to the oldest-Load pipeline, which determines the first violation instruction that is the foremost in the program to be executed by comparing the sequence identifiers of all violation instructions associated with the instruction to be executed.
In this example, by establishing the check request queue, the oldest violation instruction can be determined in turn for each of several instructions to be executed that have multiple associated violation instructions.
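The first-in first-out behaviour of the check request queue can be sketched as below. This Python model is illustrative: the class and field names are hypothetical, and refusing a request when no free slot remains (so that the requester would have to wait) is an assumption, since the text does not state what happens when the queue is full.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class CheckRequest:
    iop_tag: int                  # instruction identifier of the instruction to be executed
    violation_seq_ids: List[int]  # sequence identifiers of all associated violation instructions


class CheckRequestQueue:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: Deque[CheckRequest] = deque()

    def push(self, req: CheckRequest) -> bool:
        """Store the request in a free slot; return False if none is free (assumed policy)."""
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append(req)
        return True

    def pop_and_resolve(self) -> Optional[Tuple[int, int]]:
        """Take the oldest request and return (iop_tag, sequence id of first violation)."""
        if not self.slots:
            return None
        req = self.slots.popleft()
        return req.iop_tag, min(req.violation_seq_ids)


crq = CheckRequestQueue(capacity=2)
crq.push(CheckRequest(iop_tag=20, violation_seq_ids=[31, 25, 40]))
crq.push(CheckRequest(iop_tag=22, violation_seq_ids=[27, 24]))
first = crq.pop_and_resolve()  # request of iop-tag 20 is served first
```

The `min` call stands in for the comparison performed in the oldest-Load pipeline.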
In one example, performing out-of-order instruction execution on the first violation instruction in the program to be executed and the program after the first violation instruction includes:
ending the current out-of-order instruction execution and acquiring the instruction identifier of the first violation instruction; reading the program counter value corresponding to the first violation instruction from the instruction reorder buffer according to the instruction identifier of the first violation instruction;
according to the program counter value corresponding to the first violation instruction, performing out-of-order instruction execution on the instruction in the program to be executed whose instruction address is that program counter value, and on the program after it.
Specifically, a pipeline flush (flush pipeline) is performed and the current out-of-order instruction execution is ended. The instruction identifier of the first violation instruction is acquired, and the Program Counter (PC) value corresponding to the first violation instruction is read from the instruction reorder buffer according to that instruction identifier. In practical applications, instructions are stored in a memory space with consecutive addresses, and the Program Counter (PC) holds the address of an instruction. When a program is executed, the initial value of the program counter is the address of the first instruction of the program; the first step of executing an instruction is to fetch it from memory according to the address in the program counter, after which the address in the program counter is automatically incremented by 1, or the address of the next instruction in the program is given by a transfer pointer. Therefore, according to the program counter value corresponding to the first violation instruction, the instruction in the program to be executed whose instruction address is that program counter value, and the program after it, can be executed out of order.
In this example, according to the instruction identifier of the first violating instruction, the program counter value corresponding to the first violating instruction is read from the instruction reordering buffer, so that the first violating instruction and the following instructions can be re-executed starting with the first violating instruction.
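The restart step can be illustrated with a small model of the instruction reorder buffer. The Python sketch below is not the hardware implementation: the dictionary `rob`, the function names and the 4-byte instruction size are hypothetical, and it assumes that the flush discards only the first violation instruction and younger instructions, which is how the text describes preserving correctly executed work.

```python
from typing import Dict, List


def refetch_pc(rob_pc_by_tag: Dict[int, int], first_violation_tag: int) -> int:
    """Read the program counter value recorded in the ROB for the first violation instruction."""
    return rob_pc_by_tag[first_violation_tag]


def surviving_tags(in_flight_tags: List[int], first_violation_tag: int) -> List[int]:
    """Instructions older than the first violation instruction are not re-executed (assumption)."""
    return [tag for tag in in_flight_tags if tag < first_violation_tag]


# Hypothetical ROB contents: iop-tag -> instruction address (4-byte instructions assumed).
rob = {10: 0x1000, 11: 0x1004, 12: 0x1008, 13: 0x100C}
store_tag, first_violation_tag = 10, 12  # the store is tag 10, the violating load is tag 12

new_pc = refetch_pc(rob, first_violation_tag)            # fetch restarts at the violating load
kept = surviving_tags(sorted(rob), first_violation_tag)  # tags 10 and 11 keep their results
```

Under the prior-art scheme, fetch would instead restart at the instruction following the store (tag 11 in this model), so the result of tag 11 would be discarded even though it was correct.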
After out-of-order execution has been restarted from the first violation instruction corresponding to a certain instruction to be executed, some of the instructions to be executed that correspond to check requests stored in the check request queue may fall within the range being re-executed. To avoid wasting resources, in one example, after ending the current out-of-order instruction execution, the method further includes:
determining a second check request in the check request queue, wherein the instruction identifier in the second check request is greater than the instruction identifier of the first violation instruction;
clearing the second check request from the check request queue.
Specifically, after out-of-order execution is restarted from the first violation instruction corresponding to a certain instruction to be executed and the current out-of-order instruction execution has ended, the instruction identifier of the first violation instruction is compared with the instruction identifiers of the instructions to be executed stored in the check request queue, and every check request whose instruction identifier is greater than that of the first violation instruction is cleared. The instruction to be executed corresponding to such a check request lies within the range being re-executed, so its current storage order violation is eliminated by the re-execution, and power consumption waste is effectively avoided.
In this example, after the current out-of-order instruction execution ends, check requests in the check request queue whose instruction identifier is greater than that of the first violation instruction are cleared. This avoids needlessly determining the oldest first violation instruction for the instructions to be executed corresponding to those check requests, which reduces power consumption waste and improves processor performance.
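The clearing step is a filter over the pending check requests. A minimal Python sketch follows, in which each check request is represented as a hypothetical `(iop_tag, violation_seq_ids)` tuple and instruction identifiers are assumed not to wrap around.

```python
from typing import List, Tuple

CheckRequest = Tuple[int, List[int]]  # (iop-tag of instruction to be executed, violation seq ids)


def clear_covered_requests(crq: List[CheckRequest],
                           first_violation_tag: int) -> List[CheckRequest]:
    """Drop requests whose instruction lies in the range that is being re-executed."""
    return [req for req in crq if req[0] <= first_violation_tag]


pending = [(30, [33, 35]), (18, [21]), (26, [29])]
remaining = clear_covered_requests(pending, first_violation_tag=25)
```

Requests from instructions with identifiers 30 and 26 are dropped because those instructions will be re-executed anyway; the request from identifier 18 is older than the restart point and stays in the queue.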
In practical applications, the information of the instruction to be executed can be compared with the information in the load instruction reorder queue according to the violation conditions to generate a violation signal, and all violation instructions associated with the instruction to be executed are determined from the violation signal. In one example, determining, from all violation instructions associated with the instruction to be executed, the first violation instruction that is the foremost in the program to be executed includes:
generating a violation signal corresponding to the instruction to be executed according to all violation instructions associated with the instruction to be executed, wherein the violation signal comprises a plurality of bits that correspond one-to-one to the entries of the load instruction reorder queue; a bit having a first value indicates that the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, and a bit having a second value indicates that the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed;
inputting the violation signal into a checking unit to obtain the sequence identifier output by the checking unit, and taking the load instruction corresponding to that sequence identifier in the load instruction reorder queue as the first violation instruction.
Specifically, a violation signal may be generated based on the first violation condition, the second violation condition, or the third violation condition, where the violation signal includes a plurality of bits, each bit of the violation signal corresponds to each item of the load instruction reorder queue one by one, a value of each bit may be a first value or a second value, and the bit is the first value, and represents that a load instruction corresponding to the bit is a violation instruction associated with an instruction to be executed, and the load instruction and the instruction to be executed together form a storage sequence violation; the bit is a second value, the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed, and the load instruction and the instruction to be executed do not form a storage sequence violation. For example, the first value may be 1, and the second value may be 0, that is, when a certain bit of the violation signal is 1, the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed; when a certain bit of the violation signal is 0, the load instruction corresponding to the bit is not the violation instruction associated with the instruction to be executed.
When exactly one bit of the violation signal has the first value, the load instruction corresponding to that bit is taken as the first violation instruction. When multiple bits of the violation signal have the first value, the violation signal is input into the checking unit, which compares the order of the multiple violation instructions and outputs a sequence identifier; the load instruction corresponding to that sequence identifier in the load instruction reorder queue is taken as the first violation instruction. In practical application, the check request may include the violation signal corresponding to an instruction to be executed, and the violation signal of the check request that is first in order in the check request queue is input into the checking unit to determine the first violation instruction corresponding to that instruction to be executed.
FIG. 5 is a flow chart of determining a first violation instruction according to an embodiment of the present application. As shown in FIG. 5, in one example, a Store-Hit-Load storage order violation is taken as an example. After the register file read and address generation have been performed, where address generation includes source operand selection and address generation, it is determined whether the information of each load instruction in the load instruction reorder queue meets the first violation condition, and the determination result is represented by multi-hot encoding. Assume that the load instruction reorder queue has N entries. The sequence identifier of the instruction to be executed is compared with the sequence identifiers in the load instruction reorder queue to generate an N-bit older signal, each bit of which corresponds one-to-one to an entry of the load instruction reorder queue: a bit with value 1 indicates that the sequence identifier of the corresponding load instruction is larger than the sequence identifier of the instruction to be executed, and a bit with value 0 indicates that it is not larger. The storage address of the instruction to be executed is compared with the load addresses in the load instruction reorder queue to generate an N-bit match signal, each bit of which corresponds one-to-one to an entry of the load instruction reorder queue: a bit with value 1 indicates that the load address of the corresponding load instruction is the same as the storage address of the instruction to be executed, and a bit with value 0 indicates that it is different.
The thread information of the instruction to be executed is compared with the thread information in the load instruction reorder queue to generate an N-bit thread signal, each bit of which corresponds one-to-one to an entry of the load instruction reorder queue: a bit with value 1 indicates that the thread information of the corresponding load instruction is the same as the thread information of the instruction to be executed, and a bit with value 0 indicates that it is different. According to the execution states in the load instruction reorder queue, an N-bit fin signal is generated, each bit of which corresponds one-to-one to an entry of the load instruction reorder queue: a bit with value 1 indicates that the corresponding load instruction has been executed, and a bit with value 0 indicates that it has not been executed. A bitwise AND operation is performed on the older signal, the match signal, the thread signal and the fin signal to obtain the violation signal. In this case the first value is 1 and the second value is 0; that is, when a certain bit of the violation signal is 1, the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, and when a certain bit of the violation signal is 0, the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed.
Inputting the violation signal into a signal checking circuit, which may be a multi-Hot detection circuit, for example; judging whether the number of bits with the value of 1 in the violation signal is 1 or not based on the signal checking circuit, if the number of bits with the value of 1 in the violation signal is 1, taking the violation signal as an index, and determining a first violation instruction in a loading instruction reordering queue; if the number of bits with the value of 1 in the violation signal is a plurality of, determining a first violation instruction by the checking unit. For example, a corresponding check request may be generated and added to a check request queue, a violation signal corresponding to the check request with the first order in the check request queue is input to the check unit, and a first violation instruction corresponding to the instruction to be executed is determined according to the order identifier of the violation instruction. Wherein the structure of the check request queue is shown in fig. 4.
In this example, by generating a violation signal characterizing a violation condition of a load instruction in the load instruction reorder queue, it is possible to accurately determine a violation instruction associated with an instruction to be executed, and by the inspection unit, determine a first violation instruction among a plurality of violation instructions.
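As an illustration only, the generation of the violation signal for the Store-Hit-Load case can be sketched in Python with integers used as bit vectors. The names `LrqEntry`, `bit_vector`, `store_hit_load_signal` and `is_multi_hot` are hypothetical; the sketch assumes that bit i of every signal corresponds to entry i of the load instruction reorder queue and that addresses are compared in full.

```python
from dataclasses import dataclass


@dataclass
class LrqEntry:
    seq_id: int      # sequence identifier of the load instruction
    load_addr: int   # load address
    thread: int      # thread information
    finished: bool   # execution state: True once the load has executed


def bit_vector(entries, predicate):
    """Build an N-bit signal whose bit i is 1 when predicate(entries[i]) holds."""
    vec = 0
    for i, entry in enumerate(entries):
        if predicate(entry):
            vec |= 1 << i
    return vec


def store_hit_load_signal(entries, store_seq_id, store_addr, store_thread):
    """Bitwise AND of the older, match, thread and fin signals."""
    older = bit_vector(entries, lambda e: e.seq_id > store_seq_id)
    match = bit_vector(entries, lambda e: e.load_addr == store_addr)
    thread = bit_vector(entries, lambda e: e.thread == store_thread)
    fin = bit_vector(entries, lambda e: e.finished)
    return older & match & thread & fin


def is_multi_hot(signal):
    """True when more than one bit of the signal is 1."""
    return signal & (signal - 1) != 0


lrq = [
    LrqEntry(3, 0x40, 0, True),   # older than the store: no violation
    LrqEntry(8, 0x40, 0, True),   # violation
    LrqEntry(9, 0x80, 0, True),   # different address: no violation
    LrqEntry(6, 0x40, 0, True),   # violation
]
signal = store_hit_load_signal(lrq, store_seq_id=5, store_addr=0x40, store_thread=0)
print(bin(signal), is_multi_hot(signal))  # prints 0b1010 True
```

Because two bits are set, this signal would be passed on to the checking unit rather than used directly as an index.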
The structure of the checking unit can be various, in one example, the checking unit comprises a selector corresponding to each item in the load instruction reordering queue and a binary tree comparator; inputting the violation signal into the checking unit to obtain the sequence identifier output by the checking unit, including:
each bit of the violation signal corresponding to the instruction to be executed is used as an enabling signal and is input to an enabling end of the selector corresponding to the corresponding item; the selector corresponding to each item receives a preset value and a sequence identifier of the loading instruction corresponding to the item, and is used for selecting and outputting the sequence identifier when the enabling signal is a first value and selecting and outputting the preset value when the enabling signal is a second value; the preset value is larger than the sequence identification of all the loading instructions in the loading instruction reordering queue;
inputting the result output by the selector corresponding to each item into a binary tree comparator to obtain the sequence identifier output by the binary tree comparator; the binary tree comparator is used for comparing the results output by all the selectors and outputting the minimum result.
FIG. 6 is a schematic structural diagram of a selector according to an embodiment of the present application. As shown in FIG. 6, the enable end of the 2-to-1 selector corresponding to each entry in the load instruction reorder queue receives the corresponding bit of the violation signal of the instruction to be executed. The selector corresponding to each entry receives a preset value (Max_value) and the sequence identifier of the load instruction corresponding to the entry; it selects and outputs the sequence identifier when the enable signal is the first value, and selects and outputs the preset value when the enable signal is the second value. The preset value is larger than the sequence identifiers of all load instructions in the load instruction reorder queue, so that it is filtered out by the subsequent comparisons in the binary tree comparator. FIG. 7 is a schematic structural diagram of a checking unit according to an embodiment of the present application. As shown in FIG. 7, the result output by each selector is input into the binary tree comparator. The binary tree comparator is composed of a plurality of 2-input comparison selectors, each of which outputs the smaller of its two inputs, so that the minimum sequence identifier is finally obtained. If the load instruction reorder queue has N entries and N is an integer power of 2, the binary tree comparator is a regular binary tree. For example, when N is 64, 32 2-input comparison selectors are needed for the initial stage of comparison, each of which compares the sequence identifiers corresponding to a pair of adjacent bits of the violation signal; the first subsequent stage includes 16 2-input comparison selectors, the second includes 8, the third includes 4, the fourth includes 2, and the fifth includes 1, which finally outputs the minimum sequence identifier.
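The behavior of the checking unit can be modeled in software as follows. This is a minimal sketch rather than the circuit itself: `MAX_VALUE`, `select`, `tree_min` and `check_unit` are hypothetical names, the width of the preset value is an assumption, and padding with the preset value for a queue size that is not a power of 2 is an addition of the sketch.

```python
MAX_VALUE = 1 << 16  # assumed preset value, larger than any sequence identifier


def select(enable_bit, seq_id, preset=MAX_VALUE):
    """Per-entry selector: output the sequence identifier when enabled, else the preset value."""
    return seq_id if enable_bit else preset


def tree_min(values, preset=MAX_VALUE):
    """Binary tree of 2-input comparison selectors; each stage halves the number of candidates."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(preset)  # pad so that every comparator has two inputs
        level = [min(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def check_unit(violation_signal, seq_ids):
    """Return the smallest sequence identifier among the entries flagged in the violation signal."""
    selected = [select((violation_signal >> i) & 1, s) for i, s in enumerate(seq_ids)]
    return tree_min(selected)


print(check_unit(0b1010, [3, 8, 9, 6]))  # prints 6
```

In the usage line, entries 1 and 3 are flagged, so the selectors output `[MAX_VALUE, 8, MAX_VALUE, 6]` and the tree reduces this to 6.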
In the storage order violation processing method provided by this embodiment, in the process of performing instruction out-of-order execution on a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reorder queue according to the load addresses, thread information, execution states, snooped values and sequence identifiers of the different issued load instructions stored in the load instruction reorder queue; the first violation instruction, which is positioned at the forefront in the program to be executed, is determined from all violation instructions associated with the instruction to be executed; and instruction out-of-order execution is performed on the first violation instruction in the program to be executed and on the program after the first violation instruction. When a storage order violation causes the execution results of part of the program to be wrong, the violation instruction at which the execution results first become wrong is accurately determined and re-execution starts from that violation instruction, so that correctly executed parts of the program are not wasted. The storage order violation is thereby handled while power consumption waste is reduced and the performance of the processor is improved.
Example two
Fig. 8 is a schematic structural diagram of a storage order violation processing device according to an embodiment of the present application. As shown in fig. 8, the storage order violation processing device 800 provided in this embodiment may include:
The first determining module 81 is configured to, for each current instruction to be executed, determine, according to the load instruction reorder queue, all the violation instructions associated with the instruction to be executed in a process of performing instruction out-of-order execution on the program to be executed;
a second determining module 82, configured to determine, from all the violation instructions associated with the instruction to be executed, a first violation instruction that is positioned at the forefront in the program to be executed; wherein the first violation instruction is a load instruction; the load instruction reorder queue comprises a plurality of entries, wherein different entries correspond to different issued load instructions, and each entry stores the load address, the thread information, the execution state, the snooped value and the sequence identifier of the load instruction corresponding to the entry;
the execution module 83 is configured to execute the first violation instruction in the program to be executed and the program following the first violation instruction out of order.
In practical application, the storage sequence violation processing device may be implemented by a computer program, for example, application software or the like; alternatively, the computer program may be implemented as a medium storing a related computer program, for example, a usb disk, a cloud disk, or the like; still alternatively, it may be implemented by a physical device, e.g., a chip, a server, etc., integrated with or installed with the relevant computer program.
The load instruction reorder queue (LRQ) is used to maintain the state of load instructions that have entered the out-of-order window and to monitor potential semantic hazards. Each load instruction is allocated an entry of the load instruction reorder queue during the dispatch stage or the issue stage, and different entries of the load instruction reorder queue store the sequence identifier, load address, thread information, execution state and snooped value of different load instructions. So that the same set of LRQ resources can be shared flexibly in a processor core that implements simultaneous hardware multithreading, entries of the load instruction reorder queue may be allocated to different hardware threads on demand; consequently, the program order of the load instructions is not reflected in the physical order of the entries that are actually allocated.
The sequence identifier of a load instruction is either a load identifier or an instruction identifier. The load identifier is an identifier allocated by the processor core before the load instruction is dispatched to the issue queue, and it characterizes the order of the load instruction among all load instructions in the program. The instruction identifier is an identifier allocated by the processor core before the load instruction is dispatched to the issue queue, and it characterizes the order of the load instruction among all instructions in the program.
The snooped value characterizes whether the load instruction has been snooped. At any time, an invalidate request from the external storage system may be sent to the load instruction reorder queue. The invalidate request carries load address information, which is compared with the load addresses of all executed load instructions in the load instruction reorder queue; in every entry whose load address is the same, the snooped value is set to 1, which indicates that the load instruction has been snooped by another potential write request and that the data corresponding to the load address may not be up to date.
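A minimal Python sketch of this snoop handling is given below for illustration. The names `LrqEntry` and `apply_invalidate` are hypothetical, and the sketch assumes that the full load address is compared; an actual design might compare at cache-line granularity, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class LrqEntry:
    load_addr: int         # load address of the load instruction
    finished: bool         # execution state: True once the load has executed
    snooped: bool = False  # snooped value, set when an invalidate request hits the entry


def apply_invalidate(entries, invalidate_addr):
    """Mark every executed load whose address matches the invalidate request."""
    for entry in entries:
        if entry.finished and entry.load_addr == invalidate_addr:
            entry.snooped = True


lrq = [LrqEntry(0x40, True), LrqEntry(0x40, False), LrqEntry(0x80, True)]
apply_invalidate(lrq, 0x40)
print([e.snooped for e in lrq])  # prints [True, False, False]
```

The second entry stays unmarked because that load has not yet executed and therefore has not read any value that could be stale.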
In the process of performing instruction out-of-order execution on a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined according to the information of the load instructions stored in the load instruction reorder queue, and the first violation instruction, which is positioned at the forefront in the program to be executed, is determined from all violation instructions associated with the instruction to be executed. The first violation instruction can be determined by comparing the sequence identifiers of all the violation instructions: the larger the sequence identifier, the later the corresponding violation instruction is in the program. Instruction out-of-order execution is then performed on the first violation instruction in the program to be executed and on the program after the first violation instruction.
In this example, in the process of performing instruction out-of-order execution on a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reorder queue according to the load addresses, thread information, execution states, snooped values and sequence identifiers of the different issued load instructions stored in the load instruction reorder queue; the first violation instruction, which is positioned at the forefront in the program to be executed, is determined from all violation instructions associated with the instruction to be executed; and instruction out-of-order execution is performed on the first violation instruction in the program to be executed and on the program after the first violation instruction. When a storage order violation causes the execution results of part of the program to be wrong, the violation instruction at which the execution results first become wrong is accurately determined and re-execution starts from that violation instruction, so that correctly executed parts of the program are not wasted. The storage order violation is thereby handled while power consumption waste is reduced and the performance of the processor is improved.
Storage order violations may include various situations, such as Store-Hit-Load, Load-Hit-Load and Sync-Hit-Load. For different storage order violations, all violation instructions associated with the instruction to be executed need to be determined by different methods. In one example, the first determining module may be specifically configured to:
if the instruction to be executed is a store instruction, take the instructions meeting the first violation condition in the load instruction reorder queue as the violation instructions associated with the instruction to be executed; the first violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the storage address of the instruction to be executed, the execution state represents that the execution is completed, and the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed;
if the instruction to be executed is a load instruction, take the instructions meeting the second violation condition in the load instruction reorder queue as the violation instructions associated with the instruction to be executed; the second violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the load address of the instruction to be executed, the execution state represents that the execution is completed, the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1;
if the instruction to be executed is a synchronization instruction, take the instructions meeting the third violation condition in the load instruction reorder queue as the violation instructions associated with the instruction to be executed; the third violation condition comprises: the thread information is the same as the thread information of the instruction to be executed, the execution state represents that the execution is completed, the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1.
Specifically, Store-Hit-Load (SHL) involves a store instruction and a later load instruction in a program that access the same address. Because instructions are executed out of order, the load instruction may execute before the store instruction has written its data and therefore obtain the old value from memory, and other instructions that depend on the load instruction may already have executed with that old value, which causes a storage order violation. If the instruction to be executed is a store instruction, it is necessary to detect whether Store-Hit-Load has occurred. This can be detected by judging whether a load instruction in the load instruction reorder queue meets the first violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the storage address of the instruction to be executed, the execution state indicates that the execution is completed, and the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed. According to the load instruction reorder queue and the sequence identifier, storage address and thread information of the instruction to be executed, the instructions meeting the first violation condition in the load instruction reorder queue are taken as the violation instructions associated with the instruction to be executed.
Load-Hit-Load involves a first load instruction (Load 1) and a second load instruction (Load 2) that access the same address one after the other in the program. Because instructions are executed out of order, the second load instruction may execute before the first load instruction; after it has obtained the value in memory, another program in the storage system updates the value at that address, and the first load instruction then executes and obtains the latest value in memory. This violates the sequential consistency of the program and causes a storage order violation. In this case the second load instruction has been snooped by another potential write request, which indicates that the data corresponding to the address may not be up to date. If the instruction to be executed is a load instruction, it is necessary to detect whether Load-Hit-Load has occurred. This can be detected by judging whether a load instruction in the load instruction reorder queue meets the second violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the load address is the same as the load address of the instruction to be executed, the execution state indicates that the execution is completed, the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1. According to the load instruction reorder queue and the sequence identifier, load address and thread information of the instruction to be executed, the instructions meeting the second violation condition in the load instruction reorder queue are taken as the violation instructions associated with the instruction to be executed.
Sync-Hit-Load involves a synchronization (sync) instruction and a load instruction in a program, where the synchronization instruction must be ordered before subsequent load operations; that is, a load instruction accessing an address may obtain the value of that address in memory only after the synchronization instruction has performed global memory ordering. Because of out-of-order execution, the load instruction may execute first and obtain the value in memory, after which another program in the storage system updates the value at that address; the synchronization instruction then executes, enters the storage subsystem to complete the ordering operation, and an acknowledgement sync_ack indicating that ordering is complete is returned. After receiving the sync_ack, the processor core needs to query all load instructions that have been executed but not committed. If a load instruction has been invalidated by an external snoop request (the load instruction is marked as invalidly executed or marked as snooped), the load instruction was executed too early and obtained an old value; if a synchronization instruction that orders memory instructions precedes that load instruction, a storage order violation has occurred. If the instruction to be executed is a synchronization instruction, it is necessary to detect whether Sync-Hit-Load has occurred. This can be detected by judging whether a load instruction in the load instruction reorder queue meets the third violation condition, which comprises: the thread information is the same as the thread information of the instruction to be executed, the execution state indicates that the execution is completed, the sequence identifier is larger than the sequence identifier corresponding to the instruction to be executed, and the snooped value is 1.
According to the load instruction reorder queue and the sequence identifier and thread information of the instruction to be executed, the instructions meeting the third violation condition in the load instruction reorder queue are taken as the violation instructions associated with the instruction to be executed.
In the process of comparing the sequence identifier of the instruction to be executed with the sequence identifier of the loading instruction, the sequence identifier of the instruction to be executed is always consistent with the type of the sequence identifier of the loading instruction, namely, when the sequence identifier of the instruction to be executed is the loading identifier corresponding to the instruction to be executed, the sequence identifier of the loading instruction is the loading identifier of the loading instruction; when the sequence identifier of the instruction to be executed is the instruction identifier corresponding to the instruction to be executed, the sequence identifier of the loading instruction is the instruction identifier of the loading instruction. The loading identifier corresponding to the instruction to be executed refers to the loading identifier recorded by the processor core at the moment before the instruction to be executed is dispatched to the transmitting queue; the instruction identifier of the instruction to be executed refers to an instruction identifier allocated by the processor core before the instruction to be executed is dispatched to the issue queue, and is used to characterize the sequence of each instruction in the program.
In this example, for various storage order violations, the violating instructions associated with the instructions to be executed can be accurately acquired.
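The three violation conditions can be summarized as a single per-entry predicate. The following Python sketch is illustrative only; `Kind`, `LrqEntry`, `Pending`, `is_violation` and `violating_entries` are hypothetical names, and the sketch assumes that the sequence identifiers being compared are of the same type and that addresses are compared in full.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    STORE = "store"
    LOAD = "load"
    SYNC = "sync"


@dataclass
class LrqEntry:
    seq_id: int
    load_addr: int
    thread: int
    finished: bool
    snooped: bool


@dataclass
class Pending:
    kind: Kind
    seq_id: int
    thread: int
    addr: Optional[int] = None  # storage or load address; unused for a synchronization instruction


def is_violation(entry, instr):
    """Apply the first, second or third violation condition depending on the instruction kind."""
    common = (entry.thread == instr.thread
              and entry.finished
              and entry.seq_id > instr.seq_id)
    if not common:
        return False
    if instr.kind is Kind.STORE:   # Store-Hit-Load: same address
        return entry.load_addr == instr.addr
    if instr.kind is Kind.LOAD:    # Load-Hit-Load: same address and snooped
        return entry.load_addr == instr.addr and entry.snooped
    return entry.snooped           # Sync-Hit-Load: snooped, any address


def violating_entries(entries, instr):
    """Indices of all LRQ entries that form a storage order violation with instr."""
    return [i for i, e in enumerate(entries) if is_violation(e, instr)]


lrq = [
    LrqEntry(8, 0x40, 0, True, False),
    LrqEntry(9, 0x80, 0, True, True),
    LrqEntry(6, 0x40, 0, True, True),
]
print(violating_entries(lrq, Pending(Kind.STORE, 5, 0, 0x40)))  # prints [0, 2]
print(violating_entries(lrq, Pending(Kind.LOAD, 5, 0, 0x40)))   # prints [2]
print(violating_entries(lrq, Pending(Kind.SYNC, 5, 0)))         # prints [1, 2]
```

The three calls use the same queue contents, which shows how the snooped value and the address comparison narrow or widen the set of violation instructions for each instruction type.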
In practical applications, the number of the violating instructions associated with the instructions to be executed may be multiple, and at this time, the oldest violating instruction in all the violating instructions needs to be further judged, and re-execution is performed from the oldest violating instruction. In one example, the second determination module may be specifically configured to:
If the number of the violation instructions associated with the instruction to be executed is one, the violation instructions are used as first violation instructions;
if the number of the violation instructions associated with the instructions to be executed is multiple, taking the violation instruction with the smallest sequence identifier as a first violation instruction according to the sequence identifiers of all the violation instructions associated with the instructions to be executed.
Specifically, when the number of the violating instructions associated with the instruction to be executed is one, the violating instruction can be directly determined as a first violating instruction; when the number of the violating instructions associated with the instructions to be executed is a plurality of, the oldest violating instruction can be determined through the sequence identification, namely, the violating instruction with the minimum sequence identification is used as the first violating instruction.
In this example, the first violation instruction can be accurately determined, by means of the sequence identifiers, from among the multiple violation instructions associated with the instruction to be executed.
In practical application, storage order violations may occur for multiple instructions to be executed, each of which has multiple associated violation instructions. A check request queue may therefore be established to hold the requests of these instructions to be executed for determining a first violation instruction. In one example, the second determining module may be further configured to:
if the number of the violation instructions associated with the to-be-executed instruction is multiple, generating an inspection request corresponding to the to-be-executed instruction, wherein the inspection request corresponding to the to-be-executed instruction comprises an instruction identifier of the to-be-executed instruction and sequence identifiers of all the violation instructions associated with the to-be-executed instruction; adding an inspection request corresponding to an instruction to be executed into an inspection request queue;
Determining a first violation instruction which is positioned at the forefront in the program to be executed from all violation instructions associated with the instruction to be executed, wherein the method comprises the following steps:
and aiming at the to-be-executed instruction corresponding to the first checking request in the checking request queue, determining the first violation instruction which is positioned at the forefront in the to-be-executed program from all violation instructions associated with the to-be-executed instruction.
Specifically, for an instruction to be executed that has a plurality of associated violation instructions, a check request corresponding to that instruction is generated. The check request includes the instruction identifier of the instruction to be executed and the sequence identifiers of all violation instructions associated with it. The check request is added to the check request queue, which stores it in a free entry. The check request queue can hold a plurality of check requests. Using a first-in first-out policy, the first check request in the queue, that is, the oldest one, is sent to the oldest-Load pipeline, and the first violation instruction that is frontmost in the program to be executed is determined from all violation instructions associated with the instruction to be executed by comparing their sequence identifiers.
In this example, by establishing the check request queue, the oldest violation instruction can be determined in turn for each of a plurality of instructions to be executed that are each associated with a plurality of violation instructions.
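The queueing behaviour described above can be illustrated with a minimal software sketch. It is not the hardware of this embodiment: the names `CheckRequest`, `CheckRequestQueue`, `push` and `pop_oldest` are hypothetical, rejecting a request when no free entry exists is an assumption, and the sketch assumes that a smaller sequence identifier means an older instruction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class CheckRequest:
    instr_id: int                 # identifier of the instruction to be executed
    violation_seq_ids: List[int]  # sequence identifiers of its violation instructions


class CheckRequestQueue:
    """Hypothetical FIFO model of the check request queue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Deque[CheckRequest] = deque()

    def push(self, req: CheckRequest) -> bool:
        # Assumption: a request is rejected when no free entry is available.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(req)
        return True

    def pop_oldest(self) -> Optional[Tuple[int, int]]:
        # First in, first out: the request that entered first is served first.
        if not self.entries:
            return None
        req = self.entries.popleft()
        # The smallest sequence identifier marks the frontmost violation instruction.
        return req.instr_id, min(req.violation_seq_ids)
```

In this sketch, `pop_oldest` plays the role of sending the first check request to the oldest-Load pipeline and returns the instruction identifier together with the sequence identifier of its first violation instruction.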
In one example, an execution module may be specifically configured to:
ending the out-of-order execution of the instruction at this time and acquiring the instruction identification of the first violation instruction; reading a program counter value corresponding to the first violation instruction from the instruction reordering buffer according to the instruction identification of the first violation instruction;
and, according to the program counter value corresponding to the first violation instruction, performing out-of-order instruction execution on the instruction in the program to be executed whose instruction address is that program counter value and on the program following it.
Specifically, the pipeline is flushed and the current round of out-of-order instruction execution is ended. The instruction identifier of the first violation instruction is acquired, and the program counter value corresponding to the first violation instruction is read from the instruction reordering buffer according to that identifier. In practical applications, instructions are stored in a memory space with consecutive addresses, and the program counter holds the address of the instruction to be fetched. When a program starts executing, the initial value of the program counter is the address of the first instruction of the program. The first step of executing an instruction is to fetch it from memory at the address held in the program counter; the address in the program counter is then automatically incremented by 1, or the address of the next instruction in the program is supplied by the transfer pointer. Therefore, according to the program counter value corresponding to the first violation instruction, out-of-order instruction execution can be performed on the instruction in the program to be executed whose address is that program counter value and on the program following it.
In this example, the program counter value corresponding to the first violation instruction is read from the instruction reordering buffer according to the instruction identifier of the first violation instruction, so that execution can restart at the first violation instruction and continue with the instructions that follow it.
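The restart step can be sketched as follows. This is only an illustration under stated assumptions: the function name `restart_from_violation` is hypothetical, the instruction reordering buffer is modelled as a plain mapping from instruction identifier to program counter value, and instruction identifiers are assumed to grow in program order, so that instructions older than the first violation instruction are kept and everything from the first violation instruction onward is discarded.

```python
from typing import Dict, List, Tuple


def restart_from_violation(rob_pc: Dict[int, int],
                           in_flight: List[int],
                           first_violation_id: int) -> Tuple[List[int], int]:
    """Return the surviving in-flight instruction ids and the refetch address."""
    # Read the program counter value of the first violation instruction
    # from the (modelled) instruction reordering buffer.
    refetch_pc = rob_pc[first_violation_id]
    # Assumption: ids grow in program order, so only older instructions survive.
    survivors = [i for i in in_flight if i < first_violation_id]
    return survivors, refetch_pc
```

Fetch would then resume at the returned address, so the first violation instruction and the program after it are executed again.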
After out-of-order execution is restarted from the first violation instruction corresponding to a certain instruction to be executed, some of the instructions to be executed whose check requests are stored in the check request queue may fall within the range that is re-executed. To avoid wasting resources, in one example the second determining module may be further configured to:
determining a second check request in the check request queue, wherein the instruction identifier in the second check request is larger than the instruction identifier of the first violation instruction;
clearing the second check request from the check request queue.
Specifically, after out-of-order execution is restarted from the first violation instruction corresponding to a certain instruction to be executed and the current round of out-of-order instruction execution has ended, the instruction identifier of the first violation instruction is compared with the instruction identifiers of the instructions to be executed stored in the check request queue, and every check request whose instruction identifier is larger than that of the first violation instruction is removed. The instructions to be executed corresponding to those check requests fall within the range that is re-executed out of order, so their current storage order violations are eliminated by the re-execution and power consumption is not wasted on checking them.
In this example, after the current round of out-of-order instruction execution has ended, the check requests in the check request queue whose instruction identifier is larger than that of the first violation instruction are removed. This avoids determining the oldest first violation instruction among the associated violation instructions for the instructions to be executed corresponding to those check requests, which reduces wasted power consumption and improves processor performance.
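The clearing rule can be expressed in a few lines. The sketch below is an illustration only: the function name `clear_younger_requests` is hypothetical, and each check request is modelled as a tuple of an instruction identifier and a list of sequence identifiers.

```python
from typing import List, Tuple

# (instruction identifier, sequence identifiers of its violation instructions)
Request = Tuple[int, List[int]]


def clear_younger_requests(queue: List[Request],
                           first_violation_id: int) -> List[Request]:
    """Drop every check request whose instruction identifier is larger than
    the instruction identifier of the first violation instruction."""
    return [req for req in queue if req[0] <= first_violation_id]
```

The relative order of the remaining requests is preserved, so the first-in first-out handling of the queue is unaffected.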
In practical applications, the information of the instruction to be executed can be compared, according to the violation condition, with the information in the load instruction reordering queue to generate a violation signal, and all violation instructions associated with the instruction to be executed are determined based on that signal. In one example, the second determining module may be further specifically configured to:
generating a violation signal corresponding to the instruction to be executed according to all the violation instructions associated with the instruction to be executed, wherein the violation signal comprises a plurality of bits, each bit of the violation signal corresponds to each item of a load instruction reordering queue one by one, the bit is a first value to indicate that the load instruction corresponding to the bit is the violation instruction associated with the instruction to be executed, and the bit is a second value to indicate that the load instruction corresponding to the bit is not the violation instruction associated with the instruction to be executed;
And inputting the violation signals into the checking unit to obtain the sequence identification output by the checking unit, and using the load instruction corresponding to the sequence identification in the load instruction reordering queue as a first violation instruction.
Specifically, the violation signal may be generated based on the first violation condition, the second violation condition, or the third violation condition. The violation signal comprises a plurality of bits that correspond one-to-one to the items of the load instruction reordering queue, and each bit takes either a first value or a second value. A bit with the first value indicates that the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, that is, the load instruction and the instruction to be executed together form a storage order violation. A bit with the second value indicates that the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed, that is, the two do not form a storage order violation. For example, the first value may be 1 and the second value may be 0: when a bit of the violation signal is 1, the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed; when a bit of the violation signal is 0, it is not.
When exactly one bit of the violation signal has the first value, the load instruction corresponding to that bit is taken as the first violation instruction. When a plurality of bits of the violation signal have the first value, the violation signal is input into the checking unit, which compares the order of the violation instructions and outputs a sequence identifier; the load instruction corresponding to that sequence identifier in the load instruction reordering queue is taken as the first violation instruction. In practical applications, the check request may include the violation signal corresponding to the instruction to be executed, and the violation signal of the first check request in the check request queue is input into the checking unit to determine the first violation instruction corresponding to that instruction to be executed.
Taking a Store-Hit-Load storage order violation as an example, and based on the first violation condition, assume that the load instruction reordering queue has N items. The sequence identifier of the instruction to be executed is compared with the sequence identifiers in the load instruction reordering queue to generate an order signal of N bits, whose bits correspond one-to-one to the items of the queue: a bit with value 1 means the sequence identifier of the corresponding load instruction is larger than that of the instruction to be executed, and a bit with value 0 means it is not larger. The storage address of the instruction to be executed is compared with the load addresses in the load instruction reordering queue to generate a match signal of N bits, whose bits correspond one-to-one to the items of the queue: a bit with value 1 means the load address of the corresponding load instruction is the same as the storage address of the instruction to be executed, and a bit with value 0 means it is different. The thread information of the instruction to be executed is compared with the thread information in the load instruction reordering queue to generate a thread signal of N bits, whose bits correspond one-to-one to the items of the queue: a bit with value 1 means the thread information of the corresponding load instruction is the same as that of the instruction to be executed, and a bit with value 0 means it is different. According to the execution states in the load instruction reordering queue, a fin signal of N bits is generated, whose bits correspond one-to-one to the items of the queue: a bit with value 1 means the corresponding load instruction has completed execution, and a bit with value 0 means it has not.
A bitwise AND operation is performed on the order signal, the match signal, the thread signal and the fin signal to obtain the violation signal. Here the first value is 1 and the second value is 0: when a bit of the violation signal is 1, the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed; when a bit of the violation signal is 0, it is not. The violation signal is input into a multi-Hot detection circuit, which determines whether exactly one bit of the violation signal has the value 1. If exactly one bit is 1, the violation signal is used as an index to locate the first violation instruction in the load instruction reordering queue; if a plurality of bits are 1, the first violation instruction is determined by the checking unit.
In this example, by generating a violation signal characterizing a violation condition of a load instruction in the load instruction reorder queue, it is possible to accurately determine a violation instruction associated with an instruction to be executed, and by the inspection unit, determine a first violation instruction among a plurality of violation instructions.
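The Store-Hit-Load example above can be sketched in software as four N-bit masks combined with a bitwise AND, followed by a multi-hot test. The names `LrqEntry`, `bitmask`, `store_hit_load_signal` and `is_multi_hot` are hypothetical, and the field layout of a queue entry is an assumption made for illustration; bit i of each mask corresponds to item i of the load instruction reordering queue.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LrqEntry:
    load_addr: int   # load address of the load instruction
    thread: int      # thread information
    finished: bool   # execution state: True once execution has completed
    seq_id: int      # sequence identifier (assumed: larger means younger)


def bitmask(flags: Iterable[bool]) -> int:
    """Pack booleans into an integer, bit i for item i."""
    return sum(1 << i for i, flag in enumerate(flags) if flag)


def store_hit_load_signal(lrq: List[LrqEntry], store_addr: int,
                          store_thread: int, store_seq: int) -> int:
    order = bitmask(e.seq_id > store_seq for e in lrq)       # younger than the store
    match = bitmask(e.load_addr == store_addr for e in lrq)  # same address
    thread = bitmask(e.thread == store_thread for e in lrq)  # same thread
    fin = bitmask(e.finished for e in lrq)                   # already executed
    return order & match & thread & fin                      # violation signal


def is_multi_hot(signal: int) -> bool:
    """True when more than one bit of the signal is 1."""
    return (signal & (signal - 1)) != 0
```

When `is_multi_hot` returns False and the signal is non-zero, the single set bit directly indexes the first violation instruction; otherwise the signal would be handed to the checking unit.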
The checking unit can be structured in various ways. In one example, the checking unit comprises a binary tree comparator and a selector corresponding to each item in the load instruction reordering queue; the second determining module may be further configured to:
each bit of the violation signal corresponding to the instruction to be executed is used as an enabling signal and is input to an enabling end of the selector corresponding to the corresponding item; the selector corresponding to each item receives a preset value and a sequence identifier of the loading instruction corresponding to the item, and is used for selecting and outputting the sequence identifier when the enabling signal is a first value and selecting and outputting the preset value when the enabling signal is a second value; the preset value is larger than the sequence identification of all the loading instructions in the loading instruction reordering queue;
inputting the result output by the selector corresponding to each item into a binary tree comparator to obtain the sequence identifier output by the binary tree comparator; the binary tree comparator is used for comparing the results output by all the selectors and outputting the minimum result.
Specifically, the enable end of the selector corresponding to each item in the load instruction reordering queue receives the corresponding bit of the violation signal of the instruction to be executed. Each selector also receives a preset value and the sequence identifier of the load instruction corresponding to its item; it outputs the sequence identifier when the enable signal is the first value and outputs the preset value when the enable signal is the second value. The preset value is larger than the sequence identifiers of all load instructions in the load instruction reordering queue, so that it is filtered out in the subsequent comparison by the binary tree comparator. The result output by each selector is input into the binary tree comparator, which consists of a plurality of 2-input comparison selectors, each of which outputs the smaller of its two inputs, so that the minimum sequence identifier is finally obtained.
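The selector and binary tree comparator can be modelled as below. This is a behavioural sketch rather than the circuit itself: the function names `select`, `tree_min` and `oldest_violation_seq` are hypothetical, the first value is taken to be 1, and the queue is assumed to hold at least one item.

```python
from typing import List


def select(enable: int, seq_id: int, preset: int) -> int:
    """Per-item selector: pass the sequence identifier when enabled, else the preset."""
    return seq_id if enable == 1 else preset


def tree_min(values: List[int]) -> int:
    """Binary tree of 2-input compare-select stages, each keeping the smaller input."""
    level = list(values)
    while len(level) > 1:
        nxt = [min(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired input passes through to the next stage
        level = nxt
    return level[0]


def oldest_violation_seq(signal: int, seq_ids: List[int], preset: int) -> int:
    """Return the smallest sequence identifier among the flagged items."""
    outputs = [select((signal >> i) & 1, s, preset) for i, s in enumerate(seq_ids)]
    return tree_min(outputs)
```

Because the preset value exceeds every real sequence identifier, items that are not violation instructions can never win a comparison; if no bit is set, the preset value itself is returned.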
In the storage order violation processing device provided by this embodiment, in the process of performing out-of-order instruction execution on a program to be executed, for each current instruction to be executed, all violation instructions associated with the instruction to be executed are determined in the load instruction reordering queue according to the load addresses, thread information, execution states, snoped values and sequence identifiers of the different issued load instructions stored in that queue; the first violation instruction that is frontmost in the program to be executed is determined from all violation instructions associated with the instruction to be executed; and out-of-order instruction execution is performed on the first violation instruction in the program to be executed and the program following it. When a storage order violation causes the execution results of part of the program to be wrong, the violation instruction at which the results first become wrong is accurately determined and re-execution starts from that instruction. This avoids discarding parts of the program that were already executed correctly, so that wasted power consumption is reduced and processor performance is improved while the storage order violation is handled.
Example III
Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the disclosure. As shown in Fig. 9, the electronic device includes:
a processor 291 and a memory 292; a communication interface (Communication Interface) 293 and a bus 294 may also be included. The processor 291, the memory 292, and the communication interface 293 may communicate with each other via the bus 294. The communication interface 293 may be used for information transfer. The processor 291 may call logic instructions in the memory 292 to perform the methods of the above-described embodiments.
Further, the logic instructions in memory 292 described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory 292 is a computer-readable storage medium that may be used to store a software program, a computer-executable program, and program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 291 executes functional applications and data processing by running software programs, instructions and modules stored in the memory 292, i.e., implements the methods of the method embodiments described above.
The memory 292 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of the terminal device, etc. Further, the memory 292 may include high-speed random access memory, and may also include non-volatile memory.
The disclosed embodiments provide a non-transitory computer readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement the method of the previous embodiments.
Example IV
The disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method provided by any of the embodiments of the disclosure described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A storage order violation processing method, comprising:
in the process of performing out-of-order instruction execution on a program to be executed, for each current instruction to be executed, determining all violation instructions associated with the instruction to be executed according to a load instruction reordering queue;
determining a first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed; wherein the first violation instruction is a load instruction; the load instruction reordering queue comprises a plurality of items, different items correspond to different issued load instructions, and each item stores the load address, the thread information, the execution state, the snoped value and the sequence identifier of the load instruction corresponding to the item;
and performing out-of-order instruction execution on the first violation instruction in the program to be executed and the program following it.
2. The method of claim 1, wherein determining all violation instructions associated with the instruction to be executed according to the load instruction reordering queue comprises:
if the instruction to be executed is a store instruction, taking each instruction in the load instruction reordering queue that meets a first violation condition as a violation instruction associated with the instruction to be executed; the first violation condition comprises: thread information that is the same as the thread information of the instruction to be executed, a load address that is the same as the storage address of the instruction to be executed, an execution state representing that execution is completed, and a sequence identifier larger than the sequence identifier of the instruction to be executed;
if the instruction to be executed is a load instruction, taking each instruction in the load instruction reordering queue that meets a second violation condition as a violation instruction associated with the instruction to be executed; the second violation condition comprises: thread information that is the same as the thread information of the instruction to be executed, a load address that is the same as the load address of the instruction to be executed, an execution state representing that execution is completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snoped value of 1;
if the instruction to be executed is a synchronization instruction, taking each instruction in the load instruction reordering queue that meets a third violation condition as a violation instruction associated with the instruction to be executed; the third violation condition comprises: thread information that is the same as the thread information of the instruction to be executed, the execution state of each violation instruction associated with the instruction to be executed representing that execution is completed, a sequence identifier larger than the sequence identifier of the instruction to be executed, and a snoped value of 1.
3. The method of claim 1, wherein determining the first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed comprises:
if one violation instruction is associated with the instruction to be executed, taking that violation instruction as the first violation instruction;
and if a plurality of violation instructions are associated with the instruction to be executed, taking the violation instruction with the smallest sequence identifier as the first violation instruction according to the sequence identifiers of all violation instructions associated with the instruction to be executed.
4. The method of claim 1, wherein before determining the first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed, the method further comprises:
if a plurality of violation instructions are associated with the instruction to be executed, generating a check request corresponding to the instruction to be executed, wherein the check request comprises the instruction identifier of the instruction to be executed and the sequence identifiers of all violation instructions associated with the instruction to be executed; and adding the check request to a check request queue;
determining the first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed comprises:
for the instruction to be executed corresponding to the first check request in the check request queue, determining the first violation instruction that is frontmost in the program to be executed from all violation instructions associated with that instruction.
5. The method of claim 4, wherein performing out-of-order instruction execution on the first violation instruction in the program to be executed and the program following it comprises:
ending the current round of out-of-order instruction execution and acquiring the instruction identifier of the first violation instruction; reading the program counter value corresponding to the first violation instruction from an instruction reordering buffer according to the instruction identifier of the first violation instruction;
and, according to the program counter value corresponding to the first violation instruction, performing out-of-order instruction execution on the instruction in the program to be executed whose instruction address is that program counter value and on the program following it.
6. The method of claim 5, wherein after ending the current round of out-of-order instruction execution, the method further comprises:
determining a second check request in the check request queue, wherein the instruction identifier in the second check request is larger than the instruction identifier of the first violation instruction;
clearing the second check request from the check request queue.
7. The method of any of claims 1-6, wherein determining the first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed comprises:
generating a violation signal corresponding to the instruction to be executed according to all violation instructions associated with the instruction to be executed, wherein the violation signal comprises a plurality of bits that correspond one-to-one to the items of the load instruction reordering queue, a bit with a first value indicates that the load instruction corresponding to the bit is a violation instruction associated with the instruction to be executed, and a bit with a second value indicates that the load instruction corresponding to the bit is not a violation instruction associated with the instruction to be executed;
and inputting the violation signal into a checking unit to obtain the sequence identifier output by the checking unit, and taking the load instruction corresponding to that sequence identifier in the load instruction reordering queue as the first violation instruction.
8. The method of claim 7, wherein the check unit includes a selector and a binary tree comparator for each entry in the load instruction reorder queue; the step of inputting the violation signals into the checking unit to obtain the sequence identification output by the checking unit comprises the following steps:
each bit of the violation signal corresponding to the instruction to be executed is used as an enabling signal and is input to an enabling end of the selector corresponding to the corresponding item; the selector corresponding to each item receives a preset value and a sequence identifier of a loading instruction corresponding to the item, and is used for selecting and outputting the sequence identifier when the enabling signal is a first value and selecting and outputting the preset value when the enabling signal is a second value; the preset value is larger than the sequence identification of all the loading instructions in the loading instruction reordering queue;
inputting the result output by the selector corresponding to each item into a binary tree comparator to obtain a sequence identifier output by the binary tree comparator; the binary tree comparator is used for comparing the results output by all the selectors and outputting the minimum result.
9. A storage order violation processing device, comprising:
the first determining module is used for determining, in the process of performing out-of-order instruction execution on a program to be executed, all violation instructions associated with each current instruction to be executed according to a load instruction reordering queue;
the second determining module is used for determining a first violation instruction that is frontmost in the program to be executed from all violation instructions associated with the instruction to be executed; wherein the first violation instruction is a load instruction; the load instruction reordering queue comprises a plurality of items, different items correspond to different issued load instructions, and each item stores the load address, the thread information, the execution state, the snoped value and the sequence identifier of the load instruction corresponding to the item;
and the execution module is used for performing out-of-order instruction execution on the first violation instruction in the program to be executed and the program following it.
10. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-8.
Application number: CN202311872930.1A
Publication number: CN117806706A
Title: Storage order violation processing method, storage order violation processing device, electronic equipment and medium
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-04-02
Status: Pending
Country: CN
Family ID: 90425313
Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005533A1 (en) * 2006-06-30 2008-01-03 International Business Machines Corporation A method to reduce the number of load instructions searched by stores and snoops in an out-of-order processor
CN103984643A (en) * 2013-02-11 2014-08-13 想象力科技有限公司 Speculative load issue
CN105786448A (en) * 2014-12-26 2016-07-20 深圳市中兴微电子技术有限公司 Instruction scheduling method and device
CN113227969A (en) * 2019-04-03 2021-08-06 超威半导体公司 Speculative instruction wake-up for accommodating drain latency of memory violation ordering check buffer
CN117270971A (en) * 2023-09-15 2023-12-22 上海合芯数字科技有限公司 Load queue control method and device and processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
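Yang Hongbin; Wu Yue; Liu Quansheng: "Dataflow technique for a distributed reservation station structure in simultaneous multithreading microprocessors", Journal of Applied Sciences (应用科学学报), no. 02, 15 March 2008 (2008-03-15) *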

Similar Documents

Publication Title
US5887161A (en) Issuing instructions in a processor supporting out-of-order execution
US8468539B2 (en) Tracking and detecting thread dependencies using speculative versioning cache
US9262160B2 (en) Load latency speculation in an out-of-order computer processor
US6721874B1 (en) Method and system for dynamically shared completion table supporting multiple threads in a processing system
US7409589B2 (en) Method and apparatus for reducing number of cycles required to checkpoint instructions in a multi-threaded processor
US8627044B2 (en) Issuing instructions with unresolved data dependencies
US5913048A (en) Dispatching instructions in a processor supporting out-of-order execution
US8074060B2 (en) Out-of-order execution microprocessor that selectively initiates instruction retirement early
US8335912B2 (en) Logical map table for detecting dependency conditions between instructions having varying width operand values
US10289415B2 (en) Method and apparatus for execution of threads on processing slices using a history buffer for recording architected register data
US6098167A (en) Apparatus and method for fast unified interrupt recovery and branch recovery in processors supporting out-of-order execution
US20100274961A1 (en) Physically-indexed logical map table
US20120066483A1 (en) Computing Device with Asynchronous Auxiliary Execution Unit
JP3577052B2 (en) Instruction issuing device and instruction issuing method
JP3183837B2 (en) How to detect load queue and bypass errors
TW201411485A (en) Zero cycle load
US20080005533A1 (en) A method to reduce the number of load instructions searched by stores and snoops in an out-of-order processor
US9378022B2 (en) Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
US10073699B2 (en) Processing instructions in parallel with waw hazards and via a distributed history buffer in a microprocessor having a multi-execution slice architecture
US10545765B2 (en) Multi-level history buffer for transaction memory in a microprocessor
US10007524B2 (en) Managing history information for branch prediction
CN117270971B (en) Load queue control method and device and processor
US8037366B2 (en) Issuing instructions in-order in an out-of-order processor using false dependencies
US20190187993A1 (en) Finish status reporting for a simultaneous multithreading processor using an instruction completion table
JP3683439B2 (en) Information processing apparatus and method for suppressing branch prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination