CN117785285A - Method, processor and medium for establishing load and store instruction dependencies - Google Patents

Info

Publication number
CN117785285A
Authority
CN
China
Prior art keywords
instruction
load
current
storage
store
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311769756.8A
Other languages
Chinese (zh)
Inventor
翟少敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Original Assignee
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hexin Technology Co ltd, Shanghai Hexin Digital Technology Co ltd filed Critical Hexin Technology Co ltd
Priority to CN202311769756.8A priority Critical patent/CN117785285A/en
Publication of CN117785285A publication Critical patent/CN117785285A/en
Pending legal-status Critical Current

Abstract

The application provides a method, a processor, and a medium for establishing load and store instruction dependencies. A trained LHS prediction table is queried according to the address information of a load instruction in the current group of decoded instructions to determine whether the load instruction incurs an SHL (store-hit-load) order violation. If it does, the store instruction on which the load instruction depends is determined by querying the current synchronization table in the processor, and a dependency is established between the load instruction and the store instruction, so that the load instruction executes only after the store instruction has finished executing or has completed a correct address translation. The method allows dependencies between load and store instructions to be tracked accurately.

Description

Method, processor and medium for establishing load and store instruction dependencies
Technical Field
The present application relates to the field of processor technologies, and in particular, to a method for establishing load and store instruction dependencies, a processor, and a medium.
Background
Most current processors exploit the instruction-level parallelism in programs through out-of-order execution, which improves processor performance. Out-of-order execution lets independent instructions proceed past long-latency events, which raises instruction throughput. However, although instructions may execute in various orders, they must update processor state in program order so that exceptions remain precise. To obtain the full performance benefit of executing load and store instructions out of order while still detecting the semantic hazards this introduces, a high-performance processor typically maintains the state of the load and store instructions in the out-of-order window in two or more separate queues, namely the LRQ (Load Reorder Queue) and the SRQ (Store Reorder Queue), and monitors them for potential semantic hazards.
An important function of the LRQ is to generate an SHL signal when a store instruction queries the LRQ. If the match signals of several entries are 1, it cannot be determined directly which entry corresponds to the oldest load instruction; yet when an SHL occurs, the oldest of all matching load instructions should be selected for re-execution in order to reduce wasted instruction execution. For this purpose, many processors use hardware circuitry to find the oldest load instruction among the store-load matches, trigger a pipeline flush, and then re-execute starting from that load instruction. As the out-of-order instruction window of processor cores grows, a flush caused by an SHL may force many instructions to be re-fetched and re-executed, wasting the performance and power already spent on instructions that had executed correctly.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present application is to provide a method, a processor, and a medium for establishing load and store instruction dependencies, in order to solve the prior-art problem of SHL order violations caused by out-of-order execution of load instructions.
To achieve the above and other related objects, a first aspect of the present application provides a method of establishing load and store instruction dependencies, applied to a processor, the method comprising: querying a trained LHS prediction table according to the address information of a load instruction in the current group of decoded instructions to determine whether the load instruction incurs an SHL order violation; and, if the load instruction is determined to incur an SHL order violation, determining the store instruction on which the load instruction depends by querying the current synchronization table in the processor, and establishing a dependency between the load instruction and the store instruction, so that the load instruction executes after the store instruction has finished executing or after the store instruction has completed a correct address translation.
In some embodiments of the first aspect of the present application, the LHS prediction table is trained as follows: whenever a store instruction is determined to have incurred an SHL order violation, an entry fill operation is performed on the current LHS prediction table. The entry fill operation comprises: generating the reorder buffer location information of the oldest load instruction that incurred an SHL order violation with the current store instruction; obtaining, from the constructed ordering table, the instruction distance between the current store instruction and that load instruction according to the reorder buffer location information of both instructions; obtaining the address information of the current store instruction and of the load instruction from an address table in the processor according to the reorder buffer location information of both instructions; and allocating an entry in the current LHS prediction table, writing the address information of the current store instruction, the address information of the load instruction, and the instruction distance into the entry, and setting the entry state to valid, thereby updating the current LHS prediction table.
In some embodiments of the first aspect of the present application, determining that a store instruction has incurred an SHL order violation comprises: after the memory access address of the store instruction is obtained, determining all load instructions that are younger than the store instruction according to the age information of the store instruction; matching the data address information of the store instruction against the data address information of each of those younger load instructions in the load reorder queue of the processor; if an address matches, determining that the store instruction has incurred an SHL order violation; and if no address matches, determining that the store instruction has not incurred an SHL order violation.
In some embodiments of the first aspect of the present application, the ordering table is constructed as follows: updating the accumulated instruction count of the current cycle based on the number of instructions allocated in the previous cycle and the accumulated instruction count of the previous cycle; determining the sequence numbers of the load and store instructions allocated in the current cycle based on their positions among the instructions allocated in the current cycle and on the accumulated instruction count of the current cycle; and initializing those sequence numbers into the ordering table entries correspondingly allocated by the reorder buffer, where each ordering table entry allocated by the reorder buffer carries the corresponding reorder buffer location information.
In some embodiments of the first aspect of the present application, obtaining, from the constructed ordering table, the instruction distance between the current store instruction and the oldest load instruction that incurred an SHL order violation with it, according to the reorder buffer location information of both instructions, comprises: obtaining the sequence numbers of the current store instruction and of the load instruction from the corresponding ordering table entries according to the reorder buffer location information of both instructions; and obtaining the instruction distance between the current store instruction and the load instruction from those two sequence numbers.
In some embodiments of the first aspect of the present application, the current synchronization table is updated as follows: each time a group of instructions finishes decoding, the trained LHS prediction table is queried according to the address information of each store instruction in the group to determine whether the store instruction incurs an SHL order violation, and, if it does, an entry update operation is performed on the current synchronization table. The entry update operation comprises: allocating an entry in the current synchronization table in the processor, writing the sequence number and the age information of the current store instruction into the entry, and setting the entry state to associated and valid, thereby updating the current synchronization table.
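The synchronization table update can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware described in the application: the names `SyncEntry` and `update_sync_table` are hypothetical, the LHS prediction table is reduced to a set of store addresses that have been trained, and the synchronization table is modeled as an unbounded list.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class SyncEntry:
    inst_num: int            # sequence number of the store (from the ordering table)
    age: int                 # age information of the store
    associated: bool = True  # entry state: associated
    valid: bool = True       # entry state: valid


def update_sync_table(
    decoded_stores: List[Tuple[int, int, int]],
    predicted_store_addrs: Set[int],
    sync_table: List[SyncEntry],
) -> List[SyncEntry]:
    """Allocate one entry per decoded store that hits in the prediction table.

    decoded_stores holds (address, inst_num, age) for every store in the
    group of instructions that has just finished decoding.
    """
    for addr, inst_num, age in decoded_stores:
        if addr in predicted_store_addrs:  # hit in the (simplified) LHS prediction table
            sync_table.append(SyncEntry(inst_num, age))
    return sync_table
```

For example, calling `update_sync_table([(0x4000, 100, 7), (0x5000, 101, 8)], {0x4000}, [])` allocates a single entry for the store at `0x4000`, because only that address was trained.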
In some embodiments of the first aspect of the present application, determining the store instruction on which the load instruction depends by querying the current synchronization table in the processor, once the load instruction is determined to incur an SHL order violation, comprises: querying the trained LHS prediction table according to the address information of the load instruction to obtain the corresponding instruction distance; obtaining sequence number information from that instruction distance and the sequence number of the load instruction in the constructed ordering table; and matching, in the current synchronization table in the processor, the entry that contains that sequence number information, thereby determining the store instruction on which the load instruction depends.
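A minimal sketch of this lookup is given below. It assumes that the instruction distance is defined as the load's sequence number minus the store's sequence number, so the store's sequence number is recovered by subtraction; the text does not fix the sign convention. The function name and the dictionary-based tables are hypothetical simplifications.

```python
from typing import Dict, Optional


def find_dependent_store(
    load_addr: int,
    load_inst_num: int,
    distance_by_load_addr: Dict[int, int],
    sync_table: Dict[int, dict],
) -> Optional[dict]:
    """Return the synchronization table entry of the store the load depends on.

    distance_by_load_addr stands in for the trained LHS prediction table
    (load address information -> instruction distance). sync_table maps a
    store's sequence number to its entry.
    """
    distance = distance_by_load_addr.get(load_addr)
    if distance is None:
        return None  # no prediction for this load
    # Assumed convention: distance = load sequence number - store sequence number.
    store_inst_num = load_inst_num - distance
    return sync_table.get(store_inst_num)
```

With a trained distance of 4 for the load at `0x4010` and a synchronization table entry keyed by sequence number 100, a load with sequence number 104 matches that entry, while the same load decoded at sequence number 105 finds no match.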
In some embodiments of the first aspect of the present application, determining the store instruction on which the load instruction depends and establishing the dependency between the load instruction and the store instruction comprises: obtaining the entry state from the matched entry; and, if the entry state is both associated and valid, writing the age information obtained from the entry and the associated state into the instruction scheduler of the processor, thereby establishing the dependency between the load instruction and the store instruction.
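The state check and the write into the instruction scheduler might look like the following sketch. The classes `SyncEntry` and `SchedulerSlot` and their field names are hypothetical; the application does not specify the scheduler's layout, so a single slot per load is assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncEntry:
    inst_num: int
    age: int
    associated: bool
    valid: bool


@dataclass
class SchedulerSlot:
    load_itag: int
    wait_store_age: Optional[int] = None  # age of the store the load must wait for
    associated: bool = False              # set once a dependency is recorded


def establish_dependency(slot: SchedulerSlot, entry: SyncEntry) -> bool:
    """Record the dependency only if the matched entry is associated and valid."""
    if entry.associated and entry.valid:
        slot.wait_store_age = entry.age
        slot.associated = True
    return slot.associated
```

An entry that is associated but no longer valid leaves the scheduler slot untouched, so the load is scheduled without a store dependency in this model.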
To achieve the above and other related objects, a second aspect of the present application provides a processor, comprising: an order violation determination module, configured to query the trained LHS prediction table according to the address information of a load instruction in the current group of decoded instructions and to determine whether the load instruction incurs an SHL order violation; and a dependency establishing module, connected to the order violation determination module and configured to, if the load instruction is determined to incur an SHL order violation, determine the store instruction on which the load instruction depends by querying the current synchronization table in the processor and establish a dependency between the load instruction and the store instruction, so that the load instruction executes after the store instruction has finished executing or after the store instruction has completed a correct address translation.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of establishing load and store instruction dependencies.
As described above, the method, processor and medium for establishing load and store instruction dependencies of the present application have the following beneficial effects:
A trained LHS prediction table is queried according to the address information of a load instruction in the current group of decoded instructions to determine whether the load instruction incurs an SHL order violation. If it does, the store instruction on which the load instruction depends is determined by querying the current synchronization table in the processor, and a dependency is established between the two instructions, so that the load instruction executes after the store instruction has finished executing or after the store instruction has completed a correct address translation. The method allows dependencies between load and store instructions to be tracked accurately.
Drawings
FIG. 1 is a flow chart illustrating a method for establishing load and store instruction dependencies in accordance with one embodiment of the present application.
FIG. 2 is a flow chart of a method for constructing a sorted list in an embodiment of the present application.
FIG. 3A is a schematic diagram of a sorted list in an embodiment of the present application.
FIG. 3B is a schematic diagram of a sorted list structure in an embodiment of the present application.
FIG. 3C is a schematic diagram of a sorted list in accordance with another embodiment of the present application.
FIG. 4 is a flow chart of a method for training the LHS prediction table in an embodiment of the present application.
FIG. 5 is a schematic flow chart of training an LHS prediction table in an embodiment of the present application.
FIG. 6 is a diagram illustrating the structure of the LHS prediction table, synchronization table, sequencing table and instruction scheduler according to one embodiment of the present application.
FIG. 7 is a flowchart of a method for updating a synchronization table according to an embodiment of the present application.
FIG. 8 is a flow chart illustrating a method for establishing a dependency relationship according to an embodiment of the present application.
FIG. 9 is a flowchart illustrating a method for releasing a dependency relationship according to an embodiment of the present application.
FIG. 10 is a flowchart of updating a synchronization table according to an embodiment of the present application.
FIG. 11 is a flow chart illustrating the establishment of a dependency relationship according to an embodiment of the present application.
FIG. 12 is a schematic diagram of a processor according to an embodiment of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
It is noted that the following description refers to the accompanying drawings, which describe several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Spatially relative terms, such as "upper," "lower," "left," "right," and the like, may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures.
In this application, unless specifically stated and limited otherwise, the terms "mounted," "connected," "secured," "held," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. The terms "first," "second," "third," "fourth" and the like in the description, in the claims, and in the above drawings, if any, are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence, occurrence, or addition of one or more other features, operations, elements, components, items, categories, and/or groups. It will be further understood that the terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, or operations is in some way inherently mutually exclusive.
The application provides a method, a processor, and a medium for establishing load and store instruction dependencies. A trained LHS prediction table is queried according to the address information of a load instruction in the current group of decoded instructions to determine whether the load instruction incurs an SHL order violation. If it does, the store instruction on which the load instruction depends is determined by querying the current synchronization table in the processor, and a dependency is established between the load instruction and the store instruction, so that the load instruction executes only after the store instruction has finished executing or has completed a correct address translation. The method allows dependencies between load and store instructions to be tracked accurately.
Before the present invention is described in further detail, the terms used in the embodiments of the present invention are explained; the following explanations apply to these terms:
(1) LSU (Load Store Unit): the data load and store unit in the processor, responsible for executing load and store instructions. LSUs in high-performance processors typically include data caches, page table caches, load queues, store queues, and interfaces to external storage subsystems.
(2) LRQ (Load Reorder Queue): stores the physical addresses, snooped flags, load tags, and other control/flag information of completed load instructions. Store and load instructions on the LSU pipeline can query the LRQ, comparing load tags to determine age and comparing physical addresses to determine address conflicts; this makes it possible to detect load-load and store-load memory order hazards or memory order violations. Each load instruction may be assigned an LRQ entry at either the dispatch stage or the issue stage. LRQ entries may be allocated so that their physical order is independent of program order, or they may be allocated in load program order, in which case the program order of the loads is reflected in the physical order of their LRQ entries.
(3) LoadQ: a load reorder queue with a re-issue function. When a load instruction encounters a hazard such as a load cache miss, a TLB miss, or a cache bank conflict, the load must stay in the LoadQ, wait until the hazard is resolved, and then be issued again from the LoadQ, i.e., re-issued.
(4) SRQ (Store Reorder Queue): tracks the state, data address, data width, and store type of every store instruction from dispatch to commit. When a load instruction with the same address and thread is encountered, the SRQ can supply bypass data to it, so the load does not have to wait until the store has been committed to the storage subsystem. In the commit stage, store requests are written to the memory subsystem in program order.
(5) SDQ (Store Data Queue): the data queue attached to the SRQ, with the SRQ acting as the control part. The SDQ holds the data of store instructions that have finished executing but, because commits must be in order, cannot yet leave the SDQ; the SDQ can also provide bypass/forwarding data to loads.
(6) SB/StoreQ (Store Buffer, Store Queue): the combination of the SRQ and the SDQ.
(7) Store-Hit-Load, SHL for short: a store and a load access the same address, with the store first in program order. Because the load executes out of order, it reads the memory value before the older store has written its data. The store then executes successfully and writes the new value, but instructions that depend on the load have already executed with the old value, which causes a memory order violation. To detect this violation, the store checks the addresses of the completed loads in the LRQ and their age relative to the store. If it finds a load in the LRQ that has the same address and thread, follows the store in program order, and has already executed and returned data, the situation is called an SHL.
(8) Load-Hit-Load, LHL for short: load-load violations concern multithreaded programs. Load1 and Load2 access the same address in that program order. Because of out-of-order execution, Load2 executes first and reads the value in memory; another program in the storage system then updates the value at that address, after which Load1 executes and reads the latest value in memory. Because this violates the program's sequential consistency, it is commonly called a load-load order violation.
(9) Sync-hit-load: sync-load violations also concern multithreaded programs. They involve a sync instruction followed by a load in the program, where the sync must be ordered before the subsequent load operations; that is, a load that accesses address A may read the value at address A from memory only after the sync has completed global memory ordering. Because of out-of-order execution, the load executes first and reads the value in memory; another program in the storage system then updates the value at that address, and the sync subsequently executes, enters the storage subsystem to complete the ordering operation, and returns an acknowledgement of ordering completion, sync_ack. After receiving sync_ack, the processor core must query all loads that have executed but not yet committed. If one of them has been invalidated by an external snoop request (the load is marked invalid or marked as snooped), the load executed too early and read an old value; if that load is preceded by a sync instruction that orders memory instructions, a sync-load order violation has occurred and the load must be re-executed.
(10) Load-Hit-Store, LHS for short: a store and a later load in the same program have the same data address; because of instruction scheduling, the store executes first, and when the load is subsequently sent to the LSU for execution it hits in the SRQ. This scenario is called Load-Hit-Store. When an LHS occurs, the SRQ typically generates a forwarding index and sends it to the SDQ, which uses the index to forward data to the load. The load must accept the forwarded data and ignore the data in the cache even on a data cache hit, because the forwarded data is the most recent.
(11) Memory order violation: a semantic hazard that arises when the result of executing memory access instructions out of order is inconsistent with the result of executing them in program order. There are generally three types of memory order violation: a. store-load order violations; b. load-load order violations; c. sync-load order violations.
(12) OoO (Out-of-Order Execution): processors typically increase instruction parallelism through out-of-order execution, hiding the latency of some long-latency instructions.
(13) Semantic hazard: a processor encounters various semantic hazards when executing instructions out of order, including: a. program flow errors caused by branch mispredictions; b. memory semantic hazards caused by out-of-order execution of loads and stores.
(14) Pipeline flush: flushing the pipeline is typically performed when the processor encounters a semantic hazard and the instruction information already in the pipeline is insufficient to avoid the hazard by re-execution. Instructions are then re-fetched and executed starting from the correct instruction.
(15) Re-Order Buffer, ROB for short: a table that stores information about out-of-order instructions in an out-of-order processor. An instruction enters the table at the dispatch stage, and its entry is released after the instruction commits in order. The ROB is responsible for in-order commit of instructions and for state recovery in exceptional cases (instruction exceptions, branch mispredictions, memory order violations, etc.). The position of each instruction in the ROB, i.e., its instruction tag, is abbreviated herein as itag.
(16) Instruction: a command, described by the instruction set specification, that modifies the state of the processor. Instruction sets are divided into CISC and RISC; CISC mainly comprises the x86 instruction set, and RISC mainly comprises ARM, Power, RISC-V, and the like.
(17) Microinstruction: for ease of processing, the processor microarchitecture translates instructions into an internal instruction format, referred to as microinstructions, which can be adapted to the needs of the microarchitecture. One instruction may be decoded into one or more microinstructions. Microinstructions are sometimes called micro-ops, uops, or iops (intermediate ops).
(18) Ltag: to distinguish the age order among instructions, the processor numbers the load instructions in the out-of-order instruction queue, assigning each a Load Tag, abbreviated as Ltag, before it is dispatched to the instruction issue queue.
(19) Stag: to distinguish the age order among instructions, the processor numbers the store instructions in the out-of-order instruction queue, assigning each a Store Tag, abbreviated as Stag, before it is dispatched to the instruction issue queue.
(20) store instruction: an instruction that writes data to memory.
(21) load instruction: an instruction that reads data from memory.
In order to make the objects, technical solutions and advantages of the present invention more apparent, further detailed description of the technical solutions in the embodiments of the present invention will be given by the following examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
FIG. 1 is a flow chart illustrating a method of establishing load and store instruction dependencies in an embodiment of the invention.
The method for establishing the dependency of the load and store instructions comprises the following steps:
Step S101: query a trained LHS prediction table according to the address information of a load instruction in the current group of decoded instructions, and determine whether the load instruction incurs an SHL order violation.
In one embodiment, the LHS prediction table is trained as follows: whenever a store instruction is determined to have incurred an SHL order violation, an entry fill operation is performed on the current LHS prediction table. The entry fill operation comprises: generating the reorder buffer location information of the oldest load instruction that incurred an SHL order violation with the current store instruction; obtaining, from the constructed ordering table, the instruction distance between the current store instruction and that load instruction according to the reorder buffer location information of both instructions; obtaining the address information of the current store instruction and of the load instruction from an address table in the processor according to the reorder buffer location information of both instructions; and allocating an entry in the current LHS prediction table, writing the address information of the current store instruction, the address information of the load instruction, and the instruction distance into the entry, and setting the entry state to valid, thereby updating the current LHS prediction table.
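The entry fill operation can be illustrated with the following sketch. It is a software model under stated assumptions: the class and field names are hypothetical, the address table is a dictionary keyed by reorder buffer position (itag), the instruction distance is passed in already computed, and entries are allocated round-robin, a replacement policy the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PredictionEntry:
    store_addr: int = 0  # address information of the store instruction
    load_addr: int = 0   # address information of the load instruction
    distance: int = 0    # instruction distance between store and load
    valid: bool = False


class LhsPredictionTable:
    def __init__(self, size: int = 4) -> None:
        self.entries: List[PredictionEntry] = [PredictionEntry() for _ in range(size)]
        self.next_slot = 0  # round-robin allocation pointer (assumption)

    def fill(self, store_itag: int, load_itag: int, distance: int,
             address_table: Dict[int, int]) -> PredictionEntry:
        """Allocate an entry for a store/load pair that caused an SHL violation."""
        entry = PredictionEntry(
            store_addr=address_table[store_itag],
            load_addr=address_table[load_itag],
            distance=distance,
            valid=True,
        )
        self.entries[self.next_slot] = entry
        self.next_slot = (self.next_slot + 1) % len(self.entries)
        return entry
```

A call such as `LhsPredictionTable(size=2).fill(5, 9, 4, {5: 0x4000, 9: 0x4010})` writes one valid entry that pairs the two addresses with a distance of 4.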
In one embodiment, determining that a store instruction has had an SHL order violation comprises: after the memory access address of the store instruction is obtained, determining, according to the age information of the store instruction, all load instructions whose age order follows the store instruction; matching the data address information of the store instruction against the data address information of each of those load instructions in the load instruction reorder queue in the processor; if an address match succeeds, determining that the store instruction has had an SHL order violation; and if no address matches, determining that the store instruction has not had an SHL order violation.
In one embodiment, as shown in FIG. 2, the ordering table is constructed as follows: updating the accumulated instruction number of the current cycle (cur_acc_inst_num) based on the number of instructions allocated in the previous cycle (cur_inst_num) and the accumulated instruction number of the previous cycle; determining the sequence numbers (inst_num) of the load instruction and the store instruction allocated in the current cycle based on their respective positions among the instructions allocated in the current cycle and on the accumulated instruction number of the current cycle; and initializing the determined sequence numbers into the corresponding ordering table entries allocated by the reorder buffer, where each such ordering table entry carries the corresponding reorder buffer location information.
In one embodiment, obtaining the instruction distance between the current store instruction and the load instruction based on the constructed ordering table, according to the reorder buffer location information of the current store instruction and of the earliest load instruction that has an SHL order violation with it, comprises: obtaining the sequence numbers of the current store instruction and of the load instruction from the corresponding ordering table entries according to their reorder buffer location information; and obtaining the instruction distance between the current store instruction and the load instruction from those two sequence numbers.
The manner in which the ordering table is constructed is first explained in detail below in conjunction with FIGS. 2, 7 and 8:
In an out-of-order processor core, the processor dispatches instructions in order, executes them out of order, and finally commits them in order. The ROB (Re-Order Buffer) is used to record the order of instructions in the program. An instruction does not commit (i.e., modify the processor state) immediately after it finishes executing; instead, it waits in the ROB until all instructions before it have committed, and only then commits its result to the logical register file.
In each cycle, instructions that have been decoded and renamed are written into the ROB, and the ROB allocates an entry for each of these instructions. The itag (instruction tag) indicates the location of each instruction in the ROB.
In the present invention, the reorder buffer location information is itag.
In this embodiment, the accumulated instruction number per cycle is updated by an accumulator in the processor. Taking the calculation of the start of the third cycle as an example, assuming that the accumulated instruction number of the second cycle is 2 and the instruction number allocated to the second cycle is 3, the accumulated instruction number of the third cycle is 5. If the number of instructions allocated in the third cycle is 4, the accumulated number of instructions in the fourth cycle is 9.
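The accumulator update in this example can be sketched in Python as follows. This is only an illustration of the arithmetic, not the hardware accumulator itself; the function name `next_acc_inst_num` is a hypothetical name introduced here, and counter wraparound is not modeled.

```python
def next_acc_inst_num(prev_acc_inst_num: int, prev_inst_num: int) -> int:
    """Accumulated instruction number at the start of the next cycle.

    prev_acc_inst_num: accumulated instruction number of the previous cycle
    prev_inst_num:     number of instructions allocated in the previous cycle
    """
    return prev_acc_inst_num + prev_inst_num


# Second cycle: accumulated number 2, 3 instructions allocated.
acc_cycle3 = next_acc_inst_num(2, 3)
# Third cycle: 4 instructions allocated.
acc_cycle4 = next_acc_inst_num(acc_cycle3, 4)
print(acc_cycle3, acc_cycle4)
```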
If the instructions allocated in the current cycle include a store instruction and a load instruction, then after the accumulated instruction number of the current cycle is updated, the positions of the load instruction and the store instruction among the instructions allocated in the current cycle are determined, and the sequence numbers of the load instruction and the store instruction are determined from those positions and the accumulated instruction number of the current cycle.
For example, assume that the accumulated instruction number of the current cycle is 9, that 5 instructions are allocated in the current cycle, that the store instruction is the 1st of the instructions allocated in the current cycle, and that the load instruction is the 5th. The sequence number of the store instruction is then 5, and the sequence number of the load instruction is 9.
After determining the serial numbers respectively corresponding to the load instruction and the store instruction allocated in the current period, initializing the serial number corresponding to the load instruction into the table item corresponding to the load itag in the ROB according to the reorder buffer location information (load itag) of the load instruction allocated in the current period, namely the location information of the load instruction in the ROB; and initializing a sequence number corresponding to the storage instruction into an entry corresponding to the store itag in the ROB according to the reorder buffer location information (store itag) of the storage instruction allocated in the current period, namely the location information of the storage instruction in the ROB.
It should be noted that the sequence numbers of the load instruction and the store instruction in this embodiment are different from the processor's load itag and store itag, respectively. To simplify the execution logic of complex instructions, some complex instructions are often split and decoded into multiple microinstructions. When multiple microinstructions lie between a load instruction and a store instruction, the distance obtained from the store itag and the load itag cannot accurately represent the distance between the two instructions. Likewise, the distance obtained from the store itag and the load itag differs depending on whether instructions are fused, so it cannot accurately represent the distance between a load instruction and a store instruction under macro instruction fusion either. In contrast, the instruction distance calculated from the sequence numbers determined in this embodiment is based on ISA-level instruction boundaries, so the ISA instruction distance does not change regardless of whether macro instruction fusion is applied to the program instructions.
For example, FIG. 3A shows the initialization scheme when the ROB size is 256. Assume that the microinstruction allocation bandwidth is W and that cur_inst_num=n instructions are allocated in the current cycle. Assume further that W=8 and that the accumulated instruction number of the current cycle is 10. FIG. 3B shows a ROB allocation with start itag=20 and end itag=27. In this case the group contains 6 instructions, 2 of which must be split, each into 2 microinstructions; the inst_num corresponding to the start itag is 10, so the inst_num corresponding to the end itag is 15. FIG. 3C also shows a ROB allocation with start itag=20 and end itag=27. In this case the group contains 10 instructions, 4 of which are fused in pairs into macro instructions, so that these 4 instructions occupy only 2 ROB entries; the inst_num corresponding to the start itag is 10, so the inst_num corresponding to the end itag is 19.
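A minimal Python sketch of how the ordering table could be initialized for the FIG. 3B and FIG. 3C cases is given below. Several points are assumptions made for illustration and are not stated in the text: the helper `init_ordering_table` is hypothetical, the first instruction of the group is given the accumulated instruction number (as the start itag values above suggest), a fused entry takes the sequence number of the first instruction it covers, and the choice of which instructions are split or fused is arbitrary.

```python
ROB_SIZE = 256


def init_ordering_table(table, start_itag, cur_acc_inst_num, entry_isa_index):
    """Write inst_num for every ROB entry allocated in this cycle.

    entry_isa_index[k] is the 0-based position, within this cycle's group,
    of the ISA instruction that the k-th allocated ROB entry belongs to.
    Split instructions repeat an index; fused instructions skip one.
    """
    for k, isa_index in enumerate(entry_isa_index):
        itag = (start_itag + k) % ROB_SIZE  # ROB positions wrap around
        table[itag] = cur_acc_inst_num + isa_index
    return table


# Split case: 6 ISA instructions; instructions 1 and 3 (assumed) are each
# split into 2 microinstructions, giving 8 ROB entries.
split = init_ordering_table({}, 20, 10, [0, 1, 1, 2, 3, 3, 4, 5])

# Fusion case: 10 ISA instructions; pairs (1, 2) and (5, 6) (assumed) are
# each fused into one ROB entry, again giving 8 ROB entries.
fused = init_ordering_table({}, 20, 10, [0, 1, 3, 4, 5, 7, 8, 9])

print(split[20], split[27], fused[20], fused[27])
```

In both cases the itag range is the same (20 to 27), while the sequence numbers span 6 and 10 ISA instructions respectively, which is why the distance is computed from inst_num rather than from itag.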
It should be noted that a program must be compiled into a language that the processor can understand before it is executed; this language, or specification, is the instruction set architecture (ISA).
Before explaining the way in which the LHS prediction table is trained, the way in which a SHL order violation of a store instruction is determined to occur is explained in detail below:
After a store instruction successfully obtains its memory access address, all load instructions that follow the store instruction in age order are identified according to the age information (stag) allocated to the store instruction by the processor. The data address information of the store instruction is then matched, in the load instruction reorder queue (LRQ) in the processor, against the data address information of each of those load instructions. If the same data address information is found in the load instruction reorder queue, an SHL order violation has occurred between that load instruction and the store instruction.
Note that the data address information mentioned in this embodiment is a physical data address; SHL order violations are detected by matching physical data addresses.
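The check described above can be sketched in Python as follows. This is a simplified model under stated assumptions: `LrqEntry` and `find_oldest_violating_load` are hypothetical names, age is modeled as an integer that grows with program order and never wraps, and addresses are compared for exact equality without regard to access size.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LrqEntry:
    age: int        # assumed: larger value means later in program order
    phys_addr: int  # physical data address of the load
    itag: int       # position of the load in the reorder buffer


def find_oldest_violating_load(
    store_age: int, store_phys_addr: int, lrq: List[LrqEntry]
) -> Optional[LrqEntry]:
    """Return the earliest load after the store whose address matches, if any."""
    hits = [
        e for e in lrq
        if e.age > store_age and e.phys_addr == store_phys_addr
    ]
    return min(hits, key=lambda e: e.age, default=None)


lrq = [
    LrqEntry(age=3, phys_addr=0x1000, itag=11),  # before the store: ignored
    LrqEntry(age=7, phys_addr=0x1000, itag=15),  # earliest violating load
    LrqEntry(age=8, phys_addr=0x2000, itag=16),  # different address
    LrqEntry(age=9, phys_addr=0x1000, itag=17),  # later violating load
]
hit = find_oldest_violating_load(5, 0x1000, lrq)
miss = find_oldest_violating_load(5, 0x3000, lrq)
```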
The LHS prediction table is used to store information about store instructions and load instructions that are subject to SHL order violations, and the manner in which the prediction table is trained is explained in detail below in conjunction with fig. 4 and 5:
When it is determined that a store instruction A has had an SHL order violation, the reorder buffer location information (st_itag) of store instruction A, i.e., its position in the reorder buffer, is generated. Store instruction A also notifies the load instruction age information generation module (oldest violated ld_itag generation) to generate the reorder buffer location information (ld_itag) of the earliest load instruction that has an SHL order violation with store instruction A, referred to below as load instruction B; ld_itag is thus the position of load instruction B in the reorder buffer.
After the reorder buffer location information of store instruction A and of load instruction B (the earliest load instruction that has an SHL order violation with store instruction A) is obtained, the entry corresponding to st_itag is matched in the constructed ordering table (inst_num table), and the sequence number of store instruction A is obtained from the matched entry; likewise, the entry corresponding to ld_itag is matched in the ordering table, and the sequence number of load instruction B is obtained from the matched entry. The sequence number of store instruction A is then subtracted from the sequence number of load instruction B to obtain the corresponding instruction distance.
For example, matching in the ordering table according to the st_itag of store instruction A yields a sequence number (inst_num) of 10, and matching according to the ld_itag of load instruction B yields a sequence number (inst_num) of 30. The instruction distance between store instruction A and load instruction B is then 30-10=20.
In addition, the address information (st_pc) of store instruction A, i.e., its PC information, is obtained from the address table (pc_table) in the processor according to st_itag, and the address information (ld_pc) of load instruction B, i.e., its PC information, is obtained from the address table according to ld_itag.
As shown in fig. 5, an entry is allocated in a prediction table (prediction table), the obtained address information (st_pc) of the store instruction a, address information (ld_pc) of the load instruction B, and instruction interval (inst_cnt_distance) between the store instruction a and the load instruction B are stored in the entry, and the state of the entry is set to be valid.
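The entry filling operation can be sketched in Python as follows. This is an illustrative model only: the `train` function, the use of dictionaries for the inst_num table and pc_table, and the itag and PC values are assumptions introduced here, while the sequence numbers 10 and 30 follow the example above. Entry replacement and counter wraparound are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PredictionEntry:
    st_pc: int
    ld_pc: int
    inst_cnt_distance: int
    vld: int = 1


def train(prediction_table, inst_num_table, pc_table, st_itag, ld_itag):
    """Fill one prediction-table entry after an SHL order violation."""
    # Distance in ISA instructions, taken from the ordering table.
    distance = inst_num_table[ld_itag] - inst_num_table[st_itag]
    entry = PredictionEntry(
        st_pc=pc_table[st_itag],
        ld_pc=pc_table[ld_itag],
        inst_cnt_distance=distance,
        vld=1,
    )
    prediction_table.append(entry)
    return entry


# Hypothetical itags 4 (store A) and 31 (load B) and hypothetical PCs.
inst_num_table = {4: 10, 31: 30}
pc_table = {4: 0xEA0, 31: 0xE80}
prediction_table = []
entry = train(prediction_table, inst_num_table, pc_table, st_itag=4, ld_itag=31)
print(entry.inst_cnt_distance)
```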
In one embodiment, as shown in fig. 6, vld in each entry in the prediction table (prediction table) is represented as the state of the corresponding entry. When vld=1, the state of the corresponding entry is indicated as being set to be valid; when vld=0, it indicates that the state of the corresponding entry is invalid. The subsequent establishment of the dependency relationship can only be performed if vld=1.
In one embodiment, to filter out occasional SHL order violations, a saturation counter is added to the prediction table. The saturation counter is initialized to 0, and the subsequent establishment of the dependency relationship is performed only after two SHL order violations have occurred for the PC information (address information) of the same pair of load and store instructions.
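One possible reading of this filter is sketched below in Python. The threshold of two violations comes from the text; the counter width (saturating at 3), the class name `ConfidenceTable`, and keying the counter by the (st_pc, ld_pc) pair are assumptions made for illustration.

```python
THRESHOLD = 2    # from the text: two violations for the same PC pair
COUNTER_MAX = 3  # assumed: a 2-bit saturating counter


class ConfidenceTable:
    def __init__(self):
        self.counters = {}  # (st_pc, ld_pc) -> saturating counter

    def record_violation(self, st_pc, ld_pc):
        key = (st_pc, ld_pc)
        # Increment, but never beyond the saturation value.
        self.counters[key] = min(self.counters.get(key, 0) + 1, COUNTER_MAX)

    def should_establish_dependency(self, st_pc, ld_pc):
        return self.counters.get((st_pc, ld_pc), 0) >= THRESHOLD


table = ConfidenceTable()
table.record_violation(0xEA0, 0xE80)
after_one = table.should_establish_dependency(0xEA0, 0xE80)
table.record_violation(0xEA0, 0xE80)
after_two = table.should_establish_dependency(0xEA0, 0xE80)
for _ in range(5):
    table.record_violation(0xEA0, 0xE80)
saturated = table.counters[(0xEA0, 0xE80)]
```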
In one embodiment, if a store instruction has SHL order violations with a plurality of load instructions, the corresponding information may be stored in a plurality of entries. For example, if a store instruction has SHL order violations with two load instructions, two entries may be allocated in the prediction table, and the information written into the two entries is (st_pc, ld_pc_1, inst_cnt_distance_1, vld=1) and (st_pc, ld_pc_2, inst_cnt_distance_2, vld=1), respectively.
In one embodiment, after the current group of instructions has been decoded, the trained LHS prediction table is queried based on the address information (PC information) of the store instructions and of the load instructions in the group. If the address information of a store instruction in the group matches an entry in the trained LHS prediction table and the entry state is valid, this indicates that the store instruction has had an SHL order violation with a load instruction in another group of instructions, and a dependency relationship needs to be established. Similarly, if the address information of a load instruction in the group matches an entry in the trained LHS prediction table and the entry state is valid, this indicates that the load instruction has had an SHL order violation with a store instruction in another group of instructions, and a dependency relationship needs to be established.
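The decode-time query can be pictured with the short Python sketch below. The dictionary layout, the `lookup` helper, and all PC and distance values are illustrative assumptions; real hardware would typically use an indexed or associative structure rather than a linear scan.

```python
def lookup(prediction_table, pc, field):
    """Return the valid entries whose `field` ('st_pc' or 'ld_pc') equals pc."""
    return [e for e in prediction_table if e["vld"] == 1 and e[field] == pc]


prediction_table = [
    {"st_pc": 0xEA0, "ld_pc": 0xE80, "inst_cnt_distance": 18, "vld": 1},
    {"st_pc": 0xF00, "ld_pc": 0xF40, "inst_cnt_distance": 4, "vld": 0},
]

store_hits = lookup(prediction_table, 0xEA0, "st_pc")  # store side of a pair
load_hits = lookup(prediction_table, 0xE80, "ld_pc")   # load side of a pair
stale_hits = lookup(prediction_table, 0xF00, "st_pc")  # entry is not valid
```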
Step S102: when it is determined that the load instruction has had an SHL order violation, determine the store instruction on which the load instruction depends by querying the current synchronization table in the processor, and establish a dependency relationship between the load instruction and the store instruction, so that the load instruction is executed after the store instruction finishes executing or after the store instruction completes correct address translation.
In one embodiment, the current synchronization table is updated as follows: each time a group of instructions finishes decoding, the trained LHS prediction table is queried according to the address information of a store instruction carried by the group to determine whether the store instruction has had an SHL order violation, and if so, an entry update operation is performed on the current synchronization table. The entry update operation includes: allocating an entry in the current synchronization table in the processor, writing the sequence number and the age information of the current store instruction into the entry, and setting the entry state to the associated state and the valid state, thereby updating the current synchronization table.
In one embodiment, when it is determined that the load instruction has had an SHL order violation, determining the store instruction on which the load instruction depends by querying the current synchronization table in the processor comprises: querying the trained LHS prediction table according to the address information of the load instruction to obtain the corresponding instruction distance; obtaining sequence number information based on the obtained instruction distance and the sequence number of the load instruction in the constructed ordering table; and matching, in the current synchronization table in the processor, an entry that includes the sequence number information, so as to determine the store instruction on which the load instruction depends.
In one embodiment, determining the store instruction on which the load instruction depends and establishing the dependency relationship between the load instruction and the store instruction comprises: obtaining the entry state from the matched entry; and, when the entry state is the associated state and the valid state, writing the age information obtained from the entry and the associated state into the instruction scheduler in the processor, so as to establish the dependency relationship between the load instruction and the store instruction.
The manner in which the synchronization table is updated will be explained in detail below in conjunction with fig. 7:
Take as an example the case where the current group of instructions includes a store instruction A:
After the current group of instructions finishes decoding, the trained LHS prediction table is queried according to the address information of store instruction A in the current group to determine whether store instruction A has had an SHL order violation. If so, store instruction A requests the current synchronization table in the processor to allocate an entry for it, stores into the entry the sequence number (st_icnt) generated when store instruction A was allocated into the ROB together with the corresponding age information (stag), and sets the entry state to the associated state and the valid state, so that a load instruction that has had an SHL order violation with store instruction A can later establish a dependency relationship with it.
For example, as shown in FIG. 6, sync_ld=1 and vld=1 are set in the entry corresponding to store instruction A, where sync_ld=1 denotes the associated state and vld=1 denotes the valid state. When sync_ld=1 and vld=1 in the entry corresponding to store instruction A, store instruction A is a valid store instruction, and load instructions that depend on it need to establish a dependency relationship with it, so that they are executed only after store instruction A finishes executing. This ensures that load instructions that depend on store instruction A do not obtain erroneous data and pass it on to other instructions.
If store instruction A finishes executing before load instruction B (which has had an SHL order violation with store instruction A) queries the current synchronization table, store instruction A must clear the associated state in its entry in the current synchronization table, i.e., set sync_ld in the entry to 0. The subsequent load instruction B then no longer needs to establish an association with store instruction A or wait for store instruction A to finish before it begins executing. When store instruction A is flushed from the pipeline by the processor, the associated state of its entry in the current synchronization table must likewise be cleared, i.e., sync_ld in that entry is set to 0.
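The synchronization table behavior described above (allocation at decode, clearing on completion or flush) can be modeled with the Python sketch below. The class and method names are hypothetical, the values st_icnt=30 and stag=50 are illustrative, and entry reclamation is not modeled because the text only describes clearing sync_ld.

```python
from dataclasses import dataclass


@dataclass
class SyncEntry:
    st_icnt: int      # sequence number of the store
    stag: int         # age information of the store
    sync_ld: int = 1  # associated state
    vld: int = 1      # valid state


class SyncTable:
    def __init__(self):
        self.entries = []

    def allocate(self, st_icnt, stag):
        """Called when a decoded store hits a valid prediction-table entry."""
        entry = SyncEntry(st_icnt, stag)
        self.entries.append(entry)
        return entry

    def clear_association(self, stag):
        """Called when the store finishes executing or is flushed."""
        for entry in self.entries:
            if entry.vld == 1 and entry.stag == stag:
                entry.sync_ld = 0


sync_table = SyncTable()
entry_a = sync_table.allocate(st_icnt=30, stag=50)
state_before = (entry_a.sync_ld, entry_a.vld)
sync_table.clear_association(stag=50)  # store A completed or was flushed
state_after = (entry_a.sync_ld, entry_a.vld)
```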
In connection with the above manner of updating the synchronization table, the manner in which the dependency relationship is established will be explained in detail with reference to fig. 8 below:
Take as an example the case where the current group of instructions includes a load instruction B, and store instruction A is the store instruction on which load instruction B depends:
After the current group of instructions finishes decoding, the trained LHS prediction table is queried according to the address information (ld_pc) of load instruction B in the current group to determine whether load instruction B has had an SHL order violation. If so, the corresponding instruction distance (inst_cnt_distance) is obtained from the prediction table shown in FIG. 6 based on ld_pc, and a subtraction is performed with the sequence number (ld_icnt) of load instruction B in the reorder buffer (ROB) as the minuend and the instruction distance as the subtrahend. The result is the sequence number (st_icnt) of the store instruction on which load instruction B depends, i.e., st_icnt = ld_icnt - inst_cnt_distance.
After the sequence number (st_icnt) of the store instruction on which load instruction B depends is obtained, it is matched against the current synchronization table to determine whether the table contains an entry with that sequence number, so that age information can be obtained from the matched entry. The store instruction on which load instruction B depends is determined from the obtained age information. In this embodiment, load instruction B determines that store instruction A is the store instruction on which it depends, based on the age information obtained from the current synchronization table.
When the match succeeds (i.e., the synchronization table contains an entry with that sequence number), the entry state is obtained from that entry. If the entry state is the associated state and the valid state, load instruction B can establish a dependency relationship with store instruction A.
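The lookup on the load side can be sketched in Python as follows. The function name `find_dependent_store`, the list-of-dictionaries table layout, and the numeric values are illustrative assumptions; only the subtraction st_icnt = ld_icnt - inst_cnt_distance and the sync_ld/vld check come from the text.

```python
from typing import Optional


def find_dependent_store(ld_icnt, inst_cnt_distance, sync_table) -> Optional[int]:
    """Return the stag of the store the load must wait for, or None."""
    st_icnt = ld_icnt - inst_cnt_distance
    for entry in sync_table:
        if entry["st_icnt"] == st_icnt:
            # A dependency is established only in the associated and valid state.
            if entry["sync_ld"] == 1 and entry["vld"] == 1:
                return entry["stag"]
            return None
    return None  # no entry with this sequence number


sync_table = [{"st_icnt": 30, "stag": 50, "sync_ld": 1, "vld": 1}]
stag = find_dependent_store(ld_icnt=48, inst_cnt_distance=18, sync_table=sync_table)

# If the store has already completed, sync_ld is 0 and no dependency is made.
done_table = [{"st_icnt": 30, "stag": 50, "sync_ld": 0, "vld": 1}]
no_dep = find_dependent_store(48, 18, done_table)
no_match = find_dependent_store(48, 10, sync_table)
```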
The process of establishing the dependency relationship between load instruction B and store instruction A is as follows: an entry is allocated in the instruction scheduler, the age information of store instruction A and the associated state are written into the entry, and the state of the entry is set to valid. The age information of store instruction A enters the instruction scheduler together with load instruction B as a dependency tag and waits for the LSU to release the dependency. Until the dependency is released, load instruction B is outside the scheduling range of the instruction scheduler.
In one embodiment, as shown in FIG. 9, each cycle it is checked whether the sync_ld of each entry in the instruction scheduler has been cleared. When sync_ld=0, the corresponding load instruction is not constrained by the store instruction it depends on and may be scheduled once its operands and execution resources are ready. When sync_ld=1, the corresponding load instruction is constrained by the store instruction it depends on, and as long as that store instruction has not finished executing, the load instruction is outside the scheduling range of the instruction scheduler. Each entry must also continuously monitor the clr_sync_ld and clr_stag signals sent by the LSU: once clr_sync_ld=1 and clr_stag matches the entry's own stag, sync_ld is set to 0.
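The per-entry monitoring logic can be sketched in Python as follows. The class `SchedulerEntry` and its methods are hypothetical names; readiness of operands and execution resources is collapsed into a single assumed flag, and the stag values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SchedulerEntry:
    stag: int            # age information of the store this load waits for
    sync_ld: int         # 1 while the load is still constrained by the store
    ready: bool = False  # assumed flag: operands and execution resources ready

    def snoop(self, clr_sync_ld: int, clr_stag: int) -> None:
        """Observe the clear signals broadcast by the LSU in this cycle."""
        if clr_sync_ld == 1 and clr_stag == self.stag:
            self.sync_ld = 0

    def can_issue(self) -> bool:
        return self.sync_ld == 0 and self.ready


load_b = SchedulerEntry(stag=50, sync_ld=1, ready=True)
blocked = load_b.can_issue()               # still waiting for the store
load_b.snoop(clr_sync_ld=1, clr_stag=49)   # a different store completed
still_blocked = load_b.can_issue()
load_b.snoop(clr_sync_ld=1, clr_stag=50)   # the awaited store completed
released = load_b.can_issue()
```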
It should be noted that, after an SHL order violation occurs, the method for establishing load and store instruction dependencies in the present invention can accurately record the instruction addresses of the pair of load and store instructions involved, and can record the dependency relationship between the two among dynamic instructions. For example, after a static instruction loop is unrolled during dynamic execution, the method can accurately track which iteration's result the load instruction of a given iteration depends on. The method can accurately calculate the instruction distance between the load instruction and the store instruction after an SHL order violation occurs, so that the dependency relationship between them is recorded accurately. In particular, most current processors employ instruction fusion, and because of hardware limitations fusion sometimes occurs and sometimes does not, which complicates calculating the instruction distance of a pair of load and store instructions that have had an SHL order violation. By recording the sequence number of each instruction during execution, the method ensures that the instruction distance between the load instruction and the store instruction, i.e., how many instructions separate them, can be obtained after an SHL order violation occurs.
To better illustrate the method of establishing load and store instruction dependencies, a specific embodiment is now provided.
Embodiment one: a process for establishing load and store instruction dependencies.
The load instruction with ld_pc=0xe80 depends on the store instruction with st_pc=0xea0, and an SHL order violation has occurred between them. The number of instructions between the store instruction and the load instruction at execution time is inst_cnt_distance=30. To distinguish between different dynamic instructions, instances of the same PC in different function calls or loop iterations are labeled A, B, C, D, and so on.
As shown in FIG. 10, after decoding is completed, store instruction C at 0xea0 is assigned stag=50 and sequence number st_icnt=30. Store instruction C queries the prediction table and finds a valid entry whose st_pc matches. It then notifies the sync table to allocate an entry for recording information about store instruction C, so that a load instruction associated with store instruction C can establish a dependency relationship with it through this entry. The sync table allocates an entry for store instruction C, initializes the st_icnt and stag information in the entry, and sets sync_ld and vld to 1, indicating that a load instruction may depend on the store instruction with st_icnt=30.
As shown in FIG. 11, several cycles later a load instruction at 0xe80 is allocated ld_icnt=48. The load instruction queries the prediction table by ld_pc, finds that its ld_pc matches an ld_pc in the prediction table, and reads out the corresponding inst_cnt_distance. Subtracting inst_cnt_distance=18 from the newly allocated ld_icnt=48 gives st_icnt=30 for the store instruction on which the load instruction depends. The sync table is then searched with st_icnt=30; one entry matches, and since sync_ld=1 and vld=1 in that entry, the corresponding stag is read out. This stag is written into the instruction scheduler together with the load instruction.
Similar to the above embodiments, the present invention also provides a processor.
Specific embodiments are provided below with reference to the accompanying drawings:
FIG. 12 is a schematic diagram of a processor structure in an embodiment of the present invention.
The processor 12 includes:
a sequence violation judging module 121, configured to query a trained LHS prediction table according to address information of a load instruction in a current group of instructions that have completed decoding, and determine whether a SHL sequence violation has occurred in the load instruction;
the dependency relationship establishing module 122 is connected to the sequence violation judging module 121, and is configured to determine, by querying a current synchronization table in the processor, a store instruction on which the load instruction depends, and establish a dependency relationship between the load instruction and the store instruction, so that the load instruction is executed after the store instruction is executed or executed after the store instruction completes correct address translation, if it is determined that the load instruction has a SHL sequence violation.
It should be noted that the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In one embodiment, the way the LHS prediction table is trained includes: executing an entry filling operation on the current LHS prediction table whenever it is determined that a SHL sequence violation has occurred for a store instruction; wherein, the entry filling operation includes: generating reorder buffer location information of a loading instruction with SHL sequence violations occurring between the earliest and the current storage instruction; based on the constructed ordering table, obtaining the instruction distance between the current storage instruction and the loading instruction according to the current storage instruction and the reorder buffer location information of the loading instruction with the earliest SHL sequence violation with the current storage instruction; obtaining the address information of the current storage instruction and the loading instruction from an address table in the processor according to the current storage instruction and the reorder buffer memory position information of the loading instruction; and allocating an entry in the current LHS prediction table, writing the address information of the current storage instruction, the address information of the loading instruction and the instruction interval into the entry, and setting the state of the entry to be effective so as to update the current LHS prediction table.
In one embodiment, determining that a store instruction has an SHL order violation comprises: after the memory access address corresponding to the store instruction is obtained, determining, according to the age information of the store instruction, all load instructions that are younger than the store instruction; matching the data address information of the store instruction against the data address information of each of those younger load instructions in the load instruction reorder queue in the processor; if an address match succeeds, determining that the store instruction has an SHL order violation; and if no address match succeeds, determining that the store instruction has no SHL order violation.
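The age filtering and address matching described above can be sketched as below. The sketch is illustrative only: `LoadQueueEntry` and `find_shl_violation` are hypothetical names, a larger `age` value is assumed to mean a younger instruction, and exact address equality is used where real hardware would more likely compare overlapping byte ranges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadQueueEntry:
    age: int        # assumption: larger value means younger instruction
    data_addr: int  # data address already accessed by the load
    rob_index: int  # reorder buffer location of the load


def find_shl_violation(store_age: int,
                       store_addr: int,
                       load_queue: List[LoadQueueEntry]) -> Optional[LoadQueueEntry]:
    """Return the earliest younger load whose data address matches the store.

    A non-None result means the store has an SHL order violation;
    None means no younger load in the queue matched.
    """
    younger_hits = [ld for ld in load_queue
                    if ld.age > store_age and ld.data_addr == store_addr]
    # The earliest violating load is the oldest one among the younger hits.
    return min(younger_hits, key=lambda ld: ld.age, default=None)
```

Returning the earliest matching load, rather than a plain yes/no result, mirrors the entry filling operation, which needs the reorder buffer location of that load.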
In one embodiment, the ordering table is constructed as follows: updating the accumulated instruction count of the current cycle based on the number of instructions allocated in the previous cycle and the accumulated instruction count of the previous cycle; determining the sequence numbers of the load instructions and store instructions allocated in the current cycle based on their respective positions among the instructions allocated in the current cycle and on the accumulated instruction count of the current cycle; and initializing the determined sequence numbers into the ordering table entries correspondingly allocated by the reorder buffer, where each ordering table entry allocated by the reorder buffer carries the corresponding reorder buffer location information.
In one embodiment, obtaining, based on the constructed ordering table, the instruction distance between the current store instruction and its corresponding load instruction according to their reorder buffer location information includes: obtaining the sequence numbers of the current store instruction and of the corresponding load instruction from the ordering table entries indicated by their respective reorder buffer location information; and obtaining the instruction distance between the two instructions from those sequence numbers.
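The sequence numbers and the resulting instruction distance can be sketched as follows. The sequence-number assignment follows the construction steps recited in claim 4 (accumulated count plus position within the current group); the class name `OrderingTable`, the zero-based positions, and the absence of counter wrap-around are assumptions made for the illustration.

```python
from typing import Dict, List, Tuple


class OrderingTable:
    """Software model of an ordering table indexed by reorder buffer location."""

    def __init__(self) -> None:
        self.accumulated = 0                 # instructions allocated in earlier cycles
        self.seq_by_rob: Dict[int, int] = {}  # ROB location -> sequence number

    def allocate_cycle(self, group: List[Tuple[int, str]]) -> None:
        """Record one cycle's allocation.

        group lists (rob_index, kind) in program order, where kind is
        'load', 'store' or 'other'. Only loads and stores get entries.
        """
        for position, (rob_index, kind) in enumerate(group):
            if kind in ("load", "store"):
                self.seq_by_rob[rob_index] = self.accumulated + position
        # The next cycle's accumulated count includes this cycle's instructions.
        self.accumulated += len(group)

    def distance(self, store_rob: int, load_rob: int) -> int:
        """Instruction distance from the store to the (younger) load."""
        return self.seq_by_rob[load_rob] - self.seq_by_rob[store_rob]
```

For example, a store allocated first in one cycle of four instructions and a load allocated second in the next cycle receive sequence numbers 0 and 5 in this model, giving a distance of 5.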
In one embodiment, the current synchronization table is updated as follows: each time a group of instructions finishes decoding, the trained LHS prediction table is queried according to the address information of a store instruction carried by the group, to determine whether the store instruction has an SHL order violation, and an entry updating operation is performed on the current synchronization table if it does. The entry updating operation includes: allocating an entry in the current synchronization table in the processor, writing the sequence number and age information of the current store instruction into the entry, and setting the entry state to the associated state and the valid state, so as to update the current synchronization table.
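A minimal sketch of the synchronization table update is given below. It is illustrative only: `SyncEntry` and `update_sync_table` are hypothetical names, the LHS prediction table is reduced to a set of store addresses (assumed to be PCs) that have valid entries, and entry release is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class SyncEntry:
    seq: int                 # sequence number of the store instruction
    age: int                 # age information of the store instruction
    associated: bool = True  # associated state
    valid: bool = True       # valid state


def update_sync_table(sync_table: List[SyncEntry],
                      lhs_store_pcs: Set[int],
                      decoded_stores: Iterable[Tuple[int, int, int]]) -> List[SyncEntry]:
    """Add an entry for every decoded store that hits in the prediction table.

    decoded_stores yields (pc, seq, age) for each store in the group
    that has just finished decoding.
    """
    for pc, seq, age in decoded_stores:
        if pc in lhs_store_pcs:  # store is predicted to have an SHL violation
            sync_table.append(SyncEntry(seq=seq, age=age))
    return sync_table
```

Stores that miss in the prediction table leave the synchronization table unchanged, so later loads cannot match them.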
In one embodiment, when it is determined that the load instruction has an SHL order violation, determining the store instruction on which the load instruction depends by querying the current synchronization table in the processor comprises: querying the trained LHS prediction table according to the address information of the load instruction to obtain the corresponding instruction distance; obtaining sequence number information based on the obtained instruction distance and the sequence number of the load instruction in the constructed ordering table; and matching, in the current synchronization table in the processor, the entry that contains the sequence number information, so as to determine the store instruction on which the load instruction depends.
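The lookup can be sketched as below, under stated assumptions: the function name `find_store_dependency` is hypothetical, both tables are modelled as dictionaries, and the store's sequence number is assumed to be the load's sequence number minus the recorded instruction distance.

```python
from typing import Dict, Optional


def find_store_dependency(load_pc: int,
                          load_seq: int,
                          lhs_distance_by_load_pc: Dict[int, int],
                          sync_by_seq: Dict[int, dict]) -> Optional[dict]:
    """Return the synchronization table entry of the store the load depends on.

    lhs_distance_by_load_pc models the trained LHS prediction table and
    sync_by_seq models the synchronization table keyed by sequence number.
    """
    distance = lhs_distance_by_load_pc.get(load_pc)
    if distance is None:
        return None  # no prediction for this load
    # Assumption: the store precedes the load by `distance` instructions.
    store_seq = load_seq - distance
    return sync_by_seq.get(store_seq)  # None when no entry matches
```

A miss in either table yields no dependency, in which case the load would simply be scheduled without waiting on a store.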
In one embodiment, determining the store instruction on which the load instruction depends and establishing the dependency relationship between the load instruction and the store instruction comprises: obtaining the entry state from the matched entry; and, when the entry state is the associated state and the valid state, writing the age information obtained from the entry and the associated state from the entry state into an instruction scheduler in the processor, so as to establish the dependency relationship between the load instruction and the store instruction.
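The final step can be illustrated with the short sketch below. The scheduler is modelled as a dictionary keyed by a hypothetical load identifier, and the field names `wait_for_age` and `associated` are assumptions; the text does not describe the scheduler's internal format.

```python
from typing import Dict


def establish_dependency(scheduler: Dict[int, dict],
                         load_id: int,
                         entry: dict) -> bool:
    """Record a load-on-store dependency if the matched entry allows it.

    entry is a matched synchronization table entry with the keys
    'age', 'associated' and 'valid'. Returns True when a dependency
    was written into the scheduler model.
    """
    if entry["associated"] and entry["valid"]:
        # The load waits for the store identified by this age.
        scheduler[load_id] = {"wait_for_age": entry["age"], "associated": True}
        return True
    return False
```

In this model the scheduler would hold the load until the store with the recorded age has executed or completed address translation; that wake-up logic is outside the sketch.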
The present invention also provides a computer-readable storage medium storing a computer program which, when run, implements the method of establishing load and store instruction dependencies described in figure 1. The computer-readable storage medium may include, but is not limited to, floppy disks, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media or machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a product that has not been connected to a computer device, or a component that is already connected to and used by a computer device.
In some embodiments of the invention, the computer-readable and writable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present application provides a method, a processor, and a medium for establishing load and store instruction dependencies, in which a trained LHS prediction table is queried according to address information of a load instruction in the current group of instructions that have completed decoding, to determine whether the load instruction has an SHL order violation; and, when it is determined that the load instruction has an SHL order violation, the store instruction on which the load instruction depends is determined by querying the current synchronization table in the processor, and a dependency relationship is established between the load instruction and the store instruction, so that the load instruction is executed after the store instruction has finished executing or after the store instruction has completed correct address translation. The method for establishing load and store instruction dependencies has the advantage of accurately tracking the dependency between the load instruction and the store instruction. Therefore, the present application effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles of the present application and their effectiveness, and are not intended to limit the application. Modifications and variations may be made to the above-described embodiments by those of ordinary skill in the art without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications and variations which may be accomplished by persons skilled in the art without departing from the spirit and technical spirit of the disclosure be covered by the claims of this application.

Claims (10)

1. A method of establishing load and store instruction dependencies, for application to a processor, the method comprising:
querying a trained LHS prediction table according to address information of a load instruction in the current group of instructions that have completed decoding, and determining whether the load instruction has an SHL order violation;
and, when it is determined that the load instruction has an SHL order violation, determining a store instruction on which the load instruction depends by querying a current synchronization table in the processor, and establishing a dependency relationship between the load instruction and the store instruction, so that the load instruction is executed after the store instruction has finished executing or after the store instruction has completed correct address translation.
2. The method of claim 1, wherein the manner in which the LHS prediction table is trained comprises:
performing an entry filling operation on the current LHS prediction table whenever it is determined that a store instruction has an SHL order violation;
wherein the entry filling operation includes:
generating reorder buffer location information of the earliest load instruction that has an SHL order violation with the current store instruction;
based on the constructed ordering table, obtaining the instruction distance between the current store instruction and the load instruction according to the reorder buffer location information of the current store instruction and of the earliest load instruction that has an SHL order violation with the current store instruction;
obtaining the address information of the current store instruction and of the load instruction from an address table in the processor according to the reorder buffer location information of the current store instruction and of the load instruction;
and allocating an entry in the current LHS prediction table, writing the address information of the current store instruction, the address information of the load instruction, and the instruction distance into the entry, and setting the state of the entry to valid, so as to update the current LHS prediction table.
3. The method of claim 2, wherein determining that a store instruction has an SHL order violation comprises:
after the memory access address corresponding to the store instruction is obtained, determining, according to the age information of the store instruction, all load instructions that are younger than the store instruction;
matching the data address information of the store instruction against the data address information of each load instruction younger than the store instruction in a load instruction reorder queue in the processor;
if an address match succeeds, determining that the store instruction has an SHL order violation;
if no address match succeeds, determining that the store instruction has no SHL order violation.
4. The method of claim 2, wherein the manner in which the ordering table is constructed comprises:
updating the accumulated instruction count of the current cycle based on the number of instructions allocated in the previous cycle and the accumulated instruction count of the previous cycle;
determining the sequence numbers of the load instructions and store instructions allocated in the current cycle based on their respective positions among the instructions allocated in the current cycle and on the accumulated instruction count of the current cycle;
initializing the determined sequence numbers of the load instructions and store instructions allocated in the current cycle into the ordering table entries correspondingly allocated by the reorder buffer, wherein each ordering table entry allocated by the reorder buffer carries the corresponding reorder buffer location information.
5. The method of claim 4, wherein obtaining, based on the constructed ordering table, the instruction distance between the current store instruction and the load instruction according to the reorder buffer location information of the current store instruction and of the earliest load instruction that has an SHL order violation with the current store instruction comprises:
obtaining the sequence numbers of the current store instruction and of the load instruction from the corresponding ordering table entries, according to the reorder buffer location information of the current store instruction and of the earliest load instruction that has an SHL order violation with the current store instruction;
and obtaining the instruction distance between the current store instruction and the load instruction based on the sequence numbers of the current store instruction and of the load instruction.
6. The method of claim 4, wherein updating the current synchronization table comprises:
each time a group of instructions finishes decoding, querying the trained LHS prediction table according to address information of a store instruction carried by the group of instructions, and determining whether the store instruction has an SHL order violation, so as to perform an entry updating operation on the current synchronization table when it is determined that the store instruction has an SHL order violation;
wherein the entry updating operation includes:
allocating an entry in the current synchronization table in the processor, writing the sequence number and age information corresponding to the current store instruction into the entry, and setting the entry state of the entry to the associated state and the valid state, so as to update the current synchronization table.
7. The method of claim 6, wherein, when it is determined that the load instruction has an SHL order violation, determining the store instruction on which the load instruction depends by querying the current synchronization table in the processor comprises:
querying the trained LHS prediction table according to the address information of the load instruction to obtain the corresponding instruction distance;
obtaining sequence number information based on the obtained instruction distance and the sequence number of the load instruction in the constructed ordering table;
matching, in the current synchronization table in the processor, an entry that contains the sequence number information, so as to determine the store instruction on which the load instruction depends.
8. The method of claim 7, wherein determining a store instruction on which the load instruction depends and establishing a dependency relationship between the load instruction and the store instruction comprises:
obtaining the entry state from the matched entry;
and, when the entry state is the associated state and the valid state, writing the age information obtained from the entry and the associated state from the entry state into an instruction scheduler in the processor, so as to establish the dependency relationship between the load instruction and the store instruction.
9. A processor, comprising:
a sequence violation judging module, configured to query a trained LHS prediction table according to address information of a load instruction in the current group of instructions that have completed decoding, and to determine whether the load instruction has an SHL order violation;
and a dependency relationship establishing module, connected to the sequence violation judging module and configured to, when it is determined that the load instruction has an SHL order violation, determine a store instruction on which the load instruction depends by querying a current synchronization table in the processor, and establish a dependency relationship between the load instruction and the store instruction, so that the load instruction is executed after the store instruction has finished executing or after the store instruction has completed correct address translation.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311769756.8A CN117785285A (en) 2023-12-20 2023-12-20 Method, processor and medium for establishing load and store instruction dependencies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311769756.8A CN117785285A (en) 2023-12-20 2023-12-20 Method, processor and medium for establishing load and store instruction dependencies

Publications (1)

Publication Number Publication Date
CN117785285A (en) 2024-03-29

Family

ID=90386243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311769756.8A Pending CN117785285A (en) 2023-12-20 2023-12-20 Method, processor and medium for establishing load and store instruction dependencies

Country Status (1)

Country Link
CN (1) CN117785285A (en)

Similar Documents

Publication Publication Date Title
KR101025354B1 (en) Global overflow method for virtualized transactional memory
KR101496063B1 (en) Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
KR102008733B1 (en) A load store buffer agnostic to threads implementing forwarding from different threads based on store seniority
CN101322103B (en) Unbounded transactional memory systems
TWI571799B (en) Apparatus, method, and machine readable medium for dynamically optimizing code utilizing adjustable transaction sizes based on hardware limitations
CN102483704B (en) There is the transactional memory system that efficient high-speed cache is supported
KR101825585B1 (en) Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
KR101996462B1 (en) A disambiguation-free out of order load store queue
US9280349B2 (en) Decode time instruction optimization for load reserve and store conditional sequences
US20030217251A1 (en) Prediction of load-store dependencies in a processing agent
KR102248470B1 (en) A semaphore method and system with out of order loads in a memory consistency model that constitutes loads reading from memory in order
KR101993562B1 (en) An instruction definition to implement load store reordering and optimization
KR20150027212A (en) A virtual load store queue having a dynamic dispatch window with a distributed structure
KR20150027209A (en) A virtual load store queue having a dynamic dispatch window with a unified structure
US9354888B2 (en) Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
KR20170066700A (en) A lock-based and synch-based method for out of order loads in a memory consistency model using shared memory resources
KR20150020246A (en) A method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US20140095838A1 (en) Physical Reference List for Tracking Physical Register Sharing
US6625725B1 (en) Speculative reuse of code regions
US10545765B2 (en) Multi-level history buffer for transaction memory in a microprocessor
US20140095814A1 (en) Memory Renaming Mechanism in Microarchitecture
CN117785285A (en) Method, processor and medium for establishing load and store instruction dependencies
KR101832574B1 (en) A method and system for filtering the stores to prevent all stores from having to snoop check against all words of a cache
CN117270971A (en) Load queue control method and device and processor
CN117806706A (en) Storage order violation processing method, storage order violation processing device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination