CN115756608A - Instruction execution method, shared cache, computer system and storage medium - Google Patents

Info

Publication number
CN115756608A
Authority
CN
China
Prior art keywords
instruction
specific instruction
execution
signal
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211448873.XA
Other languages
Chinese (zh)
Inventor
韩新辉
姚永斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eswin Computing Technology Co Ltd
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eswin Computing Technology Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202211448873.XA priority Critical patent/CN115756608A/en
Publication of CN115756608A publication Critical patent/CN115756608A/en
Pending legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

Embodiments of the present application provide an instruction execution method, a shared cache, a computer system and a storage medium, and relate to the field of computer technologies. The method includes: an exclusive monitor determines the execution mode of a specific instruction in response to the specific instruction from a processor core and a signal representing the execution state of the specific instruction; a state machine obtains exclusive permission for the shared cache in response to the execution mode of the specific instruction being speculative execution, and performs a write operation on the tag and data corresponding to the specific instruction in response to the execution mode being determined execution; and the exclusive monitor returns an instruction completion signal to the processor core after the specific instruction has finished executing. By speculatively executing the specific instruction, the embodiments obtain the exclusive permission for the shared cache in advance, which can shorten the commit time of the specific instruction and improve the performance of the central processing unit (CPU).

Description

Instruction execution method, shared cache, computer system and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an instruction execution method, a shared cache, a computer system, and a computer-readable storage medium.
Background
In the RISC-V instruction set, which follows reduced instruction set computer (RISC) principles, the Load-Reserved (LR) instruction and the Store-Conditional (SC) instruction are exclusive memory access instructions. Monitors are generally designed in the cache and in the double data rate synchronous dynamic random access memory (DDR) to implement the function of this instruction pair.
The SC instruction has a return value. In existing designs, the SC instruction is sent to the cache only after all instructions preceding it have been committed, so its return value is written to the physical register late. This delays later instructions that have a read-after-write (RAW) dependency on the SC instruction and in turn degrades the performance of the central processing unit (CPU).
Therefore, how to shorten the commit time of the SC instruction so as to improve CPU performance is a problem to be solved.
Disclosure of Invention
The application provides an instruction execution method, a shared cache, a computer system and a computer readable storage medium, which aim to solve at least one technical problem in the prior art.
According to a first aspect of embodiments of the present application, there is provided an instruction execution method, which is applied to a shared cache including an exclusive monitor and a state machine, the method including:
the exclusive monitor determines the execution mode of a specific instruction in response to the specific instruction from a processor core and a signal representing the execution state of the specific instruction, wherein the signal includes a first signal representing whether the specific instruction is determined to be executed and a second signal representing whether the specific instruction is cancelled;
the state machine obtains exclusive permission for the shared cache in response to the execution mode of the specific instruction being speculative execution, and performs a write operation on the tag and data corresponding to the specific instruction in response to the execution mode of the specific instruction being determined execution;
and the exclusive monitor returns an instruction completion signal to the processor core after the specific instruction has finished executing.
In one possible implementation, the exclusive monitor determines the execution state of the particular instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is determined to be executed.
In yet another possible implementation, the method further includes:
if the monitor receives a first signal representing that the execution state of the specific instruction is the speculative execution state before returning an instruction completion signal to the processor core, determining that the specific instruction is in the speculative execution state, and returning a third signal representing that the specific instruction is not completely executed to the processor core.
In another possible implementation manner, in response to that the execution state of the specific instruction is determined to be execution, the state machine executes a write operation for the tag and the data corresponding to the specific instruction, including:
and the state machine writes the data corresponding to the specific instruction into a storage unit of the shared cache in a bypass mode.
In another possible implementation, the method further includes:
upon receiving a first signal representing that the execution mode of the specific instruction is speculative execution and a second signal representing that the specific instruction is cancelled, the exclusive monitor determines that the specific instruction is to be re-executed by the processor core.
In another possible implementation, the method further includes:
in response to the execution mode of the specific instruction being determined execution, the state machine performs the write operation on the tag and data corresponding to the specific instruction, and writes the data corresponding to the specific instruction into the storage unit of the shared cache by queuing it in the data cache region.
In another possible implementation, the shared cache is a level two cache and the particular instruction is an SC instruction.
According to a second aspect of the embodiments of the present application, there is provided a shared cache, including: an exclusive monitor, and a state machine, wherein,
an exclusive monitor, which is used for responding to a specific instruction from a processor core and a signal for representing the execution state of the specific instruction, determining the execution mode of the specific instruction, and returning an instruction completion signal to the processor core after responding to the completion of the execution of the specific instruction,
wherein the signals comprise a first signal characterizing the execution mode of the specific instruction and a second signal characterizing whether the specific instruction is cancelled or not;
and the state machine is used for responding to the execution state of the specific instruction as speculative execution, determining to acquire exclusive permission of the shared cache, and responding to the execution state of the specific instruction as determined execution, and executing the write operation aiming at the tag and the data corresponding to the specific instruction.
In one possible implementation, the exclusive monitor determines the execution state of the particular instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is determined to be executed.
In another possible implementation manner, the exclusive monitor is further configured to determine that the specific instruction is in the speculative execution state if a first signal indicating that the execution state of the specific instruction is the speculative execution state is received before returning the instruction completion signal to the processor core, and return a third signal indicating that the specific instruction is not completely executed to the processor core.
In another possible implementation, the exclusive monitor is further configured to determine that the particular instruction is re-executed by the processor core upon receiving a first signal indicating that the particular instruction is being executed speculatively and a second signal indicating that the particular instruction is cancelled.
In another possible implementation manner, in response to that the execution manner of the specific instruction is determined to be execution, the state machine is configured to write data corresponding to the specific instruction into a storage unit of the shared cache in a bypass manner during execution of a write operation for a tag and data corresponding to the specific instruction.
In another possible implementation manner, in response to that the execution manner of the specific instruction is determined to be execution, the state machine is further configured to execute a write operation for the tag and the data corresponding to the specific instruction, and write the data corresponding to the specific instruction into a storage unit of the shared cache by queuing in a data cache region.
In another possible implementation, the shared cache is a level two cache and the particular instruction is an SC instruction.
According to a third aspect of embodiments of the present application, there is provided a computer system including:
a processor;
a memory coupled to the processor and having stored therein computer-executable instructions for, when executed by the processor, implementing the steps of the instruction execution method of the first aspect described above.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the instruction execution method of the first aspect described above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
The exclusive monitor determines the execution mode of a specific instruction in response to the specific instruction from a processor core and a signal representing its execution state, and returns an instruction completion signal to the processor core after the specific instruction has finished executing. Because the exclusive permission for the shared cache is obtained in advance by speculatively executing the specific instruction, the commit time of the specific instruction can be shortened when the specific instruction is not cancelled, which improves the performance of the central processing unit (CPU).
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic structural diagram of a shared cache according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for executing an instruction according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for executing instructions according to another embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for executing instructions according to another embodiment of the present application;
FIG. 5 is a flowchart illustrating an instruction execution method according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer system according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
LR/SC instructions in the RISC-V instruction set are implemented by designing an exclusive monitor in the cache. One of the simplest flows is as follows. The processor core (core) issues an LR instruction; when the LR instruction enters the cache, its address is recorded in the monitor and the data in the cache is returned to the core. If the value returned by the LR instruction equals the expected value, the core issues an SC instruction to modify the data at that address. When the SC instruction reaches the cache, the cache first checks whether the address is still marked in the monitor. If so, the SC instruction executes successfully, and the core thereby obtains exclusive access to the memory. Otherwise, the SC instruction fails, and the core re-issues the LR/SC instructions.
For example, when the return value of the LR instruction and the expected value are both 0, the core issues an SC instruction to modify the data to 1. When the SC instruction reaches the cache, if the address is still marked in the monitor, the address can be locked; otherwise, the core re-issues the LR/SC instructions.
If the return value of the LR instruction is not equal to the expected value, the core re-issues the LR/SC instructions until the return value and the expected value are equal.
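The basic LR/SC flow described above can be sketched in software as follows. This is only an illustrative model, not the hardware design of the application: the class `ToyCache`, its single-entry monitor, the method `snoop_write` and the helper `acquire_lock` are hypothetical names introduced for this example, and the return convention (0 for success, non-zero for failure) follows the RISC-V definition of SC.

```python
class ToyCache:
    """Toy cache with a single-entry exclusive monitor (illustrative only)."""

    def __init__(self):
        self.mem = {}          # address -> value
        self.reserved = None   # address currently marked in the monitor

    def lr(self, addr):
        # LR: record the address in the monitor and return the cached data.
        self.reserved = addr
        return self.mem.get(addr, 0)

    def sc(self, addr, value):
        # SC: succeeds only if the address is still marked in the monitor.
        if self.reserved != addr:
            return 1           # non-zero: failure, the core must retry LR/SC
        self.mem[addr] = value
        self.reserved = None
        return 0               # zero: success

    def snoop_write(self, addr, value):
        # A write from another core clears a matching mark in the monitor.
        self.mem[addr] = value
        if self.reserved == addr:
            self.reserved = None


def acquire_lock(cache, addr, max_tries=10):
    """Retry LR/SC until the lock word is changed from 0 to 1."""
    for _ in range(max_tries):
        if cache.lr(addr) != 0:      # returned value differs from expected 0
            continue
        if cache.sc(addr, 1) == 0:   # SC succeeded: the lock is now held
            return True
    return False
```

In this sketch a second call to `acquire_lock` on the same address keeps failing, because LR never returns the expected value 0 while the lock word is 1.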
The SC instruction has a return value. In existing designs, the SC instruction is sent to the cache only after all instructions preceding it have been committed, so its return value is written to the physical register late. This delays later instructions that have a read-after-write (RAW) dependency on the SC instruction and in turn degrades the performance of the central processing unit (CPU).
Therefore, how to shorten the commit time of the SC instruction so as to improve CPU performance is a problem to be solved.
In view of the foregoing technical problems in the prior art, embodiments of the present application provide an instruction execution method, an instruction execution device, and a computer-readable storage medium.
In the scheme of the present application, the SC instruction is sent to the cache in advance, without waiting for all preceding instructions to commit. If, before the SC instruction commits, it is not disturbed by a snoop request from another core and no interrupt or exception occurs, the result of executing the SC instruction early is correct. If, before the SC instruction commits, a snoop from another core to the same address arrives or an interrupt or exception occurs, a cancel (flush) signal is sent to the cache to indicate that the execution result of the current SC instruction is invalid, and the core then re-issues the SC instruction to the cache.
With this design, when the SC instruction is not flushed, its commit time can be brought forward considerably, which can improve CPU performance.
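The benefit can be illustrated with a rough timing model. The two functions below and every latency value passed to them are hypothetical and are not taken from the application; the model only assumes that acquiring exclusive permission can overlap with waiting for older instructions to commit, and it ignores the flush case.

```python
def sc_done_conventional(prev_commit, acquire, write):
    """Cycle at which SC completes when it is sent only after older
    instructions commit: acquisition and write follow in sequence."""
    return prev_commit + acquire + write


def sc_done_speculative(issue, prev_commit, acquire, write):
    """Cycle at which SC completes when it is sent early: permission
    acquisition starts at issue time and overlaps with the wait for
    older instructions; the write starts once both have finished."""
    return max(issue + acquire, prev_commit) + write


# Hypothetical numbers: SC issues at cycle 5, older instructions commit at
# cycle 20, acquiring exclusive permission takes 15 cycles, the write takes 2.
conventional = sc_done_conventional(prev_commit=20, acquire=15, write=2)
speculative = sc_done_speculative(issue=5, prev_commit=20, acquire=15, write=2)
print(conventional, speculative)
```

Under these assumed numbers the acquisition latency is hidden entirely behind the wait for older instructions.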
The related art of the embodiments of the present application is described first.
1. Cache
A Cache is a cache memory located between the CPU and the main memory DRAM (Dynamic Random Access Memory). It is small but fast, and is generally built from SRAM (Static Random Access Memory). Its function is to raise the rate at which the CPU reads and writes data. The general-purpose registers of the CPU are far faster than the main memory, so when the CPU accesses data directly from the main memory it must wait for some time. The Cache holds a portion of the data that the CPU has just used or uses repeatedly; if the CPU needs that data again, the data can be fetched directly from the Cache, which avoids repeated main-memory accesses, reduces the waiting time of the CPU and improves the efficiency of the system.
A multi-level Cache includes an L1 Cache (first-level cache), an L2 Cache (second-level cache) and an L3 Cache (third-level cache). The L1 Cache is mainly integrated in the CPU, while the L2 Cache is integrated on the mainboard or in the CPU. The L1 Cache comprises an L1I-Cache (first-level instruction cache), which stores instructions, and an L1D-Cache (first-level data cache), which stores data. The difference between the two is that data in the L1D-Cache can be written back, whereas data in the L1I-Cache is read-only.
2. RISC-V instruction set
RISC-V is an open-source instruction set architecture based on reduced instruction set computer (RISC) principles. Unlike most instruction sets, the RISC-V instruction set may be used without restriction, and its design makes it suitable for modern computing devices (e.g., warehouse-scale cloud computers, high-end mobile phones, and tiny embedded systems).
There are two special instructions in the RISC-V instruction set: LR (Load-Reserved) and SC (Store-Conditional), which are used to achieve synchronization between different processes. The LR instruction reads data, which may be one word or two words in size, from an address, stores the data in the destination register, and sets a reserved bit on that address range. The SC instruction writes data, which may likewise be one word or two words in size, to an address and checks whether a reserved bit is set on the accessed address range: if not, it returns a non-zero value to indicate failure; if so, it returns 0 to indicate success. If the SC instruction of a process succeeds, the process can go on to execute the subsequent program; if the SC instruction fails, the LR/SC instructions are executed again, and the subsequent program cannot be executed until the SC instruction succeeds. This is the mechanism by which different processes acquire a lock.
Next, the technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps and the like in different embodiments is not repeated.
Fig. 1 shows a shared cache according to an embodiment of the present application. As shown in fig. 1, the shared cache includes an exclusive monitor 301 and a state machine 302, wherein:
an exclusive monitor 301, configured to determine an execution mode of a specific instruction in response to the specific instruction from the processor core and a signal indicating an execution status of the specific instruction, and return an instruction completion signal to the processor core in response to the specific instruction being executed, where the signal includes a first signal indicating the execution mode of the specific instruction and a second signal indicating whether the specific instruction is cancelled.
and a state machine 302, configured to obtain exclusive permission for the shared cache in response to the execution state of the specific instruction, as determined by the exclusive monitor 301, being speculative execution, and to perform a write operation on the tag and data corresponding to the specific instruction in response to the execution state of the specific instruction being determined execution.
In one possible implementation, exclusive monitor 301 is further configured to determine that the particular instruction is to be re-executed by the processor core upon receiving a first signal indicating that the execution mode of the particular instruction is speculative and a second signal indicating that the particular instruction is cancelled.
In another possible implementation, the exclusive monitor 301 is further configured to determine that the specific instruction is in the speculative execution state if a first signal indicating that the execution state of the specific instruction is the speculative execution state is received before returning the instruction completion signal to the processor core, and return a third signal indicating that the specific instruction is not completely executed to the processor core.
In another possible implementation, in response to the execution mode of the specific instruction being determined execution, the state machine 302 writes the data corresponding to the specific instruction into the storage unit of the shared cache in a bypass manner while performing the write operation on the tag and data corresponding to the specific instruction.
In another possible implementation, the exclusive monitor 301 determines the execution state of a particular instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is determined to be executed.
In another possible implementation, the state machine 302 is further configured to, in response to that the execution mode of the specific instruction is determined to be executing, perform a write operation for the tag and the data corresponding to the specific instruction, and write the data corresponding to the specific instruction into a storage unit of the shared cache by queuing in the data cache region.
The apparatus in the embodiments of the present application can execute the instruction execution method provided in the embodiments of the present application, and its implementation principle and achievable effects are similar. The actions executed by each module of the apparatus correspond to the steps of the method in the embodiments of the present application. For a detailed functional description of each module, reference may be made to the description of the instruction execution method below, and details are not repeated here.
Fig. 2 is a flowchart illustrating an instruction execution method according to an embodiment of the present disclosure. The method is applied to a shared cache that includes an exclusive monitor and a state machine. The method shown in fig. 2 comprises:
S1, the exclusive monitor determines the execution mode of a specific instruction in response to the specific instruction from a processor core and a signal representing the execution state of the specific instruction, wherein the signal includes a first signal representing whether the specific instruction is determined to be executed and a second signal representing whether the specific instruction is cancelled.
S2, the state machine obtains exclusive permission for the shared cache in response to the execution mode of the specific instruction being speculative execution, and performs a write operation on the tag and data corresponding to the specific instruction in response to the execution mode of the specific instruction being determined execution.
S3, the exclusive monitor returns an instruction completion signal to the processor core after the specific instruction has finished executing.
Specifically, in this embodiment, if the execution mode of the specific instruction determined by the exclusive monitor is speculative execution, the state machine first obtains the exclusive permission for the shared cache. Then, when the execution mode of the specific instruction becomes determined execution, the state machine performs the write operation on the tag and data corresponding to the instruction, and after the instruction has finished executing, the exclusive monitor returns an instruction completion signal to the processor core.
In one possible implementation, the exclusive monitor determines the execution state of a particular instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining the execution state of the specific instruction to be determined to be executed.
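The two determination rules above amount to a small truth table over the first and second signals. The sketch below is illustrative only: the names `ExecState` and `classify` are hypothetical, and the 0/1 encoding (first signal high means determined execution, second signal high means cancelled) is the one this description gives for the SC embodiment.

```python
from enum import Enum


class ExecState(Enum):
    SPECULATIVE = "speculative execution"
    DETERMINED = "determined execution"
    CANCELLED = "cancelled"


def classify(first_signal: int, second_signal: int) -> ExecState:
    """Map the first and second signals to an execution state.

    Assumed encoding: first_signal 1 = determined, 0 = speculative;
    second_signal 1 = cancelled, 0 = not cancelled.
    """
    if second_signal == 1:
        # The text describes speculative + cancelled (re-execution by the
        # core); treating determined + cancelled the same way is an assumption.
        return ExecState.CANCELLED
    return ExecState.DETERMINED if first_signal == 1 else ExecState.SPECULATIVE
```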
In yet another possible implementation manner, the method may further include:
if the monitor receives a first signal representing that the execution state of the specific instruction is the speculative execution state before returning the instruction completion signal to the processor core, the monitor determines that the specific instruction is in the speculative execution state, and returns a third signal representing that the specific instruction is not completely executed to the processor core.
In another possible implementation manner, in response to that the execution state of the specific instruction is determined to be executed in step S2, the state machine executes a write operation for the tag and the data corresponding to the specific instruction, including: and the state machine writes the data corresponding to the specific instruction into the storage unit of the shared cache in a bypass mode.
Specifically, in this embodiment, if the execution mode of the specific instruction determined by the exclusive monitor is speculative execution, the state machine first obtains the exclusive permission for the shared cache. Then, when the execution mode of the specific instruction becomes determined execution, the state machine performs the write operation on the tag and data corresponding to the specific instruction, and writes the data corresponding to the specific instruction into the storage unit of the shared cache in a bypass manner, without queuing in the data cache region.
In another possible implementation manner, the method may further include:
the exclusive monitor determines that the particular instruction is re-executed by the processor core upon receiving a first signal indicating that the particular instruction is being executed speculatively and a second signal indicating that the particular instruction is being cancelled.
In another possible implementation manner, the method may further include:
and the state machine responds to the fact that the execution mode of the specific instruction is determined to be execution, executes the write operation of the tag and the data corresponding to the specific instruction, and writes the data corresponding to the specific instruction into the storage unit of the shared cache in a queuing mode in the data cache region.
Specifically, in this embodiment, if the execution mode of the specific instruction determined by the exclusive monitor is determined execution, the state machine performs the write operation on the tag and data corresponding to the instruction, and the data corresponding to the instruction is queued in the data cache region before being written into the storage unit of the shared cache.
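The difference between the bypass write and the queued write can be sketched as follows. The class `DataPath` and its methods are hypothetical names for illustration; the data cache region is modelled as a simple first-in first-out queue, and the sketch ignores ordering against older queued writes to the same address, which a real design would have to handle.

```python
from collections import deque


class DataPath:
    """Toy write path of a shared cache (illustrative only)."""

    def __init__(self):
        self.storage = {}       # storage unit: address -> data
        self.buffer = deque()   # data cache region: pending writes, oldest first

    def write_queued(self, addr, data):
        # Queued mode: the write waits behind earlier entries.
        self.buffer.append((addr, data))

    def write_bypass(self, addr, data):
        # Bypass mode: the write goes straight to the storage unit.
        self.storage[addr] = data

    def drain_one(self):
        # Retire the oldest queued write, if there is one.
        if self.buffer:
            addr, data = self.buffer.popleft()
            self.storage[addr] = data


dp = DataPath()
dp.write_queued(0x10, "a")   # still waiting in the queue
dp.write_bypass(0x20, "b")   # visible in the storage unit immediately
```

After these calls only address 0x20 is present in `dp.storage`; address 0x10 appears once `dp.drain_one()` is called.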
The following describes in detail a technical solution of an instruction execution method provided in an embodiment of the present application with reference to fig. 3 to 5, by taking a shared cache as a second-level cache L2cache and taking a specific instruction as an SC instruction as an example.
Fig. 3 is a flowchart illustrating an instruction execution method according to an embodiment of the present disclosure. The method may be applied to a secondary cache L2cache, the L2cache including an exclusive monitor and a state machine. The method shown in fig. 3 comprises:
S101, the exclusive monitor determines the execution mode of the SC instruction based on the SC instruction from the processor core and a signal representing the execution state of the SC instruction.
The signal includes a first signal representing whether the SC instruction is determined to be executed and a second signal representing whether the SC instruction is cancelled.
S102, if the state machine learns that the execution mode of the SC instruction is speculative execution, the state machine obtains the exclusive permission for the L2 cache.
S103, if the exclusive monitor receives a first signal representing that the SC instruction is determined to be executed and a second signal representing that the SC instruction is not cancelled, the exclusive monitor determines that the SC instruction is to be executed.
S104, if the state machine learns that the SC instruction is determined to be executed, the state machine performs the write operation on the tag and data corresponding to the SC instruction, and the exclusive monitor returns an instruction completion signal to the processor core after the SC instruction has finished executing.
In this embodiment, the exclusive monitor determines the execution mode of the SC instruction from the first signal, which indicates whether the SC instruction is determined to be executed, and the second signal, which indicates whether the SC instruction is cancelled. If the execution mode is speculative execution, the state machine, on learning this, determines to acquire the exclusive permission of the L2 cache. If the exclusive monitor subsequently receives a first signal indicating that the SC instruction is determined to be executed and a second signal indicating that the SC instruction is not cancelled, it determines to execute the SC instruction; the state machine then executes the write operation for the tag and the data corresponding to the SC instruction, and the exclusive monitor returns an instruction completion signal to the processor core after the SC instruction has been executed. Because the exclusive permission of the L2 cache is acquired in advance through speculative execution of the SC instruction, the commit time of the SC instruction can be shortened when the SC instruction is not cancelled, thereby improving CPU performance.
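As a non-limiting illustration of why acquiring the exclusive permission early shortens the work left at commit time, the behaviour of the state machine in S102 and S104 can be sketched as a toy software model. The class name L2StateMachineModel, the mode strings "speculative" and "determined", and the log entries are all assumptions made for this sketch; the embodiment describes hardware and does not prescribe any of them.

```python
class L2StateMachineModel:
    """Toy model of the L2 cache state machine in S102 and S104 (hypothetical)."""

    def __init__(self):
        self.has_exclusive = False  # exclusive permission for the SC target
        self.written = False        # tag and data written to the storage unit
        self.log = []               # ordered record of the actions performed

    def _acquire(self):
        # Acquire the exclusive permission only once.
        if not self.has_exclusive:
            self.has_exclusive = True
            self.log.append("acquire exclusive")

    def on_mode(self, mode):
        if mode == "speculative":
            self._acquire()                     # S102: done ahead of commit
        elif mode == "determined":
            self._acquire()                     # no-op if already held
            self.written = True                 # S104: write tag and data
            self.log.append("write tag+data")
        else:
            raise ValueError(f"unknown mode: {mode}")


early = L2StateMachineModel()
early.on_mode("speculative")             # before the SC instruction commits
steps_before_commit = len(early.log)
early.on_mode("determined")              # after the SC instruction commits
print(early.log[steps_before_commit:])   # prints ['write tag+data']
```

In this model, an SC instruction that arrives only after commit performs both actions at commit time, whereas one that arrived speculatively has only the write left to do.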
It should be noted that, in this embodiment, the first signal may be denoted as proceded. If proceded = 1, that is, the first signal is high, the SC instruction is determined to be executed; if proceded = 0, that is, the first signal is low, the SC instruction is speculatively executed. The second signal may be denoted as flush. If flush = 1, that is, the second signal is high, the SC instruction is cancelled; if flush = 0, that is, the second signal is low, the SC instruction has not been cancelled.
In some embodiments, step S101 may specifically include: if the exclusive monitor receives a first signal representing the speculatively executed SC instruction and a second signal representing that the SC instruction is not cancelled, determining that the execution mode of the SC instruction is speculatively executed; and if the exclusive monitor receives a first signal for representing that the SC instruction is determined to be executed and a second signal for representing that the SC instruction is not cancelled, determining that the execution mode of the SC instruction is determined to be executed.
Specifically, in this embodiment, proceded = 0 and flush = 0 indicate that the SC instruction is speculatively executed; proceded = 1 and flush = 0 indicate that the SC instruction has been committed in the processor core and is determined to be executed.
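The signal combinations described in this embodiment can be summarised, purely for illustration, as a small decoding function. The names ExecMode and classify are hypothetical. The combination proceded = 1, flush = 1 is not described in the text, so the sketch reports it as unspecified rather than guessing a behaviour.

```python
from enum import Enum


class ExecMode(Enum):
    SPECULATIVE = "speculative execution"
    DETERMINED = "determined execution"
    CANCELLED = "cancelled before commit"
    UNSPECIFIED = "not described in the text"


def classify(proceded: int, flush: int) -> ExecMode:
    """Decode the first signal (proceded) and the second signal (flush)."""
    if (proceded, flush) == (0, 0):
        return ExecMode.SPECULATIVE   # SC sent to the L2 cache ahead of commit
    if (proceded, flush) == (1, 0):
        return ExecMode.DETERMINED    # SC has committed in the processor core
    if (proceded, flush) == (0, 1):
        return ExecMode.CANCELLED     # core must re-execute the SC instruction
    return ExecMode.UNSPECIFIED       # (1, 1): assumption, left undefined here


for p, f in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(p, f, classify(p, f).name)
```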
A possible implementation manner is provided in the embodiment of the present application, as shown in fig. 4, after S102, the method further includes:
S105, if the exclusive monitor receives a first signal indicating that the SC instruction is speculatively executed and a second signal indicating that the SC instruction is cancelled, it determines that the SC instruction is to be re-executed by the processor core.
Specifically, in this embodiment, if the exclusive monitor determines that the execution mode of the SC instruction is speculative execution based on receiving a first signal indicating whether the SC instruction is determined to be executed and a second signal indicating whether the SC instruction is cancelled, and when the execution mode of the SC instruction obtained by the state machine is speculative execution, it is determined that the exclusive right of the L2cache is obtained. Then, if the exclusive monitor receives a first signal indicating that the SC instruction is speculatively executed and a second signal indicating that the SC instruction is cancelled, the processor core needs to re-execute the SC instruction.
In one possible implementation, the method further comprises:
S106 (not shown), the exclusive monitor returns an SC instruction feedback signal to the processor core.
Specifically, in this embodiment, the exclusive monitor determines the execution mode of the SC instruction from the first signal, which indicates whether the SC instruction is determined to be executed, and the second signal, which indicates whether the SC instruction is cancelled. If the execution mode is speculative execution, the state machine, on learning this, determines to acquire the exclusive permission of the L2 cache, and the exclusive monitor returns an SC instruction feedback signal to the processor core, for example resp_valid = 1, indicating that the SC instruction feedback information is valid. A fourth signal indicating whether the SC instruction executed successfully may also be returned. For example, the fourth signal may be denoted as fail: if the SC instruction executes successfully, fail = 0, that is, the fourth signal is low; if the SC instruction fails to execute, fail = 1, that is, the fourth signal is high.
It should be noted that when resp_valid = 1, complete may be 0 or 1, and fail may also be 0 or 1.
1. If fail = 1, the SC instruction has definitely failed, and the processor core needs to reissue the LR instruction and execute the SC instruction again.
2. If fail = 0 and complete = 0, the SC instruction has "not failed" so far, and the processor core may continue to "speculatively execute" subsequent instructions.
3. If fail = 0 and complete = 1, the SC instruction has executed successfully. For the processor core this result is final: the SC instruction has definitely succeeded.
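The three cases above can be read from the processor core's side as a simple decision on the returned signals. The following is a minimal sketch under that reading; the function name interpret_response and the returned strings are hypothetical, and the "wait" result for resp_valid = 0 is an assumption, since the text only discusses the case resp_valid = 1.

```python
def interpret_response(resp_valid: int, fail: int, complete: int) -> str:
    """Core-side reading of the SC feedback signals (illustrative only)."""
    if not resp_valid:
        return "wait"               # assumption: no valid feedback yet
    if fail:
        return "retry LR/SC"        # case 1: failure is final, whatever complete is
    if not complete:
        return "keep speculating"   # case 2: no failure so far, not yet final
    return "success"                # case 3: final success


print(interpret_response(1, 1, 0))
print(interpret_response(1, 0, 0))
print(interpret_response(1, 0, 1))
```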
And then, if the exclusive monitor receives a first signal for representing that the SC instruction determines to be executed and a second signal for representing that the SC instruction is not cancelled, determining to execute the SC instruction, executing the write operation of the tag and the data corresponding to the SC instruction when the state machine acquires that the SC instruction determines to be executed, and returning an instruction completion signal to the processor core by the exclusive monitor after the SC instruction is executed. Meanwhile, an SC instruction feedback signal and a fourth signal for representing whether the execution of the SC instruction is successful or not can be returned.
If the exclusive monitor receives a first signal indicating that the SC instruction is speculatively executed and a second signal indicating that the SC instruction is cancelled, indicating that the SC instruction is cancelled prior to commit, the processor core needs to re-execute the SC instruction. Meanwhile, an SC instruction feedback signal and a fourth signal representing whether the SC instruction is executed successfully or not can be returned to the processor core.
It should be noted that the conditions under which execution of the SC instruction fails are specified in the instruction set manual and, for brevity, are not repeated here.
The embodiment of the application provides a possible implementation manner, and the method further comprises the following steps:
S107 (not shown), if, before the exclusive monitor returns the SC instruction feedback signal to the processor core, the received signals indicate that the SC instruction is speculatively executed and has not been cancelled, the exclusive monitor determines that the SC instruction is in the speculative execution state and returns to the processor core a third signal indicating that the SC instruction has not completed execution.
Specifically, in this embodiment, the third signal may be denoted as complete. If complete = 1, that is, the third signal is high, the SC instruction has completed execution; if complete = 0, that is, the third signal is low, the SC instruction has not completed execution. The exclusive monitor determines the execution mode of the SC instruction from the first signal and the second signal; if the execution mode is speculative execution, the state machine, on learning this, determines to acquire the exclusive permission of the L2 cache. If proceded and flush both remain 0 until the exclusive monitor returns the SC instruction feedback signal to the processor core, the SC instruction has been in the speculative execution state throughout, and complete = 0 indicates that the SC instruction has not actually been executed.
The embodiment of the present application provides a possible implementation manner, as shown in fig. 5: in S104, in the process of executing the write operation for the tag and the data corresponding to the SC instruction, the data corresponding to the SC instruction is written into the storage unit of the L2 cache in a bypass manner.
Specifically, in this embodiment, if the SC instruction moves from speculative execution to determined execution, the SC data may not be ready during the speculative execution phase and may not be sent to the L2 cache until the SC instruction commits. In this case bypass must be 1, so that the SC data is written to the storage unit without queuing in the data buffer, thereby avoiding deadlock.
Another possible implementation manner is provided in the embodiment of the present application, and as shown in fig. 5, the method further includes:
S108, if the state machine learns that the execution mode of the SC instruction is determined execution, it determines to execute the write operation for the tag and the data corresponding to the SC instruction, and writes the data corresponding to the SC instruction into the storage unit of the L2 cache by queuing it in the data buffer.
Specifically, in this embodiment, if the exclusive monitor determines, from the first signal indicating whether the SC instruction is determined to be executed and the second signal indicating whether the SC instruction is cancelled, that the execution mode of the SC instruction is determined execution, the state machine executes the write operation for the tag and the data corresponding to the SC instruction, and writes the data corresponding to the SC instruction into the storage unit of the L2 cache by queuing it in the data buffer.
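The two write paths described above, bypass for an SC instruction that started speculatively and queuing in the data buffer for one that was determined from the start, can be contrasted with a small software model. This is only a sketch: the class L2DataPath, the helper sc_bypass, the use of a FIFO queue for the data buffer, and the dictionary standing in for the storage unit are assumptions for illustration, not the structure of the actual hardware.

```python
from collections import deque


class L2DataPath:
    """Toy model of the data buffer and storage unit (hypothetical)."""

    def __init__(self):
        self.data_buffer = deque()  # store data waiting in order
        self.storage = {}           # address -> data in the storage unit

    def write(self, addr, data, bypass):
        if bypass:
            self.storage[addr] = data              # skip the queue entirely
        else:
            self.data_buffer.append((addr, data))  # wait behind earlier data

    def drain_one(self):
        # Move the oldest queued entry into the storage unit.
        if self.data_buffer:
            addr, data = self.data_buffer.popleft()
            self.storage[addr] = data


def sc_bypass(started_speculatively: bool) -> int:
    """bypass = 1 only when the SC instruction began as speculative."""
    return 1 if started_speculatively else 0


path = L2DataPath()
path.write(0x40, "ordinary store data", bypass=0)
path.write(0x80, "SC data", bypass=sc_bypass(True))
print(sorted(path.storage))   # only the bypassed SC data has landed so far
path.drain_one()
print(sorted(path.storage))
```

In the model, the bypassed SC data reaches the storage unit without waiting behind the queued entry, which is the property the embodiment relies on to avoid deadlock.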
For example: an SC valid (indicating that the current request is an SC instruction) instruction reaches an exclusive monitor (exclusive monitor) in the L2cache, when proceded =1, flush =0, indicating that the SC instruction has committed in core, determining execution. The exclusive monitor returns respvalid =1, fail =1 if the SC instruction indicates a failure, otherwise fail =0. Since the SC instruction determines execution, complete =1 at this time. Because the SC instruction is determined to be executed, the SC data and the SC valid arrive in the L2cache at the same time, and at this time, the data is processed according to the common storage data (store data), and bypass =0 (indicating that no bypass is needed by the SC data at present), that is: bypass is not required when the SC instruction is not speculatively executing.
The above is the process that the SC instruction is not executed in advance, this scenario is the most ideal state, and the CPU pipeline is not blocked.
By adopting the scheme of the embodiment of the application, if other instructions are not successfully submitted before the SC instruction, the SC instruction arranged behind can be sent to the L2cache in advance without waiting for submission.
The specific flow of the speculative execution of the SC instruction is as follows:
1. An SC valid request reaches the exclusive monitor in the L2 cache with proceded = 0 and flush = 0, indicating that the SC instruction is speculatively executed.
2. The exclusive monitor returns resp_valid = 1, with fail = 1 if the SC instruction fails and fail = 0 otherwise. If proceded and flush both remain 0 until resp_valid is returned, the SC instruction has been in the speculative execution state throughout, and complete = 0 indicates that the SC instruction has not actually been executed.
3. If the exclusive monitor subsequently receives proceded = 1 and flush = 0, the SC instruction was not flushed before commit and is determined to be executed. The exclusive monitor then returns resp_valid = 1, fail = 0/1, complete = 1 again.
4. If the exclusive monitor receives proceded = 0 and flush = 1, the SC instruction was flushed before commit, and the core needs to re-execute LR/SC. The exclusive monitor returns resp_valid = 1, fail = 0/1, complete = 1.
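The four steps above can be replayed as a sequence of signal samples and the responses the exclusive monitor returns for each. The sketch below is illustrative only: the names monitor_response and run are hypothetical, the fail value is supplied by the caller because the failure conditions are defined by the instruction set manual rather than by this text, and the combination proceded = 1, flush = 1 is rejected because the text does not describe it.

```python
def monitor_response(proceded, flush, sc_fails):
    """Return (resp_valid, fail, complete) for one sample of the signals."""
    fail = 1 if sc_fails else 0
    if (proceded, flush) == (0, 0):
        return (1, fail, 0)   # step 2: still speculative, not complete
    if (proceded, flush) == (1, 0):
        return (1, fail, 1)   # step 3: committed, determined to be executed
    if (proceded, flush) == (0, 1):
        return (1, fail, 1)   # step 4: flushed before commit
    raise ValueError("(proceded, flush) = (1, 1) is not described in the text")


def run(trace, sc_fails=False):
    """Replay a list of (proceded, flush) samples through the model."""
    return [monitor_response(p, f, sc_fails) for p, f in trace]


committed = run([(0, 0), (1, 0)])   # steps 1-3: speculative, then commit
flushed = run([(0, 0), (0, 1)])     # steps 1, 2, 4: speculative, then flush
print(committed)
print(flushed)
```

As the text gives the same response values for steps 3 and 4, the two traces produce identical tuples in this model; the processor core distinguishes them because it knows whether it committed or flushed the SC instruction.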
In summary, in the scheme in the embodiment of the present application, the exclusive right of the L2cache is obtained in advance by using the SC instruction speculative execution method, and in a scenario where the SC instruction is not cancelled, the commit time of the SC instruction can be shortened, so that the performance of the CPU is improved.
An embodiment of the present application further provides a computer system, including:
a processor;
a memory coupled to the processor and storing computer-executable instructions which, when executed by the processor, implement the steps of the instruction execution method provided by the embodiments of the present application. Compared with the prior art, the following is achieved: the exclusive monitor, in response to a specific instruction from a processor core and a signal representing the execution state of the specific instruction, determines the execution mode of the specific instruction, and returns an instruction completion signal to the processor core after the specific instruction has been executed. Because the exclusive permission of the shared cache is acquired in advance through speculative execution of the specific instruction, the commit time of the specific instruction can be shortened when the specific instruction is not cancelled, thereby improving CPU performance.
In an embodiment of the present application, as shown in fig. 6, the computer system 400 includes: a processor 401 and a memory 403. Wherein the processor 401 is coupled to the memory 403, such as via a bus 402.
The processor 401 may be a CPU (Central Processing Unit), a general purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 401 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 402 may include a path that transfers information between the above components. The bus 402 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory 403 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and that can be read by a computer, without limitation.
The memory 403 is used for storing the computer program that implements the embodiments of the present application, and execution of the program is controlled by the processor 401. The processor 401 is adapted to execute the computer program stored in the memory 403 to implement the steps shown in the foregoing method embodiments.
The embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the steps of the foregoing method embodiments and corresponding content.
The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memories (NAND FLASH), solid state disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the protection scope of the embodiments of the present application without departing from the technical idea of the present application.

Claims (15)

1. A shared cache, the shared cache comprising:
an exclusive monitor, which is used for responding to a specific instruction from a processor core and a signal for representing the execution state of the specific instruction, determining the execution mode of the specific instruction, and returning an instruction completion signal to the processor core after responding to the completion of the execution of the specific instruction,
wherein the signals comprise a first signal characterizing the execution mode of the specific instruction and a second signal characterizing whether the specific instruction is cancelled or not;
and the state machine is used for responding to the execution state of the specific instruction as speculative execution, determining to acquire exclusive permission of the shared cache, and responding to the execution state of the specific instruction as determined execution, and executing the write operation aiming at the tag and the data corresponding to the specific instruction.
2. The shared cache of claim 1,
the exclusive monitor determines the execution state of the specific instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is determined to be executed.
3. The shared cache of claim 1 or 2,
the exclusive monitor is further configured to determine that the specific instruction is in a speculative execution state if a first signal indicating that an execution state of the specific instruction is speculative execution is received before returning an instruction completion signal to the processor core, and return a third signal indicating that the specific instruction is not completely executed to the processor core.
4. The shared cache of claim 1,
the exclusive monitor is further configured to determine that the particular instruction is re-executed by the processor core upon receiving a first signal characterizing an execution of the particular instruction as speculatively executed and a second signal characterizing the particular instruction is cancelled.
5. The shared cache of claim 1,
and the state machine is used for responding to the determined execution mode of the specific instruction, and writing the data corresponding to the specific instruction into the storage unit of the shared cache in a bypass mode in the process of executing the write operation aiming at the tag and the data corresponding to the specific instruction.
6. The shared cache of claim 1,
and the state machine is further configured to, in response to that the execution mode of the specific instruction is determined to be executed, execute a write operation for the tag and the data corresponding to the specific instruction, and write the data corresponding to the specific instruction into the storage unit of the shared cache in a queuing manner in the data cache region.
7. The shared cache of any one of claims 1-6,
the shared cache is a second level cache, and the specific instruction is an SC instruction.
8. An instruction execution method, wherein the method is applied to a shared cache, wherein the shared cache comprises an exclusive monitor and a state machine, and wherein the method comprises:
the exclusive monitor responds to a specific instruction from a processor core and a signal for representing the execution state of the specific instruction, and determines the execution mode of the specific instruction, wherein the signal comprises a first signal for representing whether the specific instruction is determined to be executed or not and a second signal for representing whether the specific instruction is cancelled or not;
the state machine responds that the execution mode of the specific instruction is speculative execution, determines to obtain the exclusive right of the shared cache, responds that the execution state of the specific instruction is determined execution, and executes the write operation aiming at the label and the data corresponding to the specific instruction;
and the exclusive monitor returns an instruction completion signal to the processor core after responding to the completion of the execution of the specific instruction.
9. The method of claim 8, wherein the exclusive monitor determines the execution state of the particular instruction by one of:
when a first signal for representing that the execution mode of the specific instruction is speculative execution and a second signal for representing that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is speculative execution;
and when a first signal for indicating that the execution mode of the specific instruction is determined to be executed and a second signal for indicating that the specific instruction is not cancelled are received, determining that the execution state of the specific instruction is determined to be executed.
10. The method according to claim 8 or 9, characterized in that the method further comprises:
if the monitor receives a first signal representing that the execution state of the specific instruction is the speculative execution state before returning an instruction completion signal to the processor core, determining that the specific instruction is in the speculative execution state, and returning a third signal representing that the specific instruction is not completely executed to the processor core.
11. The method of claim 8, wherein the executing, by the state machine, the write operation for the tag and the data corresponding to the specific instruction in response to the execution state of the specific instruction being the deterministic execution comprises:
and the state machine writes the data corresponding to the specific instruction into a storage unit of the shared cache in a bypass mode.
12. The method of claim 8, further comprising:
the exclusive monitor determines to re-execute the particular instruction by the processor core upon receiving a first signal characterizing an execution manner of the particular instruction as speculative execution and a second signal characterizing the particular instruction is cancelled.
13. The method of claim 8, further comprising:
and the state machine responds to the fact that the execution mode of the specific instruction is determined to be executed, executes the write operation of the tag and the data corresponding to the specific instruction, and writes the data corresponding to the specific instruction into the storage unit of the shared cache in a queuing mode in the data cache region.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the instruction execution method of any one of claims 8-13.
15. A computer system, comprising:
a processor;
a memory coupled to the processor and having stored therein computer-executable instructions for performing, when executed by the processor, the instruction execution method of any one of claims 8-13.
CN202211448873.XA 2022-11-18 2022-11-18 Instruction execution method, shared cache, computer system and storage medium Pending CN115756608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211448873.XA CN115756608A (en) 2022-11-18 2022-11-18 Instruction execution method, shared cache, computer system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211448873.XA CN115756608A (en) 2022-11-18 2022-11-18 Instruction execution method, shared cache, computer system and storage medium

Publications (1)

Publication Number Publication Date
CN115756608A true CN115756608A (en) 2023-03-07

Family

ID=85373589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211448873.XA Pending CN115756608A (en) 2022-11-18 2022-11-18 Instruction execution method, shared cache, computer system and storage medium

Country Status (1)

Country Link
CN (1) CN115756608A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627865A (en) * 2023-04-26 2023-08-22 安庆师范大学 Method and device for accessing computer with multiple storage devices
CN116627865B (en) * 2023-04-26 2024-02-06 安庆师范大学 Method and device for accessing computer with multiple storage devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination