WO2019200618A1

WO2019200618A1 - Instruction execution method and device

Info

Publication number: WO2019200618A1
Application number: PCT/CN2018/083991
Authority: WO
Inventors: 李国柱; 孙涛
Original assignee: 华为技术有限公司
Priority date: 2018-04-21
Filing date: 2018-04-21
Publication date: 2019-10-24
Also published as: CN111936968A

Abstract

An instruction execution method and device are provided for effectively defending against Cache delay side channel attacks based on speculative memory access. The method in the embodiments of the present application comprises: acquiring a first instruction, the first instruction being an instruction for reading a memory; determining each second instruction that satisfies a preset condition, the second instructions being branch instructions that are executed out of order, and the preset condition including having a control dependency relationship with the first instruction; and executing the first instruction after determining that each second instruction satisfying the preset condition has been resolved. Because the first instruction is executed only after the second instructions that satisfy the preset condition have been resolved, mis-speculated execution of the first instruction is avoided, and the change in Cache micro-architectural state that such execution would cause does not occur. An attacker therefore cannot steal data of a protected area by observing that change, and Cache delay side channel attacks based on speculative memory access are effectively defended against.

Description

Instruction execution method and device

Technical field

The present application relates to the field of communications, and in particular, to an instruction execution method and apparatus.

Background

At present, speculative out-of-order execution is the basic technique for exploiting parallelism within a processor. Correct execution of any instruction requires two things: first, its control dependency is satisfied, that is, the instruction is on the correct branch path; second, its data dependency is satisfied, that is, the source operands of the instruction have been correctly obtained. Speculative out-of-order execution means that, even if the control dependency of an instruction has not yet been resolved, the instruction is executed speculatively as soon as its data dependency is determined to be satisfied. Of course, if the speculation is wrong, the instruction may be on the wrong branch path. When the processor later detects the mis-speculation, the mis-speculatively executed instructions are squashed and execution restarts from the correct path, thereby preserving correct program semantics.

On a processor with speculative out-of-order execution, mis-speculatively executed instructions do not change any architecturally visible state defined by the architecture, but they do change micro-architectural state, such as the physical register file and the Cache, and these micro-architectural state changes are not undone when the mis-speculated instructions are squashed. In normal use of the system, micro-architectural state is neither used by nor visible to software, so the micro-architectural state changes caused by mis-speculated execution do not cause software to run incorrectly. However, with the development of hacking techniques, a method called the "Cache delay side channel attack" can accurately detect changes in Cache micro-architectural state, which has given rise to a highly threatening attack on existing processors: the Cache delay side channel attack based on speculative memory access. This kind of attack, of which the Spectre attack is an example, mainly exploits the speculative execution of memory access instructions. In principle it consists of two steps. In the first step, a branch misprediction scenario is constructed so that a Load instruction on the wrong branch path is speculatively executed and accesses data in a protected area. In the second step, that data is used to construct an address index with which the Cache is accessed, which changes the Cache micro-architectural state (for example, an access to the corresponding cache line changes from a miss to a hit); this change is then detected through the Cache delay side channel, and the protected data content is stolen. Processors that currently use speculative out-of-order execution are generally unable to defend against Cache delay side channel attacks based on speculative memory access.
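
The two steps can be illustrated with a toy model. The sketch below is not an attack and measures no real timing: the Cache is modelled as a Python set of line indices, a "hit" is modelled as set membership, and the names `SECRET`, `speculative_victim`, and `attacker_probe` are hypothetical and do not come from this application.

```python
# Toy model only: no real hardware, cache, or timing is involved.
SECRET = 42  # hypothetical protected value (one byte, 0..255)

def speculative_victim(cache, mispredicted):
    """Model a Load on a wrong branch path followed by a dependent access."""
    registers = {}
    # Step 1: the speculatively executed Load reads protected data.
    registers["r1"] = SECRET
    # Step 2: the data is used as an index, bringing one line into the cache.
    cache.add(registers["r1"])
    if mispredicted:
        # Squash: architecturally visible state is rolled back ...
        registers.clear()
        # ... but the cached line is NOT removed.
    return registers

def attacker_probe(cache, candidates=range(256)):
    """Stand-in for timing every candidate line; a fast access is a 'hit'."""
    return [line for line in candidates if line in cache]

cache = set()
regs = speculative_victim(cache, mispredicted=True)
leaked = attacker_probe(cache)
```

In this model the squash leaves `regs` empty, yet the single line that `attacker_probe` reports as cached equals the protected value, which is the residue the attack relies on.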

In the prior art, the above attacks are mainly defended against by the following two emergency mitigation schemes. In one scheme, system software such as an operating system (OS) explicitly clears the branch target buffer (BTB) on each process switch to prevent attacks, but this approach can greatly impair system performance. In the other scheme, specific sensitive code segments are recompiled to generate binary code with a branch barrier feature. Because no compiler has yet been released that automatically identifies sensitive code, the programmer needs to modify the source code to explicitly mark the sensitive code segments, and it is difficult to ensure that all sensitive code segments in all programs are identified and recompiled.

In summary, the emergency mitigation schemes adopted in the prior art suffer from two problems, incomplete defense and performance loss, and cannot effectively defend against Cache delay side channel attacks based on speculative memory access.

Summary of the invention

The embodiments of the present application provide an instruction execution method and apparatus for effectively defending against Cache delay side channel attacks based on speculative memory access.

A first aspect provides an instruction execution method, in which an instruction execution device acquires a first instruction and determines each second instruction that satisfies a preset condition, where the first instruction is an instruction to read a memory and the second instruction is a branch instruction executed out of order. After determining that each second instruction satisfying the preset condition has been resolved, the device executes the first instruction. Because the first instruction is executed only after the second instructions satisfying the preset condition have been resolved, and is therefore not executed speculatively, the change in Cache micro-architectural state caused by mis-speculated execution of the first instruction does not occur. An attacker thus cannot steal data of the protected area through a change in Cache micro-architectural state, so Cache delay side channel attacks based on speculative memory access can be effectively defended against.

In a possible design, different levels of defense requirements can be met by setting different preset conditions. Optionally, the preset condition includes having a control dependency relationship of a preset type with the first instruction. In this case, the first instruction can be executed once each second instruction that has a control dependency of the preset type with the first instruction has been resolved, without waiting for every second instruction that has a control dependency with the first instruction to be resolved. To meet a higher level of defense requirements and further avoid mis-speculated execution of the first instruction, optionally, the preset condition is having a control dependency relationship with the first instruction. In this case, the first instruction is executed only after every second instruction that has a control dependency relationship with the first instruction has been resolved.

In a possible design, before determining that each second instruction satisfying the preset condition has been resolved, the method further includes: determining a value of a first variable according to the sequence number of each second instruction that satisfies the preset condition; and determining a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable. Further, determining that each second instruction satisfying the preset condition has been resolved comprises: if the value of the first variable is less than the value of the second variable, determining that each second instruction satisfying the preset condition has been resolved. If the value of the first variable is greater than or equal to the value of the second variable, it is determined that the second instructions satisfying the preset condition have not all been resolved. In this way, before the first instruction is executed, a small amount of decision logic based on the values of the first variable and the second variable explicitly determines whether the first instruction carries a control dependency risk, ensuring that the first instruction is executed only when no such risk exists, and thereby effectively defending against Cache delay side channel attacks based on speculative memory access.

In a possible design, determining the value of the first variable according to the sequence number of each second instruction that satisfies the preset condition includes: determining the sequence number of the second instruction having the largest sequence number among the second instructions satisfying the preset condition, and using it as the value of the first variable. Determining the value of the second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable, includes: determining the sequence number of the unresolved second instruction having the smallest sequence number among the second instructions satisfying the preset condition, and using it as the value of the second variable; or, if it is determined that no unresolved second instruction exists among the second instructions satisfying the preset condition, using the value of the first variable plus one as the value of the second variable. In this way, by comparing the value of the first variable with the value of the second variable, it can be accurately determined whether each second instruction satisfying the preset condition has been resolved, and the first instruction is then executed after they have all been resolved, which implements effective defense against Cache delay side channel attacks based on speculative memory access.
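
The comparison described above can be sketched in a few lines. This is a minimal software illustration, not the hardware logic of the application; the function names are hypothetical, sequence numbers are assumed to be integers, and at least one second instruction is assumed to satisfy the preset condition.

```python
def first_variable(branch_seqs):
    """Largest sequence number among the tracked second (branch) instructions."""
    return max(branch_seqs)

def second_variable(branch_seqs, resolved):
    """Smallest unresolved sequence number, or first variable + 1 if none is left."""
    unresolved = [seq for seq in branch_seqs if seq not in resolved]
    if unresolved:
        return min(unresolved)
    return first_variable(branch_seqs) + 1

def all_resolved(branch_seqs, resolved):
    """The first instruction may execute only when first variable < second variable."""
    return first_variable(branch_seqs) < second_variable(branch_seqs, resolved)
```

For example, with tracked sequence numbers 3, 5, and 8, of which only 3 and 8 have been resolved, the first variable is 8 and the second variable is 5, so the first instruction must wait. Once 5 is also resolved, the second variable becomes 9 and the comparison succeeds.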

In a possible design, using the sequence number of the unresolved second instruction having the smallest sequence number among the second instructions satisfying the preset condition as the value of the second variable includes: after that unresolved second instruction having the smallest sequence number has been resolved, using the sequence number of the next unresolved second instruction after it as the value of the second variable. In this way, the value of the second variable is dynamically updated so that it always indicates the unresolved second instruction having the smallest sequence number among the second instructions satisfying the preset condition, which ensures the accuracy of determining whether each second instruction satisfying the preset condition has been resolved.
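
The dynamic update can be sketched as a small tracker that advances the second variable each time a branch resolves. The class name `BranchTracker` and its methods are hypothetical, and the sketch assumes a non-empty set of tracked branches whose sequence numbers are integers.

```python
class BranchTracker:
    """Tracks the second (branch) instructions guarding one first instruction."""

    def __init__(self, branch_seqs):
        self.pending = sorted(branch_seqs)  # unresolved branches, ascending
        self.first = self.pending[-1]       # first variable: largest sequence number
        self.second = self.pending[0]       # second variable: smallest unresolved

    def resolve(self, seq):
        """Mark one branch as resolved; branches may resolve in any order."""
        self.pending.remove(seq)
        # Advance to the next unresolved branch, or past the last one if none remain.
        self.second = self.pending[0] if self.pending else self.first + 1

    def load_may_execute(self):
        return self.first < self.second
```

Resolving a branch that is not the smallest unresolved one leaves the second variable unchanged, so the comparison keeps failing until the last tracked branch has been resolved.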

In a second aspect, an instruction execution apparatus is provided, including a fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor. The fetcher is configured to fetch an instruction from an instruction cache. The decoder is configured to decode the instruction to obtain a decoding result of the instruction, where the decoding result includes an instruction type. The pre-execution buffer is configured to store the instruction and the decoding result of the instruction. The scheduler is configured to: if a first instruction is obtained from the pre-execution buffer, determine each second instruction that satisfies a preset condition, and send the first instruction to the executor after determining that each second instruction satisfying the preset condition has been resolved, where the first instruction is an instruction to read a memory and the second instruction is a branch instruction executed out of order; and, if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor. The executor is configured to execute the first instruction and the second instruction.

The instruction execution device can be at least one processing element or chip.

In a possible design, the preset condition includes a control dependency relationship with the first instruction.

In a possible design, the preset condition includes a preset type of control dependency with the first instruction.

In a possible design, the scheduler is further configured to: determine a value of a first variable according to the sequence number of each second instruction that satisfies the preset condition; determine a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable; and, if the value of the first variable is less than the value of the second variable, determine that each second instruction satisfying the preset condition has been resolved.

In a possible design, the scheduler is specifically configured to: determine the sequence number of the second instruction having the largest sequence number among the second instructions satisfying the preset condition, and use it as the value of the first variable; and determine the sequence number of the unresolved second instruction having the smallest sequence number among the second instructions satisfying the preset condition, and use it as the value of the second variable; or, if it is determined that no unresolved second instruction exists among the second instructions satisfying the preset condition, use the value of the first variable plus one as the value of the second variable.

In a possible design, the scheduler is specifically configured to: after the unresolved second instruction having the smallest sequence number has been resolved, use the sequence number of the next unresolved second instruction after it as the value of the second variable.

In a third aspect, a chip is provided. The chip is coupled to a memory and is configured to read and execute a software program stored in the memory, so as to implement the method according to the first aspect or any of the possible designs above.

In a fourth aspect, a readable storage medium is provided. The readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect or any of the possible designs above.

Brief description of the drawings

FIG. 1 is a schematic diagram of a system architecture of a network device according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of a possible processor according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a possible scheduler according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of a method for executing an instruction according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described in the following with reference to the accompanying drawings in the embodiments.

The present application provides an instruction execution method, which can be applied to the system architecture of a network device shown in FIG. 1. FIG. 1 shows a system architecture of a network device provided by an embodiment of the present application. As shown in FIG. 1, the system architecture 100 includes a memory 110, a processor 120, and a communication interface 130, where the memory 110, the processor 120, and the communication interface 130 are connected to each other.

The memory 110 may include a volatile memory such as a random-access memory (RAM); the memory 110 may also include a non-volatile memory such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 110 may also include a combination of the above types of memories.

The processor 120 can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 120 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

The communication interface 130 can be a wired communication interface, a wireless communication interface, or a combination thereof. The wired communication interface can be, for example, an Ethernet interface, and the Ethernet interface can be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface can be a WLAN interface.

Based on the system architecture shown in FIG. 1 above, the instruction execution method provided by the present application may be implemented by some components in the processor 120.

In one possible design, the processor 120 shown in FIG. 1 above may include various components. Based on the system architecture shown in FIG. 1, FIG. 2 is a schematic structural diagram of a possible processor provided by an embodiment of the present application. As shown in FIG. 2, the processor 120 includes an instruction cache (Icache) 210, a pre-execution buffer 220, a fetcher 230, a decoder 240, a scheduler 250, an executor 260, and an intermediate register 270. The intermediate register 270 is used to store the results of speculatively executed instructions.

The instruction cache 210 stores various types of instructions, such as branch instructions, memory access instructions, and other types of instructions. The fetcher 230 fetches the instructions in order from the instruction cache 210 and passes them to the decoder 240. The decoder 240 obtains the type of each instruction and then stores the instruction and its decoding information in the pre-execution buffer 220. For example, the decoder 240 decodes four instructions acquired in order, and the decoding results are: the first is a branch instruction, the second is a memory access instruction, the third is a branch instruction, and the fourth is a memory access instruction.

The scheduler 250 can retrieve the instructions and their types from the pre-execution buffer 220. After retrieving an instruction, the scheduler 250 sends it to the executor 260, which executes it. As shown in FIG. 2, the executor 260 includes other pipelines 261 and a memory accessor 262. The memory accessor 262 includes a memory access instruction queue 263. In a specific application, memory access instructions are of two types, load instructions and store instructions, so the memory access instruction queue may be referred to as a Load/Store queue.

Specifically, if the instruction acquired by the scheduler 250 is a branch instruction, the branch instruction is sent to the other pipelines 261 for execution. If the instruction acquired by the scheduler 250 is a memory access instruction, the memory access instruction is sent to the memory accessor 262 for execution.

Instructions have both control dependencies and data dependencies. A memory access instruction is not executed if its data dependencies are not satisfied, so this application assumes by default that the data dependencies are satisfied and considers only control dependencies. If the control dependency of a memory access instruction is unresolved, a Cache delay side channel attack based on speculative memory access may occur during speculative out-of-order execution. To avoid such an attack, in the embodiments provided by the present application, if the instruction acquired by the scheduler 250 is a memory access instruction, the memory access instruction is sent to the memory accessor 262 for execution only after the branch instructions with which it has a control dependency have been resolved. Optionally, if an unresolved branch instruction exists among the branch instructions that have a control dependency with the memory access instruction, the memory access instruction is first held in the pre-execution buffer 220; once every branch instruction that has a control dependency with the memory access instruction has been resolved, the memory access instruction is dispatched from the pre-execution buffer 220 and sent to the memory accessor 262 for execution.

After receiving a memory access instruction, the memory accessor 262 may store it in the Load/Store queue and execute the memory access instructions out of order once both their data dependencies and their control dependencies are satisfied.

For an instruction, the processing performed by the fetcher 230, the decoder 240, the scheduler 250, the executor 260, and so on forms a pipeline comprising multiple pipeline stages, including fetch, decode, dispatch, and execute stages. The fetcher 230 corresponds to the fetch stage, the decoder 240 to the decode stage, the scheduler 250 to the dispatch stage, and the executor 260 to the execute stage. Optionally, the pipeline may also include one or more further stages (not shown), such as rename, issue, write-back, and commit.

Based on the processor shown in FIG. 2 above, in the embodiments of the present application the instruction execution method is executed by the scheduler 250 in the processor 120. FIG. 3 is a schematic structural diagram of a scheduler provided by an embodiment of the present application.

As shown in FIG. 3, the scheduler 250 includes an acquisition module 310, a determination module 320, and an execution module 330, where:

The acquisition module 310 is configured to obtain an instruction from the pre-execution buffer 220. Optionally, the acquired instructions include branch instructions, memory access instructions, and other types of instructions.

The determination module 320 is configured to determine, after the acquisition module 310 acquires the first instruction, each second instruction that satisfies the preset condition. The first instruction is an instruction to read the memory, and the second instruction is a branch instruction executed out of order.

The execution module 330 is configured to execute the first instruction after the scheduler 250 determines that each second instruction satisfying the preset condition has been resolved.

With reference to FIG. 1, FIG. 2 and FIG. 3 above, the instruction execution method provided by the present application is specifically described below.

Based on the above description, FIG. 4 exemplarily shows a flow of an instruction execution method provided by the present application.

As shown in FIG. 4, the process specifically includes:

In step 401, the scheduler 250 acquires a first instruction, and the first instruction is an instruction to read the memory.

In the embodiments of the present application, the acquisition module 310 in the scheduler 250 acquires the first instruction. The first instruction may also be referred to as a load instruction, that is, a Load instruction.

Step 402, the scheduler 250 determines each second instruction that satisfies the preset condition, and the second instruction is a branch instruction that is executed out of order.

The scheduler 250 dispatches each second instruction to the other pipelines 261 for out-of-order execution, and while the second instructions are executing, the scheduler 250 continues to dispatch subsequent instructions. The second instructions in the other pipelines 261 can be executed out of order and need not be executed in instruction fetch order. For example, three second instructions are, in fetch and decode order, second instruction 1, second instruction 2, and second instruction 3. In the other pipelines 261, if second instruction 1 has not yet been executed, second instruction 2 and second instruction 3 may be executed first.

Step 403, after the scheduler 250 determines that each second instruction satisfying the preset condition has been resolved, the first instruction is executed.

When the first instruction arrives at the scheduler 250, the determination module 320 in the scheduler 250 determines whether each second instruction satisfying the preset condition has been resolved. If so, the first instruction is executed; here, executing means that the execution module 330 in the scheduler 250 sends the first instruction to the memory accessor 262. If not, that is, an unresolved second instruction exists, the first instruction is executed only after each second instruction satisfying the preset condition has been resolved.

For any one first instruction, each second instruction satisfying the preset condition must be resolved before that first instruction is executed. If several first instructions need to be executed, they may be executed out of order. For example, a first instruction A, a first instruction B, and a first instruction C arrive at the scheduler 250 in sequence. After the first instruction A has been sent to the memory accessor 262, if the second instructions satisfying the preset condition for the first instruction B have not yet been resolved while those for the first instruction C have been resolved, the first instruction C is executed first, and the first instruction B is executed after its second instructions have been resolved.
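
The out-of-order release of several first instructions can be sketched as follows. The function `issue_ready` and the branch sequence numbers 10 and 12 are hypothetical; each waiting load is simply paired with the set of branch sequence numbers it must wait for, which is an assumption about how the dependencies would be represented.

```python
def issue_ready(waiting, resolved):
    """Remove and return the loads whose guarding branches are all resolved.

    waiting:  dict mapping a load name to the set of branch sequence numbers
              that satisfy the preset condition for that load
    resolved: set of branch sequence numbers resolved so far
    """
    ready = [name for name, guards in waiting.items() if guards <= resolved]
    for name in ready:
        del waiting[name]
    return ready

# A has no unresolved guard, B waits for branch 10, C waits for branch 12.
waiting = {"A": set(), "B": {10}, "C": {12}}
order = []
order += issue_ready(waiting, set())      # A goes first
order += issue_ready(waiting, {12})       # branch 12 resolves before branch 10
order += issue_ready(waiting, {10, 12})   # branch 10 resolves last
```

With these inputs the loads leave the scheduler in the order A, C, B, matching the example in the text.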

According to the solution provided by the foregoing embodiment, because the first instruction is executed only after the second instructions satisfying the preset condition have been resolved, and is therefore not executed speculatively, the change in Cache micro-architectural state caused by mis-speculated execution of the first instruction does not occur. An attacker therefore cannot steal data of the protected area through a change in Cache micro-architectural state, so Cache delay side channel attacks based on speculative memory access can be effectively defended against.

In an optional implementation manner, the preset condition includes having a control dependency relationship with the first instruction. That is, the first instruction is executed after the scheduler determines that every second instruction having a control dependency relationship with the first instruction has been resolved.

It should be noted that executing the first instruction requires not only that its control dependencies have been resolved but also that its data dependencies have been resolved. If the data dependency of the first instruction is unresolved, the first instruction cannot be executed. Therefore, the embodiments of the present application assume by default that, when the first instruction is executed, the condition that its data dependencies have been resolved is already satisfied.

In conjunction with FIG. 2 and the above-described embodiments, a specific example of an instruction execution process is provided below.

For example, the instruction cache includes four instructions, which are instruction 1, instruction 2, instruction 3, and instruction 4.

First, the above four instructions are sequentially taken out from the instruction cache. The decoding result is: instruction 1 is a branch instruction, instruction 2 is a load instruction, instruction 3 is a branch instruction, and instruction 4 is a load instruction.

Then, the above four instructions enter the scheduler 250 in sequence. When the scheduler 250 receives instruction 1, it sends instruction 1 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 2, it finds that instruction 1 has already been resolved, that is, instruction 1 has been executed in the other pipelines 261 and its execution result has been obtained, so instruction 2 is sent to the memory accessor 262. When the scheduler 250 receives instruction 3, it sends instruction 3 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 4, it finds that instruction 3 is unresolved, that is, the control dependency of instruction 4 is unresolved at this time. The following describes how instruction 4 is handled in the solution of this application, compared with the solution adopted in the prior art:

In the prior-art solution, although the control dependency of instruction 4 is unresolved at this time, the scheduler 250 sends instruction 4 directly to the memory accessor 262. If instruction 4 turns out to have been mis-speculated, the micro-architectural state of the Cache has been changed, and an attacker can discover internal data through the changed micro-architectural state, resulting in internal data leakage and a data security risk.

In the solution of the present application, because the control dependency of instruction 4 is unresolved at this time, the scheduler 250 holds back instruction 4. For example, instruction 4 can be kept in the pre-execution buffer 220 until instruction 3 has been resolved, after which the scheduler 250 sends instruction 4 to the memory accessor 262 for execution. While instruction 4 is held back, the execution of other instructions after instruction 4 is not affected. Compared with the prior-art solution, the solution of the present application can greatly improve data security. Although instruction execution efficiency may be lower than with the prior-art speculative execution scheme, this performance loss is much smaller than that of the defense schemes described in the Background.
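
The four-instruction example can be replayed with a small cycle-level sketch. The cycle numbers are invented for illustration, one instruction is assumed to reach the scheduler per cycle, and every earlier branch is assumed to be a control dependency of a later load; none of these details come from the application, and `load_issue_cycles` is a hypothetical name.

```python
def load_issue_cycles(program, resolve_cycle):
    """Return {load_seq: cycle at which the load is sent to the memory accessor}.

    program:       list of (sequence number, kind) in fetch order,
                   where kind is "branch" or "load"
    resolve_cycle: {branch_seq: cycle by which that branch is resolved}
    """
    issue = {}
    for arrival, (seq, kind) in enumerate(program):
        if kind != "load":
            continue
        # Assumption: every earlier branch guards this load.
        earlier = [s for s, k in program[:arrival] if k == "branch"]
        # The load waits until the slowest guarding branch has resolved.
        ready = max([resolve_cycle[s] for s in earlier], default=arrival)
        issue[seq] = max(arrival, ready)
    return issue

program = [(1, "branch"), (2, "load"), (3, "branch"), (4, "load")]
# Hypothetical timing: branch 1 resolves quickly, branch 3 resolves late.
cycles = load_issue_cycles(program, {1: 1, 3: 6})
```

Under these assumed timings, instruction 2 is sent on the cycle it arrives because instruction 1 is already resolved, while instruction 4 arrives at cycle 3 and is held until cycle 6, when instruction 3 resolves.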

Different levels of defense requirements can be met by setting different preset conditions.

In an optional implementation manner, the preset condition includes having a control dependency relationship of a preset type with the first instruction. The preset type may be a control dependency of a direct conditional branch jump, a control dependency of an indirect conditional branch jump, or another type of control dependency.

For example, only the control dependency of the direct conditional branch jump is tracked. That is, once the scheduler 250 determines that each branch instruction having a direct-conditional-branch-jump control dependency with the first instruction has been resolved, the first instruction can be executed, without waiting for all branch instructions that have a control dependency with the first instruction to be resolved. In this case, variant 1 of the Spectre attack ("bounds check bypass") can be defended against.

As another example, only the control dependency of the indirect conditional branch jump is tracked. That is, when the scheduler 250 receives a second instruction and determines that it satisfies the preset condition, in other words that it is a branch instruction having an indirect-conditional-branch-jump control dependency with the first instruction, the scheduler 250 tracks whether each such branch instruction has been resolved. Once the scheduler 250 determines that each branch instruction having an indirect-conditional-branch-jump control dependency with the first instruction has been resolved, the first instruction can be executed, without waiting for all branch instructions that have a control dependency with the first instruction to be resolved. In this case, variant 2 of the Spectre attack ("branch target injection") can be defended against.

To meet a higher level of defense requirements and further avoid mis-speculated execution of the first instruction, the preset condition is optionally that a control dependency relationship exists with the first instruction. In this case, the first instruction can be executed only after all second instructions having a control dependency relationship with the first instruction have been resolved. Compared with the previous implementation manner, this embodiment must determine whether all second instructions having a control dependency relationship with the first instruction have been resolved, so more second instructions need to be tracked than when only second instructions of the preset type are tracked, and the defense is therefore stronger.

In the embodiment of the present application, an optional implementation of the method of FIG. 4 is to set a load filter (Load Filter) in the scheduler 250 and use the filtering function of the Load Filter to defend against the Cache delay side channel attack based on speculative memory access. Specifically, when different defense levels need to be set, a filtering function corresponding to each defense level can be configured. For example, the defense level can be adjusted by reducing the types of branch instructions tracked by the Load Filter. Of course, if defense against the Cache delay side channel attack based on speculative memory access is not required, the filtering function of the Load Filter can be turned off, so that the second instructions on which the first instruction depends are not tracked and the Load Filter has no logical impact on the pipeline.
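The relationship between defense levels and tracked branch types can be illustrated with a minimal software sketch. The names `BranchKind`, `TRACKED_KINDS`, `is_tracked`, and the level labels are hypothetical and do not appear in the application; the sketch only assumes that each defense level corresponds to a set of branch types that the Load Filter tracks, with an empty set meaning the filter is turned off.

```python
from enum import Enum, auto


class BranchKind(Enum):
    """Branch types named in the description (identifiers are illustrative)."""
    DIRECT_CONDITIONAL = auto()
    INDIRECT_CONDITIONAL = auto()
    OTHER = auto()


# Hypothetical defense levels: each maps to the branch kinds the filter tracks.
TRACKED_KINDS = {
    "off": frozenset(),                                         # filter disabled
    "direct_only": frozenset({BranchKind.DIRECT_CONDITIONAL}),  # Spectre variant 1
    "indirect_only": frozenset({BranchKind.INDIRECT_CONDITIONAL}),  # Spectre variant 2
    "all": frozenset(BranchKind),                               # every control dependency
}


def is_tracked(kind: BranchKind, level: str) -> bool:
    """Return True if a branch of this kind must be resolved before a load issues."""
    return kind in TRACKED_KINDS[level]


print(is_tracked(BranchKind.DIRECT_CONDITIONAL, "direct_only"))    # True
print(is_tracked(BranchKind.INDIRECT_CONDITIONAL, "direct_only"))  # False
```

Tracking fewer branch kinds lowers the defense level but lets more loads issue without waiting, which is the trade-off the paragraph above describes.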

Optionally, based on step 403 in the foregoing embodiment, before determining that each second instruction satisfying the preset condition has been resolved, the scheduler 250 determines the value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition, and determines the value of a second variable according to either the sequence number of an unresolved second instruction among them or the value of the first variable. The scheduler 250 then determines, based on the values of the first variable and the second variable, whether each second instruction satisfying the preset condition has been resolved. Specifically, if the value of the first variable is smaller than the value of the second variable, the scheduler 250 determines that every second instruction satisfying the preset condition has been resolved; if the value of the first variable is not less than the value of the second variable, it determines that not every such second instruction has been resolved. Optionally, the sequence number is determined according to the decoding order of the second instructions that satisfy the preset condition.

With this optional embodiment, before executing the first instruction, the scheduler 250 can explicitly determine whether the first instruction has a control dependency risk using only the values of the first and second variables and a small amount of decision logic. The first instruction is then executed only when it has no control dependency risk, which effectively defends against the Cache delay side channel attack based on speculative memory access.

Optionally, the scheduler 250 determining the value of the first variable according to the sequence numbers of the second instructions that meet the preset condition specifically includes: the scheduler 250 takes the largest sequence number among the second instructions that meet the preset condition as the value of the first variable. The scheduler 250 determining the value of the second variable according to the sequence number of an unresolved second instruction or the value of the first variable includes either of the following two manners:

Manner 1: the scheduler 250 takes the sequence number of the unresolved second instruction with the smallest sequence number, among the second instructions satisfying the preset condition, as the value of the second variable.

Manner 2: if the scheduler 250 determines that there is no unresolved second instruction among the second instructions satisfying the preset condition, it takes the value of the first variable plus one as the value of the second variable.

If there is an unresolved second instruction among the second instructions that meet the preset condition, the determining module 320 in the scheduler 250 determines the value of the second variable according to manner 1; if there is none, the determining module 320 determines the value of the second variable according to manner 2. In this way, by comparing the value of the first variable with the value of the second variable, the determining module 320 in the scheduler 250 can accurately determine whether all second instructions satisfying the preset condition have been resolved, and the first instruction is executed only after they have been, which provides an effective defense against the Cache delay side channel attack based on speculative memory access.
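The two-variable comparison described above can be sketched in software as follows. This is only an illustrative model of the decision rule, not the hardware logic of the application; the names `TrackedBranch`, `first_variable`, `second_variable`, and `all_resolved` are hypothetical, and treating an empty set of tracked branches as a first variable of 0 is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class TrackedBranch:
    seq: int        # sequence number, assigned in decode order
    resolved: bool  # True once the branch outcome is known


def first_variable(branches):
    """Largest sequence number among the tracked branches (0 if none: assumption)."""
    return max((b.seq for b in branches), default=0)


def second_variable(branches):
    """Smallest unresolved sequence number, or first variable + 1 if none remain."""
    unresolved = [b.seq for b in branches if not b.resolved]
    if unresolved:
        return min(unresolved)             # the first of the two rules
    return first_variable(branches) + 1    # the second of the two rules


def all_resolved(branches):
    """The first instruction may execute only when first < second."""
    return first_variable(branches) < second_variable(branches)


pending = [TrackedBranch(1, True), TrackedBranch(2, False), TrackedBranch(3, True)]
print(all_resolved(pending))   # False: first = 3, second = 2
pending[1].resolved = True
print(all_resolved(pending))   # True: first = 3, second = 4
```

The comparison reduces the question "are all relevant branches resolved?" to a single integer test, which is why only a small amount of decision logic is needed.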

Optionally, based on the foregoing manner 1, determining the value of the second variable specifically includes: after the unresolved second instruction with the smallest sequence number is resolved, the determining module 320 in the scheduler 250 takes the sequence number of the next unresolved second instruction after it as the value of the second variable. By dynamically updating the second variable in this way, its value always indicates the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, which ensures the accuracy of determining whether each second instruction satisfying the preset condition has been resolved.

In a specific embodiment of the present application, the resolution state of each second instruction, which is either resolved or unresolved, may be tracked by adding two globally accessible registers to a processor that performs speculative out-of-order execution. The following description takes a Load instruction as the first instruction and a branch instruction as the second instruction, and describes the solution in detail with reference to a specific embodiment.

To implement the solution in this application, the pipeline logic added in each pipeline stage that processes instructions is shown in Table 1 below.

Table 1

As shown in Table 1, the pipeline includes eight pipeline stages: Fetch, Decode, Rename, Dispatch, Issue, Execution, WrBack (write back), and Commit.

First, two globally accessible registers are added to the processor 120 that performs speculative out-of-order execution:

The first register maintains a first variable in the Dispatch pipeline stage. The first variable records the latest dispatched branch sequence number (LBrSN), and the value of LBrSN is incremented by one each time a branch instruction is dispatched.

The second register maintains a second variable in the Execution pipeline stage. The second variable records the oldest unresolved branch sequence number (NRBrSN). If the branch order buffer (BOB) is empty, or all branch instructions that satisfy the preset condition have been resolved, the value of NRBrSN equals the value of LBrSN plus one; if an unresolved branch instruction meeting the preset condition is found in the BOB, NRBrSN points to the sequence number of the oldest unresolved branch instruction.

In a specific application, as branch instructions are resolved, updating the NRBrSN value may take multiple cycles.

For example, in the current cycle, if the branch instruction currently pointed to by NRBrSN has been resolved, the BOB is searched onward, toward later entries, until the next unresolved branch instruction is found. If the next unresolved branch instruction is far away, for example several BOB entries further on, it may not be found within the current cycle. Suppose that by the end of the current cycle the search has reached the 20th branch instruction in the BOB and that branch instruction has been resolved; the next cycle then continues the search from the 21st branch instruction in the BOB, until an unresolved branch instruction is found or BOB_end is reached.

If NRBrSN does not yet point to an unresolved branch instruction during these cycles of searching, no functional error results. The corresponding Load instruction is merely delayed for several cycles before being issued; that is, it still waits until the branch instructions related to the Load instruction have been resolved before it is executed.
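The multi-cycle search can be modeled with a short sketch. It assumes, for illustration only, that branch sequence numbers run contiguously from 1 to LBrSN, that the BOB can be represented as a mapping from sequence number to a resolved flag, and that at most a fixed number of entries can be examined per cycle; the function name `advance_nrbrsn` and the parameter `max_entries_per_cycle` are hypothetical.

```python
def advance_nrbrsn(bob, nrbrsn, lbrsn, max_entries_per_cycle):
    """Model one cycle of the NRBrSN update.

    bob maps each branch sequence number (1..lbrsn) to True if resolved.
    The scan stops at the first unresolved branch, at the per-cycle limit,
    or after passing the newest branch (leaving NRBrSN == LBrSN + 1).
    """
    scanned = 0
    while nrbrsn <= lbrsn and scanned < max_entries_per_cycle:
        if not bob[nrbrsn]:
            break           # NRBrSN now points at the oldest unresolved branch
        nrbrsn += 1         # entry already resolved: move to the next one
        scanned += 1
    return nrbrsn


# 25 branches dispatched, only branch 23 still unresolved.
lbrsn = 25
bob = {seq: True for seq in range(1, lbrsn + 1)}
bob[23] = False

load_brsn = 22                                   # a load dispatched after branch 22
nrbrsn = advance_nrbrsn(bob, 1, lbrsn, 20)       # cycle 1: examines entries 1..20
print(nrbrsn, load_brsn < nrbrsn)                # 21 False -> load still waits
nrbrsn = advance_nrbrsn(bob, nrbrsn, lbrsn, 20)  # cycle 2: stops at branch 23
print(nrbrsn, load_brsn < nrbrsn)                # 23 True  -> load may issue
```

Because NRBrSN only lags behind its final value during the search, the comparison can stall a load longer than strictly necessary but never releases it early, which matches the statement that the delay causes no functional error.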

Second, as shown in Table 1, logic is added in the Dispatch pipeline stage to maintain a load.BrSN for each Load instruction: when the Load instruction is dispatched, the current value of LBrSN is assigned to load.BrSN.

If the value of the load.BrSN corresponding to the Load instruction is less than NRBrSN, it indicates that each branch instruction having a control dependency relationship with the Load instruction has been parsed, and there is no control dependency risk when the Load instruction is executed.

Third, in the Issue pipeline stage, an explicit control dependency check is added to the issue logic of the Load instruction: if the load.BrSN value of the Load instruction is less than the value of NRBrSN, the control dependency of the Load instruction is determined to be resolved.

In the prior art, the issue logic of the Load instruction is: once the data dependency is resolved, that is, the register dependency and the write-read memory dependency are satisfied, the Load instruction is issued.

In the embodiment of the present application, the issue logic of the Load instruction requires that both the data dependency and the control dependency be resolved. The issue logic in Table 1 is: if (reg_dep_ok & mem_dep_ok & load.BrSN < NRBrSN) issue load.
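The difference between the prior-art issue condition and the issue condition of this embodiment can be written out as two small predicates. The function names below are hypothetical; the signal names reg_dep_ok and mem_dep_ok and the comparison of load.BrSN with NRBrSN follow the issue logic quoted from Table 1.

```python
def can_issue_load_prior_art(reg_dep_ok: bool, mem_dep_ok: bool) -> bool:
    """Prior art: issue as soon as the data dependency is resolved."""
    return reg_dep_ok and mem_dep_ok


def can_issue_load(reg_dep_ok: bool, mem_dep_ok: bool,
                   load_brsn: int, nrbrsn: int) -> bool:
    """This embodiment: data dependency AND control dependency must be resolved."""
    data_dep_resolved = reg_dep_ok and mem_dep_ok
    control_dep_resolved = load_brsn < nrbrsn
    return data_dep_resolved and control_dep_resolved


# Data dependencies are satisfied, but branch 2 is still unresolved (NRBrSN = 2).
print(can_issue_load_prior_art(True, True))                 # True: issued speculatively
print(can_issue_load(True, True, load_brsn=3, nrbrsn=2))    # False: load is held back
print(can_issue_load(True, True, load_brsn=3, nrbrsn=4))    # True: safe to issue
```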

As a result, the speculative memory access of the Load instruction carries no control dependency risk, which is sufficient to defend against Cache delay side channel attacks based on speculative memory access, such as the Spectre attack. Moreover, when the pipeline is flushed, for example because of a branch misprediction, LBrSN, NRBrSN, and load.BrSN require no additional recovery logic, and the processing efficiency of instructions is not affected.

In conjunction with Table 1 and the above-described embodiments, a specific embodiment is provided below.

For example, assume that when load instruction A is dispatched, the three instructions dispatched before it are, in dispatch order, branch instruction 1, branch instruction 2, and branch instruction 3, where branch instruction 1 has been resolved, branch instruction 2 has not been resolved, and branch instruction 3 has been resolved.

The value of LBrSN is updated as follows: when branch instruction 1 enters the Dispatch pipeline stage, LBrSN=1; when branch instruction 2 enters the Dispatch pipeline stage, LBrSN=2; when branch instruction 3 enters the Dispatch pipeline stage, LBrSN=3. When load instruction A enters the Dispatch pipeline stage, load.BrSN=3.

The update of the value of NRBrSN is as follows:

When load instruction A is dispatched, branch instruction 1 has been resolved, branch instruction 2 has not been resolved, and branch instruction 3 has been resolved, so NRBrSN=2. Because the value of load.BrSN (3) is greater than the value of NRBrSN (2), the control dependency of load instruction A is determined to be unresolved, and the issue of load instruction A is suspended.

When branch instruction 2 is resolved, branch instruction 3 has already been resolved and no branch instruction after branch instruction 3 has a control dependency with load instruction A, so NRBrSN=4. Because the value of load.BrSN (3) is now smaller than the value of NRBrSN (4), the control dependency of load instruction A is determined to be resolved, and load instruction A is executed.
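The example above can be replayed with a toy software model. The class `BranchTracker` and its methods are hypothetical names introduced for illustration; the model assumes that every dispatched branch is tracked and that NRBrSN is recomputed instantly rather than over several cycles, which simplifies the hardware behavior.

```python
class BranchTracker:
    """Toy model of LBrSN, NRBrSN and the BOB resolution flags."""

    def __init__(self):
        self.lbrsn = 0      # latest dispatched branch sequence number
        self.resolved = {}  # stand-in for the BOB: sequence number -> resolved?

    def dispatch_branch(self):
        """Dispatch a branch: LBrSN is incremented by one."""
        self.lbrsn += 1
        self.resolved[self.lbrsn] = False
        return self.lbrsn

    def dispatch_load(self):
        """Dispatch a load: it records the current LBrSN as its load.BrSN."""
        return self.lbrsn

    def resolve(self, seq):
        self.resolved[seq] = True

    def nrbrsn(self):
        """Oldest unresolved branch, or LBrSN + 1 when none is pending."""
        pending = [s for s, done in self.resolved.items() if not done]
        return min(pending) if pending else self.lbrsn + 1


def run_example():
    t = BranchTracker()
    b1, b2, b3 = t.dispatch_branch(), t.dispatch_branch(), t.dispatch_branch()
    t.resolve(b1)
    t.resolve(b3)                  # branch 2 is still unresolved
    load_a = t.dispatch_load()     # load.BrSN = 3
    before = (load_a, t.nrbrsn(), load_a < t.nrbrsn())
    t.resolve(b2)
    after = (load_a, t.nrbrsn(), load_a < t.nrbrsn())
    return [before, after]


# Each tuple is (load.BrSN, NRBrSN, may_issue).
print(run_example())  # [(3, 2, False), (3, 4, True)]
```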

With the above embodiment, only a small amount of pipeline logic needs to be added to a processor that performs speculative out-of-order execution. There is no need to rely on programmers and compilers to recompile programs, and no need to clear the branch target buffer (BTB) during software switching. Compared with the solutions provided in the prior art, this effectively defends against the Cache delay side channel attack based on speculative memory access while significantly reducing the performance loss.

In the implementation of the present application, the processor may also be a chip, and the chip is connected to the memory for reading and executing the software program stored in the memory to implement the execution method in any of the foregoing embodiments.

Embodiments of the present application also provide a computer storage medium for storing the computer software instructions used to perform the above instruction execution method, including program code for performing the above method embodiments.

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, apparatus (device), computer readable storage medium, or computer program product. Thus, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware aspects, which are collectively referred to herein as "module" or "system."

The present application is described with reference to flowchart illustrations and/or block diagrams of the methods, apparatus, and computer program products of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

While the invention has been described with respect to specific embodiments thereof, various modifications and combinations may be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative. It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from its spirit and scope. Thus, it is intended that the present invention cover such modifications and variations.

Claims

An instruction execution method, comprising:

Obtaining a first instruction, where the first instruction is an instruction to read a memory;

Determining each second instruction that satisfies a preset condition, where the second instruction is a branch instruction that is executed out of order;

Executing the first instruction after determining that each second instruction satisfying the preset condition has been resolved.
The method of claim 1, wherein the preset condition comprises a control dependency relationship with the first instruction.
The method of claim 2, wherein the preset condition comprises a preset type of control dependency relationship with the first instruction.
The method of claim 1, wherein before determining that each second instruction satisfying the preset condition has been resolved, the method further comprises:

Determining a value of a first variable according to the sequence number of each second instruction that satisfies the preset condition;

Determining a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or according to the value of the first variable;

Wherein determining that each second instruction satisfying the preset condition has been resolved comprises:

If the value of the first variable is smaller than the value of the second variable, determining that each second instruction satisfying the preset condition has been resolved.
The method according to claim 4, wherein determining the value of the first variable according to the sequence number of each second instruction that satisfies the preset condition comprises:

Determining, as the value of the first variable, the largest sequence number among the second instructions satisfying the preset condition;

And wherein determining the value of the second variable according to the sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or according to the value of the first variable, comprises:

Determining, as the value of the second variable, the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions satisfying the preset condition; or, if it is determined that there is no unresolved second instruction among the second instructions satisfying the preset condition, taking the value of the first variable plus one as the value of the second variable.
The method according to claim 5, wherein determining, as the value of the second variable, the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions satisfying the preset condition comprises:

After the unresolved second instruction with the smallest sequence number is resolved, taking the sequence number of the next unresolved second instruction after it as the value of the second variable.
An instruction execution apparatus, comprising: a fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor;

The fetcher is configured to fetch an instruction from an instruction cache;

The decoder is configured to decode the instruction to obtain a decoding result of the instruction; the decoding result includes an instruction type;

The pre-execution buffer is configured to store the instruction and a decoding result of the instruction;

The scheduler is configured to:

If a first instruction is obtained from the pre-execution buffer, determine each second instruction that satisfies a preset condition, and after determining that each second instruction satisfying the preset condition has been resolved, send the first instruction to the executor, where the first instruction is an instruction to read a memory and the second instruction is a branch instruction executed out of order;

If a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor;

The executor is configured to execute the first instruction and the second instruction.
The apparatus of claim 7, wherein the preset condition comprises a control dependency relationship with the first instruction.
The apparatus of claim 7, wherein the preset condition comprises a preset type of control dependency with the first instruction.
The apparatus of claim 7, wherein the scheduler is further configured to:

Determine a value of a first variable according to the sequence number of each second instruction that satisfies the preset condition;

Determine a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or according to the value of the first variable;

If the value of the first variable is smaller than the value of the second variable, determine that each second instruction satisfying the preset condition has been resolved.
The apparatus according to claim 10, wherein the scheduler is specifically configured to:

Determine, as the value of the first variable, the largest sequence number among the second instructions satisfying the preset condition;

Determine, as the value of the second variable, the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions satisfying the preset condition; or, if it is determined that there is no unresolved second instruction among the second instructions satisfying the preset condition, take the value of the first variable plus one as the value of the second variable.
The apparatus according to claim 11, wherein the scheduler is specifically configured to:

After the unresolved second instruction with the smallest sequence number is resolved, take the sequence number of the next unresolved second instruction after it as the value of the second variable.
A chip, characterized in that the chip is connected to a memory for reading and executing a software program stored in the memory to implement the method according to any one of claims 1 to 6.
A computer storage medium comprising computer readable instructions for causing a computer to perform the method of any one of claims 1 to 6 when the computer reads and executes the computer readable instructions.