CN111936968A

CN111936968A - Instruction execution method and device

Info

Publication number: CN111936968A
Application number: CN201880091562.8A
Authority: CN
Inventors: 李国柱; 孙涛
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-04-21
Filing date: 2018-04-21
Publication date: 2020-11-13
Also published as: WO2019200618A1

Abstract

An instruction execution method and device are provided to effectively defend against Cache latency side-channel attacks based on speculative memory access. The method in the embodiments of the present application includes: obtaining a first instruction, where the first instruction is an instruction that reads memory; determining second instructions that satisfy a preset condition, where the second instructions are branch instructions executed out of order and the preset condition includes that a control dependency exists between the second instruction and the first instruction; and executing the first instruction after determining that every second instruction satisfying the preset condition has been resolved. Because the first instruction waits until every second instruction satisfying the preset condition has been resolved, it cannot be executed under misspeculation. Its misspeculated execution therefore cannot change the Cache microarchitectural state, an attacker cannot steal data in a protected area by observing such changes, and Cache latency side-channel attacks based on speculative memory access are effectively defended against.

Description

Instruction execution method and device

Technical Field

The present application relates to the field of communications, and in particular, to a method and an apparatus for executing an instruction.

Background

Speculative out-of-order execution is currently the fundamental technique by which modern processors exploit instruction-level parallelism. Correct execution of any instruction must satisfy two conditions: first, its control dependencies are satisfied, that is, the instruction is on the correct branch path; second, its data dependencies are satisfied, that is, its source operands are correctly obtained. Speculative out-of-order execution means executing an instruction speculatively when its control dependencies are not yet resolved but its data dependencies are satisfied. If the speculation is incorrect, the instruction may be on the wrong branch path. When the processor later detects the misspeculation, the speculatively executed instructions are discarded and execution restarts from the correct path, which keeps the program semantics correct.

On a processor with speculative out-of-order execution, a misspeculated instruction does not change architecturally defined, software-visible state (architecturally visible state), but it can change microarchitectural state, such as the physical register file and caches. Such microarchitectural changes may not be undone when the misspeculated instruction is discarded. In normal operation, microarchitectural state is neither used by nor visible to software, so changes caused by misspeculated execution do not make software run incorrectly. However, a technique known as the "Cache latency side-channel attack" can precisely detect changes in the Cache microarchitectural state, and this has given rise to a serious threat to existing processors: the Cache latency side-channel attack based on speculative memory access. Attacks of this kind, such as the Spectre attack, mainly exploit misspeculated memory access instructions and, in principle, proceed in two steps. First, the attacker constructs a branch misprediction scenario so that a Load instruction on the wrong branch path, executed under misspeculation, accesses data in a protected area. Second, the attacker uses that data to construct an address index and access the Cache, which changes the Cache microarchitectural state (for example, the access to the corresponding Cache line changes from a miss to a hit); this change is detected through the Cache latency side channel, revealing the protected data. Current processors that use speculative out-of-order execution cannot resist Cache latency side-channel attacks based on speculative memory access in hardware.
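To make the two steps concrete, the following Python sketch is a toy software model, not a processor implementation and not part of the patent. It assumes a cache represented as a set of line indices, a hypothetical 64-byte line, a hypothetical probe-array base address, and made-up hit and miss costs; `ToyCache`, `victim`, and `probe` are illustrative names. What it shows is that the misspeculated load's result is discarded while the cache line it filled stays resident and can be found by timing.

```python
"""Toy model of the two-step speculative-access attack (illustration only)."""

HIT_COST, MISS_COST = 1, 100   # made-up access latencies
LINE = 64                      # hypothetical cache-line size in bytes
PROBE_BASE = 0x10000           # hypothetical base address of the probe array


class ToyCache:
    def __init__(self):
        self.lines = set()     # indices of lines currently cached

    def access(self, addr):
        """Return the access cost and fill the line, speculative or not."""
        line = addr // LINE
        cost = HIT_COST if line in self.lines else MISS_COST
        self.lines.add(line)
        return cost


def victim(cache, memory, bound, x, mispredict):
    """Bounds-checked read; `mispredict` models a mistrained branch predictor."""
    in_bounds = x < bound
    if in_bounds or mispredict:
        secret = memory[x]                          # step 1: load runs ahead of the check
        cache.access(PROBE_BASE + secret * LINE)    # step 2: secret-indexed access
        if not in_bounds:
            return None   # misspeculation squashed: result discarded, cache line kept
        return secret
    return None


def probe(cache):
    """Time one access per candidate byte value; the fastest one reveals the secret."""
    timings = [cache.access(PROBE_BASE + v * LINE) for v in range(256)]
    return min(range(256), key=timings.__getitem__)
```

For example, with `memory = [1, 2, 3, 42]` and a bound of 3, a mispredicted call `victim(c, memory, 3, 3, True)` returns `None`, yet `probe(c)` recovers 42 because only that probe line hits.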

In the prior art, this attack is mainly mitigated by one of two emergency workarounds. In the first, system software such as the operating system (OS) explicitly flushes the Branch Target Buffer (BTB) on process switches to prevent the attack, but this significantly degrades system performance. In the second, specific sensitive code segments are recompiled to produce binary code with branch barriers. Because no compiler that automatically identifies sensitive code has yet been released in the industry, programmers must modify the source code to mark sensitive code segments explicitly, and it is difficult to guarantee that every program, and every sensitive code segment within it, is identified and recompiled.

In summary, the emergency workarounds of the prior art suffer from both incomplete defense and performance loss, and cannot effectively defend against Cache latency side-channel attacks based on speculative memory access.

Disclosure of Invention

The embodiments of the present application provide an instruction execution method and apparatus that effectively defend against Cache latency side-channel attacks based on speculative memory access.

According to a first aspect, an instruction execution method is provided. An instruction execution apparatus obtains a first instruction and determines second instructions that satisfy a preset condition, where the first instruction is an instruction that reads memory and the second instructions are branch instructions executed out of order. The apparatus executes the first instruction after determining that every second instruction satisfying the preset condition has been resolved. Because the first instruction waits for this resolution, it cannot be executed under misspeculation, so its misspeculated execution cannot change the Cache microarchitectural state. An attacker therefore cannot steal data in a protected area through such state changes, and Cache latency side-channel attacks based on speculative memory access are effectively defended against.

In a possible design, different preset conditions may be set to meet different levels of defense requirements. Optionally, the preset condition includes that a control dependency of a preset type exists between the second instruction and the first instruction. In this case, the first instruction may be executed once every second instruction having a control dependency of the preset type with the first instruction has been resolved, without waiting for every second instruction having any control dependency with the first instruction to be resolved. To meet a higher level of defense requirement and further prevent the first instruction from being executed under misspeculation, the preset condition may optionally be that a control dependency exists between the second instruction and the first instruction; in this case, the first instruction is executed only after every second instruction having a control dependency with the first instruction has been resolved.

In a possible design, before it is determined that every second instruction satisfying the preset condition has been resolved, the method further includes: determining a value of a first variable according to the sequence numbers of the second instructions satisfying the preset condition; and determining a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable. Correspondingly, determining that every second instruction satisfying the preset condition has been resolved includes: if the value of the first variable is less than the value of the second variable, determining that every second instruction satisfying the preset condition has been resolved; and if the value of the first variable is greater than or equal to the value of the second variable, determining that not every second instruction satisfying the preset condition has been resolved. In this way, before the first instruction is executed, whether it carries a control-dependency risk can be determined explicitly from the values of the two variables with a small amount of comparison logic, and the first instruction is executed only when no such risk exists, which effectively defends against Cache latency side-channel attacks based on speculative memory access.

In a possible design, determining the value of the first variable according to the sequence numbers of the second instructions satisfying the preset condition includes: using the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable. Determining the value of the second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable, includes: using the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable; or, if no second instruction satisfying the preset condition is unresolved, using the value of the first variable plus one as the value of the second variable. Because every qualifying sequence number is at most the value of the first variable, the second variable exceeds the first variable exactly when no qualifying second instruction remains unresolved. Comparing the two variables therefore determines accurately whether every second instruction satisfying the preset condition has been resolved, so the first instruction is executed only after that resolution, which effectively defends against Cache latency side-channel attacks based on speculative memory access.
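A minimal Python sketch of this comparison follows. It assumes that sequence numbers are plain integers that increase in program order and that the caller already knows which second instructions satisfy the preset condition; the function names are hypothetical. The text does not say how a first instruction with no qualifying second instructions is handled, so the sketch simply lets it issue, and that choice is an assumption.

```python
from typing import Iterable, Set


def first_variable(qualifying: Iterable[int]) -> int:
    """Largest sequence number among the qualifying second instructions."""
    return max(qualifying)


def second_variable(qualifying: Iterable[int], unresolved: Set[int]) -> int:
    """Smallest unresolved qualifying sequence number, else first variable + 1."""
    seqs = list(qualifying)
    pending = [s for s in seqs if s in unresolved]
    return min(pending) if pending else first_variable(seqs) + 1


def may_issue(qualifying: Iterable[int], unresolved: Set[int]) -> bool:
    """The first instruction may issue iff first variable < second variable."""
    seqs = list(qualifying)
    if not seqs:
        return True  # assumption: nothing to wait for
    return first_variable(seqs) < second_variable(seqs, unresolved)
```

With qualifying branches 3, 5 and 7, an unresolved branch 5 gives a second variable of 5, which is not greater than the first variable 7, so the load waits; once nothing is unresolved the second variable becomes 8 and the load may issue.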

In a possible design, using the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable includes: after the unresolved second instruction with the smallest sequence number has been resolved, using the sequence number of the next unresolved second instruction after it as the value of the second variable. The value of the second variable is thus updated dynamically so that it always identifies the unresolved second instruction with the smallest sequence number among those satisfying the preset condition, which ensures that whether every such second instruction has been resolved is determined accurately.
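The dynamic update can be sketched in Python with a min-heap. This assumes branches may resolve out of order and that resolved entries are discarded lazily when they reach the top of the heap; `SecondVariableTracker` is a hypothetical name, it assumes at least one qualifying branch, and real hardware would more likely use a small comparator structure than a heap.

```python
import heapq
from typing import Iterable


class SecondVariableTracker:
    """Tracks the first and second variables for one first instruction (load)."""

    def __init__(self, qualifying: Iterable[int]):
        seqs = list(qualifying)            # assumes at least one qualifying branch
        self.first = max(seqs)             # largest qualifying sequence number
        self._pending = seqs[:]            # min-heap of not-yet-dropped sequence numbers
        heapq.heapify(self._pending)
        self._resolved = set()

    def resolve(self, seq: int) -> None:
        """Record that branch `seq` has been resolved (possibly out of order)."""
        self._resolved.add(seq)
        # Drop resolved entries from the top so the heap minimum is always the
        # smallest still-unresolved qualifying sequence number.
        while self._pending and self._pending[0] in self._resolved:
            heapq.heappop(self._pending)

    @property
    def second(self) -> int:
        """Smallest unresolved qualifying sequence number, else first + 1."""
        return self._pending[0] if self._pending else self.first + 1

    def can_issue(self) -> bool:
        return self.first < self.second
```

If branches 2, 4 and 6 qualify and branch 4 resolves first, the second variable stays at 2; when branch 2 then resolves, it jumps straight to 6 because 4 is already resolved, and only after branch 6 resolves does it become 7 and allow issue.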

According to a second aspect, an instruction execution apparatus is provided, including an instruction fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor. The instruction fetcher is configured to fetch an instruction from the instruction cache. The decoder is configured to decode the instruction to obtain a decoding result, where the decoding result includes the instruction type. The pre-execution buffer is configured to store the instruction and its decoding result. The scheduler is configured to: if a first instruction is obtained from the pre-execution buffer, determine second instructions satisfying a preset condition and, after determining that every second instruction satisfying the preset condition has been resolved, send the first instruction to the executor, where the first instruction is an instruction that reads memory and the second instruction is a branch instruction executed out of order; and if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor. The executor is configured to execute the first instruction and the second instruction.

The instruction execution apparatus may be at least one processing element or chip.

In a possible design, the preset condition includes that a control dependency exists with the first instruction.

In a possible design, the preset condition includes that a control dependency of a preset type exists with the first instruction.

In a possible design, the scheduler is further configured to: determine a value of a first variable according to the sequence numbers of the second instructions satisfying the preset condition; determine a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions satisfying the preset condition, or according to the value of the first variable; and, if the value of the first variable is less than the value of the second variable, determine that every second instruction satisfying the preset condition has been resolved.

In a possible design, the scheduler is specifically configured to: use the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable; and use the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable, or, if no second instruction satisfying the preset condition is unresolved, use the value of the first variable plus one as the value of the second variable.

In a possible design, the scheduler is specifically configured to: after the unresolved second instruction with the smallest sequence number has been resolved, use the sequence number of the next unresolved second instruction after it as the value of the second variable.

According to a third aspect, a chip is provided. The chip is connected to a memory and is configured to read and execute a software program stored in the memory to implement the method according to the first aspect or any one of its possible designs.

According to a fourth aspect, a readable storage medium is provided. The storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or any one of its possible designs.

Drawings

FIG. 1 is a schematic diagram of a system architecture of a network device according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a possible processor structure according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a possible scheduler structure according to an embodiment of the present application;

FIG. 4 is a flowchart of an instruction execution method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The instruction execution method can be applied to the system architecture of the network device shown in FIG. 1, which is a system architecture of a network device according to an embodiment of the present application. As shown in FIG. 1, system architecture 100 includes memory 110, processor 120, and communication interface 130, which are connected to one another.

Memory 110 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD); the memory 110 may also comprise a combination of the above-mentioned kinds of memories.

The processor 120 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP. The processor 120 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), Generic Array Logic (GAL), or any combination thereof.

The communication interface 130 may be a wired communication interface, a wireless communication interface, or a combination thereof. The wired communication interface may be, for example, an Ethernet interface, which may be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface may be a WLAN interface.

Based on the system architecture shown in fig. 1, the instruction execution method provided by the present application may be implemented by some components in the processor 120.

In a possible design, processor 120 of FIG. 1 may include a variety of components. Based on the system architecture shown in FIG. 1, FIG. 2 is a schematic diagram of a possible processor structure according to an embodiment of the present application. As shown in FIG. 2, the processor 120 includes an instruction cache (Icache) 210, a pre-execution buffer 220, an instruction fetcher 230, a decoder 240, a scheduler 250, an executor 260, and an intermediate register 270. Intermediate register 270 is used to store the results of speculatively executed instructions.

Instruction cache 210 stores various types of instructions, such as branch instructions, memory access instructions, and other types of instructions. Instruction fetcher 230 fetches instructions from instruction cache 210 in order and passes them to decoder 240. The decoder 240 obtains the type of each instruction and stores the instruction and its decoded information in the pre-execution buffer 220. For example, the decoder 240 decodes 4 instructions obtained in sequence, and the decoding results are: the 1 st instruction is a branch instruction, the 2 nd instruction is a memory access instruction, the 3 rd instruction is a branch instruction, and the 4 th instruction is a memory access instruction.

Scheduler 250 may obtain instructions and their types from pre-execution buffer 220 and then send each instruction to the executor 260 for execution. As shown in FIG. 2, the executor 260 includes other pipelines 261 and a memory access unit 262. The memory access unit 262 includes a memory access instruction queue 263. In practice, memory access instructions include both Load instructions and Store instructions, so the memory access instruction queue may also be called the Load/Store queue.

Specifically, if the instruction obtained by scheduler 250 is a branch instruction, it is sent to the other pipelines 261 for execution; if it is a memory access instruction, it is sent to the memory access unit 262 for execution.

Instructions have both control dependencies and data dependencies. A memory access instruction whose data dependencies are not satisfied cannot execute at all, so data dependencies are assumed to be satisfied here and only control dependencies are considered. If a memory access instruction is executed while its control dependencies are still unresolved, a Cache latency side-channel attack based on speculative memory access may occur during speculative out-of-order execution. To prevent such an attack, in the embodiments of the present application, when the instruction obtained by scheduler 250 is a memory access instruction, scheduler 250 sends it to the memory access unit 262 only after every branch instruction on which it has a control dependency has been resolved. Optionally, if any of those branch instructions is still unresolved, the memory access instruction is kept in the pre-execution buffer 220 until they have all been resolved, and is then scheduled from the pre-execution buffer 220 and issued to the memory access unit 262 for execution.
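The dispatch-and-hold behavior described above can be modeled as follows. This is a toy Python sketch under stated assumptions: each load carries an explicit set of the sequence numbers of its qualifying branch instructions, the `held` list stands in for the pre-execution buffer 220, and `Instr`, `ToyScheduler`, and the unit name strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Instr:
    seq: int                              # program-order sequence number
    kind: str                             # "branch" or "load" in this sketch
    deps: FrozenSet[int] = frozenset()    # qualifying branch seqs (loads only)


@dataclass
class ToyScheduler:
    unresolved: Set[int] = field(default_factory=set)
    held: List[Instr] = field(default_factory=list)           # stands in for buffer 220
    issued: List[Tuple[int, str]] = field(default_factory=list)

    def receive(self, ins: Instr) -> None:
        if ins.kind == "branch":
            # Branches are never held; they go straight to the other pipelines.
            self.unresolved.add(ins.seq)
            self.issued.append((ins.seq, "other_pipelines_261"))
        else:
            # Loads wait in the buffer until their qualifying branches resolve.
            self.held.append(ins)
            self._drain()

    def branch_resolved(self, seq: int) -> None:
        self.unresolved.discard(seq)
        self._drain()

    def _drain(self) -> None:
        """Issue every held load none of whose qualifying branches is unresolved."""
        still_held = []
        for load in self.held:
            if load.deps & self.unresolved:
                still_held.append(load)
            else:
                self.issued.append((load.seq, "memory_access_262"))
        self.held = still_held
```

Held loads do not block one another: a later load whose branches are already resolved can issue ahead of an earlier load that is still waiting.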

After receiving memory access instructions, the memory access unit 262 may store them in the Load/Store queue and execute them out of order once both their data dependencies and their control dependencies are satisfied.

Processing of an instruction by the instruction fetcher 230, the decoder 240, the scheduler 250, the executor 260, and so on forms a pipeline with multiple pipeline stages, including fetch, decode, dispatch, and execute: the instruction fetcher 230 corresponds to the fetch stage, the decoder 240 to the decode stage, the scheduler 250 to the dispatch stage, and the executor 260 to the execute stage. Optionally, the pipeline may also include one or more further stages such as rename, issue, write-back, and commit (the hardware corresponding to these stages is not shown in FIG. 2).

Based on the processor shown in FIG. 2, in the embodiments of the present application the instruction execution method is performed by the scheduler 250 in the processor 120. FIG. 3 is a schematic diagram of a scheduler structure according to an embodiment of the present application.

As shown in FIG. 3, scheduler 250 includes an obtaining module 310, a determining module 320, and an execution module 330, where:

The obtaining module 310 is configured to obtain instructions from the pre-execution buffer 220; the obtained instructions may include branch instructions, memory access instructions, and other types of instructions.

The determining module 320 is configured to determine, after the obtaining module 310 obtains a first instruction, the second instructions satisfying the preset condition. The first instruction is an instruction that reads memory, and the second instructions are branch instructions executed out of order.

The execution module 330 is configured to execute the first instruction after the scheduler 250 determines that every second instruction satisfying the preset condition has been resolved.

With reference to fig. 1, fig. 2, and fig. 3, the following describes an instruction execution method provided in the present application.

Based on the above description, fig. 4 schematically shows a flow of an instruction execution method provided in the present application.

As shown in fig. 4, the process specifically includes:

In step 401, the scheduler 250 obtains a first instruction, which is an instruction that reads memory.

In this embodiment, the obtaining module 310 in the scheduler 250 obtains the first instruction, which may also be referred to as a Load instruction.

In step 402, the scheduler 250 determines the second instructions that satisfy the preset condition, where the second instructions are branch instructions executed out of order.

Each second instruction is dispatched by the scheduler 250 to the other pipelines 261 for out-of-order execution, and meanwhile the scheduler 250 continues to schedule subsequent instructions. Second instructions in the other pipelines 261 can execute out of order rather than in fetch-and-decode order. For example, if three second instructions are, in fetch-and-decode order, second instruction 1, second instruction 2, and second instruction 3, the other pipelines 261 may execute second instruction 2 and second instruction 3 before second instruction 1 has executed.

In step 403, the scheduler 250 executes the first instruction after determining that every second instruction satisfying the preset condition has been resolved.

When the first instruction reaches the scheduler 250, the determining module 320 in the scheduler 250 determines whether every second instruction satisfying the preset condition has been resolved. If so, the first instruction is executed, where "executing" means that the execution module 330 in the scheduler 250 issues the first instruction to the memory access unit 262. If not, that is, if unresolved second instructions remain, the first instruction is executed only after every second instruction satisfying the preset condition has been resolved.

Each first instruction is executed only after its own second instructions satisfying the preset condition have been resolved. For example, suppose first instructions A, B, and C reach the scheduler 250 in that order. After first instruction A has been issued to the memory access unit 262, the second instructions on which first instruction B depends are still unresolved, while those on which first instruction C depends have been resolved. First instruction C is then executed first, and first instruction B is executed after its second instructions have been resolved.

With the solution of this embodiment, executing the first instruction only after the second instructions satisfying the preset condition have been resolved prevents the first instruction from being executed under misspeculation, and thus avoids the resulting change in the Cache microarchitectural state. An attacker cannot steal data in a protected area through such state changes, so Cache latency side-channel attacks based on speculative memory access are effectively defended against.

In an optional implementation, the preset condition includes that a control dependency exists with the first instruction; that is, the scheduler executes the first instruction after determining that every second instruction having a control dependency with the first instruction has been resolved.

It should be noted that, in addition to its control dependencies being resolved, the first instruction can execute only when its data dependencies are resolved. A first instruction whose data dependencies are unresolved cannot obtain correct source operands and therefore cannot execute at all, so the embodiments of the present application assume by default that the data dependencies of the first instruction are satisfied when it is executed.

With reference to fig. 2 and the above embodiments, a specific example describing the instruction execution process is provided below.

For example, four instructions, instruction 1, instruction 2, instruction 3, and instruction 4, are included in the instruction cache.

Firstly, the four instructions are sequentially fetched from the instruction cache, and the decoding result is as follows: instruction 1 is a branch instruction, instruction 2 is a Load instruction, instruction 3 is a branch instruction, and instruction 4 is a Load instruction.

The four instructions then enter the scheduler 250 in order. When the scheduler 250 receives instruction 1, it sends instruction 1 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 2, it finds that instruction 1 has already been resolved, that is, instruction 1 has finished executing in the other pipelines 261 and its result is available, so it issues instruction 2 to the memory access unit 262. When the scheduler 250 receives instruction 3, it sends instruction 3 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 4, it finds that instruction 3 is still unresolved, that is, the control dependency of instruction 4 is unresolved. How instruction 4 is handled is described below, first for the prior-art scheme and then for the scheme of the present application:

In the prior art, although the control dependency of instruction 4 is unresolved, the scheduler 250 issues instruction 4 directly to the memory access unit 262 for execution. If instruction 4 is executed under misspeculation, the microarchitectural state of the cache may change, and an attacker may infer internal data from that change, so the internal data leaks and a data security risk exists.

In the present embodiment, while the control dependency of instruction 4 is unresolved, the scheduler 250 holds off issuing instruction 4; for example, instruction 4 may be kept in the pre-execution buffer 220 until instruction 3 has been resolved, after which the scheduler 250 issues instruction 4 to the memory access unit 262 for execution. While instruction 4 is held, execution of other instructions after instruction 4 is not affected. Compared with the prior-art scheme, the scheme of the present application greatly improves data security. Although instruction execution is less efficient than with unrestricted speculative out-of-order execution, the resulting performance loss is far smaller than that of the defense schemes described in the background.
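The latency cost of holding instruction 4 can be illustrated with made-up timing. The Python sketch below assumes that instruction i reaches scheduler 250 in cycle i, that instruction 1 is resolved by cycle 2, and that instruction 3 resolves in cycle 7; these cycle numbers are hypothetical, and each load is assumed to have a control dependency on every earlier branch.

```python
# Hypothetical timing for the four-instruction example: instruction i reaches
# scheduler 250 in cycle i; branch 1 resolves by cycle 2, branch 3 in cycle 7.
PROGRAM = [(1, "branch"), (2, "load"), (3, "branch"), (4, "load")]
RESOLVE_CYCLE = {1: 2, 3: 7}


def load_issue_cycles(defend: bool) -> dict:
    """Cycle in which each load is issued to the memory access unit 262."""
    issue = {}
    for seq, kind in PROGRAM:
        if kind != "load":
            continue
        arrival = seq
        if not defend:
            issue[seq] = arrival          # prior art: issue on arrival
            continue
        # Embodiment: wait for every earlier branch (assumed control dependency).
        waits = [RESOLVE_CYCLE[b] for b, k in PROGRAM if k == "branch" and b < seq]
        issue[seq] = max([arrival, *waits])
    return issue
```

With these numbers, instruction 2 issues in cycle 2 under either policy, while instruction 4 issues in cycle 4 under the prior-art policy and in cycle 7 under the embodiment, so the delay is confined to the load whose branch is still unresolved.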

To meet different levels of defense requirements, different preset conditions can be set.

In an optional implementation, the preset condition includes that a control dependency of a preset type exists with the first instruction. The preset type may be a control dependency arising from a direct conditional branch jump, a control dependency arising from an indirect conditional branch jump, or another type of control dependency.

For example, if only control dependencies arising from direct conditional branch jumps are tracked, the first instruction can be executed once the scheduler 250 determines that every branch instruction on which the first instruction has such a dependency has been resolved, without waiting for all branch instructions on which the first instruction has any control dependency to be resolved. In this case, variant 1 of the Spectre attack, "bounds check bypass", can be defended against.

As another example, only control dependence on indirect conditional branch jumps may be tracked. When the scheduler 250 receives a second instruction, it checks whether that instruction meets the preset condition, that is, whether it is a branch instruction on which the first instruction has a control dependence of the indirect conditional branch jump type; if so, the scheduler tracks whether every such branch instruction has been resolved. Once the scheduler 250 determines that all of them have been resolved, the first instruction can be executed without waiting for all branch instructions on which the first instruction is control-dependent to be resolved. This configuration defends against Spectre variant 2 (branch target injection).

To meet a higher defense requirement and further prevent the first instruction from being executed under misspeculation, the preset condition may optionally be simply that a control dependence exists between the second instruction and the first instruction. In this case, the first instruction is executed only after all second instructions on which it has a control dependence have been resolved. Compared with the previous implementation, more second instructions must be tracked, because every type of control dependence is covered rather than only the preset types, so the defense against attacks is stronger.

In this embodiment of the present application, one optional way to implement the method of FIG. 4 is to add a memory access filter (Load Filter) to the scheduler 250 and enable its filtering function to defend against Cache latency side-channel attacks based on speculative memory access. A filtering function can be configured for each required defense level; for example, the defense level can be lowered as needed by reducing the types of branch instructions that the Load Filter tracks. Where such attacks do not need to be defended against, the filtering function of the Load Filter can be disabled; the second instructions on which the first instruction is control-dependent are then not tracked, and the pipeline logic is not affected in any way.
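The effect of a configurable defense level can be illustrated with a small policy table. In this sketch the `BranchKind` and `DefenseLevel` enums, the `TRACKED` mapping, and `load_must_wait` are hypothetical names; the four levels simply mirror the options described above (filter off, track only direct conditional branches, track only indirect conditional branches, track all branches), and how a real Load Filter encodes them is not specified in the source.

```python
from enum import Enum, auto


class BranchKind(Enum):
    DIRECT_COND = auto()  # direct conditional branch jump (Spectre variant 1)
    INDIRECT = auto()     # indirect conditional branch jump (Spectre variant 2)
    OTHER = auto()        # any other source of control dependence


class DefenseLevel(Enum):
    OFF = auto()            # Load Filter disabled: prior-art behavior
    DIRECT_ONLY = auto()
    INDIRECT_ONLY = auto()
    ALL = auto()            # every older branch is tracked


# Hypothetical mapping from defense level to the branch kinds the filter tracks.
TRACKED = {
    DefenseLevel.OFF: frozenset(),
    DefenseLevel.DIRECT_ONLY: frozenset({BranchKind.DIRECT_COND}),
    DefenseLevel.INDIRECT_ONLY: frozenset({BranchKind.INDIRECT}),
    DefenseLevel.ALL: frozenset(BranchKind),
}


def load_must_wait(level, older_branches):
    """older_branches: (kind, resolved) pairs for branches the Load depends on."""
    tracked = TRACKED[level]
    return any(kind in tracked and not resolved for kind, resolved in older_branches)


branches = [(BranchKind.DIRECT_COND, True), (BranchKind.INDIRECT, False)]
for level in DefenseLevel:
    print(level.name, load_must_wait(level, branches))
# OFF False
# DIRECT_ONLY False
# INDIRECT_ONLY True
# ALL True
```

Tracking fewer kinds lets more Loads issue early, trading coverage for performance, which matches the idea of lowering the defense level by shrinking the tracked set.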

Optionally, based on step 403 in the foregoing embodiment, before determining that every second instruction meeting the preset condition has been resolved, the scheduler 250 determines the value of a first variable from the sequence numbers of the second instructions meeting the preset condition, and determines the value of a second variable from either the sequence number of an unresolved second instruction among them or the value of the first variable. The scheduler 250 then decides from these two values whether every second instruction meeting the preset condition has been resolved: if the value of the first variable is smaller than the value of the second variable, all of them have been resolved; otherwise, at least one of them has not. Optionally, the sequence numbers are assigned in the decoding order of the second instructions meeting the preset condition.

With this optional embodiment, the scheduler 250 can determine explicitly, using only a small amount of comparison logic before executing the first instruction, whether the first instruction is still at risk from an unresolved control dependence, based on the values of the first and second variables. This ensures that the first instruction is executed only when no such risk remains, which effectively defends against Cache latency side-channel attacks based on speculative memory access.

Optionally, the scheduler 250 determines the value of the first variable as the largest sequence number among the second instructions meeting the preset condition. The scheduler 250 determines the value of the second variable, from the sequence numbers of the unresolved second instructions among them or from the value of the first variable, in either of the following two manners:

In the first manner, the scheduler 250 takes the smallest sequence number among the unresolved second instructions meeting the preset condition as the value of the second variable.

In the second manner, if the scheduler 250 determines that none of the second instructions meeting the preset condition is unresolved, it takes the value of the first variable plus one as the value of the second variable.

If an unresolved second instruction exists among the second instructions meeting the preset condition, the determining module 320 in the scheduler 250 determines the value of the second variable in the first manner; otherwise, it uses the second manner. By comparing the values of the first and second variables, the determining module 320 can therefore determine accurately whether all second instructions meeting the preset condition have been resolved, and execute the first instruction only afterwards, effectively defending against Cache latency side-channel attacks based on speculative memory access.
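The first-variable/second-variable check can be written out directly from the two manners above. The function names are illustrative, and the input is assumed to be a mapping from the sequence number of each second instruction meeting the preset condition to whether it has been resolved. Treating an empty mapping as "nothing to wait for" is an assumption of this sketch.

```python
def first_variable(status):
    """Largest sequence number among the tracked second instructions."""
    return max(status)


def second_variable(status):
    """Smallest unresolved sequence number (first manner), else first + 1 (second manner)."""
    unresolved = [seq for seq, resolved in status.items() if not resolved]
    if unresolved:
        return min(unresolved)
    return first_variable(status) + 1


def all_resolved(status):
    """True when every tracked second instruction has been resolved."""
    if not status:  # assumption: no tracked branches means no control-dependence risk
        return True
    return first_variable(status) < second_variable(status)


print(all_resolved({1: True, 2: False, 3: True}))  # False: 3 < 2 fails
print(all_resolved({1: True, 2: True, 3: True}))   # True: 3 < 4
```

The comparison works because the second variable can exceed the first only in the second manner, that is, only when no tracked instruction is still unresolved.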

Optionally, in the first manner, taking the smallest sequence number among the unresolved second instructions meeting the preset condition as the value of the second variable specifically includes: after the unresolved second instruction with the smallest sequence number is resolved, the determining module 320 in the scheduler 250 takes the sequence number of the next unresolved second instruction after it as the new value of the second variable. The second variable is thus updated dynamically and always identifies the unresolved second instruction with the smallest sequence number among the second instructions meeting the preset condition, which keeps the resolution check accurate.

In this embodiment of the present application, the resolution status (resolved or unresolved) of each second instruction can be tracked by adding two globally accessible registers to a processor that performs speculative out-of-order execution. The following describes a specific embodiment in detail, taking a Load instruction as the first instruction and a branch instruction as the second instruction.

The pipeline logic added at each pipeline stage to implement the scheme of the present application is shown in Table 1 below.

TABLE 1

As shown in Table 1, the pipeline has eight stages: Fetch, Decode, Rename, Dispatch, Issue, Execute (Execution), Writeback (WrBack), and Commit.

First, two globally accessible registers are added to the speculative out-of-order processor 120. Their roles are as follows:

The first register holds the first variable, the latest dispatched branch sequence number (LBrSN), and is maintained in the Dispatch stage: each time a branch instruction is dispatched, LBrSN is incremented by 1.

The second register holds the second variable, the oldest not-yet-resolved branch sequence number (NRBrSN), and is maintained in the Execute stage. If the branch instruction buffer (BOB) is empty, or all branch instructions meeting the preset condition have been resolved, NRBrSN equals LBrSN plus one; if the BOB contains an unresolved branch instruction meeting the preset condition, NRBrSN holds the sequence number of the oldest such branch.

In a practical implementation, updating NRBrSN after a branch instruction is resolved may take multiple cycles.

For example, if the branch instruction that NRBrSN currently points to is resolved in the current cycle, the BOB is searched in order, toward younger entries, for the next unresolved branch instruction. If that branch is far away, for example several BOB entries later, it may not be found within the current cycle. Suppose that by the end of the current cycle the search has reached the 20th branch instruction in the BOB and found it resolved; in the next cycle, the search continues from the 21st branch instruction until an unresolved branch instruction is found or the end of the BOB (BOB_end) is reached.

While this multi-cycle search is in progress, NRBrSN may temporarily point to a branch instruction that has already been resolved rather than to an unresolved one. This introduces no functional error: the affected Load instructions are merely issued a few cycles later, still only after all branch instructions on which they depend have been resolved.
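The multi-cycle NRBrSN search can be modeled behaviorally as follows. This is a sketch, not RTL: the `NrbrsnTracker` class, its hypothetical `scan_width` parameter (how many BOB entries can be examined per cycle), and the simplification that BOB entries are never retired are assumptions for illustration. It shows that while a search is unfinished, NRBrSN keeps its older, smaller value, so Loads can only be delayed, never issued early.

```python
class NrbrsnTracker:
    """Behavioral model of LBrSN/NRBrSN with a bounded per-cycle BOB scan."""

    def __init__(self, scan_width):
        self.scan_width = scan_width  # hypothetical: BOB entries examined per cycle
        self.bob = []                 # [seq, resolved] pairs, oldest first
        self.lbrsn = 0                # latest dispatched branch sequence number
        self.nrbrsn = 1               # oldest unresolved branch (LBrSN + 1 when none)
        self._pos = 0                 # BOB index where the search resumes

    def dispatch_branch(self):
        self.lbrsn += 1
        self.bob.append([self.lbrsn, False])
        return self.lbrsn

    def resolve(self, seq):
        for entry in self.bob:
            if entry[0] == seq:
                entry[1] = True

    def tick(self):
        """One cycle of searching; NRBrSN only moves when the search completes."""
        for _ in range(self.scan_width):
            if self._pos == len(self.bob):
                self.nrbrsn = self.lbrsn + 1  # reached BOB_end: all resolved
                return
            seq, resolved = self.bob[self._pos]
            if not resolved:
                self.nrbrsn = seq  # oldest unresolved branch found
                return
            self._pos += 1
        # Search unfinished: NRBrSN keeps its old, conservative value.


t = NrbrsnTracker(scan_width=2)
for _ in range(5):
    t.dispatch_branch()
for seq in (1, 2, 3):
    t.resolve(seq)
t.tick()
print(t.nrbrsn)  # 1: entries 1 and 2 scanned, search not finished
t.tick()
print(t.nrbrsn)  # 4: entry 3 resolved, entry 4 unresolved
```

Because a stale NRBrSN is never larger than the true oldest unresolved sequence number, the check `Load.BrSN < NRBrSN` can only become true later than necessary, not sooner.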

Second, as shown in Table 1, logic is added in the Dispatch stage to maintain a field Load.BrSN for each Load instruction: when the Load instruction is dispatched, the current value of LBrSN is assigned to Load.BrSN.

If the value of Load.BrSN of a Load instruction is less than NRBrSN, every branch instruction on which the Load instruction has a control dependence has been resolved, and executing the Load instruction carries no control-dependence risk.

Third, in the Issue stage, an explicit check of whether the control dependence of the Load instruction has been resolved is added to the Load issue logic: the control dependence is considered resolved when the Load instruction's Load.BrSN is smaller than NRBrSN.

In the prior art, the issue logic of a Load instruction is: the Load instruction is issued once its data dependences are resolved, that is, once its register dependences and memory (read-after-write) dependences are satisfied.

In this embodiment of the present application, a Load instruction may be issued only when both its data dependences and its control dependence have been resolved. The issue logic in Table 1 is: if (reg_dep_ok & mem_dep_ok & Load.BrSN < NRBrSN) issue Load.
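The Dispatch-stage tagging and the extended issue condition can be sketched together. `LoadEntry`, `DispatchStage`, and `can_issue` are hypothetical names; `reg_dep_ok`, `mem_dep_ok`, Load.BrSN, LBrSN, and NRBrSN follow the issue logic quoted above, and NRBrSN is passed in as a plain value rather than modeled here.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    reg_dep_ok: bool = False  # register dependences satisfied
    mem_dep_ok: bool = False  # memory dependences satisfied
    brsn: int = 0             # Load.BrSN, captured at dispatch


class DispatchStage:
    def __init__(self):
        self.lbrsn = 0  # LBrSN: latest dispatched branch sequence number

    def dispatch_branch(self):
        self.lbrsn += 1
        return self.lbrsn

    def dispatch_load(self, load):
        load.brsn = self.lbrsn  # Load.BrSN = LBrSN


def can_issue(load, nrbrsn):
    # if (reg_dep_ok & mem_dep_ok & Load.BrSN < NRBrSN) issue Load
    return load.reg_dep_ok and load.mem_dep_ok and load.brsn < nrbrsn


stage = DispatchStage()
stage.dispatch_branch()
stage.dispatch_branch()
load = LoadEntry(reg_dep_ok=True, mem_dep_ok=True)
stage.dispatch_load(load)
print(load.brsn, can_issue(load, nrbrsn=2), can_issue(load, nrbrsn=3))
# 2 False True
```

The only addition over the prior-art condition is the single comparison `load.brsn < nrbrsn`, which is what keeps the extra pipeline logic small.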

As a result, speculative Load accesses carry no control-dependence risk, which is sufficient to resist Cache latency side-channel attacks based on speculative memory access, such as Spectre. Moreover, when the pipeline is flushed, for example because of a branch misprediction, the LBrSN, NRBrSN and Load.BrSN …

A specific example is provided below in conjunction with table 1 and the above examples.

For example, suppose that when Load instruction A is dispatched, three branch instructions were dispatched before it, in the order branch instruction 1, branch instruction 2, and branch instruction 3, where branch instructions 1 and 3 have been resolved and branch instruction 2 has not.

LBrSN is updated as follows: when branch instruction 1 enters the Dispatch stage, LBrSN becomes 1; when branch instruction 2 enters the Dispatch stage, LBrSN becomes 2; when branch instruction 3 enters the Dispatch stage, LBrSN becomes 3; when Load instruction A enters the Dispatch stage, Load.BrSN is set to 3.

The update of the value of NRBrSN is as follows:

When Load instruction A is dispatched, branch instructions 1 and 3 have been resolved but branch instruction 2 has not, so NRBrSN is 2. Because Load.BrSN (3) is not less than NRBrSN (2), the control dependence of Load instruction A is determined to be unresolved, and the execution of Load instruction A is suspended.

When branch instruction 2 is resolved, branch instruction 3 has already been resolved and there is no later branch instruction on which Load instruction A is control-dependent, so NRBrSN becomes 4. Because Load.BrSN (3) is now less than NRBrSN (4), the control dependence of Load instruction A is determined to be resolved, and Load instruction A is executed.
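The worked example can be replayed in a few lines. The `nrbrsn` helper is a hypothetical simplification that recomputes NRBrSN from scratch rather than using the multi-cycle BOB search, and the resolution states are fixed as of the moment Load instruction A is dispatched.

```python
def nrbrsn(resolved, lbrsn):
    """Oldest unresolved branch sequence number, or LBrSN + 1 if none is pending."""
    pending = [seq for seq, done in resolved.items() if not done]
    return min(pending) if pending else lbrsn + 1


lbrsn = 0
resolved = {}
# Branch instructions 1, 2, 3 in dispatch order; status as of Load A's dispatch.
for status in (True, False, True):
    lbrsn += 1
    resolved[lbrsn] = status
load_a_brsn = lbrsn  # Load A dispatched after branch 3: Load.BrSN = 3

print(load_a_brsn, nrbrsn(resolved, lbrsn), load_a_brsn < nrbrsn(resolved, lbrsn))
# 3 2 False  -> Load A is held

resolved[2] = True  # branch instruction 2 resolves
print(load_a_brsn, nrbrsn(resolved, lbrsn), load_a_brsn < nrbrsn(resolved, lbrsn))
# 3 4 True   -> Load A may be executed
```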

With this embodiment, only a small amount of pipeline logic needs to be added to the speculative out-of-order processor; programs do not need to be recompiled by programmers or compilers, and the Branch Target Buffer (BTB) does not need to be flushed on software context switches. Compared with the prior-art schemes, the method and device can therefore effectively defend against Cache latency side-channel attacks based on speculative memory access while significantly reducing the performance loss.

In this application, the processor may also be a chip; the chip is connected to a memory and is configured to read and execute a software program stored in the memory to implement the instruction execution method of any of the foregoing embodiments.

An embodiment of the present application further provides a computer storage medium that stores computer software instructions for performing the instruction execution method, including program code designed to perform the foregoing method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device), computer-readable storage medium, or computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "module" or "system."

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the invention has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the invention. Accordingly, the specification and figures are merely exemplary of the invention as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

An instruction execution method, comprising:

acquiring a first instruction, wherein the first instruction is an instruction for reading a memory;

determining each second instruction meeting a preset condition, wherein the second instruction is a branch instruction executed out of order;

and executing the first instruction after determining that every second instruction meeting the preset condition has been resolved.
The method of claim 1, wherein the preset condition comprises a presence of a control dependency with the first instruction.
The method of claim 2, wherein the preset condition comprises a preset type of control dependency existing with the first instruction.
The method of claim 1, wherein before the determining that every second instruction meeting the preset condition has been resolved, the method further comprises:

determining a value of a first variable according to the sequence numbers of the second instructions meeting the preset condition;

determining the value of a second variable according to the sequence number of an unresolved second instruction in each second instruction meeting the preset condition or the value of the first variable;

the determining that every second instruction meeting the preset condition has been resolved comprises:

and if the value of the first variable is smaller than the value of the second variable, determining that every second instruction meeting the preset condition has been resolved.
The method of claim 4, wherein the determining the value of the first variable according to the sequence number of each second instruction meeting the preset condition comprises:

determining the largest sequence number among the second instructions meeting the preset condition as the value of the first variable;

the determining a value of a second variable according to the sequence number of the unresolved second instruction in the second instructions meeting the preset condition or the value of the first variable includes:

determining the smallest sequence number among the unresolved second instructions meeting the preset condition as the value of the second variable; or, if it is determined that no unresolved second instruction exists among the second instructions meeting the preset condition, using the value of the first variable plus one as the value of the second variable.
The method according to claim 5, wherein the determining, as the value of the second variable, the sequence number of the unresolved second instruction having the smallest sequence number in the second instructions that satisfy the preset condition comprises:

after the unresolved second instruction with the smallest sequence number is resolved, using the sequence number of the next unresolved second instruction located after it as the value of the second variable.
An instruction execution device, characterized by comprising an instruction fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor, wherein:

the instruction fetcher is configured to fetch an instruction from an instruction cache;

the decoder is configured to decode the instruction to obtain a decoding result of the instruction, the decoding result comprising an instruction type;

the pre-execution buffer is configured to store the instruction and the decoding result of the instruction;

the scheduler is configured to:

if a first instruction is obtained from the pre-execution buffer: determine each second instruction meeting a preset condition, and send the first instruction to the executor after determining that every second instruction meeting the preset condition has been resolved, wherein the first instruction is an instruction for reading a memory and the second instruction is a branch instruction executed out of order;

if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor;

the executor is configured to execute the first instruction and the second instruction.
The apparatus of claim 7, wherein the preset condition comprises a presence of a control dependency with the first instruction.
The apparatus of claim 7, wherein the preset condition comprises a preset type of control dependency existing with the first instruction.
The apparatus of claim 7, wherein the scheduler is further configured to:

determine a value of a first variable according to the sequence numbers of the second instructions meeting the preset condition;

determine a value of a second variable according to the sequence number of an unresolved second instruction among the second instructions meeting the preset condition or the value of the first variable;

and if the value of the first variable is smaller than the value of the second variable, determine that every second instruction meeting the preset condition has been resolved.
The apparatus of claim 10, wherein the scheduler is specifically configured to:

determine the largest sequence number among the second instructions meeting the preset condition as the value of the first variable;

determine the smallest sequence number among the unresolved second instructions meeting the preset condition as the value of the second variable; or, if it is determined that no unresolved second instruction exists among the second instructions meeting the preset condition, use the value of the first variable plus one as the value of the second variable.
The apparatus of claim 11, wherein the scheduler is specifically configured to:

after the unresolved second instruction with the smallest sequence number is resolved, use the sequence number of the next unresolved second instruction located after it as the value of the second variable.
A chip, characterized in that the chip is connected to a memory and is configured to read and execute a software program stored in the memory to implement the method according to any one of claims 1 to 6.
A computer storage medium comprising computer readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 6.