WO2019200618A1 - Instruction execution method and device - Google Patents

Instruction execution method and device

Info

Publication number
WO2019200618A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
value
variable
preset condition
determining
Prior art date
Application number
PCT/CN2018/083991
Other languages
French (fr)
Chinese (zh)
Inventor
李国柱
孙涛
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2018/083991 priority Critical patent/WO2019200618A1/en
Priority to CN201880091562.8A priority patent/CN111936968A/en
Publication of WO2019200618A1 publication Critical patent/WO2019200618A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • the present application relates to the field of communications, and in particular, to an instruction execution method and apparatus.
  • out-of-order execution is a basic technique for exploiting parallelism at the processor level.
  • the correct execution of any instruction requires two conditions: first, its control dependency is satisfied, that is, the instruction is on the correct branch path; second, its data dependency is satisfied, that is, the source operands of the instruction have been correctly obtained.
  • speculative out-of-order execution means that, even if the control dependency of an instruction has not been resolved, the instruction is speculatively executed as long as its data dependency is determined to be satisfied. If the speculation is wrong, the instruction may be on the wrong branch path. When the processor later detects the mis-speculation, the mis-speculated instruction is revoked and execution restarts from the correct path, which ensures correct program semantics.
  • a mis-speculated instruction does not change any architecturally visible state, but it does change micro-architectural state, such as the physical register file and the Cache, and this microarchitectural state change is not undone when the mis-speculated instruction is revoked.
  • because microarchitectural state is not used by software and is invisible to it, a microarchitectural state change caused by mis-speculated execution does not cause a software operation error.
  • however, a method called the "Cache delay side channel attack" can accurately detect changes in Cache microarchitectural state, which poses a serious threat to existing processors.
  • this kind of attack, for example the Spectre attack, mainly exploits the speculative execution of memory access instructions. In principle it has two steps. In the first step, the attacker constructs a branch misprediction scenario so that a Load instruction on the wrong branch path is speculatively executed and accesses data in a protected area. In the second step, that data is used to construct an address index for accessing the Cache, which changes the Cache microarchitectural state (for example, an access to the corresponding cache line changes from a miss to a hit); this change is then detected through the Cache delay side channel, and the protected data content is stolen. Processors that currently use speculative out-of-order execution are generally unable to defend against Cache delay side channel attacks based on speculative memory access.
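  • the second step described above, in which a data-dependent Cache access leaves a trace that access latency reveals, can be illustrated with a toy software model. This is only a sketch under stated assumptions: `ToyCache`, `recover_index`, and the cycle counts are hypothetical names and values chosen for illustration, the model involves no real speculation or hardware timing, and it is not taken from the application.

```python
class ToyCache:
    """Toy model of a cache: a set of cached line indices plus fixed latencies."""

    HIT_CYCLES = 4      # assumed latency of a cache hit (illustrative value)
    MISS_CYCLES = 200   # assumed latency of a cache miss (illustrative value)

    def __init__(self):
        self.lines = set()

    def flush(self):
        """Empty the cache so that every line starts as a miss."""
        self.lines.clear()

    def access(self, line):
        """Return the modeled latency of the access and leave the line cached."""
        latency = self.HIT_CYCLES if line in self.lines else self.MISS_CYCLES
        self.lines.add(line)
        return latency


def recover_index(cache, candidates):
    """Probe every candidate line and return the one with the lowest latency."""
    timings = {line: cache.access(line) for line in candidates}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
protected_value = 42            # stands in for data read by the mis-speculated Load
cache.access(protected_value)   # data-dependent access; the cached line survives revocation
leaked = recover_index(cache, range(256))
```

  • in this model the only line that hits during probing is the one indexed by the protected value, which is why a change from miss to hit is enough to disclose the data.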
  • system software, such as an operating system (OS), explicitly clears the branch target buffer (BTB) during process switching to prevent attacks, but this approach can greatly impair system performance.
  • another approach is to recompile specific sensitive code segments into binary code with a branch barrier feature. Because no compiler yet identifies sensitive code automatically, the programmer needs to modify the source code to explicitly mark sensitive code segments, and it is difficult to ensure that all programs, and all sensitive code segments within them, are identified and recompiled.
  • the prior art thus adopts emergency workaround schemes, which suffer from incomplete defense and from performance loss, and cannot effectively prevent Cache delay side channel attacks based on speculative memory access.
  • the embodiment of the present invention provides an instruction execution method and apparatus, which are used to effectively defend against Cache delay side channel attacks based on speculative memory access.
  • the first aspect provides an instruction execution method, in which an instruction execution device acquires a first instruction and determines each second instruction that satisfies a preset condition, where the first instruction is an instruction that reads memory and the second instruction is a branch instruction executed out of order. After determining that every second instruction satisfying the preset condition has been resolved, the device executes the first instruction. Because the first instruction is executed only after the second instructions that satisfy the preset condition have been resolved, it is not executed speculatively, so no Cache microarchitectural state change caused by mis-speculated execution of the first instruction occurs. An attacker therefore cannot steal data of the protected area through Cache microarchitectural state changes, and Cache delay side channel attacks based on speculative memory access can be effectively defended against.
  • the preset condition may include having a preset type of control dependency relationship with the first instruction. In this case, the first instruction can be executed once the second instructions having the preset type of control dependency with the first instruction have been resolved, without waiting for all second instructions having a control dependency with the first instruction to be resolved.
  • alternatively, the preset condition is having a control dependency relationship with the first instruction. In this case, the first instruction is executed only after every second instruction having a control dependency relationship with the first instruction has been resolved.
  • before determining that the second instructions satisfying the preset condition have been resolved, the method further includes: determining a value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition; and determining a value of a second variable according to the sequence number of an unresolved second instruction among those second instructions, or according to the value of the first variable. Determining that the second instructions satisfying the preset condition have been resolved then comprises: if the value of the first variable is less than the value of the second variable, determining that all second instructions satisfying the preset condition have been resolved. If the value of the first variable is greater than or equal to the value of the second variable, it is determined that not all second instructions satisfying the preset condition have been resolved.
  • determining the value of the first variable according to the sequence numbers of the second instructions that satisfy the preset condition includes: taking the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable. Determining the value of the second variable includes: taking the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable; or, if it is determined that none of the second instructions satisfying the preset condition is unresolved, taking the value of the first variable plus one as the value of the second variable.
  • in this way, whether the second instructions satisfying the preset condition have all been resolved can be determined by comparing the two variables, and the first instruction is executed only afterwards, which implements an effective defense against the Cache delay side channel attack based on speculative memory access.
  • taking the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable includes: after the unresolved second instruction with the smallest sequence number has been resolved, taking the sequence number of the next unresolved second instruction after it as the value of the second variable.
  • the value of the second variable is thus dynamically updated so that it always indicates the unresolved second instruction with the smallest sequence number among the second instructions satisfying the preset condition, which ensures the accuracy of determining whether those second instructions have all been resolved.
  • an instruction execution apparatus is provided, including a fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor. The fetcher is configured to fetch an instruction from an instruction cache; the decoder is configured to decode the instruction and obtain a decoding result of the instruction, where the decoding result includes an instruction type; the pre-execution buffer is configured to store the instruction and its decoding result;
  • the scheduler is configured to: if a first instruction is obtained from the pre-execution buffer, determine each second instruction that satisfies a preset condition and, after determining that those second instructions have been resolved, send the first instruction to the executor, where the first instruction is an instruction that reads memory and the second instruction is a branch instruction executed out of order; and if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor. The executor is configured to execute the first instruction and the second instruction.
  • the instruction execution device can be at least one processing element or chip.
  • the preset condition includes a control dependency relationship with the first instruction.
  • the preset condition includes a preset type of control dependency with the first instruction.
  • the scheduler is further configured to: determine a value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition; determine a value of a second variable according to the sequence number of an unresolved second instruction among those second instructions, or according to the value of the first variable; and, if the value of the first variable is less than the value of the second variable, determine that the second instructions satisfying the preset condition have all been resolved.
  • the scheduler is specifically configured to: take the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable; and take the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable, or, if it is determined that none of the second instructions satisfying the preset condition is unresolved, take the value of the first variable plus one as the value of the second variable.
  • the scheduler is specifically configured to: after the unresolved second instruction with the smallest sequence number has been resolved, take the sequence number of the next unresolved second instruction after it as the value of the second variable.
  • a chip is provided, the chip being coupled to a memory and configured to read and execute a software program stored in the memory, so as to implement the method according to the first aspect or any of the possible designs above.
  • a readable storage medium is provided that stores instructions which, when executed on a computer, cause the computer to perform the method of the first aspect or any of the above possible designs.
  • FIG. 1 is a schematic diagram of a system architecture of a network device according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic structural diagram of a possible processor according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a possible scheduler according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a method for executing an instruction according to an embodiment of the present application.
  • the present invention provides a method for executing an instruction, which can be applied to a system architecture of a network device as shown in FIG. 1.
  • FIG. 1 shows a system architecture of a network device provided by an embodiment of the present application.
  • the system architecture 100 includes a memory 110, a processor 120, and a communication interface 130; wherein the memory 110, the processor 120, and the communication interface 130 are connected to each other.
  • the memory 110 may include a volatile memory such as a random-access memory (RAM); the memory may also include a non-volatile memory such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • the memory 110 may also include a combination of the above types of memories.
  • the processor 120 can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • the processor 120 may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the communication interface 130 can be a wired communication interface, a wireless communication interface, or a combination thereof, wherein the wired communication interface can be, for example, an Ethernet interface.
  • the Ethernet interface can be an optical interface, an electrical interface, or a combination thereof.
  • the wireless communication interface can be a WLAN interface.
  • the instruction execution method provided by the present application may be implemented by some components in the processor 120.
  • the processor 120 shown in FIG. 1 may include various components. Based on the system architecture shown in FIG. 1, FIG. 2 is a schematic structural diagram of a possible processor provided by an embodiment of the present application. As shown in FIG. 2, the processor 120 includes an instruction cache (Icache) 210, a pre-execution buffer 220, a fetcher 230, a decoder 240, a scheduler 250, an executor 260, and an intermediate register 270. The intermediate register 270 is used to store the results of speculatively executed instructions.
  • the instruction cache 210 stores various types of instructions, such as branch instructions, memory access instructions, and other types of instructions.
  • the fetcher 230 fetches the instructions in order from the instruction cache 210 and passes them to the decoder 240.
  • the decoder 240 decodes each instruction to obtain its type and then stores the instruction and its decoding information in the pre-execution buffer 220. For example, the decoder 240 decodes four instructions acquired in order, and the decoding results are: the first instruction is a branch instruction, the second instruction is a memory access instruction, the third instruction is a branch instruction, and the fourth instruction is a memory access instruction.
  • the scheduler 250 can retrieve the instructions and their types from the pre-execution buffer 220. After an instruction is retrieved, the scheduler 250 can send it to the executor 260, which executes it. As shown in FIG. 2, the executor 260 includes other pipelines 261 and a memory accessor 262.
  • the memory accessor 262 includes a memory access instruction queue 263. In a specific application, memory access instructions are of two types, load instructions and store instructions, so the memory access instruction queue may be referred to as a load/store queue.
  • if the instruction acquired by the scheduler 250 is a branch instruction, the branch instruction is sent to the other pipelines 261 for execution. If the instruction acquired by the scheduler 250 is a memory access instruction, the memory access instruction is sent to the memory accessor 262 for execution.
  • instructions have both control dependencies and data dependencies. A memory access instruction whose data dependency is not satisfied will not be executed, so this application assumes by default that the data dependency is satisfied and considers only the control dependency. If the control dependency of a memory access instruction is unresolved, a Cache delay side channel attack based on speculative memory access may occur during speculative out-of-order execution. To avoid such an attack, in the embodiment provided by the present application, if the instruction acquired by the scheduler 250 is a memory access instruction, it is sent to the memory accessor 262 for execution only after the branch instructions on which it has a control dependency have been resolved.
  • specifically, the memory access instruction is first held in the pre-execution buffer 220; once the branch instructions on which it has a control dependency have been resolved, it is dispatched from the pre-execution buffer 220 and transmitted to the memory accessor 262 for execution.
  • after receiving a memory access instruction, the memory accessor 262 may store it in the load/store queue, and may execute the queued memory access instructions out of order once both their data dependencies and their control dependencies are satisfied.
  • the processing performed by the fetcher 230, the decoder 240, the scheduler 250, the executor 260, and so on forms a pipeline of multiple pipeline stages, including fetch, decode, dispatch, and execute stages, where the fetcher 230 corresponds to the fetch stage, the decoder 240 corresponds to the decode stage, the scheduler 250 corresponds to the dispatch stage, and the executor 260 corresponds to the execute stage.
  • the pipeline may also include one or more further pipeline stages, such as rename, issue, write-back, and commit stages (not shown in the figure).
  • the instruction execution method is executed by the scheduler 250 in the processor 120.
  • FIG. 3 is a schematic structural diagram of a scheduler provided by an embodiment of the present application.
  • the scheduler 250 includes an acquisition module 310, a determination module 320, and an execution module 330, where:
  • the acquisition module 310 is configured to obtain an instruction from the pre-execution buffer 220.
  • the acquired instruction may be a branch instruction, a memory access instruction, or another type of instruction.
  • the determination module 320 is configured to determine, after the acquisition module 310 acquires a first instruction, each second instruction that satisfies the preset condition, where the first instruction is an instruction that reads memory and the second instruction is a branch instruction executed out of order.
  • the execution module 330 is configured to execute the first instruction after the scheduler 250 determines that the second instructions satisfying the preset condition have been resolved.
  • FIG. 4 exemplarily shows a flow of an instruction execution method provided by the present application.
  • the process specifically includes:
  • Step 401: the scheduler 250 acquires a first instruction, where the first instruction is an instruction that reads memory.
  • specifically, the acquisition module 310 in the scheduler 250 acquires the first instruction; the first instruction may also be referred to as a load instruction.
  • Step 402: the scheduler 250 determines each second instruction that satisfies the preset condition, where the second instruction is a branch instruction that is executed out of order.
  • each second instruction is scheduled by the scheduler 250 to the other pipelines 261 for out-of-order execution, and while the second instructions are executing, the scheduler 250 continues to schedule subsequent instructions.
  • the second instructions in the other pipelines 261 can be executed out of order and need not be executed in instruction fetch order.
  • for example, three second instructions are, in fetch and decode order, second instruction 1, second instruction 2, and second instruction 3. In the other pipelines 261, if second instruction 1 has not yet been executed, second instruction 2 and second instruction 3 may be executed first.
  • Step 403: after the scheduler 250 determines that the second instructions satisfying the preset condition have been resolved, the first instruction is executed.
  • specifically, the determination module 320 in the scheduler 250 determines whether every second instruction that satisfies the preset condition has been resolved. If so, the first instruction is executed, where executing means that the execution module 330 in the scheduler 250 transmits the first instruction to the memory accessor 262. If not, that is, an unresolved second instruction remains, the first instruction is executed only after the second instructions that satisfy the preset condition have been resolved.
  • for each first instruction, the second instructions that satisfy the preset condition must be resolved before that first instruction is executed. If several first instructions need to be executed, they may be executed out of order. For example, the scheduler 250 acquires, in order, first instruction A, first instruction B, and first instruction C. After first instruction A is transmitted to the memory accessor 262, if the second instructions for first instruction B have not been resolved while those for first instruction C have, first instruction C is executed first, and first instruction B is executed once its second instructions have been resolved.
  • in this way, the Cache microarchitectural state change caused by mis-speculated execution of the first instruction does not occur. The attacker therefore cannot steal data of the protected area through Cache microarchitectural state changes, so Cache delay side channel attacks based on speculative memory access can be effectively prevented.
  • the preset condition includes having a control dependency relationship with the first instruction. That is, the first instruction is executed after the scheduler determines that the second instructions having a control dependency relationship with the first instruction have been resolved.
  • whether the first instruction can be executed depends both on the condition that its control dependency has been resolved and on the condition that its data dependency has been resolved.
  • if the data dependency of the first instruction is unresolved, the first instruction cannot be executed at all. Therefore, the embodiments of the present application assume by default that, when the first instruction is executed, the condition that its data dependency has been resolved is satisfied.
  • the instruction cache includes four instructions, which are instruction 1, instruction 2, instruction 3, and instruction 4.
  • instruction 1 is a branch instruction
  • instruction 2 is a load instruction
  • instruction 3 is a branch instruction
  • instruction 4 is a load instruction
  • the above four instructions sequentially enter the scheduler 250.
  • when the scheduler 250 receives instruction 1, it sends instruction 1 to the other pipelines 261 for execution.
  • when the scheduler 250 receives instruction 2, it finds that instruction 1 has been resolved by this time, that is, instruction 1 has been executed in the other pipelines 261 and its execution result has been obtained, so instruction 2 is transmitted to the memory accessor 262.
  • when the scheduler 250 receives instruction 3, it sends instruction 3 to the other pipelines 261 for execution.
  • when the scheduler 250 receives instruction 4, it finds that instruction 3 is unresolved, that is, the control dependency of instruction 4 is unresolved at this time. How instruction 4 is then processed differs between the solution adopted in this application and the solution adopted in the prior art, as described below:
  • in the prior art, although the control dependency of instruction 4 is unresolved, the scheduler 250 directly transmits instruction 4 to the memory accessor 262; if instruction 4 turns out to be mis-speculated, the microarchitectural state in the Cache has already been changed.
  • the attacker can then infer internal data through the changed microarchitectural state, resulting in internal data leakage and a data security risk.
  • in this application, because the control dependency of instruction 4 is unresolved, the scheduler 250 suspends the transmission of instruction 4.
  • optionally, instruction 4 can be kept in the pre-execution buffer 220 until instruction 3 has been resolved, after which the scheduler 250 transmits instruction 4 to the memory accessor 262 for execution.
  • the execution of other instructions after instruction 4 is not affected.
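  • the four-instruction walk-through above can be replayed with a small software model. This is a minimal sketch, not the hardware design: the class `ToyScheduler`, its method names, and the log strings are hypothetical, and the model assumes that a load has a control dependency on every earlier branch that is still unresolved.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyScheduler:
    """Holds loads back until every earlier branch has been resolved."""

    unresolved_branches: Set[int] = field(default_factory=set)
    held_loads: List[int] = field(default_factory=list)  # stands in for the pre-execution buffer
    log: List[str] = field(default_factory=list)

    def _blocked(self, load: int) -> bool:
        # Assumption: a load depends on every earlier, still-unresolved branch.
        return any(branch < load for branch in self.unresolved_branches)

    def receive(self, number: int, kind: str) -> None:
        if kind == "branch":
            self.unresolved_branches.add(number)
            self.log.append(f"branch {number} -> other pipelines")
        elif kind == "load":
            if self._blocked(number):
                self.held_loads.append(number)
                self.log.append(f"load {number} held")
            else:
                self.log.append(f"load {number} -> memory accessor")

    def resolve_branch(self, number: int) -> None:
        self.unresolved_branches.discard(number)
        still_held = []
        for load in self.held_loads:
            if self._blocked(load):
                still_held.append(load)
            else:
                self.log.append(f"load {load} -> memory accessor")
        self.held_loads = still_held


scheduler = ToyScheduler()
scheduler.receive(1, "branch")
scheduler.resolve_branch(1)      # instruction 1 is resolved before instruction 2 arrives
scheduler.receive(2, "load")
scheduler.receive(3, "branch")
scheduler.receive(4, "load")     # instruction 3 is unresolved, so instruction 4 is held
scheduler.resolve_branch(3)      # instruction 4 is released once instruction 3 is resolved
```

  • in this sketch only loads are delayed; branches always proceed to the other pipelines, which matches the statement that other instructions are not affected.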
  • the solution of the present application can greatly improve data security. Although it may reduce instruction execution efficiency relative to the speculative execution scheme of the prior art, this performance degradation is much smaller than that of the defense schemes described in the background.
  • the preset condition includes a preset type of control dependency relationship with the first instruction.
  • the preset type may be the control dependency of a direct conditional branch jump, the control dependency of an indirect conditional branch jump, or another type of control dependency.
  • for example, if the preset type is the control dependency of a direct conditional branch jump, the first instruction can be executed once the branch instructions having that type of control dependency with the first instruction have been resolved, without waiting for all branch instructions that have control dependencies with the first instruction to be resolved. In this case, variant 1 of the Spectre attack, "bounds check bypass", can be defended against.
  • if the preset type is the control dependency of an indirect conditional branch jump, the second instructions are the branch instructions having an indirect conditional branch jump control dependency with the first instruction, and the scheduler 250 tracks whether each such branch instruction has been resolved.
  • once the scheduler 250 determines that the branch instructions having an indirect conditional branch jump control dependency with the first instruction have been resolved, the first instruction can be executed without waiting for all branch instructions having a control dependency with the first instruction to be resolved. In this case, variant 2 of the Spectre attack, "branch target injection", can be defended against.
  • alternatively, the preset condition is having a control dependency relationship with the first instruction. In this case, the first instruction can be executed only after all second instructions having a control dependency relationship with the first instruction have been resolved. Compared with the previous implementation, this implementation must determine whether all second instructions having a control dependency with the first instruction have been resolved; more second instructions need to be tracked than when only second instructions of the preset type are tracked, so the defense is stronger.
  • an optional implementation of the method of FIG. 4 is to set a Load Filter in the scheduler 250 and use its filtering function to defend against Cache delay side channel attacks based on speculative memory access. When different defense levels are needed, a filtering function can be set for each defense level; for example, the defense level can be adjusted by reducing the types of branch instruction tracked by the Load Filter.
  • for example, the filtering function of the Load Filter can be turned off, so that the second instructions on which the first instruction depends are not tracked and the Load Filter has no logical impact on the pipeline.
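  • the idea of selecting a defense level by choosing which branch types the Load Filter tracks can be sketched as a lookup table. The level names, the branch-kind strings, and the function `branches_to_wait_for` are assumptions made for illustration; the application does not define a concrete encoding.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class DefenseLevel(Enum):
    OFF = "off"            # filter disabled, no branch is tracked
    DIRECT = "direct"      # track direct conditional branch jumps only
    INDIRECT = "indirect"  # track indirect conditional branch jumps only
    ALL = "all"            # track every control dependency


# Hypothetical mapping from defense level to the branch kinds the filter tracks.
TRACKED_KINDS = {
    DefenseLevel.OFF: frozenset(),
    DefenseLevel.DIRECT: frozenset({"direct"}),
    DefenseLevel.INDIRECT: frozenset({"indirect"}),
    DefenseLevel.ALL: frozenset({"direct", "indirect", "other"}),
}


def branches_to_wait_for(
    level: DefenseLevel, unresolved: Iterable[Tuple[int, str]]
) -> List[int]:
    """Return the sequence numbers of unresolved branches that block a load."""
    tracked = TRACKED_KINDS[level]
    return [seq for seq, kind in unresolved if kind in tracked]


pending = [(1, "direct"), (2, "indirect"), (3, "other")]
blocking_at_direct = branches_to_wait_for(DefenseLevel.DIRECT, pending)
```

  • a load may be sent to the memory accessor when the returned list is empty; with the filter off the list is always empty, so the pipeline behaves as if the filter were absent.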
  • the scheduler 250 determines the value of the first variable according to the sequence number of each second instruction that satisfies the preset condition, according to The sequence number of the second instruction or the value of the first variable that is not resolved in each second instruction that satisfies the preset condition determines the value of the second variable. Then, the scheduler 250 determines whether each of the second instructions satisfying the preset condition is parsed based on the value of the first variable and the value of the second variable.
  • the scheduler 250 determines that the second instruction parsing that satisfies the preset condition is completed. If the value of the first variable is not less than the value of the second variable, it is determined that each second instruction that satisfies the preset condition is not parsed.
  • the sequence number is determined according to a decoding sequence of each second instruction that satisfies a preset condition.
  • the scheduler 250 can explicitly determine whether the first instruction has a control dependency risk based on the value of the first variable and the value of the second variable, and a small amount of decision logic before executing the first instruction.
  • the first instruction can be executed in the case that the first instruction does not have a control dependency risk, thereby effectively preventing the Cache delay side channel attack based on the speculative memory access.
  • the scheduler 250 determines the value of the first variable according to the sequence number of each second instruction that meets the preset condition, and specifically includes: the scheduler 250 determines that the second serial number of each second instruction that meets the preset condition is the largest. The sequence number of the instruction as the value of the first variable. The scheduler 250 determines the value of the second variable according to the sequence number of the unresolved second instruction or the value of the first variable in each second instruction that satisfies the preset condition, and includes any one of the following two methods:
  • the scheduler 250 determines the sequence number of the second instruction having the smallest and unresolved sequence number among the second instructions satisfying the preset condition as the value of the second variable.
  • if the scheduler 250 determines that there is no unresolved second instruction among the second instructions that satisfy the preset condition, it takes the value of the first variable plus one as the value of the second variable.
  • the determining module 320 in the scheduler 250 determines the value of the second variable according to mode 1; if there is no unresolved second instruction among the second instructions that meet the preset condition, the determining module 320 in the scheduler 250 determines the value of the second variable according to mode 2.
  • by comparing the value of the first variable with the value of the second variable, the determining module 320 in the scheduler 250 can accurately determine whether the second instructions satisfying the preset condition have all been resolved. The first instruction is then executed only after they have been resolved, which effectively defends against the Cache delay side channel attack based on speculative memory access.
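The two variables and their comparison can be sketched as follows. This is a minimal illustration rather than the patented circuit: the function names and the dictionary that maps each sequence number to a resolved flag are assumptions made here, and the set of qualifying second instructions is assumed to be non-empty.

```python
from typing import Dict


def first_variable(branches: Dict[int, bool]) -> int:
    """Largest sequence number among the qualifying second instructions."""
    return max(branches)


def second_variable(branches: Dict[int, bool]) -> int:
    """Mode 1: smallest unresolved sequence number.
    Mode 2 (nothing unresolved): first variable plus one."""
    unresolved = [sn for sn, resolved in branches.items() if not resolved]
    if unresolved:
        return min(unresolved)            # mode 1
    return first_variable(branches) + 1   # mode 2


def all_resolved(branches: Dict[int, bool]) -> bool:
    """Resolved exactly when the first variable is less than the second."""
    return first_variable(branches) < second_variable(branches)
```

With the hypothetical input `{1: True, 2: False, 3: True}` the second variable is 2 and the check fails; once sequence number 2 is marked resolved, the second variable becomes 4 and the check passes.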
  • the scheduler 250 takes the sequence number of the unresolved second instruction with the smallest sequence number, among the second instructions that meet the preset condition, as the value of the second variable, which specifically includes: after the unresolved second instruction with the smallest sequence number has been resolved, the determining module 320 in the scheduler 250 takes the sequence number of the next unresolved second instruction after it as the value of the second variable.
  • the value of the second variable always indicates the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, which ensures the accuracy of determining whether each second instruction satisfying the preset condition has been resolved.
  • the resolution state of each second instruction may be tracked by adding two globally accessible registers to a processor that performs speculative out-of-order execution; the resolution state is either resolved or unresolved.
  • the following takes the first instruction as the Load instruction and the second instruction as the branch instruction as an example, and is described in detail in conjunction with the specific embodiment.
  • the pipeline includes eight pipeline stages: Fetch, Decode, Rename, Dispatch, Issue, Execution, WrBack, and Commit.
  • the first register maintains a first variable in the dispatch pipeline stage; the first variable records the latest dispatched branch sequence number (LBrSN), and the value of LBrSN is incremented by one each time a branch instruction is dispatched.
  • LBrSN latest dispatched branch sequence number
  • the second register maintains a second variable in the execution pipeline stage; the second variable records the oldest unresolved branch sequence number (NRBrSN). If the branch order buffer (BOB) is empty, or every branch instruction that satisfies the preset condition has been resolved, the value of NRBrSN is equal to the value of LBrSN plus one; if an unresolved branch instruction satisfying the preset condition is found in the BOB, NRBrSN points to the sequence number of the oldest unresolved branch.
  • BOB branch order buffer
  • updating the NRBrSN value may continue over multiple cycles.
  • if the branch instruction currently pointed to by NRBrSN has been resolved, the BOB is searched backwards until the next unresolved branch instruction is found. If the next unresolved branch instruction is far away, for example many BOB entries later, it may not be found within the current cycle. For example, if 20 branch instructions in the BOB have been searched by the end of the current cycle and all of them have been resolved, the next cycle continues searching backwards from the 21st branch instruction in the BOB until an unresolved branch instruction is found or BOB_end is reached.
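The multi-cycle search can be sketched as one function call per cycle. The names `advance_nrbrsn` and `max_scan`, the list that stands in for the BOB, and sequence numbers that start at 1 and never wrap are all assumptions of this sketch, not details taken from the application.

```python
from typing import List


def advance_nrbrsn(resolved: List[bool], nrbrsn: int, lbrsn: int, max_scan: int) -> int:
    """Model one cycle of the NRBrSN update.

    resolved[i] is the state of the branch with sequence number i + 1,
    lbrsn is the latest dispatched branch sequence number, and at most
    max_scan BOB entries are examined per cycle.
    """
    scanned = 0
    while nrbrsn <= lbrsn and scanned < max_scan:
        if not resolved[nrbrsn - 1]:
            return nrbrsn          # oldest unresolved branch found
        nrbrsn += 1                # this entry is resolved, move on
        scanned += 1
    # Either the scan budget ran out or the end of the BOB was reached,
    # in which case nrbrsn equals lbrsn + 1.
    return nrbrsn
```

In this sketch, with 25 resolved branches followed by one unresolved branch and a budget of 20 entries per cycle, the first call returns 21 and the second returns 26. The intermediate value is conservative: every branch with a sequence number below it has already been resolved.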
  • if the value of load.BrSN corresponding to the Load instruction is less than NRBrSN, each branch instruction that has a control dependency relationship with the Load instruction has been resolved, and there is no control dependency risk when the Load instruction is executed.
  • an explicit control dependency check is added to the issue logic of the Load instruction: if the load.BrSN value of the Load instruction is less than the value of NRBrSN, the control dependency of the Load instruction is determined to be resolved.
  • the issue logic of the Load instruction in the prior art is: once the data dependency is resolved, that is, the register dependency and the write-read dependency are satisfied, the Load instruction is issued.
  • the issue logic of the Load instruction requires both the data dependency and the control dependency to be resolved.
  • the issue logic in Table 1 is: if (reg_dep_ok & mem_dep_ok & load.BrSN < NRBrSN) issue load.
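The issue condition can be written as a small predicate. The `LoadEntry` record and the `can_issue` name are hypothetical; the three terms mirror reg_dep_ok, mem_dep_ok, and the load.BrSN comparison, and the sketch assumes that load.BrSN holds the LBrSN value captured when the Load was dispatched.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    reg_dep_ok: bool   # register dependency satisfied
    mem_dep_ok: bool   # write-read (memory) dependency satisfied
    br_sn: int         # load.BrSN (assumed: LBrSN captured at dispatch)


def can_issue(load: LoadEntry, nrbrsn: int) -> bool:
    """Issue only when data dependencies and the control dependency are resolved."""
    return load.reg_dep_ok and load.mem_dep_ok and load.br_sn < nrbrsn
```

Dropping the last term gives the prior-art condition, in which a Load may issue while an older branch is still unresolved.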
  • the speculative access of the Load has no control dependency risk, and is sufficient to defend against the Cache latency side channel attack based on speculative memory access such as Spectre attack.
  • when the pipeline is flushed, for example due to a branch misprediction, LBrSN, NRBrSN, and load.BrSN require no additional recovery logic, and instruction processing efficiency is not affected.
  • when load instruction A is dispatched, the three instructions dispatched before it are, in dispatch order, branch instruction 1, branch instruction 2, and branch instruction 3, where branch instruction 1 has been resolved, branch instruction 2 has not been resolved, and branch instruction 3 has been resolved.
  • because the value of load.BrSN is greater than the value of NRBrSN, the control dependency of load instruction A is determined to be unresolved, and load instruction A is held.
  • NRBrSN = 4. Therefore, because load.BrSN is smaller than NRBrSN, it can be determined that the control dependency of load instruction A has been resolved, and load instruction A is executed.
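The example with branch instructions 1 to 3 and load instruction A can be traced with a toy model of the two registers. The `BranchTracker` class is hypothetical, its dictionary stands in for the BOB, and the sketch assumes that sequence numbers start at 1 and that load.BrSN is the LBrSN value at the time the load is dispatched.

```python
class BranchTracker:
    """Toy model of LBrSN and NRBrSN; the dict stands in for the BOB."""

    def __init__(self) -> None:
        self.lbrsn = 0        # latest dispatched branch sequence number
        self.resolved = {}    # sequence number -> resolved flag

    def dispatch_branch(self) -> int:
        self.lbrsn += 1       # LBrSN is incremented on every branch dispatch
        self.resolved[self.lbrsn] = False
        return self.lbrsn

    def resolve_branch(self, sn: int) -> None:
        self.resolved[sn] = True

    @property
    def nrbrsn(self) -> int:
        pending = [sn for sn, done in self.resolved.items() if not done]
        return min(pending) if pending else self.lbrsn + 1


tracker = BranchTracker()
for _ in range(3):
    tracker.dispatch_branch()      # branch instructions 1, 2, 3
tracker.resolve_branch(1)
tracker.resolve_branch(3)

load_brsn = tracker.lbrsn                   # load A is dispatched after branch 3
held = not (load_brsn < tracker.nrbrsn)     # NRBrSN is 2, so load A is held
tracker.resolve_branch(2)
released = load_brsn < tracker.nrbrsn       # NRBrSN is now 4, so load A may issue
```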
  • the processor may also be a chip, and the chip is connected to the memory for reading and executing the software program stored in the memory to implement the execution method in any of the foregoing embodiments.
  • Embodiments of the present application also provide a computer storage medium for storing computer software instructions for execution of the above instructions, including program code for performing the above method embodiments.
  • embodiments of the present application can be provided as a method, apparatus (device), computer readable storage medium, or computer program product.
  • the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware aspects, which are collectively referred to herein as "module” or "system.”
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

An instruction execution method and device are provided for effectively defending against a Cache delay side channel attack based on speculative memory access. The method in the embodiment of the present application comprises: acquiring a first instruction, the first instruction being an instruction for reading a memory; determining each second instruction that satisfies a preset condition, the second instructions being branch instructions that are executed out of order, and the preset condition including having a control dependency relationship with the first instruction; and after determining that each of the second instructions satisfying the preset condition has been resolved, executing the first instruction. Because the first instruction is executed only after the second instructions that satisfy the preset condition have been resolved, mis-speculative execution of the first instruction is avoided, and the change in Cache micro-architectural state that such execution would cause does not occur; therefore, an attacker cannot steal data of a protected area by means of a change in Cache micro-architectural state, which effectively defends against a Cache delay side channel attack based on speculative memory access.

Description

Instruction execution method and device
Technical field
The present application relates to the field of communications, and in particular, to an instruction execution method and apparatus.
Background
At present, speculative out-of-order execution is the basic technique by which modern processors exploit instruction-level parallelism. The correct execution of any instruction must satisfy two conditions: first, its control dependency is satisfied, that is, the instruction is on the correct branch path; second, its data dependency is satisfied, that is, the source operands of the instruction are correctly obtained. Speculative out-of-order execution means that, when the control dependency of an instruction has not yet been resolved, the instruction is executed speculatively as long as its data dependency is determined to be satisfied. Of course, if the speculation is wrong, the instruction may be on the wrong branch path. When the processor later detects the mis-speculation, it revokes the mis-speculatively executed instructions and re-executes from the correct path, thereby preserving correct program semantics.
On a processor with speculative out-of-order execution, mis-speculatively executed instructions do not change any architecturally defined, software-visible state (architectural visible state), but they do change micro-architectural state, such as the physical register file and the cache (Cache), and these micro-architectural state changes are not undone when the mis-speculatively executed instructions are revoked. In normal use of the system, micro-architectural state is neither used by nor visible to software, so the changes caused by mis-speculative execution do not lead to software errors. However, with the development of hacking techniques, a method called the "Cache delay side channel attack" can accurately detect changes in Cache micro-architectural state, which has given rise to a highly threatening attack against existing processors: the Cache delay side channel attack based on speculative memory access. Such attacks, for example the Spectre attack, mainly exploit mis-speculatively executed memory access instructions and in principle consist of two steps. In the first step, a branch misprediction scenario is constructed so that a mis-speculatively executed Load instruction on the wrong branch path accesses data in a protected area. In the second step, that data is used to construct an address index that accesses the Cache, which changes the Cache micro-architectural state (for example, an access to the corresponding cache line changes from a miss to a hit); this change is then detected through the Cache delay side channel, and the protected data is stolen. Processors that currently use speculative out-of-order execution are generally unable, in hardware, to defend against Cache delay side channel attacks based on speculative memory access.
In the prior art, defense against the above attack is mainly achieved through two emergency workarounds. In the first, system software such as the operating system (OS) explicitly flushes the branch target buffer (BTB) on process switches to prevent the attack, but this greatly harms system performance. In the second, specific sensitive code segments are recompiled to generate binary code with a branch barrier property. Because the industry has not yet released a compiler that automatically identifies sensitive code, programmers must modify the source code to mark the sensitive code segments explicitly, and it is difficult to guarantee that all programs, and all sensitive code segments within them, are identified and recompiled.
In summary, the emergency workarounds adopted in the prior art suffer from incomplete defense and from performance loss, and cannot effectively defend against Cache delay side channel attacks based on speculative memory access.
Summary of the invention
Embodiments of the present application provide an instruction execution method and apparatus for effectively defending against Cache delay side channel attacks based on speculative memory access.
In a first aspect, an instruction execution method is provided. In this method, an instruction execution apparatus acquires a first instruction and determines each second instruction that satisfies a preset condition, where the first instruction is an instruction that reads memory and the second instructions are branch instructions executed out of order. After determining that each second instruction satisfying the preset condition has been resolved, the apparatus executes the first instruction. Because the first instruction is executed only after each such second instruction has been resolved, mis-speculative execution of the first instruction is avoided, and the change in Cache micro-architectural state that it would cause does not occur. An attacker therefore cannot steal data from a protected area through changes in Cache micro-architectural state, which effectively defends against Cache delay side channel attacks based on speculative memory access.
In one possible design, different preset conditions can be set to meet different levels of defense requirements. Optionally, the preset condition includes having a control dependency of a preset type with the first instruction. In this case, the first instruction can be executed once each second instruction that has a control dependency of the preset type with the first instruction has been resolved, without waiting for every second instruction that has a control dependency with the first instruction to be resolved. To meet a higher level of defense and further avoid mis-speculative execution of the first instruction, the preset condition is optionally having a control dependency with the first instruction. In this case, the first instruction is executed only after every second instruction that has a control dependency with the first instruction has been resolved.
In one possible design, before determining that each second instruction satisfying the preset condition has been resolved, the method further includes: determining the value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition; and determining the value of a second variable according to either the sequence number of an unresolved second instruction among them or the value of the first variable. Further, determining that each second instruction satisfying the preset condition has been resolved includes: if the value of the first variable is less than the value of the second variable, determining that each second instruction satisfying the preset condition has been resolved; if the value of the first variable is greater than or equal to the value of the second variable, determining that the second instructions satisfying the preset condition have not all been resolved. In this way, using the values of the first and second variables and a small amount of decision logic, it can be explicitly determined before executing the first instruction whether the first instruction carries a control dependency risk. The first instruction is then executed only when it carries no such risk, which effectively defends against Cache delay side channel attacks based on speculative memory access.
In one possible design, in a third implementation of the first aspect, determining the value of the first variable according to the sequence numbers of the second instructions that satisfy the preset condition includes: taking the largest sequence number among the second instructions that satisfy the preset condition as the value of the first variable. Determining the value of the second variable according to either the sequence number of an unresolved second instruction among them or the value of the first variable includes: taking the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition as the value of the second variable; or, if it is determined that no unresolved second instruction exists among the second instructions that satisfy the preset condition, taking the value of the first variable plus one as the value of the second variable. In this way, by comparing the value of the first variable with the value of the second variable, it can be accurately determined whether the second instructions satisfying the preset condition have all been resolved, and the first instruction is executed only after they have been, which effectively defends against Cache delay side channel attacks based on speculative memory access.
In one possible design, taking the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition as the value of the second variable includes: after that second instruction has been resolved, taking the sequence number of the next unresolved second instruction after it as the value of the second variable. In this way, the value of the second variable is updated dynamically so that it always indicates the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, which ensures the accuracy of determining whether each second instruction satisfying the preset condition has been resolved.
In a second aspect, an instruction execution apparatus is provided, including an instruction fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor. The instruction fetcher is configured to fetch instructions from an instruction cache. The decoder is configured to decode the instructions to obtain their decoding results, where a decoding result includes the instruction type. The pre-execution buffer is configured to store the instructions and their decoding results. The scheduler is configured to: if a first instruction is obtained from the pre-execution buffer, determine each second instruction that satisfies a preset condition and, after determining that each second instruction satisfying the preset condition has been resolved, send the first instruction to the executor, where the first instruction is an instruction that reads memory and the second instructions are branch instructions executed out of order; and if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor. The executor is configured to execute the first instruction and the second instruction.
The instruction execution apparatus may be at least one processing element or chip.
In one possible design, the preset condition includes having a control dependency with the first instruction.
In one possible design, the preset condition includes having a control dependency of a preset type with the first instruction.
In one possible design, the scheduler is further configured to: determine the value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition; determine the value of a second variable according to either the sequence number of an unresolved second instruction among them or the value of the first variable; and, if the value of the first variable is less than the value of the second variable, determine that each second instruction satisfying the preset condition has been resolved.
In one possible design, the scheduler is specifically configured to: take the largest sequence number among the second instructions that satisfy the preset condition as the value of the first variable; and take the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition as the value of the second variable, or, if it is determined that no unresolved second instruction exists among the second instructions that satisfy the preset condition, take the value of the first variable plus one as the value of the second variable.
In one possible design, the scheduler is specifically configured to: after the unresolved second instruction with the smallest sequence number has been resolved, take the sequence number of the next unresolved second instruction after it as the value of the second variable.
In a third aspect, a chip is provided. The chip is connected to a memory and is configured to read and execute a software program stored in the memory, so as to implement the method according to the first aspect or any of its possible designs.
In a fourth aspect, a readable storage medium is provided. The readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or any of its possible designs.
Brief description of the drawings
FIG. 1 is a schematic diagram of a system architecture of a network device according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a possible processor according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a possible scheduler according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of an instruction execution method according to an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.
The instruction execution method provided by the present application can be applied to the system architecture of a network device shown in FIG. 1, which is a system architecture of a network device provided by an embodiment of the present application. As shown in FIG. 1, the system architecture 100 includes a memory 110, a processor 120, and a communication interface 130, which are connected to each other.
The memory 110 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 110 may also include a combination of the above types of memory.
The processor 120 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 120 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The communication interface 130 may be a wired communication interface, a wireless communication interface, or a combination thereof. The wired communication interface may be, for example, an Ethernet interface. The Ethernet interface may be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface may be a WLAN interface.
Based on the system architecture shown in FIG. 1, the instruction execution method provided by the present application may be implemented by some of the components in the processor 120.
In one possible design, the processor 120 shown in FIG. 1 may include multiple components. Based on the system architecture shown in FIG. 1, FIG. 2 is a schematic structural diagram of a possible processor provided by an embodiment of the present application. As shown in FIG. 2, the processor 120 includes an instruction cache (Icache) 210, a pre-execution buffer 220, a fetcher 230, a decoder 240, a scheduler 250, an executor 260, and an intermediate register 270. The intermediate register 270 is used to store the results of speculatively executed instructions.
The instruction cache 210 stores instructions of various types, such as branch instructions, memory access instructions, and other types of instructions. The fetcher 230 fetches instructions in order from the instruction cache 210 and passes them to the decoder 240. The decoder 240 obtains the type of each instruction and then stores the instruction and its decoding information in the pre-execution buffer 220. For example, the decoder 240 decodes four instructions fetched in order, and the decoding results are: the first instruction is a branch instruction, the second is a memory access instruction, the third is a branch instruction, and the fourth is a memory access instruction.
The scheduler 250 can obtain instructions and their types from the pre-execution buffer 220. After obtaining an instruction, the scheduler 250 can send it to the executor 260 that executes it. As shown in FIG. 2, the executor 260 includes other pipelines 261 and a memory access unit 262. The memory access unit 262 includes a memory access instruction queue 263. In a specific application, memory access instructions are of two kinds, load (Load) instructions and store (Store) instructions, so the memory access instruction queue may also be called the Load/Store queue.
Specifically, if the instruction obtained by the scheduler 250 is a branch instruction, the branch instruction is sent to the other pipelines 261 for execution. If the instruction obtained by the scheduler 250 is a memory access instruction, the memory access instruction is sent to the memory access unit 262 for execution.
Control dependencies and data dependencies exist between instructions. A memory access instruction is not executed if its data dependency is not satisfied, so the present application assumes by default that the data dependency is satisfied. Considering only control dependencies, if the control dependency of a memory access instruction is unresolved, a Cache timing side-channel attack based on speculative memory access may occur during speculative out-of-order execution. To avoid such an attack, in the embodiments provided by the present application, if the instruction obtained by the scheduler 250 is a memory access instruction, the memory access instruction is sent to the memory access unit 262 for execution only after every branch instruction on which it has a control dependency has been resolved. Optionally, if any of those branch instructions is unresolved, the memory access instruction is first held in the pre-execution buffer 220 until all of them have been resolved; it is then scheduled from the pre-execution buffer 220 and issued to the memory access unit 262 for execution.
After receiving a memory access instruction, the memory access unit 262 may store it in the Load/Store queue and execute the memory access instructions out of order once both the data dependency and the control dependency are satisfied.
For a single instruction, the processing by the fetcher 230, the decoder 240, the scheduler 250, the executor 260, and so on forms a pipeline comprising multiple pipeline stages, including fetch, decode, dispatch, and execute stages. The fetcher 230 corresponds to the fetch stage, the decoder 240 to the decode stage, the scheduler 250 to the dispatch stage, and the executor 260 to the execute stage. Optionally, the pipeline may further include one or more of the rename, issue, write-back, and commit stages shown in FIG. 2 (the components corresponding to these stages are not shown in FIG. 1).
Based on the processor shown in FIG. 2, in the embodiments of the present application the instruction execution method is performed by the scheduler 250 in the processor 120. FIG. 3 is a schematic structural diagram of a scheduler provided by an embodiment of the present application.
As shown in FIG. 3, the scheduler 250 includes an obtaining module 310, a determining module 320, and an execution module 330, where:
The obtaining module 310 is configured to obtain instructions from the pre-execution buffer 220. Optionally, the obtained instructions include branch instructions, memory access instructions, and other types of instructions.
The determining module 320 is configured to determine, after the obtaining module 310 obtains a first instruction, the second instructions that satisfy a preset condition. The first instruction is an instruction that reads memory, and a second instruction is a branch instruction executed out of order.
The execution module 330 is configured to execute the first instruction after the scheduler 250 determines that every second instruction satisfying the preset condition has been resolved.
With reference to FIG. 1, FIG. 2, and FIG. 3, the instruction execution method provided by the present application is described in detail below.
Based on the above description, FIG. 4 shows an example flow of an instruction execution method provided by the present application.
As shown in FIG. 4, the flow includes:
Step 401: the scheduler 250 obtains a first instruction, the first instruction being an instruction that reads memory.
In the embodiments of the present application, the obtaining module 310 in the scheduler 250 obtains the first instruction. The first instruction may also be called a load instruction, that is, a Load instruction.
Step 402: the scheduler 250 determines the second instructions that satisfy the preset condition, a second instruction being a branch instruction executed out of order.
Each second instruction is dispatched by the scheduler 250 to the other pipelines 261 for out-of-order execution. While a second instruction executes, the scheduler 250 continues to dispatch the instructions that subsequently arrive at it. The second instructions in the other pipelines 261 may execute out of order and need not follow fetch-and-decode order. For example, suppose three second instructions are, in fetch-and-decode order, second instruction 1, second instruction 2, and second instruction 3; in the other pipelines 261, second instruction 2 and second instruction 3 may execute first while second instruction 1 has not yet executed.
Step 403: after the scheduler 250 determines that every second instruction satisfying the preset condition has been resolved, the first instruction is executed.
When the first instruction arrives at the scheduler 250, the determining module 320 in the scheduler 250 determines whether every second instruction satisfying the preset condition has been resolved. If so, the first instruction is executed, where executing means that the execution module 330 in the scheduler 250 issues the first instruction to the memory access unit 262. If not, that is, an unresolved second instruction exists, the first instruction is executed only after every second instruction satisfying the preset condition has been resolved.
For a given first instruction, the first instruction is executed only after every second instruction satisfying the preset condition has been resolved. If several first instructions need to be executed, they may be executed out of order relative to one another. For example, suppose the first instructions are, in order of arrival at the scheduler 250, first instruction A, first instruction B, and first instruction C. After first instruction A has been issued to the memory access unit 262, if the control dependency of first instruction B is still unresolved while that of first instruction C is resolved, first instruction C is executed first, and first instruction B is executed once its control dependency has been resolved.
With the solution provided by the foregoing embodiment, the first instruction is executed only after every second instruction satisfying the preset condition has been resolved, so mis-speculative execution of the first instruction is avoided and the Cache microarchitectural state changes that such execution would cause do not occur. An attacker therefore cannot steal data from a protected region by observing Cache microarchitectural state changes, which provides an effective defense against Cache timing side-channel attacks based on speculative memory access.
In an optional implementation, the preset condition includes having a control dependency with the first instruction. That is, the first instruction is executed after the scheduler determines that every second instruction on which the first instruction has a control dependency has been resolved.
It should be noted that executing the first instruction requires not only that its control dependency be resolved but also that its data dependency be resolved. If the data dependency of the first instruction is unresolved, the correct source operands cannot be obtained and the first instruction cannot be executed. The embodiments of the present application therefore assume by default that, whenever the first instruction is executed, its data dependency has already been resolved.
With reference to FIG. 2 and the foregoing embodiments, a specific example of the instruction execution process is given below.
Suppose the instruction cache contains four instructions, in order: instruction 1, instruction 2, instruction 3, and instruction 4.
First, the four instructions are fetched from the instruction cache in order and decoded, with the following results: instruction 1 is a branch instruction, instruction 2 is a Load instruction, instruction 3 is a branch instruction, and instruction 4 is a Load instruction.
Then, the four instructions enter the scheduler 250 in order. When the scheduler 250 receives instruction 1, it sends instruction 1 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 2, it finds that instruction 1 has been resolved, that is, instruction 1 has finished executing in the other pipelines 261 and its result is available, so instruction 2 is issued to the memory access unit 262. When the scheduler 250 receives instruction 3, it sends instruction 3 to the other pipelines 261 for execution. When the scheduler 250 receives instruction 4, it finds that instruction 3 is unresolved, that is, the control dependency of instruction 4 is unresolved at this point. How instruction 4 is then handled differs between the solution of the present application and the prior art, as described below:
In the prior-art solution, although the control dependency of instruction 4 is unresolved, the scheduler 250 issues instruction 4 directly to the memory access unit 262 for execution. If instruction 4 turns out to have been mis-speculatively executed, the microarchitectural state of the cache is changed, and an attacker can recover internal data from the changed microarchitectural state. Internal data is thereby leaked, which is a data security risk.
In the solution of the present application, because the control dependency of instruction 4 is unresolved, the scheduler 250 suspends the issue of instruction 4. For example, instruction 4 may be held in the pre-execution buffer 220 until instruction 3 has been resolved, after which the scheduler 250 issues instruction 4 to the memory access unit 262 for execution. While the issue of instruction 4 is suspended, the execution of other instructions after instruction 4 is not affected. Compared with the prior-art solution, the solution of the present application greatly improves data security. Although instruction execution efficiency is lower than under the prior-art speculative out-of-order execution scheme, the performance loss is far smaller than that of the defense schemes described in the background.
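The four-instruction example above can be sketched as a small model. This is only an illustrative sketch, not the hardware described in the application: the names `Instr` and `dispatch` are hypothetical, every earlier branch is assumed to be one on which a later Load has a control dependency, and each branch's resolved flag is fixed at the value it has when the Load reaches the scheduler.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str               # "branch" or "load"
    resolved: bool = False  # only meaningful for branches


def dispatch(instrs, hold_loads):
    """Return (issued, held) Load names.

    issued: Loads sent to the memory access unit.
    held:   Loads parked in the pre-execution buffer.
    hold_loads=False models the prior-art behaviour (issue regardless);
    hold_loads=True models holding a Load behind unresolved branches.
    """
    issued, held = [], []
    earlier_branches = []
    for ins in instrs:
        if ins.kind == "branch":
            earlier_branches.append(ins)
            continue
        has_unresolved = any(not b.resolved for b in earlier_branches)
        if hold_loads and has_unresolved:
            held.append(ins.name)
        else:
            issued.append(ins.name)
    return issued, held


program = [
    Instr("instruction 1", "branch", resolved=True),
    Instr("instruction 2", "load"),
    Instr("instruction 3", "branch", resolved=False),
    Instr("instruction 4", "load"),
]
prior_art = dispatch(program, hold_loads=False)
proposed = dispatch(program, hold_loads=True)
```

Under these assumptions, `prior_art` issues both Loads, while `proposed` issues only instruction 2 and holds instruction 4 until instruction 3 is resolved.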
Different levels of defense can be provided by setting different preset conditions.
In an optional implementation, the preset condition includes having a control dependency of a preset type with the first instruction. The preset type may be the control dependency of a direct conditional branch jump, the control dependency of an indirect conditional branch jump, or another type of control dependency.
For example, only the control dependencies of direct conditional branch jumps may be tracked. That is, once the scheduler 250 determines that the branch instructions on which the first instruction has a direct-conditional-branch control dependency have been resolved, the first instruction can be executed without waiting for all branch instructions on which the first instruction has a control dependency to be resolved. This case defends against variant 1 of the Spectre attack, "bounds check bypass".
As another example, only the control dependencies of indirect conditional branch jumps may be tracked. That is, when the scheduler 250 receives a second instruction and determines that it satisfies the preset condition, namely that it is a branch instruction on which the first instruction has an indirect-conditional-branch control dependency, the scheduler tracks whether each such branch instruction has been resolved. Once the scheduler 250 determines that these branch instructions have been resolved, the first instruction can be executed without waiting for all branch instructions on which the first instruction has a control dependency to be resolved. This case defends against variant 2 of the Spectre attack, "branch target injection".
To meet a higher level of defense and further avoid mis-speculative execution of the first instruction, the preset condition may optionally be that a control dependency with the first instruction exists. In this case, the first instruction is executed only after all second instructions on which it has a control dependency have been resolved. Compared with the previous implementation, this embodiment must determine whether all such second instructions have been resolved; more second instructions are tracked than when only a preset type is tracked, so the defense is stronger.
In the embodiments of the present application, one optional way to implement the flow of FIG. 4 is to provide a load filter (Load Filter) in the scheduler 250 and to defend against Cache timing side-channel attacks based on speculative memory access by enabling the filtering function of the Load Filter. Specifically, when different defense levels are required, a filtering function corresponding to each defense level can be configured; for example, the defense level can be adjusted on demand by narrowing the kinds of branch instructions that the Load Filter tracks. The filtering function of the Load Filter can also be turned off when no defense against such attacks is needed; in that case the second instructions on which the first instruction has a control dependency are not tracked, and the Load Filter has no logical effect on the pipeline.
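The configurable defense levels might be modelled as a set of tracked branch kinds per level. The sketch below is an assumption-laden illustration: the level names (`"off"`, `"direct_only"`, `"indirect_only"`, `"all"`), the `BranchKind` enum, and the function `must_wait` are hypothetical and do not come from the application.

```python
from enum import Enum


class BranchKind(Enum):
    DIRECT_CONDITIONAL = "direct conditional branch jump"
    INDIRECT_CONDITIONAL = "indirect conditional branch jump"
    OTHER = "other control dependency"


# Hypothetical defense levels: each maps to the branch kinds the
# Load Filter tracks. "off" tracks nothing, so Loads are never delayed.
FILTER_LEVELS = {
    "off": frozenset(),
    "direct_only": frozenset({BranchKind.DIRECT_CONDITIONAL}),
    "indirect_only": frozenset({BranchKind.INDIRECT_CONDITIONAL}),
    "all": frozenset(BranchKind),
}


def must_wait(level, unresolved_kinds):
    """Return True if a Load must wait under the given defense level.

    unresolved_kinds lists the kinds of the earlier, still unresolved
    branches on which the Load has a control dependency.
    """
    tracked = FILTER_LEVELS[level]
    return any(kind in tracked for kind in unresolved_kinds)


pending = [BranchKind.INDIRECT_CONDITIONAL]
waits_when_off = must_wait("off", pending)
waits_direct_only = must_wait("direct_only", pending)
waits_indirect_only = must_wait("indirect_only", pending)
waits_all = must_wait("all", pending)
```

With only an indirect branch pending, the Load is delayed under `"indirect_only"` and `"all"` but not under `"off"` or `"direct_only"`, which mirrors the trade-off between defense strength and the number of branches tracked.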
Optionally, based on step 403 of the foregoing embodiment, before determining that every second instruction satisfying the preset condition has been resolved, the scheduler 250 determines the value of a first variable according to the sequence numbers of the second instructions satisfying the preset condition, and determines the value of a second variable according to either the sequence number of an unresolved second instruction among them or the value of the first variable. The scheduler 250 then determines, from the value of the first variable and the value of the second variable, whether every second instruction satisfying the preset condition has been resolved. Specifically, if the value of the first variable is less than the value of the second variable, the scheduler 250 determines that every second instruction satisfying the preset condition has been resolved. If the value of the first variable is not less than the value of the second variable, the scheduler 250 determines that they have not all been resolved. Optionally, the sequence numbers are determined according to the decoding order of the second instructions satisfying the preset condition.
With this optional embodiment, the scheduler 250 can use the values of the first and second variables, together with a small amount of decision logic applied before the first instruction is executed, to determine explicitly whether the first instruction carries a control dependency risk. This ensures that the first instruction is executed only when it carries no control dependency risk, which effectively defends against Cache timing side-channel attacks based on speculative memory access.
Optionally, determining the value of the first variable according to the sequence numbers of the second instructions satisfying the preset condition specifically includes: the scheduler 250 takes the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable. Determining the value of the second variable according to the sequence number of an unresolved second instruction or the value of the first variable includes either of the following two manners:
Manner 1: the scheduler 250 takes the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable.
Manner 2: if the scheduler 250 determines that none of the second instructions satisfying the preset condition is unresolved, it takes the value of the first variable plus one as the value of the second variable.
If an unresolved second instruction exists among the second instructions satisfying the preset condition, the determining module 320 in the scheduler 250 determines the value of the second variable according to manner 1; if none exists, the determining module 320 determines the value of the second variable according to manner 2. With the foregoing embodiment, the determining module 320 can determine precisely, by comparing the value of the first variable with the value of the second variable, whether every second instruction satisfying the preset condition has been resolved, and the first instruction is executed only after they have been. This provides an effective defense against Cache timing side-channel attacks based on speculative memory access.
Optionally, in manner 1, taking the smallest sequence number among the unresolved second instructions as the value of the second variable specifically includes: after the unresolved second instruction with the smallest sequence number is resolved, the determining module 320 in the scheduler 250 takes the sequence number of the next unresolved second instruction after it as the value of the second variable. By dynamically updating the second variable in this way, its value always indicates the unresolved second instruction with the smallest sequence number among those satisfying the preset condition, which keeps the determination of whether every such second instruction has been resolved accurate.
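The two manners of computing the second variable, and the comparison against the first variable, can be written out as a pure function. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the input is assumed to be a non-empty list of `(sequence_number, resolved)` pairs for the second instructions that satisfy the preset condition.

```python
def first_and_second_variable(branches):
    """Compute (first_variable, second_variable).

    branches: non-empty list of (sequence_number, resolved) pairs.
    first variable  = largest sequence number.
    second variable = smallest unresolved sequence number (manner 1),
                      or first variable + 1 if none is unresolved (manner 2).
    """
    first = max(seq for seq, _ in branches)
    unresolved = [seq for seq, resolved in branches if not resolved]
    second = min(unresolved) if unresolved else first + 1
    return first, second


def all_resolved(branches):
    """All tracked branches are resolved iff first variable < second variable."""
    first, second = first_and_second_variable(branches)
    return first < second


one_pending = [(1, True), (2, False), (3, True)]
none_pending = [(1, True), (2, True), (3, True)]
vars_one_pending = first_and_second_variable(one_pending)
vars_none_pending = first_and_second_variable(none_pending)
```

Re-evaluating the function after a branch resolves corresponds to the dynamic update in manner 1: the second variable moves to the next unresolved sequence number, or to the first variable plus one once nothing is pending.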
In a specific embodiment of the present application, the resolution state of each second instruction, which is either resolved or unresolved, can be tracked by adding two globally accessible registers to a processor that performs speculative out-of-order execution. The following description takes the first instruction to be a Load instruction and the second instruction to be a branch instruction, and describes the specific embodiment in detail.
To implement the solution of the present application, the pipeline logic added to each pipeline stage that processes instructions is shown in Table 1 below.
Table 1
Figure PCTCN2018083991-appb-000001
As shown in Table 1, the pipeline includes eight pipeline stages: fetch (Fetch), decode (Decode), rename (Rename), dispatch (Dispatch), issue (Issue), execute (Execution), write-back (WrBack), and commit (Commit).
First, two globally accessible registers are added to the processor 120 that performs speculative out-of-order execution. Their roles are as follows:
The first register maintains a first variable in the dispatch stage. The first variable records the latest dispatched branch sequence number (LBrSN); each time a branch instruction is dispatched, the value of LBrSN is incremented by one.
The second register maintains a second variable in the execute stage. The second variable records the oldest unresolved branch sequence number (NRBrSN). If the branch order buffer (BOB) is empty, or all branch instructions satisfying the preset condition have been resolved, the value of NRBrSN equals the value of LBrSN plus one. If an unresolved branch instruction satisfying the preset condition is found in the BOB, NRBrSN points to the oldest unresolved branch sequence number.
In a specific application, as branch instructions are resolved, an update of the NRBrSN value may take several cycles.
For example, in the current cycle, if the branch instruction that NRBrSN currently points to has been resolved, the BOB is searched onward entry by entry until the next unresolved branch instruction is found. If the next unresolved branch instruction is far away, for example many BOB entries later, it may not be found within the current cycle. Suppose the search has reached the 20th branch instruction in the BOB by the end of the current cycle and that branch instruction has been resolved; the search then simply continues in the next cycle from the 21st branch instruction in the BOB, until an unresolved branch instruction is found or BOB_end is reached.
If NRBrSN does not point to an unresolved branch instruction during the cycles spent searching, no functional error results. The corresponding Load instruction is merely issued a few cycles later; that is, it waits until all branch instructions related to it have been resolved before it is executed.
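The multi-cycle search for the next unresolved branch can be sketched as a bounded scan per cycle. This is an illustrative model only: the function name, the dictionary representation of the BOB, and the per-cycle scan width `max_entries_per_cycle` are assumptions, not details given in the application.

```python
def advance_nrbrsn(bob, nrbrsn, lbrsn, max_entries_per_cycle):
    """Advance NRBrSN by at most max_entries_per_cycle BOB entries.

    bob:    dict mapping branch sequence number -> resolved flag.
    nrbrsn: current value of NRBrSN.
    lbrsn:  latest dispatched branch sequence number.
    The scan stops at the first unresolved branch or at the end of the
    BOB, where NRBrSN equals LBrSN + 1.
    """
    for _ in range(max_entries_per_cycle):
        if nrbrsn > lbrsn:      # past the last entry: nothing unresolved
            break
        if not bob[nrbrsn]:     # found the oldest unresolved branch
            break
        nrbrsn += 1             # entry resolved, keep searching
    return nrbrsn


# Branches 1..3 and 5 are resolved, branch 4 is not.
bob = {1: True, 2: True, 3: True, 4: False, 5: True}
after_cycle_1 = advance_nrbrsn(bob, nrbrsn=1, lbrsn=5, max_entries_per_cycle=2)
after_cycle_2 = advance_nrbrsn(bob, after_cycle_1, lbrsn=5, max_entries_per_cycle=2)
after_cycle_3 = advance_nrbrsn(bob, after_cycle_2, lbrsn=5, max_entries_per_cycle=2)
```

After the first cycle NRBrSN is 3 even though branches 1 to 3 are all resolved, so a Load with load.BrSN equal to 3 would be delayed by one cycle. The lag only ever makes NRBrSN smaller than its final value, which is why it delays Loads but never issues one too early.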
Second, as shown in Table 1, for each Load instruction, logic is added in the dispatch stage to maintain load.BrSN for that Load instruction, and the value of LBrSN is assigned to load.BrSN.
If the value of load.BrSN for a Load instruction is less than the value of NRBrSN, every branch instruction on which the Load instruction has a control dependency has been resolved, and executing the Load instruction carries no control dependency risk.
Third, in the issue stage, an explicit check of whether the control dependency has been resolved is added to the issue logic of the Load instruction: if the value of load.BrSN for the Load instruction is less than the value of NRBrSN, the control dependency of the Load instruction is determined to be resolved.
In the prior art, the issue logic of a Load instruction is as follows: once the data dependency is resolved, that is, the register dependency and the write-read dependency are satisfied, the Load instruction is issued.
In the embodiments of the present application, by contrast, the issue logic of a Load instruction requires that the control dependency be resolved in addition to the data dependency. The issue logic in Table 1 is: if (reg_dep_ok & mem_dep_ok & load.BrSN < NRBrSN) issue load.
In this way, the speculative access of the Load no longer carries a control dependency risk, which is sufficient to defend against Cache timing side-channel attacks based on speculative memory access, such as Spectre attacks. Moreover, when the pipeline is flushed, for example because of a branch misprediction, LBrSN, NRBrSN, and load.BrSN require no additional recovery logic, so instruction processing efficiency is not affected.
With reference to Table 1 and the foregoing embodiments, a specific example is given below.
For example, suppose that when Load instruction A is dispatched, three instructions have been dispatched before it, in order: branch instruction 1, branch instruction 2, and branch instruction 3. Branch instruction 1 has been resolved, branch instruction 2 is unresolved, and branch instruction 3 has been resolved.
The value of LBrSN is updated as follows: when branch instruction 1 enters the dispatch stage, LBrSN=1; when branch instruction 2 enters the dispatch stage, LBrSN=2; when branch instruction 3 enters the dispatch stage, LBrSN=3; when Load instruction A enters the dispatch stage, load.BrSN=3.
The value of NRBrSN is updated as follows:
When Load instruction A is dispatched, branch instruction 1 has been resolved, branch instruction 2 is unresolved, and branch instruction 3 has been resolved, so NRBrSN=2. Because the value of load.BrSN is greater than the value of NRBrSN, it can be determined directly that the control dependency of Load instruction A is unresolved, and execution of Load instruction A is suspended.
当分支指令2解析完成时,由于分支指令3已解析,而且在分支指令3后没有与该Load 指令A存在控制依赖关系的分支指令,此时NRBrSN=4。因此,可以直接根据load.BrSN小于load.BrSN,确定出该Load指令A的控制依赖已解析,执行该Load指令A。When the branch instruction 2 is parsed, since the branch instruction 3 has been parsed, and there is no branch instruction following the branch instruction 3 with the control instruction A, the NRBrSN=4. Therefore, it can be determined that the control dependency of the load instruction A has been resolved according to load.BrSN is smaller than load.BrSN, and the load instruction A is executed.
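The example above can be replayed with a small software model. This is a sketch under assumptions rather than the claimed hardware: the class `BranchTracker`, its method names, and the initial value LBrSN=0 are all hypothetical; the update rules (LBrSN is the sequence number of the most recently dispatched branch, NRBrSN is the sequence number of the oldest unresolved branch, or LBrSN plus one if none is unresolved) follow the description and claims.

```python
from dataclasses import dataclass, field


@dataclass
class BranchTracker:
    """Hypothetical model of the LBrSN / NRBrSN bookkeeping."""
    lbrsn: int = 0  # assumption: LBrSN starts at 0 before any branch is dispatched
    unresolved: set = field(default_factory=set)

    def dispatch_branch(self) -> int:
        # Each dispatched branch receives the next sequence number.
        self.lbrsn += 1
        self.unresolved.add(self.lbrsn)
        return self.lbrsn

    def dispatch_load(self) -> int:
        # A Load is tagged with the current LBrSN (its load.BrSN).
        return self.lbrsn

    def resolve_branch(self, sn: int) -> None:
        # Branches may resolve out of order.
        self.unresolved.discard(sn)

    @property
    def nrbrsn(self) -> int:
        # Oldest unresolved branch, or LBrSN + 1 if every branch is resolved.
        return min(self.unresolved) if self.unresolved else self.lbrsn + 1


def run_example():
    t = BranchTracker()
    b1, b2, b3 = t.dispatch_branch(), t.dispatch_branch(), t.dispatch_branch()
    load_brsn = t.dispatch_load()   # load.BrSN = 3
    t.resolve_branch(b1)            # branch 1 resolved
    t.resolve_branch(b3)            # branch 3 resolved, branch 2 still pending
    before = (t.nrbrsn, load_brsn < t.nrbrsn)
    t.resolve_branch(b2)            # branch 2 now resolved
    after = (t.nrbrsn, load_brsn < t.nrbrsn)
    return load_brsn, before, after


print(run_example())  # (3, (2, False), (4, True))
```

Using a set of unresolved sequence numbers is a convenience of this model; it reproduces the values in the example (NRBrSN=2, then NRBrSN=4) but says nothing about how a real pipeline would store this state.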
With the foregoing embodiments, only a small amount of pipeline logic needs to be added to a processor that performs speculative out-of-order execution. There is no need to rely on programmers and compilers to recompile programs, and no need to flush the branch target buffer (BTB) when software switches processes. Compared with the solutions provided in the prior art, this approach can therefore both effectively defend against cache-timing side-channel attacks based on speculative memory access and significantly reduce the performance loss.
In the embodiments of this application, the foregoing processor may also be a chip that is connected to a memory and configured to read and execute a software program stored in the memory, so as to implement the instruction execution method in any of the foregoing embodiments.
An embodiment of this application further provides a computer storage medium configured to store computer software instructions used for the foregoing instruction execution, including program code designed to perform the foregoing method embodiments.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, an apparatus (device), a computer-readable storage medium, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as a "module" or "system".
This application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the present invention has been described with reference to specific features and embodiments thereof, it is apparent that various modifications and combinations may be made without departing from the spirit and scope of the invention. Accordingly, this specification and the accompanying drawings are merely illustrative of the invention as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the invention. Obviously, a person skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (14)

  1. An instruction execution method, comprising:
    obtaining a first instruction, wherein the first instruction is an instruction that reads memory;
    determining second instructions that satisfy a preset condition, wherein each second instruction is a branch instruction executed out of order; and
    executing the first instruction after determining that the second instructions that satisfy the preset condition have been resolved.
  2. The method according to claim 1, wherein the preset condition comprises having a control dependency relationship with the first instruction.
  3. The method according to claim 2, wherein the preset condition comprises having a control dependency relationship of a preset type with the first instruction.
  4. The method according to claim 1, wherein before the determining that the second instructions that satisfy the preset condition have been resolved, the method further comprises:
    determining a value of a first variable based on sequence numbers of the second instructions that satisfy the preset condition; and
    determining a value of a second variable based on a sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or based on the value of the first variable;
    and the determining that the second instructions that satisfy the preset condition have been resolved comprises:
    if the value of the first variable is less than the value of the second variable, determining that the second instructions that satisfy the preset condition have been resolved.
  5. The method according to claim 4, wherein the determining a value of a first variable based on sequence numbers of the second instructions that satisfy the preset condition comprises:
    determining the sequence number of the second instruction with the largest sequence number among the second instructions that satisfy the preset condition, as the value of the first variable;
    and the determining a value of a second variable based on a sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or based on the value of the first variable, comprises:
    determining the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, as the value of the second variable; or, if it is determined that there is no unresolved second instruction among the second instructions that satisfy the preset condition, adding one to the value of the first variable to obtain the value of the second variable.
  6. The method according to claim 5, wherein the determining the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, as the value of the second variable, comprises:
    after the unresolved second instruction with the smallest sequence number is resolved, using the sequence number of the unresolved second instruction located after that instruction as the value of the second variable.
  7. An instruction execution apparatus, comprising an instruction fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor, wherein:
    the instruction fetcher is configured to fetch an instruction from an instruction cache;
    the decoder is configured to decode the instruction to obtain a decoding result of the instruction, wherein the decoding result comprises an instruction type;
    the pre-execution buffer is configured to store the instruction and the decoding result of the instruction;
    the scheduler is configured to:
    if a first instruction is obtained from the pre-execution buffer, determine second instructions that satisfy a preset condition, and after determining that the second instructions that satisfy the preset condition have been resolved, send the first instruction to the executor, wherein the first instruction is an instruction that reads memory and each second instruction is a branch instruction executed out of order; and
    if a second instruction is obtained from the pre-execution buffer, send the second instruction to the executor; and
    the executor is configured to execute the first instruction and the second instruction.
  8. The apparatus according to claim 7, wherein the preset condition comprises having a control dependency relationship with the first instruction.
  9. The apparatus according to claim 7, wherein the preset condition comprises having a control dependency relationship of a preset type with the first instruction.
  10. The apparatus according to claim 7, wherein the scheduler is further configured to:
    determine a value of a first variable based on sequence numbers of the second instructions that satisfy the preset condition;
    determine a value of a second variable based on a sequence number of an unresolved second instruction among the second instructions that satisfy the preset condition, or based on the value of the first variable; and
    if the value of the first variable is less than the value of the second variable, determine that the second instructions that satisfy the preset condition have been resolved.
  11. The apparatus according to claim 10, wherein the scheduler is specifically configured to:
    determine the sequence number of the second instruction with the largest sequence number among the second instructions that satisfy the preset condition, as the value of the first variable; and
    determine the sequence number of the unresolved second instruction with the smallest sequence number among the second instructions that satisfy the preset condition, as the value of the second variable; or, if it is determined that there is no unresolved second instruction among the second instructions that satisfy the preset condition, add one to the value of the first variable to obtain the value of the second variable.
  12. The apparatus according to claim 11, wherein the scheduler is specifically configured to:
    after the unresolved second instruction with the smallest sequence number is resolved, use the sequence number of the unresolved second instruction located after that instruction as the value of the second variable.
  13. A chip, wherein the chip is connected to a memory and is configured to read and execute a software program stored in the memory, so as to implement the method according to any one of claims 1 to 6.
  14. A computer storage medium, comprising computer-readable instructions, wherein when a computer reads and executes the computer-readable instructions, the computer is caused to perform the method according to any one of claims 1 to 6.
PCT/CN2018/083991 2018-04-21 2018-04-21 Instruction execution method and device WO2019200618A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/083991 WO2019200618A1 (en) 2018-04-21 2018-04-21 Instruction execution method and device
CN201880091562.8A CN111936968A (en) 2018-04-21 2018-04-21 Instruction execution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/083991 WO2019200618A1 (en) 2018-04-21 2018-04-21 Instruction execution method and device

Publications (1)

Publication Number Publication Date
WO2019200618A1 true WO2019200618A1 (en) 2019-10-24

Family

ID=68240570

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083991 WO2019200618A1 (en) 2018-04-21 2018-04-21 Instruction execution method and device

Country Status (2)

Country Link
CN (1) CN111936968A (en)
WO (1) WO2019200618A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900712B (en) * 2021-10-26 2022-05-06 海光信息技术股份有限公司 Instruction processing method, instruction processing apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177981A1 (en) * 2004-03-26 2008-07-24 International Business Machines Corporation Apparatus and method for decreasing the latency between instruction cache and a pipeline processor
CN101706714A (en) * 2009-11-23 2010-05-12 北京龙芯中科技术服务中心有限公司 System and method for issuing instruction, processor and design method thereof
CN104423927A (en) * 2013-08-30 2015-03-18 华为技术有限公司 Method and device for processing instructions and processor
CN107688468A (en) * 2016-12-23 2018-02-13 北京国睿中数科技股份有限公司 Speculate the verification method for performing branch instruction and branch prediction function in processor

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050223201A1 (en) * 2004-03-30 2005-10-06 Marc Tremblay Facilitating rapid progress while speculatively executing code in scout mode
US8990543B2 (en) * 2008-03-11 2015-03-24 Qualcomm Incorporated System and method for generating and using predicates within a single instruction packet
US20160055029A1 (en) * 2014-08-21 2016-02-25 Qualcomm Incorporated Programmatic Decoupling of Task Execution from Task Finish in Parallel Programs

Also Published As

Publication number Publication date
CN111936968A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
US9875106B2 (en) Computer processor employing instruction block exit prediction
CN107092467B (en) Instruction sequence buffer for enhancing branch prediction efficiency
US9378020B2 (en) Asynchronous lookahead hierarchical branch prediction
US8301870B2 (en) Method and apparatus for fast synchronization and out-of-order execution of instructions in a meta-program based computing system
US9250912B2 (en) Fast index tree for accelerated branch prediction
US7861066B2 (en) Mechanism for predicting and suppressing instruction replay in a processor
US7444501B2 (en) Methods and apparatus for recognizing a subroutine call
TWI550511B (en) Method for fault detection in instruction translations
TWI411957B (en) Out-of-order execution microprocessor that speculatively executes dependent memory access instructions by predicting no value change by older instruction that load a segment register
US10310859B2 (en) System and method of speculative parallel execution of cache line unaligned load instructions
US20180349144A1 (en) Method and apparatus for branch prediction utilizing primary and secondary branch predictors
US9244688B2 (en) Branch target buffer preload table
US20150039862A1 (en) Techniques for increasing instruction issue rate and reducing latency in an out-of-order processor
KR102635965B1 (en) Front end of microprocessor and computer-implemented method using the same
CN112241288A (en) Dynamic control flow reunion point for detecting conditional branches in hardware
US20200004551A1 (en) Appratus and method for using predicted result values
US10445101B2 (en) Controlling processing of instructions in a processing pipeline
WO2019200618A1 (en) Instruction execution method and device
US7945767B2 (en) Recovery apparatus for solving branch mis-prediction and method and central processing unit thereof
US9250909B2 (en) Fast index tree for accelerated branch prediction
US10929137B2 (en) Arithmetic processing device and control method for arithmetic processing device
US9152425B2 (en) Mitigating instruction prediction latency with independently filtered presence predictors
Kaeli et al. Data Speculation Yiannakis Sazeides, 1 Pedro Marcuello, 2 James E. Smith, 3 and An-tonio Gonza´ lez2, 4 1University of Cyprus; 2Intel-UPC Barcelona Research Center; 3University of Wisconsin-Madison; 4Universitat Politecnica de

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18915426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18915426

Country of ref document: EP

Kind code of ref document: A1