WO2019200618A1 - Instruction execution method and device (Procédé et dispositif d'exécution d'instructions) - Google Patents


Info

Publication number
WO2019200618A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
value
variable
preset condition
determining
Application number
PCT/CN2018/083991
Other languages
English (en)
Chinese (zh)
Inventor
李国柱
孙涛
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN201880091562.8A (published as CN111936968A)
Priority to PCT/CN2018/083991 (published as WO2019200618A1)
Publication of WO2019200618A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • The present application relates to the field of communications, and in particular, to an instruction execution method and apparatus.
  • Out-of-order execution is a basic technique for exploiting instruction-level parallelism in processors.
  • The correct execution of any instruction must satisfy two conditions: first, its control dependencies are satisfied, that is, the instruction is on the correct branch path; second, its data dependencies are satisfied, that is, its source operands are correctly obtained.
  • Speculative out-of-order execution means that an instruction whose control dependencies have not yet been resolved is executed speculatively as soon as its data dependencies are satisfied. If the speculation is wrong, the instruction may lie on the wrong branch path; when the processor later detects the misprediction, it squashes the mispredicted instructions and re-executes from the correct path, thereby preserving correct program semantics.
  • Instructions executed on a mispredicted path do not change any architecturally visible state defined by the architecture, but they do change microarchitectural state, such as the physical register file and the cache, and these microarchitectural changes are not undone when the speculatively executed instructions are squashed.
  • Because microarchitectural state is neither used by nor visible to software, it was long assumed that microarchitectural state changes caused by mispredicted execution could not cause software errors.
  • However, a class of attack called the "cache delay side-channel attack" (a cache timing side-channel attack) can precisely detect changes in cache microarchitectural state, which poses a serious threat to existing processors.
  • Attacks of this kind, such as Spectre, mainly exploit the speculative execution of memory access instructions. In principle they proceed in two steps. First, the attacker constructs a branch misprediction so that a Load instruction on the wrong branch path speculatively reads data from a protected region. Second, the attacker uses that data to form an address index for a cache access, which changes the cache microarchitectural state (for example, the corresponding cache line changes from a miss to a hit); the attacker then detects this change through the cache timing side channel and thereby steals the protected data. Processors that currently use speculative out-of-order execution are generally unable to defend against cache timing side-channel attacks based on speculative memory access.
  • In one existing mitigation, system software such as the operating system (OS) explicitly clears the branch target buffer (BTB) on process switches to prevent attacks, but this approach can significantly degrade system performance.
  • In another existing mitigation, programmers recompile specific sensitive code segments to generate binary code with branch barriers. Because compilers cannot yet identify sensitive code automatically, programmers must modify the source code to mark sensitive code segments explicitly, and it is difficult to ensure that every sensitive code segment in every program is identified and recompiled.
  • The prior art therefore relies on emergency workarounds that suffer from two problems, incomplete defense and performance loss, and cannot effectively prevent cache timing side-channel attacks based on speculative memory access.
  • Embodiments of the present invention provide an instruction execution method and apparatus for effectively defending against cache timing side-channel attacks based on speculative memory access.
  • A first aspect provides an instruction execution method in which an instruction execution device acquires a first instruction and determines each second instruction that satisfies a preset condition, where the first instruction is an instruction that reads memory and a second instruction is a branch instruction executed out of order. The first instruction is executed only after it is determined that the second instructions satisfying the preset condition have been resolved. Because the first instruction is not executed speculatively before those second instructions are resolved, no cache microarchitectural state change caused by mispredicted execution of the first instruction can occur. An attacker therefore cannot steal data from the protected region through cache microarchitectural state changes, and cache timing side-channel attacks based on speculative memory access are effectively defended against.
  • In a possible design, the preset condition includes having a preset type of control dependency on the first instruction. In this case, the first instruction can be executed once the second instructions with that preset type of control dependency are resolved, without waiting for every second instruction on which the first instruction has a control dependency to be resolved.
  • In another possible design, the preset condition is having any control dependency on the first instruction. In this case, the first instruction is executed only after every second instruction on which it has a control dependency has been resolved.
  • In a possible design, before determining that the second instructions satisfying the preset condition have been resolved, the method further includes: determining the value of a first variable according to the sequence numbers of the second instructions that satisfy the preset condition; and determining the value of a second variable according to the sequence number of an unresolved second instruction among them, or according to the value of the first variable. Determining that the second instructions satisfying the preset condition have been resolved then includes: if the value of the first variable is less than the value of the second variable, determining that they have all been resolved; if the value of the first variable is greater than or equal to the value of the second variable, determining that they have not all been resolved.
  • In a possible design, determining the value of the first variable includes taking the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable. Determining the value of the second variable includes taking the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable; or, if none of those second instructions is unresolved, taking the value of the first variable plus one as the value of the second variable.
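As an illustration only, the two-variable check can be sketched in Python. The sketch assumes the caller already knows, for one first instruction (load), the sequence numbers of the second instructions that satisfy the preset condition and which of them are still unresolved; the function names and the handling of the no-branch case are hypothetical choices, not taken from the patent.

```python
def first_and_second(branch_seqs, unresolved):
    """Return (first, second) for one load instruction.

    branch_seqs: sequence numbers of the second instructions (branches) that
                 satisfy the preset condition for this load.
    unresolved:  the subset of branch_seqs that has not been resolved yet.
    """
    if not branch_seqs:
        # Assumption (not in the source): with no qualifying branch the load is
        # unconstrained, so return values for which first < second holds.
        return -1, 0
    first = max(branch_seqs)                      # largest sequence number
    pending = [s for s in branch_seqs if s in unresolved]
    # Smallest unresolved sequence number, or first + 1 if nothing is pending.
    second = min(pending) if pending else first + 1
    return first, second


def all_resolved(first, second):
    """The load may be sent to the memory accessor only if first < second."""
    return first < second


# Branches 3, 7 and 9 guard the load; branch 7 is still unresolved.
print(first_and_second([3, 7, 9], {7}))      # (9, 7): the load must wait
print(first_and_second([3, 7, 9], set()))    # (9, 10): the load may execute
```

Because the smallest unresolved sequence number can never exceed the largest qualifying sequence number, `first < second` holds exactly when no qualifying branch is pending, which is why a single comparison suffices.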
  • By comparing the two variables, the device can accurately determine whether the second instructions satisfying the preset condition have been resolved and execute the first instruction only afterwards, which effectively defends against cache timing side-channel attacks based on speculative memory access.
  • In a possible design, taking the smallest unresolved sequence number among the second instructions satisfying the preset condition as the value of the second variable includes: after the second instruction with that smallest unresolved sequence number is resolved, taking the sequence number of the next unresolved second instruction after it as the new value of the second variable.
  • The value of the second variable is thus updated dynamically, so that it always identifies the unresolved second instruction with the smallest sequence number among those satisfying the preset condition. This keeps the determination of whether all of them have been resolved accurate.
  • Another aspect provides an instruction execution apparatus including a fetcher, a decoder, a pre-execution buffer, a scheduler, and an executor. The fetcher is configured to fetch instructions from an instruction cache. The decoder is configured to decode an instruction to obtain its decoding result, which includes the instruction type. The pre-execution buffer is configured to store the instruction and its decoding result.
  • The scheduler is configured as follows. If the instruction obtained from the pre-execution buffer is a first instruction, it determines each second instruction that satisfies a preset condition and sends the first instruction to the executor only after determining that those second instructions have been resolved. Here, the first instruction is an instruction that reads memory, and a second instruction is a branch instruction executed out of order. If the instruction obtained from the pre-execution buffer is a second instruction, the scheduler sends it to the executor. The executor is configured to execute the first instruction and the second instruction.
  • The instruction execution apparatus may be at least one processing element or chip.
  • In a possible design, the preset condition includes having a control dependency on the first instruction.
  • In another possible design, the preset condition includes having a preset type of control dependency on the first instruction.
  • In a possible design, the scheduler is further configured to: determine the value of the first variable according to the sequence numbers of the second instructions that satisfy the preset condition; determine the value of the second variable according to the sequence number of an unresolved second instruction among them or according to the value of the first variable; and, if the value of the first variable is less than the value of the second variable, determine that the second instructions satisfying the preset condition have been resolved.
  • In a possible design, the scheduler is specifically configured to take the largest sequence number among the second instructions satisfying the preset condition as the value of the first variable. It takes the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable, or, if none of those second instructions is unresolved, it takes the value of the first variable plus one as the value of the second variable.
  • In a possible design, the scheduler is specifically configured to: after the unresolved second instruction with the smallest sequence number is resolved, take the sequence number of the next unresolved second instruction after it as the value of the second variable.
  • A chip is provided. The chip is coupled to a memory and is configured to read and execute a software program stored in the memory to implement the method according to the first aspect or any of its possible designs.
  • A readable storage medium is provided that stores instructions which, when executed on a computer, cause the computer to perform the method according to the first aspect or any of its possible designs.
  • FIG. 1 is a schematic diagram of a system architecture of a network device according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a possible processor according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a possible scheduler according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an instruction execution method according to an embodiment of the present application.
  • The instruction execution method provided in the present application can be applied to the system architecture of the network device shown in FIG. 1.
  • FIG. 1 shows a system architecture of a network device according to an embodiment of the present application.
  • the system architecture 100 includes a memory 110, a processor 120, and a communication interface 130; wherein the memory 110, the processor 120, and the communication interface 130 are connected to each other.
  • The memory 110 may include a volatile memory, such as a random-access memory (RAM). It may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD), or a combination of the above types of memory.
  • the processor 120 can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • the processor 120 may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
  • The communication interface 130 may be a wired communication interface, a wireless communication interface, or a combination thereof. The wired communication interface may be, for example, an Ethernet interface.
  • The Ethernet interface may be an optical interface, an electrical interface, or a combination thereof.
  • The wireless communication interface may be a WLAN interface.
  • the instruction execution method provided by the present application may be implemented by some components in the processor 120.
  • The processor 120 shown in FIG. 1 may include various components. FIG. 2 is a schematic structural diagram of a possible processor according to an embodiment of the present application, based on the system architecture shown in FIG. 1. As shown in FIG. 2, the processor 120 includes an instruction cache (Icache) 210, a pre-execution buffer 220, a fetcher 230, a decoder 240, a scheduler 250, an executor 260, and an intermediate register 270. The intermediate register 270 stores the results of speculatively executed instructions.
  • the instruction cache 210 stores various types of instructions, such as branch instructions, memory access instructions, and other types of instructions.
  • the fetcher 230 fetches the instructions in order from the instruction cache 210 and passes them to the decoder 240.
  • The decoder 240 decodes each instruction to obtain its type and then stores the instruction and its decoding information in the pre-execution buffer 220. For example, if the decoder 240 decodes four instructions acquired in order, the decoding results may be: the first instruction is a branch instruction, the second is a memory access instruction, the third is a branch instruction, and the fourth is a memory access instruction.
  • The scheduler 250 can retrieve instructions and their types from the pre-execution buffer 220 and then send each instruction to the executor 260 for execution. As shown in FIG. 2, the executor 260 includes other pipelines 261 and a memory accessor 262.
  • The memory accessor 262 includes a memory access instruction queue 263. In practice, memory access instructions are of two types, load instructions and store instructions, so the memory access instruction queue may also be called a Load/Store queue.
  • If the instruction acquired by the scheduler 250 is a branch instruction, it is sent to the other pipelines 261 for execution; if it is a memory access instruction, it is sent to the memory accessor 262 for execution.
  • Instructions have both control dependencies and data dependencies, and a memory access instruction is not executed while its data dependencies are unmet. This application therefore assumes by default that data dependencies are satisfied and considers only control dependencies. If the control dependency of a memory access instruction is unresolved, a cache timing side-channel attack based on speculative memory access may occur during speculative out-of-order execution. To avoid such attacks, in the embodiments provided in this application, if the instruction acquired by the scheduler 250 is a memory access instruction, it is sent to the memory accessor 262 for execution only after the branch instructions on which it has a control dependency have been resolved.
  • Specifically, the memory access instruction is first held in the pre-execution buffer 220 until the branch instructions on which it has a control dependency are resolved. It is then dispatched from the pre-execution buffer 220 to the memory accessor 262 for execution.
  • After receiving a memory access instruction, the memory accessor 262 may store it in the Load/Store queue and execute the queued memory access instructions out of order once both their data dependencies and their control dependencies are satisfied.
  • The processing performed by the fetcher 230, decoder 240, scheduler 250, executor 260, and so on forms a pipeline with multiple stages, including fetch, decode, dispatch, and execute. The fetcher 230 corresponds to the fetch stage, the decoder 240 to the decode stage, the scheduler 250 to the dispatch stage, and the executor 260 to the execute stage.
  • The pipeline may also include one or more other stages not shown in FIG. 2, such as rename, issue, write-back, and commit.
  • the instruction execution method is executed by the scheduler 250 in the processor 120.
  • FIG. 3 is a schematic structural diagram of a scheduler according to an embodiment of the present application.
  • The scheduler 250 includes an acquisition module 310, a determining module 320, and an execution module 330, where:
  • The acquisition module 310 is configured to obtain instructions from the pre-execution buffer 220.
  • The acquired instructions include branch instructions, memory access instructions, and other types of instructions.
  • The determining module 320 is configured to determine, after the acquisition module 310 acquires a first instruction, each second instruction that satisfies the preset condition.
  • The first instruction is an instruction that reads memory.
  • The second instruction is a branch instruction executed out of order.
  • The execution module 330 is configured to execute the first instruction after the scheduler 250 determines that the second instructions satisfying the preset condition have been resolved.
  • FIG. 4 shows an example flow of the instruction execution method provided in the present application.
  • The flow includes the following steps:
  • Step 401: The scheduler 250 acquires a first instruction, which is an instruction that reads memory.
  • Specifically, the acquisition module 310 in the scheduler 250 acquires the first instruction. The first instruction may also be called a load instruction.
  • Step 402: The scheduler 250 determines each second instruction that satisfies the preset condition, where a second instruction is a branch instruction executed out of order.
  • The scheduler 250 dispatches each second instruction to the other pipelines 261 for out-of-order execution, and while those second instructions are executing, the scheduler 250 continues to schedule subsequent instructions.
  • Second instructions in the other pipelines 261 may execute out of order rather than in fetch order. For example, if three second instructions are, in fetch and decode order, second instruction 1, second instruction 2, and second instruction 3, then second instruction 2 and second instruction 3 may execute in the other pipelines 261 before second instruction 1 has executed.
  • Step 403: After the scheduler 250 determines that the second instructions satisfying the preset condition have been resolved, the first instruction is executed.
  • Specifically, the determining module 320 in the scheduler 250 determines whether every second instruction satisfying the preset condition has been resolved. If so, the first instruction is executed, where "executing" means that the execution module 330 in the scheduler 250 sends the first instruction to the memory accessor 262. If not, that is, an unresolved second instruction remains, the first instruction is executed only after all second instructions satisfying the preset condition have been resolved.
  • Once the second instructions satisfying the preset condition have been resolved, the first instruction is executed. If multiple first instructions need to be executed, they may be executed out of order. For example, suppose the scheduler 250 obtains first instructions A, B, and C in sequence after every second instruction satisfying the preset condition has been resolved. First instruction A is sent to the memory accessor 262. If first instruction B is not yet ready (its data dependency is unresolved) while first instruction C is ready, first instruction C is executed first and first instruction B is executed after it.
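A toy model of the A/B/C example follows. It assumes, hypothetically, that each load's data dependency is satisfied at a known cycle and that the memory accessor 262 executes at most one load per cycle, always choosing the oldest ready load. The cycle counts and the function name are illustrative and do not come from the source.

```python
def execution_order(loads):
    """loads: list of (name, data_ready_cycle) in the order they reach the
    memory accessor. Returns the order in which the loads execute."""
    pending = list(loads)          # list order preserves arrival (program) order
    order = []
    cycle = 0
    while pending:
        for i, (name, ready_at) in enumerate(pending):
            if ready_at <= cycle:  # oldest load whose operands are available
                order.append(name)
                del pending[i]     # safe: we break out of the loop right away
                break
        cycle += 1
    return order


# B's data dependency is satisfied later than C's, so C overtakes B.
print(execution_order([("A", 0), ("B", 5), ("C", 1)]))  # ['A', 'C', 'B']
```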
  • Because the first instruction is not executed speculatively before the second instructions satisfying the preset condition are resolved, no cache microarchitectural state change caused by mispredicted execution of the first instruction can occur. An attacker therefore cannot steal data from the protected region through cache microarchitectural state changes, and cache timing side-channel attacks based on speculative memory access are effectively prevented.
  • In one implementation, the preset condition includes having a control dependency on the first instruction. That is, the first instruction is executed after the scheduler determines that every second instruction on which the first instruction has a control dependency has been resolved.
  • Whether the first instruction can be executed depends both on whether its control dependencies have been resolved and on whether its data dependencies have been resolved.
  • If its data dependencies are unresolved, the first instruction cannot be executed in any case. The embodiments of the present application therefore assume by default that the data dependencies of the first instruction have been resolved when it is executed.
  • For example, suppose the instruction cache contains four instructions: instruction 1, instruction 2, instruction 3, and instruction 4, where:
  • instruction 1 is a branch instruction
  • instruction 2 is a load instruction
  • instruction 3 is a branch instruction
  • instruction 4 is a load instruction
  • The four instructions enter the scheduler 250 in order.
  • When the scheduler 250 receives instruction 1, it sends instruction 1 to the other pipelines 261 for execution.
  • When the scheduler 250 receives instruction 2, it finds that instruction 1 has already been resolved, that is, instruction 1 has finished executing in the other pipelines 261 and produced its result, so instruction 2 is sent to the memory accessor 262.
  • When the scheduler 250 receives instruction 3, it sends instruction 3 to the other pipelines 261 for execution.
  • When the scheduler 250 receives instruction 4, it finds that instruction 3 is still unresolved, that is, the control dependency of instruction 4 is unresolved. The following compares how the prior-art solution and the solution of this application handle instruction 4:
  • In the prior-art solution, although the control dependency of instruction 4 is unresolved, the scheduler 250 sends instruction 4 directly to the memory accessor 262. If instruction 4 was mispredicted, executing it changes the microarchitectural state of the cache.
  • An attacker can then infer internal data from the changed microarchitectural state, which causes internal data leakage and security risks.
  • In the solution of this application, because the control dependency of instruction 4 is unresolved, the scheduler 250 suspends sending instruction 4.
  • Instruction 4 can be held in the pre-execution buffer 220 until instruction 3 is resolved, after which the scheduler 250 sends instruction 4 to the memory accessor 262 for execution.
  • Execution of other instructions after instruction 4 is not affected.
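The four-instruction walkthrough can be replayed with a small Python model of the scheduler 250, under the design in which every control dependency is tracked. The class and the string tags for instruction kinds are hypothetical. Instruction 5 is an extra non-memory instruction added only to show that later work is not blocked. Real hardware would implement this with buffer and wakeup logic, not Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical model of scheduler 250 when every control dependency is tracked."""
    unresolved: set = field(default_factory=set)  # branches sent to other pipelines 261, not yet resolved
    held: dict = field(default_factory=dict)      # load seq -> unresolved guarding branches (pre-execution buffer 220)
    log: list = field(default_factory=list)       # (seq, destination) in the order instructions leave

    def receive(self, seq, kind):
        if kind == "branch":
            self.unresolved.add(seq)
            self.log.append((seq, "other_pipelines"))
        elif kind == "load":
            guards = {b for b in self.unresolved if b < seq}  # older unresolved branches
            if guards:
                self.held[seq] = guards          # suspend: wait in the pre-execution buffer
            else:
                self.log.append((seq, "memory_accessor"))
        else:
            self.log.append((seq, "other_pipelines"))  # other work is not delayed

    def resolve(self, branch_seq):
        self.unresolved.discard(branch_seq)
        for load in sorted(self.held):           # sorted() copies the keys, so deleting is safe
            self.held[load].discard(branch_seq)
            if not self.held[load]:
                del self.held[load]
                self.log.append((load, "memory_accessor"))


s = Scheduler()
s.receive(1, "branch")
s.resolve(1)             # instruction 1 resolves before instruction 2 arrives
s.receive(2, "load")     # sent to the memory accessor at once
s.receive(3, "branch")
s.receive(4, "load")     # held: instruction 3 is unresolved
s.receive(5, "other")    # hypothetical later instruction, not blocked by 4
s.resolve(3)             # releases instruction 4
print(s.log)
```

In the resulting log, instruction 5 leaves the scheduler before instruction 4, and instruction 4 reaches the memory accessor only after instruction 3 is resolved.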
  • The solution of this application greatly improves data security. Compared with prior-art speculative execution, it may reduce instruction execution efficiency, but the resulting performance loss is much smaller than that of the defenses described in the background.
  • In another implementation, the preset condition includes having a preset type of control dependency on the first instruction.
  • The preset type may be a control dependency on a direct conditional branch jump, a control dependency on an indirect conditional branch jump, or another type of control dependency.
  • For example, if the second instructions are determined to be branch instructions whose control dependency with the first instruction is a direct conditional branch jump, the scheduler 250 tracks whether each such branch instruction has been resolved. Once they have all been resolved, the first instruction can be executed.
  • The first instruction does not need to wait for every branch instruction on which it has a control dependency to be resolved. In this case, variant 1 of the Spectre attack ("bounds check bypass") can be defended against.
  • If the second instructions are determined to be branch instructions whose control dependency with the first instruction is an indirect conditional branch jump, the scheduler 250 tracks whether each such branch instruction has been resolved.
  • Once the scheduler 250 determines that these branch instructions have been resolved, the first instruction can be executed without waiting for every branch instruction on which it has a control dependency to be resolved. In this case, variant 2 of the Spectre attack ("branch target injection") can be defended against.
  • In yet another implementation, the preset condition is having any control dependency on the first instruction. In this case, the first instruction can be executed only after every second instruction on which it has a control dependency has been resolved. Compared with the previous implementation, this embodiment must determine whether all such second instructions have been resolved. It therefore tracks more second instructions than an implementation that tracks only the preset types, and the defense is correspondingly stronger.
  • An optional implementation of the flow in FIG. 4 is to provide a Load Filter in the scheduler 250 whose filtering function implements the defense against cache timing side-channel attacks based on speculative memory access. When different defense levels are required, a filtering function can be configured for each level; for example, the defense level can be adjusted by reducing the types of branch instructions tracked by the Load Filter.
  • The filtering function of the Load Filter can also be turned off. In that case, second instructions on which the first instruction has a control dependency are not tracked, and the Load Filter has no logical effect on the pipeline.
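One way to picture the Load Filter's configurable defense levels is as a mapping from each level to the set of branch types it tracks. The level names, the type labels (with `"indirect"` standing for the indirect conditional branch jumps described above), and the functions below are assumptions for illustration only.

```python
# Hypothetical defense levels: each maps to the branch types the Load Filter tracks.
TRACKED_TYPES = {
    "off": set(),                                        # filter disabled, loads never held
    "variant1": {"direct_conditional"},                  # bounds check bypass
    "variant2": {"indirect"},                            # branch target injection
    "all": {"direct_conditional", "indirect", "other"},  # every control dependency
}


def blocking_branches(level, guards):
    """guards: (seq, branch_type, resolved) for each branch the load depends on.

    Returns the sequence numbers that still block the load at this level.
    """
    tracked = TRACKED_TYPES[level]
    return [seq for seq, btype, resolved in guards if btype in tracked and not resolved]


def may_issue(level, guards):
    """A load may be sent to the memory accessor when nothing tracked blocks it."""
    return not blocking_branches(level, guards)


guards = [(1, "direct_conditional", True), (2, "indirect", False), (3, "other", False)]
for level in TRACKED_TYPES:
    print(level, may_issue(level, guards))
```

Reducing the tracked set lowers the defense level and the number of loads that must wait. The empty set corresponds to the filter being turned off.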
  • In an optional implementation, the scheduler 250 determines the value of the first variable according to the sequence numbers of the second instructions that satisfy the preset condition. It determines the value of the second variable according to the sequence number of an unresolved second instruction among them, or according to the value of the first variable. The scheduler 250 then determines, from the values of the two variables, whether every second instruction satisfying the preset condition has been resolved.
  • If the value of the first variable is less than the value of the second variable, the scheduler 250 determines that the second instructions satisfying the preset condition have all been resolved. If the value of the first variable is not less than the value of the second variable, it determines that they have not all been resolved.
  • The sequence numbers are assigned according to the decoding order of the second instructions that satisfy the preset condition.
  • Using the values of the first and second variables and a small amount of decision logic, the scheduler 250 can explicitly determine, before executing the first instruction, whether the first instruction still carries a control-dependency risk.
  • The first instruction is executed only when it carries no control-dependency risk, which effectively prevents cache timing side-channel attacks based on speculative memory access.
  • Specifically, the scheduler 250 takes the largest sequence number among the second instructions that satisfy the preset condition as the value of the first variable. It determines the value of the second variable according to the sequence number of an unresolved second instruction among them or according to the value of the first variable, in either of the following two modes:
  • Mode 1: The scheduler 250 takes the smallest sequence number among the unresolved second instructions satisfying the preset condition as the value of the second variable.
  • Mode 2: If the scheduler 250 determines that none of the second instructions satisfying the preset condition is unresolved, it takes the value of the first variable plus one as the value of the second variable.
  • That is, if an unresolved second instruction exists among the second instructions satisfying the preset condition, the determining module 320 in the scheduler 250 determines the value of the second variable in mode 1. If none exists, the determining module 320 in the scheduler 250 determines the value of the second variable in mode 2.
  • By comparing the values of the first and second variables, the determining module 320 in the scheduler 250 can accurately determine whether the second instructions satisfying the preset condition have been resolved. It then executes the first instruction only after they have been resolved, which achieves an effective defense against cache timing side-channel attacks based on speculative memory access.
  • the scheduler 250 determines the sequence number of the second instruction that is the smallest and unresolved in the second instruction that meets the preset condition, and the value of the second variable is specifically included in the scheduler 250.
  • the determining module 320 takes the sequence number of the unresolved second instruction after the second instruction with the smallest and unresolved sequence number as the value of the second variable after parsing the second instruction with the smallest and unresolved sequence number.
  • the value of the second variable always indicates the second instruction with the smallest serial number and the unresolved in each second instruction that satisfies the preset condition, thereby ensuring that each of the preset conditions is satisfied. Whether the second instruction resolves the accuracy of the completion.
  • The resolution state (resolved or unresolved) of each second instruction may be tracked by adding two globally accessible registers to a processor that performs speculative out-of-order execution.
  • The following specific embodiment describes the method in detail. It uses a Load instruction as the first instruction and branch instructions as the second instructions.
  • The pipeline includes eight stages: Fetch, Decode, Rename, Dispatch, Issue, Execution, WrBack (write-back), and Commit.
  • The first register maintains the first variable in the Dispatch stage. The first variable records the latest dispatched branch sequence number (LBrSN), which is incremented by one each time a branch instruction is dispatched.
  • The second register maintains the second variable in the Execution stage. The second variable records the oldest unresolved branch sequence number (NRBrSN). If the branch order buffer (BOB) is empty, or if all branch instructions satisfying the preset condition have been resolved, NRBrSN equals LBrSN plus one. If an unresolved branch instruction satisfying the preset condition is found in the BOB, NRBrSN points to the sequence number of the oldest unresolved branch.
  • Updating the value of NRBrSN may take multiple cycles.
  • Once the branch instruction currently pointed to by NRBrSN has been resolved, the BOB is searched onward from that entry until the next unresolved branch instruction is found. If the next unresolved branch instruction is far away, for example many BOB entries later, it may not be found within the current cycle. Suppose the search has reached the 20th branch instruction in the BOB at the end of the current cycle, and that branch instruction has been resolved. The search then continues in the next cycle from the 21st branch instruction, until an unresolved branch instruction is found or BOB_end is reached.
  • If the load.BrSN value of the Load instruction is less than NRBrSN, every branch instruction on which the Load instruction has a control dependency has been resolved. Executing the Load instruction therefore carries no control dependency risk.
  • An explicit control-dependency check is added to the issue logic of the Load instruction: when the load.BrSN value of the Load instruction is less than the value of NRBrSN, the control dependency of the Load instruction is determined to be resolved.
  • In the prior art, the Load instruction is issued as soon as its data dependencies are resolved, that is, as soon as the register dependency and the memory write-read dependency are satisfied.
  • In this application, the issue logic of the Load instruction requires that both the data dependencies and the control dependency be resolved.
  • The issue logic in Table 1 is: if (reg_dep_ok & mem_dep_ok & load.BrSN < NRBrSN) issue load.
  • Speculative accesses by the Load instruction therefore carry no control dependency risk. This is sufficient to defend against cache latency side-channel attacks based on speculative memory access, such as the Spectre attack.
  • When the pipeline is flushed, for example because of a branch misprediction, LBrSN, NRBrSN, and load.BrSN require no additional recovery logic, and instruction processing efficiency is not affected.
  • For example, when load instruction A is dispatched, three branch instructions have been dispatched before it. In dispatch order, they are branch instruction 1, branch instruction 2, and branch instruction 3. Branch instruction 1 has been resolved, branch instruction 2 has not been resolved, and branch instruction 3 has been resolved.
  • Because the value of load.BrSN is greater than the value of NRBrSN, the control dependency of load instruction A is determined to be unresolved, and load instruction A is suspended.
  • Once branch instruction 2 has been resolved, NRBrSN = 4. Because load.BrSN is now less than NRBrSN, the control dependency of load instruction A is determined to be resolved, and load instruction A is executed.
  • The processor may also be a chip connected to the memory. The chip reads and executes the software program stored in the memory to implement the instruction execution method in any of the foregoing embodiments.
  • Embodiments of the present application also provide a computer storage medium for storing the computer software instructions used by the above instruction execution method, including program code for performing the above method embodiments.
  • Embodiments of the present application may be provided as a method, an apparatus (device), a computer-readable storage medium, or a computer program product.
  • The present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. These are collectively referred to herein as a "module" or "system".
  • The computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or in one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device. A series of operational steps is then performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or in one or more blocks of the block diagrams.
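The scheduler 250 decides whether all second instructions satisfying the preset condition have been resolved using three steps. It takes the largest sequence number as the first variable. It takes the smallest unresolved sequence number, or the first variable plus one, as the second variable. It then makes a single comparison. The rule can be sketched as a small software model. This is a hedged illustration, not the hardware implementation. `BranchEntry`, its fields, and the function names are hypothetical. The sketch also assumes that, with no qualifying branches, the first variable is 0.

```python
from dataclasses import dataclass


@dataclass
class BranchEntry:
    """Hypothetical record for one second instruction (branch) on which the
    first instruction is control-dependent, i.e. one that satisfies the
    preset condition."""
    seq_num: int    # sequence number assigned in decode order (assumed >= 1)
    resolved: bool  # True once the branch has been resolved


def first_variable(branches: list[BranchEntry]) -> int:
    # Largest sequence number among the qualifying second instructions.
    # Assumption: with no qualifying branches the first variable is 0.
    return max((b.seq_num for b in branches), default=0)


def second_variable(branches: list[BranchEntry]) -> int:
    unresolved = [b.seq_num for b in branches if not b.resolved]
    if unresolved:
        # Mode 1: smallest sequence number that is still unresolved.
        return min(unresolved)
    # Mode 2: nothing unresolved, so use the first variable plus one.
    return first_variable(branches) + 1


def all_resolved(branches: list[BranchEntry]) -> bool:
    # All qualifying second instructions are resolved exactly when
    # first variable < second variable.
    return first_variable(branches) < second_variable(branches)


# Usage: branches 1 and 3 resolved, branch 2 still pending.
pending = [BranchEntry(1, True), BranchEntry(2, False), BranchEntry(3, True)]
print(first_variable(pending), second_variable(pending), all_resolved(pending))
# prints: 3 2 False
pending[1].resolved = True
print(second_variable(pending), all_resolved(pending))
# prints: 4 True
```

Mode 1 always yields a value no greater than the first variable, and mode 2 always yields a value exactly one greater. A single less-than comparison is therefore enough to tell the two cases apart.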
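The LBrSN/NRBrSN bookkeeping of the Load/branch embodiment can be modeled at cycle level, including the NRBrSN search that may span several cycles. This is a minimal sketch under stated assumptions:

- the BOB is modeled as a dictionary keyed by sequence number;
- sequence numbers start at 1;
- at most `scan_width` entries are examined per cycle (20 here, echoing the example of resuming from the 21st branch);
- BOB entry retirement and pipeline flushes are not modeled.

`BranchTracker` and its methods are hypothetical names.

```python
class BranchTracker:
    """Hypothetical software model of the two global registers: LBrSN
    (maintained at Dispatch) and NRBrSN (maintained at Execution)."""

    def __init__(self, scan_width: int = 20) -> None:
        self.lbrsn = 0                  # latest dispatched branch sequence number
        self.nrbrsn = 1                 # oldest unresolved branch; LBrSN + 1 when none
        self.bob: dict[int, bool] = {}  # BOB model: sequence number -> resolved?
        self.scan_width = scan_width    # assumed max BOB entries examined per cycle

    def dispatch_branch(self) -> int:
        # LBrSN is incremented by one for every dispatched branch instruction.
        self.lbrsn += 1
        self.bob[self.lbrsn] = False
        return self.lbrsn

    def resolve_branch(self, seq_num: int) -> None:
        self.bob[seq_num] = True

    def tick(self) -> None:
        # One cycle of the NRBrSN search. The scan stops at the first unresolved
        # branch or at BOB_end; otherwise it resumes next cycle where it stopped.
        for _ in range(self.scan_width):
            if self.nrbrsn > self.lbrsn:   # BOB_end: every branch is resolved
                return
            if not self.bob[self.nrbrsn]:  # oldest unresolved branch found
                return
            self.nrbrsn += 1


# Usage: 25 resolved branches need two cycles to be skipped with scan_width=20.
tracker = BranchTracker(scan_width=20)
for _ in range(25):
    tracker.dispatch_branch()
for seq in range(1, 26):
    tracker.resolve_branch(seq)
tracker.tick()
print(tracker.nrbrsn)  # prints 21: the search resumes from the 21st branch
tracker.tick()
print(tracker.nrbrsn)  # prints 26, i.e. LBrSN + 1
```

In this model, every entry below NRBrSN is always resolved. A slow search therefore only makes NRBrSN lag behind the true oldest unresolved branch. That delays Loads, but it never lets one issue early.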
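The Table 1 issue condition and the load instruction A example can be checked with a short sketch. It assumes, for illustration only, that load.BrSN holds the LBrSN value captured when the Load is dispatched. It also assumes that branch instructions 1 to 3 carry sequence numbers 1 to 3, so load A's load.BrSN is 3. `LoadEntry` and `can_issue` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    """Hypothetical issue-queue entry for a Load instruction."""
    brsn: int                 # load.BrSN: assumed to be LBrSN captured at dispatch
    reg_dep_ok: bool = True   # register dependencies satisfied
    mem_dep_ok: bool = True   # memory (write-read) dependencies satisfied


def can_issue(load: LoadEntry, nrbrsn: int) -> bool:
    # Prior-art logic checks only the two data-dependency flags; the added
    # term load.BrSN < NRBrSN requires every older branch to be resolved.
    return load.reg_dep_ok and load.mem_dep_ok and load.brsn < nrbrsn


# Load A is dispatched after branches 1, 2 and 3, so load.BrSN = 3.
load_a = LoadEntry(brsn=3)
print(can_issue(load_a, nrbrsn=2))  # False: branch 2 is unresolved, A is suspended
print(can_issue(load_a, nrbrsn=4))  # True: branches 1-3 resolved, A may issue
```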

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention relates to an instruction execution method and device for effectively defending against cache latency side-channel attacks based on speculative memory access. The method according to an embodiment of the present invention comprises the following steps:
- acquiring a first instruction, the first instruction being an instruction that reads a memory;
- determining each second instruction that satisfies a preset condition, the second instructions being branch instructions executed out of order, and the preset condition comprising a control dependency relationship with the first instruction;
- after determining that each of the second instructions satisfying the preset condition has been resolved, executing the first instruction.

Because the first instruction is executed only after the second instructions satisfying the preset condition have been resolved, mis-speculated execution of the first instruction is avoided. No change in the cache microarchitectural state is therefore caused by mis-speculated execution of the first instruction. As a result, an attacker cannot steal data from a protected region through such a change in the cache microarchitectural state. This constitutes an effective defense against cache latency side-channel attacks based on speculative memory access.
PCT/CN2018/083991 2018-04-21 2018-04-21 Instruction execution method and device WO2019200618A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880091562.8A CN111936968A (zh) 2018-04-21 2018-04-21 Instruction execution method and apparatus
PCT/CN2018/083991 WO2019200618A1 (fr) 2018-04-21 2018-04-21 Instruction execution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/083991 WO2019200618A1 (fr) 2018-04-21 2018-04-21 Instruction execution method and device

Publications (1)

Publication Number Publication Date
WO2019200618A1 (fr) 2019-10-24

Family

ID=68240570

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083991 WO2019200618A1 (fr) 2018-04-21 2018-04-21 Instruction execution method and device

Country Status (2)

Country Link
CN (1) CN111936968A (fr)
WO (1) WO2019200618A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
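CN113792352B (zh) * 2021-08-18 2024-06-21 中山大学 Power-balanced instruction scheduling optimization method, system, apparatus, and medium
CN113900712B (zh) * 2021-10-26 2022-05-06 海光信息技术股份有限公司 Instruction processing method, instruction processing apparatus, and storage medium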

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177981A1 (en) * 2004-03-26 2008-07-24 International Business Machines Corporation Apparatus and method for decreasing the latency between instruction cache and a pipeline processor
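CN101706714A (zh) * 2009-11-23 2010-05-12 北京龙芯中科技术服务中心有限公司 Instruction issue system and method, processor, and design method thereof
CN104423927A (zh) * 2013-08-30 2015-03-18 华为技术有限公司 Instruction processing method and apparatus, and processor
CN107688468A (zh) * 2016-12-23 2018-02-13 北京国睿中数科技股份有限公司 Method for verifying branch instructions and the branch prediction function in a speculative execution processor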

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005098613A2 (fr) * 2004-03-30 2005-10-20 Sun Microsystems, Inc. Device for facilitating rapid processing while speculatively executing code in scout mode
US7454602B2 (en) * 2004-12-15 2008-11-18 International Business Machines Corporation Pipeline having bifurcated global branch history buffer for indexing branch history table per instruction fetch group
US8990543B2 (en) * 2008-03-11 2015-03-24 Qualcomm Incorporated System and method for generating and using predicates within a single instruction packet
US9348599B2 (en) * 2013-01-15 2016-05-24 International Business Machines Corporation Confidence threshold-based opposing branch path execution for branch prediction
US20160055029A1 (en) * 2014-08-21 2016-02-25 Qualcomm Incorporated Programmatic Decoupling of Task Execution from Task Finish in Parallel Programs

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177981A1 (en) * 2004-03-26 2008-07-24 International Business Machines Corporation Apparatus and method for decreasing the latency between instruction cache and a pipeline processor
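CN101706714A (zh) * 2009-11-23 2010-05-12 北京龙芯中科技术服务中心有限公司 Instruction issue system and method, processor, and design method thereof
CN104423927A (zh) * 2013-08-30 2015-03-18 华为技术有限公司 Instruction processing method and apparatus, and processor
CN107688468A (zh) * 2016-12-23 2018-02-13 北京国睿中数科技股份有限公司 Method for verifying branch instructions and the branch prediction function in a speculative execution processor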

Also Published As

Publication number Publication date
CN111936968A (zh) 2020-11-13

Similar Documents

Publication Publication Date Title
US9875106B2 (en) Computer processor employing instruction block exit prediction
CN107092467B (zh) Instruction sequence buffer for enhancing branch prediction efficiency
US9378020B2 (en) Asynchronous lookahead hierarchical branch prediction
US8301870B2 (en) Method and apparatus for fast synchronization and out-of-order execution of instructions in a meta-program based computing system
US9250912B2 (en) Fast index tree for accelerated branch prediction
US7861066B2 (en) Mechanism for predicting and suppressing instruction replay in a processor
US7444501B2 (en) Methods and apparatus for recognizing a subroutine call
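TWI550511B (zh) Method for instruction translation error detection
TWI411957B (zh) Out-of-order execution microprocessor, microprocessor, and related performance-improving method and execution method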
US10310859B2 (en) System and method of speculative parallel execution of cache line unaligned load instructions
US20180349144A1 (en) Method and apparatus for branch prediction utilizing primary and secondary branch predictors
US9244688B2 (en) Branch target buffer preload table
US20150039862A1 (en) Techniques for increasing instruction issue rate and reducing latency in an out-of-order processor
WO2019200618A1 (fr) Instruction execution method and device
US10445101B2 (en) Controlling processing of instructions in a processing pipeline
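KR102635965B1 (ko) Front end of a microprocessor and computer-implemented method using the same
CN112241288A (zh) Detecting dynamic control flow reconvergence points for conditional branches in hardware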
US20200004551A1 (en) Appratus and method for using predicted result values
US7945767B2 (en) Recovery apparatus for solving branch mis-prediction and method and central processing unit thereof
US9152425B2 (en) Mitigating instruction prediction latency with independently filtered presence predictors
US9250909B2 (en) Fast index tree for accelerated branch prediction
US10929137B2 (en) Arithmetic processing device and control method for arithmetic processing device
Kaeli et al. Data Speculation. Yiannakis Sazeides (1), Pedro Marcuello (2), James E. Smith (3), and Antonio González (2, 4); (1) University of Cyprus; (2) Intel-UPC Barcelona Research Center; (3) University of Wisconsin-Madison; (4) Universitat Politecnica de

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18915426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18915426

Country of ref document: EP

Kind code of ref document: A1