WO2015027809A1 - Instruction processing method and apparatus, and processor - Google Patents

Instruction processing method and apparatus, and processor

Info

Publication number
WO2015027809A1
WO2015027809A1 PCT/CN2014/083879 CN2014083879W
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
processor
sequence
instructions
control code
Prior art date
Application number
PCT/CN2014/083879
Other languages
English (en)
French (fr)
Inventor
侯锐
郭旭斌
冯煜晶
王曦爽
李晔
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015027809A1 publication Critical patent/WO2015027809A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP

Definitions

  • the embodiments of the present invention relate to the field of computer technologies, and in particular, to an instruction processing method and apparatus, and a processor.
  • Some special instruction sequences have high cache miss rate and high branch prediction error rate.
  • the load instructions in the <load,load,load> instruction sequence can cause the processor pipeline to stall due to cache misses.
  • the branch instruction in the <branch,store,load,compare> instruction sequence can also cause the processor pipeline to stall due to branch mispredictions.
  • the present invention provides an instruction processing method and apparatus, and a processor, to solve the problems that some instruction sequences on existing processors cause the processor's working pipeline to stall due to high cache miss rates or branch mispredictions, and that the processor executes such instruction sequences inefficiently.
  • an instruction processing method including:
  • the sequence of instructions refers to a special sequence of instructions that cause the processor to stall in the pipeline.
  • multiple instructions are read in order from the second cache of the processor, and if it is determined that a special instruction sequence exists among the multiple instructions, a control code corresponding to each instruction in the special instruction sequence is determined;
  • the control code corresponding to each instruction in the special instruction sequence includes:
  • the control code includes code for disabling the post-load instruction speculative-execution unit or code for disabling the branch prediction unit;
  • the adjusting the microstructure of the processor according to the control code includes:
  • an instruction processing apparatus including:
  • a determining module configured to sequentially read a plurality of instructions from the second cache of the processor, and if it is determined that the special instruction sequence exists in the plurality of instructions, determine a control code corresponding to each instruction in the special instruction sequence;
  • a saving module configured to save each instruction in the special instruction sequence and its corresponding control code to the first cache of the processor;
  • an adjustment module configured to: if it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjust the microstructure of the processor according to the control code, so that the pipeline of the processor does not stall;
  • the special sequence of instructions refers to a special sequence of instructions that cause the pipeline of the processor to stall.
  • the determining module specifically includes: a first determining unit, configured to read multiple instructions in order from the second cache of the processor, and determine, in order and according to the correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
  • a second determining unit configured to: according to the sequence composed of the pre-decode values corresponding to each of the multiple instructions, if it is determined that a pre-decode value sequence corresponding to the special instruction sequence exists, determine that the multiple instructions include the special instruction sequence;
  • a third determining unit configured to determine, according to a correspondence between each instruction and the control code in the special instruction sequence, a control code corresponding to each instruction in the special instruction sequence.
  • the control code includes code for disabling the post-load instruction speculative-execution unit or code for disabling the branch prediction unit;
  • the adjustment module is specifically configured to:
  • a processor comprising: the above instruction processing device.
  • a fourth aspect provides a terminal device, including: the foregoing processor.
  • in the embodiments of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if a special instruction sequence exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved to the first cache of the processor; thereafter, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the microstructure of the processor is adjusted according to the control code, so that the pipeline of the processor does not stall; the solution above avoids the processor pipeline stalls caused by cache misses or branch mispredictions when a special instruction sequence is executed.
  • FIG. 1 is a schematic flowchart diagram of an instruction processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an instruction processing apparatus according to another embodiment of the present invention.
  • Figure 3 is a schematic block diagram of the embodiment shown in Figure 2;
  • FIG. 4 is a schematic structural diagram of an instruction processing apparatus according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a processor according to another embodiment of the present invention. Detailed description:
  • the existing processor micro-architecture design mainly predicts the instructions to be executed according to the history of instructions already executed, and adjusts the processing strategy for the instruction stream to be executed (for example, predicting the jump direction and jump address of a jump instruction), so as to optimize the overall efficiency with which the processor executes the instruction stream.
  • take the <load,load,load> instruction sequence as an example: a load instruction retrieves data from memory into a register. If the data cache (Cache) contains the data to be retrieved, the pipeline in which the processor executes instructions runs smoothly, and the instructions that entered the pipeline after the load instruction continue to execute. If the data cache does not contain the data to be retrieved, a data cache miss (Cache Miss) occurs, which requires flushing the processor's instruction pipeline, that is, clearing the instructions that entered the pipeline after the load instruction; this causes the pipeline in which the processor executes instructions to stall.
  • in the <branch,store,load,compare> instruction sequence, the branch instruction refers to a branch jump instruction. According to the existing processor micro-architecture design, when the branch instruction enters the pipeline, the position of the next instruction is predicted according to the jump direction provided by the branch prediction unit, since the branch prediction unit is normally enabled after the processor completes its initialization.
  • experimental data show that when the processor executes the <branch,store,load,compare> instruction sequence, instructions that do not need to be executed are incorrectly sent into the pipeline; after the branch misprediction is discovered, these instructions must be cleared, which causes the processor pipeline to stall.
  • an embodiment of the present invention provides an instruction processing method, which can solve the pipeline-stall problem that arises in the existing processor pipeline micro-architecture design when certain special instruction sequences appear, and can thus optimize the efficiency with which the processor executes instructions.
  • the micro-architecture (Micro Architecture) of the processor in this embodiment refers to a set of functional units inside the processor, where the functional units include, for example, an instruction speculative-execution unit or a branch prediction unit.
  • FIG. 1 is a schematic flowchart of an instruction processing method according to an embodiment of the present invention. As shown in FIG. 1, the instruction processing method in this embodiment may include:
  • step 101 specifically includes:
  • the second cache described in this embodiment is, for example, a secondary cache L2 Cache.
  • the special instruction sequence described in this embodiment includes but is not limited to the <load,load,load> instruction sequence and the <branch,store,load,compare> instruction sequence;
  • in the special instruction sequence <load,load,load>, the control code of each load instruction is the code for disabling the post-load instruction speculative-execution unit;
  • the control code of the branch instruction is the code for disabling the branch prediction unit. It should be noted that in the special instruction sequence <branch,store,load,compare>, the control codes corresponding to the other three instructions store, load, and compare can be set to 0 by default, which means no adjustment of the processor's microstructure is required.
  • the first cache described in this embodiment is, for example, an instruction cache (I-Cache).
  • the instruction read from the first cache of the processor specifically refers to an instruction that enters the pipeline of the processor, that is, an instruction to be executed;
  • when the instruction to be executed is a load instruction in the special instruction sequence <load,load,load>, in order to avoid the problem that a Cache Miss may cause the processor's pipeline to stall when the load instruction is executed, the control code corresponding to the load instruction is the code for disabling the post-load instruction speculative-execution unit; therefore, step 103 is specifically: disabling the processor's post-load instruction speculative-execution unit according to that code.
  • step 103 is specifically: turning off the branch prediction component of the processor according to the code that turns off the branch prediction component.
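As a concrete illustration of step 103, the control-code dispatch can be sketched in software. This is a minimal toy model, not the patent's hardware: the class name, the two flags, and the numeric control-code encodings are all assumptions, since the patent does not define bit values.

```python
# Hypothetical control-code encodings (the patent does not specify values).
CTL_NONE = 0
CTL_DISABLE_POST_LOAD_SPECULATION = 1
CTL_DISABLE_BRANCH_PREDICTION = 2

class MicroArchitecture:
    """Toy model of the two tunable functional units named in the text."""
    def __init__(self):
        self.post_load_speculation_on = True
        self.branch_prediction_on = True

    def adjust(self, control_code):
        if control_code == CTL_DISABLE_POST_LOAD_SPECULATION:
            self.post_load_speculation_on = False   # step 103, load case
        elif control_code == CTL_DISABLE_BRANCH_PREDICTION:
            self.branch_prediction_on = False       # step 103, branch case
        # CTL_NONE: leave the microstructure unchanged

ua = MicroArchitecture()
ua.adjust(CTL_DISABLE_BRANCH_PREDICTION)
print(ua.branch_prediction_on)  # -> False
```

In hardware these toggles would be control signals driven into the speculative-execution and branch-prediction units; the sketch only shows the code-to-adjustment mapping.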
  • in the embodiment of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if one exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved to the first cache of the processor; thereafter, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the microstructure of the processor is adjusted according to the control code. Because the control codes of this embodiment exist to avoid pipeline stalls when certain special instruction sequences enter the processor's pipeline, the processor can, for the load instructions in a special instruction sequence, disable its post-load instruction speculative-execution unit, thereby avoiding pipeline stalls when the load instructions are executed; likewise, upon detecting that the special instruction sequence <branch,store,load,compare> is about to enter the processor's pipeline, the processor can disable the branch prediction unit, thereby avoiding the branch mispredictions caused when the branch instruction is executed. Therefore, the efficiency with which the processor executes instructions can be optimized.
  • FIG. 2 is a schematic flowchart of an instruction processing method according to another embodiment of the present invention
  • FIG. 3 is a schematic block diagram of the embodiment shown in FIG. 2.
  • the instruction processing method in this embodiment is a specific implementation of the embodiment shown in FIG. 1, and includes:
  • the L2 Cache read port is 16 bytes wide and each instruction is 4 bytes, so only 4 instructions (Instr0, Instr1, Instr2, Instr3) can be read in each clock cycle.
  • Table 1 is a first relationship correspondence table established in the embodiment of the present invention, as shown in Table 1:
  • the first relationship correspondence table stores the correspondence between each instruction and its pre-decode value.
  • the pre-decoder (Pre-decoder) stores the first relationship correspondence table, and can determine, according to the correspondence shown in Table 1, the pre-decode values (precode0, precode1, precode2, precode3) corresponding to the four instructions read in the first clock cycle.
  • the first relationship correspondence table may be set by using a hardware module; or may be defined by using a software module.
  • the pre-decoder can determine the pre-decode values (precode4, precode5, precode6, precode7) corresponding to the four instructions read in the second clock cycle according to the correspondence shown in Table 1, perform detection, and, according to the detection result, generate the control codes of the four instructions read in the first clock cycle.
  • the control codes of the four instructions (Instr0, Instr1, Instr2, Instr3) read in the first clock cycle are (ctlcode0, ctlcode1, ctlcode2, ctlcode3).
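The pre-decoding described above is a per-instruction table lookup. A minimal sketch follows; the table contents are assumptions (the patent's Table 1 values are not reproduced in this text), with load mapped to pre-decode value 1 to match the detection examples later in the text.

```python
# Hypothetical first relationship table (Table 1): mnemonic -> pre-decode value.
# load -> 1 matches the detection examples below; the other values are illustrative.
PRE_DECODE_TABLE = {
    "load": 1,
    "branch": 2,
    "store": 3,
    "compare": 4,
}

def pre_decode(instructions):
    """Map each instruction read in one clock cycle (up to 4 per 16-byte
    L2 Cache read) to its pre-decode value; 0 marks 'no special meaning'."""
    return [PRE_DECODE_TABLE.get(instr, 0) for instr in instructions]

# Four instructions read in the first clock cycle:
print(pre_decode(["add", "load", "load", "load"]))  # -> [0, 1, 1, 1]
```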
  • Table 2 is a second relationship correspondence table established in the embodiment of the present invention, as shown in Table 2:
  • the second relationship correspondence table stores the correspondence between each special instruction sequence and its pre-decode value sequence.
  • the instruction sequence pattern detector (Instruction Sequential Pattern Check) stores the second relationship correspondence table; by querying it with the eight consecutively read instructions and their corresponding pre-decode values, it can determine whether a special instruction sequence that is about to enter the processor's pipeline exists among those eight instructions.
  • the foregoing second relationship correspondence table may be set by using a hardware module; or may be defined by using a software module.
  • the <load,load,load> instruction sequence is used to describe special instruction sequence detection: if there are three consecutive pre-decode values of 1 among the pre-decode values of the four instructions read in the first clock cycle, it can be determined that a <load,load,load> special instruction sequence exists among those four instructions; or
  • when the pre-decode value of the last instruction read in the first clock cycle is 1, and the pre-decode values of the first and second instructions read in the second clock cycle are also 1, it can be determined that a <load,load,load> special instruction sequence exists among the eight consecutively read instructions;
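Both detection cases above reduce to scanning the window of eight pre-decode values (four per clock cycle) for a run of three consecutive 1s. A minimal sketch, with assumed function and variable names:

```python
def contains_load_run(precodes, run_value=1, run_length=3):
    """Return the start index of the first run of `run_length` consecutive
    pre-decode values equal to `run_value` in the concatenated window of
    8 pre-decode values (4 from each clock cycle), or -1 if none exists."""
    run = 0
    for i, p in enumerate(precodes):
        run = run + 1 if p == run_value else 0
        if run == run_length:
            return i - run_length + 1
    return -1

# Case from the text: the last load of cycle 1 plus the first two loads of cycle 2.
cycle1 = [0, 0, 0, 1]
cycle2 = [1, 1, 0, 0]
print(contains_load_run(cycle1 + cycle2))  # -> 3
```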
  • Table 3 is a third relationship correspondence table established in the embodiment of the present invention, as shown in Table 3:
  • the third relationship correspondence table stores the correspondence between each instruction in a special instruction sequence and its control code; according to the correspondence shown in Table 3, the control code of each instruction in the detected special instruction sequence can be determined. It should be noted that the foregoing third relationship correspondence table may be set up by a hardware module, or may be defined by a software module.
  • the <load,load,load> instruction sequence is taken as an example to describe the generation of the control codes of the instructions read in the first clock cycle:
  • when the pre-decode values of the last two instructions read in the first clock cycle are 1, and the pre-decode value of the first instruction read in the second clock cycle is also 1, it is determined that a <load,load,load> special instruction sequence exists among the eight consecutively read instructions; since the first and second instructions read in the first clock cycle are instructions other than load instructions, the control codes corresponding to them are 0, while the last two instructions read in the first clock cycle are the first and second load instructions of the special instruction sequence <load,load,load>;
  • according to the correspondence shown in Table 3, the control codes corresponding to the first and second load instructions in the special instruction sequence <load,load,load> can be determined, and thus the control codes corresponding to the last two load instructions read in the first clock cycle can be determined.
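Combining the window scan with the Table 3 lookup, control-code generation for one cycle's four instructions might look like the following sketch. The numeric code value is a placeholder; the patent states only that load instructions belonging to a detected <load,load,load> sequence receive the "disable post-load speculative execution" code and that other instructions' codes default to 0.

```python
DISABLE_POST_LOAD_SPECULATION = 1  # hypothetical encoding

def control_codes_for_cycle(cycle_precodes, next_cycle_precodes):
    """Assign a control code to each of the 4 instructions read in one cycle.
    An instruction gets a non-zero code only if it is a load (pre-decode 1)
    that belongs to a <load,load,load> run, which may extend into the next cycle."""
    window = cycle_precodes + next_cycle_precodes
    codes = [0, 0, 0, 0]
    run = 0
    for i, p in enumerate(window):
        run = run + 1 if p == 1 else 0
        if run >= 3:
            # Mark every load of the run that falls inside the current cycle.
            for j in range(i - run + 1, i + 1):
                if j < 4:
                    codes[j] = DISABLE_POST_LOAD_SPECULATION
    return codes

# Example from the text: the last two loads of cycle 1 plus the first load of cycle 2.
print(control_codes_for_cycle([0, 0, 1, 1], [1, 0, 0, 0]))  # -> [0, 0, 1, 1]
```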
  • step 207: read an instruction from the I-Cache and determine whether the instruction is bound with a control code; if yes, perform step 208; otherwise, perform step 209.
  • when the instruction is a load instruction in the special instruction sequence <load,load,load>, the post-load instruction speculative-execution unit must be disabled; after the three load instructions in the special instruction sequence <load,load,load> have been executed, the processor's post-load instruction speculative-execution unit is re-enabled.
  • the processor's microstructure is not adjusted.
  • the adjustment is made according to the microstructure design of the existing processor.
  • in this embodiment, whether a special instruction sequence exists is detected according to the eight instructions consecutively read from the L2 Cache and their corresponding pre-decode values; if so, the control codes of the instructions in the special instruction sequence are determined, and the processor's microstructure is adjusted according to those control codes, for example by disabling the post-load instruction speculative-execution unit or disabling the branch prediction unit, thereby avoiding the pipeline stalls or branch mispredictions caused by executing the instructions in the special instruction sequence and optimizing the efficiency with which the processor executes instructions.
  • FIG. 4 is a schematic structural diagram of an instruction processing apparatus according to another embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
  • the determining module 41 is configured to sequentially read a plurality of instructions from the second cache of the processor, and if it is determined that the special instruction sequence exists in the plurality of instructions, determine a control code corresponding to each instruction in the special instruction sequence;
  • the saving module 42 is configured to save each instruction in the special instruction sequence and its corresponding control code to the first buffer of the processor;
  • the adjusting module 43 is configured to: if it is determined that the instruction read from the first cache of the processor has a corresponding control code, adjust the microstructure of the processor according to the control code, so that the pipeline of the processor does not stall.
  • the determining module 41 specifically includes:
  • a first determining unit 411 configured to sequentially read a plurality of instructions from the second cache of the processor, and sequentially determine, according to a correspondence between the instructions and the pre-decoded values, each of the plurality of instructions a pre-decode value corresponding to the instruction;
  • a second determining unit 412 configured to: according to the sequence composed of the pre-decode values corresponding to each of the multiple instructions, if it is determined that a pre-decode value sequence corresponding to the special instruction sequence exists, determine that the multiple instructions include the special instruction sequence;
  • the third determining unit 413 is configured to determine, according to the correspondence between each instruction and the control code in the special instruction sequence, a control code corresponding to each instruction in the special instruction sequence.
  • the control code includes, but is not limited to, the code for disabling the post-load instruction speculative-execution unit or the code for disabling the branch prediction unit;
  • the adjustment module 43 is specifically configured to:
  • in the embodiment of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if one exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved to the first cache of the processor; thereafter, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the microstructure of the processor is adjusted according to the control code. Because the control codes of this embodiment exist to avoid pipeline stalls when certain special instruction sequences enter the processor's pipeline, the processor can disable its post-load instruction speculative-execution unit, thereby avoiding pipeline stalls when load instructions are executed; likewise, upon detecting that the special instruction sequence <branch,store,load,compare> is about to enter the processor's pipeline, the processor can disable the branch prediction unit.
  • the embodiment of the present invention further provides a processor, including: the instruction processing device described in the embodiment shown in FIG. 4, and details are not described herein again.
  • FIG. 5 is a schematic structural diagram of a processor according to another embodiment of the present invention. As shown in FIG. 5, the processor includes: a first buffer 51, a second buffer 52, a pre-decoder 53, and a special instruction sequence detector 54; these components are connected by a communication bus.
  • the second buffer 52 is configured to read a plurality of instructions in order;
  • the pre-decoder 53 is configured to determine, according to the multiple instructions read in order by the second buffer 52 and the correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
  • the special instruction sequence detector 54 is configured to: according to the sequence composed of the pre-decode values determined in order by the pre-decoder 53 for the multiple instructions, if it is determined that a pre-decode value sequence corresponding to a special instruction sequence exists, determine that a special instruction sequence exists among the multiple instructions, and further determine, according to the correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence;
  • the first buffer 51 is configured to save each instruction in the special instruction sequence determined by the special instruction sequence detector 54 and its corresponding control code;
  • the first buffer 51 is further configured to save other instructions of the plurality of instructions except the special instruction sequence.
  • if an instruction read from the first buffer 51 has a corresponding control code, the microstructure of the processor is adjusted according to the control code, so that the pipeline of the processor does not stall.
  • the first buffer 51 can be an I-Cache, and the second buffer 52 can be an L2 Cache.
  • the above special instruction sequence includes but is not limited to the <load,load,load> instruction sequence and the <branch,store,load,compare> instruction sequence.
  • when the special instruction sequence is the <load,load,load> instruction sequence, the control code of each load instruction in <load,load,load> is the code for disabling the post-load instruction speculative-execution unit; correspondingly, the post-load instruction speculative-execution unit is disabled according to the control codes of the load instructions;
  • when the special instruction sequence is the <branch,store,load,compare> instruction sequence, the control code of the branch instruction in <branch,store,load,compare> is the code for disabling the branch prediction unit; correspondingly, the branch prediction unit is disabled according to the control code of the branch instruction.
  • in the embodiment of the present invention, multiple instructions read in order from the second buffer of the processor are checked for a special instruction sequence; if one exists, the control code corresponding to each instruction in the special instruction sequence is determined, and the instructions in the special instruction sequence, bound with their corresponding control codes, are saved to the first buffer of the processor; afterwards, when an instruction read from the first buffer of the processor is bound with a corresponding control code, the microstructure of the processor is adjusted according to the control code.
  • because the control codes of this embodiment exist to avoid pipeline stalls when certain special instruction sequences enter the processor's pipeline, the processor can disable its post-load instruction speculative-execution unit, thereby avoiding pipeline stalls when load instructions are executed; likewise, upon detecting that the special instruction sequence <branch,store,load,compare> is about to enter the processor's pipeline, the processor can disable the branch prediction unit, thereby avoiding the branch mispredictions caused by executing the branch instruction. Therefore, the efficiency with which the processor executes instructions can be optimized.
  • the embodiment of the present invention further provides a terminal device, including: the processor described in the embodiment shown in FIG. 5, and details are not described herein again.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software function unit.
  • the above-described integrated unit, when implemented in the form of a software functional unit, may be stored as code in a computer-readable storage medium.
  • the above code, stored in a computer-readable storage medium, includes a number of instructions for causing a processor or hardware circuit to perform some or all of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (English: Read-Only Memory, ROM for short), a random access memory (English: Random Access Memory, RAM for short), a magnetic disk, an optical disc, or any other medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

An instruction processing method and apparatus, and a processor, relating to the field of computer technologies. When multiple instructions are read in order from a second cache of a processor, if it is determined that a special instruction sequence exists among the multiple instructions, a control code corresponding to each instruction in the special instruction sequence is determined; each instruction in the special instruction sequence and its corresponding control code are saved into a first cache of the processor; if it is determined that an instruction read from the first cache of the processor has a corresponding control code, a microarchitecture of the processor is adjusted according to the control code so that the pipeline of the processor does not stall, which can optimize the efficiency with which the processor executes instructions.

Description

Instruction processing method and apparatus, and processor. This application claims priority to Chinese Patent Application No. 201310389245.3, filed with the Chinese Patent Office on August 30, 2013 and entitled "Instruction processing method and apparatus, and processor", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of the present invention relate to the field of computer technologies, and in particular to an instruction processing method and apparatus, and a processor.
Background
Research on processor hardware based on data mining methods has found that some special instruction sequences suffer from high cache miss rates and high branch misprediction rates. For example, the load instructions in a <load,load,load> instruction sequence cause processor pipeline stalls due to cache misses; as another example, the branch instruction in a <branch,store,load,compare> instruction sequence also causes processor pipeline stalls due to branch mispredictions.
It can thus be seen that some special instruction sequences on existing processors cause the processor's working pipeline to stall due to high cache miss rates or branch mispredictions, which reduces the efficiency with which the processor executes instructions.
Summary
The present invention provides an instruction processing method and apparatus, and a processor, to solve the problems that some instruction sequences on existing processors cause the processor's working pipeline to stall due to high cache miss rates or branch mispredictions, and that the processor executes such instruction sequences inefficiently.
第一方面, 提供一种指令处理方法, 包括:
从处理器的第二緩存中按序读取多条指令, 若确定所述多条指令中存在 特殊指令序列, 则确定所述特殊指令序列中各指令对应的控制码; 将所述特殊指令序列中各指令及其对应的控制码保存到所述处理器的第 一緩存中;
若确定从所述处理器的第一緩存中读取的指令存在对应的控制码, 则根 据所述控制码调整所述处理器的微结构, 使得所述处理器的流水线不停顿; 所述特殊指令序列是指造成所述处理器的流水线停顿的特殊指令序列。 基于第一方面, 在第一种可能的实现方式中, 所述从处理器的第二緩存 中按序读取多条指令, 若确定所述多条指令中存在特殊指令序列, 则确定所 述特殊指令序列中各指令对应的控制码, 包括:
从所述处理器的第二緩存中按序读取多条指令, 根据指令与预译码值之 间的对应关系, 按序分别确定所述多条指令中每条指令对应的预译码值; 根据所述多条指令中每条指令对应的预译码值组成的序列, 若确定存在 特殊指令序列对应的预译码值序列, 则确定所述多条指令中包括特殊指令序 列;
根据所述特殊指令序列中各指令和控制码之间的对应关系, 确定所述特 殊指令序列中各指令对应的控制码。
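The detection flow described above (map each fetched instruction to a pre-decode value, then look for the pre-decode-value pattern of a known special sequence) can be sketched in software. This is a minimal illustration only, not the patent's hardware implementation; the mnemonics and both lookup tables are assumed stand-ins for the patent's tables, which are not reproduced in this text.

```python
# Assumed instruction -> pre-decode value table (an analogue of Table 1).
PRECODE = {"load": 1, "branch": 2, "store": 3, "compare": 4}

# Assumed special-sequence patterns, expressed as pre-decode-value sequences.
SPECIAL_PATTERNS = {
    (1, 1, 1): "load_seq",       # <load,load,load>
    (2, 3, 1, 4): "branch_seq",  # <branch,store,load,compare>
}

def find_special_sequences(instrs):
    """Return (start_index, pattern_name) for each special sequence found."""
    # Instructions not in the table get pre-decode value 0 (no special meaning).
    codes = [PRECODE.get(i, 0) for i in instrs]
    hits = []
    for pat, name in SPECIAL_PATTERNS.items():
        n = len(pat)
        for s in range(len(codes) - n + 1):
            if tuple(codes[s:s + n]) == pat:
                hits.append((s, name))
    return hits
```

For example, `find_special_sequences(["add", "load", "load", "load"])` reports the `<load,load,load>` sequence starting at index 1.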
Based on the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the control code includes code for disabling the speculative instruction execution unit after a load or code for disabling the branch-prediction unit;
the adjusting the micro-architecture of the processor according to the control code includes:
disabling, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load; or
disabling, according to the code for disabling the branch-prediction unit, the branch-prediction unit of the processor. According to a second aspect, an instruction processing apparatus is provided, including:
a determining module, configured to read multiple instructions in order from a second cache of a processor and, if it is determined that a special instruction sequence exists among the multiple instructions, determine a control code corresponding to each instruction in the special instruction sequence; a saving module, configured to save each instruction in the special instruction sequence and its corresponding control code into a first cache of the processor;
an adjusting module, configured to: if it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjust the micro-architecture of the processor according to the control code so that the pipeline of the processor does not stall;
the special instruction sequence refers to a special instruction sequence that causes the pipeline of the processor to stall. Based on the second aspect, in a first possible implementation, the determining module specifically includes: a first determining unit, configured to read multiple instructions in order from the second cache of the processor and determine, in order and according to a correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
a second determining unit, configured to: based on the sequence formed by the pre-decode values corresponding to the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determine that the multiple instructions include a special instruction sequence;
a third determining unit, configured to determine, according to a correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence.
Based on the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the control code includes code for disabling the speculative instruction execution unit after a load or code for disabling the branch-prediction unit;
the adjusting module is specifically configured to:
disable, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load; or
disable, according to the code for disabling the branch-prediction unit, the branch-prediction unit of the processor. According to a third aspect, a processor is provided, including the foregoing instruction processing apparatus.
According to a fourth aspect, a terminal device is provided, including the foregoing processor.
In the embodiments of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if a special instruction sequence exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved into the first cache of the processor; afterwards, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the micro-architecture of the processor is adjusted according to the control code so that the pipeline of the processor does not stall. The solution provided above can avoid the processor pipeline stalls caused by cache misses, branch mispredictions, and similar causes when a special instruction sequence is executed, and can therefore improve the efficiency with which the processor executes instructions.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a schematic flowchart of an instruction processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of an instruction processing method according to another embodiment of the present invention;
Fig. 3 is a principle block diagram of the embodiment shown in Fig. 2;
Fig. 4 is a schematic structural diagram of an instruction processing apparatus according to another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a processor according to another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the foregoing accompanying drawings of the present invention are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable in appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a list of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
Existing processor micro-architecture designs mainly predict the instructions about to be executed from the history of instructions already executed, and adjust the processing strategy for the upcoming instruction stream (for example, predicting the direction and target address of a jump instruction), so as to optimize the overall execution efficiency of the processor's instruction stream.
However, some special instruction sequences easily stall the processor pipeline. For example, a load instruction in the instruction sequence <load,load,load> fetches data from memory into a register. If the data cache contains the data to be fetched, the processor's instruction pipeline can run smoothly and continue executing the instructions that entered the pipeline after the load instruction. If the data cache does not contain the data, a data cache miss occurs, and the processor's instruction pipeline has to be flushed; that is, all instructions that entered the pipeline after the load instruction have to be cleared, which stalls the processor's instruction pipeline.
As another example, the branch instruction in the instruction sequence <branch,store,load,compare> performs a branch jump. Under existing processor micro-architecture designs, the branch-prediction unit is normally enabled once the processor has completed initialization, so when a branch instruction enters the pipeline, the position of the next instruction is predicted from the branch direction supplied by the branch-prediction unit. Experimental data show, however, that when the processor executes the <branch,store,load,compare> instruction sequence, instructions that should not be executed are wrongly sent into the pipeline; once the branch misprediction is discovered, these instructions have to be cleared, which stalls the processor pipeline.
The above uses only the instruction sequences <load,load,load> and <branch,store,load,compare> as examples. It can be understood that other instruction sequences that a person of ordinary skill in the art can derive from the above sequences, and that likewise cause stalls through a high cache miss rate or branch mispredictions, also fall within the protection scope of this patent.
It can thus be seen that, in existing processor micro-architecture designs, when certain special instruction sequences appear, the processor pipeline stalls because of a high cache miss rate, branch mispredictions, or other causes, which reduces the efficiency with which the processor executes instructions.
In view of the above problems in the prior art, an embodiment of the present invention provides an instruction processing method that can solve the pipeline-stall problem arising in existing processor pipeline micro-architecture designs when certain special instruction sequences appear, and can improve the efficiency with which the processor executes instructions.
It should be noted that the micro-architecture (Micro Architecture) of the processor described in this embodiment specifically refers to a collection of functional units inside the processor, where the functional units include, for example, a speculative instruction execution unit or a branch-prediction unit.
Fig. 1 is a schematic flowchart of an instruction processing method according to an embodiment of the present invention. As shown in Fig. 1, the instruction processing method of this embodiment may include:
101. Read multiple instructions in order from a second cache of a processor; if it is determined that a special instruction sequence exists among the multiple instructions, determine the control code corresponding to each instruction in the special instruction sequence.
In an optional implementation of the present invention, step 101 specifically includes:
reading multiple instructions in order from the second cache of the processor, and determining, in order and according to a correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions; and, based on the sequence formed by the pre-decode values corresponding to the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determining that the multiple instructions include a special instruction sequence;
determining, according to a correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence.
It should be noted that the second cache described in this embodiment is, for example, a level-2 cache (L2 Cache). It should also be noted that the special instruction sequences described in this embodiment include, but are not limited to, the <load,load,load> instruction sequence and the <branch,store,load,compare> instruction sequence;
where the control code of the load instructions in the special instruction sequence <load,load,load> is the code for disabling the speculative instruction execution unit after a load;
the control code of the branch instruction in the special instruction sequence <branch,store,load,compare> is the code for disabling the branch-prediction unit. It should be noted that the control codes corresponding to the other three instructions in the special instruction sequence <branch,store,load,compare>, namely store, load, and compare, may be set to 0 by default, indicating that no adjustment of the processor's micro-architecture is needed.
102. Save each instruction in the special instruction sequence and its corresponding control code into a first cache of the processor.
It should be noted that the first cache described in this embodiment is, for example, an instruction cache (Instruction Cache, I-Cache).
103. If it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjust the micro-architecture of the processor according to the control code so that the pipeline of the processor does not stall.
An instruction read from the first cache of the processor specifically refers to an instruction entering the processor pipeline, that is, an instruction about to be executed.
Assume the instruction about to be executed is a load instruction of the special instruction sequence <load,load,load>. To avoid the pipeline stall that a cache miss could cause while the load instruction is executed, the control code corresponding to the load instruction is the code for disabling the speculative instruction execution unit after a load. Therefore, step 103 is specifically: disabling, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load.
It should be noted that after the three loads of the special instruction sequence <load,load,load> have been executed, the speculative instruction execution unit of the processor after a load needs to be re-enabled.
Assume the instruction about to be executed is the branch instruction of the special instruction sequence <branch,store,load,compare>. To avoid a branch misprediction by the branch-prediction unit while the branch instruction is executed, the control code corresponding to the branch instruction is the code for disabling the branch-prediction unit. Therefore, step 103 is specifically: disabling the branch-prediction unit of the processor according to the code for disabling the branch-prediction unit.
It should be noted that after the branch instruction of the special instruction sequence <branch,store,load,compare> has been executed, the branch-prediction unit of the processor needs to be re-enabled.
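Step 103 can be pictured with a toy model of the micro-architecture state: a control code attached to the fetched instruction selects which functional unit to disable, and the units are re-enabled once the special sequence has finished executing. The control-code values and flag names below are assumptions for illustration, not encodings taken from the patent.

```python
CTL_NONE = 0               # no micro-architecture adjustment (default)
CTL_OFF_PREDICT_EXEC = 1   # assumed code: disable speculative execution after a load
CTL_OFF_BRANCH_PRED = 2    # assumed code: disable the branch-prediction unit

class MicroArch:
    """Toy model of the two adjustable functional units."""

    def __init__(self):
        self.predict_exec_on = True
        self.branch_pred_on = True

    def apply(self, ctl):
        # Disable the unit named by the control code; CTL_NONE keeps defaults.
        if ctl == CTL_OFF_PREDICT_EXEC:
            self.predict_exec_on = False
        elif ctl == CTL_OFF_BRANCH_PRED:
            self.branch_pred_on = False

    def restore(self):
        # Re-enable both units once the special sequence has retired.
        self.predict_exec_on = True
        self.branch_pred_on = True
```

A branch instruction carrying `CTL_OFF_BRANCH_PRED` would thus switch off only the branch predictor, leaving speculative execution untouched, and `restore()` models the re-enabling noted above.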
In this embodiment of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if a special instruction sequence exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved into the first cache of the processor; afterwards, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the micro-architecture of the processor is adjusted according to the control code. The control codes in this embodiment are designed for the individual instructions of certain special instruction sequences, precisely to avoid the pipeline stalls that occur when those sequences enter the processor pipeline. For example, when the special instruction sequence <load,load,load> is detected to be about to enter the processor pipeline, the processor can disable the speculative instruction execution unit of the pipeline after the load, thereby avoiding a processor pipeline stall while the load instruction is executed; as another example, when the special instruction sequence <branch,store,load,compare> is detected to be about to enter the processor pipeline, the processor can disable the branch-prediction unit to avoid a branch misprediction by that unit while the branch instruction is executed. Therefore, the efficiency with which the processor executes instructions can be improved.
Fig. 2 is a schematic flowchart of an instruction processing method according to another embodiment of the present invention, and Fig. 3 is a principle block diagram of the embodiment shown in Fig. 2. As shown in Fig. 2 and Fig. 3, the instruction processing method of this embodiment is a specific implementation of the embodiment shown in Fig. 1 and includes:
201. Read 4 instructions in order from the L2 Cache in the first clock cycle.
In a specific implementation, assume the read port of the L2 Cache is 16 bytes wide and each instruction is 4 bytes, so only 4 instructions (Instr0, Instr1, Instr2, Instr3) can be read per clock cycle.
202. Determine the pre-decode values corresponding to the 4 instructions read in the first clock cycle.
Table 1 is a first correspondence table established in this embodiment of the present invention, as shown in Table 1:
[Table 1: correspondence between instructions and their pre-decode values; presented in the original as an image that is not reproduced here]
The first correspondence table stores the correspondence between each instruction and its pre-decode value. As shown in Fig. 3, the pre-decoder (Pre-decoder) stores the first correspondence table and can determine, from the correspondence shown in Table 1, the pre-decode values (precode0, precode1, precode2, precode3) corresponding to the 4 instructions read in the first clock cycle. It should be noted that the first correspondence table may be implemented in a hardware module or defined in a software module.
203. Read 4 instructions in order from the L2 Cache in the second clock cycle.
To guarantee that no special instruction sequence is missed, 8 consecutive instructions need to be read in order from the L2 Cache. Therefore, in this embodiment, the 4 instructions read in the first clock cycle and their corresponding pre-decode values need to be buffered for one cycle, after which 4 instructions (Instr4, Instr5, Instr6, Instr7) are read in order from the L2 Cache in the second clock cycle.
204. Determine the pre-decode values corresponding to the 4 instructions read in the second clock cycle.
The pre-decoder (Pre-decoder) can determine, from the correspondence shown in Table 1, the pre-decode values (precode4, precode5, precode6, precode7) corresponding to the 4 instructions read in the second clock cycle.
205. Perform special-instruction-sequence detection on the 8 instructions read in order, and generate the control codes of the 4 instructions read in the first clock cycle according to the detection result.
As shown in Fig. 3, the control codes of the 4 instructions (Instr0, Instr1, Instr2, Instr3) read in the first clock cycle are (ctlcode0, ctlcode1, ctlcode2, ctlcode3), respectively.
Table 2 is a second correspondence table established in this embodiment of the present invention, as shown in Table 2:
[Table 2: correspondence between each special instruction sequence and its pre-decode-value sequence; presented in the original as an image that is not reproduced here]
The second correspondence table stores the correspondence between each special instruction sequence and its pre-decode-value sequence. As shown in Fig. 3, the instruction sequence pattern detector (Instruction Sequential Pattern Check) stores the second correspondence table; by looking up the second correspondence table with the 8 consecutive instructions read in order and their corresponding pre-decode values, it can be determined whether a special instruction sequence among these 8 instructions is about to enter the processor pipeline.
It should be noted that the second correspondence table may be implemented in a hardware module or defined in a software module.
This embodiment uses the <load,load,load> instruction sequence as an example to describe special-instruction-sequence detection:
If 3 consecutive pre-decode values of 1 exist among the pre-decode values of the 4 instructions read consecutively in the first clock cycle, it can be determined that the special instruction sequence <load,load,load> exists among those 4 instructions; or
if 3 consecutive pre-decode values of 1 exist among the pre-decode values of the 4 instructions read consecutively in the second clock cycle, it can be determined that the special instruction sequence <load,load,load> exists among those 4 instructions; or
if the pre-decode values of the last 2 instructions read in the first clock cycle are 1, and the pre-decode value of the 1st instruction read in the second clock cycle is also 1, it can be determined that the special instruction sequence <load,load,load> exists among the 8 consecutive instructions read in order; or
if the pre-decode value of the last instruction read in the first clock cycle is 1, and the pre-decode values of the 1st and 2nd instructions read in the second clock cycle are also 1, it can be determined that the special instruction sequence <load,load,load> exists among the 8 consecutive instructions read in order.
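The four cases above amount to scanning a window of eight pre-decode values, formed by holding the first fetch group for one cycle and concatenating it with the second, for three consecutive values of 1. A software sketch of that scan (the pre-decode table passed in is again an assumed stand-in for Table 1):

```python
def detect_load_triples(group0, group1, precode):
    """Return the start offsets of every <load,load,load> triple in the
    8-instruction window formed by two consecutive 4-instruction fetches."""
    # Concatenating the two groups catches triples that straddle the boundary.
    window = [precode.get(i, 0) for i in group0 + group1]  # 8 pre-decode values
    starts = []
    for s in range(len(window) - 2):
        if window[s] == window[s + 1] == window[s + 2] == 1:
            starts.append(s)
    return starts
```

For instance, with loads at positions 2, 3, and 4, the triple straddles the group boundary and is still found at offset 2, which is exactly the case the one-cycle buffering exists to handle.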
Table 3 is a third correspondence table established in this embodiment of the present invention, as shown in Table 3:
[Table 3: correspondence between each instruction in a special instruction sequence and its control code; presented in the original as an image that is not reproduced here]
The third correspondence table stores the correspondence between each instruction in a special instruction sequence and its control code; from the correspondence shown in Table 3, the control code of each instruction in a detected special instruction sequence can be determined. It should be noted that the third correspondence table may also be implemented in a hardware module or defined in a software module.
This embodiment uses the <load,load,load> instruction sequence as an example to describe how the control codes of the instructions read in the first clock cycle are generated:
When it is determined that the special instruction sequence <load,load,load> exists among the 4 instructions read consecutively in the first clock cycle, for example, when the 1st, 2nd, and 3rd instructions read consecutively in the first clock cycle are all loads and the 4th instruction is an instruction other than load, the control codes corresponding to the 1st, 2nd, and 3rd load instructions are determined from the correspondence shown in Table 3, and the control code of the 4th instruction defaults to 0, which means that when the 4th instruction is executed no adjustment of the processor's micro-architecture is needed and the processor behaves according to its existing micro-architecture design.
If the pre-decode values of the last 2 instructions read in the first clock cycle are 1, and the pre-decode value of the 1st instruction read in the second clock cycle is also 1, it can be determined that the special instruction sequence <load,load,load> exists among the 8 consecutive instructions read in order. In this case the 1st and 2nd instructions read in the first clock cycle are instructions other than load, so their corresponding control codes are 0, while the last 2 instructions read in the first clock cycle are the 1st and 2nd load instructions of the special instruction sequence <load,load,load>; from the correspondence shown in Table 3, the control codes corresponding to the 1st and 2nd load instructions of the sequence can be determined, and thus the control codes of the last 2 load instructions read in the first clock cycle are obtained.
206. Save the 4 instructions read in the first clock cycle, together with their corresponding control codes, into the I-Cache.
207. Read an instruction from the I-Cache and determine whether a control code is bound to it; if yes, perform step 208; otherwise, perform step 209.
208. Adjust the micro-architecture of the processor according to the control code of the instruction.
For example, if the instruction is a load instruction of the special instruction sequence <load,load,load>, the speculative instruction execution unit after the load needs to be disabled; after the 3 load instructions of the sequence have been executed, the speculative instruction execution unit of the processor after a load is re-enabled.
209. Do not adjust the micro-architecture of the processor.
That is, the processor behaves according to its existing micro-architecture design.
In this embodiment of the present invention, during the refill of the I-Cache from the L2 Cache, whether a special instruction sequence exists is detected from the 8 consecutively read instructions and their corresponding pre-decode values. If one exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence is bound to its corresponding control code and saved into the I-Cache, so that when the instructions of the special instruction sequence enter the pipeline, the micro-architecture of the processor is adjusted according to their control codes, for example by disabling the speculative instruction execution unit after a load or disabling the branch-prediction unit. This avoids the pipeline stalls and branch mispredictions caused by executing the instructions of a special instruction sequence, and can improve the efficiency with which the processor executes instructions.
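The refill path summarized above can be pictured end to end: during the L2-to-I-Cache refill, every instruction is written back together with a control code (0 meaning no adjustment), so the fetch stage never has to re-run detection. In this hypothetical sketch, code 1 stands for "disable speculative execution after a load", and only the <load,load,load> sequence is handled:

```python
def bind_control_codes(instrs):
    """Pair each instruction with its control code, as stored in the I-Cache.

    Every load inside a detected <load,load,load> triple gets the assumed
    code 1; all other instructions get the default code 0.
    """
    codes = [0] * len(instrs)
    for s in range(len(instrs) - 2):
        if instrs[s] == instrs[s + 1] == instrs[s + 2] == "load":
            codes[s:s + 3] = [1, 1, 1]
    return list(zip(instrs, codes))
```

The design point this illustrates is that detection cost is paid once per refill, while the per-fetch cost is only a lookup of the code stored alongside the instruction.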
Fig. 4 is a schematic structural diagram of an instruction processing apparatus according to another embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
a determining module 41, configured to read multiple instructions in order from a second cache of a processor and, if it is determined that a special instruction sequence exists among the multiple instructions, determine the control code corresponding to each instruction in the special instruction sequence;
a saving module 42, configured to save each instruction in the special instruction sequence and its corresponding control code into a first cache of the processor;
an adjusting module 43, configured to: if it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjust the micro-architecture of the processor according to the control code so that the pipeline of the processor does not stall.
For example, the determining module 41 specifically includes:
a first determining unit 411, configured to read multiple instructions in order from the second cache of the processor and determine, in order and according to a correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
a second determining unit 412, configured to: based on the sequence formed by the pre-decode values corresponding to the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determine that the multiple instructions include a special instruction sequence;
a third determining unit 413, configured to determine, according to a correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence.
For example, the control code includes, but is not limited to, code for disabling the speculative instruction execution unit after a load or code for disabling the branch-prediction unit;
the adjusting module 43 is specifically configured to:
disable, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load; or disable, according to the code for disabling the branch-prediction unit, the branch-prediction unit of the processor.
In this embodiment of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if a special instruction sequence exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence and its corresponding control code are saved into the first cache of the processor; afterwards, if it is determined that an instruction read from the first cache of the processor has a corresponding control code, the micro-architecture of the processor is adjusted according to the control code. The control codes in this embodiment are designed for the individual instructions of certain special instruction sequences, precisely to avoid the pipeline stalls that occur when those sequences enter the processor pipeline. For example, when the special instruction sequence <load,load,load> is detected to be about to enter the processor pipeline, the processor can disable the speculative instruction execution unit after the load, thereby avoiding a processor pipeline stall while the load instruction is executed; as another example, when the special instruction sequence <branch,store,load,compare> is detected to be about to enter the processor pipeline, the processor can disable the branch-prediction unit to avoid a branch misprediction by that unit while the branch instruction is executed. Therefore, the efficiency with which the processor executes instructions can be improved.
An embodiment of the present invention further provides a processor, including the instruction processing apparatus described in the embodiment shown in Fig. 4; details are not repeated here.
Fig. 5 is a schematic structural diagram of a processor according to another embodiment of the present invention. As shown in Fig. 5, the processor includes: a first cache 51, a second cache 52, a pre-decoder 53, and a special-instruction-sequence detector 54; the first cache 51, the second cache 52, the pre-decoder 53, and the special-instruction-sequence detector 54 are connected through a communication bus.
The second cache 52 is configured to read multiple instructions consecutively in order;
the pre-decoder 53 is configured to determine, in order and according to a correspondence between instructions and pre-decode values, the pre-decode values corresponding to the multiple instructions read in order by the second cache 52;
the special-instruction-sequence detector 54 is configured to: based on the sequence formed by the pre-decode values determined in order by the pre-decoder 53 for the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determine that a special instruction sequence exists among the multiple instructions, and further determine, according to a correspondence between the special instruction sequence and the control codes of its instructions, the control code corresponding to each instruction in the special instruction sequence;
the first cache 51 is configured to save each instruction in the special instruction sequence determined by the special-instruction-sequence detector 54 and its corresponding control code.
It should be noted that the first cache 51 is further configured to save the instructions among the multiple instructions other than the special instruction sequence.
Correspondingly, when an instruction saved in the first cache 51 enters the processor pipeline, if it is determined that the instruction entering the pipeline has a corresponding control code, the micro-architecture of the processor is adjusted according to the control code so that the processor pipeline does not stall.
For example, the first cache 51 may be an I-Cache, and the second cache 52 may be an L2 Cache.
For example, the special instruction sequences include, but are not limited to, the <load,load,load> instruction sequence and the <branch,store,load,compare> instruction sequence.
For example, if the special instruction sequence is the <load,load,load> instruction sequence, the control code of each load instruction in <load,load,load> is the code for disabling the speculative instruction execution unit after a load; correspondingly, the speculative instruction execution unit after the load is disabled according to the control codes of the load instructions.
For example, if the special instruction sequence is the <branch,store,load,compare> instruction sequence, the control code of the branch instruction in <branch,store,load,compare> is the code for disabling the branch-prediction unit; correspondingly, the branch-prediction unit is disabled according to the control code of the branch instruction.
In this embodiment of the present invention, multiple instructions read in order from the second cache of the processor are checked for a special instruction sequence; if a special instruction sequence exists, the control code corresponding to each instruction in the special instruction sequence is determined, and each instruction in the special instruction sequence is bound to its corresponding control code and saved into the first cache of the processor; afterwards, when an instruction read from the first cache of the processor has a corresponding control code bound to it, the micro-architecture of the processor is adjusted according to the control code.
The control codes in this embodiment are designed for the individual instructions of certain special instruction sequences, precisely to avoid the pipeline stalls that occur when those sequences enter the processor pipeline. For example, when the special instruction sequence <load,load,load> is detected to be about to enter the processor pipeline, the processor can disable the speculative instruction execution unit after the load, thereby avoiding a processor pipeline stall while the load instruction is executed; as another example, when the special instruction sequence <branch,store,load,compare> is detected to be about to enter the processor pipeline, the processor can disable the branch-prediction unit to avoid a branch misprediction by that unit while the branch instruction is executed. Therefore, the efficiency with which the processor executes instructions can be improved.
An embodiment of the present invention further provides a terminal device, including the processor described in the embodiment shown in Fig. 5; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely exemplary; for instance, the division into units is merely a logical functional division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above integrated unit implemented in the form of a software functional unit may be stored, in the form of code, in a computer-readable storage medium. The code is stored in a computer-readable storage medium and includes a number of instructions to cause a processor or a hardware circuit to perform some or all of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes media that can store program code, such as a driverless micro high-capacity removable storage disk with a universal serial bus interface, a removable hard disk, a read-only memory (English: Read-Only Memory, ROM for short), a random access memory (English: Random Access Memory, RAM for short), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention rather than limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features therein, and that these modifications or replacements do not make the essence of the corresponding technical solutions depart from the protection scope of the technical solutions of the embodiments of the present invention.

Claims

1. An instruction processing method, characterized by comprising:
reading multiple instructions in order from a second cache of a processor, and if it is determined that a special instruction sequence exists among the multiple instructions, determining a control code corresponding to each instruction in the special instruction sequence;
saving each instruction in the special instruction sequence and its corresponding control code into a first cache of the processor;
if it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjusting the micro-architecture of the processor according to the control code so that the pipeline of the processor does not stall; the special instruction sequence referring to a special instruction sequence that causes the pipeline of the processor to stall.
2. The method according to claim 1, characterized in that the reading multiple instructions in order from the second cache of the processor and, if it is determined that a special instruction sequence exists among the multiple instructions, determining the control code corresponding to each instruction in the special instruction sequence comprises:
reading multiple instructions in order from the second cache of the processor, and determining, in order and according to a correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
in the sequence formed by the pre-decode values corresponding to the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determining that the multiple instructions comprise a special instruction sequence;
determining, according to a correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence.
3. The method according to any one of claims 1-2, characterized in that the control code comprises code for disabling the speculative instruction execution unit after a load or code for disabling the branch-prediction unit;
the adjusting the micro-architecture of the processor according to the control code comprising:
disabling, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load; or
disabling, according to the code for disabling the branch-prediction unit, the branch-prediction unit of the processor.
4. An instruction processing apparatus, characterized by comprising:
a determining module, configured to read multiple instructions in order from a second cache of a processor and, if it is determined that a special instruction sequence exists among the multiple instructions, determine a control code corresponding to each instruction in the special instruction sequence; a saving module, configured to save each instruction in the special instruction sequence and its corresponding control code into a first cache of the processor;
an adjusting module, configured to: if it is determined that an instruction read from the first cache of the processor has a corresponding control code, adjust the micro-architecture of the processor according to the control code so that the pipeline of the processor does not stall;
the special instruction sequence referring to a special instruction sequence that causes the pipeline of the processor to stall.
5. The apparatus according to claim 4, characterized in that the determining module specifically comprises: a first determining unit, configured to read multiple instructions in order from the second cache of the processor and determine, in order and according to a correspondence between instructions and pre-decode values, the pre-decode value corresponding to each of the multiple instructions;
a second determining unit, configured to: based on the sequence formed by the pre-decode values corresponding to the multiple instructions, if it is determined that a pre-decode-value sequence corresponding to a special instruction sequence exists, determine that the multiple instructions comprise a special instruction sequence;
a third determining unit, configured to determine, according to a correspondence between each instruction in the special instruction sequence and a control code, the control code corresponding to each instruction in the special instruction sequence.
6. The apparatus according to any one of claims 4-5, characterized in that the control code comprises code for disabling the speculative instruction execution unit after a load or code for disabling the branch-prediction unit;
the adjusting module being specifically configured to:
disable, according to the code for disabling the speculative instruction execution unit after a load, the speculative instruction execution unit of the processor after the load; or
disable, according to the code for disabling the branch-prediction unit, the branch-prediction unit of the processor.
7. A processor, characterized by comprising: the instruction processing apparatus according to any one of claims 4-6.
8. A terminal device, characterized by comprising: the processor according to claim 7.
PCT/CN2014/083879 2013-08-30 2014-08-07 Instruction processing method and apparatus, and processor WO2015027809A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310389245.3 2013-08-30
CN201310389245.3A CN104423927B (zh) 2013-08-30 2013-08-30 Instruction processing method and apparatus, and processor

Publications (1)

Publication Number Publication Date
WO2015027809A1 true WO2015027809A1 (zh) 2015-03-05

Family

ID=52585536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/083879 WO2015027809A1 (zh) 2013-08-30 2014-08-07 指令处理方法及装置、处理器

Country Status (2)

Country Link
CN (1) CN104423927B (zh)
WO (1) WO2015027809A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783954A (zh) * 2020-06-30 2020-10-16 Anhui Cambricon Information Technology Co., Ltd. Method and device for determining the performance of a neural network

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111936968A (zh) * 2018-04-21 2020-11-13 Huawei Technologies Co., Ltd. Instruction execution method and device
CN110688160B (zh) * 2019-09-04 2021-11-19 Suzhou Inspur Intelligent Technology Co., Ltd. Instruction pipeline processing method, system, device, and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377734A (zh) * 2008-07-10 2009-03-04 VIA Technologies, Inc. Computing system and method of configuring the computing system
US20090217016A1 (en) * 2008-02-22 2009-08-27 International Business Machines Corporation System and method for search area confined branch prediction
CN101770358A (zh) * 2010-02-10 2010-07-07 Beijing Loongson Zhongke Technology Service Center Co., Ltd. System and method for branch prediction processing of jump instructions in a microprocessor
CN103150146A (zh) * 2013-01-31 2013-06-12 Xidian University Application-specific instruction set processor based on an extensible processor architecture and implementation method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090217016A1 (en) * 2008-02-22 2009-08-27 International Business Machines Corporation System and method for search area confined branch prediction
CN101377734A (zh) * 2008-07-10 2009-03-04 VIA Technologies, Inc. Computing system and method of configuring the computing system
CN101770358A (zh) * 2010-02-10 2010-07-07 Beijing Loongson Zhongke Technology Service Center Co., Ltd. System and method for branch prediction processing of jump instructions in a microprocessor
CN103150146A (zh) * 2013-01-31 2013-06-12 Xidian University Application-specific instruction set processor based on an extensible processor architecture and implementation method thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783954A (zh) * 2020-06-30 2020-10-16 Anhui Cambricon Information Technology Co., Ltd. Method and device for determining the performance of a neural network
CN111783954B (zh) * 2020-06-30 2023-05-02 Anhui Cambricon Information Technology Co., Ltd. Method, electronic device, and storage medium for determining the performance of a neural network

Also Published As

Publication number Publication date
CN104423927A (zh) 2015-03-18
CN104423927B (zh) 2018-07-13

Similar Documents

Publication Publication Date Title
US10942737B2 (en) Method, device and system for control signalling in a data path module of a data stream processing engine
TWI552070B (zh) 於確認時執行狀態更新指令、裝置、方法與系統
KR101540633B1 (ko) 하이브리드 명령 큐를 갖는 프로세서
US9405552B2 (en) Method, device and system for controlling execution of an instruction sequence in a data stream accelerator
CN1103960C (zh) 在多级流水线结构中处理条件跳转的结构和方法
CN111352659B (zh) 用于分支和获取流水线的误预测恢复设备和方法
KR101734350B1 (ko) 동적 포트 리맵핑을 이용하여 명령어 스케줄링 동안 데드록을 방지하기 위한 방법 및 장치
WO2006130466A2 (en) A method and apparatus for predicting branch instructions
CN105242963B (zh) 执行机构间的切换控制
US6209086B1 (en) Method and apparatus for fast response time interrupt control in a pipelined data processor
US10013257B2 (en) Register comparison for operand store compare (OSC) prediction
US11163577B2 (en) Selectively supporting static branch prediction settings only in association with processor-designated types of instructions
EP2936323B1 (en) Speculative addressing using a virtual address-to-physical address page crossing buffer
JP5301554B2 (ja) プロシージャリターンシーケンスを加速するための方法およびシステム
WO2015027809A1 (zh) 指令处理方法及装置、处理器
JP2008090848A (ja) データ処理システム内のレジスタリネーミング
JP2007514237A (ja) 分岐先バッファにおいてエントリを割り当てる方法及び装置
US8214617B2 (en) Apparatus and method of avoiding bank conflict in single-port multi-bank memory system
WO2018059337A1 (zh) 数据处理装置和方法
CN100533373C (zh) 一种处理单发射流水线数据相关的动态调度控制器和方法
US10990405B2 (en) Call/return stack branch target predictor to multiple next sequential instruction addresses
KR100861073B1 (ko) 적응형 파이프라인을 적용한 병렬 처리 프로세서 구조
KR102639414B1 (ko) 멀티스레딩 프로세서 및 이의 동작 방법
US7124277B2 (en) Method and apparatus for a trace cache trace-end predictor
US11474821B1 (en) Processor dependency-aware instruction execution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14841238

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14841238

Country of ref document: EP

Kind code of ref document: A1