WO2022094964A1 - Method for processing instructions and graph computing device - Google Patents

Method for processing instructions and graph computing device

Info

Publication number
WO2022094964A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
speculative
dependency
bit
instructions
Application number
PCT/CN2020/127243
Other languages
English (en)
French (fr)
Inventor
朱凡
周若愚
孙文博
周昔平
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202080106413.1A (CN116348850A)
Priority to PCT/CN2020/127243 (WO2022094964A1)
Priority to EP20960425.5A (EP4227801A4)
Publication of WO2022094964A1
Priority to US18/312,365 (US20230297385A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3834 Maintaining memory consistency
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F9/3842 Speculative instruction execution

Definitions

  • The present application relates to the field of graph computing, and in particular, to a method for processing instructions and a graph computing device.
  • Dataflow architecture is a computer system architecture. It differs from the von Neumann architecture that is mainstream in the industry, where a program counter determines the execution order of instructions. A dataflow architecture instead determines the execution order by checking whether the operands of each instruction are valid. In this way, the dependence of instructions on control flow is converted into dependence on data flow, which gives the dataflow architecture a large advantage in exploiting parallelism.
  • A traditional dataflow architecture also needs to support control flow.
  • A computer architecture that combines dataflow and control flow is collectively referred to as a graph computing device or graph computing architecture (graphflow architecture).
  • Memory aliasing is a common problem that affects the efficiency of hardware memory access.
  • Memory aliasing refers to the situation in which two pointers point to the same storage address at the same time.
  • The execution order of a program must obey the true dependencies between instructions, as in the following example:
  • Instruction 3: store R1, 2(R2);
  • Instruction 4: load R3, 4(R4);
  • Instruction 3 stores the data in register R1 to address (R2+2), and instruction 4 reads the data at address (R4+4) into register R3. A dependency between memory access instructions cannot be determined in advance from register numbers. Whether it exists is known only after R2+2 and R4+4 have been computed.
    - If the two addresses are the same, there is an order dependency between the two instructions: instruction 4 can be executed only after instruction 3 has been executed.
    - If the addresses are different, there is no order dependency, and the execution order of the two instructions does not affect the correctness of the program.
    Therefore, instruction 4 has to be delayed and issued only after the dependency between the access instructions has been resolved. In most cases, however, R2+2 is not equal to R4+4, so the order dependency usually does not exist. In those cases, delaying instruction 4 reduces efficiency.
  • Mainstream computer systems therefore adopt speculative execution. While the addresses are still unknown, the system assumes that the order dependency does not hold and speculatively executes the read instruction.
  • If the order dependency indeed does not hold, the speculatively executed read instruction is unaffected, and much of the time spent waiting for addresses to be computed is saved.
  • If the order dependency does hold, the speculation is invalid. The computer system then flushes and re-executes the read operation and any subsequent operations that have already entered the pipeline.
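The following is a minimal software model of that speculate-then-check behavior for instructions 3 and 4, assuming a plain dictionary as memory. The function name `speculative_load` and the register dictionary are hypothetical illustrations, not part of the described device. The load reads memory before the store address is known. Once the store resolves, the read is redone only if the two addresses alias.

```python
# Hypothetical model: instruction 4 (read (R4+4) into R3) runs speculatively
# before instruction 3 (store R1 to (R2+2)); it is replayed only on aliasing.

def speculative_load(regs: dict, memory: dict) -> tuple:
    """Return (final value of R3, True if the load had to be replayed)."""
    load_addr = regs["R4"] + 4
    # Speculate: assume no order dependency and read memory immediately.
    r3 = memory.get(load_addr, 0)

    # Instruction 3 completes later, once R2 + 2 is known.
    store_addr = regs["R2"] + 2
    memory[store_addr] = regs["R1"]

    if store_addr == load_addr:
        # The order dependency held: discard the speculative value and redo the read.
        return memory[load_addr], True
    # No aliasing: the speculatively read value is already correct.
    return r3, False


print(speculative_load({"R1": 7, "R2": 100, "R4": 50}, {54: 3}))  # (3, False)
print(speculative_load({"R1": 7, "R2": 52, "R4": 50}, {54: 3}))   # (7, True)
```

In the common non-aliasing case, the speculative value is used as-is and the wait for R2+2 is hidden.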
  • A graph computing device based on the dataflow architecture also faces memory aliasing when executing programs. Improving the efficiency of speculative execution in a dataflow architecture is therefore a problem that urgently needs to be solved.
  • The present application provides a method for processing instructions and a graph computing device, which can improve the efficiency of speculative execution and reduce the cost of speculation failure.
  • A method for processing instructions is provided. The method is applied to a graph computing device based on a dataflow architecture. The graph computing device includes at least one processing engine (PE) and a load/store unit (LSU).
  • The PE includes an information buffer (IB). The IB caches an instruction queue, and the PE executes the instructions cached in the IB. The IB includes a speculative bit and a speculative ID field:
    - The speculative bit indicates whether the current instruction can be executed speculatively.
    - The speculative ID field stores the speculative ID of a speculative operation of the current instruction.
    The LSU includes a load queue (LQ), which caches a queue of read instructions.
  • The method includes: the IB issues a first instruction to the LQ. The first instruction requests to read data and satisfies a first preset condition, which includes that the speculative bit of the first instruction in the IB is set to yes. The IB then determines a first speculative ID and stores it in the speculative ID field of the first instruction in the IB. The first speculative ID indicates the current speculative operation.
  • The dataflow architecture executes instructions based on a data dependency graph. This may be a dataflow graph composed of nodes and directed arcs connecting the nodes. Nodes represent the operations or functions performed, and directed arcs represent the order in which the nodes are executed.
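To make the firing rule concrete, here is a small software interpreter for such a graph. A node executes as soon as every incoming arc carries a value, rather than when a program counter reaches it. This is a sketch under simplifying assumptions: the function `run_dataflow`, the wave-by-wave scheduling, and the example node names are hypothetical. Hardware PEs fire instructions individually from their IBs.

```python
import operator


def run_dataflow(graph: dict, sources: dict):
    """Fire each node once all of its producers have produced a value.

    graph:   node -> (operation, [producer nodes])
    sources: node -> constant input value
    Returns (all values, firing order as a list of waves).
    """
    values = dict(sources)
    pending = set(graph)
    waves = []
    while pending:
        # A node is ready when every operand (incoming arc) carries a value.
        ready = sorted(n for n in pending if all(p in values for p in graph[n][1]))
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for n in ready:
            op, producers = graph[n]
            values[n] = op(*(values[p] for p in producers))
        waves.append(ready)
        pending -= set(ready)
    return values, waves


graph = {
    "sum": (operator.add, ["a", "b"]),
    "prod": (operator.mul, ["a", "b"]),
    "out": (operator.sub, ["prod", "sum"]),
}
values, waves = run_dataflow(graph, {"a": 2, "b": 3})
print(values["out"], waves)  # 1 [['prod', 'sum'], ['out']]
```

"sum" and "prod" fire in the same wave because neither depends on the other. This independence is the parallelism the dataflow model exposes.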
  • A speculative operation means executing an instruction early when it may depend on another instruction that has not yet executed. If, after the other instruction has executed, it is confirmed that the instruction does not depend on it, the speculation succeeds. If a dependency is confirmed, the speculation fails.
  • The graph computing device can determine from the speculative bit whether an instruction can be executed speculatively. It identifies each speculative operation by a speculative ID, so that the speculative ID marks the source of the speculation.
  • The cases in which the speculative bit is set to yes include the case in which the first instruction has a speculative dependency on another instruction.
  • A speculative dependency means that the first instruction may depend on a third instruction, yet can be executed speculatively.
  • The first instruction and the third instruction satisfy a second preset condition, which includes:
    - the ideal execution order of the first instruction is after the third instruction, and
    - the first instruction may have a memory dependency on the third instruction.
  • The LQ includes the speculative ID field.
  • The method further includes:
    - after the IB issues the first instruction to the LQ, the LQ assigns the first speculative ID to the first instruction and writes it into the speculative ID field of the first instruction in the LQ;
    - the LQ sends the first speculative ID to the IB.
    The IB determining the first speculative ID then means that the IB receives the first speculative ID from the LQ.
  • The speculative ID is a one-hot code.
  • Because speculative IDs are one-hot, the ID obtained by bitwise ORing two of them still retains the information of both original IDs. After a speculative operation fails, the one-hot code can therefore be used to quickly find the source of the speculative operation. The instructions related to that operation can then be flushed and re-executed easily, which improves the efficiency of speculative memory access in the dataflow architecture.
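The property relied on above can be shown with plain integers used as bit vectors. This is a minimal sketch: the helper names `alloc_one_hot`, `combine`, `sources`, and `is_related` are hypothetical, as is the mapping of bit index to allocation slot. ORing keeps every source bit, and a bitwise AND against a failed ID tells whether an instruction is affected.

```python
def alloc_one_hot(slot: int) -> int:
    """A speculative ID with exactly one bit set (bit index = allocation slot)."""
    return 1 << slot


def combine(*spec_ids: int) -> int:
    """Bitwise OR keeps every source bit, so no speculation source is lost."""
    merged = 0
    for spec_id in spec_ids:
        merged |= spec_id
    return merged


def sources(spec_id: int) -> list:
    """Recover the slots of all speculative operations recorded in an ID."""
    return [bit for bit in range(spec_id.bit_length()) if (spec_id >> bit) & 1]


def is_related(spec_id: int, failed_id: int) -> bool:
    """An instruction is affected by a failed speculation if they share a bit."""
    return (spec_id & failed_id) != 0


a, b = alloc_one_hot(0), alloc_one_hot(2)
merged = combine(a, b)
print(bin(merged), sources(merged))                                 # 0b101 [0, 2]
print(is_related(merged, b), is_related(merged, alloc_one_hot(1)))  # True False
```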
  • The method further includes: according to the first instruction, the LQ searches the SB or the memory for the data that the first instruction requests to read, and acquires it.
  • The memory may be on-chip memory or off-chip memory.
  • Here, the memory is any storage device other than the SB.
  • The method further includes: after acquiring the data that the first instruction requests to read, the LQ transmits the first speculative ID to the speculative ID field of a second instruction in the IB. The second instruction depends on the first instruction.
  • Because the second instruction, which depends on the first instruction, also carries the first speculative ID, the instructions associated with the first speculative ID can be found if the speculative operation fails. This improves the efficiency of speculative operations.
  • Suppose the second instruction also depends on a fourth instruction, which has undergone a speculative operation corresponding to a third speculative ID.
  • The IB can generate a fourth speculative ID from the first speculative ID and the third speculative ID, and store it in the speculative ID field of the second instruction in the IB.
  • The fourth speculative ID retains the information of both the first and the third speculative IDs, so it indicates both of their speculative operations at once. Multiple speculative operations can therefore be traced from the fourth speculative ID. After any one of them fails, the related instructions can be traced and flushed, which reduces the cost of speculation failure.
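Propagation through a dependency chain can be sketched as follows. This assumes that the fourth speculative ID is formed by bitwise OR, as the one-hot description above suggests. The function `propagate_spec_ids`, the topological-order input, and the extra consumer "fifth" are hypothetical illustrations. Each instruction's tag is its own speculative ID ORed with the tags of everything it consumes.

```python
def propagate_spec_ids(order, producers, own_ids):
    """Tag every instruction with the OR of the speculative IDs it depends on.

    order:     instructions in a valid dependency (topological) order
    producers: instruction -> instructions whose results it consumes
    own_ids:   instruction -> one-hot ID assigned to its own speculative read
    """
    tags = {}
    for inst in order:
        tag = own_ids.get(inst, 0)
        for producer in producers.get(inst, []):
            tag |= tags[producer]  # inherit every upstream speculation source
        tags[inst] = tag
    return tags


# "first" carries the first speculative ID (0b001), "fourth" carries the
# third speculative ID (0b100); "second" consumes both, "fifth" consumes "second".
tags = propagate_spec_ids(
    ["first", "fourth", "second", "fifth"],
    {"second": ["first", "fourth"], "fifth": ["second"]},
    {"first": 0b001, "fourth": 0b100},
)
print({k: bin(v) for k, v in tags.items()})
# {'first': '0b1', 'fourth': '0b100', 'second': '0b101', 'fifth': '0b101'}
```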
  • The IB further includes a speculative flag bit, which indicates whether the current instruction has been speculatively issued. The method further includes: after the IB issues the first instruction to the LQ, the IB sets the speculative flag bit of the first instruction to yes.
  • The IB further includes a speculative flag bit, which indicates whether the current instruction has been speculatively issued. The method further includes: after the data that the first instruction requests to read has been obtained, the IB issues a second instruction that depends on the first instruction, and sets the speculative flag bit of the second instruction to yes.
  • The IB further includes a dependency valid bit and a dependency presence bit. The dependency valid bit indicates whether the current instruction must wait for another instruction to finish executing before it can execute.
  • The dependency presence bit indicates whether the instruction on which the current instruction depends has finished executing.
  • The first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes, and the dependency presence bit is set to no.
  • The IB further includes at least one parameter field, at least one parameter field valid bit, and at least one parameter field presence bit. Each parameter field has exactly one valid bit and one presence bit.
  • A parameter field stores input data of the current instruction.
  • A parameter field valid bit indicates whether its corresponding parameter field is valid.
  • A parameter field presence bit indicates whether data already exists in its corresponding parameter field.
  • The first preset condition further includes: for any parameter field whose valid bit is set to yes, the corresponding parameter field presence bit is also set to yes.
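The first preset condition can be read as a predicate over one IB entry. This is a sketch under assumptions: the `IBEntry` field names and the helpers `meets_first_preset_condition` and `try_speculative_issue` are hypothetical and do not describe the actual hardware layout. It checks the speculative bit, the "dependency valid but not yet present" state, and that every valid parameter field already holds data. On a successful issue it sets the speculative flag bit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IBEntry:
    # Hypothetical field names mirroring the bits described above.
    speculative: bool          # speculative bit
    dep_valid: bool            # dependency valid bit
    dep_present: bool          # dependency presence bit
    param_valid: List[bool]    # one valid bit per parameter field
    param_present: List[bool]  # one presence bit per parameter field
    spec_flag: bool = False    # speculative flag bit (set once issued)


def meets_first_preset_condition(e: IBEntry) -> bool:
    if not e.speculative:
        return False
    # The instruction waits on another one whose result has not arrived yet.
    if not (e.dep_valid and not e.dep_present):
        return False
    # Every valid parameter field must already hold its input data.
    return all(present for valid, present in zip(e.param_valid, e.param_present) if valid)


def try_speculative_issue(e: IBEntry) -> bool:
    if meets_first_preset_condition(e):
        e.spec_flag = True  # mark the entry as speculatively issued
        return True
    return False
```

An entry whose dependency presence bit is already yes fails this check. That instruction no longer needs speculation and would simply issue normally, a case this sketch does not model.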
  • The LSU further includes a store buffer (SB), which caches a queue of store instructions. The method further includes: the IB determines, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculative ID is wrong. If it is wrong, the IB reissues the first instruction to the LQ.
  • The method further includes:
    - after the IB reissues the first instruction to the LQ, the LQ assigns a second speculative ID to the first instruction and writes it into the speculative ID field of the first instruction in the LQ;
    - the LQ transmits the second speculative ID to the speculative ID field of the first instruction in the IB.
  • Because each speculative ID indicates one speculative operation, the LQ assigns a new speculative ID to the first instruction when it is re-executed. The instructions related to the new speculative operation can then be traced by the new ID, which improves the efficiency of speculative operations.
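Detection and reissue can be sketched with a toy load queue that hands out one-hot IDs from free slots. Several parts are hypothetical: the class `LoadQueue`, the function `check_store`, and the dictionary-based load record. Recycling the old ID after reissue is an assumption the text does not state. The check uses the timestamps described in this document to require that the store is ideally ordered before the load.

```python
class LoadQueue:
    """Hypothetical LQ model that hands out one-hot speculative IDs."""

    def __init__(self, slots: int = 8):
        self.free_slots = list(range(slots))

    def allocate(self) -> int:
        return 1 << self.free_slots.pop(0)

    def release(self, spec_id: int) -> None:
        self.free_slots.append(spec_id.bit_length() - 1)


def check_store(lq: LoadQueue, load: dict, store_addr: int, store_ts: int) -> bool:
    """Return True if the load was mis-speculated and had to be reissued."""
    ideally_before = store_ts < load["ts"]  # store should precede the load
    if ideally_before and store_addr == load["addr"]:
        old_id = load["spec_id"]
        load["spec_id"] = lq.allocate()  # new ID for the re-executed load
        lq.release(old_id)               # assumption: the old ID is recycled
        return True
    return False


lq = LoadQueue(slots=4)
load = {"addr": 0x40, "ts": 3, "spec_id": lq.allocate()}  # gets ID 0b1
print(check_store(lq, load, store_addr=0x80, store_ts=1), load["spec_id"])  # False 1
print(check_store(lq, load, store_addr=0x40, store_ts=1), load["spec_id"])  # True 2
```

The new ID is allocated before the old one is released, so the re-executed load never reuses the failed speculation's bit.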
  • The method further includes the following steps when the speculative operation corresponding to the first speculative ID is wrong:
    - the IB broadcasts the first speculative ID to the at least one PE, the LQ, or the SB;
    - each receiver compares the first speculative ID with the speculative ID of the instruction it is executing to determine whether the two are associated;
    - if they are associated, the receiver stops executing that instruction and stops transmitting its data or dependencies.
  • The IB broadcasts the first speculative ID of the first instruction to the pipeline of the graph computing device. As a result, only the instructions related to the failed speculative operation are flushed. Instructions issued after the speculative operation but unrelated to it are kept, which reduces the cost of speculation failure and improves the efficiency of speculative operations.
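The selective flush on each receiving unit reduces to a bitwise test per in-flight instruction. This is a minimal sketch; the function `handle_broadcast` and the instruction names are hypothetical. Instructions sharing a bit with the broadcast ID are stopped, and unrelated ones keep running.

```python
def handle_broadcast(in_flight, failed_id):
    """Split in-flight instructions by whether they carry the failed ID's bit."""
    kept, squashed = [], []
    for name, spec_id in in_flight:
        if spec_id & failed_id:
            squashed.append(name)  # stop executing and stop forwarding data/deps
        else:
            kept.append(name)      # unrelated work continues untouched
    return kept, squashed


in_flight = [("ld_a", 0b001), ("ld_b", 0b100), ("add_c", 0b101), ("st_d", 0b000)]
print(handle_broadcast(in_flight, failed_id=0b100))
# (['ld_a', 'st_d'], ['ld_b', 'add_c'])
```

In contrast to the conventional "flush everything younger" approach, "ld_a" and "st_d" survive even if they entered the pipeline after "ld_b".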
  • The IB further includes a timestamp, which indicates the ideal execution order of access instructions. An access instruction is a store instruction or a read instruction.
  • The graph computing device executes the instructions in a data dependency graph. When the graph is compiled, timestamps are assigned to the access instructions to indicate their ideal execution order. The timestamps serve as auxiliary information that supports correct execution of access instructions, which improves memory access efficiency in the dataflow architecture.
  • The SB and the LQ also include the timestamp.
  • A graph computing device is provided. It is based on a dataflow architecture and includes at least one processing engine (PE) and a load/store unit (LSU). The PE includes an information buffer (IB), which caches an instruction queue, and the PE executes the instructions cached in the IB. The IB includes a speculative bit and a speculative ID field, and the speculative bit indicates whether the current instruction can be executed speculatively.
  • The speculative ID field stores the speculative ID of a speculative operation of the current instruction. The LSU includes a load queue (LQ), which caches a queue of read instructions.
  • The graph computing device can determine from the speculative bit whether an instruction can be executed speculatively. It identifies each speculative operation by a speculative ID, so that the speculative ID marks the source of the speculation.
  • The speculative ID in the speculative ID field is a one-hot code.
  • The IB is configured to:
    - issue a first instruction to the LQ. The first instruction requests to read data and satisfies a first preset condition, which includes that the speculative bit of the first instruction in the IB is set to yes;
    - determine a first speculative ID and store it in the speculative ID field of the first instruction in the IB. The first speculative ID indicates the current speculative operation.
  • The LQ includes the speculative ID field and is configured to:
    - after the IB issues the first instruction to the LQ, assign the first speculative ID to the first instruction and write it into the speculative ID field of the first instruction in the LQ;
    - send the first speculative ID to the IB.
  • The IB is specifically configured to receive the first speculative ID from the LQ.
  • The LQ is configured to search the SB or the memory, according to the first instruction, for the data that the first instruction requests to read, and to acquire it.
  • The IB is further configured to issue a second instruction that depends on the first instruction, after the data requested by the first instruction has been obtained. After issuing the second instruction, the IB sets the speculative flag bit of the second instruction to yes.
  • The IB further includes a dependency valid bit and a dependency presence bit. The dependency valid bit indicates whether the current instruction must wait for another instruction to finish executing before it can execute. The dependency presence bit indicates whether the instruction on which the current instruction depends has finished executing.
  • The first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes, and the dependency presence bit is set to no.
  • The LSU further includes a store buffer (SB), which caches a queue of store instructions.
  • The IB is further configured to issue a third instruction, which is a store instruction, to the SB after issuing the first instruction. The first instruction and the third instruction satisfy a second preset condition, which includes:
    - the ideal execution order of the first instruction is after the third instruction;
    - the first instruction may have a memory dependency on the third instruction.
    A memory dependency is an order dependency between access instructions that arises because they operate on the same address. After the third instruction is issued, its store address is sent to the LQ.
  • The IB is further configured to determine, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculative ID is wrong. If it is wrong, the IB reissues the first instruction to the LQ.
  • The LQ is further configured to:
    - after the IB reissues the first instruction to the LQ, assign a second speculative ID to the first instruction and write it into the speculative ID field of the first instruction in the LQ;
    - transmit the second speculative ID to the speculative ID field of the first instruction in the IB.
  • The IB is further configured to broadcast the first speculative ID to the at least one PE, the LQ, or the SB if the corresponding speculative operation is wrong. The at least one PE, the LQ, or the SB is configured to:
    - compare the first speculative ID with the speculative ID of the instruction it is executing to determine whether the two are associated;
    - if they are associated, stop executing that instruction and stop transmitting its data or dependencies.
  • The IB further includes a timestamp, which indicates the ideal execution order of access instructions. An access instruction is a store instruction or a read instruction.
  • The SB and the LQ also include the timestamp.
  • A method for processing instructions is provided, including:
    - acquiring program code;
    - determining multiple instructions in the program code and the dependency relationships among them;
    - determining a data dependency graph according to the multiple instructions and the dependency relationships.
    For instructions whose dependency relationship cannot be identified, a speculative dependency is established between a first instruction and a third instruction that satisfy a second preset condition. The second preset condition includes:
    - the first instruction is a read instruction and the third instruction is a store instruction;
    - the ideal execution order of the first instruction is after the third instruction;
    - the first instruction may have a memory dependency on the third instruction, that is, an order dependency that arises because both access the same address.
  • A speculative dependency means that the first instruction may depend on the third instruction and can be executed speculatively.
  • Determining the multiple instructions and their dependency relationships further includes:
    - establishing the dependency relationship between instructions whose dependency can be identified;
    - not establishing one between instructions identified as having no dependency.
  • It further includes assigning timestamps to the access instructions (store or read instructions) to indicate their ideal execution order.
  • Where multiple branches converge, the timestamp continues counting from the last timestamp of a first branch. The first branch is the branch with the largest number of access instructions among the multiple branches.
  • The dependency relationships include at least one of the following: data dependencies, memory dependencies, and control dependencies.
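The three-way classification above can be sketched as a compiler pass over store/read pairs in program order. The following assumptions apply: the function `build_memory_edges`, the tuple format, and the use of `None` for a statically unknown address are hypothetical. The pass also only considers a store followed by a later read, matching the second preset condition, and ignores other orderings a real compiler would handle.

```python
def build_memory_edges(instrs):
    """instrs: (name, kind, addr) in program order; addr is None when the
    compiler cannot resolve it statically. Returns (store, load, edge_kind)."""
    edges = []
    for i, (store, kind, store_addr) in enumerate(instrs):
        if kind != "store":
            continue
        for load, later_kind, load_addr in instrs[i + 1:]:
            if later_kind != "load":
                continue
            if store_addr is None or load_addr is None:
                # Dependency cannot be identified: speculative dependency.
                edges.append((store, load, "speculative"))
            elif store_addr == load_addr:
                # Provably the same address: ordinary memory dependency.
                edges.append((store, load, "memory"))
            # Provably different addresses: no edge is established.
    return edges


program = [
    ("st3", "store", None),  # like store R1, 2(R2): address unknown at compile time
    ("ld4", "load", None),
    ("st5", "store", 16),
    ("ld6", "load", 16),
    ("ld7", "load", 32),
]
for edge in build_memory_edges(program):
    print(edge)
```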
  • A method for processing instructions is provided, including:
    - acquiring program code;
    - determining multiple instructions in the program code and the dependency relationships among them;
    - determining a data dependency graph according to the multiple instructions and the dependency relationships.
    Determining the instructions and dependencies includes assigning timestamps to the access instructions. The timestamps indicate the ideal execution order among the access instructions, where an access instruction is a store instruction or a read instruction.
  • The timestamps serve as auxiliary information for instruction execution and support the correct execution of access instructions.
  • Where multiple branches converge, counting continues from the last timestamp of a first branch. The first branch is the branch with the largest number of access instructions among the multiple branches.
  • When multiple branches execute in parallel and then converge, timestamps determine the execution order of each branch's access instructions, which improves memory access efficiency in the dataflow architecture.
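The convergence rule can be sketched as a counter that restarts at the same base for each parallel branch. After the branches, it resumes after the longest branch's last stamp. Several assumptions apply: the function `assign_timestamps` and its prefix/branches/suffix shape are hypothetical. The text does not say that every branch starts from the same base, or that the next stamp is "last + 1"; both are interpretations.

```python
def assign_timestamps(prefix, branches, suffix, start=0):
    """prefix/suffix: access instructions before/after the branches;
    branches: parallel branches, each a list of access instructions."""
    ts, t = {}, start
    for name in prefix:
        ts[name], t = t, t + 1
    branch_start, resume = t, t
    for branch in branches:
        t = branch_start              # assumption: every branch starts here
        for name in branch:
            ts[name], t = t, t + 1
        resume = max(resume, t)       # the longest branch decides where to resume
    t = resume                        # continue after that branch's last stamp
    for name in suffix:
        ts[name], t = t, t + 1
    return ts


stamps = assign_timestamps(["ld0"], [["st1", "ld2", "st3"], ["ld4"]], ["ld5"])
print(stamps)
# {'ld0': 0, 'st1': 1, 'ld2': 2, 'st3': 3, 'ld4': 1, 'ld5': 4}
```

Resuming from the branch with the most access instructions ensures that the post-convergence stamps are larger than every stamp inside any branch.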
  • An apparatus for processing instructions is provided. It includes functional units configured to perform the method in the third aspect or any possible implementation of the third aspect, or the method in the fourth aspect or any possible implementation of the fourth aspect.
  • A computer storage medium is provided, storing instructions that, when executed on a graph computing device, cause it to perform the method in the first aspect or any possible implementation of the first aspect.
  • A computer storage medium is provided, storing instructions that, when executed on a computer, cause it to perform the method in the third aspect or any possible implementation of the third aspect, or the method in the fourth aspect or any possible implementation of the fourth aspect.
  • FIG. 1 is a schematic structural diagram of a graph computing device 100 according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for processing an instruction according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data dependency graph according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a data dependency graph for identifying timestamps according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an IB 111 according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an LQ 141 according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a speculative ID according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an SB 142 according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a method for processing an instruction according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an instruction execution flow of the graph computing device 100 according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a data dependency graph according to an embodiment of the present application.
  • FIG. 12 to FIG. 19 are schematic diagrams of states of the graph computing device 100 in different stages of executing the data dependency graph of FIG. 11 , respectively.
  • FIG. 1 is a schematic structural diagram of a graph computing device 100 according to an embodiment of the present application. As shown in FIG. 1, the graph computing device 100 includes:
  • An execution unit (process engine, PE) 110: the graph computing device 100 may include one or more PEs 110.
  • In FIG. 1, eight PEs 110 are used as an example for description. It should be understood that the number of PEs 110 may also be smaller or larger.
  • the graph computing device 100 can be used as a part of a central processing unit (CPU) to accelerate instruction execution and reduce CPU power consumption by taking advantage of the parallelism between instructions and the low power consumption of the graph architecture.
  • the GBU 120 is mainly used to initialize graph instructions and send them to each PE 110.
  • the PE 110 is mainly used to execute the graph instructions placed in it and to send data or requests to other units in the graph computing device 100, for example, other PEs 110 or the LSU 140.
  • An information buffer (IB) 111 is included in the PE 110.
  • the IB 111 is used to buffer dataflow instructions and to select instructions that are ready, sending them to the PE 110 or the LSU 140 for execution.
  • the PE110 further includes a control unit (not shown in FIG. 1), which can be used to control the IB111 to perform corresponding functions, such as controlling the IB111 to issue instructions or receive data and information.
  • the method executed by the IB111 is controlled and executed by the control unit in the PE110.
  • for brevity, the functions that the control unit of the PE110 implements by controlling the IB111 are described as being performed by the IB111.
  • the data bus and dependency bus 130 are mainly used to transfer data and dependency information between the PEs 110.
  • the dependency information may refer to information for indicating a dependency relationship between instructions.
  • the LSU 140 is mainly used to receive and execute access instructions from each PE 110. Access instructions include store instructions and read instructions.
  • the LSU 140 also includes a load queue (LQ) 141 and a store buffer (SB) 142.
  • the LQ141 is used to cache the queue of instructions that request to read data from memory.
  • the SB142 is used to cache the queue of instructions that request to store data to memory. In some examples, the SB142 can also forward data directly to the LQ141, avoiding the power consumption and latency of accessing memory.
  • IB111 may also be referred to as an instruction information cache
  • LQ141 may also be referred to as a read request cache
  • SB142 may also be referred to as a store request cache.
  • the LSU 140 includes a control unit (not shown in FIG. 1), which can be used to control the LQ141 and SB142 to perform corresponding functions, for example, to control the LQ141 and SB142 to transmit or receive data and information.
  • the methods executed by the LQ141 and the SB142 are controlled and executed by the control unit in the LSU 140 .
  • for brevity, the functions that the control unit in the LSU 140 implements by controlling the LQ141 or the SB142 are described as being performed by the LQ141 or the SB142.
  • FIG. 1 also shows a CPU front-end 200, which mainly includes the instruction fetch and decode units of the CPU front-end; it reads and parses instruction content from memory and sends graph instructions to the graph computing device 100.
  • CPU front-end 200 does not belong to the hardware part of the graph computing device 100 .
  • the graph computing device 100 is a hardware system based on a data flow architecture.
  • by explicitly describing the dependencies between instructions at the instruction-set level, the dataflow architecture exposes the parallelism between instructions directly to the hardware for execution.
  • the dataflow architecture can be abstracted into a directed graph of N nodes, that is, a data dependency graph. A connection between two nodes represents a data flow. Once all inputs of a node are ready, the node can perform its operation and pass the result to the next node. Nodes that are not on the same path of the graph can run concurrently.
  • the graph computing apparatus 100 in FIG. 1 is only an example. In practice, units of the graph computing apparatus 100 may be merged or replaced, and the apparatus may include more or fewer units. This embodiment of the present application does not limit this.
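  • The firing rule of a data dependency graph can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the scheduler of the graph computing device 100: the helper `dataflow_waves` and the node names are hypothetical, and a "wave" simply groups nodes whose last input arrived at the same step, so nodes in one wave have no path between them and could run concurrently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def dataflow_waves(edges: Iterable[Tuple[str, str]], nodes: List[str]) -> List[List[str]]:
    """Group nodes into waves under the dataflow firing rule (hypothetical helper).

    A node fires once every producer feeding it has fired. All nodes must be
    listed in `nodes`; nodes on a cycle never fire and are simply left out.
    """
    preds: Dict[str, Set[str]] = defaultdict(set)
    succs: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
        succs[src].add(dst)

    missing = {n: len(preds[n]) for n in nodes}   # inputs not yet arrived
    ready = [n for n in nodes if missing[n] == 0]
    waves: List[List[str]] = []
    while ready:
        waves.append(sorted(ready))
        next_ready: List[str] = []
        for n in ready:                 # all nodes of this wave fire together
            for m in succs[n]:          # each result is passed to its consumers
                missing[m] -= 1
                if missing[m] == 0:     # last input arrived: m may fire next
                    next_ready.append(m)
        ready = next_ready
    return waves


# a -> c, b -> c, c -> d; e shares no path with the others and runs alongside a and b
print(dataflow_waves([("a", "c"), ("b", "c"), ("c", "d")], ["a", "b", "c", "d", "e"]))
# prints [['a', 'b', 'e'], ['c'], ['d']]
```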
  • Assembly code for dataflow architectures also differs from assembly code for mainstream von Neumann architectures.
  • Each assembly instruction does not need to specify its inputs; it only needs to specify the destinations of its output result.
  • code expressed in this way makes it easy for the hardware to find the dependency chains between instructions.
  • the present application makes use of the above advantages to optimize access instructions.
  • a method for speculatively executing access instructions in a dataflow architecture is proposed, which improves the efficiency of access instructions in the dataflow architecture. Based on how data dependency graphs are executed in a dataflow architecture, speculation on access instructions is used to speed up graph instructions, and the cost of re-execution after a failed speculation is reduced: after a speculation fails, only the instructions that depend on the mis-speculated result are re-executed. This improves the memory access efficiency of graph computing devices based on the dataflow architecture.
  • the data dependency graph may refer to a data flow graph composed of nodes and directed arcs connecting the nodes. Nodes represent operations or functions performed, and directed arcs represent the order in which nodes are performed. Nodes in different paths in the data dependency graph can execute in parallel.
  • a speculative operation refers to executing an instruction early when it may depend on another instruction that has not yet been executed. If, after the other instruction executes, it is confirmed that there is no dependency between the two, the speculation succeeds; if a dependency is confirmed, the speculation fails.
  • before the graph computing apparatus 100 processes instructions, the compiler first analyzes the dependencies between instructions in the original program and, based on these dependencies, compiles the program into an instruction set based on a data dependency graph. The compiled instruction set is then sent to the IB 111 in the graph computing device 100 so that the graph computing device 100 can execute the instructions.
  • the method of processing the instruction executed by the compiler side is first described below.
  • FIG. 2 is a schematic flowchart of a method for processing an instruction according to an embodiment of the present application. This method can be executed by the compiler. As shown in Figure 2, the method includes:
  • the above program code may refer to a source file written in a language supported by the development tool.
  • the dependencies include at least one of the following: data dependencies, memory dependencies, and control dependencies.
  • a data dependency means that one instruction needs to obtain data produced by another instruction before it can be executed.
  • a memory dependency can refer to an order dependency between access instructions due to operations on the same address.
  • Control dependencies may refer to conditional dependencies due to control flow. For example, if the preceding instruction is a conditional statement, a control dependency relationship is generated between the succeeding instruction and the preceding instruction.
  • conditional statements include if-else statements.
  • an order (sequential) dependency means that one instruction must be executed after another instruction has completed. It should be noted that memory dependencies, data dependencies, and control dependencies can all give rise to order dependencies; in other words, order dependencies include data dependencies, memory dependencies, and control dependencies.
  • after the compiler obtains the original instructions, it can analyze their dependencies and determine those it is able to identify, so as to establish a data dependency graph.
  • in a data dependency graph, instructions can be abstracted as instruction nodes.
  • the compiler may establish a speculative dependency between a first instruction and a third instruction that satisfy a second preset condition.
  • the first instruction is a read instruction
  • the third instruction is a store instruction.
  • the speculative dependency relationship means that there is a possibility that the first instruction depends on the third instruction, and the first instruction can be speculatively executed.
  • that is, the compiler can establish a dependency between two instructions while indicating that the later instruction can be executed speculatively.
  • the second preset condition includes: the first instruction is a read instruction, the third instruction is a store instruction, and the ideal execution order of the first instruction is after the third instruction.
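  • The compiler-side decision can be sketched as a small classifier over pairs of access instructions. This is a hypothetical sketch, not the compiler described here: addresses are modeled as a symbolic base plus a constant offset, and an offset of `None` stands for anything the compiler cannot resolve (such as `[x+i]`). Following the case analysis in this section, a known same-address store then read gets an ordinary order edge, an unresolvable store then read gets a speculative edge (the second preset condition), and every other pair gets no edge and is left to the hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOp:
    idx: int                 # position in the ideal (program) order
    kind: str                # "ST" (store) or "LD" (read)
    base: str                # symbolic base, e.g. "x" (hypothetical modeling)
    offset: Optional[int]    # constant offset, or None if unknown at compile time


def same_address(a: MemOp, b: MemOp) -> Optional[bool]:
    """True/False if provably equal/different; None if the compiler cannot tell."""
    if a.base != b.base or a.offset is None or b.offset is None:
        return None          # conservative: the two may alias
    return a.offset == b.offset


def classify(earlier: MemOp, later: MemOp) -> str:
    """Edge the compiler would add from `earlier` to `later` in this sketch."""
    assert earlier.idx < later.idx
    if earlier.kind == "ST" and later.kind == "LD":
        alias = same_address(earlier, later)
        if alias is True:
            return "order"        # known memory dependency: read waits for store
        if alias is None:
            return "speculative"  # read after store, aliasing unknown
    return "none"                 # ST->ST, LD->LD, LD->ST: handled by hardware


i1 = MemOp(1, "ST", "x", 0)       # store r1 -> [x]
i2 = MemOp(2, "LD", "x", None)    # read [x+i] -> r2, i unknown at compile time
i3 = MemOp(3, "LD", "x", 0)       # read [x] -> r9
print(classify(i1, i3), classify(i1, i2), classify(i2, i3))  # prints order speculative none
```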
  • the memory dependency refers to an order dependency between access instructions caused by operating the same address.
  • the third case mainly concerns access instructions, that is, store instructions and read instructions.
  • the compiler is mainly used to analyze the memory dependencies between access instructions, that is, the analysis related to memory aliasing.
  • in this case, the compiler cannot determine whether there is a memory dependency between access instructions; this becomes known only at execution time.
  • a speculative dependency can therefore be established for a store instruction followed by a read instruction.
  • the graph computing device 100 may perform a speculative operation on an instruction with a speculative dependency.
  • alternatively, the compiler may establish no dependency and leave speculative execution directly to the graph computing device 100.
  • memory dependencies between access instructions fall into four cases: a store instruction followed by a read instruction, a store instruction followed by a store instruction, a read instruction followed by a read instruction, and a read instruction followed by a store instruction. The following explains how each case is handled without establishing a dependency.
  • no dependency is established for a read instruction followed by a store instruction because, during subsequent execution, the graph computing device 100 can query SB142 for the order and address information of previously issued access instructions and read the data from the correct place. While the graph computing device 100 is executing the program, the data of store instructions actually resides in SB142, and the corresponding data in memory is not updated until program execution is complete.
  • for example, the program has a read instruction followed by a store instruction to the same address, but the store instruction, which should be issued after the read instruction, is actually issued first.
  • the graph computing device 100 then finds a store instruction with the same address in SB142 whose order is later than that of the read instruction, and therefore directs the read instruction to read the data from memory instead of taking the value from SB142.
  • no dependency is established for a store instruction followed by a store instruction because keeping store instructions issued in order does not affect instruction execution efficiency. For simplicity, the graph computing device 100 may therefore enforce strong ordering of store instructions (total store ordering), also called absolute ordering. That is, the graph computing apparatus 100 ensures that store instructions are executed strictly in the ideal order, so no speculative dependency needs to be established between store instructions.
  • no dependency is established for a read instruction followed by a read instruction because, if there is no store instruction between the two reads, both read the same value when they access the same address, so there is no order dependency. If there is a store instruction between the two reads, the case can be handled as a store instruction followed by a read instruction.
  • the compiler does not need to establish a dependency for a store instruction followed by a read instruction; instead, the graph computing device 100 identifies the access order. The principle is similar to the hardware solution for a read instruction followed by a store instruction: each time the graph computing device 100 receives a store instruction, it reads the LQ 141 to query the order and address information of previously issued read instructions. If there is a read instruction with the same address that should have been issued after the store instruction, the graph computing device 100 triggers a speculation error for that read instruction and re-executes it. If not, the out-of-order issue of the store instruction causes no problem.
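  • The store-side check against the LQ 141 can be sketched as follows. This is a hypothetical model: `LQEntry` and `check_store` are illustrative names, and timestamps stand in for the "should be issued after" relation. Any valid read to the same address whose timestamp is larger than the arriving store's has already fetched data too early, so it is reported as a mis-speculation; its speculative ID would then be used to find and re-execute the instructions that consumed its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LQEntry:
    addr: int
    stamp: int          # ideal-order timestamp assigned by the compiler
    spc_id: int = 0     # one-hot speculative ID, 0 if not speculative
    vld: bool = True


def check_store(lq: List[LQEntry], st_addr: int, st_stamp: int) -> List[LQEntry]:
    """Return reads that already issued but are ordered after this store.

    An empty list means the store's out-of-order issue caused no problem.
    """
    return [e for e in lq if e.vld and e.addr == st_addr and e.stamp > st_stamp]


lq = [LQEntry(0x100, 3), LQEntry(0x100, 7, spc_id=0b01), LQEntry(0x200, 9)]
bad = check_store(lq, st_addr=0x100, st_stamp=5)
print([hex(e.addr) for e in bad], [e.stamp for e in bad])  # prints ['0x100'] [7]
```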
  • FIG. 3 is a schematic diagram of a data dependency graph according to an embodiment of the present application.
  • instruction 1 means to store the data in register 1 into memory [x].
  • Instruction 2 means to read data from memory [x+i] and store it in register 2.
  • Instruction 3 means to read data from memory [x] and store it in register 9.
  • Instruction 4 means to store the data in register 9 into memory [x+n].
  • Instruction 5 means to add the value of register 3 and register 2 and store the result in register 4.
  • Instruction 6 means to add the values of register 6 and register 5 and store the result in register 7.
  • the compiler can obtain the following results by analyzing the dependencies of the above instructions.
  • for instruction 1 and instruction 3, the compiler can recognize a memory dependency, and hence an order dependency, between the two, because both access address x: instruction 1 stores to it and instruction 3 reads from it. Therefore, instruction 3 must wait until instruction 1 has been written to memory before it executes, to ensure that the correct data is read.
  • accordingly, an edge representing the order dependency (edge 1) is added between the two nodes.
  • for instruction 2 and instruction 5, there is a data dependency that the compiler can recognize. Therefore, an edge representing the data dependency (edge 3) is added between the two.
  • the store instructions are chained into an instruction dependency chain in execution order (edge 4 in FIG. 3).
  • alternatively, strong ordering of store instructions can be enforced by hardware, without requiring the compiler to establish an instruction dependency chain.
  • in this way, the characteristics of the dataflow architecture can be used to execute the data dependency graph, and a speculative dependency can be established for a store instruction followed by a read instruction, so that the speculative dependency is identified at the compilation stage and the dataflow architecture can perform speculative operations during subsequent program execution, thereby improving the memory access efficiency of the dataflow architecture.
  • determining the multiple instructions in the program code and the dependencies between them further includes: assigning timestamps to the access instructions among the multiple instructions, where the timestamps indicate the ideal execution order among the access instructions, and the access instructions include store instructions and read instructions.
  • the timestamps described above apply only to access instructions among the plurality of instructions, and not to other types of instructions.
  • for example, instruction 5 and instruction 6 in FIG. 3 are not access instructions, so no timestamps need to be assigned to them.
  • where multiple branches converge, counting continues from the last timestamp of a first branch, the first branch being the branch that contains the most access instructions.
  • a timestamp is allocated to the access instructions in the instructions to indicate the ideal execution sequence between the access instructions.
  • the timestamps can serve as auxiliary information during execution to ensure that store and read instructions execute correctly. They also determine the execution order across branches when multiple parallel branches converge, thereby improving the memory access efficiency of the dataflow architecture.
  • FIG. 4 is a schematic diagram of a data dependency graph for identifying timestamps according to an embodiment of the present application.
  • ST represents a store instruction
  • LD represents a read instruction.
  • the compiler assigns timestamps (stamps) in advance to the access instructions in the program according to the ideal execution order. For example, timestamps 1 through 8 identify the ideal execution order of the access instructions in FIG. 4.
  • when the program splits into multiple branches, each branch starts counting from the same timestamp. Where the branches converge, counting continues from the last timestamp of the branch with the most access instructions. For example, two branches appear after timestamp 1, so each of them starts counting from timestamp 2. Where the two branches converge, the right branch contains more access instructions, so counting continues from its last timestamp, 5.
  • three storage structures may be provided to support the execution of access instructions.
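  • The branch-aware counting rule can be sketched as a recursive pass over a program tree. This is a hypothetical sketch: `Access`, `Fork`, and `assign_timestamps` are illustrative names, and the example layout is invented only to reproduce the numbers in the text (timestamp 1, both branches starting at 2, the right branch ending at 5, timestamps up to 8); it is not the actual graph of FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Access:
    op: str          # "ST" or "LD"
    stamp: int = 0   # filled in by assign_timestamps


@dataclass
class Fork:
    branches: List[List["Node"]]   # parallel branches that later converge


Node = Union[Access, Fork]


def assign_timestamps(seq: List[Node], last: int = 0) -> int:
    """Stamp access instructions in ideal order; return the last stamp used."""
    for node in seq:
        if isinstance(node, Access):
            last += 1
            node.stamp = last
        else:
            # every branch restarts from the same stamp ...
            ends = [assign_timestamps(branch, last) for branch in node.branches]
            # ... and after convergence counting continues from the branch that
            # used the most stamps (for flat branches: the most access instructions)
            last = max(ends, default=last)
    return last


prog = [
    Access("ST"),
    Fork([[Access("LD"), Access("ST")],                                # left branch
          [Access("ST"), Access("LD"), Access("ST"), Access("LD")]]),  # right branch
    Access("LD"), Access("ST"), Access("ST"),
]
print(assign_timestamps(prog))  # prints 8
```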
  • the three storage devices include IB 111, LQ141 and SB142.
  • IB111 is used to cache data flow instructions, and select qualified instructions for execution.
  • LQ141 is used to cache the read instruction queue.
  • SB142 is used to cache the storage instruction queue.
  • the SB142 can also be used to pass data directly into the LQ141 to avoid the power consumption and latency of accessing memory.
  • the structures and functions of IB111, LQ141 and SB142 are described in detail below with reference to the accompanying drawings.
  • FIG. 5 is a schematic structural diagram of an IB111 according to an embodiment of the present application.
  • IB111 may include multiple fields, defined in Table 1 below. It should be noted that the fields in FIG. 5 are only examples, and the IB111 may include more or fewer fields.
  • the instruction field (Inst) and the parameter fields (op0/1) carry the information needed to run the dataflow architecture and instruct the graph computing device 100 to perform the operations specified by the instructions.
  • the instruction field (Inst) can be used to indicate the operation type of the instruction: for example, a store instruction (denoted as ST), a read instruction (denoted as LD), and so on.
  • the parameter fields (op0/1) store the input data of the current instruction.
  • the above input data may include addresses or parameters and the like.
  • in IB111, each instruction may have one or more parameter fields (op0/1), one or more parameter field valid bits (vld0/1), and one or more parameter field presence bits (rdy0/1).
  • the parameter fields (op0/1) correspond one-to-one with the parameter field valid bits (vld0/1) and the parameter field presence bits (rdy0/1). In this embodiment of the present application, two parameter fields (op0/1), two parameter field valid bits (vld0/1), and two parameter field presence bits (rdy0/1) are used as an example.
  • for example, for an instruction that needs only one input, the valid bit (vld0) of the first parameter field (op0) is 1 and the valid bit (vld1) of the second parameter field (op1) is 0; for an instruction that needs two inputs, both parameter field valid bits (vld0/1) are set to 1. It should be noted that, in this embodiment of the present application, 1 means valid and 0 means invalid. This is only an example; optionally, 0 can represent valid and 1 invalid.
  • issuing an instruction may mean that the IB 111 starts to execute the instruction, and sends the relevant information of the instruction to the corresponding unit in the graph computing device 100, so as to facilitate the execution of the instruction.
  • the parameter field presence bits (rdy0/1) indicate whether the inputs required by the instruction are already present in op0/1.
  • when the parameter field valid bits (vld0/1) and the parameter field presence bits (rdy0/1) of an instruction are both set, the inputs required by the instruction are ready and the instruction can be issued and executed.
  • for example, if the valid bit (vld1) of the second parameter field is 1 but its presence bit (rdy1) is 0, a parameter field of the instruction is not yet ready and the instruction cannot be executed.
  • however, when the speculative flag bit (sgo) of an instruction is 1, the graph computing apparatus 100 does not need to require the parameter field valid bits (vld0/1) and presence bits (rdy0/1) to be set at the same time. That is, for any instruction whose speculative flag bit (sgo) is 1, if one of its parameter fields (op0/1) is updated and the corresponding parameter field valid bit (vld0/1) is set to 1, the instruction can be issued and executed regardless of whether its other parameter field is valid. This situation means that a previous read instruction mis-speculated and the subsequent instructions must be re-triggered. The updated parameter field is therefore the one affected by the mis-speculation, and this update brings in the correct data, while the other parameter field still holds the data stored during the earlier speculative execution, so there is no need to check whether it is valid.
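  • The two issue rules above can be sketched as a toy IB entry. This is a hypothetical software model, not the IB111 hardware: `IBEntry`, `operands_ready`, and `write_operand` are illustrative names, vld marks a parameter field the instruction uses, and rdy marks that its input has arrived. Normally every used field must be present; with sgo set, any operand update re-triggers issue immediately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IBEntry:
    inst: str
    op: List[Optional[int]] = field(default_factory=lambda: [None, None])  # op0/op1
    vld: List[int] = field(default_factory=lambda: [0, 0])  # vld0/1: field is used
    rdy: List[int] = field(default_factory=lambda: [0, 0])  # rdy0/1: input present
    sgo: int = 0     # 1 once the instruction has been issued speculatively

    def operands_ready(self) -> bool:
        # every parameter field the instruction uses must already hold its input
        return all(r for v, r in zip(self.vld, self.rdy) if v)

    def write_operand(self, i: int, value: int) -> bool:
        """Deliver an input to op[i]; return True if the entry may issue now."""
        self.op[i] = value
        self.rdy[i] = 1
        if self.sgo:
            # re-trigger after a mis-speculation: the updated field carries the
            # corrected value, and the other field keeps its earlier data
            return True
        return self.operands_ready()


add = IBEntry("ADD", vld=[1, 1])
print(add.write_operand(0, 3), add.write_operand(1, 4))  # prints False True

redo = IBEntry("ADD", op=[3, 4], vld=[1, 1], rdy=[1, 1], sgo=1)
print(redo.write_operand(1, 9))                           # prints True
```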
  • the dependency valid bit (prd) is used to indicate whether the current instruction depends on the execution of another instruction before execution.
  • the dependency valid bit (prd) can be used to represent dependencies added by the compiler. For example, when the compiler determines that there is a dependency between two instructions, the dependency valid bit (prd) may be set to yes.
  • the dependency valid bit (prd) is used only to indicate dependencies other than data dependencies. That is, when the current instruction depends on data passed by another instruction, the dependency valid bit (prd) need not be used.
  • the dependency valid bit (prd) may be used to indicate a situation where the current instruction has a memory dependency or a control dependency on other instructions, but is not used to indicate a situation where the current instruction has a data dependency on other instructions.
  • the dependency valid bit may be set to yes if the current instruction has a speculative dependency on another conditional instruction.
  • the dependency presence bit (prdy) corresponds to the dependency valid bit (prd), and the dependency presence bit (prdy) is used to indicate whether the execution of the instruction on which the current instruction depends is completed.
  • because the dependency valid bit (prd) is not used to indicate data dependencies, the dependency presence bit (prdy) likewise does not indicate whether an instruction on which the current instruction has a data dependency has finished executing; instead, it indicates whether an instruction on which the current instruction has a memory dependency or control dependency has finished executing.
  • the speculative bit (spc) is used to indicate that the current access instruction can be speculatively executed.
  • normally, an instruction whose dependency valid bit (prd) is 1 must wait until the instruction it depends on has finished executing. For an access instruction whose speculative bit (spc) is set to 1, however, there is no need to wait for the access instruction it depends on to complete; in other words, it can be executed without waiting for the dependency presence bit (prdy) corresponding to the dependency valid bit (prd) to become 1. This is referred to as speculative execution, or a speculative operation, of the access instruction.
  • the speculative flag bit (sgo) may be set to 1 when an access instruction is speculatively issued.
  • for an instruction for which the compiler has established a speculative dependency, the dependency valid bit (prd) in IB111 can be set to yes and the speculative bit (spc) can be set to yes.
  • alternatively, the speculative bit (spc) in IB111 can be set to yes and the dependency valid bit (prd) to no, which corresponds to the second processing method of the compiler in the third case of FIG. 2.
  • the instruction can be executed speculatively as long as the speculative bit (spc) is set to yes, regardless of how the dependency valid bit (prd) is set.
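  • The interplay of prd, prdy, and spc can be summarized as a small decision function. This is a sketch of one reading of the rules above, with the hypothetical name `issue_decision`: data inputs must be ready, a set prd without prdy blocks issue unless spc is set, and an issue counts as speculative (the case in which the speculative flag bit sgo would be set) whenever spc lets the instruction run without a completed dependency, including the prd = 0 case handled entirely by hardware.

```python
from typing import Tuple


def issue_decision(operands_ready: bool, prd: bool, prdy: bool, spc: bool) -> Tuple[bool, bool]:
    """Return (may_issue, issued_speculatively) for one IB entry (sketch).

    prd/prdy track non-data (memory or control) dependencies; spc marks an
    access instruction that is allowed to run ahead of such a dependency.
    """
    if not operands_ready:
        return (False, False)            # data inputs still missing
    if prd and not prdy and not spc:
        return (False, False)            # must wait for the instruction it depends on
    # spc issues regardless of prd; it is speculative unless the dependency
    # has in fact already completed (prd and prdy both set)
    speculative = spc and not (prd and prdy)
    return (True, speculative)


for prd, prdy, spc in [(1, 0, 0), (1, 0, 1), (1, 1, 1), (0, 0, 1)]:
    print(prd, prdy, spc, issue_decision(True, bool(prd), bool(prdy), bool(spc)))
```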
  • the timestamp indicates the ideal execution order among access instructions and is stored in the LQ or SB when the access instruction is executed. It should be noted that, because of speculative execution, access instructions are not necessarily executed in timestamp order when the program actually runs; without speculative execution, access instructions would be executed in the order indicated by the timestamps.
  • the LQ141 assigns a speculative identifier (ID) (spcID) to speculatively read data.
  • the data obtained through the speculative operation can be called speculative data.
  • the speculative ID (spcID) field of any instruction that uses the speculative data, or data derived from it, carries the speculative ID, that is, information about the speculation source, indicating that the data was obtained by speculation. If the speculation turns out to be wrong, the subsequent instructions can be cleared and corrected.
  • FIG. 6 is a schematic structural diagram of an LQ141 according to an embodiment of the present application.
  • the LQ 141 includes multiple fields.
  • these fields are defined in Table 2 below. It should be noted that the fields in FIG. 6 are only examples, and the LQ 141 may include more or fewer fields.
  • Table 2: fields of the LQ 141 (field, short name: description)
    address, addr: address of the read instruction, used to locate the data in memory
    destination, dest: after the read data is returned, it is sent to the parameter field of the IB indicated by the destination
    data, data: temporarily stores the data returned for the read instruction
    timestamp, stamp: indicates the ideal execution order of access instructions
    speculative ID, spcID: flag indicating that the read instruction was executed speculatively
    presence bit, rdy: indicates that data has been returned from memory or the SB
    valid bit, vld: indicates that the read instruction is valid
  • after a read instruction is sent from IB111 to LQ141, LQ141 records the address, destination, and other information of the read instruction and sets its valid bit to 1 to indicate that the read instruction is valid.
  • the LQ 141 assigns a speculative ID to each speculative read operation. A speculative ID of 0 indicates that the data was not produced by speculation.
  • the speculative ID generated by the LQ141 is a one-hot code, which refers to a code value that uses one bit to represent a state.
  • FIG. 7 is a schematic diagram of a speculative ID according to an embodiment of the present application.
  • because each speculative ID is a one-hot code, two speculative IDs can be combined into a new speculative ID simply by ORing them.
  • for example, if one speculative ID is 001 and the other is 010, ORing them yields 011, which still retains the IDs of both speculative operations. If one of these speculations is later found to have failed, the graph computing device 100 only needs to AND the failed ID with the ID of the data currently being executed; if the result is not all 0s, the data derives from mis-speculated data.
  • the LQ141 transmits the speculative ID together with the data to the destination of the read instruction. Therefore, if an instruction's input comes from a speculative operation, the source of that speculation can be found quickly.
  • the function of carrying the speculative ID is that when the graph computing device 100 finds an error in a speculative operation, it can efficiently stop and clear all instructions using the data from the speculative operation.
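  • The one-hot speculative ID scheme can be sketched in a few lines. The allocator, `merge`, `depends_on`, and `squash` are hypothetical names; the sketch only shows the bit arithmetic described above: each speculative read gets its own bit, derived data carries the OR of its sources' IDs, and a failed speculation is traced by ANDing its ID with each in-flight ID.

```python
from typing import List, Tuple


class SpecIdAllocator:
    """Hands out one-hot speculative IDs from a fixed pool of bits (hypothetical)."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.in_use = 0                        # bitmask of IDs currently allocated

    def alloc(self) -> int:
        for bit in range(self.width):
            spc_id = 1 << bit
            if not self.in_use & spc_id:
                self.in_use |= spc_id
                return spc_id
        raise RuntimeError("no free speculative ID")  # pool exhausted in this sketch

    def release(self, spc_id: int) -> None:
        self.in_use &= ~spc_id


def merge(*ids: int) -> int:
    """Data derived from several inputs carries the OR of their speculative IDs."""
    result = 0
    for spc_id in ids:
        result |= spc_id
    return result


def depends_on(data_id: int, failed_id: int) -> bool:
    """A non-zero AND means the data derives from the failed speculation."""
    return (data_id & failed_id) != 0


def squash(in_flight: List[Tuple[str, int]], failed_id: int) -> List[str]:
    """Names of in-flight instructions that must be cleared and re-executed."""
    return [name for name, spc_id in in_flight if depends_on(spc_id, failed_id)]


alloc = SpecIdAllocator(width=3)
a, b = alloc.alloc(), alloc.alloc()     # 0b001 and 0b010
derived = merge(a, b)                   # 0b011 keeps both sources
print(bin(derived), depends_on(derived, b), depends_on(derived, 0b100))  # prints 0b11 True False
```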
  • the graph computing device 100 may use the timestamp and address of a read instruction to query SB 142. If SB142 holds a store instruction with the same address whose timestamp is smaller than that of the current read instruction, that is, the store to the same address precedes the read instruction, the graph computing device 100 sends the data of that store instruction in SB142 to LQ141 and returns it to the destination of the read instruction. If SB142 holds no request to the same address, or the timestamp of the store to the same address is greater than that of the current read instruction, that is, the store is later than the read instruction, the graph computing device 100 issues the read instruction to memory. When memory returns the data, the data is stored in the data field of LQ141 and returned to the destination of the read instruction.
  • FIG. 8 is a schematic structural diagram of an SB142 according to an embodiment of the present application.
  • the SB 142 includes a plurality of field segments.
  • The definitions of these field segments are shown in Table 3 below. It should be noted that the field segments in FIG. 8 are merely examples, and SB142 may include more or fewer field segments.
  • Table 3:

| Field segment | Short name | Description |
|---|---|---|
| Address | Addr | Address field segment of the store instruction; the data can be stored to the corresponding memory location through this address |
| Data | data | Temporarily stores the data of the store instruction |
| Timestamp | stamp | Indicates the ideal execution order between memory access instructions |
| Speculative ID | spcID | Indicates that the address or data of the store instruction is directly or indirectly derived from a speculative read operation |
| Valid bit | vld | Indicates that the store instruction is valid |
  • According to the timestamp of a read instruction, SB142 selects, from the store instructions in SB142 that have the same address, the one whose timestamp is smaller than and closest to the timestamp of the read instruction, and returns its data, which saves the time of reading memory. A smaller timestamp means an earlier position in the ideal order.
  • In addition to recording the address and data of each store instruction, SB142 also records, through the speculative ID, whether that address and data come from speculative data. For example, if SB142 receives a clear request instructing it to clear the instructions associated with a target speculative ID, SB142 checks whether the speculative ID of any cached store instruction is associated with the target speculative ID. If so, the valid bit of that store instruction is set to 0; if not, the clear request is ignored. After SB142 receives an issue indication, SB142 can send the cached store instructions to memory in turn.
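A minimal model of the SB behavior above, assuming a Python list of entries with the Table 3 fields. `SBEntry`, `StoreBuffer`, `clear`, and `drain` are hypothetical names; writing entries back in timestamp order and emptying the buffer after the issue indication are simplifying assumptions, not details stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SBEntry:
    addr: int         # Addr
    data: int         # data
    stamp: int        # timestamp
    spc_id: int       # spcID, 0 if not derived from speculation
    vld: bool = True  # valid bit


class StoreBuffer:
    def __init__(self) -> None:
        self.entries: List[SBEntry] = []

    def push(self, entry: SBEntry) -> None:
        self.entries.append(entry)

    def clear(self, target_id: int) -> int:
        """Invalidate entries associated with target_id; return how many were cleared."""
        cleared = 0
        for e in self.entries:
            if e.vld and (e.spc_id & target_id):
                e.vld = False  # set the valid bit to 0
                cleared += 1
        return cleared  # 0 means the clear request was effectively ignored

    def drain(self, memory: Dict[int, int]) -> None:
        """On the issue indication, write valid stores to memory in timestamp order."""
        for e in sorted(self.entries, key=lambda e: e.stamp):
            if e.vld:
                memory[e.addr] = e.data
        self.entries.clear()


sb = StoreBuffer()
sb.push(SBEntry(addr=0x10, data=1, stamp=0, spc_id=0))
sb.push(SBEntry(addr=0x20, data=2, stamp=1, spc_id=0b001))
print(sb.clear(0b001))  # prints 1
mem: Dict[int, int] = {}
sb.drain(mem)
print(mem)  # prints {16: 1}
```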
  • FIG. 9 is a schematic diagram of a method for processing an instruction according to an embodiment of the present application.
  • The method may be performed by the graph computing device 100.
  • IB in FIG. 9 may be IB111 in FIG. 1
  • LQ may be LQ141 in FIG. 1
  • SB may be SB142 in FIG. 1
  • PE may be PE110 in FIG. 1.
  • the graph computing device 100 is used to execute instructions based on a data dependency graph.
  • the method includes:
  • The IB issues a first instruction to the LQ, where the first instruction is used to request to read data and meets a first preset condition, the first preset condition including: the speculative bit of the first instruction in the IB is set to yes.
  • The IB determines a first speculative ID and stores it in the speculative ID field segment of the first instruction in the IB, where the first speculative ID is used to indicate the current speculative operation.
  • Speculative ID field segment, i.e., spcID
  • Speculative bit, i.e., spc
  • The speculative bit is set to yes, for example, when the first instruction has a speculative dependency on another instruction.
  • A speculative dependency indicates that the first instruction may depend on a third instruction, and that the first instruction can nevertheless be executed speculatively.
  • The first instruction and the third instruction satisfy a second preset condition.
  • The second preset condition includes that the ideal execution order of the first instruction is after the third instruction, and that the first instruction may have a memory dependency relationship with the third instruction.
  • A memory dependency relationship is an order dependency between memory access instructions caused by operating on the same address.
  • The second preset condition can be understood as follows: the first instruction and the third instruction may depend on each other because of memory aliasing, but the compiler cannot determine whether the dependency actually exists; this can only be confirmed after the instructions are executed.
  • graph computing device 100 may perform speculative operations.
  • The first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes, and the dependency presence bit is set to no. This indicates that the first instruction has a dependency on another instruction that has not yet been executed, but the first instruction can still be executed speculatively. If the speculation is wrong, the first instruction is re-executed later.
  • The IB further includes at least one parameter field, at least one parameter field valid bit, and at least one parameter field presence bit, which correspond one to one. A parameter field stores input data of the current instruction, a parameter field valid bit indicates whether the corresponding parameter field is valid, and a parameter field presence bit indicates whether the data of the corresponding parameter field already exists in the cache.
  • The first preset condition further includes: a first parameter field valid bit among the at least one parameter field valid bit is set to yes, and the parameter field presence bit corresponding to it is also set to yes. Assuming the first parameter field valid bit corresponds to a first parameter field, this indicates that the first parameter field of the first instruction is valid and its input data has been stored there. For example, for a read instruction, this input data may be the read address.
  • The IB further includes a timestamp, which indicates the ideal execution order of memory access instructions, where a memory access instruction is a store instruction or a read instruction.
  • The timestamps of the first instruction and the third instruction indicate that the ideal execution order of the first instruction is after that of the third instruction.
  • The SB and the LQ also include the timestamp.
  • Dependency valid bit, i.e., prd
  • Dependency presence bit, i.e., prdy
  • Parameter fields, i.e., op0/1
  • Parameter field valid bits, i.e., vld0/1
  • Parameter field presence bits, i.e., rdy0/1
  • Timestamp, i.e., stamp
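To make the field list concrete, the sketch below gathers the IB fields named above into one Python record and checks the first preset condition for a read. The class and function names, the `opcode` field, and the choice to require all three sub-conditions together (speculative bit set, a declared but unsatisfied dependency, and a ready address in op0) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class IBEntry:
    opcode: str         # hypothetical: "LD", "ST", "ADDI", ...
    stamp: int = 0      # timestamp: ideal order of memory access instructions
    spc: bool = False   # speculative bit: may be executed speculatively
    spc_id: int = 0     # speculative ID field (bit vector)
    sgo: bool = False   # speculative flag: already issued speculatively
    prd: bool = False   # dependency valid bit
    prdy: bool = False  # dependency presence bit
    op0: int = 0        # parameter field 0 (for a read: the address)
    vld0: bool = False  # parameter field 0 valid bit
    rdy0: bool = False  # parameter field 0 presence bit
    op1: int = 0        # parameter field 1
    vld1: bool = False
    rdy1: bool = False


def meets_first_preset_condition(e: IBEntry) -> bool:
    """True if this read may be issued speculatively to the LQ (sketch)."""
    waiting_on_dependency = e.prd and not e.prdy  # dependency declared, not yet met
    address_ready = e.vld0 and e.rdy0             # read address present in op0
    return e.opcode == "LD" and e.spc and waiting_on_dependency and address_ready


ld = IBEntry(opcode="LD", stamp=1, spc=True, prd=True, op0=0x100, vld0=True, rdy0=True)
print(meets_first_preset_condition(ld))  # prints True
```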
  • In this way, a speculative bit and a speculative ID field are set for each instruction in the IB of the graph computing device, so that the device can determine from the speculative bit whether the instruction can be executed speculatively, and can use the speculative ID to identify each speculative operation, which marks the source of the speculation.
  • The method further includes: after receiving the first instruction, the LQ assigns the first speculative ID to the first instruction and writes it into the speculative ID field segment of the first instruction in the LQ; the LQ then transmits the first speculative ID to the speculative ID field segment of the first instruction in the IB.
  • The speculative ID is a one-hot code.
  • For the one-hot code, refer to the description above.
  • Using a one-hot code for the speculative ID has the following benefit: because the new speculative ID obtained by ORing two one-hot codes still retains the information of both original speculative IDs, the source of a failed speculative operation can be found quickly, so that only the instructions related to that speculative operation are cleared and re-executed. This improves the efficiency of speculative execution of memory access instructions in a dataflow architecture.
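The text does not say how the LQ picks a one-hot ID. One plausible scheme, sketched here purely as an assumption, treats each bit position as a slot in a fixed-width pool and hands out the lowest free bit. `SpecIdAllocator`, its `width`, and when `release` is called are all hypothetical.

```python
class SpecIdAllocator:
    """Hands out one-hot speculative IDs from a fixed pool of bit positions."""

    def __init__(self, width: int = 8) -> None:
        self.width = width  # assumed number of IDs that can be in flight
        self.in_use = 0     # bit vector of currently allocated IDs

    def allocate(self) -> int:
        for bit in range(self.width):
            one_hot = 1 << bit
            if not self.in_use & one_hot:  # lowest free bit position
                self.in_use |= one_hot
                return one_hot
        raise RuntimeError("no free speculative ID; the read must wait")

    def release(self, one_hot: int) -> None:
        """Return an ID to the pool once its speculation is resolved (assumed)."""
        self.in_use &= ~one_hot


alloc = SpecIdAllocator(width=3)
first = alloc.allocate()
second = alloc.allocate()
print(format(first, "03b"), format(second, "03b"))  # prints 001 010
```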
  • the LQ searches and obtains the data requested by the first instruction from the SB or the memory according to the first instruction.
  • LQ can first search for data in SB according to the address in the first instruction, and if it does not exist, search for data in memory.
  • The LQ transmits the first speculative ID to the speculative ID field segment of a second instruction in the IB, where the second instruction depends on the first instruction.
  • Carrying the first speculative ID in the second instruction, which depends on the first instruction, allows the instructions derived from the first speculative operation to be found by the first speculative ID after the speculation fails.
  • If the input of the second instruction comes from two speculative operations, the two corresponding speculative IDs can be ORed together, and the resulting new speculative ID is used as the speculative ID of the second instruction.
  • The method further includes: after the IB issues the first instruction to the LQ, the IB sets the speculative flag bit (sgo) of the first instruction to yes, to indicate that the first instruction has been issued speculatively.
  • Speculative flag bit, i.e., sgo
  • The method further includes: after the data requested by the first instruction has been obtained, the IB issues a second instruction that depends on the first instruction; after issuing the second instruction, the IB sets the speculative flag bit (sgo) of the second instruction to yes. That is, subsequent instructions that depend on the first instruction are also marked by the speculative flag bit (sgo).
  • The method of FIG. 9 further includes: after issuing the first instruction, the IB issues a third instruction to the SB. After issuing the third instruction, the IB passes the store address of the third instruction to a parameter field of the first instruction in the IB and sets the dependency presence bit (prdy) of the first instruction in the IB to yes, to indicate that the instruction on which the first instruction depends has finished executing.
  • The method of FIG. 9 further includes: the IB determines, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculative ID was wrong; if it was wrong, the IB re-selects the first instruction and re-issues it to the LQ.
  • the first parameter field of the first instruction in the IB may be used to store the read address of the first instruction
  • the second parameter field may be used to store the storage address of the third instruction
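The check described in this step can be sketched as follows, assuming op0 holds the read address used for the speculative issue and op1 receives the store address. `ReadEntry` and `on_store_done` are hypothetical names, and clearing rdy1 and prdy on a non-aliasing store mirrors the FIG. 11 walkthrough in this document rather than a stated rule of the method.

```python
from dataclasses import dataclass


@dataclass
class ReadEntry:
    op0: int            # read address used for the speculative issue
    spc_id: int         # speculative ID assigned on that issue
    sgo: bool = True    # already issued speculatively
    op1: int = 0        # store address delivered by the depended-on store
    rdy1: bool = False  # presence bit for op1
    prdy: bool = False  # dependency presence bit


def on_store_done(read: ReadEntry, store_addr: int) -> bool:
    """Deliver the store's address and dependency; return True if speculation failed."""
    read.op1, read.rdy1, read.prdy = store_addr, True, True
    if read.sgo and read.op0 == read.op1:
        # Alias: the caller broadcasts read.spc_id and re-issues the read to the LQ.
        return True
    read.rdy1 = read.prdy = False  # no alias: nothing to redo
    return False


print(on_store_done(ReadEntry(op0=0x40, spc_id=0b001), 0x40))  # prints True
print(on_store_done(ReadEntry(op0=0x44, spc_id=0b001), 0x40))  # prints False
```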
  • The method of FIG. 9 further includes: the LQ allocates a second speculative ID to the first instruction and writes it into the speculative ID field segment of the LQ; the LQ then transmits the second speculative ID to the speculative ID field segment of the first instruction in the IB.
  • The method of FIG. 9 further includes: if the speculative operation of the first instruction was wrong, the IB broadcasts the first speculative ID of the first instruction to at least one PE, LQ, or SB; each receiving PE, LQ, or SB compares the first speculative ID with the speculative ID of the instruction it is executing to determine whether the two are associated; if they are, it stops executing the current instruction and stops transmitting the data or dependencies of the current instruction.
  • By broadcasting the first speculative ID of the first instruction to the pipeline of the graph computing device 100, the IB clears only the instructions related to the speculative operation and avoids clearing later instructions that are unrelated to it, which reduces the cost of speculation failure and improves the efficiency of speculative execution.
  • The at least one PE may include the PE where the IB is located and other PEs. Broadcasting to the at least one PE may include broadcasting to units within that PE, for example, the IB or other functional units.
  • Determining whether an association exists may include determining whether the speculative ID of the executing instruction is the same as the first speculative ID or is derived from it. For example, as described above, using the property of the one-hot code, the current speculative ID and the first speculative ID are ANDed; if the result is not all zeros, the two are associated.
  • Using the first speculative ID, the graph computing device can track subsequent instructions associated with the erroneous speculative operation and stop transmitting the data and dependencies of the current instruction, which guarantees correct execution of instructions.
  • Using the speculative ID, only the instructions related to the wrong speculative operation are cleared, while instructions unrelated to it are retained, which improves the efficiency of speculative execution of memory access instructions in a dataflow architecture.
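The selective squash can be modeled as each receiving unit ANDing the broadcast ID with the speculative IDs of its in-flight instructions. This is a sketch under assumptions: `Unit`, `InFlight`, and `broadcast` are hypothetical stand-ins for IB, LQ, SB, and PE logic, and halting is reduced to setting a flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InFlight:
    name: str
    spc_id: int           # 0 if not derived from any speculation
    halted: bool = False


@dataclass
class Unit:
    """Stands in for any pipeline stage that receives the broadcast (IB, LQ, SB, PE)."""
    name: str
    work: List[InFlight] = field(default_factory=list)

    def on_broadcast(self, failed_id: int) -> List[str]:
        halted = []
        for ins in self.work:
            if ins.spc_id & failed_id:  # shares a bit: derived from the failed speculation
                ins.halted = True       # stop executing and forwarding data/dependencies
                halted.append(ins.name)
        return halted                   # unrelated instructions keep running


def broadcast(units: List[Unit], failed_id: int) -> List[str]:
    return [name for u in units for name in u.on_broadcast(failed_id)]


pe = Unit("PE", [InFlight("addi", 0b001), InFlight("mul", 0b100)])
sb = Unit("SB", [InFlight("st z", 0)])
print(broadcast([pe, sb], 0b001))  # prints ['addi']
```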
  • FIG. 10 is a schematic diagram of an instruction execution flow of the graph computing device 100 according to an embodiment of the present application.
  • the IB selects an instruction from the instructions stored in itself and issues the instruction.
  • The selected instruction satisfies any one of the following four conditions and has the smallest timestamp among the eligible instructions.
  • the above four conditions include:
  • Condition 1: The inputs of the instruction are present and valid, and do not come from a speculative operation. That is, the instruction's inputs are ready and none of them comes from speculative data.
  • The inputs of an instruction also include its dependency; that is, either the instruction has no dependency or the instruction it depends on has been executed.
  • Condition 2: The instruction is a read instruction that may be executed speculatively and has not yet been executed speculatively, that is, its speculative bit (spc) is 1 and its speculative flag bit (sgo) is 0. Condition 2 indicates that the graph computing device 100 is about to execute the read instruction speculatively for the first time: the IB issues the read instruction speculatively to the LQ.
  • Condition 3: The instruction is a read instruction that has already been executed speculatively, and that earlier speculation was wrong. That is, the graph computing device has determined that the read instruction and a store instruction have the same address, so a dependency exists between them.
  • In condition 3, because the earlier speculation was wrong, once the store instruction on which the read instruction depends has executed, the address information of the read instruction is updated: the parameter field that holds the store address in the read instruction is refreshed. Therefore, the parameter field valid bit (vld) corresponding to this parameter field, the speculative flag bit (sgo), and the dependency presence bit (prdy) are all 1.
  • Condition 3 means that the graph computing device 100 has recognized that the read instruction issued earlier read erroneous speculative data, so the instruction must be re-executed, and its speculative ID is broadcast to the pipeline of the graph computing device 100 to clear the instructions associated with that speculative ID.
  • Condition 4: The instruction is not a read instruction, and an earlier speculation was wrong. Condition 4 indicates that the current instruction was previously executed with incorrect speculative data, so it must be re-executed and must pass on a new speculative ID. Because the input of the instruction has been updated, the parameter field valid bit corresponding to one of its parameter fields is 1, and its speculative flag bit (sgo) is 1.
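The selection rule (any of the four conditions, then smallest timestamp) might be sketched like this. The flag names follow the IB fields where possible, but `inputs_ready`, `deps_done`, and `refreshed` are hypothetical summaries of the vld/rdy/prdy bits, and giving the non-memory addi instruction a timestamp is an assumption made so every entry can be ranked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    is_read: bool
    stamp: int                  # smaller = earlier in the ideal order
    inputs_ready: bool = False  # every valid parameter field is present (vld and rdy)
    deps_done: bool = True      # no dependency, or prdy set
    spc_id: int = 0             # nonzero if some input came from speculative data
    spc: bool = False           # speculative bit
    sgo: bool = False           # speculative flag: issued speculatively before
    refreshed: bool = False     # hypothetical: earlier speculation wrong, a field rewritten


def cond1(e: Entry) -> bool:  # ready, dependencies met, no speculative input
    return e.inputs_ready and e.deps_done and e.spc_id == 0


def cond2(e: Entry) -> bool:  # first speculative issue of a read
    return e.is_read and e.spc and not e.sgo and e.inputs_ready


def cond3(e: Entry) -> bool:  # read whose earlier speculation failed
    return e.is_read and e.sgo and e.refreshed and e.deps_done


def cond4(e: Entry) -> bool:  # non-read that consumed wrong speculative data
    return not e.is_read and e.sgo and e.refreshed


def select(entries: List[Entry]) -> Optional[Entry]:
    eligible = [e for e in entries if cond1(e) or cond2(e) or cond3(e) or cond4(e)]
    return min(eligible, key=lambda e: e.stamp, default=None)


# Initial FIG. 12 state: only instruction 2 qualifies (condition 2).
i1 = Entry("ST[x],a", is_read=False, stamp=0)
i2 = Entry("LD[x+i]", is_read=True, stamp=1, inputs_ready=True, deps_done=False, spc=True)
i3 = Entry("ST[z],b", is_read=False, stamp=2, inputs_ready=True, deps_done=False)
i4 = Entry("addi", is_read=False, stamp=3)
print(select([i1, i2, i3, i4]).name)  # prints LD[x+i]
```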
  • When an instruction satisfying condition 2 or condition 3 is selected, the read instruction enters the LQ and is assigned a speculative ID by the LQ. After fetching the data, the LQ sends the data to the destination of the instruction. Subsequent instructions that depend on this instruction can be executed speculatively, but their inputs must carry the speculative ID.
  • the storage instruction can trigger the hardware to identify whether the speculative operation of the previous read instruction fails.
  • the IB may transmit the storage instruction to the SB, and send the address and dependency information to the destination indicated by the storage instruction, where the destination may refer to the parameter field in the IB of the instruction that may depend on the storage instruction.
  • After receiving the address sent by the store instruction, the IB compares it with the read instruction's own address. If the two addresses are equal, the earlier speculation of the read instruction was wrong.
  • The hardware then broadcasts the speculative ID of the read instruction, halting the execution of the instructions associated with that speculative ID. The data produced by the store instruction remains in the SB until the IB sends an indication signal instructing the SB to send the stored data to memory.
  • Because a store instruction may cause the IB to recognize the speculation failure of an earlier read instruction, after the store instruction executes, the read instruction is re-selected by the selection logic of the IB, reads the updated data from the SB, and sends the data to its destination. Instructions in the IB that depend on the read instruction and were executed speculatively are also re-executed.
  • the speculative flag bit for these speculative failed instructions is set to 1 to indicate that the instruction was previously executed with speculative data.
  • The valid bit of one of their parameter fields (op0/1) is 1, indicating that the parameter field holds updated data. In other words, the instruction's earlier data came from a wrong speculation, and the new data causes these instructions to be re-executed.
  • FIG. 11 is a schematic diagram of a data dependency graph according to an embodiment of the present application.
  • FIGS. 12 to 19 are schematic diagrams of the states of the graph computing device 100 at different stages of executing the data dependency graph of FIG. 11, showing the information held in IB111, LQ141, and SB142 while the graph computing device 100 executes the instructions.
  • the data dependency graph includes 4 instructions, and the 4 instructions are abstracted into four nodes.
  • Instruction 1 (ST[x], a) is a store instruction, used to store data a to address x in the memory.
  • Instruction 2 (LD[x+i]) is a read instruction, used to read data from address x+i in the memory and transfer it to instruction 4.
  • Instruction 3 (ST[z], b) is a store instruction, used to store data b to address z in the memory.
  • Instruction 4 (addi) is an addition instruction, used to add its two inputs: the data from instruction 2 and the constant 1.
  • the compiler can add dependencies between nodes according to the rules in FIGS. 2 to 4 .
  • the store instruction is followed by the read instruction (that is, instruction 1 is followed by instruction 2)
  • Because the compiler cannot know the value of i, it cannot determine whether address x+i equals address x, so the compiler establishes a speculative dependency (edge p) between the two instructions.
  • The compiler can establish a strong order-preserving relationship between a store instruction and a subsequent store instruction.
  • Alternatively, the strong order-preserving relationship can be analyzed and established by hardware, in which case the compiler does not establish a dependency between these two instructions.
  • FIG. 12 is a schematic diagram of the information stored in the IB111 by the instructions of the data dependency graph of FIG. 11 .
  • The graph computing device 100 may select an instruction whose inputs are valid to start execution. Here, valid inputs may mean that the data in the instruction's parameter fields is ready and the instruction it depends on has been executed.
  • Memory access instructions need to be assigned timestamps. The timestamps of instructions 1 to 3 in IB111 are 0, 1, and 2 respectively, indicating the ideal execution order of instruction 1 to instruction 3.
  • Instruction 1, instruction 3, and instruction 4 all lack the conditions for execution. Instruction 1 lacks its store data and address (op0/1 is empty). The input data of instruction 3 is ready, but under the strong order-preserving principle between store instructions, instruction 3 must be executed after instruction 1 completes; its dependency presence bit (prdy) is therefore 0, and its speculative bit (spc) is also 0. Instruction 4 lacks the data from instruction 2 as its input.
  • Instruction 2 lacks the execution-completion flag of the instruction it speculatively depends on; that is, its dependency presence bit (prdy) is 0. However, the speculative bit (spc) of instruction 2 is set to 1, which means that the graph computing device 100 does not need to wait for the instruction it speculatively depends on to complete before executing instruction 2.
  • the first parameter field valid bit (vld0) corresponding to the first parameter field (op0) of instruction 2 is 1, and the first presence bit (rdy0) corresponding to the first parameter field (op0) is also 1.
  • the valid bit (vld1) of the second parameter field corresponding to the second parameter field (op1) is 0. This means that instruction 2 has only one input data, and the input data already exists in the first parameter field (op0), so instruction 2 can be issued.
  • Issuing an instruction may mean that IB111 transmits the information of the instruction to another unit that executes it. For example, a store instruction may be issued to SB142, a read instruction to LQ141, and an operation instruction to the computing unit in PE110.
  • the first presence bit (rdy0) of the instruction 2 can be set to 0, indicating that the data in the first parameter field (op0) of the instruction 2 is no longer valid.
  • the speculative flag bit (sgo) will be set to 1 to indicate that instruction 2 was executed speculatively.
  • the instruction 2 is sent to the LQ141, and the LQ141 can record the information of the instruction 2, and the above information may include the address, destination, time stamp and other information of the instruction 2.
  • LQ141 first searches SB142 according to the address in instruction 2, and if there is no data that instruction 2 requests to read in SB142, it sends instruction 2 to the memory to request data acquisition.
  • LQ141 also assigns a speculative ID to instruction 2 and writes it to the speculative ID field segment of LQ141. In FIG. 13, this speculative ID is represented as "001". LQ141 also sends the speculative ID to IB111, where it is written into the speculative ID (spcID) field of instruction 2.
  • After LQ141 obtains the data requested by instruction 2, it can store the data in the data field segment of LQ141. In FIG. 13, this data is represented as "72".
  • After obtaining the data requested by instruction 2, LQ141 can also transmit the data to the first parameter field (op0) of instruction 4 in IB111 and set the presence bit (rdy0) corresponding to the first parameter field (op0) to 1. Because this data was obtained speculatively, IB111 writes the speculative ID into the speculative ID (spcID) field of instruction 4. If a later result of instruction 4 is derived from the speculative data, the speculative ID must likewise be written into the speculative ID (spcID) field of the destination when that result is passed on.
  • spcID: speculative ID
  • All inputs of instruction 4 are now ready, so instruction 4 in IB111 can be executed. On execution, instruction 4 may be issued to the computing unit in PE110. Because the data of instruction 4 comes from speculative behavior, its speculative flag bit (sgo) is set to 1. In addition, after instruction 4 is issued, its parameter field presence bits (rdy0/1) are set to 0, indicating that the instruction has been issued.
  • the input data of instruction 1 is all ready.
  • IB111 sends the information of instruction 1 to SB142.
  • SB142 records the address, data, time stamp, speculative ID and other information of the storage request of instruction 1.
  • the address of instruction 1 is x
  • the data is a
  • the timestamp is 0.
  • Instruction 1 is not speculative, so it is not assigned a speculative ID, and its speculative ID is set to 0.
  • IB111 sets the parameter field existence bit (rdy0/1) of instruction 1 to 0 to indicate that the data stored in the parameter field (op0/1) has been sent.
  • IB 111 sends a dependency indication and a storage address to the destination of instruction 1.
  • The dependency indication indicates that the instruction on which instruction 2 speculatively depends has completed.
  • The storage address is the address to which instruction 1 stores its data.
  • the above-mentioned sending the dependency indication to the destination includes: setting the dependency presence bit (prdy) of instruction 2 in IB111 to 1, to indicate that the speculatively dependent instruction of instruction 2 has been executed.
  • Sending the storage address to the destination includes: writing the store address of instruction 1 into the second parameter field (op1) of instruction 2 in IB111, and setting the presence bit (rdy1) corresponding to the second parameter field to 1.
  • IB111 compares the address newly written for instruction 2 (stored in op1) with the address used for its speculative execution (stored in op0). If the two are equal, the read operation that instruction 2 previously performed speculatively was wrong, so IB111 re-executes instruction 2. If they are not equal, instruction 2 does not need to be re-executed, and the presence bit of the second parameter field (rdy1) and the dependency presence bit (prdy) of instruction 2 in IB111 are set to 0.
  • IB111 may also set the dependency presence bit (prdy) of instruction 3 in the IB to 1, to indicate that instruction 1, on which instruction 3 depends, has been executed.
  • instruction 3 can be executed.
  • The dependency presence bit (prdy) of instruction 3 is set to 1 and the input data of instruction 3 is already ready, so instruction 3 can be issued to SB142.
  • SB142 can record the address, data, time stamp, and speculative ID information of instruction 3.
  • the address is z
  • the data is b
  • the timestamp is 2. Since neither the address nor the data of the storage instruction comes from speculative behavior, the speculative ID is 0.
  • IB111 broadcasts the speculative ID (spcID) of instruction 2 to every stage of the hardware pipeline. After receiving it, each pipeline stage compares it with the speculative ID of the instruction it is executing to determine whether the two are associated.
  • spcID: speculative ID
  • the above-mentioned pipelines at all levels may refer to units in the graph computing device 100 for executing instructions, including, but not limited to, IB111, LQ141, SB142, and PE110.
  • The speculative ID is usually a one-hot code. If the two speculative IDs both have a 1 in the same bit position, they are associated, so the stage must stop executing the current instruction and stop transmitting its data or dependencies. If no bit position is 1 in both speculative IDs, they are not associated, and execution is not affected.
  • the LQ141 replaces the previous instruction 2 with a new instruction 2, and assigns a new speculative ID to the instruction 2.
  • The newly allocated speculative ID in FIG. 18 is represented as "010".
  • LQ141 searches SB142 or the memory again according to the new instruction 2 and obtains the data requested by instruction 2. In FIG. 18, this data is represented as data a.
  • IB111 needs to re-execute instruction 4.
  • After LQ141 acquires the data a requested by instruction 2, it again transmits data a and the speculative ID to the destination of instruction 2 and updates the speculative ID of the destination.
  • the destination of instruction 2 is the first parameter field (op0) of instruction 4, and the updated speculative ID is "010".
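As a functional sanity check of the walkthrough in FIGS. 12 to 19, the sketch below replays the four instructions of FIG. 11 without modeling pipeline timing or the individual IB/LQ fields. `run_fig11` is a hypothetical name, and the concrete addresses and memory contents are illustrative, chosen so that the first speculative read returns 72 as in FIG. 13.

```python
from typing import Dict, List, Tuple


def run_fig11(x: int, i: int, z: int, a: int, b: int,
              mem: Dict[int, int]) -> Dict[str, int]:
    """Replay FIG. 11: ST[x],a (stamp 0); LD[x+i] (stamp 1);
    ST[z],b (stamp 2); addi = LD result + 1."""
    sb: List[Tuple[int, int, int]] = []  # SB142 entries: (addr, data, stamp)

    def load(addr: int, stamp: int) -> int:
        # Forward from the closest earlier store to the same address, else read memory.
        hits = [(s, d) for (ad, d, s) in sb if ad == addr and s < stamp]
        return max(hits)[1] if hits else mem[addr]

    # Instruction 2 issues speculatively before instruction 1 has run.
    ld_addr = x + i
    spec_id = 0b001                  # first one-hot ID from LQ141
    addi_spec_id = spec_id           # instruction 4's input carries the ID
    speculative_result = load(ld_addr, 1) + 1

    # Instruction 1 completes; its address reaches op1 of instruction 2.
    sb.append((x, a, 0))
    result = speculative_result
    squashed = False
    if ld_addr == x:                 # op0 == op1: the speculation was wrong
        squashed = (addi_spec_id & spec_id) != 0  # broadcast reaches instruction 4
        spec_id = 0b010              # LQ141 assigns a new one-hot ID
        result = load(ld_addr, 1) + 1  # re-read (forwarded from SB142), re-run addi

    # Instruction 3 issues after instruction 1 (strong order between stores).
    sb.append((z, b, 2))
    return {"speculative": speculative_result, "squashed": int(squashed),
            "final": result, "spec_id": spec_id}


print(run_fig11(x=100, i=0, z=200, a=5, b=9, mem={100: 72, 200: 0}))
# prints {'speculative': 73, 'squashed': 1, 'final': 6, 'spec_id': 2}
```

With i = 0 the read aliases the store to x: the speculative result 73 is discarded, ID 010 is assigned, and the re-read obtains a from SB142. With a nonzero i the speculative result stands and no ID is reassigned.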
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • The technical solution of the present application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

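A method for processing instructions and a graph computing apparatus are provided. They can improve the efficiency of speculative execution and reduce the cost of speculation failure.

The graph computing apparatus is based on a dataflow architecture. It includes an information buffer (IB) and a load queue (LQ). The IB caches an instruction queue, and the LQ caches a queue of load instructions. The IB includes a speculation bit and a speculation identifier (ID) field:
- the speculation bit indicates whether the current instruction can be executed speculatively;
- the speculation ID field stores the speculation ID of one speculative operation of the current instruction.

The method includes the following steps:
- The IB issues a first instruction to the LQ (S401). The first instruction requests to read data and satisfies a first preset condition, which includes that the speculation bit of the first instruction in the IB is set to yes.
- The IB determines a first speculation ID and stores it in the speculation ID field of the first instruction in the IB (S402). The first speculation ID indicates the current speculative operation.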

Description

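Method for processing instructions and graph computing apparatus
Technical Field
This application relates to the field of graph computing, and in particular, to a method for processing instructions and a graph computing apparatus.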
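Background
The dataflow architecture is a computer system architecture. In the mainstream von Neumann architecture, a program counter dictates the execution order of instructions. A dataflow architecture instead determines the execution order by checking whether the parameters of the program are valid. This converts the dependence of instructions on control flow into a dependence on data flow, which gives the dataflow architecture a great advantage in exploiting parallelism. However, a conventional dataflow architecture still needs to support control flow. Computer architectures that combine dataflow and control flow are collectively called graph computing apparatuses or graph computing architectures (graphflow architecture).
When an instruction sequence is executed, memory aliasing is a common problem that affects the memory-access efficiency of hardware. Memory aliasing is the situation in which two pointers point to the same storage address at the same time. In the von Neumann architecture, the execution order of a program must respect the true dependences between instructions, as in the following instructions:
Instruction 1: R3 <= R1 + R2;
Instruction 2: R4 <= R5 + R3.
R3 in instruction 2 depends on the completion of instruction 1, so instruction 2 cannot start until the result of instruction 1 has been computed. Here instruction 2 depends on instruction 1, but the dependence is static and easy for the processor to identify. Such a dependence on a particular value is called a data dependence. Memory-access instructions, however, can also have a dependence caused by memory aliasing. The processor cannot identify this dependence before the program executes, so it often hurts the memory-access efficiency of the hardware. Consider the following example:
Instruction 3: store R1, 2(R2);
Instruction 4: load R3, 4(R4).
Instruction 3 stores the data of register R1 to address (R2+2). Instruction 4 reads the data at address (R4+4) into register R3. The dependence between memory-access instructions cannot be determined in advance from register numbers; it is known only after R2+2 and R4+4 have been computed.
- If the two addresses are the same, there is an order dependence between the two instructions: instruction 4 can execute only after instruction 3 completes.
- If the addresses differ, there is no order dependence, and the execution order of the two instructions does not affect program correctness.

Instruction 4 therefore has to be delayed and issued only after the dependence between the memory-access instructions has been established. In most cases, however, R2+2 is not equal to R4+4, so the order dependence usually does not exist, and delaying instruction 4 in these cases reduces efficiency.
To solve this problem, mainstream computer systems use speculative execution: before the addresses are available, they assume that the order dependence does not hold and execute the load speculatively. In real applications most order dependences do not hold. In those cases the speculatively executed load is unaffected, and a large amount of time spent waiting for address computation is saved. In the minority of cases where the order dependence does hold, the speculation is invalidated. The computer system then clears and re-executes the load and any subsequent operations that have already entered the pipeline.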
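The toy model below illustrates the trade-off just described for instructions 3 and 4. It is a sketch only:
- registers and memory are modeled as Python dicts;
- the function name `run_speculative` is hypothetical.

The load is performed before the store address is known. Once the store address resolves, the two addresses are compared, and the load is replayed only if they alias.

```python
def run_speculative(regs, memory):
    """Model instruction 3 (store R1 -> [R2+2]) and instruction 4 (load R3 <- [R4+4]).

    The load runs before the store address is known. Returns (R3, replayed).
    """
    load_addr = regs["R4"] + 4
    r3 = memory.get(load_addr, 0)        # speculative load, issued early

    store_addr = regs["R2"] + 2          # the store address resolves later
    memory[store_addr] = regs["R1"]      # the store executes

    if store_addr == load_addr:
        # The order dependence really existed: discard the speculative value
        # and replay the load so that it observes the stored data.
        return memory[load_addr], True
    return r3, False


# Common case: the addresses differ, so the early load was correct.
print(run_speculative({"R1": 7, "R2": 10, "R4": 20}, {24: 3}))  # (3, False)
# Rare case: R2 + 2 == R4 + 4, so the load is replayed.
print(run_speculative({"R1": 7, "R2": 22, "R4": 20}, {24: 3}))  # (7, True)
```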
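With speculative execution, a failed speculation flushes every instruction in the pipeline after the misspeculated instruction. Some instructions unrelated to the failed speculation are therefore re-executed as well, which costs performance. A graph computing apparatus based on a dataflow architecture faces the same memory aliasing problem when it executes programs. Improving the efficiency of speculative execution in a dataflow architecture is therefore an urgent problem.
Summary
This application provides a method for processing instructions and a graph computing apparatus. They can improve the efficiency of speculative execution and reduce the cost of speculation failure.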
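According to a first aspect, a method for processing instructions is provided. The method is applied to a graph computing apparatus based on a dataflow architecture.
- The graph computing apparatus includes at least one processing engine (PE) and a load store unit (LSU).
- The PE includes an information buffer (IB). The IB caches an instruction queue, and the PE executes the instructions cached in the IB.
- The IB includes a speculation bit and a speculation identifier (ID) field. The speculation bit indicates whether the current instruction can be executed speculatively. The speculation ID field stores the speculation ID of one speculative operation of the current instruction.
- The LSU includes a load queue (LQ), which caches a queue of load instructions.

The method includes:
- The IB issues a first instruction to the LQ. The first instruction requests to read data and satisfies a first preset condition, which includes that the speculation bit of the first instruction in the IB is set to yes.
- The IB determines a first speculation ID and stores it in the speculation ID field of the first instruction in the IB. The first speculation ID indicates the current speculative operation.
The dataflow architecture executes instructions based on a data dependency graph. A data dependency graph may be a dataflow graph made of nodes and directed arcs connecting the nodes. The nodes represent the operations or functions to be executed, and the directed arcs represent the order in which the nodes are executed.
A speculative operation is one in which an instruction that may depend on another, not yet executed, instruction is executed first. If, after the other instruction executes, it is confirmed that the two have no dependence, the speculation succeeds. If a dependence is confirmed, the speculation fails.
A speculation bit and a speculation ID field are set for each instruction in the IB of the graph computing apparatus. The apparatus uses the speculation bit to decide whether the instruction can be executed speculatively, and uses the speculation ID to identify that speculative operation. The speculation ID thus marks the source of the speculation. When a misspeculation occurs, only the instructions associated with that speculation ID are cleared and re-executed; instructions unrelated to it are not. This reduces the cost of misspeculation and makes speculative execution of memory-access instructions more efficient in the dataflow architecture.
Optionally, the speculation bit is set to yes when the first instruction has a speculable dependence on another instruction. For example, a speculable dependence may exist between the first instruction and a third instruction. This means that the first instruction may depend on the third instruction and that the first instruction can be executed speculatively.
Optionally, the first instruction and the third instruction satisfy a second preset condition, which includes:
- the ideal execution order of the first instruction is after that of the third instruction;
- the first instruction may have a memory dependence on the third instruction.

A memory dependence is an order dependence between memory-access instructions caused by their operating on the same address.
With reference to the first aspect, in a possible implementation, the LQ includes the speculation ID field, and the method further includes:
- After the IB issues the first instruction to the LQ, the LQ allocates the first speculation ID to the first instruction and writes it into the speculation ID field of the first instruction in the LQ.
- The LQ sends the first speculation ID to the IB.

In this implementation, the IB determines the first speculation ID by receiving it from the LQ.
With reference to the first aspect, in a possible implementation, the speculation ID is a one-hot code.
ORing two one-hot speculation IDs yields a new speculation ID that still retains the information of both. One-hot IDs therefore let the apparatus quickly locate the source of a failed speculative operation. The instructions related to that operation can then be cleared and re-executed, which makes speculative execution of memory-access instructions more efficient in the dataflow architecture.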
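With reference to the first aspect, in a possible implementation, the method further includes: the LQ looks up and obtains, from the SB or the memory according to the first instruction, the data that the first instruction requests to read.
Optionally, the memory may be on-chip or off-chip memory. In other words, the memory is any storage device other than the SB.
With reference to the first aspect, in a possible implementation, the method further includes: after the data requested by the first instruction is obtained, the LQ transmits the first speculation ID to the speculation ID field of a second instruction in the IB. The second instruction depends on the first instruction.
After the speculative operation, the second instruction, which depends on the first instruction, also carries the first speculation ID. If the speculation fails, the instructions associated with the first speculation ID can be found from that ID, which improves the efficiency of speculative operations.
Optionally, the second instruction may also depend on a fourth instruction that has already undergone a speculative operation corresponding to a third speculation ID. In that case, the IB may generate a fourth speculation ID from the first and third speculation IDs and store it in the speculation ID field of the second instruction in the IB.
Because the speculation IDs are one-hot, the fourth speculation ID retains the information of both the first and third speculation IDs. It indicates both corresponding speculative operations at once, so several speculative operations can be traced from it. When any one of them fails, the instructions related to that operation can be traced and cleared selectively, which reduces the cost of speculation failure.
With reference to the first aspect, in a possible implementation, the IB further includes a speculation flag bit that indicates whether the current instruction has been issued speculatively. The method further includes: after the IB issues the first instruction to the LQ, the IB sets the speculation flag bit of the first instruction to yes.
With reference to the first aspect, in a possible implementation, the IB further includes the speculation flag bit described above. The method further includes:
- After the data requested by the first instruction is obtained, the IB issues the second instruction, which depends on the first instruction.
- After issuing the second instruction, the IB sets the speculation flag bit of the second instruction to yes.
With reference to the first aspect, in a possible implementation, the IB further includes a dependency valid bit and a dependency ready bit. The dependency valid bit indicates whether the current instruction must wait for another instruction to complete before it executes. The dependency ready bit indicates whether the instruction on which the current instruction depends has completed. The first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes and its dependency ready bit is set to no.
With reference to the first aspect, in a possible implementation, the IB further includes at least one operand field, at least one operand-field valid bit, and at least one operand-field ready bit, in one-to-one correspondence:
- the operand field stores input data of the current instruction;
- the operand-field valid bit indicates whether the corresponding operand field is valid;
- the operand-field ready bit indicates whether data is already present in the corresponding operand field.

The first preset condition further includes: a first operand-field valid bit is set to yes, and the operand-field ready bit corresponding to it is set to yes. The first operand-field valid bit is any one of the at least one operand-field valid bit.
With reference to the first aspect, in a possible implementation, the LSU further includes a store buffer (SB) that caches a queue of store instructions. The method further includes:
- The IB determines, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculation ID is wrong.
- If it is wrong, the IB re-issues the first instruction to the LQ.
With reference to the first aspect, in a possible implementation, the method further includes:
- The IB determines, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculation ID is wrong.
- If it is wrong, the IB re-issues the first instruction to the LQ.
With reference to the first aspect, in a possible implementation, the method further includes:
- After the IB re-issues the first instruction to the LQ, the LQ allocates a second speculation ID to the first instruction and writes it into the speculation ID field of the first instruction in the LQ.
- The LQ transmits the second speculation ID to the speculation ID field of the first instruction in the IB.
In the embodiments of this application, one speculation ID identifies one speculative operation. When the first instruction is re-executed, the LQ therefore allocates a new speculation ID to it. The instructions related to the new speculative operation can then be tracked anew, which improves the efficiency of speculative operations.
With reference to the first aspect, in a possible implementation, the method further includes:
- If the speculative operation corresponding to the first speculation ID is wrong, the IB broadcasts the first speculation ID to the at least one PE, the LQ, or the SB.
- The at least one PE, the LQ, or the SB compares the first speculation ID with the speculation IDs of the instructions it is executing, to determine whether they are associated.
- If they are associated, the at least one PE, the LQ, or the SB stops executing the instruction being executed and stops transmitting that instruction's data or dependency.
When the speculative operation of the first instruction is wrong, the IB broadcasts the first speculation ID of the first instruction to the pipeline of the graph computing apparatus. Only the instructions related to that speculative operation are then cleared. Instructions that follow it but are unrelated to it are not cleared. This reduces the cost of speculation failure and improves the efficiency of speculative operations.
With reference to the first aspect, in a possible implementation, the IB further includes a timestamp. The timestamp indicates the ideal execution order of memory-access instructions, a memory-access instruction being a store instruction or a load instruction.
The graph computing apparatus executes the instructions in a data dependency graph. When the data dependency graph is compiled, timestamps are assigned to the memory-access instructions to indicate their ideal execution order. The timestamps serve as auxiliary information during execution and support correct execution of memory-access instructions. This improves memory-access efficiency in the dataflow architecture.
With reference to the first aspect, in a possible implementation, the SB and the LQ also include the timestamp.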
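According to a second aspect, a graph computing apparatus is provided. The graph computing apparatus is based on a dataflow architecture.
- It includes at least one processing engine (PE) and a load store unit (LSU).
- The PE includes an information buffer (IB) that caches an instruction queue, and the PE executes the instructions cached in the IB.
- The IB includes a speculation bit and a speculation identifier (ID) field. The speculation bit indicates whether the current instruction can be executed speculatively. The speculation ID field stores the speculation ID of one speculative operation of the current instruction.
- The LSU includes a load queue (LQ) that caches a queue of load instructions.
A speculation bit and a speculation ID field are set for each instruction in the IB of the graph computing apparatus. The apparatus uses the speculation bit to decide whether the instruction can be executed speculatively, and uses the speculation ID to identify that speculative operation. The speculation ID thus marks the source of the speculation. When a misspeculation occurs, only the instructions associated with that speculation ID are cleared and re-executed; instructions unrelated to it are not. This reduces the cost of misspeculation and makes speculative execution of memory-access instructions more efficient in the dataflow architecture.
With reference to the second aspect, in a possible implementation, the speculation ID in the speculation ID field is a one-hot code.
With reference to the second aspect, in a possible implementation, the IB is configured to:
- issue a first instruction to the LQ, where the first instruction requests to read data and satisfies a first preset condition, the first preset condition including that the speculation bit of the first instruction in the IB is set to yes;
- determine a first speculation ID and store it in the speculation ID field of the first instruction in the IB, where the first speculation ID indicates the current speculative operation.
With reference to the second aspect, in a possible implementation, the LQ includes the speculation ID field and is configured to:
- after the IB issues the first instruction to the LQ, allocate the first speculation ID to the first instruction and write it into the speculation ID field of the first instruction in the LQ;
- send the first speculation ID to the IB.

The IB is specifically configured to receive the first speculation ID from the LQ.
With reference to the second aspect, in a possible implementation, the LQ is configured to look up and obtain, from the SB or the memory according to the first instruction, the data that the first instruction requests to read.
With reference to the second aspect, in a possible implementation, the IB is further configured to:
- after the data requested by the first instruction is obtained, issue a second instruction that depends on the first instruction;
- after issuing the second instruction, set the speculation flag bit of the second instruction to yes.
With reference to the second aspect, in a possible implementation, the IB further includes a dependency valid bit and a dependency ready bit. The dependency valid bit indicates whether the current instruction must wait for another instruction to complete before it executes. The dependency ready bit indicates whether the instruction on which the current instruction depends has completed.
With reference to the second aspect, in a possible implementation, the first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes and its dependency ready bit is set to no.
With reference to the second aspect, in a possible implementation, the LSU further includes a store buffer (SB) that caches a queue of store instructions.
With reference to the second aspect, in a possible implementation, the IB is further configured to:
- after issuing the first instruction, issue a third instruction, which is a store instruction, to the SB;
- after issuing the third instruction, send the store address of the third instruction to the LQ.

The first instruction and the third instruction satisfy a second preset condition, which includes:
- the ideal execution order of the first instruction is after that of the third instruction;
- the first instruction may have a memory dependence on the third instruction.

A memory dependence is an order dependence between memory-access instructions caused by their operating on the same address.
With reference to the second aspect, in a possible implementation, the IB is further configured to:
- determine, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculation ID is wrong;
- if it is wrong, re-issue the first instruction to the LQ.
With reference to the second aspect, in a possible implementation, the LQ is further configured to:
- after the IB re-issues the first instruction to the LQ, allocate a second speculation ID to the first instruction and write it into the speculation ID field of the first instruction in the LQ;
- transmit the second speculation ID to the speculation ID field of the first instruction in the IB.
With reference to the second aspect, in a possible implementation:
- The IB is further configured to broadcast the first speculation ID to the at least one PE, the LQ, or the SB if the speculative operation corresponding to the first speculation ID is wrong.
- The at least one PE, the LQ, or the SB is configured to compare the first speculation ID with the speculation IDs of the instructions it is executing, to determine whether they are associated. If they are, it stops executing the instruction being executed and stops transmitting that instruction's data or dependency.
With reference to the second aspect, in a possible implementation, the IB further includes a timestamp. The timestamp indicates the ideal execution order of memory-access instructions, a memory-access instruction being a store instruction or a load instruction.
With reference to the second aspect, in a possible implementation, the SB and the LQ also include the timestamp.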
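According to a third aspect, a method for processing instructions is provided. The method includes:
- obtaining program code;
- determining a plurality of instructions in the program code and the dependences among them;
- determining a data dependency graph from the plurality of instructions and the dependences.

Determining the instructions and the dependences includes the following. For instructions whose dependences cannot be identified, a speculable dependence is established between a first instruction and a third instruction that satisfy a second preset condition. The second preset condition includes:
- the first instruction is a load instruction;
- the third instruction is a store instruction;
- the ideal execution order of the first instruction is after that of the third instruction;
- the first instruction may have a memory dependence on the third instruction.

A memory dependence is an order dependence between memory-access instructions caused by their operating on the same address.
A speculable dependence means that the first instruction may depend on the third instruction and that the first instruction can be executed speculatively.
When the data dependency graph is compiled, the compiler can use the way a dataflow architecture executes a data dependency graph to establish speculable dependences where a store instruction is followed by a load instruction. Speculable dependences are thus identified at compile time, and the dataflow architecture can perform speculative operations when it later executes the program. This improves memory-access efficiency in the dataflow architecture.
With reference to the third aspect, in a possible implementation, determining the instructions and the dependences further includes:
- for instructions whose dependences can be identified, establishing the dependences between them;
- for instructions that can be identified as having no dependence, not establishing a dependence between them.
With reference to the third aspect, in a possible implementation, determining the instructions and the dependences further includes assigning timestamps to the memory-access instructions among the plurality of instructions. The timestamps indicate the ideal execution order among the memory-access instructions, a memory-access instruction being a store instruction or a load instruction.
With reference to the third aspect, in a possible implementation, if the memory-access instructions span multiple branches, counting resumes where the branches merge from the last timestamp of a first branch. The first branch is the branch with the most memory-access instructions.
With reference to the third aspect, in a possible implementation, the dependences include at least one of the following: a data dependence, a memory dependence, or a control dependence.
According to a fourth aspect, a method for processing instructions is provided. The method includes:
- obtaining program code;
- determining a plurality of instructions in the program code and the dependences among them;
- determining a data dependency graph from the plurality of instructions and the dependences.

Determining the instructions and the dependences includes assigning timestamps to the memory-access instructions among the plurality of instructions. The timestamps indicate the ideal execution order among the memory-access instructions, a memory-access instruction being a store instruction or a load instruction.
When the data dependency graph is compiled, timestamps are assigned to the memory-access instructions to indicate their ideal execution order. The timestamps serve as auxiliary information during execution and support correct execution of memory-access instructions.
With reference to the fourth aspect, in a possible implementation, if the memory-access instructions span multiple branches, counting resumes where the branches merge from the last timestamp of a first branch. The first branch is the branch with the most memory-access instructions.
When several branches run in parallel, the timestamps fix the execution order where the branches merge. This improves memory-access efficiency in the dataflow architecture.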
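According to a fifth aspect, an apparatus for processing instructions is provided. The apparatus includes functional units configured to perform either of the following:
- the method in the third aspect or any possible implementation of the third aspect;
- the method in the fourth aspect or any possible implementation of the fourth aspect.
According to a sixth aspect, a computer storage medium is provided for storing instructions. When run on a graph computing apparatus, the instructions cause the graph computing apparatus to perform the method in the first aspect or any possible implementation of the first aspect.
According to a seventh aspect, a computer storage medium is provided for storing instructions. When run on a computer, the instructions cause the computer to perform either of the following:
- the method in the third aspect or any possible implementation of the third aspect;
- the method in the fourth aspect or any possible implementation of the fourth aspect.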
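Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a graph computing apparatus 100 according to an embodiment of this application.
FIG. 2 is a schematic flowchart of a method for processing instructions according to an embodiment of this application.
FIG. 3 is a schematic diagram of a data dependency graph according to an embodiment of this application.
FIG. 4 is a schematic diagram of a data dependency graph annotated with timestamps according to an embodiment of this application.
FIG. 5 is a schematic structural diagram of the IB 111 according to an embodiment of this application.
FIG. 6 is a schematic structural diagram of the LQ 141 according to an embodiment of this application.
FIG. 7 is a schematic diagram of speculation IDs according to an embodiment of this application.
FIG. 8 is a schematic structural diagram of the SB 142 according to an embodiment of this application.
FIG. 9 is a schematic diagram of a method for processing instructions according to an embodiment of this application.
FIG. 10 is a schematic diagram of the instruction execution flow of the graph computing apparatus 100 according to an embodiment of this application.
FIG. 11 is a schematic diagram of a data dependency graph according to an embodiment of this application.
FIG. 12 to FIG. 19 are schematic diagrams of the states of the graph computing apparatus 100 at different stages of executing the data dependency graph in FIG. 11.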
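Description of Embodiments
The following describes the technical solutions of this application with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a graph computing apparatus 100 according to an embodiment of this application. As shown in FIG. 1, the graph computing apparatus 100 includes:
- a processing engine (process engine, PE) 110;
- a graph build unit (GBU) 120;
- a data and predication bus (data&predication bus) 130;
- a load store unit (LSU) 140.

The graph computing apparatus 100 may include one or more PEs 110. FIG. 1 shows eight PEs 110 as an example; the number of PEs 110 may be smaller or larger.
Optionally, the graph computing apparatus 100 may be part of a central processing unit (CPU). It then uses the instruction-level parallelism and low power consumption of the graph architecture to accelerate instruction execution and reduce CPU power consumption.
The GBU 120 mainly initializes graph instructions and sends them to the PEs 110.
The PE 110 mainly executes the graph instructions placed in it and sends data or requests to other units in the graph computing apparatus 100, for example other PEs 110 or the LSU 140. The PE 110 includes an information buffer (IB) 111. The IB 111 caches dataflow instructions and selects ready instructions to issue for execution, either inside the PE 110 or in the LSU 140.
Optionally, the PE 110 further includes a control unit (not shown in FIG. 1) that controls the IB 111 to perform its functions, for example issuing instructions or receiving data and information. In other words, in the embodiments of this application, the method performed by the IB 111 is controlled by the control unit in the PE 110. For brevity, these functions are described as being performed by the IB 111.
The data and predication bus 130 mainly transfers data and dependency information between the PEs 110. Dependency information is information that indicates the dependences between instructions.
The LSU 140 mainly receives and executes memory-access instructions from the PEs 110. Memory-access instructions include store instructions and load instructions. The LSU 140 further includes a load queue (LQ) 141 and a store buffer (SB) 142:
- the LQ 141 caches the queue of instructions that request to read data from memory;
- the SB 142 caches the queue of instructions that request to store data to memory.

In some examples, the SB 142 can also forward data to the LQ 141, which avoids the power consumption and latency of a memory access.
In some examples, the IB 111 may also be called an instruction information cache, the LQ 141 a load request cache, and the SB 142 a store request cache.
Optionally, the LSU 140 further includes a control unit (not shown in FIG. 1) that controls the LQ 141 and the SB 142 to perform their functions, for example sending or receiving data and information. In other words, the methods performed by the LQ 141 and the SB 142 are controlled by the control unit in the LSU 140. For brevity, these functions are described as being performed by the LQ 141 or the SB 142.
FIG. 1 also shows a CPU front-end 200, which mainly includes the instruction fetch and decode units of the CPU front end. It reads and parses instructions from memory and sends graph instructions to the graph computing apparatus 100. The CPU front-end 200 is not part of the hardware of the graph computing apparatus 100.
The graph computing apparatus 100 is a hardware system based on a dataflow architecture. The dataflow architecture explicitly describes the dependences between instructions at the instruction-set level, which exposes instruction-level parallelism directly to the hardware. A dataflow architecture can be abstracted as a directed graph of N nodes, namely a data dependency graph, in which a connection between two nodes represents a dataflow. As soon as all inputs of a node are ready, the node can compute and pass its result to the next node. Nodes in the same graph that are not on the same path can run concurrently.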
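The firing rule above can be sketched as a small interpreter. This is an illustrative model only, not the apparatus's implementation:
- each node is a Python function with a fixed number of operand slots;
- each edge names only a destination node and operand slot, mirroring how dataflow assembly names output destinations rather than inputs;
- a node fires as soon as all of its slots are filled.

The names `run_dataflow`, `inst1` and `inst2` are hypothetical.

```python
from collections import deque
import operator


def run_dataflow(nodes, edges, inputs):
    """nodes: name -> (arity, fn); edges: name -> [(dest, slot)];
    inputs: (dest, slot) -> value for operands available at start."""
    slots = {name: [None] * arity for name, (arity, _) in nodes.items()}
    ready = deque()
    results = {}

    def deliver(dest, slot, value):
        slots[dest][slot] = value
        if all(v is not None for v in slots[dest]):
            ready.append(dest)          # all inputs present: the node may fire

    for (dest, slot), value in inputs.items():
        deliver(dest, slot, value)

    while ready:                        # nodes on different paths fire independently
        name = ready.popleft()
        _, fn = nodes[name]
        results[name] = fn(*slots[name])
        for dest, slot in edges.get(name, []):
            deliver(dest, slot, results[name])
    return results


# Instructions 1 and 2 from the background: R3 = R1 + R2; R4 = R5 + R3.
nodes = {"inst1": (2, operator.add), "inst2": (2, operator.add)}
edges = {"inst1": [("inst2", 1)]}       # inst1 sends its result to slot 1 of inst2
print(run_dataflow(nodes, edges, {("inst1", 0): 1, ("inst1", 1): 2, ("inst2", 0): 5}))
# {'inst1': 3, 'inst2': 8}
```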
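The graph computing apparatus 100 in FIG. 1 is merely an example. In practice, its units may be merged or replaced, and it may include more or fewer units. This is not limited in the embodiments of this application.
Assembly code for a dataflow architecture also differs from mainstream von Neumann assembly. An assembly instruction does not specify its inputs; it specifies only the destinations of its output. Code written this way lets the hardware easily find the dependency chains between instructions.
This application exploits this property to optimize memory-access instructions. It proposes a method for speculatively executing memory-access instructions in a dataflow architecture, which makes those instructions more efficient. The method builds on the way a dataflow architecture executes a data dependency graph: it uses the speculability of memory-access instructions to accelerate graph instructions and lowers the cost of re-execution after a failed speculation. After a failure, only the instructions that depend on the misspeculated result are re-executed. This improves the memory-access efficiency of graph computing apparatuses based on the dataflow architecture.
A data dependency graph may be a dataflow graph made of nodes and directed arcs connecting the nodes. The nodes represent the operations or functions to be executed, and the directed arcs represent the order in which the nodes are executed. Nodes on different paths of a data dependency graph can be executed in parallel.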
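A speculative operation is one in which an instruction that may depend on another, not yet executed, instruction is executed first. If, after the other instruction executes, it is confirmed that the two have no dependence, the speculation succeeds. If a dependence is confirmed, the speculation fails.
Before the graph computing apparatus 100 processes instructions, a compiler first analyzes the dependences between the instructions of the original program. Based on those dependences, it compiles the program into an instruction set based on a data dependency graph. The compiled instruction set is then sent to the IB 111 of the graph computing apparatus 100 for execution. The compiler-side method for processing instructions is described first.
FIG. 2 is a schematic flowchart of a method for processing instructions according to an embodiment of this application. The method may be performed by a compiler. As shown in FIG. 2, the method includes the following steps.
S201: Obtain program code.
The program code may be a source file written in a language supported by a development tool.
S202: Determine a plurality of instructions in the program code and the dependences among them.
Optionally, the dependences include at least one of the following:
- a data dependence: an instruction can execute only after it obtains data transmitted by another instruction;
- a memory dependence: an order dependence between memory-access instructions caused by their operating on the same address;
- a control dependence: a conditional dependence caused by control flow. For example, if a preceding instruction is a conditional statement such as an if-else statement, the following instruction has a control dependence on it.
An order dependence means that one instruction must execute after another completes. Memory, data and control dependences can all give rise to order dependences; in other words, order dependences include data, memory and control dependences.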
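After obtaining the original instructions, the compiler analyzes their dependences and determines the ones it can identify, in order to build the data dependency graph. In that graph, instructions are abstracted as instruction nodes. Determining the instructions and their dependences covers, but is not limited to, the following three cases.
Case 1: for instructions whose dependences can be identified, the dependences between them are established.
Case 2: for instructions that can be identified as having no dependence, no dependence is established between them.
Case 3: for instructions whose dependences cannot be identified, the compiler can use a first handling approach. It establishes a speculable dependence between a first instruction and a third instruction that satisfy the second preset condition. The first instruction is a load instruction and the third instruction is a store instruction. A speculable dependence means that the first instruction may depend on the third instruction and that the first instruction can be executed speculatively. For such a dependence, the compiler records a dependence between the two instructions and also marks the later instruction as speculatively executable.
The second preset condition includes:
- the first instruction is a load instruction;
- the third instruction is a store instruction;
- the ideal execution order of the first instruction is after that of the third instruction;
- the first instruction may have a memory dependence on the third instruction.

A memory dependence is an order dependence between memory-access instructions caused by their operating on the same address.
Case 3 mainly concerns memory-access instructions, that is, store or load instructions. The compiler here mainly analyzes memory dependences between memory-access instructions, that is, memory aliasing.
In case 3, the compiler cannot tell whether a memory dependence exists between the memory-access instructions; this becomes known only at run time. For these instructions, a speculable dependence may be established where a store instruction is followed by a load instruction. The graph computing apparatus 100 may then perform speculative operations on instructions that have a speculable dependence. "A store instruction followed by a load instruction" means that the ideal execution order of the store precedes that of the load; other instructions may or may not lie between them. The other cases below, such as a load followed by a load, follow the same convention and are not described again.
In a second handling approach for case 3, the compiler establishes no dependence for memory-access instructions whose memory dependences cannot be identified. Instead, the graph computing apparatus 100 executes them speculatively directly. Memory dependences between memory-access instructions fall into four cases:
- a store followed by a load;
- a store followed by a store;
- a load followed by a load;
- a load followed by a store.

The handling of each case without establishing a dependence is analyzed below.
In some examples, no dependence is established for a load followed by a store. During later execution, the graph computing apparatus 100 can query the SB 142 for the order and address information of previously issued memory-access instructions and read data from the correct place. While the apparatus executes a program, the data of store instructions actually resides in the SB 142; the stored data in memory is updated only after the program finishes. Suppose a program has a load followed by a store to the same address, but the store, which should have been issued after the load, is actually issued first. Using the order and address information, the apparatus finds a store with the same address in the SB 142 whose order is later than the load. It therefore directs the load to read from memory rather than from the SB 142.
In some examples, no dependence is established for a store followed by a store. Issuing stores in order does not affect execution efficiency, so to keep the flow simple the graph computing apparatus 100 may enforce total store ordering (TSO), also called absolute ordering, for store instructions. The apparatus then guarantees that stores execute strictly in their ideal order, so no speculable dependence between stores is needed.
In some examples, no dependence is established for a load followed by a load. If no store instruction lies between the two loads, both read the same value from the same address, so there is no order dependence. If a store instruction does lie between them, the case is handled as a store followed by a load.
In some examples, the compiler also need not establish a dependence for a store followed by a load. Instead, the graph computing apparatus 100 identifies the access order, on the same principle as the hardware solution for a load followed by a store. Whenever it receives a store instruction, the apparatus queries the LQ 141 for the order and address information of previously issued loads. If there is a load with the same address that should have been issued after this store, the apparatus triggers a misspeculation for that load and re-executes it. Otherwise, the store has no problem caused by out-of-order issue.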
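The compiler-side rules above can be summarized as a small decision function. This is a sketch under the following assumptions:
- instructions are reduced to their kind (`"ST"` or `"LD"`) in ideal order;
- `alias` is `True` or `False` when the compiler can decide whether the two addresses overlap, and `None` when it cannot;
- the names `Edge` and `classify` are hypothetical.

It follows the first handling approach for case 3. It also models store ordering as a compiler-built chain, which the text notes may instead be left to hardware.

```python
from enum import Enum


class Edge(Enum):
    NONE = "no edge"
    ORDER = "order dependence"            # proven, or total store ordering
    SPECULABLE = "speculable dependence"  # the later load may issue speculatively


def classify(first, second, alias):
    """first/second: 'ST' or 'LD' in ideal execution order.
    alias: True/False if the compiler can decide whether the addresses
    overlap, None if it cannot."""
    if first == "LD" and second == "LD":
        return Edge.NONE          # two loads never need ordering
    if first == "ST" and second == "ST":
        return Edge.ORDER         # total store ordering chain
    if alias is True:
        return Edge.ORDER         # case 1: dependence identified
    if alias is False:
        return Edge.NONE          # case 2: independence identified
    if first == "ST" and second == "LD":
        return Edge.SPECULABLE    # case 3: store followed by load
    return Edge.NONE              # load followed by store: the SB resolves it at run time


# Pairs from FIG. 3: instruction 1 (ST [x]) with instruction 2 (LD [x+i])
# and with instruction 3 (LD [x]).
print(classify("ST", "LD", None))   # Edge.SPECULABLE
print(classify("ST", "LD", True))   # Edge.ORDER
```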
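FIG. 3 is a schematic diagram of a data dependency graph according to an embodiment of this application. As shown in FIG. 3:
- Instruction 1 stores the data of register 1 to memory [x].
- Instruction 2 reads data from memory [x+i] into register 2.
- Instruction 3 reads data from memory [x] into register 9.
- Instruction 4 stores the data of register 9 to memory [x+n].
- Instruction 5 adds the values of register 3 and register 2 and stores the result in register 4.
- Instruction 6 adds the values of register 6 and register 5 and stores the result in register 7.
By analyzing the dependences of these instructions, the compiler obtains the following results.
Instruction 1 and instruction 3: the compiler can identify a memory dependence and an order dependence between them, because both access address x. Instruction 3 must therefore wait until instruction 1 has written memory before it executes, so that it reads the correct data. For this obvious order dependence, the compiler adds an edge representing an order dependence (edge 1) between the two nodes.
Instruction 1 and instruction 2: if i is 0, an order dependence exists, because instruction 1 writes data to address x and instruction 2 reads the data at address x after instruction 1 completes. If i is not 0, there is no order dependence between them. The value of i is available only while the graph computing apparatus 100 runs, so the compiler cannot know it in advance. To ensure that the program runs correctly, the compiler assumes a speculable dependence between the two instructions and adds an edge representing it (edge 2) between the two nodes.
Instruction 2 and instruction 5: they have a data dependence that the compiler can identify, so an edge representing the data dependence (edge 3) is added between them.
Instruction 4 and the load instructions: they have no dependence, and the compiler can identify this. The node of instruction 4 therefore has no dependency edge to the load instructions.
Instruction 1 and instruction 4: to enforce total store ordering, the store instructions are chained into an instruction dependency chain in execution order (edge 4 in FIG. 3). Optionally, total store ordering may be enforced by hardware, in which case the compiler does not need to build the chain.
S203: Determine a data dependency graph from the plurality of instructions and the dependences.
In this embodiment, when the data dependency graph is compiled, the compiler can use the way a dataflow architecture executes a data dependency graph to establish speculable dependences where a store instruction is followed by a load instruction. Speculable dependences are thus identified at compile time, and the dataflow architecture can perform speculative operations when it later executes the program. This improves memory-access efficiency in the dataflow architecture.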
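Optionally, determining the instructions and their dependences further includes assigning timestamps to the memory-access instructions among the plurality of instructions. The timestamps indicate the ideal execution order among the memory-access instructions, a memory-access instruction being a store instruction or a load instruction.
In some examples, the timestamps apply only to memory-access instructions and not to other types of instructions. For example, instructions 5 and 6 in FIG. 3 are not memory-access instructions, so they need no timestamp.
Optionally, if the memory-access instructions span multiple branches, counting resumes where the branches merge from the last timestamp of a first branch. The first branch is the branch with the most memory-access instructions.
In this embodiment, when the data dependency graph is compiled, timestamps are assigned to the memory-access instructions to indicate their ideal execution order. The timestamps serve as auxiliary information during execution and support correct execution of memory-access instructions. When several branches run in parallel, they also fix the execution order where the branches merge. This improves memory-access efficiency in the dataflow architecture.
FIG. 4 is a schematic diagram of a data dependency graph annotated with timestamps according to an embodiment of this application. ST denotes a store instruction and LD denotes a load instruction. As shown in FIG. 4, to ensure correct execution of memory-access instructions, the compiler assigns timestamps (stamp) to the memory-access instructions in the program in advance, in their ideal execution order. In the figure, timestamps 1 to 8 identify the ideal execution order of the memory-access instructions.
- When the program reaches an if-else branch, each branch starts counting from the same timestamp.
- Where the branches merge, counting resumes from the last timestamp of the branch with the most memory-access instructions.

For example, two branches follow timestamp 1, so each starts counting at timestamp 2. Where they merge, the right branch contains more memory-access instructions, so counting continues from its last timestamp, 5.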
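The branch-merge rule can be sketched as a short recursive pass. This is a minimal sketch with several assumptions:
- Program representation: a list of access labels, with `("if", branch_a, branch_b, ...)` tuples for branches. The labels and the branch layout are hypothetical and merely consistent with the FIG. 4 description.
- "Counting continues from timestamp 5" is read as the next access receiving stamp 6, so eight accesses receive stamps 1 to 8.
- "The branch with the most accesses" is implemented as the branch that ends on the largest stamp. For straight-line branches the two are the same.

```python
def assign_stamps(program, next_stamp=1, out=None):
    """program: list whose items are access labels (str) or
    ("if", branch_a, branch_b, ...) tuples of nested lists.
    Returns ([(label, stamp), ...], next free stamp)."""
    if out is None:
        out = []
    for item in program:
        if isinstance(item, tuple) and item[0] == "if":
            ends = []
            for branch in item[1:]:
                # Every branch restarts from the same stamp.
                _, end = assign_stamps(branch, next_stamp, out)
                ends.append(end)
            # Resume after the last stamp of the longest branch.
            next_stamp = max(ends)
        else:
            out.append((item, next_stamp))
            next_stamp += 1
    return out, next_stamp


program = ["ST_a",
           ("if", ["LD_b", "ST_c"],                   # left branch
                  ["LD_d", "ST_e", "LD_f", "ST_g"]),  # right branch, more accesses
           "LD_h", "ST_i", "LD_j"]
stamps = dict(assign_stamps(program)[0])
print(stamps["ST_g"], stamps["LD_h"], stamps["LD_j"])   # 5 6 8
```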
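The hardware-side solution of the embodiments, that is, the solution on the graph computing apparatus 100 side, is described next.
As shown in FIG. 1, three storage structures may be placed in the graph computing apparatus 100 to support memory-access instructions:
- the IB 111 caches dataflow instructions and selects instructions that satisfy the conditions for issue and execution;
- the LQ 141 caches the queue of load instructions;
- the SB 142 caches the queue of store instructions. It can also forward data directly to the LQ 141, avoiding the power consumption and latency of a memory access.
The structures and functions of the IB 111, the LQ 141 and the SB 142 are described in detail below with reference to the accompanying drawings.
(1) IB 111
FIG. 5 is a schematic structural diagram of the IB 111 according to an embodiment of this application. As shown in FIG. 5, the IB 111 may include multiple fields, defined in Table 1 below. The fields in FIG. 5 are merely examples; the IB 111 may include more or fewer fields.
Table 1
[Table 1 appears as images PCTCN2020127243-appb-000001 and PCTCN2020127243-appb-000002 in the original publication.]
The instruction field (Inst) and the operand fields (op0/1) carry the information needed to run the dataflow architecture and direct the graph computing apparatus 100 to perform the operations of the instruction. The instruction field (Inst) indicates the operation type of the instruction, for example a store instruction (ST) or a load instruction (LD). The operand fields (op0/1) hold the input data of the current instruction, which may include addresses or parameters.
Optionally, each instruction in the IB 111 may have one or more operand fields (op0/1), one or more operand-field valid bits (vld0/1) and one or more operand-field ready bits (rdy0/1), in one-to-one correspondence. The embodiments use two of each as an example.
In some examples, an instruction may need only one operand field. In that case the valid bit (vld0) of the first operand field (op0) is 1 and the valid bit (vld1) of the second operand field (op1) is 0. If an instruction needs two operand fields, both valid bits (vld0/1) are set to 1. In the embodiments, 1 means valid and 0 means invalid. This is only an example; 0 may instead mean valid and 1 invalid.
Issuing an instruction means that the IB 111 starts executing the instruction and sends its information to the corresponding unit in the graph computing apparatus 100 for execution.
In some examples, the operand-field ready bit (rdy0/1) indicates that the input required by the instruction is already present in op0/1. When every operand field whose valid bit (vld0/1) is set also has its ready bit (rdy0/1) set, the inputs of the instruction are ready and it can be issued.
For example, if the second operand-field valid bit (vld1) is 1 and the second operand-field ready bit (rdy1) is 0, the instruction still has an operand field that is not ready, so it cannot yet be issued.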
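In one case, the graph computing apparatus 100 does not require the valid bits (vld0/1) and ready bits (rdy0/1) to be set together before an instruction can execute. This applies to any instruction whose speculation flag bit (sgo) is 1: if any one of its operand fields (op0/1) is updated and the corresponding valid bit (vld0/1) is set to 1, the instruction can be issued, whether or not its other operand field is valid. This happens when an earlier load has misspeculated and subsequent instructions must be re-triggered. The updated operand field is the one that caused the misspeculation, and this update brings in the correct data. The other operand field still holds the data stored during the earlier speculative operation, so its validity need not be checked.
The dependency valid bit (prd) indicates whether the current instruction must wait for another instruction to complete before it executes. It can represent dependences added by the compiler. For example, when the compiler determines that two instructions have a dependence, the dependency valid bit (prd) can be set to yes.
In some examples, the dependency valid bit (prd) indicates only dependences other than data dependences. The case in which the current instruction can execute only after receiving data from another instruction then need not be indicated by prd. For example, prd may indicate that the current instruction has a memory or control dependence on another instruction, but not a data dependence.
In some examples, if the current instruction has a speculable dependence on another instruction, the dependency valid bit can be set to yes.
The dependency ready bit (prdy) corresponds to the dependency valid bit (prd). It indicates whether the instruction on which the current instruction depends has completed.
Optionally, if the dependency valid bit (prd) is not used for data dependences, the dependency ready bit (prdy) likewise does not track instructions that the current instruction depends on for data. It instead indicates whether an instruction on which the current instruction has a memory or control dependence has completed.
The speculation bit (spc) indicates that the current memory-access instruction may be executed speculatively. Normally, an instruction whose dependency valid bit (prd) is 1 must wait for the instruction it depends on to complete. A memory-access instruction whose speculation bit (spc) is 1, however, can execute without waiting for the memory-access instruction it depends on, that is, without waiting for the corresponding dependency ready bit (prdy) to become 1. This is called speculative execution, or a speculative operation, of the memory-access instruction. When a memory-access instruction is issued speculatively, its speculation flag bit (sgo) can be set to 1.
For example, when the compiler records a speculable dependence of the current instruction on another instruction, both the dependency valid bit (prd) and the speculation bit (spc) in the IB 111 can be set to yes.
For example, consider instructions whose memory dependences the compiler cannot identify. If the compiler does not record a speculable dependence for them, the speculation bit (spc) in the IB 111 can be set to yes and the dependency valid bit (prd) to no. This is the second handling approach for case 3 in FIG. 2.
In other words, however the dependency valid bit (prd) is set, an instruction whose speculation bit (spc) is yes can be executed speculatively.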
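The issue decision for a single IB entry, as described above, can be sketched as follows. This is a sketch only:
- The dataclass fields follow the field names in the text, but `IBEntry`, `operands_ready` and `may_issue` are hypothetical names.
- The re-issue rule for entries with sgo = 1, where one updated operand field suffices, is not modeled.

The two sample entries mirror instructions 2 and 3 as they are described later for FIG. 12.

```python
from dataclasses import dataclass, field


@dataclass
class IBEntry:
    inst: str                                                   # e.g. "LD", "ST", "ADD"
    vld: list = field(default_factory=lambda: [False, False])   # vld0/1
    rdy: list = field(default_factory=lambda: [False, False])   # rdy0/1
    prd: bool = False    # waits on a non-data dependence
    prdy: bool = False   # that dependence has completed
    spc: bool = False    # may be issued speculatively
    sgo: bool = False    # has already been issued speculatively


def operands_ready(e):
    # Every operand field that is in use (vld) must hold its data (rdy).
    return all(r for v, r in zip(e.vld, e.rdy) if v)


def may_issue(e):
    """Return 'speculative', 'normal', or None for one IB entry."""
    if not operands_ready(e):
        return None
    if e.spc and not e.sgo and not (e.prd and e.prdy):
        return "speculative"   # spc set: no need to wait for prdy
    if not e.prd or e.prdy:
        return "normal"        # no pending dependence
    return None


inst2 = IBEntry("LD", vld=[True, False], rdy=[True, False], prd=True, spc=True)
inst3 = IBEntry("ST", vld=[True, True], rdy=[True, True], prd=True)
print(may_issue(inst2), may_issue(inst3))   # speculative None
```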
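The timestamp (stamp) indicates the ideal execution order among memory-access instructions. It is stored in the LQ or the SB when a memory-access instruction executes. Because of speculative execution, memory-access instructions do not necessarily execute in timestamp order when the program actually runs; without speculative execution, they would.
Optionally, after a load instruction is executed speculatively, the LQ 141 assigns a speculation identifier (ID) (spcID) to the speculatively read data. Data obtained through a speculative operation may be called speculative data. The speculative data may be passed on to subsequent instructions. In that case, the speculation ID (spcID) fields of the instructions that use it, and of the instructions that use data derived from it, all carry this speculation ID, that is, the information of the speculation source. This marks the data as speculative, so that the subsequent instructions can be cleared and corrected if the speculation is wrong.
(2) LQ 141
FIG. 6 is a schematic structural diagram of the LQ 141 according to an embodiment of this application. As shown in FIG. 6, the LQ 141 includes multiple fields, defined in Table 2 below. The fields in FIG. 6 are merely examples; the LQ 141 may include more or fewer fields.
Table 2
Field | Abbreviation | Definition
Address | addr | Address field of the load instruction, used to look up memory
Destination | dest | After the read data returns, it is sent to the IB operand field indicated by the destination
Data | data | Temporarily holds the data returned for the load instruction
Timestamp | stamp | Indicates the ideal execution order of memory-access instructions
Speculation ID | spcID | Marks the load instruction as speculatively executed
Ready bit | rdy | Indicates that the data has been returned from memory or the SB
Valid bit | vld | Indicates that the load instruction is valid
After a load instruction is issued from the IB 111 to the LQ 141, the LQ 141 records the address, destination and other information of the load. It also sets the load's valid bit to 1 to indicate that the load is valid.
If a misspeculation occurs later, the graph computing apparatus 100 must be able to flush the operations derived from the speculative operation quickly. For this purpose, the LQ 141 allocates a speculation ID to the speculative operation of the load. A speculation ID of 0 means that the instruction was not produced by speculation.
Optionally, the speculation ID generated by the LQ 141 is a one-hot code, in which each state is represented by a single bit. A one-hot code therefore has as many bits as there are states, and each bit of the code represents one speculation ID.
FIG. 7 is a schematic diagram of speculation IDs according to an embodiment of this application. As shown in FIG. 7, when the two inputs of an instruction come from different speculation IDs, the instruction's speculation ID must retain both.
- With one-hot speculation IDs, ORing the two yields the new speculation ID. For example, ORing speculation ID 001 with speculation ID 010 yields 011, which still retains the ID information of both speculative operations.
- If one speculation ID is later found to have failed, the graph computing apparatus 100 only needs to AND the failed ID with the ID currently being executed. A nonzero result means that the data comes from misspeculated data.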
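The OR-merge and AND-check on one-hot speculation IDs can be sketched with Python integers. The width of 3 bits matches the IDs shown in the figures. The allocate/release policy is an assumption, since the text does not say when an ID bit is recycled; here a bit is handed out only while it is free and is returned by `release` once its speculation is resolved. All names are hypothetical.

```python
class SpecIdAllocator:
    """Hands out one-hot speculation IDs; bit i marks one outstanding speculation."""

    def __init__(self, width=3):
        self.width = width
        self.in_use = 0

    def allocate(self):
        for i in range(self.width):
            bit = 1 << i
            if not self.in_use & bit:
                self.in_use |= bit
                return bit
        raise RuntimeError("no free speculation ID")   # e.g. stall the load

    def release(self, spc_id):
        self.in_use &= ~spc_id


def merge(a, b):
    """ID for an instruction whose inputs come from two speculations."""
    return a | b


def derived_from(current, failed):
    """True if data tagged `current` depends on the failed speculation."""
    return (current & failed) != 0


alloc = SpecIdAllocator()
id1, id2 = alloc.allocate(), alloc.allocate()      # 0b001, 0b010
both = merge(id1, id2)                             # 0b011
print(format(both, "03b"), derived_from(both, id2), derived_from(id1, id2))
# 011 True False
```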
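The speculation ID is stored in the speculation ID (spcID) fields of both the LQ 141 and the IB 111. When returning data, the LQ 141 sends the speculation ID together with the data to the destination of the load instruction. If an instruction's input comes from a speculative operation, the source of that speculation can therefore be found quickly. Carrying the speculation ID means that, when the graph computing apparatus 100 finds a wrong speculative operation, it can efficiently stop and flush every instruction that uses data derived from that operation.
In some examples, while writing a load instruction into the LQ 141, the graph computing apparatus 100 may use the timestamp and address of the load to query the SB 142.
- If the SB 142 holds a store with the same address and a smaller timestamp than the load, the store precedes the load. The apparatus sends the store's data in the SB 142 to the LQ 141 and returns it to the destination of the load.
- If the SB 142 holds no request with the same address, or the store with the same address has a larger timestamp than the load, that store comes after the load. The apparatus sends the load to memory. When memory returns the data, the data is stored in the data field of the LQ 141 and returned to the destination of the load.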
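The SB query just described amounts to picking, among valid stores to the same address, the one with the largest timestamp that is still smaller than the load's timestamp. The sketch below assumes the following:
- SB entries are plain dicts carrying the Table 3 fields;
- memory is a dict;
- the function name `lookup_load` and the sample addresses are hypothetical.

```python
def lookup_load(load_addr, load_stamp, store_buffer, memory):
    """Return (data, source) for a load, forwarding from the SB when possible.

    store_buffer: list of dicts with keys addr, data, stamp, vld (SB fields).
    memory: dict mapping address -> value.
    """
    older_same_addr = [e for e in store_buffer
                       if e["vld"] and e["addr"] == load_addr
                       and e["stamp"] < load_stamp]
    if older_same_addr:
        # The youngest store that is still older than the load supplies the data.
        entry = max(older_same_addr, key=lambda e: e["stamp"])
        return entry["data"], "SB"
    # No matching store, or only stores that come later in ideal order.
    return memory.get(load_addr, 0), "memory"


sb = [{"addr": 0x40, "data": "a", "stamp": 0, "vld": True},
      {"addr": 0x40, "data": "c", "stamp": 5, "vld": True},
      {"addr": 0x80, "data": "b", "stamp": 2, "vld": True}]
mem = {0x40: 72}
print(lookup_load(0x40, 1, sb, mem))   # ('a', 'SB')
print(lookup_load(0x40, 9, sb, mem))   # ('c', 'SB')
print(lookup_load(0x40, 0, sb, mem))   # (72, 'memory')
```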
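(3) SB 142
FIG. 8 is a schematic structural diagram of the SB 142 according to an embodiment of this application. As shown in FIG. 8, the SB 142 includes multiple fields, defined in Table 3 below. The fields in FIG. 8 are merely examples; the SB 142 may include more or fewer fields.
Table 3
Field | Abbreviation | Description
Address | Addr | Address field of the store instruction, used to write the data to the corresponding memory location
Data | data | Temporarily holds the data of the store instruction
Timestamp | stamp | Indicates the ideal execution order among memory-access instructions
Speculation ID | spcID | Indicates that the address or data of the store comes directly or indirectly from a speculative load
Valid bit | vld | Indicates that the store instruction is valid
Optionally, a load may have the same address as one or more store instructions in the SB 142. In that case, the SB 142 uses the load's timestamp to select, among those stores, the one whose timestamp is smaller than and closest to the load's timestamp, and returns its data. This saves the time of a memory read. A smaller timestamp means an earlier position in the order.
In some examples, besides the address and data of store instructions, the SB 142 also records, through the speculation ID, whether that address and data came from speculative data. For example, the SB 142 may receive a flush request indicating that instructions associated with a target speculation ID are to be flushed. The SB 142 then checks whether the speculation IDs of its cached stores are associated with the target speculation ID. If so, it sets the valid bit of those stores to 0; otherwise it ignores the request. When the SB 142 receives an indication to issue, it can issue its cached store instructions to memory in order.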
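The method for processing instructions on the graph computing apparatus side is described next.
FIG. 9 is a schematic diagram of a method for processing instructions according to an embodiment of this application. The method may be performed by the graph computing apparatus 100, which executes instructions based on a data dependency graph. In FIG. 9, the IB, LQ, SB and PE may be the IB 111, LQ 141, SB 142 and PE 110 in FIG. 1. The method includes the following steps.
S401: The IB issues a first instruction to the LQ. The first instruction requests to read data and satisfies a first preset condition, which includes that the speculation bit of the first instruction in the IB is set to yes.
S402: The IB determines a first speculation ID and stores it in the speculation ID field of the first instruction in the IB. The first speculation ID indicates the current speculative operation.
For the definitions of the speculation ID field (spcID) and the speculation bit (spc), refer to the description of FIG. 5. Details are not repeated here.
Optionally, the speculation bit is set to yes when the first instruction has a speculable dependence on another instruction. For example, a speculable dependence may exist between the first instruction and a third instruction. This means that the first instruction may depend on the third instruction and that the first instruction can be executed speculatively.
The first instruction and the third instruction satisfy a second preset condition, which includes:
- the ideal execution order of the first instruction is after that of the third instruction;
- the first instruction may have a memory dependence on the third instruction.

A memory dependence is an order dependence between memory-access instructions caused by their operating on the same address.
The second preset condition means that the first and third instructions may have a dependence caused by memory aliasing, but the compiler cannot identify it with certainty; it can be determined only at run time. The graph computing apparatus 100 can perform speculative operations on instructions that have a speculable dependence.
Optionally, the first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes and its dependency ready bit is set to no. This means that the first instruction depends on another instruction that has not yet completed, yet the first instruction can still be executed speculatively. If the speculation is wrong, the first instruction is re-executed later.
Optionally, the IB further includes at least one operand field, at least one operand-field valid bit and at least one operand-field ready bit, in one-to-one correspondence:
- the operand field stores input data of the current instruction;
- the operand-field valid bit indicates whether its operand field is valid;
- the operand-field ready bit indicates whether the data of its operand field is already present in the buffer.
Further, the first preset condition also includes: a first operand-field valid bit is set to yes, and its corresponding operand-field ready bit is set to yes. If the first operand-field valid bit corresponds to a first operand field, this means that the first operand field of the first instruction is valid and its input data has been stored there. For a load instruction, for example, the input data may be the load address.
Optionally, the IB further includes a timestamp. The timestamp indicates the ideal execution order of memory-access instructions, a memory-access instruction being a store instruction or a load instruction. The timestamps of the first and third instructions indicate that the ideal execution order of the first instruction is later than that of the third instruction.
Optionally, the SB and the LQ also include the timestamp.
For the definitions of the dependency valid bit (prd), operand fields (op0/1), operand-field valid bits (vld0/1), operand-field ready bits (rdy0/1) and timestamp (stamp), refer to the related descriptions above. Details are not repeated here.
In this embodiment, a speculation bit and a speculation ID field are set for each instruction in the IB of the graph computing apparatus. The apparatus uses the speculation bit to decide whether the instruction can be executed speculatively, and uses the speculation ID to identify that speculative operation. The speculation ID thus marks the source of the speculation. When a misspeculation occurs, only the instructions associated with that speculation ID are cleared and re-executed; instructions unrelated to it are not. This reduces the cost of misspeculation and makes speculative execution of memory-access instructions more efficient in the dataflow architecture.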
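Further, the method also includes:
- After receiving the first instruction, the LQ allocates the first speculation ID to it and writes the ID into the speculation ID field of the first instruction in the LQ.
- The LQ transmits the first speculation ID to the speculation ID field of the first instruction in the IB.
Optionally, the speculation ID is a one-hot code, as defined above.
In this embodiment, the speculation IDs are one-hot codes. ORing two one-hot codes yields a new speculation ID that still retains the information of both. One-hot IDs therefore let the apparatus quickly locate the source of a failed speculative operation. The instructions related to that operation can then be cleared and re-executed, which makes speculative execution of memory-access instructions more efficient in the dataflow architecture.
Optionally, after receiving the first instruction, the LQ looks up and obtains, from the SB or memory according to the first instruction, the data that the first instruction requests to read. The LQ may first search the SB using the address in the first instruction and, if the data is not there, search memory.
Optionally, after the data requested by the first instruction is obtained, the LQ transmits the first speculation ID to the speculation ID field of a second instruction in the IB. The second instruction depends on the first instruction.
After the speculative operation, the second instruction, which depends on the first instruction, also carries the first speculation ID. If the speculation fails, the instructions derived from the first speculation ID can be found from it.
Optionally, as shown in FIG. 7 above, the second instruction may also depend on another speculative operation. In that case, the speculation IDs of the two speculative operations can be ORed, and the result used as the speculation ID of the second instruction.
Optionally, the method further includes: after the IB issues the first instruction to the LQ, the IB sets the speculation flag bit (sgo) of the first instruction to yes, indicating that the first instruction has been issued speculatively.
Optionally, the method further includes:
- After the data requested by the first instruction is obtained, the IB issues the second instruction, which depends on the first instruction.
- After issuing the second instruction, the IB sets the speculation flag bit (sgo) of the second instruction to yes.

That is, subsequent instructions that depend on the first instruction can also be marked by the speculation flag bit (sgo).
Optionally, the method of FIG. 9 further includes: after issuing the first instruction, the IB issues the third instruction to the SB. After issuing the third instruction, the IB passes the store address of the third instruction to an operand field of the first instruction in the IB. It also sets the dependency ready bit of the first instruction in the IB to yes, indicating that the instruction on which the first instruction depends has completed.
Optionally, the method of FIG. 9 further includes:
- The IB determines, from the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculation ID is wrong.
- If it is wrong, the IB reselects the first instruction and re-issues it to the LQ.
For example, the first operand field of the first instruction in the IB may store the read address of the first instruction, and the second operand field may store the store address of the third instruction.
Specifically, if the two addresses are equal, the two instructions have a memory aliasing problem and the speculative operation is wrong. If they are not equal, the speculative operation is correct.
Optionally, if the speculative operation corresponding to the first speculation ID is wrong, the method of FIG. 9 further includes:
- The LQ allocates a second speculation ID to the first instruction and writes it into the speculation ID field in the LQ.
- The LQ transmits the second speculation ID to the speculation ID field of the first instruction in the IB.
Optionally, the method of FIG. 9 further includes:
- If the speculative operation of the first instruction is wrong, the IB broadcasts the first speculation ID of the first instruction to the at least one PE, the LQ or the SB.
- The at least one PE, the LQ and the SB compare the first speculation ID with the speculation IDs of the instructions they are executing, to determine whether they are associated.
- If they are associated, the at least one PE, the LQ and the SB stop executing the current instruction and stop transmitting its data or dependency.
In other words, when the speculative operation of the first instruction is wrong, the IB broadcasts the first speculation ID of the first instruction to the pipeline of the graph computing apparatus 100. Only the instructions related to that speculative operation are then cleared; instructions that follow it but are unrelated to it are not. This reduces the cost of speculation failure and improves the efficiency of speculative operations.
The at least one PE may include the PE that contains the IB as well as other PEs. Broadcasting to the at least one PE may include broadcasting to units inside the PE, for example the IB or other functional units.
Whether an association exists may be determined by checking whether the speculation ID of the instruction being executed is the same as the first speculation ID or derived from it. For example, as described above, the one-hot property allows the current speculation ID to be ANDed with the first speculation ID; a nonzero result means that the two are associated.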
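The broadcast-and-compare step can be sketched as a loop over the in-flight entries of each unit. This is an assumption-laden model:
- units and entries are plain dicts;
- "stop executing and stop transmitting" is modeled simply as clearing a valid flag;
- the unit names, tags and the function name `squash` are hypothetical.

Only entries whose one-hot ID shares a bit with the failed ID are affected, so unrelated work such as the non-speculative store survives.

```python
def squash(failed_id, units):
    """Broadcast a failed one-hot speculation ID to every unit.

    units: unit name -> list of in-flight entries, each a dict with
    'tag', 'spc_id' and 'vld'. Associated entries are invalidated so that
    they stop executing and stop forwarding data or dependences.
    """
    squashed = []
    for name, entries in units.items():
        for entry in entries:
            if entry["vld"] and entry["spc_id"] & failed_id:
                entry["vld"] = False
                squashed.append((name, entry["tag"]))
    return squashed


units = {
    "PE0": [{"tag": "addi", "spc_id": 0b001, "vld": True},
            {"tag": "mul", "spc_id": 0b010, "vld": True}],
    "SB": [{"tag": "st_x", "spc_id": 0b000, "vld": True},
           {"tag": "st_y", "spc_id": 0b011, "vld": True}],
    "LQ": [],
}
print(squash(0b001, units))   # [('PE0', 'addi'), ('SB', 'st_y')]
```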
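In this embodiment, when a misspeculation occurs, the graph computing apparatus uses the first speculation ID to track the subsequent instructions associated with the wrong speculative operation. It stops transmitting their data and dependences, which ensures correct operation. The speculation ID also allows only the instructions related to the wrong speculative operation to be cleared, while unrelated instructions are kept. This makes speculative execution of memory-access instructions more efficient in the dataflow architecture.
FIG. 10 is a schematic diagram of the instruction execution flow of the graph computing apparatus 100 according to an embodiment of this application. As shown in FIG. 10, when the graph computing apparatus 100 starts running, the IB selects one of its stored instructions and issues it. The selected instruction satisfies any one of the following four conditions and has the smallest timestamp among the instructions that satisfy them.
Condition 1: all existing inputs of the instruction are valid, and none comes from a speculative operation. That is, the inputs are ready and none of them is speculative data. The inputs of an instruction also include its dependences: the instruction either has no dependence, or the instructions it depends on have completed.
Condition 2: the instruction is a speculable load that has not yet been executed speculatively, that is, its speculation bit (spc) is 1 and its speculation flag bit (sgo) is 0. Condition 2 means that the graph computing apparatus 100 is executing the load speculatively for the first time. The IB executes the load speculatively and issues it to the LQ.
Condition 3: the instruction is a load that has already been executed speculatively, and that earlier speculative operation was wrong. The graph computing apparatus has determined that the load and a store have the same address, so the two have a dependence. Because the earlier speculation was wrong, the load's address is updated after the store on which it depends executes; that is, the operand field of the load that holds the address is updated. The operand-field valid bit (vld) of that field, the speculation flag bit (sgo) and the dependency ready bit (prdy) are therefore all 1.
Condition 3 means that the graph computing apparatus 100 has recognized that the load earlier sent out wrong speculative data. The instruction must therefore be re-executed, and its speculation ID broadcast to the pipeline of the graph computing apparatus 100 to clear the instructions related to that ID.
Condition 4: the instruction is not a load, and an earlier speculative operation was wrong. Condition 4 means that the instruction previously executed with wrong speculative data, so it must be re-executed and a new speculation ID passed on. Because its input has been updated, the valid bit of one of its operand fields is 1 and its speculation flag bit (sgo) is 1.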
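When an instruction satisfying condition 2 or condition 3 is selected, the load enters the LQ and the LQ allocates a speculation ID to it. After obtaining the data, the LQ sends it to the destination of the instruction. All subsequent instructions that depend on this instruction can be executed speculatively, but their inputs must carry the speculation ID.
Optionally, after a load has been executed speculatively, executing a store on which the load has a speculable dependence lets the hardware determine whether the load's earlier speculation failed. The sequence is as follows:
1. The IB issues the store to the SB and sends the address and dependency information to the destination indicated by the store. The destination may be the operand field in the IB of the instruction that may depend on the store.
2. On receiving the address, the IB compares the address from the store with the load's own address.
3. If the two are equal, the load's earlier speculation was wrong. The hardware broadcasts the load's speculation ID, which stops the operations of in-flight instructions that correspond to that ID.

The data obtained by executing the store stays in the SB until the IB sends a signal instructing the SB to send its stored data to memory.
As described above, a store may cause the IB to recognize that an earlier load misspeculated. After the store executes, the IB's selection logic therefore reselects the load, which reads the updated data from the SB and sends it to the load's destination. Instructions in the IB that depend on the load and have already been executed speculatively are also re-executed. In these misspeculated instructions:
- the speculation flag bit is 1, indicating that the instruction previously executed with speculative data;
- the valid bit of one operand field (op0/1) is 1, indicating that the field has been updated with new data.

This shows that the instructions' earlier data came from a wrong speculation, and the new data causes them to re-execute.
The speculation mechanism of the embodiments, and the correction mechanism after a misspeculation, are described below with a specific example.
FIG. 11 is a schematic diagram of a data dependency graph according to an embodiment of this application. FIG. 12 to FIG. 19 are schematic diagrams of the states of the graph computing apparatus 100 at different stages of executing the data dependency graph in FIG. 11. They show the contents of the IB 111, the LQ 141 and the SB 142 during execution.
As shown in FIG. 11, the data dependency graph includes four instructions, abstracted as four nodes:
- Instruction 1 (ST [x], a) is a store that writes data a to memory address x.
- Instruction 2 (LD [x+i]) is a load that reads data from memory address x+i and sends it to instruction 4.
- Instruction 3 (ST [z], b) is a store that writes data b to memory address z.
- Instruction 4 (addi) is an add whose two inputs are the data from instruction 2 and the constant 1.
Still referring to FIG. 11, the compiler adds dependences between the nodes according to the rules described for FIG. 2 to FIG. 4. Instruction 1 followed by instruction 2 is a store followed by a load. The compiler cannot know the value of i and therefore cannot tell whether address x+i equals address x. It therefore establishes a speculable dependence between the two instructions (edge p).
Instruction 2 followed by instruction 3 is a load followed by a store. The compiler cannot tell whether address z equals x+i, but under the compiler-side rules no dependence is needed between a load and a following store.
For instruction 1 and instruction 3, the compiler can establish total store ordering for a store followed by a store. Alternatively, the hardware can analyze and enforce total store ordering, and the compiler establishes no dependence between these two instructions.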
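After the compiler completes the initial configuration of the data dependency graph in FIG. 11, the related instructions can be stored in the IB 111. FIG. 12 is a schematic diagram of the information stored in the IB 111 for these instructions. When the program starts executing, the graph computing apparatus 100 can select instructions whose inputs are valid and begin executing them. Inputs are valid when all data in the instruction's operand fields is ready and the instructions it depends on have completed.
As shown in FIG. 12, instructions 1 to 3 are memory-access instructions and need timestamps. Their timestamps in the IB 111 are 0, 1 and 2, which represent their ideal execution order.
As shown in FIG. 12, instructions 1, 3 and 4 do not yet meet the conditions for execution:
- Instruction 1 lacks the data to be stored and the address (op0/1 are empty).
- The input data of instruction 3 is ready, but under total store ordering instruction 3 can execute only after instruction 1 completes. Its dependency valid bit (prd) is 1, its dependency ready bit (prdy) is 0, and its speculation bit (spc) is 0.
- Instruction 4 lacks its input data, which comes from instruction 2.
As shown in FIG. 12, instruction 2 lacks the mark indicating that the instruction on which it speculably depends has completed; its dependency ready bit (prdy) is 0. However, its speculation bit (spc) is set to 1, so the graph computing apparatus 100 can execute instruction 2 without waiting for that instruction to complete. The operand fields of instruction 2 are set as follows:
- the first operand-field valid bit (vld0) of op0 is 1, and the corresponding ready bit (rdy0) is also 1;
- the second operand-field valid bit (vld1) of op1 is 0.

Instruction 2 therefore has only one input, which is already present in op0, so instruction 2 can be issued.
Issuing an instruction may mean that the IB 111 sends the instruction's information to another unit to execute it. For example, a store is issued to the SB 142, a load to the LQ 141, and an arithmetic instruction to a compute unit in the PE 110.
As shown in FIG. 13, after the IB 111 issues instruction 2, it can set the first ready bit (rdy0) of instruction 2 to 0, indicating that the data in op0 is no longer valid. It also sets the speculation flag bit (sgo) to 1, indicating that instruction 2 was executed speculatively.
At the same time, instruction 2 is issued to the LQ 141. The LQ 141 records information about instruction 2, which may include its address, destination and timestamp.
Using the address in instruction 2, the LQ 141 first searches the SB 142. If the SB 142 does not hold the data requested by instruction 2, the LQ 141 issues instruction 2 to memory to request the data.
The LQ 141 also allocates a speculation ID to instruction 2 and writes it into its own speculation ID field; in FIG. 13 this speculation ID is "001". The LQ 141 also sends the speculation ID to the IB 111, where it is written into the speculation ID (spcID) field.
After obtaining the data requested by instruction 2, the LQ 141 can store it in its data field; in FIG. 13 this data is "72".
As shown in FIG. 14, after obtaining the data requested by instruction 2, the LQ 141 can transmit it to the first operand field (op0) of instruction 4 in the IB 111 and set the corresponding ready bit (rdy0) to 1. Because the data was obtained speculatively, the IB 111 writes the speculation ID into the speculation ID (spcID) field of instruction 4. If the result of instruction 4 is derived from this speculative data, the speculation ID must also be written into the speculation ID (spcID) field of the destination when that result is passed on.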
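As shown in FIG. 15, the input data of instruction 4 is ready, so instruction 4 in the IB 111 can be executed; it is issued to a compute unit in the PE 110. Because its data came from speculation, its speculation flag bit (sgo) is set to 1. After instruction 4 is issued, its operand-field ready bits (rdy0/1) are set to 0, indicating that the instruction has been issued.
As shown in FIG. 16, after some clock cycles, all input data of instruction 1 is ready. The IB 111 sends the information of instruction 1 to the SB 142. The SB 142 records the address, data, timestamp, speculation ID and other information of the store request: the address is x, the data is a, and the timestamp is 0. Instruction 1 is not speculative, so no speculation ID is allocated and its speculation ID is 0.
At the same time, the IB 111 sets the operand-field ready bits (rdy0/1) of instruction 1 to 0, indicating that the data in the operand fields (op0/1) has been sent.
As shown in FIG. 17, after executing instruction 1, the IB 111 sends a dependency indication and the store address to the destination of instruction 1. The dependency indication tells instruction 2 that the instruction on which it speculably depends has completed. The store address is the address to which instruction 1 writes its data.
Optionally, sending the dependency indication to the destination includes setting the dependency ready bit (prdy) of instruction 2 in the IB 111 to 1. This indicates that the instruction on which instruction 2 speculably depends has completed.
Optionally, sending the store address to the destination includes writing the store address of instruction 1 into the second operand field (op1) of instruction 2 in the IB 111 and setting the corresponding ready bit (rdy1) to 1.
The IB 111 compares the address newly written for instruction 2 (in op1) with the address from which instruction 2 was speculatively executed (in op0).
- If they are equal, the read that instruction 2 executed speculatively was wrong, so the IB 111 re-executes instruction 2.
- If they are not equal, instruction 2 need not be re-executed. Its second ready bit (rdy1) and dependency ready bit (prdy) in the IB 111 are set to 0.
Optionally, after executing instruction 1, the IB 111 can also set the dependency ready bit (prdy) of instruction 3 in the IB to 1. This indicates that instruction 1, on which instruction 3 depends, has completed.
Optionally, under total store ordering, instruction 3 can execute only after instruction 1 is issued. As shown in FIG. 17, the dependency ready bit (prdy) of instruction 3 is 1 and its input data has long been ready, so instruction 3 can be issued to the SB 142. After instruction 3 enters the SB 142, the SB 142 records its address, data, timestamp and speculation ID. The address is z, the data is b and the timestamp is 2. Neither the address nor the data of the store comes from speculation, so the speculation ID is 0.
Optionally, suppose the earlier speculative execution of instruction 2 was wrong. The IB 111 broadcasts the speculation ID (spcID) of instruction 2 to all pipeline stages of the hardware. On receiving it, each stage compares it with the speculation IDs of the instructions it is executing, to determine whether they are associated.
The pipeline stages may be the units of the graph computing apparatus 100 that execute instructions, including but not limited to the IB 111, the LQ 141, the SB 142 and the PE 110.
For example, the speculation IDs are one-hot codes.
- If the same bit is 1 in both speculation IDs, they are associated. The stage must then stop executing the current instruction and stop transmitting its data or dependency.
- If no bit is 1 in both, they are not associated, and no execution is affected.
As shown in FIG. 18, after instruction 2 is re-issued to the LQ 141, the LQ 141 replaces the earlier instruction 2 with the new one and allocates a new speculation ID to it; in FIG. 18 the new speculation ID is "010". Based on the new instruction 2, the LQ 141 searches the SB 142 or memory again and obtains the requested data, shown in FIG. 18 as data a.
After instruction 2 completes, the IB 111 must re-execute instruction 4, because instruction 4 has a data dependence on instruction 2. As shown in FIG. 19, once the LQ 141 obtains data a, it again sends data a and the speculation ID to the destination of instruction 2 and updates the destination's speculation ID. For example, the destination of instruction 2 is the first operand field (op0) of instruction 4, and the updated speculation ID is "010".
After instruction 4 receives the data from instruction 2, its first ready bit (rdy0) is 1 and its second ready bit (rdy1) is 0, which means the data in op1 is not marked ready. However, under the IB 111 rule described above, the speculation flag bit (sgo) of instruction 4 is 1, which shows that instruction 4 was previously executed speculatively. The data of op1 has therefore remained in the IB 111, and instruction 4 can be re-executed.
After instruction 4 executes, all instructions in the data dependency graph of FIG. 11 have finished executing.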
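The walk-through of FIG. 12 to FIG. 19 can be condensed into a single function that follows one run of the speculation and, when x+i equals x, its correction. This is a simplified sketch and not a cycle-level model:
- the IB, LQ and SB are reduced to local variables;
- instruction 3 (ST [z], b) is omitted because it does not interact with the speculation;
- the function name `run_figure11` and the addresses used are hypothetical;
- the value 72 and the speculation IDs 001 and 010 are taken from FIG. 13 and FIG. 18.

```python
def run_figure11(x, i, a, memory):
    """Replay FIG. 11: ST [x],a ; LD [x+i] ; addi(LD, 1).

    Returns (addi result, spc_id carried by that result, replayed?).
    """
    next_id = 0b001
    # FIG. 13: LD [x+i] issues speculatively before ST [x],a executes.
    load_addr = x + i                     # held in op0 of instruction 2
    spc_id = next_id
    next_id <<= 1
    loaded = memory.get(load_addr, 0)
    # FIG. 14-15: addi consumes the speculative value and inherits spc_id.
    result = loaded + 1
    # FIG. 16: ST [x],a enters the SB; memory itself is not updated yet.
    store_buffer = {x: a}
    # FIG. 17: the store address arrives in op1 and is compared with op0.
    replayed = x == load_addr
    if replayed:
        # FIG. 18: broadcast (squash) the old ID, re-issue with a fresh one-hot ID,
        # and read the forwarded value from the SB.
        spc_id = next_id
        loaded = store_buffer.get(load_addr, memory.get(load_addr, 0))
        # FIG. 19: addi re-executes with the corrected input.
        result = loaded + 1
    return result, spc_id, replayed


print(run_figure11(x=100, i=4, a=5, memory={104: 72}))   # (73, 1, False)
print(run_figure11(x=100, i=0, a=5, memory={100: 72}))   # (6, 2, True)
```

Keeping the speculative result alive until the store address arrives is what lets the common, non-aliasing case finish without any replay.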
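A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above are those of the corresponding processes in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative:
- The division into units is merely a logical functional division; other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
- The mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. They may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solutions of this application in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.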

Claims (31)

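  1. A method for processing instructions, wherein the method is applied to a graph computing apparatus, the graph computing apparatus is based on a dataflow architecture and comprises at least one processing engine PE and a load store unit LSU, the PE comprises an information buffer IB, the IB is configured to cache an instruction queue, and the PE is configured to execute the instructions cached in the IB; wherein the IB comprises a speculation bit and a speculation identifier ID field, the speculation bit indicates whether a current instruction is an instruction that can be executed speculatively, the speculation ID field stores a speculation ID of one speculative operation of the current instruction, the LSU comprises a load queue LQ, and the LQ is configured to cache a load instruction queue;
    the method comprises:
    issuing, by the IB, a first instruction to the LQ, wherein the first instruction requests to read data and satisfies a first preset condition, the first preset condition comprising: the speculation bit of the first instruction in the IB is set to yes; and
    determining, by the IB, a first speculation ID, and storing the first speculation ID in the speculation ID field of the first instruction in the IB, wherein the first speculation ID indicates a current speculative operation.
  2. The method according to claim 1, wherein the LQ comprises the speculation ID field, and the method further comprises:
    after the IB issues the first instruction to the LQ, allocating, by the LQ, the first speculation ID to the first instruction, and writing the first speculation ID into the speculation ID field of the first instruction in the LQ;
    sending, by the LQ, the first speculation ID to the IB; and
    the determining, by the IB, a first speculation ID comprises: receiving, by the IB, the first speculation ID from the LQ.
  3. The method according to claim 2, wherein the speculation ID is a one-hot code.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    looking up and obtaining, by the LQ from the SB or a memory according to the first instruction, the data that the first instruction requests to read.
  5. The method according to any one of claims 1 to 4, further comprising:
    after the data that the first instruction requests to read is obtained, transmitting, by the LQ, the first speculation ID to the speculation ID field of a second instruction in the IB, wherein the second instruction depends on the first instruction.
  6. The method according to any one of claims 1 to 5, wherein the IB further comprises a speculation flag bit, the speculation flag bit indicates whether the current instruction has been issued speculatively, and the method further comprises:
    after the IB issues the first instruction to the LQ, setting, by the IB, the speculation flag bit of the first instruction to yes.
  7. The method according to any one of claims 1 to 6, wherein the IB further comprises a speculation flag bit, the speculation flag bit indicates whether the current instruction has been issued speculatively, and the method further comprises:
    after the data that the first instruction requests to read is obtained, issuing, by the IB, a second instruction, wherein the second instruction depends on the first instruction; and
    after issuing the second instruction, setting, by the IB, the speculation flag bit of the second instruction to yes.
  8. The method according to any one of claims 1 to 7, wherein the IB further comprises a dependency valid bit and a dependency ready bit, the dependency valid bit indicates whether the current instruction is executed only after another instruction completes, and the dependency ready bit indicates whether the instruction on which the current instruction depends has completed; and
    the first preset condition further comprises: the dependency valid bit of the first instruction in the IB is set to yes, and the dependency ready bit is set to no.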
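  9. The method according to any one of claims 1 to 8, wherein the LSU further comprises a store buffer SB, the SB is configured to cache a store instruction queue, and the method further comprises:
    after the first instruction is issued, issuing, by the IB, a third instruction to the SB, wherein the third instruction is a store instruction, the first instruction and the third instruction satisfy a second preset condition, the second preset condition comprises that an ideal execution order of the first instruction is after that of the third instruction and that the first instruction may have a memory dependence on the third instruction, and the memory dependence is an order dependence between memory-access instructions caused by operating on a same address; and
    after the third instruction is issued, sending, by the IB, a store address of the third instruction to the LQ.
  10. The method according to claim 9, wherein the method further comprises:
    determining, by the IB based on the store address of the third instruction and a read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculation ID is wrong; and
    when the speculative operation of the first instruction corresponding to the first speculation ID is wrong, re-issuing, by the IB, the first instruction to the LQ.
  11. The method according to claim 10, wherein the method further comprises:
    after the IB re-issues the first instruction to the LQ, allocating, by the LQ, a second speculation ID to the first instruction, and writing the second speculation ID into the speculation ID field of the first instruction in the LQ; and
    transmitting, by the LQ, the second speculation ID to the speculation ID field of the first instruction in the IB.
  12. The method according to any one of claims 1 to 11, wherein the method further comprises:
    when the speculative operation corresponding to the first speculation ID is wrong, broadcasting, by the IB, the first speculation ID to the at least one PE, the LQ, or the SB;
    comparing, by the at least one PE, the LQ, or the SB, the first speculation ID with a speculation ID of an instruction being executed by it, to determine whether the two are associated; and
    when they are associated, stopping, by the at least one PE, the LQ, or the SB, execution of the instruction being executed, and stopping transmission of data or a dependency of the instruction being executed.
  13. The method according to any one of claims 1 to 12, wherein the IB further comprises a timestamp, the timestamp indicates an ideal execution order of memory-access instructions, and the memory-access instructions comprise store instructions or load instructions.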
  14. 一种图计算装置,其特征在于,所述图计算装置基于数据流架构,所述图计算装置包括至少一个处理引擎PE和加载存储单元LSU,所述PE包括信息缓冲器IB,所述IB用于缓存指令队列,所述PE用于执行所述IB中缓存的指令;其中,所述IB中包括投机位和投机标识ID域段,所述投机位用于指示当前指令是否为可投机执行的指令,所述投机ID域段用于存储当前指令的一次投机操作的投机ID,所述LSU包括加载队列LQ,所述LQ用于缓存读取指令队列。
  15. The apparatus according to claim 14, wherein the speculative ID in the speculative ID field is a one-hot code.
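  16. The apparatus according to claim 14 or 15, wherein the IB is configured to issue a first instruction to the LQ, the first instruction is used to request to read data, the first instruction satisfies a first preset condition, and the first preset condition includes: the speculative bit of the first instruction in the IB is set to yes; and
    the IB is further configured to determine a first speculative ID and store the first speculative ID in the speculative ID field of the first instruction in the IB, where the first speculative ID is used to indicate the current speculative operation.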
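  17. The apparatus according to claim 16, wherein the LQ includes the speculative ID field, and the LQ is configured to: after the IB issues the first instruction to the LQ, allocate the first speculative ID to the first instruction, write the first speculative ID into the speculative ID field of the first instruction in the LQ, and send the first speculative ID to the IB; and
    the IB is specifically configured to receive the first speculative ID from the LQ.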
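  18. The apparatus according to claim 16 or 17, wherein the LQ is configured to search for and obtain, from the SB or memory according to the first instruction, the data that the first instruction requests to read.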
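  19. The apparatus according to any one of claims 16 to 18, wherein the LQ is further configured to: after the data that the first instruction requests to read is obtained, transmit the first speculative ID to the speculative ID field of a second instruction in the IB, where the second instruction depends on the first instruction.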
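  20. The apparatus according to any one of claims 14 to 19, wherein the IB further includes a speculative flag bit, and the speculative flag bit is used to indicate whether the current instruction has already been speculatively issued.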
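  21. The apparatus according to claim 20, wherein the IB is further configured to set the speculative flag bit of the first instruction to yes after issuing the first instruction to the LQ.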
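  22. The apparatus according to claim 20 or 21, wherein the IB is further configured to: after the data that the first instruction requests to read is obtained, issue a second instruction, where the second instruction depends on the first instruction; and after issuing the second instruction, set the speculative flag bit of the second instruction to yes.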
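  23. The apparatus according to any one of claims 14 to 22, wherein the IB further includes a dependency valid bit and a dependency present bit, the dependency valid bit is used to indicate whether the current instruction is to be executed only after another instruction has finished executing, and the dependency present bit is used to indicate whether the instruction on which the current instruction depends has finished executing.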
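  24. The apparatus according to claim 23, wherein the first preset condition further includes: the dependency valid bit of the first instruction in the IB is set to yes, and its dependency present bit is set to no.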
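  25. The apparatus according to any one of claims 14 to 24, wherein the LSU further includes a store buffer SB, and the SB is used to buffer a queue of store instructions.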
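  26. The apparatus according to claim 25, wherein the IB is further configured to: after issuing the first instruction, issue a third instruction to the SB, where the third instruction is a store instruction, the first instruction and the third instruction satisfy a second preset condition, and the second preset condition includes: the first instruction comes after the third instruction in the ideal execution order, and the first instruction may have a memory dependency on the third instruction, where a memory dependency is an ordering dependency between memory access instructions caused by their operating on the same address; and after issuing the third instruction, send the store address of the third instruction to the LQ.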
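  27. The apparatus according to claim 26, wherein the IB is further configured to: determine, according to the store address of the third instruction and the read address of the first instruction, whether the speculative operation of the first instruction corresponding to the first speculative ID is erroneous; and if the speculative operation of the first instruction corresponding to the first speculative ID is erroneous, re-issue the first instruction to the LQ.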
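  28. The apparatus according to claim 27, wherein the LQ is further configured to: after the IB re-issues the first instruction to the LQ, allocate a second speculative ID to the first instruction, write the second speculative ID into the speculative ID field of the first instruction in the LQ, and transmit the second speculative ID to the speculative ID field of the first instruction in the IB.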
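  29. The apparatus according to any one of claims 16 to 28, wherein the IB is further configured to: if the speculative operation corresponding to the first speculative ID is erroneous, broadcast the first speculative ID to the at least one PE, the LQ, or the SB; and
    the at least one PE, the LQ, or the SB is configured to: compare the first speculative ID with the speculative ID of the instruction it is currently executing, to determine whether the two are associated; and if they are associated, stop executing that instruction and stop transmitting the data or dependency of that instruction.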
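  30. The apparatus according to any one of claims 14 to 29, wherein the IB further includes a timestamp, the timestamp is used to indicate the ideal execution order of memory access instructions, and the memory access instructions include store instructions or read instructions.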
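  31. A computer storage medium, including instructions, wherein when the instructions are run on a graph computing apparatus, the graph computing apparatus is caused to perform the method according to any one of claims 1 to 13.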
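The issue-side steps in claims 2, 3, 6 and 8 (and their apparatus counterparts in claims 15 to 17, 21 and 24) can be illustrated with a small software model. The Python sketch below is a hypothetical illustration, not the patented hardware: the `IbEntry` fields mirror the bits named in the claims, while the class and function names, the fixed ID-pool width and the lowest-free-bit allocation policy are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IbEntry:
    """Hypothetical IB entry holding only the per-instruction bits named in the claims."""
    speculative_bit: bool           # may be executed speculatively (claim 14)
    dependency_valid: bool          # must wait for another instruction (claims 8, 23)
    dependency_present: bool        # that instruction has finished (claims 8, 23)
    speculative_flag: bool = False  # already speculatively issued (claims 6, 20)
    speculative_id: int = 0         # one-hot ID of the current speculative operation


class LqIdAllocator:
    """Hands out one-hot speculative IDs (claims 3, 15) from a fixed pool.

    The pool width and free-mask bookkeeping are assumptions for illustration.
    """

    def __init__(self, width: int) -> None:
        self.free_mask = (1 << width) - 1  # every bit position starts free

    def allocate(self) -> Optional[int]:
        if self.free_mask == 0:
            return None  # pool exhausted; the caller must stall
        spec_id = self.free_mask & -self.free_mask  # isolate the lowest free bit
        self.free_mask &= ~spec_id                  # mark it as in use
        return spec_id

    def release(self, spec_id: int) -> None:
        self.free_mask |= spec_id  # ID becomes reusable once its speculation resolves


def meets_first_preset_condition(entry: IbEntry) -> bool:
    """Speculative bit set, dependency valid set, dependency present not set (claims 8, 24)."""
    return entry.speculative_bit and entry.dependency_valid and not entry.dependency_present


def speculative_issue(entry: IbEntry, lq: LqIdAllocator) -> bool:
    """Issue a load to the LQ speculatively; return False if it cannot be issued yet."""
    if not meets_first_preset_condition(entry):
        return False
    spec_id = lq.allocate()  # the LQ assigns the ID on receiving the load (claim 2)
    if spec_id is None:
        return False
    entry.speculative_id = spec_id  # the ID is written back into the IB's ID field
    entry.speculative_flag = True   # mark the entry as speculatively issued (claim 6)
    return True


if __name__ == "__main__":
    lq = LqIdAllocator(width=4)
    load = IbEntry(speculative_bit=True, dependency_valid=True, dependency_present=False)
    print(speculative_issue(load, lq), bin(load.speculative_id))  # True 0b1
```

Encoding each speculative ID as one-hot reduces allocation and release to single-bit mask operations; a plausible further benefit, assumed here rather than stated in the claims, is that one ID can be tested against a set of IDs with a single bitwise AND.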
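Claims 9, 10 and 13 (and 26, 27 and 30) describe checking a speculatively issued load against a store that precedes it in the ideal execution order, once the store's address reaches the LQ. The following minimal Python sketch shows that check. It assumes that timestamps are plain integers increasing in ideal execution order and that each access also records a byte width (the claims only speak of operating on the same address). The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    """A load or store as seen by the conflict check (hypothetical model)."""
    timestamp: int  # position in the ideal execution order (claims 13, 30)
    address: int    # read address (load) or store address (store)
    size: int = 1   # access width in bytes; an assumption, not from the claims


def overlaps(a: MemAccess, b: MemAccess) -> bool:
    """True if the byte ranges [address, address + size) of a and b intersect."""
    return a.address < b.address + b.size and b.address < a.address + a.size


def load_misspeculated(load: MemAccess, store: MemAccess) -> bool:
    """True if a speculatively issued load read memory that an older store writes.

    The store precedes the load in the ideal order, so the load should have
    observed the store's data and must be re-issued (claims 10, 27).
    """
    return store.timestamp < load.timestamp and overlaps(load, store)


if __name__ == "__main__":
    load = MemAccess(timestamp=7, address=0x100, size=4)
    print(load_misspeculated(load, MemAccess(timestamp=5, address=0x102, size=2)))  # True
    print(load_misspeculated(load, MemAccess(timestamp=5, address=0x104, size=4)))  # False
    print(load_misspeculated(load, MemAccess(timestamp=9, address=0x100, size=4)))  # False
```

A `True` result corresponds to an erroneous speculative operation. After such an error, the claims have the IB re-issue the load and the LQ assign it a new speculative ID (claims 11 and 28).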
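Claims 5 and 12 (and 19 and 29) propagate a load's speculative ID to dependent instructions. On misspeculation, they broadcast that ID so the PEs, the LQ and the SB can stop associated work. The claims do not say how "association" is decided. The Python sketch below assumes that each instruction carries the bitwise OR of the one-hot IDs it inherited, and that an instruction is associated with a broadcast ID when the two share a set bit. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InFlight:
    """An instruction executing in a PE, the LQ, or the SB (hypothetical model)."""
    name: str
    spec_mask: int         # OR of the one-hot speculative IDs it inherited (assumption)
    stopped: bool = False  # set when squashed; models halting execution and forwarding


def inherit(producer_mask: int, consumer_mask: int = 0) -> int:
    """Claim 5: a dependent instruction receives its producer's speculative ID.

    Merging with OR (an assumption) keeps every speculation it depends on traceable.
    """
    return consumer_mask | producer_mask


def broadcast_squash(bad_id: int, units: List[List[InFlight]]) -> List[str]:
    """Claim 12: each unit compares the broadcast ID with its in-flight instructions
    and stops the associated ones; association is taken to be a non-zero AND."""
    stopped = []
    for unit in units:  # e.g. a PE's instructions, then the LQ, then the SB
        for inst in unit:
            if inst.spec_mask & bad_id:
                inst.stopped = True  # stop executing and stop passing on data/dependencies
                stopped.append(inst.name)
    return stopped


if __name__ == "__main__":
    load = InFlight("ld r1", 0b0001)
    add = InFlight("add r2", inherit(load.spec_mask))
    mul = InFlight("mul r3", 0b0010)
    print(broadcast_squash(0b0001, [[add, mul], [load]]))  # ['add r2', 'ld r1']
    print(mul.stopped)  # False
```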
PCT/CN2020/127243 2020-11-06 2020-11-06 Instruction processing method and graph computing apparatus WO2022094964A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202080106413.1A CN116348850A (zh) 2020-11-06 2020-11-06 Instruction processing method and graph computing apparatus
PCT/CN2020/127243 WO2022094964A1 (zh) 2020-11-06 2020-11-06 Instruction processing method and graph computing apparatus
EP20960425.5A EP4227801A4 (en) 2020-11-06 2020-11-06 INSTRUCTION PROCESSING METHOD AND GRAPHFLOW APPARATUS
US18/312,365 US20230297385A1 (en) 2020-11-06 2023-05-04 Instruction Processing Method and Graphflow Apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/127243 WO2022094964A1 (zh) 2020-11-06 2020-11-06 Instruction processing method and graph computing apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/312,365 Continuation US20230297385A1 (en) 2020-11-06 2023-05-04 Instruction Processing Method and Graphflow Apparatus

Publications (1)

Publication Number Publication Date
WO2022094964A1 (zh) 2022-05-12

Family

ID=81458525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127243 WO2022094964A1 (zh) 2020-11-06 2020-11-06 Instruction processing method and graph computing apparatus

Country Status (4)

Country Link
US (1) US20230297385A1 (zh)
EP (1) EP4227801A4 (zh)
CN (1) CN116348850A (zh)
WO (1) WO2022094964A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024046018A1 (zh) * 2022-09-02 2024-03-07 上海寒武纪信息科技有限公司 Instruction control method, data caching method, and related products

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
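CN115269016A (zh) * 2022-09-27 2022-11-01 之江实验室 Instruction execution method and apparatus for graph computing
CN117827285B (zh) * 2024-03-04 2024-06-14 芯来智融半导体科技(上海)有限公司 Method, system, device, and storage medium for caching memory access instructions of a vector processor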

Citations (5)

Publication number Priority date Publication date Assignee Title
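CN101872299A (zh) * 2010-07-06 2010-10-27 浙江大学 Conflict prediction implementation method, conflict prediction processing device used therein, and transactional memory
CN102662634A (zh) * 2012-03-21 2012-09-12 杭州中天微系统有限公司 Memory access execution device with non-blocking issue and execution
CN102722341A (zh) * 2012-05-17 2012-10-10 杭州中天微系统有限公司 Speculative execution control device for a load/store unit
CN104951281A (zh) * 2014-03-28 2015-09-30 英特尔公司 Method and apparatus for implementing a dynamic out-of-order processor pipeline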
US20160328237A1 (en) * 2015-05-07 2016-11-10 Via Alliance Semiconductor Co., Ltd. System and method to reduce load-store collision penalty in speculative out of order engine

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US5799179A (en) * 1995-01-24 1998-08-25 International Business Machines Corporation Handling of exceptions in speculative instructions
US6098166A (en) * 1998-04-10 2000-08-01 Compaq Computer Corporation Speculative issue of instructions under a load miss shadow
US7051195B2 (en) * 2001-10-26 2006-05-23 Hewlett-Packard Development Company, L.P. Method of optimization of CPU and chipset performance by support of optional reads by CPU and chipset
CN106227507B (zh) * 2016-07-11 2019-10-18 北京深鉴智能科技有限公司 Computing system and controller thereof
US20180341488A1 (en) * 2017-05-26 2018-11-29 Microsoft Technology Licensing, Llc Microprocessor instruction predispatch before block commit
US10515049B1 (en) * 2017-07-01 2019-12-24 Intel Corporation Memory circuits and methods for distributed memory hazard detection and error recovery

Non-Patent Citations (1)

Title
See also references of EP4227801A4 *

Also Published As

Publication number Publication date
EP4227801A1 (en) 2023-08-16
CN116348850A (zh) 2023-06-27
US20230297385A1 (en) 2023-09-21
EP4227801A4 (en) 2023-11-15

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20960425

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020960425

Country of ref document: EP

Effective date: 20230509

NENP Non-entry into the national phase

Ref country code: DE