WO2015024432A1 - Instruction scheduling method and device - Google Patents

Instruction scheduling method and device

Info

Publication number
WO2015024432A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
candidate
instructions
slot
queue
Prior art date
Application number
PCT/CN2014/083603
Other languages
English (en)
Chinese (zh)
Inventor
黄磊
连瑞琦
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015024432A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/445 Exploiting fine grain parallelism, i.e. parallelism at instruction level

Definitions

  • the present invention relates to the field of communications, and in particular, to an instruction scheduling method and apparatus.
  • the functional components in the CPU are usually independent and parallel, so the compiler uses the instruction scheduling method to improve the instruction level parallelism based on the CPU structure.
  • Instruction scheduling is a technique for executing instructions in parallel: the compiler or the machine hardware reorders instructions to increase the number of instructions the machine executes per beat, where a beat is the clock cycle of machine instruction execution as simulated by the compiler when it compiles the source program.
  • In existing compilation technology, instruction scheduling is usually implemented with a list scheduling (table scheduling) algorithm, which usually uses a candidate instruction queue.
  • The data dependency graph is composed of a plurality of nodes; each node represents an instruction, and the graph represents the dependency relationships between the instructions.
  • The priority of each instruction is then calculated, and the instructions in the data dependency graph are scheduled beat by beat.
  • Instructions with an in-degree of zero in the data dependency graph are added to the candidate instruction queue, and the other candidate instruction queues are set to be empty. Specifically, the scheduling for each beat is: select instructions from the candidate instruction queue according to priority and fill them into the instruction slots in turn, updating the candidate instruction queue; fill a null operation instruction into any instruction slot for which no instruction could be selected; after the instruction slots within one beat have been scheduled, advance the beat and update the candidate instruction queue. These steps are repeated, one beat at a time, until all instructions in the data dependency graph have been scheduled.
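As a rough illustration of the prior-art scheduling loop described above, the following Python sketch schedules a small graph beat by beat with a single candidate queue. The function name `list_schedule`, the edge encoding, the example graph, and the priority values are assumptions made for this example and are not taken from the patent; resource constraints are ignored.

```python
from collections import defaultdict


def list_schedule(instrs, edges, priority, width):
    """Beat-by-beat scheduling with one candidate queue (illustrative only).

    edges maps (pred, succ) -> required delay in beats (hypothetical encoding).
    Returns a list of beats; each beat holds `width` instruction names or "nop".
    """
    preds = defaultdict(list)
    for (p, s), delay in edges.items():
        preds[s].append((p, delay))

    issued = {}    # instruction -> beat in which it was issued
    schedule = []
    beat = 0
    while len(issued) < len(instrs):
        # Candidates: unscheduled instructions whose predecessors were all
        # issued in earlier beats and whose delays have elapsed.
        ready = [i for i in instrs
                 if i not in issued
                 and all(p in issued and issued[p] + d <= beat
                         for p, d in preds[i])]
        ready.sort(key=lambda i: -priority[i])
        chosen = ready[:width]
        for i in chosen:
            issued[i] = beat
        # Slots that could not be filled receive a null operation.
        schedule.append(chosen + ["nop"] * (width - len(chosen)))
        beat += 1
    return schedule


# Hypothetical four-instruction graph: (pred, succ) -> delay in beats.
edges = {("a0", "a2"): 2, ("a1", "a2"): 1, ("a2", "a3"): 3}
priority = {"a0": 5, "a1": 4, "a2": 3, "a3": 0}
schedule = list_schedule(["a0", "a1", "a2", "a3"], edges, priority, width=2)
```

Note that this loop only checks dependencies between beats; it does not model any ordering between slots of adjacent beats.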
  • multi-core processors are composed of multiple single-core processors.
  • The structure of a single core tends to be simple, with functional components organized serially or even as functional component arrays. If instruction scheduling on such a multi-core processor is performed with the prior-art table (list) scheduling method, then at execution time instructions that depend on one another may be executed in the same beat, or an instruction may be executed before the instruction it depends on. Either case can cause the processor to run incorrectly or the pipeline to stall, so the correctness of the scheduling is low.
  • Embodiments of the present invention provide an instruction scheduling method and device, which are capable of
  • an instruction scheduling method which is applied to an instruction scheduling apparatus, and includes: constructing a data dependency graph;
  • the n represents the number of instruction slots in a very long instruction word
  • the n is an integer greater than or equal to 1
  • the m represents the number of super long instruction words per beat
  • m is an integer greater than or equal to 1
  • t is an integer greater than or equal to 1 and less than or equal to n-1.
  • the method further includes:
  • Each of the super long instruction words is executed in accordance with an arrangement order of the respective instructions in the super long instruction word.
  • the method further includes:
  • establishing n+1 candidate instruction queues, wherein the n+1 candidate instruction queues are the first to (n+1)-th candidate instruction queues, respectively;
  • the n+1 candidate instruction queues are initialized such that the n+1 candidate instruction queues are all empty.
  • the very long instruction words within the same beat are in a parallel execution relationship, and, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat.
  • the h instructions are deleted from the first candidate instruction queue;
  • the very long instruction words within the same beat are in a parallel execution relationship, and, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat.
  • P is an integer greater than 0,
  • the instructions in the second candidate instruction queue to the n+1th candidate instruction queue are sequentially placed in the previous candidate instruction queue;
  • Extract from the first candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slot of each very long instruction word, and place a null operation instruction in any first instruction slot that has not been filled.
  • the true dependencies with the instructions in the q-1th instruction slot and satisfying both the time delay and the resource requirements include:
  • an instruction scheduling apparatus including:
  • a scheduling unit configured to extract k instructions from the data dependency graph for scheduling in each beat to obtain m very long instruction words per beat, such that the very long instruction words within the same beat are in a parallel execution relationship and, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat;
  • 0 ≤ k ≤ m × n, where n represents the number of instruction slots in a very long instruction word, n is an integer greater than or equal to 1, and m represents the number of very long instruction words per beat.
  • the m is an integer greater than or equal to 1
  • the t is an integer greater than or equal to 1 and less than or equal to n-1.
  • the instruction scheduling apparatus further includes: an executing unit, configured to execute each of the very long instruction words according to the arrangement order of the instructions in that very long instruction word.
  • the instruction scheduling apparatus further includes:
  • An establishing unit configured to establish n+1 candidate instruction queues, where the n+1 candidate instruction queues are first to n+1th candidate instruction queues, respectively;
  • an initializing unit configured to initialize the n+1 candidate instruction queues, so that the n+1 candidate instruction queues are all empty.
  • the scheduling unit is specifically configured to: when performing the 0th beat scheduling,
  • Extract from the first candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slot of each very long instruction word, and place a null operation instruction in any first instruction slot that has not been filled, 0 ≤ h ≤ m;
  • the scheduling unit is specifically configured to: when performing the P-th scheduling, P is an integer greater than 0,
  • the instructions in the second candidate instruction queue to the n+1th candidate instruction queue are sequentially placed in the previous candidate instruction queue;
  • Extract from the first candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slot of each very long instruction word, and place a null operation instruction in any first instruction slot that has not been filled, 0 ≤ h ≤ m;
  • Extract the newly added instructions with an in-degree of zero from the data dependency graph to obtain the second candidate instruction queue, and perform the following steps:
  • the true dependencies with the instructions in the q-1th instruction slot and satisfying both the time delay and the resource requirements include:
  • An embodiment of the present invention provides an instruction scheduling method and apparatus, including: constructing a data dependency graph; and extracting k instructions from the data dependency graph for scheduling in each beat to obtain m very long instruction words per beat, such that
  • the very long instruction words within the same beat are in a parallel execution relationship, and
  • for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat.
  • FIG. 1 is a schematic flowchart of an instruction scheduling method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a data dependency graph according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of another instruction scheduling method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of another data dependency graph according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of execution of command transmission according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of another instruction scheduling apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of still another instruction scheduling apparatus according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of still another instruction scheduling apparatus according to an embodiment of the present invention.
  • the embodiment of the invention provides an instruction scheduling method, which is applied to an instruction scheduling device, as shown in FIG. 1 , and includes:
  • Step 101 Construct a data dependency graph.
  • The data dependency graph may be a directed acyclic graph (DAG). The method for constructing the data dependency graph is the same as in the prior art and is not described in the present invention.
  • Step 102 Extract k instructions from the data dependency graph for scheduling in each beat to obtain m very long instruction words per beat, such that the very long instruction words within the same beat are in a parallel execution relationship and, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat.
  • The dependencies may include positive (flow) dependence, anti-dependence, and output dependence. Positive dependence is also referred to as true dependence, and true dependencies include one-to-one, many-to-one, one-to-many, and many-to-many dependencies.
  • A one-to-one dependency holds between two instructions in sequential order: the result of the earlier instruction is used only by the later instruction, and one of the operands of the later instruction is determined by the earlier instruction.
  • A many-to-one dependency holds among several instructions in sequential order: the results of several earlier instructions are used only by one later instruction, and an operand of the later instruction is determined by those earlier instructions.
  • A one-to-many dependency holds among several instructions in sequential order: the result of one earlier instruction is used by several later instructions, and one of the operands of each of those later instructions is determined by the earlier instruction.
  • A many-to-many dependency holds among several instructions in sequential order: the results of several earlier instructions are used by several later instructions, and an operand of the later instructions is determined by those earlier instructions.
  • the n represents the number of instruction slots in a very long instruction word
  • the n is an integer greater than or equal to 1
  • the m is an integer greater than or equal to 1
  • the t is an integer greater than or equal to 1 and less than or equal to n-1.
  • the instruction scheduling apparatus may be a compiler, and the instruction scheduling method is applicable to instruction scheduling of a compiler having a serial function processor.
  • The instruction scheduling device performs instruction scheduling in units of beats. Each beat includes m very long instruction words, that is, the transmission width of the instruction scheduling device is m, and each very long instruction word includes n instruction slots, that is, it can hold n instructions.
  • the ultra-long instruction word in this embodiment is VLIW (Very Long Instruction Word), which is an architecture that utilizes instruction-level parallelism.
  • In this instruction scheduling method, the very long instruction words within the same beat must be in a parallel execution relationship, and, for two adjacent beats, there must be no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat.
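The adjacent-beat condition can be checked mechanically. The sketch below is only an illustration: the function name, the list-of-lists representation of a beat, and the set of dependent pairs are assumptions made for this example, not structures defined in the patent.

```python
def violates_adjacent_beat_rule(prev_beat, next_beat, dependent_pairs):
    """Return True if slot t of the later beat and slot t+1 of the earlier
    beat hold instructions that depend on each other.

    prev_beat, next_beat: lists of m words; each word is a list of n
    instruction names, with None standing for a null operation.
    dependent_pairs: set of frozenset({a, b}) for every dependent pair.
    """
    n = len(prev_beat[0])
    # t is 0-based here; the text uses 1-based t in the range 1..n-1.
    for t in range(n - 1):
        for later_word in next_beat:
            for earlier_word in prev_beat:
                a, b = later_word[t], earlier_word[t + 1]
                if a is not None and b is not None \
                        and frozenset((a, b)) in dependent_pairs:
                    return True
    return False


# Hypothetical data: one word per beat (m = 1), four slots per word (n = 4).
DEPENDENT_PAIRS = {frozenset(("b0", "b1")), frozenset(("c2", "c3"))}
prev_beat = [["c0", "c1", "c2", "b0"]]
ok_next = [[None, None, None, "b1"]]    # b1 in slot 4, b0 in slot 4
bad_next = [[None, None, "b1", None]]   # b1 in slot 3, b0 in slot 4

ok_result = violates_adjacent_beat_rule(prev_beat, ok_next, DEPENDENT_PAIRS)
bad_result = violates_adjacent_beat_rule(prev_beat, bad_next, DEPENDENT_PAIRS)
```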
  • the instruction scheduling can be implemented by establishing a plurality of candidate instruction queues. After constructing the data dependency graph, n+1 candidate instruction queues may be established, and the n+1 candidate instruction queues are respectively the first to n+1th candidate instruction queues; then the n+1 candidate instruction queues are initialized, The n+1 candidate instruction queues are all empty.
  • An instruction with an in-degree of zero either has no predecessor node in the data dependency graph or has only predecessor nodes that are already-scheduled instructions.
  • An already-scheduled instruction is an instruction that has been placed in an instruction slot of a very long instruction word.
  • The predecessor nodes of instruction a are all the nodes at the tail ends of the directed arrows that point to instruction a in the data dependency graph.
  • Instruction a having an in-degree of zero means that instruction a has no predecessor node in the data dependency graph or that its predecessor nodes have all been scheduled.
  • The data dependency graph is a directed acyclic graph composed of a set of nodes and directed edges connecting the nodes.
  • Each node can represent a machine instruction, and each directed edge represents a dependency relationship between instructions.
  • The dependencies include positive dependence, anti-dependence, and output dependence; positive dependence is also referred to as true dependence.
  • Each edge is marked with weight information for the dependency, namely the delay, which indicates the required time interval between the transmission of the preceding instruction and the transmission of the following instruction.
  • For example, the 1 shown in FIG. 2 indicates that the transmission of instruction a1 and the transmission of instruction a2 must be separated by 1 clock cycle,
  • the 2 on the corresponding directed edge indicates that the time interval between the transmission of instruction a0 and the transmission of instruction a2 must be 2 clock cycles, and
  • the 3 in FIG. 2 indicates that the time interval between the transmission of instruction a2 and the transmission of instruction a3 must be 3 clock cycles.
  • Each directed edge takes the form of a directed arrow.
  • The directed arrow indicates a dependency relationship between instructions and points from the predecessor instruction to the successor instruction; that is, the execution of the successor instruction depends on the predecessor instruction. For example, a0 is a predecessor instruction of a2, and a2 is the successor instruction of a0.
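The FIG. 2 graph as described in the text (edges a0→a2 with delay 2, a1→a2 with delay 1, a2→a3 with delay 3) can be written down as a small weighted edge map. The sketch below, with hypothetical names `EDGES`, `predecessors`, and `zero_in_degree`, shows how the instructions with an in-degree of zero can be found once some instructions have been scheduled.

```python
# Edges as (predecessor, successor) -> delay in clock cycles, following the
# description of FIG. 2 in the text.
EDGES = {("a0", "a2"): 2, ("a1", "a2"): 1, ("a2", "a3"): 3}
NODES = ["a0", "a1", "a2", "a3"]


def predecessors(node, edges):
    """All nodes at the tail of an arrow pointing to `node`."""
    return {p for (p, s) in edges if s == node}


def zero_in_degree(nodes, edges, scheduled):
    """Unscheduled nodes with no predecessor, or whose predecessors are all
    already scheduled."""
    return [n for n in nodes
            if n not in scheduled and predecessors(n, edges) <= scheduled]


initial = zero_in_degree(NODES, EDGES, set())
after_a0 = zero_in_degree(NODES, EDGES, {"a0"})
after_a0_a1 = zero_in_degree(NODES, EDGES, {"a0", "a1"})
```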
  • Extract from the first candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slot of each very long instruction word, and place a null operation instruction in any first instruction slot that has not been filled, 0 ≤ h ≤ m;
  • Once the h instructions are scheduled, they are deleted from the first candidate instruction queue and become already-scheduled instructions.
  • Consequently, some instructions in the data dependency graph acquire an in-degree of zero after the h instructions are scheduled. These newly added zero-in-degree instructions are not in the first candidate queue, so they can be extracted from the data dependency graph to obtain the second candidate instruction queue.
  • the h instructions either have a true dependency on the instruction in the (q-1)-th instruction slot and satisfy both the time delay and resource requirements, or have no true dependency on the instruction in the (q-1)-th instruction slot but have the highest priority and satisfy both the time delay and resource requirements, 0 ≤ h ≤ m;
  • Step c1 deletes the h instructions from all candidate instruction queues; correspondingly, the h instructions become already-scheduled instructions.
  • Consequently, some instructions in the data dependency graph acquire an in-degree of zero after the h instructions are scheduled. These newly added zero-in-degree instructions are not in the q-th candidate queue; therefore, q is set to q + 1 in step d1, and the newly added zero-in-degree instructions are extracted from the data dependency graph to obtain the q-th candidate instruction queue.
  • P is an integer greater than 0,
  • the instructions in the second candidate instruction queue to the n+1th candidate instruction queue are sequentially placed in the previous candidate instruction queue;
  • Extract from the first candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slot of each very long instruction word, and place a null operation instruction in any first instruction slot that has not been filled, 0 ≤ h ≤ m;
  • Once the h instructions are scheduled, they are deleted from the first candidate instruction queue and become already-scheduled instructions.
  • Consequently, some instructions in the data dependency graph acquire an in-degree of zero after the h instructions are scheduled. These newly added zero-in-degree instructions are not in the first candidate queue, so they can be extracted from the data dependency graph to obtain the second candidate instruction queue.
  • Step c2 deletes the h instructions from all candidate instruction queues; correspondingly, the h instructions become already-scheduled instructions.
  • The h instructions are selected as follows: when an instruction that has a one-to-one true dependency on the instruction in the (q-1)-th instruction slot and satisfies both the time delay and resource requirements is scheduled, the register that would store the result of the instruction in the previous instruction slot can be saved, which saves hardware resources and improves performance.
  • The priority of each instruction in the data dependency graph may be calculated according to heuristic rules, which may include the maximum distance of the instruction, the execution delay of the instruction, the earliest start time of the instruction, the latest start time of the instruction, whether the instruction is on a critical path, and so on; different compilers may choose different heuristic rules.
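The text leaves the choice of heuristic open. As one possible illustration only, the sketch below uses the delay-weighted longest path from an instruction to the end of the graph as its priority, a common critical-path style heuristic. The function name and the use of the FIG. 2 edge weights are assumptions made for this example; the patent does not prescribe this rule.

```python
from functools import lru_cache


def critical_path_priorities(nodes, edges):
    """Priority = longest delay-weighted path from the node to any leaf.

    edges maps (pred, succ) -> delay; this particular heuristic is an
    assumption chosen for illustration.
    """
    succs = {n: [] for n in nodes}
    for (p, s), delay in edges.items():
        succs[p].append((s, delay))

    @lru_cache(maxsize=None)
    def height(n):
        # A leaf has height 0; otherwise take the most expensive successor path.
        return max((d + height(s) for s, d in succs[n]), default=0)

    return {n: height(n) for n in nodes}


EDGES = {("a0", "a2"): 2, ("a1", "a2"): 1, ("a2", "a3"): 3}
priorities = critical_path_priorities(["a0", "a1", "a2", "a3"], EDGES)
```

With this rule, instructions that start long dependency chains are scheduled earlier than instructions near the end of a chain.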
  • The true dependencies include one-to-one dependencies, many-to-one dependencies, one-to-many dependencies, and many-to-many dependencies.
  • A one-to-one dependency holds between two instructions in sequential order:
  • the result of the earlier instruction is used only by the later instruction, and one of the operands of the later instruction is determined by the earlier instruction.
  • For example, instruction a2 and instruction a3 satisfy a one-to-one dependency: the result of instruction a2 is used only by a3, and an operand of instruction a3 is determined by a2.
  • Instruction a0 and instruction a2 satisfy a one-to-one dependency:
  • the result of instruction a0 is used only by a2, and one operand of instruction a2 is determined by a0.
  • Instruction a1 and instruction a2 also satisfy a one-to-one dependency:
  • the result of instruction a1 is used only by a2, and the other operand of instruction a2 is determined by a1.
  • Having a true dependency on the instruction in the (q-1)-th instruction slot while satisfying both the time delay and resource requirements includes: having a one-to-one dependency on the instruction in the (q-1)-th instruction slot while satisfying both the time delay and resource requirements.
  • An instruction that has a one-to-one dependency on the instruction in the previous instruction slot can be scheduled preferentially, which saves the register that would store the result of the instruction in the previous instruction slot and simplifies the instruction scheduling process.
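The one-to-one test can be sketched at operand level. The tuple encoding `(producer, consumer, operand_index)` and the operand indices below are hypothetical; they are chosen only so that the a0, a1, a2, a3 example from the text can be expressed.

```python
def is_one_to_one(producer, consumer, deps):
    """deps: list of (producer, consumer, operand_index) true dependencies."""
    # The producer's result must be used by this consumer only.
    if {c for p, c, _ in deps if p == producer} != {consumer}:
        return False
    # At least one operand of the consumer must be defined by this producer alone.
    operands = {k for p, c, k in deps if p == producer and c == consumer}
    return any(
        sum(1 for _, c, k2 in deps if c == consumer and k2 == k) == 1
        for k in operands
    )


# a0 defines operand 0 of a2, a1 defines operand 1 of a2, a2 defines operand 0 of a3.
DEPS = [("a0", "a2", 0), ("a1", "a2", 1), ("a2", "a3", 0)]

a0_a2 = is_one_to_one("a0", "a2", DEPS)
a1_a2 = is_one_to_one("a1", "a2", DEPS)
a2_a3 = is_one_to_one("a2", "a3", DEPS)
# If a0's result were also used by a3, a0 -> a2 would no longer be one-to-one.
shared = is_one_to_one("a0", "a2", DEPS + [("a0", "a3", 1)])
```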
  • The available resources in the instruction scheduling device change after each instruction is scheduled; the available resources include the functional components, registers, instruction windows, and the like that execute instructions in the CPU.
  • Before scheduling each instruction, the scheduler needs to query the resource usage table to find a suitable next instruction to schedule.
  • The resource usage table lists the resources currently available on the machine and is updated in real time, reflecting the time at which each resource is released. Therefore, when performing step 102, the instruction scheduling apparatus needs to determine whether the delay time is satisfied between two instructions that have a dependency relationship, and whether the resources provided by the current CPU satisfy the resource requirements of the instructions being scheduled.
  • step 102 the method further includes:
  • Each of the super long instruction words is executed in accordance with an arrangement order of the respective instructions in the super long instruction word.
  • the method includes:
  • Step 301 Construct a data dependency graph.
  • The instructions in the data dependency graph are b0, b1, b2, c0, c1, c2, c3, and c4. Assume that the data dependency graph constructed from the dependencies between these instructions is as shown in FIG. 4.
  • Step 302 Calculate a priority of all instructions in the data dependency graph.
  • Step 303 Establish five candidate instruction queues.
  • the five candidate instruction queues are the first to fifth candidate instruction queues, respectively.
  • Step 304 Initialize the five candidate instruction queues, so that the five candidate instruction queues are all empty.
  • Step 305 Perform instruction scheduling according to the data dependency graph by using the five candidate instruction queues.
  • embodiments of the present invention assume that all instructions meet resource requirements.
  • When each beat includes one very long instruction word, that is, when the transmission width of the instruction scheduling apparatus is 1, the specific steps are as follows:
  • The instructions to be scheduled are b0, b1, b2, c0, c1, c2, c3, and c4, and the candidate instruction queues are obtained from the data dependency graph.
  • The instructions whose in-degree is currently zero are b0, c0, and c1.
  • The first candidate instruction queue therefore includes instructions b0, c0, and c1,
  • with priorities 5, 7, and 7, respectively. That is, the first candidate instruction queue is {c0, c1, b0}, and the second, third, fourth, and fifth candidate instruction queues are all empty.
  • For the first instruction slot, either c0 or c1 can be selected for scheduling, and both meet the time delay requirement.
  • When the instruction scheduling device is a specific compiler, the characteristics of the functional components required by the different instructions, or other factors, may be considered to further determine the priority between c0 and c1.
  • This embodiment assumes that c0 is selected here. After c0 is scheduled, c0 is removed from all candidate instruction queues. Checking the data dependency graph, the successor instruction of c0 is c2; since c2 also depends on c1, and c1 has not been scheduled, c2 cannot yet join a candidate queue.
  • The candidate instruction queues at this time are: {c1, b0}, empty, empty, empty, empty.
  • For the second instruction slot, an instruction is selected from the first and second candidate instruction queues. c1 is scheduled first according to priority; there is no unscheduled instruction with the same priority as c1, and c1 satisfies the time delay requirement, so it fills the second instruction slot.
  • After c1 is scheduled, c1 is removed from all candidate instruction queues.
  • The successor instruction of c1 is c2. Since c0 and c1, on which c2 depends, have both been scheduled, c2 is added to the third candidate instruction queue in preparation for scheduling the third instruction slot.
  • The candidate instruction queues are: {b0}, empty, {c2}, empty, empty.
  • For the third instruction slot, an instruction is selected from the first, second, and third candidate instruction queues.
  • c2 is scheduled first according to priority, and there is no unscheduled instruction with the same priority as c2. Since c2 depends on c1 and c0, it must be executed after c0 and c1; placing c2 in the third instruction slot satisfies this delay requirement, so c2 fills the third instruction slot.
  • c2 is removed from all candidate instruction queues.
  • Check the data dependency graph. The c3 instruction depends on c2, whose scheduling is now complete, so the c3 instruction is added to the fourth candidate instruction queue.
  • The candidate instruction queues are: {b0}, empty, empty, {c3}, empty.
  • For the fourth instruction slot, an instruction is selected from the first, second, third, and fourth candidate instruction queues. b0 is scheduled first according to priority, and there is no unscheduled instruction with the same priority as b0. Since b0 does not depend on any other instruction,
  • placing b0 in the fourth instruction slot meets the delay requirement, so b0 fills the fourth instruction slot.
  • b0 is removed from all candidate instruction queues. Check the data dependency graph. The predecessor of the b1 instruction, which depends on b0, has now been scheduled, so the b1 instruction is added to the fifth candidate instruction queue.
  • The candidate instruction queues are: empty, empty, empty, {c3}, {b1}.
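The slot-by-slot procedure walked through above can be sketched for the single-word case (m = 1). Everything beyond what the text states is an assumption made for this illustration: the function name `fill_word`, a timing model in which slot q of beat p issues at cycle p + q − 1, delays of 1 on every edge except c2→c3 (delay 3), the edge c3→c4, and all priorities other than 7, 7, and 5 for c0, c1, and b0. Resource requirements are ignored, as the embodiment assumes they are always met.

```python
def fill_word(queues, preds, priority, issued, beat, n):
    """Fill the n slots of one very long instruction word (m = 1 assumed).

    queues : n + 1 lists; queues[i] is the (i+1)-th candidate instruction queue
    preds  : instruction -> {predecessor: required delay in cycles}
    issued : instruction -> cycle at which it was issued (updated in place)
    """
    word = []
    for q in range(1, n + 1):
        cycle = beat + q - 1  # assumed timing model, see lead-in
        # Candidates come from queues 1..q and must satisfy every delay.
        pool = [i for queue in queues[:q] for i in queue
                if all(cycle - issued[p] >= d for p, d in preds[i].items())]
        if not pool:
            word.append("nop")
            continue
        prev = word[-1] if word else None
        # Prefer a true dependency on the previous slot, then higher priority.
        pool.sort(key=lambda i: (prev not in preds[i], -priority[i]))
        pick = pool[0]
        word.append(pick)
        issued[pick] = cycle
        for queue in queues:
            if pick in queue:
                queue.remove(pick)
        # Instructions whose predecessors are now all issued join queue q + 1.
        queued = {i for queue in queues for i in queue}
        for s in preds:
            if s not in issued and s not in queued \
                    and all(p in issued for p in preds[s]):
                queues[q].append(s)
    return word


# Hypothetical encoding of the FIG. 4 example (see assumptions above).
PREDS = {"b0": {}, "b1": {"b0": 1}, "b2": {"b1": 1}, "c0": {}, "c1": {},
         "c2": {"c0": 1, "c1": 1}, "c3": {"c2": 3}, "c4": {"c3": 1}}
PRIORITY = {"c0": 7, "c1": 7, "b0": 5, "c2": 6, "c3": 4,
            "b1": 3, "b2": 2, "c4": 1}

queues = [["c0", "c1", "b0"], [], [], [], []]
issued = {}
word0 = fill_word(queues, PREDS, PRIORITY, issued, beat=0, n=4)
```

Under these assumptions the sketch fills the first word with c0, c1, c2, b0 and leaves c3 in the fourth queue and b1 in the fifth, matching the queue state listed above.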
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, {c3}, {b1}.
  • Instructions are then moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second to (n+1)-th candidate instruction queues are placed in turn into the preceding candidate instruction queue. The first, second, third, fourth, and fifth candidate instruction queues then are: empty, empty, {c3}, {b1}, empty.
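The queue shift performed at the start of a new beat can be sketched as follows. The function name `shift_queues` is hypothetical, and merging into (rather than overwriting) a non-empty first queue is an assumption; in the example above the first queue is empty, so the distinction does not arise.

```python
def shift_queues(queues):
    """Move the contents of queue i+1 into queue i for i = 1..n.

    Returns a new list of queues; the last queue ends up empty.
    """
    shifted = [list(q) for q in queues]
    for i in range(1, len(shifted)):
        shifted[i - 1].extend(shifted[i])
        shifted[i] = []
    return shifted


# State after the first beat of the worked example.
before = [[], [], [], ["c3"], ["b1"]]
after = shift_queues(before)
```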
  • An instruction is selected for scheduling from the first candidate instruction queue according to priority. Since the first candidate instruction queue is empty, no instruction is available for scheduling, so a null operation (nop) is filled in, and the candidate instruction queues do not need to be updated.
  • An instruction is selected for scheduling from the first and second candidate instruction queues according to priority. Since the first and second candidate instruction queues are empty, no instruction is available for scheduling, so a null operation is filled in, and the candidate instruction queues do not need to be updated.
  • An instruction is selected for scheduling from the first, second, and third candidate instruction queues according to priority; the first and second candidate instruction queues are empty, and the third candidate instruction queue includes the c3 instruction.
  • According to the dependency, c3 must be executed at least 3 beats after c2. In the third instruction slot, the interval between c3 and c2 is 1 beat, which is less than 3 beats, so the time delay is not satisfied and c3 cannot be scheduled. A null operation is therefore filled in, and the candidate instruction queues do not need to be updated.
  • An instruction is selected for scheduling from the first, second, third, and fourth candidate instruction queues according to priority; the first and second candidate instruction queues are empty, the third candidate instruction queue includes the c3 instruction,
  • and the fourth candidate instruction queue contains the b1 instruction.
  • c3 must be executed at least 3 beats after c2, and in the fourth instruction slot the interval between c3 and c2 is 2 beats, which is less than 3 beats.
  • c3 therefore does not satisfy the time delay.
  • c3 cannot be scheduled. b1 needs to be executed after b0, so b1 satisfies the delay requirement, and the b1 instruction is filled in.
  • Check the data dependency graph. The predecessor of the b2 instruction, which depends on b1, has now been scheduled, so the b2 instruction is added to the fifth candidate instruction queue.
  • The candidate instruction queues are: empty, empty, {c3}, empty, {b2}.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, {c3}, empty, {b2}.
  • After the instructions are moved between the candidate instruction queues (the contents of each queue move to the preceding queue), the queues are: empty, {c3}, empty, {b2}, empty.
  • For the first instruction slot, an instruction is selected by priority from the first candidate instruction queue. Since the first candidate instruction queue is empty, no instruction is available for scheduling, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the second instruction slot, the interval between c3 and c2 would be 1 beat, less than 3 beats, so the c3 instruction does not satisfy the time delay. c3 cannot be scheduled here, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the third instruction slot, the interval between c3 and c2 would be 2 beats, less than 3 beats, so the c3 instruction does not satisfy the time delay. c3 cannot be scheduled, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues; the second candidate instruction queue contains the c3 instruction, and the fourth candidate instruction queue contains the b2 instruction.
  • According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the fourth instruction slot, the interval between c3 and c2 is exactly 3 beats, so the c3 instruction satisfies the delay requirement.
  • b2 must be executed at least 1 beat after b1. Placed in the fourth instruction slot, the interval between b2 and b1 is exactly 1 beat, so the b2 instruction also satisfies the delay requirement.
  • An instruction is then selected according to priority. b2 and c3 both have priority 3, so either could be filled into the fourth instruction slot.
  • It is further checked whether either of the two instructions has a one-to-one true dependency on the instruction in the previous instruction slot; if so, that instruction is preferentially scheduled into the current instruction slot. Because the previous instruction slot was filled with a null operation, there is no such true dependency. Assume that b2 is scheduled here: the b2 instruction is filled in, and b2 is removed from all candidate instruction queues.
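  • The selection rule just described (highest priority first, then preference for an instruction with a one-to-one true dependency on the previous slot) can be sketched as below. The function `pick`, its parameters, and the convention that a larger number means a higher priority are assumptions made for illustration; the dependency pairs b1 → b2 and c2 → c3 come from the walkthrough.

```python
def pick(candidates, priority, prev_slot_instr, one_to_one_pred):
    """Choose one ready instruction for the current slot.

    candidates:       ready instructions that already passed the delay check
    priority:         dict instruction -> int (assumed: larger means higher priority)
    prev_slot_instr:  instruction in the previous slot of this word, or None for a nop
    one_to_one_pred:  dict instruction -> predecessor it has a one-to-one
                      true dependency on
    """
    best = max(priority[c] for c in candidates)
    tied = [c for c in candidates if priority[c] == best]
    if prev_slot_instr is not None:
        for c in tied:
            # Prefer the instruction fed directly by the previous slot.
            if one_to_one_pred.get(c) == prev_slot_instr:
                return c
    # No such dependency: any tied instruction is acceptable; take the first.
    return tied[0]


prio = {"b2": 3, "c3": 3}
preds = {"b2": "b1", "c3": "c2"}
print(pick(["c3", "b2"], prio, None, preds))   # previous slot holds a nop
print(pick(["c3", "b2"], prio, "b1", preds))   # previous slot holds b1
```

  • With a null operation in the previous slot the choice between b2 and c3 is arbitrary (the text assumes b2; this sketch simply returns the first tied candidate), while with b1 in the previous slot b2 is preferred.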
  • The instruction c4, which depends on b2, also depends on the c3 instruction, and c3 has not been scheduled yet, so the c4 instruction cannot yet be added to the fifth candidate instruction queue.
  • At this point the candidate instruction queues are: empty, {c3}, empty, empty, empty.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, {c3}, empty, empty, empty.
  • Instructions are moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second through (n+1)th candidate instruction queues are each placed into the preceding candidate instruction queue. The first, second, third, fourth, and fifth candidate instruction queues are then: {c3}, empty, empty, empty, empty.
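  • The queue movement performed between beats can be sketched as follows. The function name `shift_queues` and the list-of-lists representation are hypothetical; keeping any instruction still left in the first queue (rather than discarding it) is an assumption, since in this walkthrough the first queue is always empty when the move happens.

```python
def shift_queues(queues):
    """Move the contents of queue i+1 into queue i for the next beat.

    `queues` is a list of n+1 lists; index 0 is the first candidate queue.
    Assumption: leftovers in the first queue stay in the first queue.
    """
    shifted = [list(q) for q in queues[1:]] + [[]]
    shifted[0] = list(queues[0]) + shifted[0]
    return shifted


before = [[], ["c3"], [], [], []]
print(shift_queues(before))  # [['c3'], [], [], [], []]
```

  • The move reflects that an instruction whose earliest legal position was slot q in one beat can, one beat later, be considered from slot q-1.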
  • For the first instruction slot, the first candidate instruction queue contains the c3 instruction, and an instruction is selected by priority from the first candidate instruction queue. According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the first instruction slot, the interval between c3 and c2 would be 1 beat, less than 3 beats, so the c3 instruction does not satisfy the time delay. c3 cannot be scheduled here, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the second instruction slot, the interval between c3 and c2 would be 2 beats, less than 3 beats, so the c3 instruction does not satisfy the delay requirement. c3 cannot be scheduled here, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependency, c3 must be executed at least 3 beats after c2. Placed in the third instruction slot, the interval between c3 and c2 is exactly 3 beats, so the c3 instruction satisfies the delay requirement. The c3 instruction is scheduled, and c3 is deleted from all candidate instruction queues.
  • The data dependency graph is checked: the predecessors of c4, the instruction that depends on c3, have now all been scheduled, so the c4 instruction is added to the fourth candidate instruction queue.
  • At this point the candidate instruction queues are: empty, empty, empty, {c4}, empty.
  • For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues. According to the dependency, c4 must be executed at least 2 beats after b2 and at least 2 beats after c3. Placed in the fourth instruction slot, the interval between c4 and b2 would be 1 beat, and the interval between c4 and c3 would also be 1 beat, so the c4 instruction does not satisfy the delay requirement. c4 cannot be scheduled, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, {c4}, empty.
  • Instructions are moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second through (n+1)th candidate instruction queues are each placed into the preceding candidate instruction queue. The first, second, third, fourth, and fifth candidate instruction queues are then: empty, empty, {c4}, empty, empty.
  • For the first instruction slot, an instruction is selected by priority from the first candidate instruction queue. Since the first candidate instruction queue is empty, no instruction is available for scheduling, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. Since both queues are empty, no instruction is available for scheduling, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependency, c4 must be executed at least 2 beats after b2 and at least 2 beats after c3. Placed in the third instruction slot, the interval between c4 and b2 would be 1 beat, and the interval between c4 and c3 would also be 1 beat, so the c4 instruction does not satisfy the delay requirement. c4 cannot be scheduled here, so a null operation is filled in and the candidate instruction queues do not need to be updated.
  • For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues; the first and second candidate instruction queues are empty, and the third candidate instruction queue contains c4.
  • According to the dependency, c4 must be executed at least 2 beats after b2 and at least 2 beats after c3. Placed in the fourth instruction slot, the interval between c4 and b2 is 2 beats, and the interval between c4 and c3 is also 2 beats, so the c4 instruction satisfies the time delay and c4 is filled in.
  • c4 is removed from all candidate instruction queues. The data dependency graph is checked: since there are no other unscheduled instructions, no new instruction is added to the candidate instruction queues.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, empty, empty.
  • All instructions on the data dependency graph have been scheduled, and the scheduling ends.
  • When each beat includes 2 very long instruction words, that is, when the issue width of the instruction scheduling device is 2, the specific steps are as follows:
  • The contents of the first, second, third, fourth, and fifth candidate instruction queues are: {c0, c1, b0}, empty, empty, empty, empty.
  • For the first instruction slot of the two very long instruction words, c0 and c1 are scheduled and placed in the first instruction slots of the two very long instruction words, respectively.
  • After c0 and c1 are scheduled, c0 and c1 are deleted from all candidate instruction queues, and the data dependency graph is checked: c2, whose predecessors have been scheduled, is added to the second candidate instruction queue in preparation for scheduling the second instruction slot.
  • The candidate instruction queues at this time are: {b0}, {c2}, empty, empty, empty.
  • For the second instruction slot of the two very long instruction words, instructions are selected from the first and second candidate instruction queues.
  • c2 is scheduled preferentially: one operand of the c2 instruction must come from the result of c0, with which it has a one-to-one dependency, and the time delay requirement is satisfied, so c2 is scheduled in the second instruction slot of the first very long instruction word. b0 also satisfies the time delay requirement and is scheduled in the second instruction slot of the second very long instruction word.
  • After c2 and b0 are scheduled, c2 and b0 are deleted from all candidate instruction queues, and the data dependency graph is checked: b1 and c3, whose predecessors have been scheduled, are added to the third candidate instruction queue in preparation for scheduling the third instruction slot.
  • The candidate instruction queues are: empty, empty, {b1, c3}, empty, empty.
  • For the third instruction slot of the two very long instruction words, instructions are selected from the first, second, and third candidate instruction queues.
  • b1 is scheduled preferentially. It satisfies the time delay requirement and could be placed in the third instruction slot of either very long instruction word in this beat. Because the operand of b1 must come from the result of b0, with which it has a one-to-one dependency, b1 is scheduled in the third instruction slot of the second very long instruction word, which saves a register for storing the result of b0. c3 is then considered: c3 depends on c2 and must be executed at least 3 beats after c2. The time delay requirement is not met here, so c3 cannot be scheduled in this instruction slot, and a null operation is filled into the third instruction slot of the first very long instruction word.
  • The candidate instruction queues are: empty, empty, {c3}, {b2}, empty.
  • For the fourth instruction slot of the two very long instruction words, instructions are selected from the first, second, third, and fourth candidate instruction queues.
  • b2 and c3 have the same priority.
  • c3 is considered first: c3 must be executed at least 3 beats after c2. The time delay requirement is not met, so c3 cannot be placed in this instruction slot.
  • b2 satisfies the time delay requirement. Because the operand of b2 must come from the result of b1, with which it has a one-to-one dependency, b2 is scheduled in the fourth instruction slot of the second very long instruction word, which saves a register for storing the result of b1. A null operation is then filled into the fourth instruction slot of the first very long instruction word.
  • The candidate instruction queues are: empty, empty, {c3}, empty, empty.
  • Instructions are moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second through (n+1)th candidate instruction queues are each placed into the preceding candidate instruction queue. The contents of the first, second, third, fourth, and fifth candidate instruction queues are then: empty, {c3}, empty, empty, empty.
  • For the first instruction slot of the two very long instruction words, an instruction is selected by priority from the first candidate instruction queue. The first candidate instruction queue is empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, {c3}, empty, empty, empty.
  • For the second instruction slot of the two very long instruction words, an instruction is selected from the first and second candidate instruction queues. c3 would execute only one beat after c2, so the time delay requirement is not satisfied, and null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, {c3}, empty, empty, empty.
  • For the third instruction slot of the two very long instruction words, an instruction is selected from the first, second, and third candidate instruction queues. c3 is examined and does not meet the time delay requirement (it would execute two beats after c2), so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, {c3}, empty, empty, empty.
  • For the fourth instruction slot of the two very long instruction words, an instruction is selected from the first, second, third, and fourth candidate instruction queues. c3 is examined and satisfies the time delay requirement (it executes three beats after c2), so c3 is scheduled in the fourth instruction slot of the first very long instruction word. A null operation is then filled into the fourth instruction slot of the second very long instruction word. c3 is removed from all candidate instruction queues, and the data dependency graph is checked: the predecessors of c4, the successor of c3, have all been scheduled, so c4 is added to the fifth candidate instruction queue.
  • The candidate instruction queues at this time are: empty, empty, empty, empty, {c4}.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, empty, {c4}.
  • Instructions are moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second through (n+1)th candidate instruction queues are each placed into the preceding candidate instruction queue. The contents of the first, second, third, fourth, and fifth candidate instruction queues are then: empty, empty, empty, {c4}, empty.
  • For the first instruction slot of the two very long instruction words, an instruction is selected by priority from the first candidate instruction queue. The first candidate instruction queue is empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, empty, {c4}, empty.
  • For the second instruction slot of the two very long instruction words, an instruction is selected by priority from the first and second candidate instruction queues. Both queues are empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, empty, {c4}, empty.
  • For the third instruction slot of the two very long instruction words, an instruction is selected by priority from the first, second, and third candidate instruction queues. All three queues are empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, empty, {c4}, empty.
  • For the fourth instruction slot of the two very long instruction words, an instruction is selected from the first, second, third, and fourth candidate instruction queues. c4 would execute one beat after c3 and two beats after b2, which does not satisfy the time delay requirement, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, empty, {c4}, empty.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, {c4}, empty.
  • Instructions are moved between the candidate instruction queues: starting from the second candidate instruction queue, the instructions in the second through (n+1)th candidate instruction queues are each placed into the preceding candidate instruction queue. The contents of the first, second, third, fourth, and fifth candidate instruction queues are then: empty, empty, {c4}, empty, empty.
  • For the first instruction slot of the two very long instruction words, an instruction is selected by priority from the first candidate instruction queue. The first candidate instruction queue is empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, {c4}, empty, empty.
  • For the second instruction slot of the two very long instruction words, an instruction is selected by priority from the first and second candidate instruction queues. Both queues are empty, so null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, {c4}, empty, empty.
  • For the third instruction slot of the two very long instruction words, an instruction is selected from the first, second, and third candidate instruction queues. c4 would execute one beat after c3 and two beats after b2, whereas c4 must execute at least 2 beats after c3 and at least 2 beats after b2, so the time delay requirement is not satisfied and null operations are filled in.
  • The candidate instruction queues do not change. In order, they are: empty, empty, {c4}, empty, empty.
  • For the fourth instruction slot of the two very long instruction words, an instruction is selected from the first, second, third, and fourth candidate instruction queues. c4 would execute two beats after c3, so the time delay requirement is satisfied, and c4 is scheduled in the fourth instruction slot of the first very long instruction word. The candidate instruction queues are now empty, so the fourth instruction slot of the second very long instruction word is filled with a null operation.
  • The first, second, third, fourth, and fifth candidate instruction queues are: empty, empty, empty, empty, empty. At this point, all instructions on the data dependency graph have been scheduled, and the instruction scheduling ends.
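  • As a cross-check, the width-2 schedule produced by the steps above can be written down and its delays verified. This is a sketch under stated assumptions: the nested-list layout, the names `exec_times` and `delays_ok`, and the timing model (slot t of beat b executes at time b + t - 1) are illustrative; only the delays explicitly stated in this walkthrough (c2 → c3: 3, b1 → b2: 1, c3 → c4: 2, b2 → c4: 2) are checked.

```python
NOP = "nop"

# schedule[beat][word][slot index], slots listed first to fourth
schedule = [
    [["c0", "c2", NOP, NOP], ["c1", "b0", "b1", "b2"]],
    [[NOP, NOP, NOP, "c3"], [NOP, NOP, NOP, NOP]],
    [[NOP, NOP, NOP, NOP], [NOP, NOP, NOP, NOP]],
    [[NOP, NOP, NOP, "c4"], [NOP, NOP, NOP, NOP]],
]

# (predecessor, successor) -> minimum delay in beats, as stated in the text
min_delay = {("c2", "c3"): 3, ("b1", "b2"): 1, ("c3", "c4"): 2, ("b2", "c4"): 2}


def exec_times(sched):
    """Map each real instruction to the time step at which it executes."""
    times = {}
    for beat, words in enumerate(sched):
        for word in words:
            for slot_index, instr in enumerate(word):
                if instr != NOP:
                    times[instr] = beat + slot_index
    return times


def delays_ok(sched, delays):
    """True if every listed successor runs late enough after its predecessor."""
    t = exec_times(sched)
    return all(t[succ] - t[pred] >= d for (pred, succ), d in delays.items())


print(exec_times(schedule))
print(delays_ok(schedule, min_delay))
```

  • Under this model c3 runs exactly three beats after c2 and c4 exactly two beats after c3, which agrees with the intervals quoted in the walkthrough.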
  • Step 306 Execute each instruction in the super long instruction word according to an arrangement order of each instruction in the super long instruction word.
  • 0th beat: {a0, b0, c0, d0} {e0, f0, g0, h0} {i0, j0, k0, l0} {m0, n0, o0, p0};
  • 1st beat: {a1, b1, c1, d1} {e1, f1, g1, h1} {i1, j1, k1, l1} {m1, n1, o1, p1};
  • 2nd beat: {a2, b2, c2, d2} {e2, f2, g2, h2} {i2, j2, k2, l2} {m2, n2, o2, p2};
  • 3rd beat: {a3, b3, c3, d3} {e3, f3, g3, h3} {i3, j3, k3, l3} {m3, n3, o3,
  • In the 1st beat, the four very long instruction words issued in the previous beat are executing the instructions in their respective second instruction slots, that is, b0, f0, j0, and n0 are executed in parallel, while the four very long instruction words issued in the current beat are executing the instructions in their respective first instruction slots, that is, a1, e1, i1, and m1 are executed in parallel. b0, f0, j0, and n0 and a1, e1, i1, and m1 are all executed in parallel with one another, and there is no dependency among them.
  • The execution in the 2nd and 3rd beats follows the same principle as in the 1st beat and is not described in detail here.
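  • The staggered execution described above can be illustrated with a small simulation. The helper names `build_beat` and `running_at` are hypothetical, and the sketch assumes four very long instruction words per beat with four instruction slots each, named as in the listing above.

```python
import string


def build_beat(beat, words=4, slots=4):
    """Build the words of one beat: {a,b,c,d} {e,f,g,h} {i,j,k,l} {m,n,o,p} + beat number."""
    letters = string.ascii_lowercase
    return [
        [f"{letters[w * slots + s]}{beat}" for s in range(slots)]
        for w in range(words)
    ]


def running_at(time, beats):
    """Instructions executing at `time`: slot (time - beat + 1) of every word of each issued beat."""
    active = []
    for beat, words in enumerate(beats):
        slot_index = time - beat  # 0-based slot reached by this beat's words
        if 0 <= slot_index < len(words[0]):
            active.extend(word[slot_index] for word in words)
    return active


beats = [build_beat(b) for b in range(4)]
print(running_at(1, beats))
# ['b0', 'f0', 'j0', 'n0', 'a1', 'e1', 'i1', 'm1']
```

  • The instructions printed for time 1 are exactly the two groups named in the text, which is why the scheduler must keep them free of mutual dependencies.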
  • The FUs in one serial group in FIG. 5 may be the same or different functional components.
  • For example, a group of serial FUs may have 2 adders, 1 multiplier, and 1 memory access component.
  • The instruction scheduling method provided by the embodiment of the invention places the very long instruction words in the same beat in a parallel-execution relationship, and ensures that, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)th instruction slot of any very long instruction word of the earlier beat. Therefore, when instructions are executed on a multi-core processor with serial features, it does not happen that instructions with a dependency are executed in the same beat, or that an instruction depending on another instruction is executed before that instruction. This allows the processor or pipeline to run normally and improves the correctness of the scheduling.
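  • The adjacent-beat condition stated above can be expressed as a simple checker. This is only a sketch: the function `adjacent_beat_conflicts`, the schedule layout, and the instructions x and y are hypothetical and are not taken from the patent's figures.

```python
NOP = "nop"


def adjacent_beat_conflicts(schedule, deps):
    """List pairs that break the rule between two adjacent beats.

    schedule: schedule[beat][word][slot index]
    deps:     set of (predecessor, successor) dependency pairs
    Rule: slot t of any word in the later beat must not depend on (or be
    depended on by) slot t+1 of any word in the earlier beat.
    """
    conflicts = []
    for beat in range(1, len(schedule)):
        earlier, later = schedule[beat - 1], schedule[beat]
        n = len(later[0])
        for t in range(n - 1):  # 0-based; corresponds to t = 1 .. n-1
            for later_word in later:
                for earlier_word in earlier:
                    a, b = earlier_word[t + 1], later_word[t]
                    if NOP in (a, b):
                        continue
                    if (a, b) in deps or (b, a) in deps:
                        conflicts.append((a, b))
    return conflicts


deps = {("x", "y")}
bad = [[[NOP, "x"]], [["y", NOP]]]   # x in slot 2 of beat 0, y in slot 1 of beat 1
good = [[["x", NOP]], [[NOP, "y"]]]  # y placed one slot later than x, one beat later
print(adjacent_beat_conflicts(bad, deps))   # [('x', 'y')]
print(adjacent_beat_conflicts(good, deps))  # []
```

  • In the first layout x and y would execute at the same time even though y depends on x, which is the situation the scheduling method is designed to rule out.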
  • 2nd beat 2, empty operation, empty operation, empty operation ⁇ ;
  • 3rd beat: {null operation, null operation, null operation, null operation};
  • 2nd beat: {b2, c3, null operation, null operation};
  • The instructions are executed as follows: in the 0th beat, b0 is executed; in the 1st beat, b1 and c0 are executed in parallel; in the 2nd beat, b2, c2, and c1 are executed in parallel.
  • c2 and c1, which have a dependency, are executed at the same time, which may result in execution errors or pipeline stalls, affecting the performance or correctness of instruction execution.
  • Null operations also have to be executed: although a null operation has no operands, produces no result, and performs no actual operation, it still enters the processor and goes through instruction fetch, decoding, and execution.
  • the sequence of instructions generated by the compiler is as follows:
  • The instructions are executed as follows: in the 0th beat, c0 is executed; in the 1st beat, a null operation and c1 are executed in parallel; in the 2nd beat, two null operations and c2 are executed in parallel; in the 3rd beat, three null operations and b0 are executed in parallel; in the 4th beat, three null operations and b1 are executed.
  • A null operation, c3, and b2 are executed in the 4th beat. Null operations are executed in the 5th beat. c4 is executed in the 6th beat.
  • An embodiment of the present invention provides an instruction scheduling apparatus 60, as shown in FIG. 6, including:
  • a building unit 601 is used to construct a data dependency graph.
  • the scheduling unit 602 is configured to extract k instructions from the data dependency graph for scheduling to obtain m very long instruction words for each beat, so that the very long instruction words in the same beat are in a parallel-execution relationship, and so that, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)th instruction slot of any very long instruction word of the earlier beat.
  • In this way, when instructions are executed on a multi-core processor with serial features, it does not happen that instructions with a dependency are executed in the same beat, or that an instruction depending on another instruction is executed before that instruction. This allows the processor or pipeline to run normally and improves the correctness of the scheduling.
  • the instruction scheduling apparatus 60 may further include:
  • the executing unit 603 is configured to execute each instruction in the super long instruction word according to an arrangement order of each instruction in the super long instruction word.
  • the instruction scheduling apparatus 60 may further include:
  • the establishing unit 604 is configured to establish n+1 candidate instruction queues, where the n+1 candidate instruction queues are the first to n+1th candidate instruction queues, respectively.
  • the initializing unit 605 is configured to initialize the n+1 candidate instruction queues, so that the n+1 candidate instruction queues are all empty.
  • the scheduling unit 602 is specifically configured to:
  • Extract, from the first candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slots of the very long instruction words, and place a null operation instruction in each first instruction slot that has not been filled, where 0 ≤ h ≤ m;
  • P is an integer greater than 0,
  • the instructions in the second candidate instruction queue through the (n+1)th candidate instruction queue are sequentially placed in the preceding candidate instruction queue;
  • Extract, from the first candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slots of the very long instruction words, and place a null operation instruction in each first instruction slot that has not been filled, where 0 ≤ h ≤ m;
  • Extract the newly added instructions with in-degree zero from the data dependency graph to obtain the second candidate instruction queue, and perform the following steps:
  • c. Delete the h instructions from all candidate instruction queues;
  • d. Let q = q+1, extract the newly added instructions whose current in-degree is zero in the data dependency graph to obtain the q-th candidate instruction queue, and repeat steps a to d until there is no unscheduled instruction in the first through q-th candidate instruction queues or the instructions in the (n+1)th candidate instruction queue have been updated.
  • Having a true dependency on the instruction in the (q-1)th instruction slot while satisfying the time delay and resource requirements includes: having a one-to-one dependency on the instruction in the (q-1)th instruction slot while satisfying the time delay and resource requirements.
  • Embodiments of the present invention provide an instruction scheduling apparatus in which the scheduling unit places the very long instruction words in the same beat in a parallel-execution relationship and ensures that, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)th instruction slot of any very long instruction word of the earlier beat. Therefore, when instructions are executed on a multi-core processor with serial features, it does not happen that instructions with a dependency are executed in the same beat, or that an instruction depending on another instruction is executed before that instruction. This allows the processor or pipeline to run normally and improves the correctness of the scheduling.
  • An embodiment of the present invention provides an instruction scheduling apparatus, as shown in FIG. 9, including:
  • a processor 901 configured to construct a data dependency graph
  • the processor 901 is further configured to extract k instructions from the data dependency graph for scheduling to obtain m very long instruction words for each beat, so that the very long instruction words in the same beat are in a parallel-execution relationship, and so that, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)th instruction slot of any very long instruction word of the earlier beat;
  • where 0 ≤ k ≤ m × n, n represents the number of instruction slots in one very long instruction word and is an integer greater than or equal to 1, m represents the number of very long instruction words in each beat and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
  • In this way, when instructions are executed on a multi-core processor with serial features, it does not happen that instructions with a dependency are executed in the same beat, or that an instruction depending on another instruction is executed before that instruction. This allows the processor or pipeline to run normally and improves the correctness of the scheduling.
  • the processor 901 is further configured to:
  • Each of the super long instruction words is executed in accordance with an arrangement order of the respective instructions in the super long instruction word.
  • the processor 901 is further configured to:
  • establish n+1 candidate instruction queues, where the n+1 candidate instruction queues are the first to (n+1)th candidate instruction queues, respectively;
  • initialize the n+1 candidate instruction queues so that the n+1 candidate instruction queues are all empty.
  • Extract, from the first candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slots of the very long instruction words, and place a null operation instruction in each first instruction slot that has not been filled, where 0 ≤ h ≤ m;
  • c. Delete the h instructions from all candidate instruction queues;
  • d. Let q = q+1, extract the newly added instructions with in-degree zero in the data dependency graph to obtain the q-th candidate instruction queue, and repeat steps a to d until there is no unscheduled instruction in the first through q-th candidate instruction queues or the instructions in the (n+1)th candidate instruction queue have been updated.
  • P is an integer greater than 0,
  • the instructions in the second candidate instruction queue through the (n+1)th candidate instruction queue are sequentially placed in the preceding candidate instruction queue;
  • Extract, from the first candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the first instruction slots of the very long instruction words, and place a null operation instruction in each first instruction slot that has not been filled, where 0 ≤ h ≤ m;
  • Extract the newly added instructions with in-degree zero in the data dependency graph to obtain the second candidate instruction queue, and perform the following steps:
  • the processor 901 is specifically configured to:
  • The processor places the very long instruction words in the same beat in a parallel-execution relationship and ensures that, for two adjacent beats, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)th instruction slot of any very long instruction word of the earlier beat. Therefore, when instructions are executed on a multi-core processor with serial features, it does not happen that instructions with a dependency are executed in the same beat, or that an instruction depending on another instruction is executed before that instruction. This allows the processor or pipeline to run normally and improves the correctness of the scheduling.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separate, and the components displayed as the units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the media includes: ROM, RAM, a magnetic disk, an optical disk, and other media that can store program code.

Abstract

Embodiments of the present invention, which belong to the technical field of communications, relate to an instruction scheduling method and apparatus that enable a processor or pipeline to operate normally and improve scheduling correctness. The method comprises: building a data dependency graph; and extracting k instructions from the data dependency graph and scheduling them to obtain m very long instruction words (VLIWs) for each cycle, such that the VLIWs within the same cycle can be executed in parallel and, for any two adjacent cycles, there is no dependency between the instruction in instruction slot t of any VLIW of the later cycle and the instruction in instruction slot (t+1) of any VLIW of the earlier cycle, where 0≤k≤m*n, n is the number of instruction slots in a VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1. The embodiments of the present invention relate to an instruction scheduling method and an instruction scheduling apparatus.
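
The adjacent-cycle condition stated in the abstract can be checked mechanically on a finished schedule. The sketch below is an assumption-laden illustration rather than part of the patent: the function name `adjacent_cycle_conflicts`, the nested-list layout `schedule[cycle][word][slot]`, the use of `None` for an empty slot, and the representation of dependencies as a set of `(producer, consumer)` pairs are all hypothetical choices. Slots are 0-based in the code, so slot t in the text corresponds to index t-1.

```python
def adjacent_cycle_conflicts(schedule, deps):
    """List dependent pairs that violate the adjacent-cycle condition.

    schedule: list of cycles; each cycle is a list of m instruction words;
              each word is a list of n slots holding an instruction name
              or None for an empty (no-operation) slot.
    deps:     set of (producer, consumer) instruction pairs.

    Assumption: every instruction word has the same number of slots.
    """
    conflicts = []
    for earlier, later in zip(schedule, schedule[1:]):
        for late_word in later:
            for t in range(len(late_word) - 1):
                a = late_word[t]              # slot t of the later cycle
                for early_word in earlier:
                    b = early_word[t + 1]     # slot t+1 of the earlier cycle
                    if a is None or b is None:
                        continue
                    if (a, b) in deps or (b, a) in deps:
                        conflicts.append((b, a))
    return conflicts


deps = {("i1", "i2")}  # i2 depends on i1

# i1 sits in slot 2 of the earlier cycle, i2 in slot 1 of the later cycle.
bad = [[["x", "i1"]], [["i2", "y"]]]
# Moving i1 to slot 1 of the earlier cycle removes the conflict.
ok = [[["i1", "x"]], [["i2", "y"]]]
```

Calling the function on `bad` reports the pair `("i1", "i2")`, while `ok` yields an empty list. The check covers only the adjacent-cycle condition; dependencies within a single cycle would need a separate pass.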
PCT/CN2014/083603 2013-08-21 2014-08-04 Instruction scheduling method and apparatus WO2015024432A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310367751.2A CN104424026B (zh) 2013-08-21 2013-08-21 Instruction scheduling method and apparatus
CN201310367751.2 2013-08-21

Publications (1)

Publication Number Publication Date
WO2015024432A1 (fr) 2015-02-26

Family

ID=52483045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/083603 WO2015024432A1 (fr) 2013-08-21 2014-08-04 Instruction scheduling method and apparatus

Country Status (2)

Country Link
CN (1) CN104424026B (fr)
WO (1) WO2015024432A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699464B (zh) * 2015-03-26 2017-12-26 中国人民解放军国防科学技术大学 Instruction-level parallel scheduling method based on a dependency grid
CN104699466B (zh) * 2015-03-26 2017-07-18 中国人民解放军国防科学技术大学 Multi-heuristic instruction selection method for VLIW architectures
US11275590B2 (en) 2015-08-26 2022-03-15 Huawei Technologies Co., Ltd. Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory
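CN108228242B (zh) * 2018-02-06 2020-02-07 江苏华存电子科技有限公司 Configurable and flexible instruction scheduler
CN112579272B (zh) * 2020-12-07 2023-11-14 海光信息技术股份有限公司 Micro-instruction dispatch method, apparatus, processor and electronic device
CN117827287A (zh) * 2022-09-29 2024-04-05 深圳市中兴微电子技术有限公司 Instruction-level parallel scheduling method, apparatus, electronic device and storage medium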

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7114058B1 (en) * 2001-12-31 2006-09-26 Apple Computer, Inc. Method and apparatus for forming and dispatching instruction groups based on priority comparisons
US20070174599A1 (en) * 1997-08-01 2007-07-26 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems
US20120246448A1 (en) * 2011-03-25 2012-09-27 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
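CN102799418A (zh) * 2012-08-07 2012-11-28 清华大学 Processor architecture combining in-order and VLIW execution, and instruction execution method
CN102880449A (zh) * 2012-09-18 2013-01-16 中国科学院声学研究所 Delay slot scheduling method and system for a very long instruction word architecture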

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070174599A1 (en) * 1997-08-01 2007-07-26 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems
US7114058B1 (en) * 2001-12-31 2006-09-26 Apple Computer, Inc. Method and apparatus for forming and dispatching instruction groups based on priority comparisons
US20120246448A1 (en) * 2011-03-25 2012-09-27 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
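CN102799418A (zh) * 2012-08-07 2012-11-28 清华大学 Processor architecture combining in-order and VLIW execution, and instruction execution method
CN102880449A (zh) * 2012-09-18 2013-01-16 中国科学院声学研究所 Delay slot scheduling method and system for a very long instruction word architecture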

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI, LEI ET AL.: "Branch Scheduling Optimization on VLIW Processors", COMPUTER ENGINEERING AND APPLICATIONS, vol. 48, no. 21, 31 December 2012 (2012-12-31), pages 41 *

Also Published As

Publication number Publication date
CN104424026B (zh) 2017-11-17
CN104424026A (zh) 2015-03-18

Similar Documents

Publication Publication Date Title
US11262787B2 (en) Compiler method
US10936008B2 (en) Synchronization in a multi-tile processing array
US20220253399A1 (en) Instruction Set
WO2015024432A1 (fr) Instruction scheduling method and apparatus
US10963003B2 (en) Synchronization in a multi-tile processing array
US11416440B2 (en) Controlling timing in computer processing
US10817459B2 (en) Direction indicator
US20220197857A1 (en) Data exchange pathways between pairs of processing units in columns in a computer
WO2022036690A1 (fr) Graph computing apparatus, processing method and related device
Walk et al. Out-of-order execution within functional units of the SCAD architecture
US20200201794A1 (en) Scheduling messages
CN115543448A (zh) Dynamic instruction scheduling method on a dataflow architecture, and dataflow architecture
Repouskos The Dataflow Computational Model And Its Evolution
Wang et al. Opportunities and challenges in process-algebraic verification of asynchronous circuit designs
Ye et al. Optimizing message passing programs based on task section duplication
Prakash et al. CSE 48: Optimality of Tomasulo's Algorithm

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14838079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 14838079

Country of ref document: EP

Kind code of ref document: A1