WO2015024432A1 - Instruction scheduling method and device - Google Patents


Info

Publication number
WO2015024432A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
candidate
instructions
slot
queue
Application number
PCT/CN2014/083603
Other languages
French (fr)
Chinese (zh)
Inventor
黄磊
连瑞琦
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2015024432A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/44 - Encoding
    • G06F8/445 - Exploiting fine grain parallelism, i.e. parallelism at instruction level

Abstract

An instruction scheduling method and device, relating to the field of communications, that enable a processor or pipeline to operate normally and improve the correctness of scheduling. The method comprises: constructing a data dependency graph; and extracting k instructions from the data dependency graph and scheduling them to obtain m very long instruction words (VLIWs) for each cycle, such that the VLIWs in the same cycle can be executed in parallel and, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW in the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW in the earlier cycle, where 0 ≤ k ≤ m × n, n is the number of instruction slots in a VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.

Description

Instruction Scheduling Method and Device
This application claims priority to Chinese Patent Application No. 201310367751.2, filed with the Chinese Patent Office on August 21, 2013 and entitled "Instruction Scheduling Method and Device", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications, and in particular to an instruction scheduling method and device.
Background
In the prior art, the functional units in a CPU (Central Processing Unit) are usually independent and operate in parallel, so a compiler uses instruction scheduling tailored to the CPU structure to improve instruction-level parallelism. Instruction scheduling is a technique for executing instructions in parallel: the compiler or the machine hardware reorders instructions to increase the number of instructions the machine executes per cycle, where a cycle is the clock cycle in which the machine executes instructions, as simulated by the compiler when compiling the source program. Existing compilers usually implement instruction scheduling with the list scheduling algorithm, typically using a single candidate instruction queue. Specifically, a data dependency graph is first constructed for the instructions to be scheduled. The graph consists of a number of nodes, each representing one instruction, and expresses the dependencies between instructions. The priority of each instruction is then calculated, and the instructions in the data dependency graph are scheduled cycle by cycle. At the start of scheduling, the instructions with an in-degree of zero are found in the data dependency graph and added to the candidate instruction queue, and any other candidate instruction queues are set to empty. Each cycle is scheduled as follows: instructions are selected from the candidate instruction queue in priority order and placed into instruction slots, and the candidate instruction queue is updated; an instruction slot for which no instruction could be selected is filled with a no-op instruction; once the instruction slots of a cycle have been scheduled, the cycle is advanced, the candidate instruction queue is updated, and the above steps are repeated for the next cycle, until all instructions in the data dependency graph have been scheduled.
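The single-queue list scheduling described above can be sketched as follows. This is a minimal illustration rather than any particular compiler's implementation: it assumes unit latency and no resource constraints, and the names `list_schedule`, `deps`, `priority`, and `slots_per_cycle` are hypothetical.

```python
from collections import defaultdict

NOP = "nop"

def list_schedule(deps, priority, slots_per_cycle):
    """Classic list scheduling with one candidate queue (unit latency assumed).

    deps: dict mapping each instruction to the set of its predecessors.
    priority: dict mapping each instruction to a number (higher goes first).
    Returns a list of cycles, each a list of instructions padded with NOP.
    """
    indegree = {ins: len(preds) for ins, preds in deps.items()}
    successors = defaultdict(list)
    for ins, preds in deps.items():
        for p in preds:
            successors[p].append(ins)

    # Initially the queue holds every instruction with an in-degree of zero.
    candidates = [ins for ins, d in indegree.items() if d == 0]
    schedule = []
    while candidates:
        candidates.sort(key=lambda ins: priority[ins], reverse=True)
        chosen = candidates[:slots_per_cycle]
        candidates = candidates[slots_per_cycle:]
        # Slots for which no instruction could be selected receive a no-op.
        schedule.append(chosen + [NOP] * (slots_per_cycle - len(chosen)))
        # Instructions freed by this cycle become candidates for the next one.
        for ins in chosen:
            for succ in successors[ins]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    candidates.append(succ)
    return schedule
```

For a three-instruction chain a, b, c and two slots per cycle, the sketch issues one instruction per cycle and pads the second slot with a no-op each time.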
With the emergence of multi-core processors, which are composed of multiple single-core processors, the structure of each core has tended to become simpler, and serial arrangements of functional units, and even arrays of functional units, have appeared. If the prior-art list scheduling method is used to schedule instructions on such a multi-core processor, then at execution time instructions that depend on each other may execute in the same cycle, or an instruction may execute before the instruction it depends on. Either situation can cause the processor to produce incorrect results or the pipeline to stall, so the correctness of the scheduling is low.
Summary of the Invention
Embodiments of the present invention provide an instruction scheduling method and device that enable a processor or pipeline to operate normally and improve the correctness of scheduling.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, an instruction scheduling method applied to an instruction scheduling device is provided, comprising:
constructing a data dependency graph; and
extracting k instructions from the data dependency graph and scheduling them to obtain the m very long instruction words (VLIWs) of each cycle, such that the VLIWs within the same cycle execute in parallel and, for two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW in the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW in the earlier cycle;
where 0 ≤ k ≤ m × n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
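The adjacent-cycle condition of the first aspect can be expressed as a small checker. The sketch below is illustrative only: the nested-list layout `schedule[c][w][s]` (cycle, VLIW, slot), the use of `None` for a no-op, and the `depends` callback are assumptions, and slot indices are 0-based in the code whereas the text counts from 1.

```python
def violates_adjacent_cycle_rule(schedule, depends):
    """Return True if some pair of adjacent cycles breaks the condition.

    schedule[c][w][s] is the instruction in slot s of VLIW w in cycle c,
    or None for a no-op. depends(a, b) reports whether a dependency
    exists between instructions a and b.
    """
    for prev, curr in zip(schedule, schedule[1:]):
        n = len(prev[0])                      # slots per VLIW
        for t in range(n - 1):                # slots 1..n-1 in the text
            later = [w[t] for w in curr if w[t] is not None]
            earlier = [w[t + 1] for w in prev if w[t + 1] is not None]
            if any(depends(a, b) for a in later for b in earlier):
                return True
    return False
```

With m = 1 and n = 2, a schedule that places instruction c in slot 1 of the later cycle while c depends on the instruction in slot 2 of the earlier cycle is reported as a violation.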
With reference to the first aspect, in a first possible implementation, after the k instructions are extracted from the data dependency graph and scheduled to obtain the m VLIWs of each cycle, the method further comprises:
executing the instructions in each VLIW in the order in which they are arranged in that VLIW.
With reference to the first aspect or the first possible implementation, in a second possible implementation, after the data dependency graph is constructed, the method further comprises:
establishing n+1 candidate instruction queues, namely the 1st through (n+1)-th candidate instruction queues; and
initializing the n+1 candidate instruction queues so that all of them are empty.
With reference to the second possible implementation, in a third possible implementation, the scheduling of the k instructions extracted from the data dependency graph to obtain the m VLIWs of each cycle, subject to the conditions of the first aspect, comprises:
when scheduling cycle 0,
extracting the instructions whose in-degree is currently zero from the data dependency graph to obtain the 1st candidate instruction queue, where an instruction with an in-degree of zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extracting from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the latency and resource requirements, placing them into the 1st instruction slots of the VLIWs, one per VLIW, and placing a no-op instruction into each 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
deleting the h instructions from the 1st candidate instruction queue;
extracting the instructions whose in-degree has newly become zero from the data dependency graph to obtain the 2nd candidate instruction queue; and
performing the following steps, with q initialized to 2:
a. extracting h instructions from the 1st through q-th candidate instruction queues and placing them into the q-th instruction slots of the VLIWs, one per VLIW, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the latency and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the latency and resource requirements, where 0 ≤ h ≤ m;
b. placing a no-op instruction into each q-th instruction slot that remains unfilled;
c. deleting the h instructions from all candidate instruction queues; and
d. setting q = q + 1, extracting the instructions whose in-degree has newly become zero from the data dependency graph to obtain the q-th candidate instruction queue, and repeating steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
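The cycle-0 procedure above can be sketched for the simplest case of a single VLIW per cycle (m = 1). This is a simplified reading under stated assumptions, not the applicant's implementation: latency and resource requirements are taken to be always satisfied, `preds` records only true dependencies, ties are broken by queue order, and the names `newly_ready` and `schedule_cycle0` are hypothetical.

```python
NOP = "nop"

def newly_ready(preds, scheduled, queued):
    """Instructions whose predecessors are all scheduled and that are
    neither scheduled nor already sitting in a candidate queue."""
    return [i for i in preds
            if i not in scheduled and i not in queued and preds[i] <= scheduled]

def schedule_cycle0(preds, priority, n):
    """Fill the n slots of one VLIW (m = 1) slot by slot.

    preds: instruction -> set of instructions it truly depends on.
    Returns (vliw, queues); queues[i] is candidate queue i+1 of the text.
    """
    queues = [[] for _ in range(n + 1)]
    scheduled, word = set(), []
    for q in range(n + 1):
        queued = {i for queue in queues for i in queue}
        queues[q].extend(newly_ready(preds, scheduled, queued))
        if q == n:
            break                      # queue n+1 has been updated: stop
        pool = [i for queue in queues[:q + 1] for i in queue]
        if not pool:
            break                      # nothing left to schedule this cycle
        prev = word[-1] if word else None
        # Prefer a true dependency on the previous slot, then priority.
        best = max(pool, key=lambda i: (prev in preds[i], priority[i]))
        word.append(best)
        scheduled.add(best)
        for queue in queues:           # step c: delete from all queues
            if best in queue:
                queue.remove(best)
    return word + [NOP] * (n - len(word)), queues
```

In this sketch an instruction that truly depends on the preceding slot wins over a higher-priority independent instruction, which is left in its queue for a later slot or cycle.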
With reference to the third possible implementation, in a fourth possible implementation, the scheduling of the k instructions extracted from the data dependency graph to obtain the m VLIWs of each cycle, subject to the conditions of the first aspect, further comprises:
when scheduling cycle P, where P is an integer greater than 0,
moving, in order starting from the 2nd candidate instruction queue, the instructions in each of the 2nd through (n+1)-th candidate instruction queues into the preceding candidate instruction queue;
extracting from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the latency and resource requirements, placing them into the 1st instruction slots of the VLIWs, one per VLIW, and placing a no-op instruction into each 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
deleting the h instructions from the 1st candidate instruction queue;
extracting the instructions whose in-degree has newly become zero from the data dependency graph to obtain the 2nd candidate instruction queue; and
performing the following steps, with q initialized to 2:
a. extracting h instructions from the 1st through q-th candidate instruction queues and placing them into the q-th instruction slots of the VLIWs, one per VLIW, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the latency and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the latency and resource requirements, where 0 ≤ h ≤ m;
b. placing a no-op instruction into each q-th instruction slot that remains unfilled;
c. deleting the h instructions from all candidate instruction queues; and
d. setting q = q + 1, extracting the instructions whose in-degree has newly become zero from the data dependency graph to obtain the q-th candidate instruction queue, and repeating steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
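The step that distinguishes cycle P from cycle 0 is the queue shift at the start of the cycle. A minimal sketch follows, assuming the queues are Python lists and that instructions left over in a queue are kept and merged with the incoming ones; the text does not say whether merging or overwriting is intended, and `shift_queues` is a hypothetical name.

```python
def shift_queues(queues):
    """Move the contents of candidate queue i+1 into queue i, in order
    starting from queue 2, leaving queue n+1 empty.

    queues[0] is candidate queue 1 of the text. Leftover instructions in
    a queue are kept, which is an assumption of this sketch.
    """
    for i in range(len(queues) - 1):
        queues[i].extend(queues[i + 1])
        queues[i + 1] = []
    return queues
```

One consequence of the shift is that an instruction freed by slot q of the previous cycle first becomes eligible at slot q-1 of the current cycle, never earlier, which is consistent with the adjacent-cycle condition of the first aspect.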
With reference to the third or fourth possible implementation, in a fifth possible implementation,
having a true dependency on the instruction in the (q-1)-th instruction slot and satisfying the latency and resource requirements comprises:
having a one-to-one dependency on the instruction in the (q-1)-th instruction slot and satisfying the latency and resource requirements.
According to a second aspect, an instruction scheduling device is provided, comprising:
a construction unit, configured to construct a data dependency graph; and
a scheduling unit, configured to extract k instructions from the data dependency graph and schedule them to obtain the m VLIWs of each cycle, such that the VLIWs within the same cycle execute in parallel and, for two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW in the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW in the earlier cycle;
where 0 ≤ k ≤ m × n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
With reference to the second aspect, in a first possible implementation, the instruction scheduling device further comprises:
an execution unit, configured to execute the instructions in each VLIW in the order in which they are arranged in that VLIW.
With reference to the second aspect or the first possible implementation, in a second possible implementation,
the instruction scheduling device further comprises:
an establishing unit, configured to establish n+1 candidate instruction queues, namely the 1st through (n+1)-th candidate instruction queues; and
an initialization unit, configured to initialize the n+1 candidate instruction queues so that all of them are empty.
With reference to the second possible implementation, in a third possible implementation, the scheduling unit is specifically configured to:
when scheduling cycle 0,
extract the instructions whose in-degree is currently zero from the data dependency graph to obtain the 1st candidate instruction queue, where an instruction with an in-degree of zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the latency and resource requirements, place them into the 1st instruction slots of the VLIWs, one per VLIW, and place a no-op instruction into each 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract the instructions whose in-degree has newly become zero from the data dependency graph to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them into the q-th instruction slots of the VLIWs, one per VLIW, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the latency and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the latency and resource requirements, where 0 ≤ h ≤ m;
b. place a no-op instruction into each q-th instruction slot that remains unfilled;
c. delete the h instructions from all candidate instruction queues; and
d. set q = q + 1, extract the instructions whose in-degree has newly become zero from the data dependency graph to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
With reference to the third possible implementation, in a fourth possible implementation, the scheduling unit is specifically configured to:
when scheduling cycle P, where P is an integer greater than 0,
move, in order starting from the 2nd candidate instruction queue, the instructions in each of the 2nd through (n+1)-th candidate instruction queues into the preceding candidate instruction queue;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the latency and resource requirements, place them into the 1st instruction slots of the VLIWs, one per VLIW, and place a no-op instruction into each 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract the instructions whose in-degree has newly become zero from the data dependency graph to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them into the q-th instruction slots of the VLIWs, one per VLIW, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the latency and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the latency and resource requirements, where 0 ≤ h ≤ m;
b. place a no-op instruction into each q-th instruction slot that remains unfilled;
c. delete the h instructions from all candidate instruction queues; and
d. set q = q + 1, extract the instructions whose in-degree has newly become zero from the data dependency graph to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
With reference to the third or fourth possible implementation, in a fifth possible implementation,
having a true dependency on the instruction in the (q-1)-th instruction slot and satisfying the latency and resource requirements comprises:
having a one-to-one dependency on the instruction in the (q-1)-th instruction slot and satisfying the latency and resource requirements.
The embodiments of the present invention provide an instruction scheduling method and device, comprising: constructing a data dependency graph; and extracting k instructions from the data dependency graph and scheduling them to obtain the m VLIWs of each cycle, such that the VLIWs within the same cycle execute in parallel and, for two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW in the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW in the earlier cycle, where 0 ≤ k ≤ m × n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1. Because these two conditions hold, when the instructions are executed on a multi-core processor with serial functional units, instructions that depend on each other do not execute in the same cycle, and no instruction executes before the instruction it depends on. The processor or pipeline can therefore operate normally, and the correctness of the scheduling is improved.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an instruction scheduling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data dependency graph according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of another instruction scheduling method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another data dependency graph according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of instruction issue and execution according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an instruction scheduling device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another instruction scheduling device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of still another instruction scheduling device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of yet another instruction scheduling device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides an instruction scheduling method applied to an instruction scheduling device. As shown in FIG. 1, the method includes:
Step 101: Construct a data dependency graph.
In this embodiment, the data dependency graph may be a DAG (directed acyclic graph). It is constructed in the same way as in the prior art, which is not described again here.
Step 102: Extract k instructions from the data dependency graph and schedule them to obtain the m VLIWs of each cycle, such that the VLIWs within the same cycle execute in parallel and, for two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW in the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW in the earlier cycle.
The dependencies may include flow dependencies, anti-dependencies, and output dependencies. A flow dependency is also called a true dependency, and true dependencies comprise one-to-one, many-to-one, one-to-many, and many-to-many dependencies. A one-to-one dependency holds between two ordered instructions when the result of the earlier instruction is used only by the later instruction and a given operand of the later instruction is definitely defined by the earlier instruction. A many-to-one dependency holds among several ordered instructions when the results of several earlier instructions are used only by one later instruction and a given operand of that later instruction is definitely defined by those earlier instructions. A one-to-many dependency holds among several ordered instructions when the result of one earlier instruction is used by several later instructions and a given operand of each of those later instructions is definitely defined by that earlier instruction. A many-to-many dependency holds among several ordered instructions when the results of several earlier instructions are used by several later instructions and a given operand of those later instructions is defined by the several earlier instructions.
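The four kinds of true dependency can be told apart by counting, for each producer, how many instructions use its result and, for each consumer operand, how many instructions may define it. The sketch below is one possible reading of the definitions: the triple format `(producer, consumer, operand)` and the function name `classify_true_dependencies` are assumptions made for illustration.

```python
from collections import defaultdict

def classify_true_dependencies(edges):
    """Label each true dependency as one-to-one, many-to-one,
    one-to-many, or many-to-many.

    edges: list of (producer, consumer, operand) triples, one per
    true dependency through the named operand of the consumer.
    """
    users = defaultdict(set)      # producer -> instructions using its result
    definers = defaultdict(set)   # (consumer, operand) -> possible producers
    for producer, consumer, operand in edges:
        users[producer].add(consumer)
        definers[(consumer, operand)].add(producer)

    kinds = {}
    for producer, consumer, operand in edges:
        left = "many" if len(definers[(consumer, operand)]) > 1 else "one"
        right = "many" if len(users[producer]) > 1 else "one"
        kinds[(producer, consumer, operand)] = f"{left}-to-{right}"
    return kinds
```

A scheduler following the fifth possible implementation would, under this reading, give slot-to-slot preference only to edges labeled "one-to-one".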
Here, 0 ≤ k ≤ m × n, where n is the number of instruction slots in one very long instruction word and is an integer greater than or equal to 1, m is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
It should be noted that the instruction scheduling apparatus may be a compiler, and the instruction scheduling method is suitable for instruction scheduling in a compiler for a processor with serial functional units. The apparatus schedules instructions beat by beat. Each beat contains m very long instruction words, that is, the issue width of the apparatus is m, and each very long instruction word contains n instruction slots, that is, it can hold n instructions. The very long instruction word in this embodiment is a VLIW (Very Long Instruction Word), an architecture that exploits instruction-level parallelism.
In this way, the very long instruction words within the same beat execute in parallel, and for any two adjacent beats no dependency exists between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat. Therefore, when instructions are executed on a multi-core processor with serial functional units, instructions that depend on each other are never executed in the same beat, and an instruction that depends on a given instruction is never executed before that instruction. The processor or pipeline can thus run normally, and the correctness of scheduling is improved.
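The adjacent-beat constraint can be checked mechanically. The following is a minimal Python sketch and is not part of the patent text: the function name, the list-of-lists schedule layout, the instruction names, and the use of None for an empty slot are all illustrative assumptions.

```python
def violates_adjacent_beat_rule(prev_beat, next_beat, depends_on):
    """Return True if an instruction in slot t of the later beat depends on
    an instruction in slot t+1 of the earlier beat.

    prev_beat, next_beat: lists of VLIWs; each VLIW is a list of n slot
        entries (an instruction name, or None for an empty slot / nop).
    depends_on: set of (consumer, producer) pairs (hypothetical encoding).
    """
    for later_word in next_beat:
        for earlier_word in prev_beat:
            # 0-based index t covers the 1-based slots 1 .. n-1.
            for t in range(len(later_word) - 1):
                consumer = later_word[t]
                producer = earlier_word[t + 1]
                if consumer and producer and (consumer, producer) in depends_on:
                    return True
    return False


deps = {("u", "p3")}                     # hypothetical: u depends on p3
earlier = [["p1", "p2", "p3", "p4"]]     # m = 1, n = 4; p3 sits in slot 3
bad = [[None, "u", None, None]]          # u in slot 2 depends on slot 3
ok = [[None, None, "u", None]]           # u in slot 3 is compared with slot 4
print(violates_adjacent_beat_rule(earlier, bad, deps))  # True
print(violates_adjacent_beat_rule(earlier, ok, deps))   # False
```

The check only covers the cross-beat rule; time delay and resource requirements would need separate checks.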
In particular, in the instruction scheduling method provided by this embodiment of the present invention, to ensure that the very long instruction words within the same beat execute in parallel and that, for any two adjacent beats, no dependency exists between the instruction in the t-th instruction slot of any very long instruction word of the later beat and the instruction in the (t+1)-th instruction slot of any very long instruction word of the earlier beat, instruction scheduling may be implemented by establishing multiple candidate instruction queues. For example, after the data dependency graph is constructed, n+1 candidate instruction queues may be established, namely the 1st to (n+1)-th candidate instruction queues. The n+1 candidate instruction queues are then initialized so that all of them are empty.
An instruction with an in-degree of zero is an instruction that has no predecessor node in the data dependency graph or whose predecessor nodes have all been scheduled. In this embodiment, a scheduled instruction is an instruction that has already been placed in an instruction slot of a very long instruction word. For example, the predecessor nodes of instruction a are the nodes at the tail ends of all directed arrows that point to instruction a in the data dependency graph. Instruction a having an in-degree of zero means that instruction a has no predecessor node in the data dependency graph or that its predecessor nodes have all been scheduled. As shown in FIG. 2, in this embodiment the data dependency graph is a directed acyclic graph consisting of a set of nodes and directed edges connecting the nodes. In the data dependency graph of this instruction scheduling method, each node may represent a machine instruction, and each directed edge represents a dependency between instructions. The dependencies include flow dependencies, anti-dependencies, and output dependencies, and a flow dependency is also called a true dependency.
Each edge is labeled with weight information describing the dependency, namely a delay, which is the time that must elapse between the issue of the earlier instruction and the issue of the later instruction. The 1 shown in FIG. 2 indicates that at least 1 clock cycle must elapse between the issue of instruction a1 and the issue of instruction a2. Similarly, the 2 in FIG. 2 indicates that at least 2 clock cycles must elapse between the issue of instruction a0 and the issue of instruction a2, and the 3 in FIG. 2 indicates that at least 3 clock cycles must elapse between the issue of instruction a2 and the issue of instruction a3. When the directed edges are drawn as arrows, each arrow represents a dependency between instructions and points from the predecessor instruction to the successor instruction, that is, execution of the successor instruction depends on the predecessor instruction. For example, a0 is a predecessor of a2, and a2 is a successor of a0.
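As an illustration of the graph just described, the following Python sketch encodes the three FIG. 2 edges named in the text (a0 to a2 with delay 2, a1 to a2 with delay 1, a2 to a3 with delay 3) and finds the instructions whose in-degree is zero. The dictionary layout and the function name are assumptions made for this sketch, not something the patent prescribes.

```python
# (predecessor, successor) -> delay in clock cycles, as described for FIG. 2.
FIG2_EDGES = {("a0", "a2"): 2, ("a1", "a2"): 1, ("a2", "a3"): 3}
FIG2_NODES = ["a0", "a1", "a2", "a3"]


def zero_in_degree(nodes, edges, scheduled):
    """Unscheduled nodes with no predecessor, or whose predecessors are
    all already scheduled (the patent's definition of in-degree zero)."""
    result = []
    for node in nodes:
        if node in scheduled:
            continue
        predecessors = [p for (p, s) in edges if s == node]
        if all(p in scheduled for p in predecessors):
            result.append(node)
    return result


print(zero_in_degree(FIG2_NODES, FIG2_EDGES, set()))          # ['a0', 'a1']
print(zero_in_degree(FIG2_NODES, FIG2_EDGES, {"a0"}))         # ['a1']
print(zero_in_degree(FIG2_NODES, FIG2_EDGES, {"a0", "a1"}))   # ['a2']
```

Note that a2 becomes a candidate only after both a0 and a1 have been scheduled.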
When scheduling the 0th beat:
Extract the instructions whose current in-degree is zero from the data dependency graph to obtain the 1st candidate instruction queue, where an instruction with an in-degree of zero has no predecessor node in the data dependency graph or all of its predecessor nodes have been scheduled;
Extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, and place them respectively in the 1st instruction slot of each very long instruction word; place a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
Delete the h instructions from the 1st candidate instruction queue;
It should be noted that, because the h instructions were scheduled during the scheduling of the 0th beat and deleted from the 1st candidate instruction queue, the h instructions are now scheduled instructions. As a result, the data dependency graph now contains instructions whose in-degree became zero once the h instructions were scheduled, that is, newly zero-in-degree instructions. These instructions are not in the 1st candidate instruction queue, so they can be extracted from the data dependency graph to obtain the 2nd candidate instruction queue.
Perform the following steps, with q initialized to 2:
a1. Extract h instructions from the 1st through q-th candidate instruction queues and place them respectively in the q-th instruction slot of each very long instruction word, where the h instructions either have a true dependency on the instruction in the (q-1)-th instruction slot and satisfy the time delay and resource requirements, or have no true dependency on the instruction in the (q-1)-th instruction slot but have the highest priority and satisfy the time delay and resource requirements, with 0 ≤ h ≤ m;
b1. Place a no-operation instruction in every q-th instruction slot of the very long instruction words that remains unfilled;
c1. Delete the h instructions from all candidate instruction queues;
d1. Set q = q+1 and extract the newly zero-in-degree instructions from the data dependency graph to obtain the q-th candidate instruction queue. Repeat steps a1 to d1 until no unscheduled instruction remains in the 1st through q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated. Here, the instructions in the (n+1)-th candidate instruction queue being updated means that, when q = n+1, the newly zero-in-degree instructions have been extracted from the data dependency graph to obtain the (n+1)-th candidate instruction queue, at which point the scheduling of the beat ends.
It should be noted that, because step a1 schedules the h instructions and step c1 deletes them from all candidate instruction queues, the h instructions are now scheduled instructions. As a result, the data dependency graph now contains instructions whose in-degree became zero once the h instructions were scheduled, that is, newly zero-in-degree instructions. These instructions are not in the q-th candidate instruction queue. Therefore, step d1 sets q = q+1 and extracts the newly zero-in-degree instructions from the data dependency graph to obtain the q-th candidate instruction queue.
When scheduling the P-th beat, where P is an integer greater than 0:
Starting from the 2nd candidate instruction queue, move the instructions in the 2nd through (n+1)-th candidate instruction queues, in order, into the preceding candidate instruction queue;
Extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time delay and resource requirements, and place them respectively in the 1st instruction slot of each very long instruction word; place a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
Delete the h instructions from the 1st candidate instruction queue;
It should be noted that, as in the scheduling of the 0th beat, the h instructions have been scheduled and deleted from the 1st candidate instruction queue, so the h instructions are now scheduled instructions. As a result, the data dependency graph now contains instructions whose in-degree became zero once the h instructions were scheduled, that is, newly zero-in-degree instructions. These instructions are not in the 1st candidate instruction queue, so they can be extracted from the data dependency graph to obtain the 2nd candidate instruction queue.
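The queue move performed at the start of every beat after the 0th can be sketched in a few lines of Python. This is only an illustration: the function name and the list-of-lists representation are assumptions, and index i of the list stands for the (i+1)-th candidate instruction queue.

```python
def shift_queues(queues):
    """Move the contents of queues 2 .. n+1 into the preceding queue.

    queues[i] represents the (i+1)-th candidate instruction queue.
    The list is modified in place and also returned for convenience.
    """
    for i in range(1, len(queues)):
        queues[i - 1].extend(queues[i])  # append to whatever is already there
        queues[i] = []                   # the source queue becomes empty
    return queues


# Hypothetical state of five queues (n = 4) at the end of one beat.
end_of_beat = [[], [], [], ["x"], ["y"]]
print(shift_queues(end_of_beat))  # [[], [], ['x'], ['y'], []]
```

Processing the queues in ascending order moves each instruction exactly one queue forward, because a queue is emptied before the next one is appended to it.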
Perform the following steps, with q initialized to 2:
a2. Extract h instructions from the 1st through q-th candidate instruction queues and place them respectively in the q-th instruction slot of each very long instruction word, where the h instructions either have a true dependency on the instruction in the (q-1)-th instruction slot and satisfy the time delay and resource requirements, or have no true dependency on the instruction in the (q-1)-th instruction slot but have the highest priority and satisfy the time delay and resource requirements, with 0 ≤ h ≤ m;
b2. Place a no-operation instruction in every q-th instruction slot of the very long instruction words that remains unfilled;
c2. Delete the h instructions from all candidate instruction queues;
d2. Set q = q+1 and extract the newly zero-in-degree instructions from the data dependency graph to obtain the q-th candidate instruction queue. Repeat steps a2 to d2 until no unscheduled instruction remains in the 1st through q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated. Here, the instructions in the (n+1)-th candidate instruction queue being updated means that, when q = n+1, the newly zero-in-degree instructions have been extracted from the data dependency graph to obtain the (n+1)-th candidate instruction queue.
It should be noted that, because step a2 schedules the h instructions and step c2 deletes them from all candidate instruction queues, the h instructions are now scheduled instructions. As a result, the data dependency graph now contains instructions whose in-degree became zero once the h instructions were scheduled, that is, newly zero-in-degree instructions. These instructions are not in the q-th candidate instruction queue. Therefore, step d2 sets q = q+1 and extracts the newly zero-in-degree instructions from the data dependency graph to obtain the q-th candidate instruction queue.
In particular, when instructions are scheduled in steps a1 and a2 under the condition that the h instructions have a one-to-one true dependency on the instruction in the (q-1)-th instruction slot and also satisfy the time delay and resource requirements, the register that would otherwise hold the result of the instruction in the preceding slot can be saved, which saves hardware resources and improves performance.
The priority of each instruction in the data dependency graph may be computed according to heuristic rules. The heuristic rules may include the maximum distance of the instruction, the execution latency of the instruction, the earliest start time of the instruction, the latest start time of the instruction, whether the instruction lies on the critical path, and so on. Different compilers may choose different heuristic rules.
True dependencies include one-to-one, many-to-one, one-to-many, and many-to-many dependencies. A one-to-one dependency holds between two ordered instructions when the result of the earlier instruction is used only by the later instruction, and one operand of the later instruction is defined by the earlier instruction. For example, as shown in FIG. 2, instructions a2 and a3 satisfy a one-to-one dependency: the result of a2 is used only by a3, and one operand of a3 is defined by a2. Instructions a0 and a2 satisfy a one-to-one dependency: the result of a0 is used only by a2, and one operand of a2 is defined by a0. Instructions a1 and a2 satisfy a one-to-one dependency: the result of a1 is used only by a2, and another operand of a2 is defined by a1. It should be noted that, in this embodiment of the present invention, when instructions are scheduled, the condition that the h instructions have a true dependency on the instruction in the (q-1)-th instruction slot and satisfy the time delay and resource requirements includes the case in which they have a one-to-one dependency on the instruction in the (q-1)-th instruction slot and satisfy the time delay and resource requirements.
In this way, when several instructions satisfy a true dependency, the instruction that has a one-to-one dependency on the instruction in the preceding slot can be scheduled first. This saves a register that would otherwise hold the result of the instruction in the preceding slot and simplifies the instruction scheduling process.
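The one-to-one test and the resulting preference can be sketched as follows. This Python fragment is an assumption-laden illustration: it represents true dependencies only as (producer, consumer) edges taken from the FIG. 2 description, so "one-to-one" reduces to "the producer has exactly one consumer", and both function names are hypothetical.

```python
# True-dependency edges (producer, consumer) as described for FIG. 2.
TRUE_EDGES = {("a0", "a2"), ("a1", "a2"), ("a2", "a3")}


def is_one_to_one(producer, consumer, true_edges):
    """True if the producer's result is used by this consumer and no other."""
    consumers = {c for (p, c) in true_edges if p == producer}
    return consumers == {consumer}


def prefer_one_to_one(candidates, prev_instr, true_edges):
    """Order candidates so that those with a one-to-one dependency on the
    instruction in the preceding slot come first (stable otherwise)."""
    return sorted(
        candidates,
        key=lambda c: not is_one_to_one(prev_instr, c, true_edges),
    )


print(is_one_to_one("a2", "a3", TRUE_EDGES))               # True
print(is_one_to_one("a0", "a2", TRUE_EDGES))               # True
print(prefer_one_to_one(["x", "a3"], "a2", TRUE_EDGES))    # ['a3', 'x']
```

A real compiler would work from operand-level def-use information rather than bare edges; the edge-level test is used here only to keep the sketch short.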
In particular, each time an instruction is scheduled, the resources available to the instruction scheduling apparatus change. The available resources include the functional units that execute instructions in the CPU, registers, the instruction window, and so on. Before scheduling each instruction, the scheduler needs to query the resource usage table to determine a suitable next instruction to schedule. The resource usage table contains the resources currently available on the machine, changes in real time, and reflects the time at which each resource is released. Therefore, when performing step 102, the instruction scheduling apparatus must determine not only whether the delay between two dependent instructions is satisfied, but also whether the resources currently provided by the CPU satisfy the resource requirements of each instruction being scheduled.
After step 102, the method further includes:
Executing the instructions in the very long instruction word in the order in which they are arranged in the very long instruction word.
An embodiment of the present invention provides another instruction scheduling method, applied to an instruction scheduling apparatus. Assume that the apparatus schedules instructions beat by beat, that each beat contains 1 or 2 very long instruction words, that is, m=1 or m=2, and that each very long instruction word contains 4 instruction slots. As shown in FIG. 3, the method includes:
Step 301: Construct a data dependency graph.
The instructions in the data dependency graph are b0, b1, b2, c0, c1, c2, c3, and c4. Assume that the data dependency graph constructed from the dependencies among these instructions is as shown in FIG. 4.
Step 302: Compute the priority of every instruction in the data dependency graph.
As shown in FIG. 4, assume that the priority of each instruction is computed from the data dependency graph shown in FIG. 4 and the delays between instructions. Assuming that instruction c4 takes 1 clock cycle to complete, the priorities of the instructions may be:
P(c4) = 1;
P(c3) = 2 + P(c4) = 3;
P(b2) = 2 + P(c4) = 3;
P(c2) = 3 + P(c3) = 6;
P(b1) = 1 + P(b2) = 4;
P(c0) = 1 + P(c2) = 7;
P(c1) = 1 + P(c2) = 7;
P(b0) = 1 + P(b1) = 5.
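The priority values listed above are consistent with a longest-path heuristic: the priority of an instruction is the largest "delay plus successor priority" over its outgoing edges, with the final instruction c4 assigned its 1-cycle latency. The Python sketch below recomputes them. The edge list is an assumption reconstructed from the formulas and from the walkthrough (FIG. 4 itself is not available here), and the b0 to b1 edge with delay 1 in particular is inferred from the stated priority 5 of b0.

```python
from functools import lru_cache

# (producer, consumer) -> delay; reconstructed, not copied from FIG. 4.
EDGES = {
    ("b0", "b1"): 1, ("b1", "b2"): 1, ("b2", "c4"): 2,
    ("c0", "c2"): 1, ("c1", "c2"): 1, ("c2", "c3"): 3, ("c3", "c4"): 2,
}


def priorities(edges, sink_latency=1):
    """Longest-path priority: max over successors of delay + P(successor)."""
    successors = {}
    nodes = set()
    for (producer, consumer), delay in edges.items():
        successors.setdefault(producer, []).append((consumer, delay))
        nodes.update((producer, consumer))

    @lru_cache(maxsize=None)
    def prio(node):
        outgoing = successors.get(node, [])
        if not outgoing:            # a sink such as c4
            return sink_latency
        return max(delay + prio(succ) for succ, delay in outgoing)

    return {node: prio(node) for node in sorted(nodes)}


print(priorities(EDGES))
# {'b0': 5, 'b1': 4, 'b2': 3, 'c0': 7, 'c1': 7, 'c2': 6, 'c3': 3, 'c4': 1}
```

Other heuristics mentioned in the text, such as earliest or latest start time, would give different orderings; this sketch shows only the one that matches the listed numbers.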
It should be noted that P denotes the priority of an instruction.
Step 303: Establish 5 candidate instruction queues.
The 5 candidate instruction queues are the 1st to 5th candidate instruction queues.
Step 304: Initialize the 5 candidate instruction queues so that all of them are empty.
Step 305: Using the 5 candidate instruction queues, schedule instructions according to the data dependency graph. For simplicity of description, this embodiment of the present invention assumes that all instructions satisfy the resource requirements.
When m=1, that is, when the instruction scheduling apparatus schedules instructions beat by beat, each beat contains 1 very long instruction word, and the issue width of the apparatus is 1, the specific steps are as follows:
As shown in Table 1, when scheduling the 0th beat, the candidate instructions are b0, b1, b2, c0, c1, c2, c3, and c4. From the data dependency graph, the instructions whose current in-degree is zero are b0, c0, and c1, so the 1st candidate instruction queue contains b0, c0, and c1, with priorities 5, 7, and 7 respectively. That is, the 1st candidate instruction queue is {c0, c1, b0}, and the 2nd, 3rd, 4th, and 5th candidate instruction queues are all empty.
For the 1st instruction slot, an instruction is selected from the 1st candidate instruction queue by priority. Either c0 or c1 may be scheduled, and both satisfy the time delay requirement. It should be noted that when the instruction scheduling apparatus is a specific compiler, it may consider the characteristics of the functional units required by the different instructions, or other factors, to further decide the priority between c0 and c1. This embodiment assumes that c0 is scheduled here. After c0 is scheduled, it is deleted from all candidate instruction queues. Checking the data dependency graph, the successor of c0 is c2; because c2 also depends on c1 and c1 has not yet been scheduled, c2 cannot yet be added to a candidate queue. The candidate instruction queues are now: {c1, b0}, empty, empty, empty, empty.
For the 2nd instruction slot, an instruction is selected from the 1st and 2nd candidate instruction queues. By priority, c1 is scheduled first; no unscheduled instruction has the same priority as c1, and c1 satisfies the time delay requirement, so it is placed in the 2nd instruction slot. After c1 is scheduled, it is deleted from all candidate instruction queues. Checking the data dependency graph, the successor of c1 is c2; because c0 and c1, on which c2 depends, have both been scheduled, c2 is added to the 3rd candidate instruction queue in preparation for scheduling the 3rd instruction slot. The candidate instruction queues are now: {b0}, empty, {c2}, empty, empty.
For the 3rd instruction slot, an instruction is selected from the 1st, 2nd, and 3rd candidate instruction queues. By priority, c2 is scheduled first, and no unscheduled instruction has the same priority as c2. Because c2 depends on c1 and c0, it must execute one beat after c0 and c1; placing c2 in the 3rd instruction slot meets this delay requirement, so c2 satisfies the time delay requirement and is placed in the 3rd instruction slot. After c2 is scheduled, it is deleted from all candidate instruction queues. Checking the data dependency graph, the predecessor of c3, which depends on c2, has now been scheduled, so c3 is added to the 4th candidate instruction queue. The candidate instruction queues are now: {b0}, empty, empty, {c3}, empty.
For the 4th instruction slot, an instruction is selected from the 1st, 2nd, 3rd, and 4th candidate instruction queues. By priority, b0 is scheduled first, and no unscheduled instruction has the same priority as b0. Because b0 does not depend on any other instruction, placing b0 in the 4th instruction slot meets the delay requirement, so b0 satisfies the time delay requirement and is placed in the 4th instruction slot. After b0 is scheduled, it is deleted from all candidate instruction queues. Checking the data dependency graph, the predecessor of b1, which depends on b0, has now been scheduled, so b1 is added to the 5th candidate instruction queue. The candidate instruction queues are now: empty, empty, empty, {c3}, {b1}.
When the 0th beat ends, the 1st, 2nd, 3rd, 4th, and 5th candidate instruction queues are: empty, empty, empty, {c3}, {b1}.
Table 1
[Table 1 is an image in the source (imgf000018_0001) and is not reproduced here.]
如表 2所示, 在进行第 1拍调度时, 当第 1拍开始时, 各候选指令队列之 间移动指令, 即从所述第 2候选指令队列开始, 依次将所述第 2候选指令队列 至所述第 n+1候选指令队列中的指令放入前一个候选指令队列中, 于是, 第 1, 2, 3, 4, 5候选指令队列依次为: 空, 空, {c3}, {bl} , 空。 As shown in Table 2, when the first beat schedule is performed, when the first beat starts, each candidate command queue is The inter-movement instruction, that is, starting from the second candidate instruction queue, sequentially placing the instruction in the second candidate instruction queue to the n+1th candidate instruction queue into the previous candidate instruction queue, and then first The 2, 3, 4, 5 candidate instruction queues are: empty, empty, {c3}, {bl}, null.
对于第 1个指令槽, 从第 1候选指令队列中按照优先级选择指令做调度, 由于第 1候选指令队列为空, 没有指令可供调度, 于是填入空操作 (nop ) , 候 选指令队列不需要更新。  For the first instruction slot, scheduling is performed according to the priority selection instruction from the first candidate instruction queue. Since the first candidate instruction queue is empty, no instruction is available for scheduling, and then the null operation (nop) is filled in, and the candidate instruction queue is not need to be updated.
对于第 2个指令槽, 从第 1, 第 2候选指令队列中按照优先级选择指令做 调度, 由于第 1, 第 2候选指令队列为空, 没有指令可供调度, 于是填入空操 作, 候选指令队列不需要更新。  For the second instruction slot, scheduling is performed according to the priority selection instruction from the first and second candidate instruction queues. Since the first and second candidate instruction queues are empty, no instruction is available for scheduling, and then a null operation is filled in. The instruction queue does not need to be updated.
For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. The first and second queues are empty, and the third queue contains c3. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the third instruction slot, c3 would be only 1 beat after c2, which is less than 3 beats and does not satisfy the delay. c3 therefore cannot be scheduled, a no-op is filled in, and the candidate instruction queues need no update.
For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues. The first and second queues are empty, the third queue contains c3, and the fourth queue contains b1. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the fourth instruction slot, c3 would be 2 beats after c2, which is less than 3 beats, so c3 does not satisfy the delay and cannot be scheduled here. b1 must execute at least one beat after b0, and it meets that requirement, so b1 is filled in. b1 is deleted from all candidate instruction queues. The data dependency graph is then checked: the predecessor of b2, which depends on b1, has now been scheduled, so b2 is added to the fifth candidate instruction queue. The candidate instruction queues are now, in order: empty, empty, {c3}, empty, {b2}.
At the end of beat 1, candidate instruction queues 1 to 5 are, in order: empty, empty, {c3}, empty, {b2}.
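The slot-by-slot procedure for beat 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: slot t of a word issued in beat k is assumed to execute at time k + t - 1, c2 and b0 are assumed to have executed at times 2 and 3 (values consistent with the intervals stated above), and both candidates are assumed to have priority 3. The names `fill_slot`, `exec_time`, and `done_at` are hypothetical, and the release of b2 into the fifth queue is omitted.

```python
NOP = "nop"

def exec_time(beat, slot):
    # Assumption: slot t (1-based) of a word issued in beat k runs at time k + t - 1.
    return beat + slot - 1

def fill_slot(queues, beat, slot, done_at, deps, priority):
    """Return the instruction placed in `slot` (1-based), or NOP if none is ready."""
    now = exec_time(beat, slot)
    # Only the first `slot` candidate queues are examined for this slot.
    candidates = [ins for q in queues[:slot] for ins in q]
    # Keep candidates whose predecessors all executed long enough ago.
    ready = [ins for ins in candidates
             if all(now - done_at[pred] >= delay for pred, delay in deps[ins])]
    if not ready:
        return NOP
    chosen = max(ready, key=lambda ins: priority[ins])
    for q in queues:              # delete the chosen instruction from every queue
        q.discard(chosen)
    done_at[chosen] = now
    return chosen

# Queue state at the start of beat 1 in the m = 1 walkthrough.
queues = [set(), set(), {"c3"}, {"b1"}, set()]
deps = {"c3": [("c2", 3)], "b1": [("b0", 1)]}   # (predecessor, minimum delay)
done_at = {"c2": 2, "b0": 3}                    # assumed execution times
priority = {"c3": 3, "b1": 3}                   # assumed priorities

beat1 = [fill_slot(queues, 1, s, done_at, deps, priority) for s in range(1, 5)]
print(beat1)  # ['nop', 'nop', 'nop', 'b1']
```

Restricting the search to `queues[:slot]` is what keeps b1 out of slot 3 even though its delay would already be met there.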
Table 2
1st candidate queue | 2nd candidate queue | 3rd candidate queue | 4th candidate queue | 5th candidate queue | Instruction slot
empty | empty | {c3} | {b1} | empty | 1: no-op
empty | empty | {c3} | {b1} | empty | 2: no-op
empty | empty | {c3} | {b1} | empty | 3: no-op
empty | empty | {c3} | {b1} | empty | 4: b1
As shown in Table 3, when beat 2 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. Candidate instruction queues 1 to 5 then become, in order: empty, {c3}, empty, {b2}, empty.
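The queue movement at the start of a beat can be sketched as below. The function name `shift_queues` is hypothetical, and keeping any instructions left over in the first queue (by merging them with those arriving from the second queue) is an assumption; the walkthrough never shows a non-empty first queue at the end of a beat.

```python
def shift_queues(queues):
    """Move the contents of queues 2..n+1 into the preceding queue."""
    # Assumption: leftovers in queue 1 stay there and merge with queue 2's contents.
    merged_first = queues[0] | queues[1]
    # Queues 3..n+1 each move forward one place; the last queue becomes empty.
    return [merged_first] + [set(q) for q in queues[2:]] + [set()]

end_of_beat_1 = [set(), set(), {"c3"}, set(), {"b2"}]
start_of_beat_2 = shift_queues(end_of_beat_1)
print(start_of_beat_2)  # [set(), {'c3'}, set(), {'b2'}, set()]
```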
For the first instruction slot, an instruction is selected by priority from the first candidate instruction queue. Because that queue is empty, there is no instruction to schedule, so a no-op is filled in and the candidate instruction queues need no update.
For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the second instruction slot, c3 would be 1 beat after c2, which is less than 3 beats, so c3 does not satisfy the delay and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the third instruction slot, c3 would be 2 beats after c2, which is less than 3 beats, so c3 does not satisfy the delay and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues. The second queue contains c3 and the fourth queue contains b2. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the fourth instruction slot, c3 is exactly 3 beats after c2, so c3 meets the delay requirement. Likewise, b2 cannot execute until at least one beat after b1 executes; placed in the fourth instruction slot, b2 is exactly 1 beat after b1, so b2 also meets the delay requirement. One instruction is then chosen by priority: b2 and c3 both have priority 3, so either may be placed in the fourth instruction slot. When priorities are equal, the scheduler considers whether either instruction has a one-to-one true dependency on the instruction in the preceding slot and, if so, schedules that instruction into the current slot first; because the preceding slot holds a no-op, no such true dependency exists here. Assume that b2 is scheduled. b2 is filled in and deleted from all candidate instruction queues. The data dependency graph is then checked: c4, which depends on b2, also depends on c3, and c3 has not yet been scheduled, so c4 cannot yet be added to the fifth candidate instruction queue. The candidate instruction queues are now, in order: empty, {c3}, empty, empty, empty.
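The selection rule applied to the fourth slot (highest priority first, then a preference for an instruction with a one-to-one true dependency on the preceding slot) might be expressed as in the sketch below. The function `choose` and the mapping `one_to_one_pred` are hypothetical names; the mapping records only the b1 to b2 relationship, which the description states for the m = 2 case.

```python
def choose(ready, priority, prev_slot_instr, one_to_one_pred):
    """Pick one instruction from `ready`, a list that already passed the delay check."""
    top = max(priority[ins] for ins in ready)
    best = [ins for ins in ready if priority[ins] == top]
    # Prefer an instruction whose one-to-one producer sits in the preceding slot.
    for ins in best:
        producer = one_to_one_pred.get(ins)
        if producer is not None and producer == prev_slot_instr:
            return ins
    # Otherwise any equal-priority candidate may be taken; take the first.
    return best[0]

priority = {"b2": 3, "c3": 3}
one_to_one_pred = {"b2": "b1"}

# Beat 2, slot 4: the preceding slot holds a no-op, so the choice is arbitrary.
print(choose(["b2", "c3"], priority, None, one_to_one_pred))   # b2
# Hypothetical variant: had b1 occupied the preceding slot, b2 would be preferred.
print(choose(["c3", "b2"], priority, "b1", one_to_one_pred))   # b2
```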
At the end of beat 2, candidate instruction queues 1 to 5 are, in order: empty, {c3}, empty, empty, empty.
Table 3 (reproduced in the original publication as image imgf000021_0001)
As shown in Table 4, when beat 3 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. Candidate instruction queues 1 to 5 then become, in order: {c3}, empty, empty, empty, empty.
For the first instruction slot, the first candidate instruction queue contains c3, and an instruction is selected by priority from that queue. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the first instruction slot, c3 would be 1 beat after c2, which is less than 3 beats, so c3 does not satisfy the delay and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the second instruction slot, c3 would be 2 beats after c2, which is less than 3 beats, so c3 does not meet the delay requirement and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependency, c3 cannot execute until three beats after c2 executes; placed in the third instruction slot, c3 is exactly 3 beats after c2, so c3 meets the delay requirement. c3 is scheduled and deleted from all candidate instruction queues. The data dependency graph is then checked: all predecessors of c4, which depends on c3, have now been scheduled, so c4 is added to the fourth candidate instruction queue. The candidate instruction queues are now, in order: empty, empty, empty, {c4}, empty.
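The way a successor is released only after all of its predecessors have been scheduled can be illustrated with the c4 case. The sketch below is illustrative only: `release_successors` is a hypothetical helper, the queues start empty for simplicity, and the queue shift between beats is ignored. Only the dependencies of c4 on b2 and c3, which the description states, are encoded.

```python
def release_successors(just_scheduled, slot, preds, scheduled, queues):
    """After placing an instruction in `slot` (1-based), queue newly ready successors."""
    scheduled.add(just_scheduled)
    released = []
    for ins, ps in preds.items():
        if (just_scheduled in ps and ins not in scheduled
                and all(p in scheduled for p in ps)):
            # 0-based index `slot` is candidate queue number slot + 1.
            queues[slot].add(ins)
            released.append(ins)
    return released

preds = {"c4": {"b2", "c3"}}        # c4 depends on both b2 and c3
scheduled = {"b1"}
queues = [set() for _ in range(5)]  # n + 1 = 5 candidate queues

after_b2 = release_successors("b2", 4, preds, scheduled, queues)  # beat 2, slot 4
after_c3 = release_successors("c3", 3, preds, scheduled, queues)  # beat 3, slot 3
print(after_b2, after_c3, queues[3])  # [] ['c4'] {'c4'}
```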
For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues. According to the dependencies, c4 cannot execute until two beats after b2 executes, nor until two beats after c3 executes; placed in the fourth instruction slot, c4 would be 1 beat after b2 and 1 beat after c3, so c4 does not meet the delay requirement and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
At the end of beat 3, candidate instruction queues 1 to 5 are, in order: empty, empty, empty, {c4}, empty.
Table 4
1st candidate queue | 2nd candidate queue | 3rd candidate queue | 4th candidate queue | 5th candidate queue | Instruction slot
{c3} | empty | empty | empty | empty | 1: no-op
{c3} | empty | empty | empty | empty | 2: no-op
{c3} | empty | empty | empty | empty | 3: c3
empty | empty | empty | {c4} | empty | 4: no-op
As shown in Table 5, when beat 4 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. Candidate instruction queues 1 to 5 then become, in order: empty, empty, {c4}, empty, empty.
For the first instruction slot, an instruction is selected by priority from the first candidate instruction queue. Because that queue is empty, there is no instruction to schedule, so a no-op is filled in and the candidate instruction queues need no update.
For the second instruction slot, an instruction is selected by priority from the first and second candidate instruction queues. Because both queues are empty, there is no instruction to schedule, so a no-op is filled in and the candidate instruction queues need no update.
For the third instruction slot, an instruction is selected by priority from the first, second, and third candidate instruction queues. According to the dependencies, c4 cannot execute until two beats after b2 executes, nor until two beats after c3 executes; placed in the third instruction slot, c4 would be 1 beat after b2 and 1 beat after c3, so c4 does not meet the delay requirement and cannot be scheduled here. A no-op is filled in, and the candidate instruction queues need no update.
For the fourth instruction slot, an instruction is selected by priority from the first, second, third, and fourth candidate instruction queues. The first and second queues are empty, and the third queue contains c4. According to the dependencies, c4 cannot execute until two beats after b2 executes, nor until two beats after c3 executes; placed in the fourth instruction slot, c4 is 2 beats after b2 and 2 beats after c3, so c4 satisfies the delay and is filled in. c4 is deleted from all candidate instruction queues. The data dependency graph is then checked: no unscheduled instructions remain, so no new instruction is added to the candidate instruction queues.
At the end of beat 4, candidate instruction queues 1 to 5 are, in order: empty, empty, empty, empty, empty. All instructions in the data dependency graph have been scheduled, and scheduling ends.
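The m = 1 result can be checked against the stated delays with a short sketch. It covers only the instructions placed in beats 1 to 4 above (b1, b2, c3, c4) and again assumes that slot t of a word issued in beat k executes at time k + t - 1. The names `placement`, `min_delay`, and `violations` are hypothetical.

```python
def exec_time(beat, slot):
    # Assumption: slot t (1-based) of a word issued in beat k runs at time k + t - 1.
    return beat + slot - 1

# (beat, slot) of each instruction placed in beats 1 to 4 of the m = 1 walkthrough.
placement = {"b1": (1, 4), "b2": (2, 4), "c3": (3, 3), "c4": (4, 4)}
# Minimum delay in beats between predecessor and successor, as stated in the text.
min_delay = {("b1", "b2"): 1, ("b2", "c4"): 2, ("c3", "c4"): 2}

def violations(placement, min_delay):
    """List every dependency whose delay is not met by the placement."""
    bad = []
    for (pred, succ), delay in min_delay.items():
        gap = exec_time(*placement[succ]) - exec_time(*placement[pred])
        if gap < delay:
            bad.append((pred, succ, gap, delay))
    return bad

print(violations(placement, min_delay))  # []

# Placing c4 one slot earlier (beat 4, slot 3) breaks both of its delays.
too_early = dict(placement, c4=(4, 3))
print(violations(too_early, min_delay))  # [('b2', 'c4', 1, 2), ('c3', 'c4', 1, 2)]
```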
Table 5 (reproduced in the original publication as image imgf000024_0001)
When m = 2, that is, when the instruction scheduling device schedules instructions beat by beat, each beat contains 2 very long instruction words, and the issue width of the instruction scheduling device is 2, the specific steps are as follows:
As shown in Table 6, when beat 0 is scheduled, the contents of candidate instruction queues 1 to 5 are, in order: {c0, c1, b0}, empty, empty, empty, empty.
For the first instruction slot of the 2 very long instruction words, instructions are selected by priority from the first candidate instruction queue. Either c0 or c1 may be scheduled, and both meet the delay requirement; assume that c0 and c1 are chosen and placed in the first instruction slot of the two very long instruction words respectively. After c0 and c1 are scheduled, they are deleted from all candidate instruction queues, and the data dependency graph is checked: c2, whose predecessors have all been scheduled, is added to the second candidate instruction queue in preparation for scheduling the second instruction slot. The candidate instruction queues are now, in order: {b0}, {c2}, empty, empty, empty.
For the second instruction slot of the 2 very long instruction words, instructions are selected from the first and second candidate instruction queues. By priority, c2 is scheduled first. One operand of c2 necessarily comes from the result of c0, which is a one-to-one dependency, and c2 meets the delay requirement, so c2 is scheduled into the second instruction slot of the first very long instruction word. b0 is scheduled next; it meets the delay requirement and is placed in the second instruction slot of the second very long instruction word. After c2 and b0 are scheduled, they are deleted from all candidate instruction queues, and the data dependency graph is checked: b1 and c3, whose predecessors have been scheduled, are added to the third candidate instruction queue in preparation for scheduling the third instruction slot. The candidate instruction queues are now, in order: empty, empty, {b1, c3}, empty, empty.
For the third instruction slot of the 2 very long instruction words, instructions are selected from the first, second, and third candidate instruction queues. By priority, b1 is scheduled first; it meets the delay requirement and could be placed in the third instruction slot of either very long instruction word in this beat. Because the operand of b1 necessarily comes from the result of b0, which is a one-to-one dependency, b1 is scheduled into the third instruction slot of the second very long instruction word, which saves one register that would otherwise hold the result of b0. c3 is considered next: c3 depends on c2 and must execute at least 3 beats after c2; the delay requirement is not met here, so c3 cannot be scheduled in this slot, and a no-op is filled into the third instruction slot of the first very long instruction word. After scheduling, b1 is deleted from all candidate instruction queues, and the data dependency graph is checked: b2, whose predecessor has been scheduled, is added to the fourth candidate instruction queue in preparation for scheduling the fourth instruction slot. The candidate instruction queues are now, in order: empty, empty, {c3}, {b2}, empty.
For the fourth instruction slot of the 2 very long instruction words, instructions are selected from the first, second, third, and fourth candidate instruction queues. By priority, b2 and c3 rank equally. c3 is considered first: it must execute at least 3 beats after c2, the delay requirement is not met here, and it cannot be placed in this slot. b2 is considered next: it meets the delay requirement, and because the operand of b2 necessarily comes from the result of b1, which is a one-to-one dependency, b2 is scheduled into the fourth instruction slot of the second very long instruction word, which saves one register that would otherwise hold the result of b1. A no-op is then filled into the fourth instruction slot of the first very long instruction word. After scheduling, b2 is deleted from all candidate instruction queues. The instructions not yet scheduled are c3 and c4: c3 is already in a candidate queue, and c4 cannot be added to a candidate queue because one of its predecessors, c3, has not yet been scheduled. The candidate instruction queues are therefore, in order: empty, empty, {c3}, empty, empty.
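When a beat holds several very long instruction words, the description places an instruction in the word whose preceding slot holds its one-to-one producer, so that the producer's result need not occupy a register. A possible sketch of that word choice follows; `pick_word`, `free_words`, and `one_to_one_pred` are hypothetical names, and the producer pairs listed are only those stated for beat 0.

```python
def pick_word(ins, prev_slot, free_words, one_to_one_pred):
    """Return the 0-based index of the word that receives `ins` in the current slot."""
    producer = one_to_one_pred.get(ins)
    for w in free_words:
        # Prefer the word whose preceding slot holds the one-to-one producer.
        if producer is not None and prev_slot[w] == producer:
            return w
    return free_words[0]  # no preference: take the first free word

one_to_one_pred = {"c2": "c0", "b1": "b0", "b2": "b1"}

# Slot 2: slot 1 holds c0 (word 0) and c1 (word 1).
print(pick_word("c2", ["c0", "c1"], [0, 1], one_to_one_pred))   # 0
print(pick_word("b0", ["c0", "c1"], [1], one_to_one_pred))      # 1
# Slot 3: slot 2 holds c2 (word 0) and b0 (word 1).
print(pick_word("b1", ["c2", "b0"], [0, 1], one_to_one_pred))   # 1
# Slot 4: slot 3 holds a no-op (word 0) and b1 (word 1).
print(pick_word("b2", ["nop", "b1"], [0, 1], one_to_one_pred))  # 1
```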
At the end of beat 0, candidate instruction queues 1 to 5 are, in order: empty, empty, {c3}, empty, empty.
Table 6 (reproduced in the original publication as image imgf000026_0001)
As shown in Table 7, when beat 1 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. The contents of candidate instruction queues 1 to 5 then become, in order: empty, {c3}, empty, empty, empty.
For the first instruction slot of the 2 very long instruction words, instructions are selected by priority from the first candidate instruction queue. That queue is empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, {c3}, empty, empty, empty.
For the second instruction slot of the 2 very long instruction words, instructions are selected from the first and second candidate instruction queues. c3 would be only one beat after c2 and does not meet the delay requirement, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, {c3}, empty, empty, empty.
For the third instruction slot of the 2 very long instruction words, instructions are selected from the first, second, and third candidate instruction queues. c3 is examined and does not meet the delay requirement (it would be two beats after c2), so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, {c3}, empty, empty, empty.
For the fourth instruction slot of the 2 very long instruction words, instructions are selected from the first, second, third, and fourth candidate instruction queues. c3 is examined and meets the delay requirement (it is three beats after c2), so c3 is scheduled into the fourth instruction slot of the first very long instruction word. A no-op is then filled into the fourth instruction slot of the second very long instruction word. c3 is deleted from all candidate queues, and the data dependency graph is checked: all predecessors of c4, the successor of c3, have now been scheduled, so c4 is added to the fifth candidate instruction queue. The candidate instruction queues are now, in order: empty, empty, empty, empty, {c4}. At the end of beat 1, candidate instruction queues 1 to 5 are, in order: empty, empty, empty, empty, {c4}.
Table 7 (reproduced in the original publication as image imgf000027_0001)
As shown in Table 8, when beat 2 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. The contents of candidate instruction queues 1 to 5 then become, in order: empty, empty, empty, {c4}, empty.
For the first instruction slot of the 2 very long instruction words, instructions are selected by priority from the first candidate instruction queue. That queue is empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, empty, {c4}, empty.
For the second instruction slot of the 2 very long instruction words, instructions are selected by priority from the first and second candidate instruction queues. Both queues are empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, empty, {c4}, empty.
For the third instruction slot of the 2 very long instruction words, instructions are selected by priority from the first, second, and third candidate instruction queues. All three queues are empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, empty, {c4}, empty.
For the fourth instruction slot of the 2 very long instruction words, instructions are selected from the first, second, third, and fourth candidate instruction queues. c4 would be one beat after c3 and two beats after b2, which does not meet the delay requirement, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, empty, {c4}, empty.
At the end of beat 2, candidate instruction queues 1 to 5 are, in order: empty, empty, empty, {c4}, empty.
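Table 8 (the first part of the table is reproduced in the original publication as image imgf000028_0001; the remaining rows are as follows)
Candidate instruction queues | First very long instruction word | Second very long instruction word | Instruction slot
1, 2, 3: empty; 4: {c4}; 5: empty | no-op | no-op | 3
1, 2, 3: empty; 4: {c4}; 5: empty | no-op | no-op | 4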
As shown in Table 9, when beat 3 is scheduled, instructions are moved between the candidate instruction queues at the start of the beat: starting from the second candidate instruction queue, the instructions in the second through (n+1)-th candidate instruction queues are each moved into the preceding candidate instruction queue. The contents of candidate instruction queues 1 to 5 then become, in order: empty, empty, {c4}, empty, empty.
For the first instruction slot of the 2 very long instruction words, instructions are selected by priority from the first candidate instruction queue. That queue is empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, {c4}, empty, empty.
For the second instruction slot of the 2 very long instruction words, instructions are selected by priority from the first and second candidate instruction queues. Both queues are empty, so no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, {c4}, empty, empty.
For the third instruction slot of the 2 very long instruction words, instructions are selected by priority from the first, second, and third candidate instruction queues. c4 would be one beat after c3 and two beats after b2, whereas c4 must execute at least 2 beats after c3 and at least 2 beats after b2; the delay requirement is therefore not met, and no-ops are filled in. The candidate instruction queues do not change and are, in order: empty, empty, {c4}, empty, empty.
For the fourth instruction slot of the 2 very long instruction words, instructions are selected from the first, second, third, and fourth candidate instruction queues. c4 is now two beats after c3 and meets the delay requirement, so it is filled into the fourth instruction slot of the first very long instruction word. The candidate queues are then empty, so a no-op is filled into the fourth instruction slot of the second very long instruction word.
At the end of beat 3, candidate instruction queues 1 to 5 are, in order: empty, empty, empty, empty, empty. At this point all instructions in the data dependency graph have been scheduled, and instruction scheduling ends.
Table 9 (reproduced in the original publication as image imgf000030_0001)
Step 306: Execute the instructions in each very long instruction word in the order in which they are arranged within that word.
For example, assume that 4 very long instruction words are issued per beat, i.e., m = 4, and that each very long instruction word has 4 instruction slots, i.e., n = 4.
Assume that the instruction sequence obtained by the above instruction scheduling method is:
Cycle 0: {a0, b0, c0, d0} {e0, f0, g0, h0} {i0, j0, k0, l0} {m0, n0, o0, p0};
Cycle 1: {a1, b1, c1, d1} {e1, f1, g1, h1} {i1, j1, k1, l1} {m1, n1, o1, p1};
Cycle 2: {a2, b2, c2, d2} {e2, f2, g2, h2} {i2, j2, k2, l2} {m2, n2, o2, p2};
Cycle 3: {a3, b3, c3, d3} {e3, f3, g3, h3} {i3, j3, k3, l3} {m3, n3, o3, p3}.
The instructions are then issued and executed as shown in FIG. 5, where an FU (function unit) is a functional unit that executes the instructions of a VLIW. In cycle 0, four VLIWs are issued and the instruction in the 1st instruction slot of each is executed, that is, a0, e0, i0 and m0 execute in parallel. In cycle 1, four more VLIWs are issued. The VLIWs issued in the previous cycle now execute the instructions in their 2nd instruction slots, that is, b0, f0, j0 and n0 execute in parallel, while the four VLIWs issued in the current cycle execute the instructions in their 1st instruction slots, that is, a1, e1, i1 and m1 execute in parallel. Because b0, f0, j0 and n0 also execute in parallel with a1, e1, i1 and m1, no dependency may exist among them. In other words, there is no dependency between the instruction in the t-th instruction slot of a VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of a VLIW of the earlier cycle. Cycles 2 and 3 follow the same principle as cycle 1 and are not described again here. Note that a group of serial FUs in FIG. 5 may consist of identical or different functional units; for example, one group of serial FUs may contain two adders, one multiplier and one memory access unit.
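The cross-cycle rule above can be checked mechanically. The following is a minimal sketch, assuming a schedule is stored as nested lists `schedule[cycle][word][slot]` with 0-based slots and that dependencies are given as (producer, consumer) pairs; the function name `cross_cycle_violations` and the sample dependencies are hypothetical and not taken from the patent.

```python
from typing import List, Set, Tuple

# schedule[cycle][word][slot] holds an instruction name; slots are 0-based here.
Schedule = List[List[List[str]]]


def cross_cycle_violations(schedule: Schedule,
                           deps: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (later, earlier) pairs that share an execution cycle yet are dependent.

    Slot t of a word issued in cycle p executes together with slot t+1 of
    every word issued in cycle p-1, so such pairs must be independent.
    """
    violations = []
    for p in range(1, len(schedule)):
        for cur in schedule[p]:
            for prev in schedule[p - 1]:
                for t in range(len(cur) - 1):
                    later, earlier = cur[t], prev[t + 1]
                    if (earlier, later) in deps or (later, earlier) in deps:
                        violations.append((later, earlier))
    return violations


# Cycles 0 and 1 of the m = 4, n = 4 example.
schedule = [
    [["a0", "b0", "c0", "d0"], ["e0", "f0", "g0", "h0"],
     ["i0", "j0", "k0", "l0"], ["m0", "n0", "o0", "p0"]],
    [["a1", "b1", "c1", "d1"], ["e1", "f1", "g1", "h1"],
     ["i1", "j1", "k1", "l1"], ["m1", "n1", "o1", "p1"]],
]

# Hypothetical dependency b0 -> a1: both run in cycle 1, so it is reported.
print(cross_cycle_violations(schedule, {("b0", "a1")}))  # [('a1', 'b0')]
# Hypothetical dependency a0 -> b0: consecutive slots of one word, so it is allowed.
print(cross_cycle_violations(schedule, {("a0", "b0")}))  # []
```

The sketch only covers the rule between adjacent cycles; the requirement that the VLIWs within one cycle can execute in parallel would need a separate check.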
With the instruction scheduling method provided by this embodiment of the present invention, the VLIWs within one cycle execute in parallel, and for any two adjacent cycles there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved. Again taking FIG. 4 as an example, assume that the instruction scheduling apparatus schedules instructions cycle by cycle, that each cycle contains one VLIW, that is, m = 1, and that each VLIW has four instruction slots. With the prior-art instruction scheduling method, the compiler generates the following instruction sequence:
Cycle 0: {b0, c0, c1, nop};
Cycle 1: {b1, c2, nop, nop};
Cycle 2: {b2, nop, nop, nop};
Cycle 3: {nop, nop, nop, nop};
Cycle 4: {c3, nop, nop, nop};
Cycle 5: {nop, nop, nop, nop};
Cycle 6: {c4, nop, nop, nop};
Cycle 2: {b2, c3, nop, nop};
The instructions are then issued and executed as follows: in cycle 0, b0 executes; in cycle 1, b1 and c0 execute in parallel; in cycle 2, b2, c2 and c1 execute in parallel. Here c2 and c1, which have a dependency, execute at the same time. This may cause an execution error or a pipeline stall and harms the performance or correctness of instruction execution. Note that nop instructions are also issued and executed: a nop has no operands, produces no result and performs no actual operation, but it still passes through the fetch, decode and execute stages of the processor.
With the instruction scheduling method provided by this embodiment of the present invention, the compiler generates the following instruction sequence:
Cycle 0: {c0, c1, c2, b0};
Cycle 1: {nop, nop, nop, b1};
Cycle 2: {nop, nop, nop, b2};
Cycle 3: {nop, nop, c3, nop};
Cycle 4: {nop, nop, nop, c4}.
The instructions are then issued and executed as follows: in cycle 0, c0 executes; in cycle 1, a nop and c1 execute; in cycle 2, two nops and c2 execute in parallel; in cycle 3, three nops and b0 execute in parallel; in cycle 4, three nops and b1 execute; in cycle 5, a nop, c3 and b2 execute; in cycle 6, two nops execute; in cycle 7, c4 executes. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved.
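The two execution traces for m = 1 can be reproduced with a short sketch. It assumes that slot s (0-based) of the VLIW issued in cycle p executes in cycle p + s, which is how the serial FUs are described above. Only the first three prior-art words are used, with cycle 2 taken to start with b2 as the prior-art trace states, and the single dependency c1 -> c2 is the one named in the text. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NOP = "nop"


def execution_timeline(words: List[List[str]]) -> Dict[int, List[str]]:
    """Map each cycle to the non-nop instructions executing in it (m = 1).

    words[p] is the VLIW issued in cycle p; its 0-based slot s runs in cycle p + s.
    """
    timeline = defaultdict(list)
    for p, word in enumerate(words):
        for s, instr in enumerate(word):
            if instr != NOP:
                timeline[p + s].append(instr)
    return dict(sorted(timeline.items()))


def same_cycle_hazards(timeline: Dict[int, List[str]],
                       deps: Set[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (cycle, producer, consumer) for dependent pairs that share a cycle."""
    return [(cycle, a, b)
            for cycle, instrs in timeline.items()
            for a, b in sorted(deps)
            if a in instrs and b in instrs]


prior_art = [["b0", "c0", "c1", NOP],
             ["b1", "c2", NOP, NOP],
             ["b2", NOP, NOP, NOP]]
proposed = [["c0", "c1", "c2", "b0"],
            [NOP, NOP, NOP, "b1"],
            [NOP, NOP, NOP, "b2"],
            [NOP, NOP, "c3", NOP],
            [NOP, NOP, NOP, "c4"]]
deps = {("c1", "c2")}  # the dependency named in the text

print(same_cycle_hazards(execution_timeline(prior_art), deps))  # [(2, 'c1', 'c2')]
print(same_cycle_hazards(execution_timeline(proposed), deps))   # []
```

Cycles in which only nops execute do not appear in the returned timeline, because nops are filtered out before grouping.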
An embodiment of the present invention provides an instruction scheduling apparatus 60 which, as shown in FIG. 6, includes:
a construction unit 601, configured to build a data dependency graph; and
a scheduling unit 602, configured to extract k instructions from the data dependency graph and schedule them to obtain the m VLIWs of each cycle, such that the VLIWs within one cycle execute in parallel and, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle,
where 0 ≤ k ≤ m×n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
In this way, the scheduling unit ensures that the VLIWs within one cycle execute in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved.
Further, as shown in FIG. 7, the instruction scheduling apparatus 60 may also include:
an execution unit 603, configured to execute the instructions in each VLIW in the order in which they are arranged in that word.
As shown in FIG. 8, the instruction scheduling apparatus 60 may also include:
an establishing unit 604, configured to establish n+1 candidate instruction queues, namely the 1st through (n+1)-th candidate instruction queues; and
an initialization unit 605, configured to initialize the n+1 candidate instruction queues so that all of them are empty.
The scheduling unit 602 is specifically configured to:
when scheduling cycle 0,
extract from the data dependency graph the instructions whose current in-degree is zero to obtain the 1st candidate instruction queue, where an instruction with in-degree zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time-delay and resource requirements, place them in the 1st instruction slots of the respective VLIWs, and place a nop instruction in every 1st instruction slot that is still unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them in the q-th instruction slots of the respective VLIWs, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time-delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time-delay and resource requirements, with 0 ≤ h ≤ m;
b. place a nop instruction in every q-th instruction slot that is still unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q+1, extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
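The repeated step "extract the instructions whose in-degree has newly become zero" can be sketched as follows. This is a minimal illustration, assuming the data dependency graph is given as a successor map in which every node appears as a key; the class `DependencyGraph`, its method names and the sample graph are hypothetical and do not reproduce FIG. 4.

```python
from typing import Dict, List, Set


class DependencyGraph:
    """Tiny data dependency graph that tracks the current in-degree of each node."""

    def __init__(self, successors: Dict[str, List[str]]):
        # Every node must appear as a key, even if it has no successors.
        self.successors = successors
        self.in_degree = {node: 0 for node in successors}
        for targets in successors.values():
            for target in targets:
                self.in_degree[target] += 1
        self.extracted: Set[str] = set()

    def extract_ready(self) -> List[str]:
        """Return instructions with in-degree zero that were not returned before."""
        ready = [node for node in self.successors
                 if self.in_degree[node] == 0 and node not in self.extracted]
        self.extracted.update(ready)
        return ready

    def mark_scheduled(self, instr: str) -> None:
        """Scheduling an instruction releases one incoming edge of each successor."""
        for succ in self.successors[instr]:
            self.in_degree[succ] -= 1


# Hypothetical graph: chains c0 -> c1 -> c2 and b0 -> b1.
graph = DependencyGraph({"c0": ["c1"], "c1": ["c2"], "c2": [],
                         "b0": ["b1"], "b1": []})
queue_1 = graph.extract_ready()   # ['c0', 'b0']
graph.mark_scheduled("c0")        # c0 placed in the 1st instruction slot
queue_2 = graph.extract_ready()   # ['c1']
print(queue_1, queue_2)
```

Each call to `extract_ready` returns only the newly released instructions, which matches how the 2nd and later candidate queues are filled after each slot is scheduled.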
when scheduling cycle P, where P is an integer greater than 0,
starting from the 2nd candidate instruction queue, move the instructions in each of the 2nd through (n+1)-th candidate instruction queues, in order, into the preceding candidate instruction queue;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time-delay and resource requirements, place them in the 1st instruction slots of the respective VLIWs, and place a nop instruction in every 1st instruction slot that is still unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them in the q-th instruction slots of the respective VLIWs, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time-delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time-delay and resource requirements, with 0 ≤ h ≤ m;
b. place a nop instruction in every q-th instruction slot that is still unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q+1, extract from the data dependency graph the instructions whose current in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
In particular, having a true dependency on the instruction in the (q-1)-th instruction slot while satisfying the time-delay and resource requirements includes: having a one-to-one dependency on the instruction in the (q-1)-th instruction slot while satisfying the time-delay and resource requirements.
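The selection rule of step a can be illustrated for a single VLIW (m = 1). The sketch below assumes that a candidate with a true dependency on the instruction in slot q-1 is tried first and that the highest-priority candidate is the fallback; the text does not state which branch takes precedence, so this ordering is an assumption. Time-delay and resource checks are omitted, and the function name and priority values are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def pick_for_slot(candidates: Iterable[str],
                  prev_instr: Optional[str],
                  deps: Set[Tuple[str, str]],
                  priority: Dict[str, int]) -> Optional[str]:
    """Choose the instruction for slot q from the ready candidates.

    deps holds (producer, consumer) pairs. Returns None when nothing is
    ready, in which case the caller would fill the slot with a nop.
    """
    pool = list(candidates)
    if not pool:
        return None
    # Assumed precedence: true dependents of the slot q-1 instruction come first.
    dependents = [c for c in pool if (prev_instr, c) in deps]
    if dependents:
        pool = dependents
    return max(pool, key=lambda c: priority[c])


deps = {("c0", "c1")}
priority = {"c1": 1, "b0": 5}
print(pick_for_slot(["c1", "b0"], "c0", deps, priority))  # c1
print(pick_for_slot(["c1", "b0"], "b9", deps, priority))  # b0
print(pick_for_slot([], "c0", deps, priority))            # None
```

Placing a dependent directly behind its producer keeps a chain inside one VLIW, where consecutive slots execute in consecutive cycles.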
In the instruction scheduling apparatus provided by this embodiment of the present invention, the scheduling unit ensures that the VLIWs within one cycle execute in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved.
An embodiment of the present invention provides an instruction scheduling apparatus which, as shown in FIG. 9, includes:
a processor 901, configured to build a data dependency graph;
the processor 901 being further configured to extract k instructions from the data dependency graph and schedule them to obtain the m VLIWs of each cycle, such that the VLIWs within one cycle execute in parallel and, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle,
where 0 ≤ k ≤ m×n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
In this way, the processor ensures that the VLIWs within one cycle execute in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved.
The processor 901 is further configured to:
execute the instructions in each VLIW in the order in which they are arranged in that word.
The processor 901 is further configured to, after building the data dependency graph:
establish n+1 candidate instruction queues, namely the 1st through (n+1)-th candidate instruction queues; and
initialize the n+1 candidate instruction queues so that all of them are empty.
When scheduling cycle 0, the processor 901 is configured to:
extract from the data dependency graph the instructions whose current in-degree is zero to obtain the 1st candidate instruction queue, where an instruction with in-degree zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time-delay and resource requirements, place them in the 1st instruction slots of the respective VLIWs, and place a nop instruction in every 1st instruction slot that is still unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract from the data dependency graph the instructions whose current in-degree is zero to obtain the 2nd candidate instruction queue; and perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them in the q-th instruction slots of the respective VLIWs, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time-delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time-delay and resource requirements, with 0 ≤ h ≤ m;
b. place a nop instruction in every q-th instruction slot that is still unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q+1, extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
When scheduling cycle P, where P is an integer greater than 0, the processor 901 is configured to:
starting from the 2nd candidate instruction queue, move the instructions in each of the 2nd through (n+1)-th candidate instruction queues, in order, into the preceding candidate instruction queue;
extract from the 1st candidate instruction queue the h instructions that have the highest priority and satisfy the time-delay and resource requirements, place them in the 1st instruction slots of the respective VLIWs, and place a nop instruction in every 1st instruction slot that is still unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st through q-th candidate instruction queues and place them in the q-th instruction slots of the respective VLIWs, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time-delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time-delay and resource requirements, with 0 ≤ h ≤ m;
b. place a nop instruction in every q-th instruction slot that is still unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q+1, extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until the 1st through q-th candidate instruction queues contain no unscheduled instruction or the instructions in the (n+1)-th candidate instruction queue have been updated.
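The n+1 candidate instruction queues and the shift performed at the start of cycle P can be sketched as follows. The text says only that the instructions of each queue are put into the preceding queue; this sketch assumes they are appended, so that instructions still waiting in an earlier queue are kept. The helper names `make_queues` and `shift_queues` are hypothetical.

```python
from collections import deque
from typing import Deque, List


def make_queues(n: int) -> List[Deque[str]]:
    """Create the n+1 candidate instruction queues, all initially empty."""
    return [deque() for _ in range(n + 1)]


def shift_queues(queues: List[Deque[str]]) -> None:
    """Move queue 2 into queue 1, queue 3 into queue 2, and so on, in order.

    queues[0] is the 1st candidate queue. Appending (rather than replacing)
    is an assumption that keeps leftover instructions of the earlier queue.
    """
    for i in range(1, len(queues)):
        queues[i - 1].extend(queues[i])
        queues[i].clear()


queues = make_queues(3)   # n = 3 instruction slots, so 4 queues
queues[0].append("x")     # left unscheduled in the previous cycle
queues[1].append("y")
queues[2].append("z")
shift_queues(queues)
print([list(q) for q in queues])  # [['x', 'y'], ['z'], [], []]
```

The shift reflects the timing of the serial slots: an instruction that became ready for slot q of one cycle can occupy slot q-1 of a VLIW issued in the next cycle, since the two slots execute at the same time.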
In particular, the processor 901 is specifically configured to extract instructions that have a one-to-one dependency on the instruction in the (q-1)-th instruction slot and that also satisfy the time-delay and resource requirements.
In the instruction scheduling apparatus provided by this embodiment of the present invention, the processor ensures that the VLIWs within one cycle execute in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle. Consequently, when the instructions are executed on a multi-core processor with serial functional units, dependent instructions never execute in the same cycle, and an instruction that depends on a given instruction never executes before it. The processor or pipeline can therefore run normally, and the correctness of scheduling is improved.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a combination of hardware and software functional units.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be defined by the claims.

Claims

1. An instruction scheduling method, applied to an instruction scheduling apparatus, comprising:
building a data dependency graph; and
extracting k instructions from the data dependency graph and scheduling them to obtain the m very long instruction words (VLIWs) of each cycle, such that the VLIWs within one cycle execute in parallel and, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any VLIW of the later cycle and the instruction in the (t+1)-th instruction slot of any VLIW of the earlier cycle;
wherein 0 ≤ k ≤ m×n, n is the number of instruction slots in one VLIW and is an integer greater than or equal to 1, m is the number of VLIWs in each cycle and is an integer greater than or equal to 1, and t is an integer greater than or equal to 1 and less than or equal to n-1.
2. The method according to claim 1, wherein after the extracting k instructions from the DAG data dependency graph and scheduling them to obtain the m VLIWs of each cycle, the method further comprises:
executing the instructions in each VLIW in the order in which they are arranged in that word.
3. The method according to claim 1 or 2, wherein
after the building a data dependency graph, the method further comprises:
establishing n+1 candidate instruction queues, namely the 1st through (n+1)-th candidate instruction queues; and
initializing the n+1 candidate instruction queues so that all of them are empty.
4. The method according to claim 3, wherein extracting k instructions from the data dependency graph and scheduling them to obtain the m very long instruction words of each cycle, such that the very long instruction words within a same cycle are executed in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word in the later cycle and the instruction in the (t+1)-th instruction slot of any very long instruction word in the earlier cycle, comprises:
when scheduling cycle 0,
extracting, from the data dependency graph, the instructions whose current in-degree is zero to obtain the 1st candidate instruction queue, where an instruction with an in-degree of zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extracting, from the 1st candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, placing them respectively in the 1st instruction slot of each very long instruction word, and placing a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
deleting the h instructions from the 1st candidate instruction queue;
extracting, from the data dependency graph, the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
performing the following steps, with q initialized to 2:
a. extracting h instructions from the 1st to q-th candidate instruction queues and placing them respectively in the q-th instruction slot of each very long instruction word, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time delay and resource requirements, and 0 ≤ h ≤ m;
b. placing a no-operation instruction in every q-th instruction slot that remains unfilled;
c. deleting the h instructions from all candidate instruction queues;
d. setting q = q + 1, extracting from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeating steps a to d until no unscheduled instruction remains in the 1st to q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated.
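The cycle-0 procedure of claim 4 can be sketched roughly as follows. This is a simplified illustration, not the claimed implementation. It assumes that the time delay and resource requirements are always satisfied, that `graph` maps each instruction to the set of instructions it truly depends on, that `priority` maps instructions to numbers, and that a candidate with a true dependency is placed in the same word as the instruction it depends on. All names (`schedule_cycle0`, `ready`, `NOP`) are hypothetical.

```python
NOP = "nop"  # assumed marker for a no-operation instruction


def ready(graph, scheduled, seen):
    """Instructions not yet queued whose predecessors have all been scheduled."""
    return [i for i, preds in graph.items()
            if i not in scheduled and i not in seen and preds <= scheduled]


def schedule_cycle0(graph, priority, m, n):
    """Fill m words of n slots for cycle 0; return (words, queues)."""
    queues = [[] for _ in range(n + 2)]    # queues[1..n+1]; index 0 is unused
    words = [[NOP] * n for _ in range(m)]  # unfilled slots stay as NOPs
    scheduled, seen = set(), set()

    def enqueue(q):
        # Newly zero-in-degree instructions form the q-th candidate queue.
        new = ready(graph, scheduled, seen)
        queues[q].extend(new)
        seen.update(new)

    def take(instr, w, slot):
        # Place instr in the given 1-indexed slot and delete it from all queues.
        words[w][slot - 1] = instr
        scheduled.add(instr)
        for queue in queues:
            if instr in queue:
                queue.remove(instr)

    def by_priority(i):
        return (-priority[i], i)  # highest priority first, name breaks ties

    enqueue(1)
    for w, instr in enumerate(sorted(queues[1], key=by_priority)[:m]):
        take(instr, w, 1)
    enqueue(2)
    q = 2
    while q <= n and any(queues[1:q + 1]):
        prev = [words[w][q - 2] for w in range(m)]  # slot q-1 of every word
        for w in range(m):
            pool = [i for queue in queues[1:q + 1] for i in queue]
            dependent = [i for i in pool if prev[w] in graph[i]]
            free = [i for i in pool if not (graph[i] & set(prev))]
            choice = sorted(dependent or free, key=by_priority)
            if choice:
                take(choice[0], w, q)
        q += 1
        enqueue(q)  # reaching queue n+1 ends the cycle
    return words, queues
```

For a chain a → b → c plus an independent instruction d, with one two-slot word per cycle, the sketch places a and b in the word, leaves d in the 1st queue, and leaves c in the 3rd, that is the (n+1)-th, queue for the next cycle.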
5. The method according to claim 4, wherein extracting k instructions from the data dependency graph and scheduling them to obtain the m very long instruction words of each cycle, such that the very long instruction words within a same cycle are executed in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word in the later cycle and the instruction in the (t+1)-th instruction slot of any very long instruction word in the earlier cycle, further comprises:
when scheduling cycle P, where P is an integer greater than 0,
starting from the 2nd candidate instruction queue, moving the instructions in each of the 2nd to (n+1)-th candidate instruction queues, in turn, into the preceding candidate instruction queue;
extracting, from the 1st candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, placing them respectively in the 1st instruction slot of each very long instruction word, and placing a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
deleting the h instructions from the 1st candidate instruction queue;
extracting, from the data dependency graph, the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
performing the following steps, with q initialized to 2:
a. extracting h instructions from the 1st to q-th candidate instruction queues and placing them respectively in the q-th instruction slot of each very long instruction word, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time delay and resource requirements, and 0 ≤ h ≤ m;
b. placing a no-operation instruction in every q-th instruction slot that remains unfilled;
c. deleting the h instructions from all candidate instruction queues;
d. setting q = q + 1, extracting from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeating steps a to d until no unscheduled instruction remains in the 1st to q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated.
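The queue shift that opens the scheduling of cycle P can be sketched as below. The list layout (index 0 unused, queues 1 to n+1) and the name `shift_queues` are assumptions made for illustration. The sketch also assumes that a queue is emptied once its contents have been moved and that instructions already waiting in the 1st queue stay there, neither of which the claim states explicitly.

```python
def shift_queues(queues):
    """Move the contents of queue j into queue j-1, for j = 2 .. n+1, in place.

    queues[0] is unused; queues[1] .. queues[n+1] are the candidate queues.
    """
    for j in range(2, len(queues)):
        queues[j - 1].extend(queues[j])  # append to the preceding queue
        queues[j].clear()                # assumed: the source queue is emptied
    return queues


# Example: leftovers after a cycle with n = 2 (three candidate queues).
leftover = [[], ["d"], [], ["c"]]
shift_queues(leftover)
# leftover is now [[], ["d"], ["c"], []]
```

After the shift, an instruction that became ready only after slot q of the previous cycle was filled sits in the q-th queue, so under these assumptions it is first considered for slot q of the new cycle.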
6. The method according to claim 4 or 5, wherein
having a true dependency on the instruction in the (q-1)-th instruction slot and satisfying the time delay and resource requirements comprises:
having a one-to-one dependency on the instruction in the (q-1)-th instruction slot and satisfying the time delay and resource requirements.
7. An instruction scheduling device, comprising:
a construction unit, configured to construct a data dependency graph; and
a scheduling unit, configured to extract k instructions from the data dependency graph and schedule them to obtain the m very long instruction words of each cycle, such that the very long instruction words within a same cycle are executed in parallel and that, for any two adjacent cycles, there is no dependency between the instruction in the t-th instruction slot of any very long instruction word in the later cycle and the instruction in the (t+1)-th instruction slot of any very long instruction word in the earlier cycle;
wherein 0 ≤ k ≤ m × n; n denotes the number of instruction slots in one very long instruction word and is an integer greater than or equal to 1; m denotes the number of very long instruction words in each cycle and is an integer greater than or equal to 1; and t is an integer greater than or equal to 1 and less than or equal to n-1.
8. The instruction scheduling device according to claim 7, further comprising:
an execution unit, configured to execute the instructions in each very long instruction word in the order in which they are arranged in that very long instruction word.
9. The instruction scheduling device according to claim 7 or 8, further comprising:
an establishment unit, configured to establish n+1 candidate instruction queues, namely the 1st to (n+1)-th candidate instruction queues; and
an initialization unit, configured to initialize the n+1 candidate instruction queues so that all of them are empty.
10. The instruction scheduling device according to claim 9, wherein the scheduling unit is specifically configured to:
when scheduling cycle 0,
extract, from the data dependency graph, the instructions whose current in-degree is zero to obtain the 1st candidate instruction queue, where an instruction with an in-degree of zero either has no predecessor node in the data dependency graph or has had all of its predecessor nodes scheduled;
extract, from the 1st candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the 1st instruction slot of each very long instruction word, and place a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract, from the data dependency graph, the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st to q-th candidate instruction queues and place them respectively in the q-th instruction slot of each very long instruction word, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time delay and resource requirements, and 0 ≤ h ≤ m;
b. place a no-operation instruction in every q-th instruction slot that remains unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q + 1, extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until no unscheduled instruction remains in the 1st to q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated.
11. The instruction scheduling device according to claim 10, wherein the scheduling unit is specifically configured to:
when scheduling cycle P, where P is an integer greater than 0,
starting from the 2nd candidate instruction queue, move the instructions in each of the 2nd to (n+1)-th candidate instruction queues, in turn, into the preceding candidate instruction queue;
extract, from the 1st candidate instruction queue, the h instructions that have the highest priority and satisfy the time delay and resource requirements, place them respectively in the 1st instruction slot of each very long instruction word, and place a no-operation instruction in every 1st instruction slot that remains unfilled, where 0 ≤ h ≤ m;
delete the h instructions from the 1st candidate instruction queue;
extract, from the data dependency graph, the instructions whose in-degree has newly become zero to obtain the 2nd candidate instruction queue; and
perform the following steps, with q initialized to 2:
a. extract h instructions from the 1st to q-th candidate instruction queues and place them respectively in the q-th instruction slot of each very long instruction word, where each of the h instructions either has a true dependency on the instruction in the (q-1)-th instruction slot and satisfies the time delay and resource requirements, or has no true dependency on the instruction in the (q-1)-th instruction slot but has the highest priority and satisfies the time delay and resource requirements, and 0 ≤ h ≤ m;
b. place a no-operation instruction in every q-th instruction slot that remains unfilled;
c. delete the h instructions from all candidate instruction queues;
d. set q = q + 1, extract from the data dependency graph the instructions whose in-degree has newly become zero to obtain the q-th candidate instruction queue, and repeat steps a to d until no unscheduled instruction remains in the 1st to q-th candidate instruction queues or the instructions in the (n+1)-th candidate instruction queue have been updated.
12. The instruction scheduling device according to claim 10 or 11, wherein
having a true dependency on the instruction in the (q-1)-th instruction slot and satisfying the time delay and resource requirements comprises:
having a one-to-one dependency on the instruction in the (q-1)-th instruction slot and satisfying the time delay and resource requirements.
PCT/CN2014/083603 2013-08-21 2014-08-04 Instruction scheduling method and device WO2015024432A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310367751.2A CN104424026B (en) 2013-08-21 2013-08-21 One kind instruction dispatching method and device
CN201310367751.2 2013-08-21

Publications (1)

Publication Number Publication Date
WO2015024432A1 (en) 2015-02-26

Family

ID=52483045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/083603 WO2015024432A1 (en) 2013-08-21 2014-08-04 Instruction scheduling method and device

Country Status (2)

Country Link
CN (1) CN104424026B (en)
WO (1) WO2015024432A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699466B (en) * 2015-03-26 2017-07-18 中国人民解放军国防科学技术大学 A kind of many meta-heuristics towards vliw architecture instruct system of selection
CN104699464B (en) * 2015-03-26 2017-12-26 中国人民解放军国防科学技术大学 A kind of instruction level parallelism dispatching method based on dependence grid
US11275590B2 (en) * 2015-08-26 2022-03-15 Huawei Technologies Co., Ltd. Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory
CN108228242B (en) * 2018-02-06 2020-02-07 江苏华存电子科技有限公司 Configurable and flexible instruction scheduler
CN112579272B (en) * 2020-12-07 2023-11-14 海光信息技术股份有限公司 Micro instruction distribution method, micro instruction distribution device, processor and electronic equipment
CN117827287A (en) * 2022-09-29 2024-04-05 深圳市中兴微电子技术有限公司 Instruction-level parallel scheduling method and device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070174599A1 (en) * 1997-08-01 2007-07-26 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems
US7114058B1 (en) * 2001-12-31 2006-09-26 Apple Computer, Inc. Method and apparatus for forming and dispatching instruction groups based on priority comparisons
US20120246448A1 (en) * 2011-03-25 2012-09-27 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN102799418A (en) * 2012-08-07 2012-11-28 清华大学 Processor architecture and instruction execution method integrating sequence and VLIW (Very Long Instruction Word)
CN102880449A (en) * 2012-09-18 2013-01-16 中国科学院声学研究所 Method and system for scheduling delay slot in very-long instruction word structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI, LEI ET AL.: "Branch Scheduling Optimization on VLIW Processors", COMPUTER ENGINEERING AND APPLICATIONS, vol. 48, no. 21, 31 December 2012 (2012-12-31), pages 41 *

Also Published As

Publication number Publication date
CN104424026A (en) 2015-03-18
CN104424026B (en) 2017-11-17

Similar Documents

Publication Publication Date Title
US11262787B2 (en) Compiler method
US10936008B2 (en) Synchronization in a multi-tile processing array
US20220253399A1 (en) Instruction Set
WO2015024432A1 (en) Instruction scheduling method and device
US10963003B2 (en) Synchronization in a multi-tile processing array
US11416440B2 (en) Controlling timing in computer processing
US10817459B2 (en) Direction indicator
US20220197857A1 (en) Data exchange pathways between pairs of processing units in columns in a computer
WO2022036690A1 (en) Graph computing apparatus, processing method, and related device
Walk et al. Out-of-order execution within functional units of the SCAD architecture
US20200201794A1 (en) Scheduling messages
CN115543448A (en) Dynamic instruction scheduling method on data flow architecture and data flow architecture
Repouskos The Dataflow Computational Model And Its Evolution
Wang et al. Opportunities and challenges in process-algebraic verification of asynchronous circuit designs
Ye et al. Optimizing message passing programs based on task section duplication
Prakash et al. CSE 48: Optimality of Tomasulo's Algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14838079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14838079

Country of ref document: EP

Kind code of ref document: A1