CN104484160A - Instruction scheduling and register allocation method on optimized clustered VLIW (Very Long Instruction Word) processor - Google Patents
Instruction scheduling and register allocation method on optimized clustered VLIW (Very Long Instruction Word) processor
- Publication number
- CN104484160A (application number CN201410799189.5A)
- Authority
- CN
- China
- Prior art keywords
- dag
- instruction
- register
- basic block
- scheduling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses an optimized instruction scheduling and register allocation method on a clustered VLIW processor, which comprises two stages. In the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks. In the second stage, instruction rescheduling and register reallocation are performed on basic blocks that have register spills, according to the length of the longest path to which each basic block belongs and its highest execution frequency. The invention has the advantages of wide applicability, good performance optimization, and an effective reduction of the longest execution time of programs in real-time systems.
Description
Technical Field

The invention relates to the field of compiler optimization for processors, and in particular to an optimized instruction scheduling and register allocation method suitable for clustered VLIW processors.

Background Art

The longest execution time of a program, i.e. its worst-case execution time, is one of the key criteria in the design of embedded real-time systems: all timing constraints must be satisfied to guarantee the correctness of a real-time system. The longest execution time has a major influence on whether a feasible schedule can be assigned to a program. Because a program may take different branches at run time, its running time varies; its longest execution time is the maximum over all of its execution times on the target platform. If the longest execution time exceeds the time limit of the real-time system, no feasible schedule can be assigned to the program; if it can be reduced, assigning a feasible schedule becomes more likely. Minimizing the longest execution time of a program is therefore an important problem.

For embedded systems with a clustered VLIW architecture, instruction scheduling and register allocation are key components of an optimizing compiler and have a large impact on the longest execution time of a program. Traditional methods perform register allocation and instruction scheduling separately; carrying out each phase in isolation, however, causes the phase-ordering problem and yields less optimized compiled code. Clustering is an effective technique for improving the scalability and energy consumption of VLIW processors, but clustered VLIW processors also make instruction scheduling and register allocation harder. First, when a variable is passed to a different cluster, a new live range is created dynamically, and registers on several clusters are needed to hold copies of the same variable. Second, the precise live range of a variable depends on when the instructions containing its first definition and its last use are scheduled, and cannot be determined by conventional live-range analysis of static code. Third, improper assignment of instructions to clusters causes unnecessary inter-cluster communication and thereby lengthens the schedule of a basic block.
Summary of the Invention

The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides an optimized instruction scheduling and register allocation method on clustered VLIW processors that is widely applicable, achieves good performance optimization, and effectively reduces the longest execution time of programs in real-time systems.

To solve the above technical problem, the present invention adopts the following technical solution:

An optimized instruction scheduling and register allocation method on a clustered VLIW processor comprises two stages. In the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks. In the second stage, instruction rescheduling and register reallocation are performed on basic blocks that have register spills, according to the length of the longest path to which each basic block belongs and its highest execution frequency.
As a further improvement of the present invention, the first stage comprises the following steps:

(1) Construct the weighted control flow graph G of program P. Program P is represented by a weighted control flow graph G = (V, E, W), where V = {B_1, B_2, ..., B_n} is the set of basic blocks of the program, E = {(B_i, B_j) : B_j is control dependent on B_i}, and W = {w_i : w_i is the execution time of basic block B_i}.

(2) Perform instruction scheduling and register allocation on each basic block B_i in reverse post-order according to the unified algorithm (a sketch of this graph representation and traversal order follows below).
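By way of illustration only, the following Python sketch shows one possible in-memory representation of the weighted control flow graph G = (V, E, W) and a reverse post-order walk over its basic blocks. The class and function names are assumptions made for this sketch; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class BasicBlock:
    name: str
    exec_time: float = 0.0                      # w_i, filled in after the first pass
    successors: List["BasicBlock"] = field(default_factory=list)

def reverse_postorder(entry: BasicBlock) -> List[BasicBlock]:
    """Return the basic blocks reachable from `entry` in reverse post-order."""
    visited, order = set(), []

    def dfs(block: BasicBlock) -> None:
        visited.add(block)
        for succ in block.successors:
            if succ not in visited:
                dfs(succ)
        order.append(block)                     # post-order position

    dfs(entry)
    return list(reversed(order))                # reverse post-order

# Tiny example CFG: B1 -> B2 -> B3 and B1 -> B3.
b1, b2, b3 = BasicBlock("B1"), BasicBlock("B2"), BasicBlock("B3")
b1.successors, b2.successors = [b2, b3], [b3]
print([b.name for b in reverse_postorder(b1)])  # ['B1', 'B2', 'B3']
```

Each block would then be handed to the unified scheduling and allocation routine in that order.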
As a further improvement of the present invention, the unified algorithm in step (2) combines an incremental register allocation method with a priority-based instruction scheduling method. The instruction scheduling method schedules all basic blocks in reverse post-order, and schedules each instruction according to its priority within its basic block. The priority of each instruction accounts for inter-instruction latencies and processor resource constraints; during scheduling, instruction priorities are updated dynamically to reduce register pressure.
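As a rough illustration of a priority-driven list scheduler with dynamically adjusted priorities, the sketch below issues at most slots_per_cycle operations per VLIW word and ranks ready instructions by dependence height minus a register-pressure penalty. The priority formula and the Inst fields are assumptions for this example, not the exact heuristic disclosed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(eq=False)
class Inst:
    name: str
    latency: int = 1
    height: int = 0              # critical-path length to the block exit
    pressure_penalty: int = 0    # grows when issuing the op would raise register pressure
    preds: List["Inst"] = field(default_factory=list)

def schedule_block(insts: List[Inst], slots_per_cycle: int) -> Dict[Inst, int]:
    """Greedy priority-based list scheduling sketch for one basic block."""
    scheduled, cycle, remaining = {}, 0, list(insts)
    while remaining:
        ready = [i for i in remaining
                 if all(p in scheduled and scheduled[p] + p.latency <= cycle
                        for p in i.preds)]
        # Dynamic priority: dependence height minus a register-pressure penalty.
        ready.sort(key=lambda i: i.height - i.pressure_penalty, reverse=True)
        for inst in ready[:slots_per_cycle]:     # respect the VLIW issue width
            scheduled[inst] = cycle
            remaining.remove(inst)
        cycle += 1
    return scheduled

a = Inst("load", latency=2, height=3)
b = Inst("add", height=1, preds=[a])
print({i.name: c for i, c in schedule_block([a, b], slots_per_cycle=2).items()})
# {'load': 0, 'add': 2}
```

In the method described above, the pressure penalty would be recomputed as live ranges open and close, which is what the dynamic priority update refers to.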
As a further improvement of the present invention, the second stage comprises the following steps:

(1) Update the weight w_i of each basic block B_i in the weighted control flow graph G.

(2) Construct the directed acyclic graph DAG(G): convert the weighted control flow graph G into DAG(G) = (V', E', W'), where V' = V is the set of basic blocks of P; E' = E - {(B_i, B_j) : (B_i, B_j) is a back edge} is the set of edges of DAG(G); and W' = {w'_i : w'_i is the weight of node B_i, w'_i = w_i * N(B_i), where w_i is the execution time of B_i and N(B_i) is the highest execution frequency of B_i}.

(3) Repeat the following steps until the longest path of DAG(G) can no longer be shortened:

(3a) Compute the longest path of DAG(G) (a longest-path sketch follows this list).

(3b) Among the basic blocks on the longest path that have register spills, find the basic block B_k with the highest execution frequency.

(3c) Perform instruction rescheduling and register reallocation on B_k.

(3d) Update the weight of node B_k in DAG(G).
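Step (3a) amounts to a longest weighted path computation on a DAG, which can be done in time linear in the size of the graph using a topological order. The sketch below is a generic implementation of that sub-step, with node weights standing for w'_i = w_i * N(B_i); it is an illustrative helper, not the patented procedure itself.

```python
from typing import Dict, List, Tuple

def longest_path_length(nodes: List[str],
                        edges: List[Tuple[str, str]],
                        weight: Dict[str, float]) -> float:
    """Longest weighted path in a DAG via dynamic programming over a topological order."""
    succs = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    order, stack = [], [n for n in nodes if indeg[n] == 0]
    while stack:                       # Kahn's algorithm for a topological order
        n = stack.pop()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                stack.append(s)
    dist = {n: weight[n] for n in nodes}
    for u in order:                    # relax edges in topological order
        for v in succs[u]:
            dist[v] = max(dist[v], dist[u] + weight[v])
    return max(dist.values())

# B1 -> B2 -> B4 and B1 -> B3 -> B4, with node weights w'_i.
w = {"B1": 10, "B2": 40, "B3": 5, "B4": 20}
e = [("B1", "B2"), ("B1", "B3"), ("B2", "B4"), ("B3", "B4")]
print(longest_path_length(list(w), e, w))   # 70, via B1 -> B2 -> B4
```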
As a further improvement of the present invention, in the second stage the basic block B_i with the highest execution frequency on the longest path of DAG(G) is selected each time, and the following steps are performed for every live range R_j that has been spilled to memory in order to reduce spilling:

I. Find the live range R_k that has the least impact on the longest path and satisfies the condition that the span of R_k is greater than the span of R_j, and assign the register of R_k to R_j.

II. Insert the spill code for R_k and reschedule all affected instructions.

III. Recompute the execution time of every basic block affected by R_k.
As a further improvement of the present invention, in step I the live range R_k satisfying the condition is found by introducing a new graph DAG(G, k). The graph DAG(G, k) is a subgraph of DAG(G) in which the length of every path is no greater than k + l_min, where l_min is the length of the shortest path of DAG(G). The value of k is set to (l_max - l_min)/2, where l_max is the length of the longest path of DAG(G). After constructing DAG(G, (l_max - l_min)/2), the priority rank of each live range R_s is computed as rank(R_s) = n1(R_s)/n(R_s), where n1(R_s) is the total number of references to R_s in all basic blocks of DAG(G, (l_max - l_min)/2) and n(R_s) is the total number of references to R_s in all basic blocks. The selected R_k is the live range with the largest rank on the longest path (a sketch of this rank computation follows).
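A minimal sketch of the rank computation, assuming each basic block exposes a per-live-range reference count; the Block class and its refs dictionary are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Block:
    name: str
    refs: Dict[str, int] = field(default_factory=dict)   # live range name -> reference count

def rank(rs: str, dag_gk_blocks: List[Block], all_blocks: List[Block]) -> float:
    """rank(R_s) = n1(R_s) / n(R_s): references to R_s inside DAG(G, k)
    divided by references to R_s in all basic blocks."""
    n1 = sum(b.refs.get(rs, 0) for b in dag_gk_blocks)
    n = sum(b.refs.get(rs, 0) for b in all_blocks)
    return n1 / n if n else 0.0

# k itself would be chosen as (l_max - l_min) / 2 before building DAG(G, k).
b1 = Block("B1", {"R5": 3})
b2 = Block("B2", {"R5": 1})
print(rank("R5", [b1], [b1, b2]))   # 0.75
```

Intuitively, a live range referenced mostly inside DAG(G, k) is used away from the longest path, so spilling it is expected to hurt the longest path least.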
Compared with the prior art, the present invention has the following advantages:

1. In the instruction scheduling and register allocation method on an optimized clustered VLIW processor of the present invention, before instruction rescheduling and register reallocation are carried out, not only a first pass of instruction scheduling but also a first pass of register allocation is performed. Compared with traditional methods, which perform only the first pass of instruction scheduling without considering register allocation, the longest path obtained by the present invention is more accurate and reliable.

2. During instruction rescheduling and register reallocation, the present invention gives priority to the basic block with the highest execution frequency on the longest path, whereas traditional methods select the basic block with the greatest instruction-level parallelism on the longest path. Since the basic block with the highest execution frequency on the longest path has the greatest influence on shortening the longest path, the selection strategy of the present invention is clearly better.

3. When register pressure is high, the present invention assigns a dynamic priority to each instruction of a basic block to lower instruction-level parallelism and thereby reduce register spills. Traditional methods do not consider lowering instruction-level parallelism and may cause repeated register spills. By integrating register allocation and instruction scheduling into a single phase, the present invention can produce performance-optimized compiled code.
Brief Description of the Drawings

Fig. 1 is a schematic flowchart of the present invention.

Fig. 2 is a weighted control flow graph G = (V, E, W) of the present invention in a specific application.

Fig. 3 is a directed acyclic graph DAG(G) = (V', E', W') of the present invention in a specific application.
Detailed Description of Embodiments

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

As shown in Fig. 1, the optimized instruction scheduling and register allocation method on a clustered VLIW processor of the present invention aims to minimize the longest execution time of a program and comprises two stages. In the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks. In the second stage, instruction rescheduling and register reallocation are performed on basic blocks that have register spills, according to the length of the longest path to which each basic block belongs and its highest execution frequency.

In a specific application, the detailed flow of the present invention is as follows:
(1) Construct the weighted control flow graph G of program P, with the weights w_i initially undefined.

(2) Perform instruction scheduling and register allocation on each basic block B_i in reverse post-order according to the unified algorithm.

(3) Update the weight w_i of each basic block B_i in the weighted control flow graph G.

(4) Construct the directed acyclic graph DAG(G).

(5) Repeat the following steps until the longest path of DAG(G) can no longer be shortened (an outline of this loop is sketched after this list):

(5a) Compute the longest path of DAG(G).

(5b) Among the basic blocks on the longest path that have register spills, find the basic block B_k with the highest execution frequency.

(5c) Perform instruction rescheduling and register reallocation on B_k.

(5d) Update the weight of node B_k in DAG(G).
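The loop of steps (5) to (5d) can be outlined as below. The Path class and the four callbacks are placeholders for the concrete analyses described in this document; this is a structural sketch rather than a complete implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Path:
    blocks: List[str]      # basic blocks on the longest path
    length: float          # sum of the weights w'_i along the path

def optimize_longest_path(dag: object,
                          longest_path: Callable[[object], Path],
                          has_spill: Callable[[str], bool],
                          exec_freq: Callable[[str], int],
                          reschedule: Callable[[str], None]) -> None:
    """Driver for steps (5) to (5d): repeatedly reschedule the hottest
    spilling block on the longest path until the path stops shrinking."""
    while True:
        path = longest_path(dag)                         # step (5a)
        spilled = [b for b in path.blocks if has_spill(b)]
        if not spilled:
            return
        hottest = max(spilled, key=exec_freq)            # step (5b)
        before = path.length
        reschedule(hottest)                              # steps (5c) and (5d)
        if longest_path(dag).length >= before:           # no further shortening
            return
```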
In a specific application example, as shown in Fig. 2, program P is represented by a weighted control flow graph G = (V, E, W), where V = {B_1, B_2, ..., B_n} is the set of basic blocks of the program, E = {(B_i, B_j) : B_j is control dependent on B_i}, and W = {w_i : w_i is the execution time of basic block B_i}. As shown in Fig. 3, G is converted into a directed acyclic graph DAG(G) = (V', E', W'), where V' = V is the set of basic blocks of P; E' = E - {(B_i, B_j) : (B_i, B_j) is a back edge} is the set of edges of DAG(G); and W' = {w'_i : w'_i is the weight of node B_i, w'_i = w_i * N(B_i), where w_i is the execution time of B_i and N(B_i) is the highest execution frequency of B_i}.
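For illustration, one way to derive DAG(G) from G is to drop the back edges found by a depth-first search and weight each node by w'_i = w_i * N(B_i). The sketch below assumes a single-entry CFG whose entry block is blocks[0]; the function name and representation are assumptions of this example, not the patent's own construction.

```python
from typing import Dict, List, Tuple

def to_weighted_dag(blocks: List[str],
                    edges: List[Tuple[str, str]],
                    exec_time: Dict[str, float],
                    max_freq: Dict[str, int]):
    """Build DAG(G): remove back edges and set node weights w'_i = w_i * N(B_i)."""
    succs = {b: [] for b in blocks}
    for u, v in edges:
        succs[u].append(v)
    visited, on_stack, back_edges = set(), set(), set()

    def dfs(u: str) -> None:
        visited.add(u)
        on_stack.add(u)
        for v in succs[u]:
            if v in on_stack:                  # edge into a DFS ancestor: back edge
                back_edges.add((u, v))
            elif v not in visited:
                dfs(v)
        on_stack.remove(u)

    dfs(blocks[0])
    dag_edges = [e for e in edges if e not in back_edges]
    weights = {b: exec_time[b] * max_freq[b] for b in blocks}
    return dag_edges, weights

# B3 -> B2 closes a loop and is dropped as a back edge.
blocks = ["B1", "B2", "B3"]
edges = [("B1", "B2"), ("B2", "B3"), ("B3", "B2")]
t = {"B1": 2, "B2": 5, "B3": 3}      # w_i
f = {"B1": 1, "B2": 10, "B3": 10}    # N(B_i)
print(to_weighted_dag(blocks, edges, t, f))
# ([('B1', 'B2'), ('B2', 'B3')], {'B1': 2, 'B2': 50, 'B3': 30})
```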
In the above steps, in the first stage, the unified algorithm combines an incremental register allocation method with a priority-based instruction scheduling method. The instruction scheduling method schedules all basic blocks in reverse post-order and schedules each instruction according to its priority within its basic block. The priority of each instruction accounts for inter-instruction latencies and processor resource constraints. During scheduling, instruction priorities are updated dynamically to reduce register pressure. For the schedulable instruction with the highest priority, the instruction is assigned to a functional unit on a cluster, and the incremental register allocation method is invoked to assign physical registers to the virtual registers of the instruction. Assigning instructions across clusters takes into account the start time of the instruction and the register pressure of each cluster.
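A hypothetical cluster-selection helper matching the last sentence above, using the earliest achievable start cycle first and register pressure as a tie-breaker; the lexicographic cost is an assumption of this sketch, not a rule stated in the patent.

```python
from typing import Callable, List

def pick_cluster(inst: str,
                 clusters: List[int],
                 earliest_start: Callable[[str, int], int],
                 reg_pressure: Callable[[int], int]) -> int:
    """Pick the cluster where `inst` can start earliest; break ties by lower register pressure."""
    return min(clusters, key=lambda c: (earliest_start(inst, c), reg_pressure(c)))

# Toy example: both clusters could start the op at cycle 4, cluster 1 is less pressured.
start = {0: 4, 1: 4}
pressure = {0: 7, 1: 3}
print(pick_cluster("mul", [0, 1],
                   earliest_start=lambda i, c: start[c],
                   reg_pressure=lambda c: pressure[c]))   # 1
```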
In the above steps, the goal of the second stage is to minimize register spills on the longest path through instruction rescheduling and register reallocation. Each time, the basic block B_i with the highest execution frequency on the longest path of DAG(G) is selected, and the following steps are performed for every live range R_j that has been spilled to memory in order to reduce spilling:

I. Find the live range R_k that has the least impact on the longest path and satisfies the condition that the span of R_k is greater than the span of R_j, and assign the register of R_k to R_j.

II. Insert the spill code for R_k and reschedule all affected instructions.

III. Recompute the execution time of every basic block affected by R_k (a sketch of steps I to III follows).
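Steps I to III might be glued together as in the sketch below, where the rank field corresponds to the priority defined in the next paragraph and the four callbacks stand in for back-end operations that the text describes only in prose; all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LiveRange:
    name: str
    span: int          # live-range length in cycles
    rank: float        # priority from the rank(R_s) formula
    register: str      # physical register currently holding the value

def reduce_spill(rj: LiveRange,
                 candidates: List[LiveRange],
                 assign_reg: Callable[[LiveRange, str], None],
                 add_spill_code: Callable[[LiveRange], None],
                 reschedule: Callable[[LiveRange], None],
                 recompute_time: Callable[[LiveRange], None]) -> bool:
    """Steps I to III for one spilled live range R_j."""
    eligible = [r for r in candidates if r.span > rj.span]
    if not eligible:
        return False
    rk = max(eligible, key=lambda r: r.rank)   # least impact on the longest path
    assign_reg(rj, rk.register)                # step I: R_j takes over R_k's register
    add_spill_code(rk)                         # step II: spill R_k instead ...
    reschedule(rk)                             #          ... and repair the schedule
    recompute_time(rk)                         # step III: refresh affected block times
    return True
```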
In step I above, in order to find a live range R_k satisfying the condition, this embodiment introduces a new graph DAG(G, k). DAG(G, k) is a subgraph of DAG(G) in which the length of every path is no greater than k + l_min, where l_min is the length of the shortest path of DAG(G). The value of k is set to (l_max - l_min)/2, where l_max is the length of the longest path of DAG(G). After constructing DAG(G, (l_max - l_min)/2), the priority rank of each live range R_s is computed as rank(R_s) = n1(R_s)/n(R_s), where n1(R_s) is the total number of references to R_s in all basic blocks of DAG(G, (l_max - l_min)/2) and n(R_s) is the total number of references to R_s in all basic blocks. The R_k selected by the present invention is the live range with the largest rank on the longest path.
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the concept of the present invention belong to the scope of protection of the present invention. It should be pointed out that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410799189.5A CN104484160B (en) | 2014-12-19 | 2014-12-19 | Instruction scheduling and register allocation method on an optimized clustered VLIW processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104484160A | 2015-04-01 |
CN104484160B CN104484160B (en) | 2017-12-26 |
Family
ID=52758704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410799189.5A Active CN104484160B (en) | Instruction scheduling and register allocation method on an optimized clustered VLIW processor | 2014-12-19 | 2014-12-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104484160B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101553780A (en) * | 2006-12-11 | 2009-10-07 | Nxp股份有限公司 | Virtual functional units for VLIW processors |
US20120159110A1 (en) * | 2010-12-21 | 2012-06-21 | National Tsing Hua University | Method for allocating registers for a processor based on cycle information |
Non-Patent Citations (2)
Title |
---|
YANG Xu et al., "Research on scheduling optimization using data dependence graphs for the clustered VLIW architecture", Chinese Journal of Computers *
HU Dinglei et al., "Hyperblock-based unified cluster assignment and modulo scheduling", Journal of Computer Research and Development *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105843660A (en) * | 2016-03-21 | 2016-08-10 | 同济大学 | Code optimization scheduling method for encoder |
CN105843660B (en) * | 2016-03-21 | 2019-04-02 | 同济大学 | A Compiler's Code Optimization Scheduling Method |
CN112445481A (en) * | 2019-08-27 | 2021-03-05 | 无锡江南计算技术研究所 | Low-power-consumption register allocation compiling optimization method |
CN112445481B (en) * | 2019-08-27 | 2022-07-12 | 无锡江南计算技术研究所 | Low-power-consumption register allocation compiling optimization method |
Also Published As
Publication number | Publication date |
---|---|
CN104484160B (en) | 2017-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101738641B1 (en) | Apparatus and method for compilation of program on multi core system | |
Mashayekhy et al. | Energy-aware scheduling of mapreduce jobs for big data applications | |
CN104461471B (en) | 2018-05-25 | Unified instruction scheduling and register allocation method on clustered VLIW processor | |
Verma et al. | Scratchpad sharing strategies for multiprocess embedded systems: A first approach | |
CN103257900B (en) | Real-time task collection method for obligating resource on the multiprocessor that minimizing CPU takies | |
CN102193779A (en) | MPSoC (multi-processor system-on-chip)-oriented multithread scheduling method | |
CN104331331A (en) | Resource distribution method for reconfigurable chip multiprocessor with task number and performance sensing functions | |
CN104484160B (en) | 2017-12-26 | Instruction scheduling and register allocation method on an optimized clustered VLIW processor | |
CN108139929B (en) | Task scheduling apparatus and method for scheduling a plurality of tasks | |
Liu et al. | Task assignment with cache partitioning and locking for wcet minimization on mpsoc | |
WO2019153681A1 (en) | Smart instruction scheduler | |
CN108228242B (en) | Configurable and flexible instruction scheduler | |
CN105430074A (en) | Method and system for cloud data allocation and storage optimization based on data dependency and access amount | |
CN105260165B (en) | The dynamic control method of speculative threads number in thread-level hypothetical system based on SMT | |
Huang et al. | Dynamic allocation/reallocation of dark cores in many-core systems for improved system performance | |
Ansari et al. | Power-aware scheduling of fixed priority tasks in soft real-time multicore systems | |
Hussain et al. | AMMC: Advanced multi-core memory controller | |
Eitschberger et al. | Energy-efficient task scheduling in manycore processors with frequency scaling overhead | |
Tasoulas et al. | Performance and aging aware resource allocation for concurrent GPU applications under process variation | |
Tasoulas et al. | Performance-based and aging-aware resource allocation for concurrent gpu applications | |
De Sensi et al. | State-aware concurrency throttling | |
Cannella et al. | System-level scheduling of real-time streaming applications using a semi-partitioned approach | |
Abd Ishak et al. | Energy-aware task scheduling with precedence and deadline constraints on MPSoCs | |
WO2019153684A1 (en) | Method for automatically managing low-latency instruction scheduler | |
Thomas et al. | Application aware scalable architecture for GPGPU |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |