CN104484160A - Instruction scheduling and register allocation method on optimized clustered VLIW (Very Long Instruction Word) processor - Google Patents


Info

Publication number
CN104484160A
Authority
CN
China
Prior art keywords
dag
instruction
register
basic block
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410799189.5A
Other languages
Chinese (zh)
Other versions
CN104484160B (en)
Inventor
张雪萌
吴辉
孙海燕
王霁
阳柳
郭阳
扈啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-19
Filing date
2014-12-19
Publication date
2015-04-01
Application filed by National University of Defense Technology
Priority to CN201410799189.5A
Publication of CN104484160A
Application granted
Publication of CN104484160B
Legal status: Active

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses an instruction scheduling and register allocation method optimized for clustered VLIW (Very Long Instruction Word) processors. The method has two stages: in the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks; in the second stage, instructions are rescheduled and registers reallocated for the basic blocks in which register spilling occurs, according to the length of the longest path each basic block lies on and its highest execution frequency. The method is widely applicable, optimizes performance well, and can effectively shorten the maximum execution time of programs in a real-time system.

Description

Instruction scheduling and register allocation method optimized for clustered VLIW processors
Technical field
The present invention relates to processor optimization, and in particular to an instruction scheduling and register allocation method optimized for clustered VLIW processors.
Background art
The maximum execution time of a program is one of the key criteria in designing an embedded real-time system: all timing constraints must be met to guarantee the correctness of the system. The maximum execution time also strongly affects whether a feasible schedule can be assigned to the program. Because a program may take different branches at run time, its running time varies; the maximum execution time is the longest of all its execution times on the target platform. If the maximum execution time exceeds the timing constraint of the real-time system, no feasible schedule can be assigned to the program. If the maximum execution time can be reduced, a feasible schedule is more likely to exist. Minimizing the maximum execution time of a program is therefore an important problem.
For embedded systems built on a clustered VLIW architecture, instruction scheduling and register allocation are important components of an optimizing compiler and strongly affect the maximum execution time of a program. Traditional methods perform register allocation and instruction scheduling separately, but running the phases separately causes the phase-ordering problem, and the compiled code is not well optimized. Clustering is an effective technique for improving the scalability and energy consumption of VLIW processors, but it makes instruction scheduling and register allocation harder. First, when a variable is passed to a different cluster, new live ranges are created dynamically, and several registers on different clusters are needed to hold copies of the same variable. Second, the exact live range of a variable depends on when the instructions containing its first definition and its last use are scheduled, so it cannot be determined by traditional live-range analysis of static code. Third, improper assignment of instructions to clusters causes unnecessary inter-cluster communication, which lengthens the schedule of a basic block.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems in the prior art, the invention provides an instruction scheduling and register allocation method optimized for clustered VLIW processors that is widely applicable, optimizes performance well, and can effectively reduce the maximum execution time of programs in a real-time system.
To solve the above technical problem, the present invention adopts the following technical solution:
An instruction scheduling and register allocation method optimized for clustered VLIW processors comprises two stages: in the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks; in the second stage, instructions are rescheduled and registers reallocated for the basic blocks in which register spilling occurs, according to the length of the longest path each basic block lies on and the highest execution frequency.
As a further improvement of the present invention, the steps of the first stage are:
(1) Construct the weighted control flow graph G of program P. Program P is represented by the weighted control flow graph G = (V, E, W), where V = {B_1, B_2, ..., B_n} is the set of basic blocks of the program, E = {(B_i, B_j): B_j is control dependent on B_i}, and W = {w_i: w_i is the execution time of basic block B_i};
(2) Use the unified algorithm to perform instruction scheduling and register allocation on each basic block B_i, visiting the blocks in reverse postorder.
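The two steps above can be sketched as a small data structure plus a traversal. This is a minimal illustration, not the patent's implementation: the class `WeightedCFG`, the function `reverse_postorder`, and the block names are all hypothetical, and the weights are left as `None` because the execution times only become known after scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class WeightedCFG:
    """G = (V, E, W): basic blocks, control edges, per-block execution times."""
    entry: str
    succs: dict = field(default_factory=dict)   # E as adjacency lists: block -> successors
    weight: dict = field(default_factory=dict)  # W: block -> execution time (None until scheduled)

    def add_edge(self, src: str, dst: str) -> None:
        for block in (src, dst):
            self.succs.setdefault(block, [])
            self.weight.setdefault(block, None)
        self.succs[src].append(dst)


def reverse_postorder(cfg: WeightedCFG) -> list:
    """Return the blocks reachable from the entry in reverse postorder (iterative DFS)."""
    visited = {cfg.entry}
    postorder = []
    stack = [(cfg.entry, iter(cfg.succs.get(cfg.entry, [])))]
    while stack:
        block, successors = stack[-1]
        for nxt in successors:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(cfg.succs.get(nxt, []))))
                break
        else:
            # All successors explored: the block is finished.
            postorder.append(block)
            stack.pop()
    return postorder[::-1]


# Hypothetical program: a diamond B1 -> {B2, B3} -> B4 with a loop edge B4 -> B2.
cfg = WeightedCFG(entry="B1")
for src, dst in [("B1", "B2"), ("B1", "B3"), ("B2", "B4"), ("B3", "B4"), ("B4", "B2")]:
    cfg.add_edge(src, dst)
order = reverse_postorder(cfg)
```

Visiting blocks in reverse postorder means that, apart from loop back edges, every block is processed after its predecessors, which is presumably why the first pass uses this order.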
As a further improvement of the present invention, in step (2) the unified algorithm combines an incremental register allocation method with a priority-based instruction scheduling method. The instruction scheduling method schedules all basic blocks in reverse postorder and, within each basic block, schedules each instruction according to instruction priority. The priority of each instruction takes into account inter-instruction latencies and processor resource constraints, and instruction priorities are updated dynamically during scheduling to reduce register pressure.
As a further improvement of the present invention, the steps of the second stage are:
(1) Update the weight w_i of each basic block B_i in the weighted control flow graph G;
(2) Construct the directed acyclic graph DAG(G). The weighted control flow graph G is converted into the directed acyclic graph DAG(G) = (V', E', W'), where V' = V is the set of basic blocks of P, E' = E - {(B_i, B_j): (B_i, B_j) is a back edge} is the set of edges in DAG(G), and W' = {w'_i: w'_i is the weight of node B_i}, with w'_i = w_i * N(B_i), where w_i is the execution time of B_i and N(B_i) is the highest execution frequency of B_i;
(3) Repeat the following steps until the longest path of DAG(G) can no longer be shortened:
(3a) Compute the longest path of DAG(G);
(3b) Among the basic blocks on the longest path that have register spills, find the basic block B_k with the highest execution frequency;
(3c) Reschedule the instructions of B_k and reallocate its registers;
(3d) Update the weight of node B_k in DAG(G).
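The loop in step (3) can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the function names, the `reschedule` callback (which stands in for the actual rescheduling and register reallocation and simply returns the block's new weight), and all numbers are hypothetical. The stopping rule used here, "stop as soon as one attempt fails to shorten the longest path", is one possible reading of the termination condition.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path(succs, weight):
    """Return (length, path) of the heaviest node-weighted path in a DAG."""
    preds = {n: [] for n in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    best, parent = {}, {}  # best[n]: weight of the heaviest path ending at n
    for node in TopologicalSorter(preds).static_order():
        prev = max(preds[node], key=best.get, default=None)
        best[node] = weight[node] + (best[prev] if prev is not None else 0)
        parent[node] = prev
    end = max(best, key=best.get)
    length, path = best[end], []
    while end is not None:
        path.append(end)
        end = parent[end]
    return length, path[::-1]


def shorten_longest_path(succs, weight, freq, spilled, reschedule):
    """Repeat steps (3a)-(3d) until the longest path stops getting shorter."""
    weight = dict(weight)
    length, path = longest_path(succs, weight)              # (3a)
    while True:
        candidates = [b for b in path if b in spilled]
        if not candidates:
            break
        block = max(candidates, key=lambda b: freq[b])      # (3b)
        old = weight[block]
        weight[block] = reschedule(block)                   # (3c) and (3d)
        new_length, new_path = longest_path(succs, weight)
        if new_length >= length:
            weight[block] = old                             # no gain: undo and stop
            break
        length, path = new_length, new_path
    return length, path, weight


# Hypothetical DAG(G); weights are already w'_i = w_i * N(B_i).
succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
weight = {"B1": 4, "B2": 30, "B3": 12, "B4": 6}
freq = {"B1": 1, "B2": 10, "B3": 4, "B4": 1}
improved = {"B2": 20, "B3": 10}  # made-up weights after rescheduling
length, path, new_weight = shorten_longest_path(
    succs, weight, freq, spilled={"B2", "B3"}, reschedule=lambda b: improved[b]
)
```

Because every accepted iteration strictly reduces the longest-path length, the loop is guaranteed to terminate.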
As a further improvement of the present invention, in the second stage, each time the basic block B_i with the highest execution frequency on the longest path of DAG(G) is selected, the following steps are carried out for each live range R_j that has been spilled to memory, in order to reduce spilling:
I. Find the live range R_k that has the least impact on the longest path and satisfies the condition that R_k is longer than R_j, and assign the register of R_k to R_j;
II. Add spill code for R_k and reschedule all affected instructions;
III. Recompute the execution time of each basic block affected by R_k.
As a further improvement of the present invention, in step I the live range R_k satisfying the condition is found by introducing a new graph DAG(G, k). The graph DAG(G, k) is a subgraph of DAG(G) in which the length of any path is no greater than k + l_min, where l_min is the length of the shortest path of DAG(G). The value of k is set to (l_max - l_min)/2, where l_max is the length of the longest path of DAG(G). After DAG(G, (l_max - l_min)/2) is constructed, the priority rank of each live range R_s is computed as rank(R_s) = n1(R_s)/n(R_s), where n1(R_s) is the total number of references to R_s in all basic blocks of DAG(G, (l_max - l_min)/2) and n(R_s) is the total number of uses of R_s in all basic blocks. The selected R_k is the live range with the highest priority on the longest path.
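As a purely illustrative calculation with made-up numbers (none of these values appear in the patent), suppose the longest and shortest paths of DAG(G) have lengths 40 and 20, and some live range R_s is used 4 times in total, 2 of them inside blocks of DAG(G, k):

```latex
\begin{align*}
l_{\max} &= 40, \qquad l_{\min} = 20 \\
k &= \frac{l_{\max} - l_{\min}}{2} = \frac{40 - 20}{2} = 10 \\
k + l_{\min} &= 10 + 20 = 30
  \quad \text{(upper bound on any path length in } \mathrm{DAG}(G, k)\text{)} \\
\mathrm{rank}(R_s) &= \frac{n_1(R_s)}{n(R_s)} = \frac{2}{4} = 0.5
\end{align*}
```

A rank close to 1 means the live range is referenced mostly in blocks that lie on short paths, so spilling it adds cost mainly where there is slack rather than on the longest path.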
Compared with the prior art, the advantages of the present invention are:
1. Before instruction rescheduling and register reallocation, the method performs not only a first pass of instruction scheduling but also a first pass of register allocation. Compared with traditional methods, which perform only a first pass of instruction scheduling without considering register allocation, the longest path obtained by the present invention is more accurate and reliable.
2. During instruction rescheduling and register reallocation, the method gives priority to the basic block with the highest execution frequency on the longest path, whereas traditional methods select the basic block with the greatest instruction-level parallelism on the longest path. Because the basic block with the highest execution frequency on the longest path has the greatest influence on reducing the length of the longest path, the selection strategy of the present invention is clearly better.
3. When register pressure is high, the method assigns dynamic priorities to the instructions of a basic block to reduce instruction-level parallelism and thereby reduce register spilling. Traditional methods do not consider reducing instruction-level parallelism, which may cause repeated register spills. The present invention integrates register allocation and instruction scheduling into a single phase and can therefore produce performance-optimized compiled code.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention.
Fig. 2 shows the weighted control flow graph G = (V, E, W) of the present invention in a specific application.
Fig. 3 shows the directed acyclic graph DAG(G) = (V', E', W') of the present invention in a specific application.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the instruction scheduling and register allocation method optimized for clustered VLIW processors of the present invention aims to minimize the maximum execution time of a program. It comprises two stages: in the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks; in the second stage, instructions are rescheduled and registers reallocated for the basic blocks in which register spilling occurs, according to the length of the longest path each basic block lies on and the highest execution frequency.
In a specific application, the detailed procedure of the present invention is:
(1) Construct the weighted control flow graph G of program P, in which the weights w_i are initially undefined.
(2) Use the unified algorithm to perform instruction scheduling and register allocation on each basic block B_i, visiting the blocks in reverse postorder.
(3) Update the weight w_i of each basic block B_i in the weighted control flow graph G.
(4) Construct the directed acyclic graph DAG(G).
(5) Repeat the following steps until the longest path of DAG(G) can no longer be shortened:
(5a) Compute the longest path of DAG(G).
(5b) Among the basic blocks on the longest path that have register spills, find the basic block B_k with the highest execution frequency.
(5c) Reschedule the instructions of B_k and reallocate its registers.
(5d) Update the weight of node B_k in DAG(G).
In a specific application example, as shown in Fig. 2, program P is represented by the weighted control flow graph G = (V, E, W), where V = {B_1, B_2, ..., B_n} is the set of basic blocks of the program, E = {(B_i, B_j): B_j is control dependent on B_i}, and W = {w_i: w_i is the execution time of basic block B_i}. As shown in Fig. 3, G is converted into the directed acyclic graph DAG(G) = (V', E', W'), where V' = V is the set of basic blocks of P, E' = E - {(B_i, B_j): (B_i, B_j) is a back edge} is the set of edges in DAG(G), and W' = {w'_i: w'_i is the weight of node B_i}, with w'_i = w_i * N(B_i), where w_i is the execution time of B_i and N(B_i) is the highest execution frequency of B_i.
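The conversion from G to DAG(G) described above can be sketched as follows. The patent does not say how back edges are identified, so this sketch assumes the common depth-first definition (an edge whose target is still on the DFS stack). The function name `build_dag` and all block names, execution times, and frequencies are hypothetical.

```python
def build_dag(entry, succs, exec_time, max_freq):
    """Drop back edges found by DFS and scale each block's weight by its frequency.

    Returns (dag, weights, back_edges) with weights[b] = w_b * N(b).
    Uses recursion, so very deep CFGs would need an iterative rewrite.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {b: WHITE for b in succs}
    dag = {b: [] for b in succs}
    back_edges = []

    def visit(block):
        colour[block] = GREY
        for nxt in succs[block]:
            if colour[nxt] == GREY:       # target is an ancestor on the DFS stack
                back_edges.append((block, nxt))
                continue
            dag[block].append(nxt)        # tree, forward, and cross edges are kept
            if colour[nxt] == WHITE:
                visit(nxt)
        colour[block] = BLACK

    visit(entry)
    weights = {b: exec_time[b] * max_freq[b] for b in succs}
    return dag, weights, back_edges


# Hypothetical CFG with a loop B2 -> B3 -> B2 that runs at most 10 times.
succs = {"B1": ["B2"], "B2": ["B3"], "B3": ["B2", "B4"], "B4": []}
exec_time = {"B1": 5, "B2": 8, "B3": 3, "B4": 2}
max_freq = {"B1": 1, "B2": 10, "B3": 10, "B4": 1}
dag, weights, back_edges = build_dag("B1", succs, exec_time, max_freq)
```

Multiplying each execution time by the block's highest execution frequency lets the loop body's cost be accounted for in an acyclic graph, so a single longest-path computation over DAG(G) bounds the program's maximum execution time.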
In the above steps, in the first stage, the unified algorithm combines an incremental register allocation method with a priority-based instruction scheduling method. The instruction scheduling method schedules all basic blocks in reverse postorder and, within each basic block, schedules each instruction according to instruction priority. The priority of each instruction takes into account inter-instruction latencies and processor resource constraints. During scheduling, instruction priorities are updated dynamically to reduce register pressure. The schedulable instruction with the highest priority is assigned to a functional unit on a cluster, and the incremental register allocation method is called to assign physical registers to the instruction's virtual registers. Assigning instructions to clusters must take into account the start time of the instruction and the register pressure of each cluster.
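A heavily simplified sketch of this kind of priority-based scheduling for one basic block follows. It is not the patent's unified algorithm: the priority is a static latency-weighted height (the dynamic priority updates are omitted), register allocation is reduced to counting live values per cluster, and spilling is not modelled. The names `Instr` and `schedule_block`, the `copy_latency` penalty for inter-cluster operands, and the example instructions are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int
    deps: tuple = ()  # names of instructions whose results this one reads


def schedule_block(instrs, n_clusters=2, units_per_cluster=1, copy_latency=1):
    """Greedy list scheduling; returns {name: (cycle, cluster)}."""
    by_name = {i.name: i for i in instrs}
    users = {i.name: [u.name for u in instrs if i.name in u.deps] for i in instrs}

    height = {}  # static priority: latency-weighted distance to the end of the block
    def h(name):
        if name not in height:
            height[name] = by_name[name].latency + max((h(u) for u in users[name]), default=0)
        return height[name]

    placed = {}                                # name -> (cycle, cluster)
    busy = {}                                  # (cycle, cluster) -> issue slots in use
    live = [set() for _ in range(n_clusters)]  # values currently held on each cluster

    def earliest(instr, cluster):
        """First cycle with all operands available and a free issue slot."""
        cycle = 0
        for dep in instr.deps:
            dep_cycle, dep_cluster = placed[dep]
            avail = dep_cycle + by_name[dep].latency
            if dep_cluster != cluster:
                avail += copy_latency          # inter-cluster transfer penalty
            cycle = max(cycle, avail)
        while busy.get((cycle, cluster), 0) >= units_per_cluster:
            cycle += 1
        return cycle

    while len(placed) < len(instrs):
        ready = [i for i in instrs
                 if i.name not in placed and all(d in placed for d in i.deps)]
        instr = max(ready, key=lambda i: h(i.name))
        # Prefer the earliest start time; break ties by lower register pressure.
        cycle, _, cluster = min((earliest(instr, c), len(live[c]), c)
                                for c in range(n_clusters))
        placed[instr.name] = (cycle, cluster)
        busy[(cycle, cluster)] = busy.get((cycle, cluster), 0) + 1
        if users[instr.name]:
            live[cluster].add(instr.name)
        for dep in instr.deps:                 # release operands with no remaining users
            if all(u in placed for u in users[dep]):
                live[placed[dep][1]].discard(dep)
    return placed


block = [Instr("a", 2), Instr("b", 2), Instr("c", 1, ("a", "b")), Instr("d", 1, ("c",))]
placed = schedule_block(block)
```

In this example the two independent loads `a` and `b` are spread across the two clusters, while `c` and `d` stay on the cluster that avoids an extra copy. The cluster choice uses the two criteria named in the text: start time first, register pressure as the tie-breaker.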
In the above steps, the goal of the second stage is to minimize register spilling on the longest path through instruction rescheduling and register reallocation. Each time the basic block B_i with the highest execution frequency on the longest path of DAG(G) is selected, the following steps are carried out for each live range R_j that has been spilled to memory, in order to reduce spilling:
I. Find the live range R_k that has the least impact on the longest path and satisfies the condition that R_k is longer than R_j, and assign the register of R_k to R_j.
II. Add spill code for R_k and reschedule all affected instructions.
III. Recompute the execution time of each basic block affected by R_k.
In step I above, the live range R_k satisfying the condition is found in this embodiment by introducing a new graph DAG(G, k). DAG(G, k) is a subgraph of DAG(G) in which the length of any path is no greater than k + l_min, where l_min is the length of the shortest path of DAG(G). The value of k is set to (l_max - l_min)/2, where l_max is the length of the longest path of DAG(G). After DAG(G, (l_max - l_min)/2) is constructed, the priority rank of each live range R_s is computed as rank(R_s) = n1(R_s)/n(R_s), where n1(R_s) is the total number of references to R_s in all basic blocks of DAG(G, (l_max - l_min)/2) and n(R_s) is the total number of uses of R_s in all basic blocks. The R_k selected by the present invention is the live range with the highest priority on the longest path.
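The rank computation can be sketched as follows. The text does not spell out how DAG(G, k) is built, so this sketch assumes one construction that satisfies the stated property: keep exactly those blocks whose longest source-to-sink path is no longer than k + l_min. The function names, the `refs` table (live range -> per-block reference counts), and all numbers are hypothetical, and the additional condition that R_k be longer than R_j is left out for brevity.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def through_lengths(succs, weight, pick):
    """For each block, the `pick` (max or min) weight of a source-to-sink path through it."""
    preds = {b: [] for b in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    order = list(TopologicalSorter(preds).static_order())
    down, up = {}, {}
    for b in order:             # best path from a source that ends at b
        down[b] = weight[b] + (pick(down[p] for p in preds[b]) if preds[b] else 0)
    for b in reversed(order):   # best path from b to a sink
        up[b] = weight[b] + (pick(up[s] for s in succs[b]) if succs[b] else 0)
    return {b: down[b] + up[b] - weight[b] for b in succs}


def rank_live_ranges(succs, weight, refs):
    """rank(R_s) = n1(R_s) / n(R_s), with n1 counted only inside DAG(G, k)."""
    longest = through_lengths(succs, weight, max)
    shortest = through_lengths(succs, weight, min)
    l_max, l_min = max(longest.values()), min(shortest.values())
    k = (l_max - l_min) / 2
    short_blocks = {b for b in succs if longest[b] <= k + l_min}  # assumed DAG(G, k)
    ranks = {}
    for live_range, per_block in refs.items():
        n = sum(per_block.values())
        n1 = sum(count for b, count in per_block.items() if b in short_blocks)
        ranks[live_range] = n1 / n if n else 0.0
    return ranks, short_blocks


# Hypothetical DAG(G): paths B1-B2-B4 (length 40) and B1-B3-B4 (length 20).
succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
weight = {"B1": 4, "B2": 30, "B3": 10, "B4": 6}
refs = {"R1": {"B2": 3, "B3": 1}, "R2": {"B3": 2, "B4": 2}, "R3": {"B2": 5}}
ranks, short_blocks = rank_live_ranges(succs, weight, refs)

# Choose, among live ranges referenced on the longest path, the one with the highest rank.
longest_path_blocks = {"B1", "B2", "B4"}
on_path = [r for r, per_block in refs.items()
           if any(b in longest_path_blocks for b in per_block)]
victim = max(on_path, key=ranks.get)
```

Here `R2` is chosen because half of its references fall in the short-path block `B3`, whereas `R3` is referenced only on the longest path and so has rank 0.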
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions that fall under the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as within the scope of protection of the present invention.

Claims (6)

1. An instruction scheduling and register allocation method optimized for clustered VLIW processors, characterized in that it comprises two stages: in the first stage, a unified algorithm performs a first pass of instruction scheduling and register allocation on all basic blocks; in the second stage, instructions are rescheduled and registers reallocated for the basic blocks in which register spilling occurs, according to the length of the longest path each basic block lies on and the highest execution frequency.
2. The instruction scheduling and register allocation method optimized for clustered VLIW processors according to claim 1, characterized in that the steps of the first stage are:
(1) Construct the weighted control flow graph G of program P. Program P is represented by the weighted control flow graph G = (V, E, W), where V = {B_1, B_2, ..., B_n} is the set of basic blocks of the program, E = {(B_i, B_j): B_j is control dependent on B_i}, and W = {w_i: w_i is the execution time of basic block B_i};
(2) Use the unified algorithm to perform instruction scheduling and register allocation on each basic block B_i, visiting the blocks in reverse postorder.
3. The instruction scheduling and register allocation method optimized for clustered VLIW processors according to claim 2, characterized in that, in step (2), the unified algorithm combines an incremental register allocation method with a priority-based instruction scheduling method; the instruction scheduling method schedules all basic blocks in reverse postorder and, within each basic block, schedules each instruction according to instruction priority; the priority of each instruction takes into account inter-instruction latencies and processor resource constraints, and instruction priorities are updated dynamically during scheduling to reduce register pressure.
4. The instruction scheduling and register allocation method optimized for clustered VLIW processors according to claim 2 or 3, characterized in that the steps of the second stage are:
(1) Update the weight w_i of each basic block B_i in the weighted control flow graph G;
(2) Construct the directed acyclic graph DAG(G). The weighted control flow graph G is converted into the directed acyclic graph DAG(G) = (V', E', W'), where V' = V is the set of basic blocks of P, E' = E - {(B_i, B_j): (B_i, B_j) is a back edge} is the set of edges in DAG(G), and W' = {w'_i: w'_i is the weight of node B_i}, with w'_i = w_i * N(B_i), where w_i is the execution time of B_i and N(B_i) is the highest execution frequency of B_i;
(3) Repeat the following steps until the longest path of DAG(G) can no longer be shortened:
(3a) Compute the longest path of DAG(G);
(3b) Among the basic blocks on the longest path that have register spills, find the basic block B_k with the highest execution frequency;
(3c) Reschedule the instructions of B_k and reallocate its registers;
(3d) Update the weight of node B_k in DAG(G).
5. The instruction scheduling and register allocation method optimized for clustered VLIW processors according to claim 4, characterized in that, in the second stage, each time the basic block B_i with the highest execution frequency on the longest path of DAG(G) is selected, the following steps are carried out for each live range R_j that has been spilled to memory, in order to reduce spilling:
I. Find the live range R_k that has the least impact on the longest path and satisfies the condition that R_k is longer than R_j, and assign the register of R_k to R_j;
II. Add spill code for R_k and reschedule all affected instructions;
III. Recompute the execution time of each basic block affected by R_k.
6. The instruction scheduling and register allocation method optimized for clustered VLIW processors according to claim 5, characterized in that, in step I, the live range R_k satisfying the condition is found by introducing a new graph DAG(G, k); the graph DAG(G, k) is a subgraph of DAG(G) in which the length of any path is no greater than k + l_min, where l_min is the length of the shortest path of DAG(G); the value of k is set to (l_max - l_min)/2, where l_max is the length of the longest path of DAG(G); after DAG(G, (l_max - l_min)/2) is constructed, the priority rank of each live range R_s is computed as rank(R_s) = n1(R_s)/n(R_s), where n1(R_s) is the total number of references to R_s in all basic blocks of DAG(G, (l_max - l_min)/2) and n(R_s) is the total number of uses of R_s in all basic blocks; the selected R_k is the live range with the highest priority on the longest path.
CN201410799189.5A 2014-12-19 2014-12-19 Instruction scheduling and register allocation method optimized for clustered VLIW processors Active CN104484160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410799189.5A CN104484160B (en) 2014-12-19 2014-12-19 Instruction scheduling and register allocation method optimized for clustered VLIW processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410799189.5A CN104484160B (en) 2014-12-19 2014-12-19 Instruction scheduling and register allocation method optimized for clustered VLIW processors

Publications (2)

Publication Number Publication Date
CN104484160A (en) 2015-04-01
CN104484160B (en) 2017-12-26

Family

ID=52758704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410799189.5A Active CN104484160B (en) 2014-12-19 2014-12-19 Instruction scheduling and register allocation method optimized for clustered VLIW processors

Country Status (1)

Country Link
CN (1) CN104484160B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN112445481A (en) * 2019-08-27 2021-03-05 无锡江南计算技术研究所 Low-power-consumption register allocation compiling optimization method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101553780A (en) * 2006-12-11 2009-10-07 Nxp股份有限公司 Virtual functional units for VLIW processors
US20120159110A1 (en) * 2010-12-21 2012-06-21 National Tsing Hua University Method for allocating registers for a processor based on cycle information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
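杨旭等 (Yang Xu et al.): "分簇VLIW结构下利用数据依赖图优化调度的研究" [Research on optimized scheduling using data dependence graphs on clustered VLIW architectures], 《计算机学报》 (Chinese Journal of Computers) *
胡定磊等 (Hu Dinglei et al.): "基于超块的统一分簇与模调度" [Superblock-based unified cluster assignment and modulo scheduling], 《计算机研究与发展》 (Journal of Computer Research and Development) *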

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN105843660B (en) * 2016-03-21 2019-04-02 同济大学 A kind of code optimization dispatching method of compiler
CN112445481A (en) * 2019-08-27 2021-03-05 无锡江南计算技术研究所 Low-power-consumption register allocation compiling optimization method
CN112445481B (en) * 2019-08-27 2022-07-12 无锡江南计算技术研究所 Low-power-consumption register allocation compiling optimization method

Also Published As

Publication number Publication date
CN104484160B (en) 2017-12-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant