CN106445678A - Parallelism degree adjustment algorithm for reducing power consumption of instruction-level parallel processor - Google Patents
- Publication number
- CN106445678A (application CN201610594829.8A)
- Authority
- CN
- China
- Prior art keywords
- parallelism
- degree
- energy efficiency
- power consumption
- regions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention relates to a parallelism degree adjustment algorithm for reducing the power consumption of an instruction-level parallelism (ILP) processor. The algorithm comprises the following steps: using a compiler to analyze the hardware resource and parallelism demand of each part of an input application program, and obtaining the control flow graph (CFG) and loop hierarchy trees (LHTs) of the program; dividing the program into different regions; setting the execution parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be adjusted according to the execution-time requirement throughout program execution; and having the compiler reschedule the program, scheduling each region at its set parallelism and inserting a power gating instruction before each region executes to turn off idle hardware resources and reduce leakage power. The algorithm reduces the impact of leakage power on processor performance and increases the utilization of hardware resources in the ILP processor.
Description
Technical field
The invention belongs to the field of computer architecture design and relates to an optimization algorithm for low-power design.
Background
In recent years, electronic products such as smartphones and wearable smart devices have developed rapidly, and almost everyone owns an electronic device. These devices make daily life more convenient, and their execution speed meets the needs of most users, but power consumption has become a prominent problem that often affects device performance, reliability and operating time. In the "Internet+" era in particular, performance requirements for computers and smartphones keep rising, so improving the energy efficiency of devices is of increasing practical significance.
The main reason these devices execute quickly is the use of instruction-level parallelism (ILP) processor architectures, which allow multiple instructions to execute simultaneously. A traditional ILP processor always executes every application at the maximum degree of parallelism, that is, with all hardware resources switched on. However, different applications running on an ILP processor differ greatly in their demand for hardware resources (such as functional units and registers), and even within a single application, different parts differ markedly in their resource demand during execution. If a traditional ILP processor runs at maximum parallelism from start to finish, programs with small resource demands waste resources, and the idle hardware resources incur additional leakage power. In particular, once transistor feature sizes enter the deep-submicron range, leakage power rises sharply and can even exceed dynamic power. The large leakage power produced by idle hardware resources in an ILP processor degrades processor performance and in turn affects device reliability.
To reduce leakage power, power gating is widely used as a current switch: idle units are turned off to cut their leakage. Existing work, however, is confined to turning off idle units under the premise that the whole program runs at its fastest execution speed. Users do not always expect such high speed, and power consumption may matter more to them. It is therefore of practical value to design an algorithm that "automatically" adjusts a program's execution parallelism according to the user's execution-time requirement and thereby reduces power consumption.
The AFReP algorithm proposed by Tabkhi in 2014 simply inserts a power gating instruction before each function of the application. As a result, the hardware configuration changes every time a function is entered and hardware units are switched on frequently, which produces a large additional power overhead. Assessed overall, AFReP reduces leakage energy but incurs substantial extra power consumption, so its net energy saving is poor. Moreover, AFReP can reduce power only at the fastest execution speed; it cannot adjust the degree of parallelism "automatically" according to the user's needs, which is a shortcoming of the algorithm.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing an optimization algorithm that "automatically" adjusts a program's execution parallelism according to the user's execution-time requirement and thereby reduces the leakage power produced by idle hardware resources in an ILP processor, lessening the impact of leakage power on processor performance. The invention can raise the utilization of hardware resources in an ILP processor and improve the reliability of devices built around an ILP processor. The technical solution of the invention is as follows:
A parallelism degree adjustment algorithm for reducing the power consumption of an instruction-level parallel processor, comprising the following steps:
(1) Use a compiler to analyze the hardware resource and parallelism demand of each part of the input application, and obtain the control flow graph (CFG) and loop hierarchy trees (LHTs) of the application;
(2) Divide the application, cutting it into different regions;
(3) Set the execution parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be adjusted and changed according to the execution-time requirement throughout program execution, as follows:
First step: set the initial parallelism of each region. Based on the execution-time limit T_dl, find an execution parallelism p_i such that the execution time is less than T_dl, and set the initial parallelism of every region to p_i;
Second step: find the region with the lowest energy efficiency and decrease its parallelism by 1. Energy efficiency is defined as follows: when the parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added;
Third step: among all regions, find the region with the highest energy efficiency and increase its parallelism by 1 to compensate for the lost execution time.
(4) The compiler reschedules the program, scheduling each region at its set parallelism, and inserts a power gating instruction before each region executes to turn off idle hardware resources and reduce leakage power.
While meeting the expected execution time, the method "automatically" adjusts the degree of parallelism according to the execution-time requirement and thereby reduces leakage power: different segments of a program execute with different degrees of parallelism and hardware configurations, which ultimately lowers the leakage power of the ILP processor and improves energy efficiency.
Description of the drawings
Fig. 1: Compiler workflow (technical solution)
Fig. 2: Relation between power consumption and degree of parallelism
Fig. 3: Parallelism assignment model
Fig. 4: Execution parallelism pattern of the program after optimization by the algorithm
Detailed description
The invention uses compiler optimization algorithms to analyze an application's execution time, hardware demand and parallelism demand, and adjusts the parallelism of each region so that, when executing different applications, an ILP processor can "automatically" adjust its degree of parallelism within the execution-time limit, ultimately achieving the highest energy efficiency and the lowest leakage power.
Fig. 1 illustrates the compiler workflow of the overall scheme. The main technical solution is as follows:
(1) Use a compiler to analyze information such as the hardware resource and parallelism demand of each part of the input application, and obtain the application's control flow graph (CFG) and loop hierarchy trees (LHTs);
(2) Divide the application, cutting it into different regions;
(3) Set the execution parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be "automatically" adjusted and changed according to the execution-time requirement throughout program execution. Compared with using one uniform degree of parallelism from start to finish, this improves hardware resource utilization and energy efficiency while meeting the execution-time requirement. It consists of the following three steps:
First step: set the initial parallelism of each region. Based on the execution-time limit T_dl, find an execution parallelism p_i such that the execution time is less than T_dl, and set the initial parallelism of every region to p_i;
Second step: find the region with the lowest energy efficiency and decrease its parallelism by 1. Energy efficiency is defined as follows: when the parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added:
EE(R_i, p, q) = (T(R_i, p) - T(R_i, q)) / (E(R_i, q) - E(R_i, p))
where T(R_i, p) and T(R_i, q) are the execution times of region R_i at parallelism p and q, and E(R_i, p) and E(R_i, q) are the energies consumed by region R_i at parallelism p and q.
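The definition above can be illustrated with a short sketch. The function name, the handling of the zero-or-negative extra-energy case, and the profile numbers are assumptions made for illustration; they do not come from the patent.

```python
def energy_efficiency(t_p, t_q, e_p, e_q):
    """EE(R_i, p, q): execution time saved per unit of extra energy
    when the parallelism of a region is raised from p to q."""
    extra_energy = e_q - e_p
    if extra_energy <= 0:
        # Assumption: a step that costs no extra energy is treated as
        # unboundedly efficient. The source does not cover this case.
        return float("inf")
    return (t_p - t_q) / extra_energy


# Hypothetical profile of one region: parallelism -> (time, energy)
profile = {1: (100.0, 50.0), 2: (60.0, 58.0), 3: (50.0, 70.0)}

ee_1_2 = energy_efficiency(profile[1][0], profile[2][0],
                           profile[1][1], profile[2][1])
ee_2_3 = energy_efficiency(profile[2][0], profile[3][0],
                           profile[2][1], profile[3][1])
print(ee_1_2, ee_2_3)
```

In this made-up profile the step from 1 to 2 saves far more time per unit of energy than the step from 2 to 3, which is the kind of difference the second and third steps exploit.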
Third step: compensate for the execution time lost in the previous step. Among all regions, find the region with the highest energy efficiency and increase its parallelism by 1.
(4) The compiler reschedules the program, scheduling each region at its set parallelism, and inserts a power gating instruction before each region executes to turn off idle hardware resources and reduce leakage power.
The invention is described below with reference to an embodiment.
(1) The application is input. The compiler first converts it into intermediate-language bytecode, then uses its optimization tools to extract the control flow graph (CFG) of the program and to analyze the program's execution time and hardware resource demand at each degree of parallelism.
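The text does not say how the compiler estimates execution time at each degree of parallelism. One plausible way, shown here purely as an assumption, is a greedy list scheduler over a dependency graph, with every instruction taking one cycle and the issue width standing in for the degree of parallelism. The instruction names and the graph are hypothetical.

```python
def schedule_length(deps, width):
    """Cycles used by a greedy list scheduler that issues at most `width`
    unit-latency instructions per cycle.
    `deps` maps each instruction to the set of instructions it depends on."""
    done = set()
    remaining = set(deps)
    cycles = 0
    while remaining:
        # An instruction is ready once all its predecessors finished
        # in an earlier cycle.
        ready = sorted(i for i in remaining if deps[i] <= done)
        issued = ready[:width]
        if not issued:
            raise ValueError("dependency cycle in instruction graph")
        done.update(issued)
        remaining.difference_update(issued)
        cycles += 1
    return cycles


# Hypothetical region: four independent operations feeding a reduction.
deps = {
    "a": set(), "b": set(), "c": set(), "d": set(),
    "e": {"a", "b"}, "f": {"c", "d"}, "g": {"e", "f"},
}

# Execution time (in cycles) at each candidate degree of parallelism.
time_profile = {w: schedule_length(deps, w) for w in range(1, 5)}
print(time_profile)
```

For this graph the cycle count stops improving between widths 2 and 3, so the extra unit at width 3 would sit idle without making the region faster.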
(2) Divide the application. The program can be divided by its loops, or divided into several blocks by its functions.
(3) Set the parallelism of each region. As the degree of parallelism increases, the power consumption of each region follows the trend shown in Fig. 2: power keeps rising with parallelism. The degree of parallelism can therefore be lowered appropriately so that more hardware units become idle and can be turned off with power gating, reducing leakage power.
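The trade-off described here can be made concrete with a toy energy model. This model is not taken from the patent: it assumes that leakage is proportional to the number of powered units multiplied by the execution time, that dynamic energy does not depend on the degree of parallelism, and that power gating itself has no overhead. All numbers are hypothetical.

```python
def region_energy(time, active_units, dynamic_energy, leak_per_unit):
    """Toy model (assumption): energy = dynamic energy
    + leakage of the units left powered on for the whole region."""
    return dynamic_energy + leak_per_unit * active_units * time


# Hypothetical region on a 4-unit machine.
# Full parallelism: 3 cycles with all 4 units powered.
e_full = region_energy(time=3, active_units=4,
                       dynamic_energy=7.0, leak_per_unit=1.0)
# Reduced parallelism: 4 cycles with 2 units powered and 2 gated off.
e_reduced = region_energy(time=4, active_units=2,
                          dynamic_energy=7.0, leak_per_unit=1.0)
print(e_full, e_reduced)
```

Under these assumptions the slower configuration uses less energy because fewer units leak, even though they leak for longer. A real model would also need to charge for switching units on and off.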
Table 1 lists the required parameters. The main purpose of this stage is to find a solution S that assigns a parallelism p_i to each region R_i (where R_i is a divided region, p_i is the parallelism set for it, and m is the number of regions) such that the final execution time T_EX(S) satisfies T_dl and the energy consumed E_EX(S) is minimal. Fig. 3 shows the resulting parallelism assignment model for the whole program. Formula (1) and formula (2) give the methods for computing T_EX(S) and E_EX(S), respectively.
Table 1: Parameter description
The pseudocode of the parallelism-setting algorithm is given as algorithm1 and algorithm2. The algorithm is a greedy heuristic and consists of three main steps:
First step: to meet the execution-time limit T_dl, initialize S by finding a uniform parallelism p_i and setting every region R_i to this uniform parallelism;
Second step: to obtain the highest energy efficiency, select the region R_a whose energy efficiency is currently lowest and decrease its parallelism p_a by 1;
Third step: the reduction in the second step inevitably increases the execution time. To compensate for this loss, select the region R_k with the highest energy efficiency and increase its parallelism p_k by 1;
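The three steps above can be sketched as follows. This is one interpretation, not the patent's algorithm1 or algorithm2, which are not reproduced in the text. It assumes that region times and energies simply add up, that the first step picks the smallest uniform parallelism meeting the limit, and that the loop stops as soon as a move no longer saves energy. The function names and the profiles are hypothetical.

```python
def ee(profile, p, q):
    """EE(R, p, q): time saved per extra unit of energy from p to q."""
    (t_p, e_p), (t_q, e_q) = profile[p], profile[q]
    extra = e_q - e_p
    return float("inf") if extra <= 0 else (t_p - t_q) / extra


def total(profiles, s):
    """Total (time, energy) of assignment s, assuming costs are additive."""
    return (sum(pr[p][0] for pr, p in zip(profiles, s)),
            sum(pr[p][1] for pr, p in zip(profiles, s)))


def assign_parallelism(profiles, t_dl, p_max):
    # Step 1: smallest uniform parallelism that meets the time limit.
    for p in range(1, p_max + 1):
        s = [p] * len(profiles)
        if total(profiles, s)[0] <= t_dl:
            break
    else:
        raise ValueError("time limit cannot be met at maximum parallelism")
    while True:
        trial = list(s)
        # Step 2: lower the region whose last step up was least efficient.
        lowerable = [i for i, p in enumerate(trial) if p > 1]
        if not lowerable:
            return s
        a = min(lowerable, key=lambda i: ee(profiles[i], trial[i] - 1, trial[i]))
        trial[a] -= 1
        # Step 3: win the time back by raising the most efficient other region.
        while total(profiles, trial)[0] > t_dl:
            raisable = [i for i, p in enumerate(trial) if p < p_max and i != a]
            if not raisable:
                return s
            k = max(raisable, key=lambda i: ee(profiles[i], trial[i], trial[i] + 1))
            trial[k] += 1
        # Assumption: keep the move only if it lowers total energy.
        if total(profiles, trial)[1] >= total(profiles, s)[1]:
            return s
        s = trial


# Hypothetical profiles: parallelism -> (time, energy)
region_x = {1: (40, 20), 2: (36, 30), 3: (35, 42)}    # mostly serial
region_y = {1: (100, 30), 2: (60, 33), 3: (40, 36)}   # highly parallel
result = assign_parallelism([region_x, region_y], t_dl=96, p_max=3)
print(result, total([region_x, region_y], result))
```

With these numbers the uniform start is parallelism 2 for both regions (time 96, energy 63). Lowering the mostly serial region to 1 and raising the parallel region to 3 keeps the program within the limit and reduces the total energy to 56.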
(4) After the parallelism has been set, the compiler reschedules the program and inserts a power gating instruction before each region of the divided application executes. These instructions switch hardware units on and off, reducing leakage power and improving resource utilization. Fig. 4 shows the execution parallelism after optimization: the degree of parallelism changes continually as the program executes.
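As a rough illustration of this last step, the sketch below prefixes each region with a power gating pseudo-instruction. The `PG_SET` mnemonic, the bit-mask encoding, and the rule that a parallelism of n keeps the first n functional units powered are all hypothetical; the patent does not define an instruction format.

```python
def insert_power_gating(regions, total_units):
    """Prefix each region with a hypothetical PG_SET pseudo-instruction.
    The mask has one bit per functional unit: 1 = powered, 0 = gated off.
    Assumption: parallelism n keeps the first n units powered."""
    stream = []
    for name, parallelism, body in regions:
        if not 1 <= parallelism <= total_units:
            raise ValueError(f"invalid parallelism for region {name}")
        mask = "".join("1" if u < parallelism else "0"
                       for u in range(total_units))
        stream.append(f"PG_SET {mask}  ; region {name}")
        stream.extend(body)
    return stream


# Hypothetical program: (region name, assigned parallelism, instructions)
program = [
    ("R1", 1, ["add", "mul"]),
    ("R2", 3, ["ld", "ld", "fadd"]),
]
stream = insert_power_gating(program, total_units=4)
print("\n".join(stream))
```

Because the gating instruction is emitted once per region and not once per function, the hardware configuration changes only at region boundaries, which is the behaviour the text contrasts with AFReP.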
Claims (1)
1. A parallelism degree adjustment algorithm for reducing the power consumption of an instruction-level parallel processor, comprising the following steps:
(1) Use a compiler to analyze the hardware resource and parallelism demand of each part of the input application, and obtain the control flow graph (CFG) and loop hierarchy trees (LHTs) of the application;
(2) Divide the application, cutting it into different regions;
(3) Set the execution parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be adjusted and changed according to the execution-time requirement throughout program execution, as follows:
First step: set the initial parallelism of each region. Based on the execution-time limit T_dl, find an execution parallelism p_i such that the execution time is less than T_dl, and set the initial parallelism of every region to p_i;
Second step: find the region with the lowest energy efficiency and decrease its parallelism by 1. Energy efficiency is defined as follows: when the parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added;
Third step: among all regions, find the region with the highest energy efficiency and increase its parallelism by 1 to compensate for the lost execution time.
(4) The compiler reschedules the program, scheduling each region at its set parallelism, and inserts a power gating instruction before each region executes to turn off idle hardware resources and reduce leakage power.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610594829.8A CN106445678B (en) | 2016-07-21 | 2016-07-21 | Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610594829.8A CN106445678B (en) | 2016-07-21 | 2016-07-21 | Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106445678A (en) | 2017-02-22 |
CN106445678B (en) | 2020-02-07 |
Family
ID=58185121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610594829.8A Expired - Fee Related CN106445678B (en) | 2016-07-21 | 2016-07-21 | Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106445678B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101441564A (en) * | 2008-12-04 | 2009-05-27 | 浙江大学 | Method for implementing reconfigurable accelerator custom-built for program |
CN101894383A (en) * | 2010-06-11 | 2010-11-24 | 四川大学 | Method for accelerating ray-traced digital image rebuilding technology |
CN102004719A (en) * | 2010-11-16 | 2011-04-06 | 清华大学 | Very long instruction word processor structure supporting simultaneous multithreading |
CN102830954A (en) * | 2012-08-24 | 2012-12-19 | 北京中科信芯科技有限责任公司 | Method and device for instruction scheduling |
US20140337853A1 (en) * | 2013-05-08 | 2014-11-13 | Wisconsin Alumni Research Foundation | Resource And Core Scaling For Improving Performance Of Power-Constrained Multi-Core Processors |
Also Published As
Publication number | Publication date |
---|---|
CN106445678B (en) | 2020-02-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-02-07; termination date: 2020-07-21 |