CN106445678A - Parallelism degree adjustment algorithm for reducing power consumption of instruction-level parallel processor - Google Patents


Info

Publication number
CN106445678A
Authority
CN
China
Prior art keywords
parallelism
degree
energy efficiency
power consumption
regions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610594829.8A
Other languages
Chinese (zh)
Other versions
CN106445678B (en)
Inventor
梁煜
佟玉凤
张为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201610594829.8A
Publication of CN106445678A
Application granted
Publication of CN106445678B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094: Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to a parallelism degree adjustment algorithm for reducing power consumption of an instruction-level parallelism (ILP) processor. The algorithm comprises the steps of analyzing hardware resources and parallelism degree demand quantity information of parts of an input application program by utilizing a compiler, and obtaining a control flow graph (CFG) and loop hierarchy trees (LHTs) of the application program; dividing the application program: cutting the application program into different regions; setting executive parallelism degrees of the regions according to the hardware resources, the parallelism degree demand quantities and energy efficiency of the regions, so that the parallelism degree can be adjusted and changed according to the demands of executive time in the whole program execution process; and re-scheduling the program by the compiler, scheduling the regions by using the set parallelism degrees, inserting a power gating instruction before the regions are executed, and turning off idle hardware resources to reduce electric leakage power consumption. According to the algorithm, the influence of electric leakage power consumption on working performance of the processor can be reduced and the utilization rate of the hardware resources in the ILP processor can be increased.

Description

A parallelism adjustment algorithm for reducing the power consumption of an instruction-level parallel processor
Technical field
The invention belongs to the field of computer architecture design and relates to a low-power design optimization algorithm.
Background
In recent years, electronic products such as smartphones and wearable smart devices have developed rapidly, and almost everyone owns at least one electronic device. These devices make daily life more convenient, and their execution speed meets the needs of most users, but power consumption has become a prominent problem that often affects the performance, reliability and operating time of the device. Particularly in the "Internet Plus" era, as the performance demanded of computers and smartphones keeps rising, improving the energy efficiency of devices has growing practical significance.
The main reason these devices execute quickly is the use of instruction-level parallelism (ILP) processor architectures, which allow multiple instructions to execute simultaneously. A conventional ILP processor always runs every application at the maximum degree of parallelism, that is, with all hardware resources switched on. However, different applications running on an ILP processor differ greatly in their demand for hardware resources (such as functional units and registers), and even within a single application, different parts differ markedly in their demand during execution. If a conventional ILP processor runs at the maximum degree of parallelism from start to finish, a program with a small hardware demand leaves resources idle, and these idle resources dissipate additional leakage power. Once transistor feature sizes enter the deep-submicron range, leakage power rises sharply and can even exceed dynamic power. Leakage on this scale from idle hardware resources in an ILP processor degrades the processor's performance and, in turn, the reliability of the device.
To reduce leakage power, power gating is widely used as a current switch: leakage is reduced by switching off idle hardware. Existing research, however, is limited to switching off idle hardware while preserving the fastest execution speed of the whole program. Users do not always expect the highest speed, and are sometimes more concerned about power consumption. It is therefore of practical value to design an algorithm that "automatically" adjusts the degree of parallelism at which a program executes according to the user's execution-time requirement, and thereby reduces power consumption.
The AFReP algorithm proposed by Tabkhi in 2014 simply inserts a power-gating instruction before each function of the application. As a result, the hardware configuration changes every time a function is entered, and the hardware is switched on and off frequently, which produces a large amount of extra power consumption. Assessed as a whole, AFReP reduces leakage energy but incurs a large switching overhead, so its net energy saving is poor. In addition, AFReP can only reduce power consumption at the fastest execution speed; it cannot adjust the degree of parallelism to the user's needs in an "automatic" way, which is a shortcoming of that algorithm.
Summary of the invention
The purpose of the invention is to overcome the above deficiencies of the prior art by providing an optimization algorithm that "automatically" adjusts the degree of parallelism at which a program executes according to the user's execution-time requirement, and thereby reduces the leakage power produced by idle hardware resources in an ILP processor. This reduces the impact of leakage power on processor performance, raises the utilization of hardware resources in the ILP processor, and improves the reliability of devices built around an ILP processor. The technical scheme of the invention is as follows:
A parallelism adjustment algorithm for reducing the power consumption of an instruction-level parallel processor, comprising the following steps:
(1) Use a compiler to analyze the hardware resource and parallelism demand of each part of the input application, and obtain the control flow graph (CFG) and loop hierarchy trees (LHTs) of the application;
(2) Divide the application, cutting it into different regions;
(3) Set the execution degree of parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be adjusted and changed according to the execution-time requirement throughout program execution. The method is as follows:
First step: set the initial degree of parallelism of each region. Based on the required execution time T_dl, find an execution degree of parallelism p_i such that the execution time is less than T_dl, and set the initial degree of parallelism of every region to p_i;
Second step: find the region with the lowest energy efficiency and reduce its degree of parallelism by 1. Energy efficiency is defined as follows: when the degree of parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added;
Third step: among all regions, find the region with the highest energy efficiency and increase its degree of parallelism by 1, to compensate for the loss in execution time.
(4) The compiler reschedules the program, scheduling each region at its assigned degree of parallelism, and inserts a power-gating instruction before each region executes so that idle hardware resources are switched off and leakage power is reduced.
Provided that the expected execution time is met, the method "automatically" adjusts the degree of parallelism according to the execution time and thereby reduces leakage power. Different segments of a program execute with different degrees of parallelism and hardware configurations, which lowers the leakage power of the ILP processor and improves energy efficiency.
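Step (4) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Region` type, the function name `insert_power_gating`, and the textual `pg_set` instruction are all hypothetical, and the sketch assumes that a region's degree of parallelism equals the number of functional units left switched on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    name: str
    parallelism: int         # degree of parallelism assigned to this region
    instructions: List[str]  # instructions already scheduled at that parallelism


def insert_power_gating(regions: List[Region], max_units: int) -> List[str]:
    """Emit one instruction stream with a gating instruction before each region.

    Assumption: parallelism p keeps p functional units on and gates off the
    remaining (max_units - p). `pg_set` is a made-up mnemonic.
    """
    stream: List[str] = []
    for region in regions:
        if not 1 <= region.parallelism <= max_units:
            raise ValueError(f"{region.name}: parallelism out of range")
        idle = max_units - region.parallelism
        stream.append(f"pg_set active={region.parallelism} off={idle}")
        stream.extend(region.instructions)
    return stream


if __name__ == "__main__":
    program = [Region("R1", 2, ["add", "mul"]), Region("R2", 4, ["ld"])]
    for line in insert_power_gating(program, max_units=4):
        print(line)
```

Because the gating instruction is emitted once per region rather than once per function, the hardware configuration changes only at region boundaries, which is the property the text contrasts with AFReP.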
Description of the drawings
Fig. 1: Compiler workflow (technical scheme)
Fig. 2: Relation between power consumption and degree of parallelism
Fig. 3: Degree-of-parallelism assignment model
Fig. 4: Degree-of-parallelism execution pattern of the program after algorithm optimization
Detailed description
The invention uses compiler optimization algorithms to analyze an application's execution time, hardware demand and parallelism demand, and adjusts the degree of parallelism of each region, so that an ILP processor running different applications can "automatically" adjust the degree of parallelism under the execution-time constraint, ultimately achieving the highest energy efficiency and the lowest leakage power.
Fig. 1 shows the compiler workflow in the overall scheme. The main technical scheme is as follows:
(1) Use a compiler to analyze information such as the hardware resource and parallelism demand of each part of the input application, and obtain the control flow graph (CFG) and loop hierarchy trees (LHTs) of the application;
(2) Divide the application, cutting it into different regions;
(3) Set the execution degree of parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism is "automatically" adjusted and changed according to the execution-time requirement throughout program execution. Compared with using one uniform degree of parallelism from start to finish, hardware resource utilization and energy efficiency improve while the execution-time requirement is still met. This stage consists of three main steps:
First step: set the initial degree of parallelism of each region. Based on the required execution time T_dl, find an execution degree of parallelism p_i such that the execution time is less than T_dl, and set the initial degree of parallelism of every region to p_i.
Second step: find the region with the lowest energy efficiency and reduce its degree of parallelism by 1. Energy efficiency is defined as follows: when the degree of parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added:
EE(R_i, p, q) = (T(R_i, p) - T(R_i, q)) / (E(R_i, q) - E(R_i, p))
(where T(R_i, p) and T(R_i, q) are the execution times of region R_i at degrees of parallelism p and q, and E(R_i, p) and E(R_i, q) are the energies consumed by region R_i at degrees of parallelism p and q)
Third step: compensate for the execution time lost in the previous step. Among all regions, find the region with the highest energy efficiency and increase its degree of parallelism by 1.
(4) The compiler reschedules the program, scheduling each region at its assigned degree of parallelism, and inserts a power-gating instruction before each region executes so that idle hardware resources are switched off and leakage power is reduced.
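The energy-efficiency measure used in step (3) can be sketched in a few lines of Python. This follows the verbal definition (execution time saved divided by energy added when parallelism rises from p to q). The function name, the dictionary-based profile tables, and the handling of the case where raising parallelism adds no energy are assumptions made for illustration.

```python
import math
from typing import Dict


def energy_efficiency(time: Dict[int, float], energy: Dict[int, float],
                      p: int, q: int) -> float:
    """EE(R_i, p, q) for one region, with p < q.

    time[d] and energy[d] are the profiled execution time and energy of the
    region at degree of parallelism d (hypothetical profile format).
    """
    if p >= q:
        raise ValueError("expected p < q")
    saved_time = time[p] - time[q]
    added_energy = energy[q] - energy[p]
    # Assumption: if more parallelism costs no extra energy, treat the
    # increase as infinitely efficient instead of dividing by zero.
    if added_energy <= 0:
        return math.inf
    return saved_time / added_energy


if __name__ == "__main__":
    # Made-up profile: going from 1 to 2 saves 40 time units for 20 energy units.
    print(energy_efficiency({1: 100.0, 2: 60.0}, {1: 50.0, 2: 70.0}, 1, 2))
```

A region with a low value gains little speed for the energy it spends, so it is the natural candidate for a lower degree of parallelism; a region with a high value is the cheapest place to buy execution time back.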
The invention is described below with reference to an embodiment.
(1) The application is input. The compiler first converts it into intermediate-language bytecode, then uses compiler optimization tools to extract the control flow graph (CFG) of the application and to analyze the execution time and hardware resource demand of the program at each degree of parallelism.
(2) The application is divided. It can be divided by the loops in the program, or split into blocks by the functions in the program.
(3) The degree of parallelism of each region is set. The power consumption of every region follows the trend shown in Fig. 2: as the degree of parallelism increases, power consumption keeps rising. The degree of parallelism can therefore be lowered where appropriate so that more hardware becomes idle and can be switched off by power gating, reducing leakage power.
Table 1 gives the required parameters. The main purpose of this stage is to find a solution S, an assignment of a degree of parallelism p_i to each region R_i (where R_i is one of the divided regions, p_i is the degree of parallelism set for it, and m is the number of regions), such that the final execution time T_EX(S) satisfies T_dl and the energy consumed E_EX(S) is minimal. Fig. 3 shows the resulting parallelism assignment model for the whole program. Formula (1) and formula (2) give the methods for computing T_EX(S) and E_EX(S), respectively.
Table 1. Parameter description
The pseudocode of the parallelism-setting algorithm is given as algorithm1 and algorithm2. The algorithm is a greedy heuristic and consists of three main steps:
First step: to meet the execution time T_dl, initialize S by finding a uniform degree of parallelism p_i and setting every region R_i to this uniform degree of parallelism;
Second step: to obtain the highest energy efficiency, select the region R_a that currently has the lowest energy efficiency and reduce its degree of parallelism p_a by 1;
Third step: the reduction in the second step inevitably increases the execution time. To compensate for this loss, select the region R_k with the highest energy efficiency and increase its degree of parallelism p_k by 1;
(4) After the degrees of parallelism are set, the compiler reschedules the program. Before each region of the divided application executes, a power-gating instruction is inserted to switch hardware on or off, which reduces leakage power and improves resource utilization. Fig. 4 shows the execution degree of parallelism after algorithm optimization: over the course of program execution, the degree of parallelism changes continually.
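The three-step greedy heuristic of stage (3) can be sketched in Python as follows. The pseudocode (algorithm1, algorithm2) and formulas (1) and (2) are not reproduced in this text, so several points are assumptions rather than the patent's method: total time and energy are taken as plain sums over regions, the initial uniform degree of parallelism is the smallest one that meets the deadline, a move is kept only if it lowers total energy, and the region just lowered is not raised again in the same round. All names are hypothetical.

```python
import math
from typing import Dict, List

Profile = List[Dict[int, float]]  # profile[i][p]: value for region i at parallelism p


def ee(T: Profile, E: Profile, i: int, p: int, q: int) -> float:
    """Time saved per unit of energy added when region i goes from p to q."""
    added = E[i][q] - E[i][p]
    return math.inf if added <= 0 else (T[i][p] - T[i][q]) / added


def total(table: Profile, S: List[int]) -> float:
    # Assumption: whole-program time/energy is the sum over regions.
    return sum(table[i][p] for i, p in enumerate(S))


def assign_parallelism(T: Profile, E: Profile, max_p: int, t_dl: float) -> List[int]:
    m = len(T)
    # Step 1: smallest uniform parallelism that meets the deadline (assumption).
    for p0 in range(1, max_p + 1):
        if total(T, [p0] * m) <= t_dl:
            break
    else:
        raise ValueError("deadline cannot be met at maximum parallelism")
    S = [p0] * m
    while True:
        cand = list(S)
        # Step 2: lower the region whose last unit of parallelism is least efficient.
        lowerable = [i for i in range(m) if cand[i] > 1]
        if not lowerable:
            break
        a = min(lowerable, key=lambda i: ee(T, E, i, cand[i] - 1, cand[i]))
        cand[a] -= 1
        # Step 3: buy time back from the most efficient regions until the deadline holds.
        while total(T, cand) > t_dl:
            raisable = [i for i in range(m) if i != a and cand[i] < max_p]
            if not raisable:
                break
            k = max(raisable, key=lambda i: ee(T, E, i, cand[i], cand[i] + 1))
            cand[k] += 1
        # Keep the move only if it is feasible and strictly saves energy.
        if total(T, cand) > t_dl or total(E, cand) >= total(E, S):
            break
        S = cand
    return S


if __name__ == "__main__":
    T = [{1: 10, 2: 9}, {1: 20, 2: 10}]   # made-up profile data
    E = [{1: 5, 2: 9}, {1: 8, 2: 10}]
    print(assign_parallelism(T, E, max_p=2, t_dl=20))
```

Accepting only moves that strictly lower total energy guarantees termination, since the number of possible assignments is finite. In the made-up profile, the first region gains almost nothing from a second unit of parallelism, so it is lowered, while the second region stays at 2 to hold the deadline.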

Claims (1)

1. A parallelism adjustment algorithm for reducing the power consumption of an instruction-level parallel processor, comprising the following steps:
(1) using a compiler to analyze the hardware resource and parallelism demand of each part of the input application, and obtaining the control flow graph (CFG) and loop hierarchy trees (LHTs) of the application;
(2) dividing the application, cutting it into different regions;
(3) setting the execution degree of parallelism of each region according to the region's hardware resource demand, parallelism demand and energy efficiency, so that the degree of parallelism can be adjusted and changed according to the execution-time requirement throughout program execution, by the following method:
First step: setting the initial degree of parallelism of each region: based on the required execution time T_dl, finding an execution degree of parallelism p_i such that the execution time is less than T_dl, and setting the initial degree of parallelism of every region to p_i;
Second step: finding the region with the lowest energy efficiency and reducing its degree of parallelism by 1, energy efficiency being defined as follows: when the degree of parallelism of region R_i increases from p to q, the energy efficiency EE(R_i, p, q) is the ratio of the execution time saved to the energy added;
Third step: among all regions, finding the region with the highest energy efficiency and increasing its degree of parallelism by 1, to compensate for the loss in execution time;
(4) the compiler rescheduling the program, scheduling each region at its assigned degree of parallelism, and inserting a power-gating instruction before each region executes so that idle hardware resources are switched off and leakage power is reduced.
CN201610594829.8A 2016-07-21 2016-07-21 Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor Expired - Fee Related CN106445678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610594829.8A CN106445678B (en) 2016-07-21 2016-07-21 Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610594829.8A CN106445678B (en) 2016-07-21 2016-07-21 Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor

Publications (2)

Publication Number Publication Date
CN106445678A (en) 2017-02-22
CN106445678B (en) 2020-02-07

Family

ID=58185121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610594829.8A Expired - Fee Related CN106445678B (en) 2016-07-21 2016-07-21 Parallelism adjusting algorithm for reducing power consumption of instruction level parallel processor

Country Status (1)

Country Link
CN (1) CN106445678B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441564A (en) * 2008-12-04 2009-05-27 浙江大学 Method for implementing reconfigurable accelerator custom-built for program
CN101894383A (en) * 2010-06-11 2010-11-24 四川大学 Method for accelerating ray-traced digital image rebuilding technology
CN102004719A (en) * 2010-11-16 2011-04-06 清华大学 Very long instruction word processor structure supporting simultaneous multithreading
CN102830954A (en) * 2012-08-24 2012-12-19 北京中科信芯科技有限责任公司 Method and device for instruction scheduling
US20140337853A1 (en) * 2013-05-08 2014-11-13 Wisconsin Alumni Research Foundation Resource And Core Scaling For Improving Performance Of Power-Constrained Multi-Core Processors

Also Published As

Publication number Publication date
CN106445678B (en) 2020-02-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200207

Termination date: 20200721