CN106293003A - A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query - Google Patents

A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query

Info

Publication number
CN106293003A
Authority
CN
China
Prior art keywords
task
program
power consumption
aov
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201610638736.0A
Other languages
Chinese (zh)
Inventor
王卓薇
程良伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201610638736.0A
Publication of CN106293003A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling

Abstract

The invention discloses a dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query. The execution of a CUDA program on a heterogeneous system is described by an abstract data representation, the AOV network, and the critical path of the program is solved on that network. The method finds the non-critical tasks whose frequency can be lowered by DVFS without affecting the total execution time of the program, and solves for the frequency adjustment of each non-critical task under an energy-optimal objective. The method thereby converts the energy optimization problem of a CUDA program into a mathematical programming problem on the AOV network, and so provides an energy-optimal strategy under a performance constraint.

Description

A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query
Technical field
The present invention relates to the field of low-power heterogeneous systems and, more specifically, to dynamic power consumption optimization for CPU-GPU heterogeneous systems.
Background
Among existing methods for reducing the dynamic power consumption of heterogeneous systems, GPU low-power optimization mostly targets the power consumption of a single GPU task; little work studies and optimizes the power characteristics of an entire application on a CPU-GPU heterogeneous system. An application, however, contains multiple tasks of different types. Moreover, in the CUDA programming environment the host CPU sits idle after calling cudaThreadSynchronize(), which wastes computing resources. Although the GPU has powerful compute capability, processing large datasets is still time-consuming. Part of the computation can be assigned to the idle CPU so that the CPU and GPU work concurrently, which reduces the execution time of the kernel function and hence of the whole program. Therefore, for a given task (loop iterations with no dependences between them), the motivation for the new solution is the question of how to partition the task between CPU and GPU so that the heterogeneous system achieves optimal energy (or performance) while satisfying a performance (or energy) constraint.
Summary of the invention
The present invention proposes a dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query. It finds a set of non-critical tasks in the program (tasks that do not affect the execution time of the whole program) and determines the corresponding CPU or GPU frequency scaling factors, so that when the program runs on a CPU-GPU heterogeneous system its execution time is unchanged and its energy consumption is optimal.
To solve the above problem, the technical scheme of the invention is as follows.
In this method, the runtime characteristics of a CUDA application on a CPU-GPU heterogeneous system are analyzed and its task dependences are identified. The execution of a complete program containing multiple GPU tasks is represented as an AOV network, a graph-based data structure. On this basis the critical path of the program is analyzed, the parts of the program whose energy consumption can be optimized are found, and the corresponding frequency adjustments are solved, so that the overall power consumption of the program is minimized while its performance stays unchanged. The object of optimization is the dynamic power consumption of the whole program containing multiple GPU tasks. The details are as follows:
Step1: First, separate the different types of task in the program (CPU computation tasks, communication tasks, GPU computation tasks), then further partition each single GPU task so that it executes on the CPU and the GPU simultaneously. Describe the execution as dependences between tasks and construct the AOV network of the program's execution.
Step2: Next, analyze the AOV network to determine the critical path (the directed graph made up of critical tasks). The CPU and GPU tasks on non-critical paths (the directed graph made up of non-critical tasks) are the non-critical tasks whose frequency can be adjusted to save power.
Step3: Finally, use the execution time of the critical tasks to determine how far the execution time of each non-critical task can be relaxed, and solve for the processor frequency adjustment of each task that minimizes power consumption.
A typical CUDA program is shown in Fig. 1, where the single GPU task is assumed to be divided into 2 subtasks. Block1, block2 and block4 denote CPU tasks unrelated to the GPU task. In general the single GPU task is divided into N subtasks, each corresponding to one sub-kernel function; at the end of each sub-kernel the host must call cudaThreadSynchronize() to synchronize. block3_1 to block3_N denote the part of each subtask that is handed to the CPU. The specific procedure is:
Step1: Analyze the task dependences in the program and build the program task dependency graph. To simplify the problem without loss of generality, assume that the single GPU task is divided into 2 subtasks and that each subtask is further partitioned between CPU and GPU. The resulting task dependency graph G = (V, E), where V is the set of task nodes and E the set of task dependences, is shown in Fig. 2. In the figure, C denotes a CPU computation task, T a data transfer between CPU and GPU, and G a GPU computation task.
Step2: Construct the AOV network. Concurrently executing tasks may contend for resources, which is resolved by an arbitration rule: if at some moment several tasks in the task streams satisfy their precedence dependences but contend for the same resource, the task with the smaller task-stream number executes first. A dependence is then added between the contending tasks. For example, in Fig. 2, once CPU task C1 completes, the tasks whose dependences are satisfied are T1, C2 and T2. T1 and T2 are both communication tasks, so they contend for a resource and cannot execute simultaneously; by the arbitration rule T1 executes first, so a dependence from T1 to T2 is added to the original graph. The program task dependency graph is thereby extended to Fig. 3.
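The arbitration rule of Step2 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the resource labels ("copy", "cpu"), the stream numbers and the function name add_resource_edges are introduced here for illustration and do not come from the patent.

```python
from collections import defaultdict

def add_resource_edges(ready_tasks, edges):
    """Serialize ready tasks that contend for the same resource.

    ready_tasks: list of (name, resource, stream_id) tuples for tasks whose
                 precedence dependences are all satisfied at the same moment.
    edges:       set of (before, after) dependence pairs, updated in place.
    """
    by_resource = defaultdict(list)
    for name, resource, stream_id in ready_tasks:
        by_resource[resource].append((stream_id, name))
    for contenders in by_resource.values():
        contenders.sort()  # the smaller stream number executes first
        # chain the contenders: each one must wait for the previous one
        for (_, first), (_, second) in zip(contenders, contenders[1:]):
            edges.add((first, second))
    return edges

# After C1 finishes, T1, C2 and T2 are ready; T1 and T2 are both transfers.
# Resource labels and stream numbers below are hypothetical.
edges = {("C1", "T1"), ("C1", "C2"), ("C1", "T2")}
ready = [("T1", "copy", 1), ("C2", "cpu", 0), ("T2", "copy", 2)]
add_resource_edges(ready, edges)
print(("T1", "T2") in edges)  # True
```

Only tasks that share a resource are chained, so C2 stays independent of the two transfers, matching the T1 to T2 edge described for Fig. 3.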
Step3: Determine the earliest start time and the latest start time of each task in the program. Let EST(M_i) denote the earliest start time of task M_i and LST(M_i) its latest start time; <v_j, v_i> denotes that node j must execute before node i.
Earliest start time:
$$
\begin{cases}
EST(M_1) = 0 \\
EST(M_i) = \max\limits_{\langle v_j, v_i \rangle \in E} \{ EST(M_j) + Time(M_j) \}
\end{cases}
\tag{1}
$$
Latest start time:
$$
\begin{cases}
LST(M_N) = EST(M_N) \\
LST(M_i) = \min\limits_{\langle v_i, v_j \rangle \in E} \{ LST(M_j) - Time(M_i) \}
\end{cases}
\tag{2}
$$
The earliest and latest start times EST(M_i) and LST(M_i) of task M_i can therefore be obtained recursively from formulas (1) and (2).
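The two recursions can be sketched in Python as a forward and a backward pass over a topological order. The graph below is an assumption: the figures are not reproduced in this text, so the edges and durations were chosen to be consistent with the numbers quoted in the embodiment (for example, C2 has earliest start 1 and latest start 13), the graph is cut off at S1, and S1 is given a hypothetical duration of 0. The sketch also assumes a single sink task and Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

def est_lst(duration, edges):
    """Return (EST, LST) dicts following formulas (1) and (2)."""
    preds = {t: set() for t in duration}
    succs = {t: set() for t in duration}
    for before, after in edges:
        preds[after].add(before)
        succs[before].add(after)
    order = list(TopologicalSorter(preds).static_order())

    est = {}
    for t in order:  # forward pass, formula (1); sources start at 0
        est[t] = max((est[p] + duration[p] for p in preds[t]), default=0)

    lst = {}
    for t in reversed(order):  # backward pass, formula (2)
        if succs[t]:
            lst[t] = min(lst[s] for s in succs[t]) - duration[t]
        else:
            lst[t] = est[t]  # sink task: LST(M_N) = EST(M_N)
    return est, lst

# Hypothetical reconstruction of part of the embodiment's AOV network.
duration = {"C1": 1, "T1": 3, "T2": 3, "G3": 2, "G1": 2,
            "G2": 10, "C2": 4, "S1": 0}
edges = [("C1", "T1"), ("C1", "T2"), ("C1", "C2"), ("C1", "G3"),
         ("T1", "G1"), ("T2", "G2"), ("G2", "S1"), ("C2", "S1"),
         ("T1", "T2"), ("G3", "G1"), ("G1", "G2")]  # last three: resource edges
est, lst = est_lst(duration, edges)
print(est["C2"], lst["C2"])  # 1 13
print(est["S1"], lst["S1"])  # 17 17
```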
Step4: Identify the critical path. If a task's earliest possible start time equals its latest allowed start time, the task lies on the critical path of the AOV network; its run time directly affects the run time of the whole program and cannot be relaxed. Conversely, if its earliest possible start time is less than its latest allowed start time, the task lies on a non-critical path of the AOV network; its frequency can be adjusted so that its run time increases moderately and the dynamic energy consumption of the system decreases.
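The test in this step amounts to comparing two numbers per task. The following minimal Python sketch makes that explicit; the EST and LST values are illustrative only (they agree with the embodiment's quoted values for C2, but Fig. 5 is not reproduced here), and the function name is hypothetical.

```python
def split_by_criticality(est, lst):
    """Separate critical tasks (EST == LST) from non-critical ones.

    Returns the list of critical tasks and a dict mapping each
    non-critical task to its slack LST - EST.
    """
    critical = [t for t in est if est[t] == lst[t]]
    slack = {t: lst[t] - est[t] for t in est if est[t] < lst[t]}
    return critical, slack

# Illustrative values, not taken from Fig. 5.
est = {"C1": 0, "T1": 1, "T2": 4, "G3": 1, "G1": 4, "G2": 7, "C2": 1, "S1": 17}
lst = {"C1": 0, "T1": 1, "T2": 4, "G3": 3, "G1": 5, "G2": 7, "C2": 13, "S1": 17}
critical, slack = split_by_criticality(est, lst)
print(critical)  # ['C1', 'T1', 'T2', 'G2', 'S1']
print(slack)     # {'G3': 2, 'G1': 1, 'C2': 12}
```

The slack of a task is only an upper bound on how far it can be stretched in isolation; tasks that share a non-critical path share that slack, which is why the method solves a joint optimization per group rather than scaling each task independently.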
Step5: Convert the dynamic energy optimization problem into an N-variable extremum problem. For each set of non-critical nodes P_i, construct an AOV subnet […]. Suppose the subnet contains N non-critical CPU and GPU task nodes, denoted $M_j^i$, $1 \le j \le N$, whose initial processor and memory frequencies are $f_c(M_j^i)$ and $f_m(M_j^i)$ and whose frequencies after adjustment are $f_c'(M_j^i)$ and $f_m'(M_j^i)$. The computation time of each such task then becomes $T_{comp}(M_j^i) \cdot f_c(M_j^i) / f_c'(M_j^i)$ and its memory access time becomes $T_{mem}(M_j^i) \cdot f_m(M_j^i) / f_m'(M_j^i)$; the times of all other tasks are unchanged. The earliest possible start time of […] after adjustment can be computed from formula (1) and is denoted $EST'$.
When […] holds, the computation time ($T'_{comp}$) is increased within a certain range and the processor frequency is lowered; minimizing the system's dynamic energy consumption then reduces to the N-variable extremum problem:
$$
\begin{aligned}
\min \; & \sum_{j=1}^{N} k_c \cdot f_c'(M_j^i)^3 \cdot N_{warp} \cdot \frac{N_B}{N} \cdot T_{comp}(M_j^i) \cdot \frac{f_c(M_j^i)}{f_c'(M_j^i)} + k_m \cdot f_m^3 \cdot N_{warp} \cdot \frac{N_B}{N} \cdot T_{mem} \\
\text{s.t.} \; & EST'(M_{i+1}^c) = LST(M_{i+1}^c) \\
& T'_{comp} \le T_{mem} \;\wedge\; T_{lat} + T'_{comp} N_{mem} \le \mathit{Num\_Warp} \, T_{mem} N_{mem} \\
& f_c' \le f_c(\max)
\end{aligned}
\tag{3}
$$
When […] holds, the memory access time ($T'_{mem}$) is increased within a certain range and the memory frequency is lowered; minimizing the system's dynamic energy consumption then reduces to the N-variable extremum problem:
$$
\begin{aligned}
\min \; & \sum_{j=1}^{N} k_c \cdot f_c^3 \cdot N_{warp} \cdot \frac{N_B}{N} \cdot T_{comp} + k_m \cdot f_m'(M_j^i)^3 \cdot N_{warp} \cdot \frac{N_B}{N} \cdot T_{mem}(M_j^i) \cdot \frac{f_m(M_j^i)}{f_m'(M_j^i)} \\
\text{s.t.} \; & EST'(M_{i+1}^c) = LST(M_{i+1}^c) \\
& T'_{mem} \le T_{comp} \;\wedge\; T_{lat} + T_{comp} N_{mem} \ge \mathit{Num\_Warp} \, T'_{mem} N_{mem} \\
& f_m' \le f_m(\max)
\end{aligned}
\tag{4}
$$
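As a reading aid, the compute term of problem (3) can be simplified for a single task. The derivation below is a sketch that assumes the objective has the form "power proportional to the cube of the frequency, multiplied by a run time that scales inversely with the frequency"; the constant $W$, which lumps together the warp and block factors, is a name introduced here and not part of the patent.

$$
\begin{aligned}
T'_{comp} &= T_{comp} \cdot \frac{f_c}{f_c'} \\
E'_{comp} &= k_c \, f_c'^{\,3} \cdot W \cdot T'_{comp} = k_c \, W \, T_{comp} \, f_c \cdot f_c'^{\,2} \\
\frac{E'_{comp}}{E_{comp}} &= \frac{k_c \, W \, T_{comp} \, f_c \, f_c'^{\,2}}{k_c \, W \, T_{comp} \, f_c^{\,3}} = \left( \frac{f_c'}{f_c} \right)^{2} = \left( \frac{T_{comp}}{T'_{comp}} \right)^{2}
\end{aligned}
$$

Under this assumption, stretching a task whose original run time is 2 to a run time of $t'$ scales its energy by $(2/t')^2$, which is the form of the objective used in the embodiment, and the same argument applies to the memory term of problem (4) with $f_m$ in place of $f_c$.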
Brief description of the drawings
Fig. 1 is a schematic diagram of a typical CUDA program according to the invention.
Fig. 2 is the program task dependency graph according to the invention.
Fig. 3 is a schematic diagram of the program AOV network according to the invention.
Fig. 4 is the case analysis diagram of the invention.
Fig. 5 shows the EST and LST of each task according to the invention.
Detailed description
The invention is further described below with reference to the drawings and an embodiment.
This embodiment describes in detail the implementation of the dynamic energy optimization method based on AOV-network critical-path query. Fig. 4 shows the analysis of the example program. Panel (a) is the original algorithm, which has 6 steps: first arrays a and b are initialized; then procedures f1 and f2 process a and b respectively to obtain arrays c and d. Line 4 represents function f3, which computes array e from a scalar α; line 5 represents function f4, which computes the scalar result β from arrays a and b; the last line represents function f5, which computes array g from β, c, d and e.
Panel (b) gives a CUDA implementation of this algorithm. Functions f1, f2, f3 and f5 are assumed to be parallelizable and are therefore implemented as kernel functions, Kernel1 to Kernel4 respectively; f4 cannot be parallelized and is still executed by the CPU. Before Kernel1 and Kernel2 run, cudaMemcpy must be called to load input arrays a and b into GPU memory, and after Kernel4 finishes, array g must be copied back to CPU memory. To exploit the concurrency between kernel computation and communication and to hide communication overhead, the CUDA implementation sets Kernel1, Kernel2, Kernel3 and their corresponding transfers to asynchronous mode. Kernel4 uses the outputs of the first 3 kernels, so a global task-stream synchronization is performed before Kernel4 is called.
Panel (c) gives the constructed task dependency graph. Assuming the execution time of each task is as listed in panel (d), and adding the resource-dependence edges T1 to T2, G3 to G1 and G1 to G2, the resulting AOV network is shown in panel (e). Formulas (1) and (2) then give the earliest possible and latest allowed start time of every node, as shown in Fig. 5. Tasks C1, T1, T2, G2, S1, G4 and T3 have equal earliest possible and latest allowed start times; they form the critical path of the AOV network, shown as shaded nodes in panel (f). The non-critical tasks whose frequency can be adjusted are therefore G1, G3 and C2. These 3 tasks fall into two groups that do not affect each other, marked by the dashed boxes in panel (f), so the power of each group can be regulated independently.
The case of C2 is simple: its earliest possible and latest allowed start times are 1 and 13 respectively, so by the first constraint in formulas (3) and (4) the CPU frequency for C2 may be adjusted as long as the earliest possible start time of S1 is unaffected. By formula (1) the execution time of C2 can be extended to 16, so the CPU frequency while running C2 can be lowered to as little as 1/4 of the original, and the energy consumed by C2 falls to 1/16 of the original. For G1 and G3, likewise, the frequency adjustment on the GPU must not affect the earliest possible start time of S1. The initial run times of G1 and G3 are both 2. Without loss of generality, suppose the energy each consumes at the original frequency is E, and let their run times after adjustment be $t'_{G1}$ and $t'_{G3}$. By formula (1), the energy-optimal run times of G1 and G3 are given by
$$
\begin{aligned}
\min \; & \left\{ \left( \frac{2}{t'_{G1}} \right)^{2} E + \left( \frac{2}{t'_{G3}} \right)^{2} E \right\} \\
\text{s.t.} \; & \max \left\{ 1 + t'_{G3},\; EST'(G1) + t'_{G1},\; EST'(G2) + 10 \right\} = 17,
\end{aligned}
$$
where
$$
EST'(G1) = \max \left\{ 1 + t'_{G3},\; 4 \right\},
$$
$$
EST'(G2) = \max \left\{ EST'(G1) + t'_{G1},\; 7 \right\}.
$$
Solving the above gives $t'_{G1} = t'_{G3} = 3$, at which the total energy consumption of the two tasks reaches its minimum of 8/9 E.
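The numbers in this embodiment can be checked with a small brute-force search. The Python sketch below is only a check under stated assumptions: it scans the stretched run times of G1 and G3 on a grid of step 1/10 rather than solving the problem analytically, the function names are hypothetical, and the original run time of 4 for C2 is inferred from the text (an extended time of 16 at 1/4 of the original frequency).

```python
from fractions import Fraction

def s1_earliest_start(t_g1, t_g3):
    """Earliest start of S1 for stretched run times of G1 and G3."""
    est_g1 = max(1 + t_g3, 4)
    est_g2 = max(est_g1 + t_g1, 7)
    return max(1 + t_g3, est_g1 + t_g1, est_g2 + 10)

def energy_in_units_of_e(t_g1, t_g3):
    """Objective: (2/t)^2 E per task, expressed in units of E."""
    return (2 / t_g1) ** 2 + (2 / t_g3) ** 2

# Candidate run times from 2 (unscaled) to 6 in steps of 1/10.
grid = [Fraction(k, 10) for k in range(20, 61)]
feasible = [
    (energy_in_units_of_e(a, b), a, b)
    for a in grid
    for b in grid
    if s1_earliest_start(a, b) == 17  # S1 must not be delayed
]
best = min(feasible, key=lambda item: item[0])
print(best)  # (Fraction(8, 9), Fraction(3, 1), Fraction(3, 1))

# C2: run time stretched from 4 (inferred) to 16.
c2_scale = Fraction(4, 16)
print(c2_scale, c2_scale ** 2)  # 1/4 1/16
```

Exact fractions are used instead of floats so that the feasibility test against 17 and the comparison with 8/9 are not affected by rounding.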
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (4)

1. A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query, in which the runtime characteristics of a CUDA application on a CPU-GPU heterogeneous system are analyzed and its task dependences are identified; the execution of a complete program containing multiple GPU tasks is represented as an AOV network, a graph-based data structure; on this basis the critical path of the program is analyzed, the parts of the program whose energy consumption can be optimized are found, and the corresponding frequency adjustments are solved, so that the overall power consumption of the program is minimized while its performance stays unchanged; the object of optimization is the dynamic power consumption of the whole program containing multiple GPU tasks; the details are as follows:
Step1: first, separate the different types of task in the program (CPU computation tasks, communication tasks, GPU computation tasks), then further partition each single GPU task so that it executes on the CPU and the GPU simultaneously; describe the execution as dependences between tasks and construct the AOV network of the program's execution;
Step2: next, analyze the AOV network to determine the critical path; the CPU and GPU tasks on non-critical paths are the non-critical tasks whose frequency can be adjusted to save power;
Step3: finally, use the execution time of the critical tasks to determine how far the execution time of each non-critical task can be relaxed, and solve for the processor frequency adjustment of each task that minimizes power consumption.
2. The dynamic power consumption optimization method for heterogeneous systems according to claim 1, characterized in that Step1 specifically comprises: analyzing the task dependences in the program and building the program task dependency graph.
3. The dynamic power consumption optimization method for heterogeneous systems according to claim 2, characterized in that Step2 specifically comprises: constructing the AOV network, where concurrently executing tasks may contend for resources and an arbitration rule is followed: if at some moment several tasks in the task streams satisfy their precedence dependences but contend for the same resource, the task with the smaller task-stream number executes first.
4. The dynamic power consumption optimization method for heterogeneous systems according to claim 1, characterized in that Step3 comprises the following steps: determining the earliest start time and the latest start time of each task in the program, where EST(M_i) denotes the earliest start time of task M_i, LST(M_i) denotes its latest start time, and <v_j, v_i> denotes that node j must execute before node i.
Earliest start time:
$$
\begin{cases}
EST(M_1) = 0 \\
EST(M_i) = \max\limits_{\langle v_j, v_i \rangle \in E} \{ EST(M_j) + Time(M_j) \}
\end{cases}
\tag{1}
$$
Latest start time:
$$
\begin{cases}
LST(M_N) = EST(M_N) \\
LST(M_i) = \min\limits_{\langle v_i, v_j \rangle \in E} \{ LST(M_j) - Time(M_i) \}
\end{cases}
\tag{2}
$$
The earliest and latest start times EST(M_i) and LST(M_i) of task M_i can therefore be obtained recursively from formulas (1) and (2).
CN201610638736.0A 2016-08-05 2016-08-05 A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query Withdrawn CN106293003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610638736.0A CN106293003A (en) 2016-08-05 2016-08-05 A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610638736.0A CN106293003A (en) 2016-08-05 2016-08-05 A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query

Publications (1)

Publication Number Publication Date
CN106293003A (en) 2017-01-04

Family

ID=57665742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610638736.0A Withdrawn CN106293003A (en) 2016-08-05 2016-08-05 A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query

Country Status (1)

Country Link
CN (1) CN106293003A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140223211A1 (en) * 2011-09-06 2014-08-07 St-Ericsson Sa Regulating the Activity of a Core
CN105677461A (en) * 2015-12-30 2016-06-15 西安工业大学 Mixed-criticality tasks scheduling method based on criticality

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
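Lin Yisong (林一松): "Research on Key Technologies of GPU-Oriented Low-Power Software Optimization" (面向GPU的低功耗软件优化关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *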

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874158A (en) * 2017-01-11 2017-06-20 广东工业大学 A kind of heterogeneous system Whole Process power consumption metering method
CN106896895A (en) * 2017-01-11 2017-06-27 广东工业大学 A dynamic power consumption optimization method for heterogeneous systems based on AOV-network critical-path query
CN110618748A (en) * 2018-06-04 2019-12-27 中芯国际集成电路制造(上海)有限公司 Logic circuit and wearable electronic equipment
CN110618748B (en) * 2018-06-04 2021-02-09 中芯国际集成电路制造(上海)有限公司 Logic circuit and wearable electronic equipment
CN113112084A (en) * 2020-07-31 2021-07-13 中国海洋大学 Training plane rear body research and development flow optimization method and device
CN117453379A (en) * 2023-12-25 2024-01-26 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system
CN117453379B (en) * 2023-12-25 2024-04-05 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170104
