CN106293003A - Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query
- Publication number
- CN106293003A (application CN201610638736.0A)
- Authority
- CN
- China
- Prior art keywords
- task
- program
- power consumption
- aov
- gpu
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/324—Power saving characterised by the action undertaken by lowering clock frequency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
Abstract
The invention discloses a heterogeneous system dynamic power consumption optimization method based on AOV network critical path query. The method describes the execution of a CUDA program on a heterogeneous system as an abstract data representation, the AOV (activity-on-vertex) network, and solves for the critical path of the program on that network. It then finds the non-critical tasks whose frequency can be lowered through DVFS without affecting the total execution time of the program, and solves for the frequency adjustment range of each non-critical task under an energy-optimal objective. The method effectively converts the energy optimization problem of a CUDA program into a mathematical programming problem based on the AOV network, and thereby provides an energy-optimal optimization strategy under a performance constraint.
Description
Technical field
The present invention relates to the field of low-power heterogeneous systems and, more specifically, to dynamic power consumption optimization for CPU-GPU heterogeneous systems.
Background technology
Among existing methods for reducing the dynamic power consumption of heterogeneous systems, most GPU low-power optimizations target the power consumption of a single GPU task; little work optimizes power consumption based on the characteristics of a whole application running on a CPU-GPU heterogeneous system. An application, however, contains multiple tasks of different types. Moreover, in the CUDA programming environment the host CPU sits idle after calling cudaThreadSynchronize(), which wastes computing resources. Although the GPU has very strong computing capability, processing a large-scale data set is still time-consuming. Part of the computation can be assigned to the idle CPU so that the CPU and the GPU work in parallel, which reduces the execution time of the kernel function and, in turn, the execution time of the whole program. Therefore, for a given task (a loop iteration without dependences), the design goal of this solution is to partition the task between the CPU and the GPU so that the heterogeneous system obtains optimal energy (performance) while meeting a performance (energy) constraint.
Summary of the invention
The present invention proposes a heterogeneous system dynamic power consumption optimization method based on AOV network critical path query. It finds a set of non-critical tasks in the program (the tasks that, while the program runs, do not affect the execution time of the whole program) and determines the corresponding CPU or GPU frequency adjustment factors, so that when the program runs on a CPU-GPU heterogeneous system its execution time is unchanged and its energy consumption is optimal.
To solve the above problem, the technical solution of the present invention is as follows:
A heterogeneous system dynamic power consumption optimization method based on AOV network critical path query analyzes the runtime characteristics of a CUDA application on a CPU-GPU heterogeneous system and summarizes the task dependences in it. The execution of a complete program containing multiple GPU tasks is represented as a graph-based data structure, the AOV network. On this basis the critical path of the program is analyzed, the parts of the program that admit energy optimization are identified, and the corresponding frequency adjustment ranges are solved, so that the overall power consumption of the program is minimized while program performance is kept unchanged. The object of optimization is the dynamic power consumption of a whole program containing multiple GPU tasks. The details are as follows:
Step1: First, separate the different types of tasks in the program (CPU computation tasks, communication tasks, GPU computation tasks), and further partition each single GPU task so that it executes on the CPU and the GPU simultaneously. Describe the running process as dependences between tasks and construct the AOV network of the program's execution.
Step2: Next, analyze the runtime AOV network and determine the critical path (the directed graph formed by the critical tasks). The CPU and GPU tasks on non-critical paths (the directed graph formed by the non-critical tasks) are then the non-critical tasks whose frequency can be adjusted to save power.
Step3: Finally, determine from the execution times of the critical tasks the range by which the execution time of each non-critical task can be relaxed, and solve for the processor frequency adjustment range of each task so as to minimize power consumption.
A typical CUDA program is shown in Fig. 1, assuming for now that the single GPU task is divided into 2 subtasks. block1, block2 and block4 denote CPU tasks unrelated to the GPU task. The single GPU task is divided into N subtasks, each corresponding to one sub-kernel function; at the end of each sub-kernel function the host must call cudaThreadSynchronize() to synchronize. block3_1 to block3_N denote the portion of each subtask that is handed to the CPU for processing. The specific procedure is:
Step1: Analyze the task dependences in the program and build the program task dependency graph. To simplify the problem without loss of generality, assume that the single GPU task is divided into 2 subtasks and that each subtask is further partitioned between the CPU and the GPU. The resulting program task dependency graph G = (V, E), where V is the set of nodes in the graph and E the set of task dependences, is shown in Fig. 2. In the figure, C denotes a CPU computation task, T denotes a data transfer between the CPU and the GPU, and G denotes a GPU computation task.
Step2: Construct the AOV network. Concurrently executing tasks may contend for resources, and the following arbitration mechanism is applied: if at some moment several tasks in the task flows satisfy their precedence dependences but contend for the same resource, the task with the lower task flow number is executed first. A dependence is then established between the contending tasks. For example, in Fig. 2, after CPU task C1 completes, the tasks whose dependences are satisfied are T1, C2 and T2. T1 and T2 are both communication tasks, so they contend for a resource and cannot execute simultaneously; T1 executes first under the arbitration mechanism, so a task dependence from T1 to T2 is added to the original graph. The program task dependency graph is thereby extended to Fig. 3.
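The arbitration rule can be illustrated with a small simulation. The following is a minimal sketch, not the patent's implementation: the `Task` record, the function name `add_contention_edges`, the resource labels and all durations are hypothetical, and it assumes each task occupies exactly one resource for its whole duration. It replays the C1, T1, C2, T2 situation described above and reports the dependence edge that contention adds.

```python
from collections import namedtuple

# Hypothetical task record: duration, the resource it occupies, and its task-flow number.
Task = namedtuple("Task", ["duration", "resource", "flow"])

def add_contention_edges(tasks, edges):
    """Simulate execution and return the extra edges implied by resource contention."""
    preds = {name: set() for name in tasks}
    for u, v in edges:
        preds[v].add(u)
    finish = {}      # task -> finish time, for tasks already started
    free_at = {}     # resource -> time at which it becomes free
    last_on = {}     # resource -> task that most recently occupied it
    waited = set()   # tasks that were ready but found their resource busy
    added = []
    pending = set(tasks)
    now = 0
    while pending:
        # A task is ready once every predecessor has finished by `now`.
        ready = [t for t in pending
                 if all(p in finish and finish[p] <= now for p in preds[t])]
        # Arbitration: the lower task-flow number goes first.
        ready.sort(key=lambda t: (tasks[t].flow, t))
        started = False
        for t in ready:
            res = tasks[t].resource
            if free_at.get(res, 0) > now:
                waited.add(t)          # resource busy: the task must wait
                continue
            if t in waited and (last_on[res], t) not in edges:
                added.append((last_on[res], t))   # record the contention edge
            finish[t] = now + tasks[t].duration
            free_at[res] = finish[t]
            last_on[res] = t
            pending.remove(t)
            started = True
        if started:
            continue                   # re-scan at the same time instant
        later = [f for f in finish.values() if f > now]
        if not later:
            raise ValueError("dependency cycle: no task can start")
        now = min(later)               # advance to the next completion event
    return added

# Hypothetical durations; only the structure follows the example in the text.
tasks = {
    "C1": Task(1, "cpu", 0),
    "T1": Task(2, "bus", 1),
    "C2": Task(4, "cpu", 0),
    "T2": Task(2, "bus", 2),
}
edges = [("C1", "T1"), ("C1", "C2"), ("C1", "T2")]
extra = add_contention_edges(tasks, edges)
print(extra)
```

With these inputs T1 and T2 become ready together when C1 finishes, T1 wins the transfer resource because its flow number is lower, and the sketch reports the single added edge from T1 to T2.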
Step3: Determine the earliest start time and the latest start time of each task in the program. Let EST(M_i) denote the earliest start time of task M_i and LST(M_i) its latest start time; <v_j, v_i> denotes that node j must execute before node i.
Earliest start time, formula (1): […]
Latest start time, formula (2): […]
The earliest and latest start times EST(M_i) and LST(M_i) of task M_i can therefore be derived recursively from formulas (1) and (2).
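The two equations did not survive in this text. For orientation, the standard critical-path recurrences for a vertex-weighted AOV network are given here as a hedged reconstruction, not as the patent's exact formulas. The symbol T(M_i) for the execution time of task M_i is an assumed notation, and a single final task is assumed.

```latex
\begin{align}
EST(M_i) &=
  \begin{cases}
    0, & M_i \text{ has no predecessor} \\
    \max\limits_{\langle v_j, v_i \rangle \in E} \bigl\{ EST(M_j) + T(M_j) \bigr\}, & \text{otherwise}
  \end{cases} \tag{1} \\
LST(M_i) &=
  \begin{cases}
    EST(M_i), & M_i \text{ is the final task} \\
    \min\limits_{\langle v_i, v_j \rangle \in E} \bigl\{ LST(M_j) \bigr\} - T(M_i), & \text{otherwise}
  \end{cases} \tag{2}
\end{align}
```

Formula (1) is evaluated in topological order from the first task, and formula (2) in reverse topological order from the final task.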
Step4: Determine the critical path. If the earliest possible start time of a task equals its latest allowable start time, the task lies on the critical path of the AOV network: its running time directly affects the running time of the whole program and cannot be relaxed. Conversely, if the earliest possible start time of a task is less than its latest allowable start time, the task lies on a non-critical path of the AOV network; its frequency can be adjusted and its running time suitably lengthened to reduce the dynamic energy consumption of the system.
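The test EST = LST can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the patent's code: it uses the standard recurrences EST(i) = max over predecessors j of EST(j) + T(j) and LST(i) = min over successors j of LST(j), minus T(i); the function name `est_lst` and the four-task graph with its durations are hypothetical.

```python
from collections import defaultdict

def est_lst(durations, edges):
    """Return (est, lst) dictionaries for a vertex-weighted task DAG."""
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indeg[v] += 1
    # Kahn's algorithm: topological order of the tasks.
    order, ready = [], [t for t in durations if indeg[t] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(durations):
        raise ValueError("the dependency graph contains a cycle")
    # Forward pass: a task starts once all of its predecessors have finished.
    est = {}
    for u in order:
        est[u] = max((est[p] + durations[p] for p in pred[u]), default=0)
    makespan = max(est[t] + durations[t] for t in durations)
    # Backward pass: a task must finish before its earliest-due successor starts.
    lst = {}
    for u in reversed(order):
        lst[u] = min((lst[s] for s in succ[u]), default=makespan) - durations[u]
    return est, lst

# Hypothetical graph: A -> B -> D and A -> C -> D.
durations = {"A": 1, "B": 4, "C": 2, "D": 2}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
est, lst = est_lst(durations, edges)
critical = [t for t in durations if est[t] == lst[t]]
slack = {t: lst[t] - est[t] for t in durations}
print(critical, slack)
```

Here A, B and D form the critical path, while C has a slack of 2 time units and is the only candidate for frequency reduction.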
Step5: Convert the dynamic energy optimization problem into an N-variable extremum problem and solve it. For the set of non-critical nodes P_i, we construct an AOV subnetwork […]. Assume that the AOV subnetwork has N non-critical CPU and GPU task nodes, denoted […]. Their initial processor and memory frequencies are […], 1 ≤ j ≤ N, and after frequency adjustment their frequencies become […] respectively. The computation time of each task then becomes […] and its memory access time becomes […]; the times of the other tasks are unchanged. According to formula (15) we can calculate the earliest possible start time of […] after adjustment, denoted […].
When […], the computation time (T'_comp) is increased within a certain range and the processor running frequency is lowered; the system dynamic energy consumption reduces to an N-variable extremum problem: […]
When […], the memory operation time (T'_mem) is increased within a certain range and the memory access frequency is lowered; the system dynamic energy consumption reduces to an N-variable extremum problem: […]
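Because the objective functions are not reproduced in this text, the following sketch only illustrates the kind of scaling relation such an extremum problem rests on. It assumes, as a simplification, that a task's running time is inversely proportional to frequency and that its dynamic energy is proportional to the square of frequency; this is consistent with the embodiment, where lowering the frequency to 1/4 reduces energy to 1/16. The function names `scale_task` and `min_freq_ratio` are hypothetical, and the split between computation time and memory access time is ignored.

```python
def scale_task(time, energy, freq_ratio):
    """Return (time, energy) of a task run at freq_ratio times its original frequency.

    Assumed model: time is proportional to 1/f, dynamic energy to f squared.
    """
    if not 0 < freq_ratio <= 1:
        raise ValueError("freq_ratio must lie in (0, 1]")
    return time / freq_ratio, energy * freq_ratio ** 2

def min_freq_ratio(time, max_time):
    """Lowest frequency ratio whose stretched running time still fits in max_time."""
    if max_time < time:
        raise ValueError("max_time must be at least the original time")
    return time / max_time

# A task of length 4 that may be stretched to length 16.
ratio = min_freq_ratio(4.0, 16.0)
new_time, new_energy = scale_task(4.0, 1.0, ratio)
print(ratio, new_time, new_energy)
```

Under this model a task's energy depends only on how far its running time is stretched, so the N-variable problem amounts to distributing the available slack among the non-critical tasks.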
Brief description of the drawings
Fig. 1 is a schematic diagram of a typical CUDA program of the present invention
Fig. 2 is the program task dependency graph of the present invention
Fig. 3 is a schematic diagram of the program AOV network of the present invention
Fig. 4 is the case analysis diagram of the present invention
Fig. 5 shows the EST and LST of each task of the present invention
Detailed description of the invention
The present invention is further described below with reference to the drawings and an embodiment.
This embodiment describes in detail the implementation of the dynamic energy optimization method based on AOV network critical path query. Fig. 4 shows the analysis process for the embodiment program. Panel (a) is the original algorithm flow, which has 6 steps: first, the two arrays a and b are initialized; then procedures f1 and f2 process a and b respectively to produce arrays c and d. In the 4th line, function f3 computes array e from a scalar ce; in the 5th line, function f4 computes the scalar result β from the two arrays a and b; in the last line, function f5 computes array g from β, c, d and e.
Panel (b) gives a CUDA implementation of the above algorithm. Functions f1, f2, f3 and f5 are assumed to be parallelizable and are therefore implemented as kernel functions, Kernel1 to Kernel4 respectively; f4 cannot be parallelized and is still executed by the CPU. Before Kernel1 and Kernel2 execute, cudaMemcpy must be called to load the input arrays a and b into GPU memory; after Kernel4 finishes, array g must be copied back to CPU memory. To exploit the parallelism between kernel computation and communication and to hide communication overhead, the CUDA implementation sets Kernel1, Kernel2, Kernel3 and their corresponding communication operations to asynchronous mode. Kernel4 uses the outputs of the first 3 kernel functions, so a global task flow synchronization is performed before Kernel4 is called.
Panel (c) gives the constructed task dependency graph. Assuming that the execution time of each task is as listed in panel (d), and adding the resource dependence edges T1 to T2, G3 to G1 and G1 to G2, the generated AOV network is shown in panel (e). Formulas (1) and (2) then give the earliest possible and latest allowable start times of each node, as shown in Fig. 5. Tasks C1, T1, T2, G2, S1, G4 and T3 have equal earliest possible and latest allowable start times; they form the critical path of the AOV network, shown as the shaded nodes in panel (f). The non-critical tasks whose frequency can be adjusted are therefore G1, G3 and C2. These 3 tasks can be divided into two mutually independent groups, shown by the dashed boxes in panel (f), whose power consumption can be adjusted independently.
The case of C2 is relatively simple. Its earliest possible and latest allowable start times are 1 and 13 respectively, so according to the first constraint given by formulas (3-4), the CPU frequency for C2 may be adjusted as long as the earliest possible start time of S1 is not affected. According to formula (1), the execution time of C2 can be extended to 16, so the CPU frequency while executing C2 can be lowered to as little as 1/4 of the original, and the energy consumption of C2 drops to 1/16 of the original.
Similarly, for G1 and G3 the frequency adjustment on the GPU must not affect the earliest possible start time of S1. The initial running times of G1 and G3 are both 2. Assume without loss of generality that the energy each consumes at the original frequency is E, and that after adjustment their running times become […] and […]. According to formula (1), the energy-optimal running times of G1 and G3 can be expressed as […], where […]. Solving the above gives […], at which the total energy consumption of the two reaches its minimum of (8/9)E.
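The stated minimum of 8/9 E can be reproduced numerically. The sketch below is a check under assumptions, not the patent's derivation: it assumes that each task's energy scales with the inverse square of its running time (the same relation that takes C2 from 1/4 frequency to 1/16 energy), and that G3 and G1 run back to back within a combined time budget of 6. The budget value is an assumption inferred from the stated result, and the names `total_energy` and `best_split` are hypothetical.

```python
def total_energy(t1, t3, t0=2.0, e0=1.0):
    """Combined energy of G1 and G3, in units of E, for stretched times t1 and t3."""
    # Assumed model: energy scales with (original time / new time) squared.
    return e0 * (t0 / t1) ** 2 + e0 * (t0 / t3) ** 2

def best_split(budget=6.0, t0=2.0, steps=200):
    """Grid search over ways to share `budget` time units between the two tasks."""
    best = None
    for k in range(steps + 1):
        t1 = t0 + (budget - 2 * t0) * k / steps   # t1 ranges from t0 to budget - t0
        t3 = budget - t1
        e = total_energy(t1, t3, t0)
        if best is None or e < best[2]:
            best = (t1, t3, e)
    return best

t1, t3, energy = best_split()
print(t1, t3, energy)
```

Under these assumptions the search settles on running times of 3 for both tasks, with a combined energy of 8/9 E, which agrees with the figure quoted in the text.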
The specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments above; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.
Claims (4)
1. A heterogeneous system dynamic power consumption optimization method based on AOV network critical path query, which analyzes the runtime characteristics of a CUDA application on a CPU-GPU heterogeneous system, summarizes the task dependences in it, represents the execution of a complete program containing multiple GPU tasks as a graph-based data structure, the AOV network, analyzes on this basis the critical path of the program, identifies the parts of the program that admit energy optimization, and solves for the corresponding frequency adjustment ranges, so as to minimize the overall power consumption of the program while keeping program performance unchanged, the object of optimization being the dynamic power consumption of a whole program containing multiple GPU tasks, specifically as follows:
Step1: first, separate the different types of tasks in the program, namely CPU computation tasks, communication tasks and GPU computation tasks, and further partition each single GPU task so that it executes on the CPU and the GPU simultaneously; describe the running process as dependences between tasks and construct the AOV network of the program's execution;
Step2: next, analyze the runtime AOV network and determine the critical path; the CPU and GPU tasks on non-critical paths are then the non-critical tasks whose frequency can be adjusted to save power;
Step3: finally, determine from the execution times of the critical tasks the range by which the execution time of each non-critical task can be relaxed, and solve for the processor frequency adjustment range of each task so as to minimize power consumption.
2. The heterogeneous system dynamic power consumption optimization method according to claim 1, characterized in that Step1 specifically comprises: analyzing the task dependences in the program and building the program task dependency graph.
3. The heterogeneous system dynamic power consumption optimization method according to claim 2, characterized in that Step2 specifically comprises: constructing the AOV network, wherein concurrently executing tasks may contend for resources and the following arbitration mechanism is applied: if at some moment several tasks in the task flows satisfy their precedence dependences but contend for a resource, the task with the lower task flow number is executed first.
4. The heterogeneous system dynamic power consumption optimization method according to claim 1, characterized in that Step3 comprises the following steps: determining the earliest start time and the latest start time of each task in the program, where EST(M_i) denotes the earliest start time of task M_i, LST(M_i) denotes the latest start time of task M_i, and <v_j, v_i> denotes that node j must execute before node i.
Earliest start time, formula (1): […]
Latest start time, formula (2): […]
The earliest and latest start times EST(M_i) and LST(M_i) of task M_i can therefore be derived recursively from formulas (1) and (2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610638736.0A CN106293003A (en) | 2016-08-05 | 2016-08-05 | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610638736.0A CN106293003A (en) | 2016-08-05 | 2016-08-05 | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106293003A (en) | 2017-01-04 |
Family
ID=57665742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610638736.0A Withdrawn CN106293003A (en) | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query | 2016-08-05 | 2016-08-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106293003A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140223211A1 (en) * | 2011-09-06 | 2014-08-07 | St-Ericsson Sa | Regulating the Activity of a Core |
CN105677461A (en) * | 2015-12-30 | 2016-06-15 | 西安工业大学 | Mixed-criticality tasks scheduling method based on criticality |
Non-Patent Citations (1)
Title |
---|
林一松: ""面向GPU的低功耗软件优化关键技术研究"", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874158A (en) * | 2017-01-11 | 2017-06-20 | 广东工业大学 | A kind of heterogeneous system Whole Process power consumption metering method |
CN106896895A (en) * | 2017-01-11 | 2017-06-27 | 广东工业大学 | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query |
CN110618748A (en) * | 2018-06-04 | 2019-12-27 | 中芯国际集成电路制造(上海)有限公司 | Logic circuit and wearable electronic equipment |
CN110618748B (en) * | 2018-06-04 | 2021-02-09 | 中芯国际集成电路制造(上海)有限公司 | Logic circuit and wearable electronic equipment |
CN113112084A (en) * | 2020-07-31 | 2021-07-13 | 中国海洋大学 | Training plane rear body research and development flow optimization method and device |
CN117453379A (en) * | 2023-12-25 | 2024-01-26 | 麒麟软件有限公司 | Scheduling method and system for AOE network computing tasks in Linux system |
CN117453379B (en) * | 2023-12-25 | 2024-04-05 | 麒麟软件有限公司 | Scheduling method and system for AOE network computing tasks in Linux system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106293003A (en) | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query | |
Bosilca et al. | Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA | |
Date et al. | GPU-accelerated Hungarian algorithms for the linear assignment problem | |
JáJá | Parallel algorithms | |
Yi et al. | An ILP formulation for task mapping and scheduling on multi-core architectures | |
TWI827792B (en) | Multipath neural network, method to allocate resources and multipath neural network analyzer | |
Attiya et al. | Two phase algorithm for load balancing in heterogeneous distributed systems | |
Shetti et al. | Optimization of the HEFT algorithm for a CPU-GPU environment | |
Ahn et al. | A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures | |
CN112559053B (en) | Data synchronization processing method and device for reconfigurable processor | |
Bosilca et al. | Distibuted dense numerical linear algebra algorithms on massively parallel architectures: DPLASMA | |
Zheng et al. | Atomic dataflow based graph-level workload orchestration for scalable DNN accelerators | |
Bisseling et al. | Parallel LU decomposition on a transputer network | |
Lee et al. | NP-CGRA: Extending CGRAs for efficient processing of light-weight deep neural networks | |
CN106896895A (en) | Heterogeneous system dynamic power consumption optimization method based on AOV network critical path query | |
Xiao et al. | FCNNLib: An efficient and flexible convolution algorithm library on FPGAs | |
CN112199177B (en) | SKA task scheduling system and method based on genetic algorithm and computational topology model | |
Kalashnikov et al. | A parallel algorithm of simulated annealing for multiprocessor scheduling | |
Bondalapati et al. | Loop pipelining and optimization for run time reconfiguration | |
Mohan et al. | Graph matching algorithm for task assignment problem | |
CN115374395A (en) | Hardware structure for carrying out scheduling calculation through algorithm control unit | |
Ganapathy et al. | Optimal synthesis of algorithm-specific lower-dimensional processor arrays | |
Senger et al. | Bounds on the scalability of bag-of-tasks applications running on master-slave platforms | |
Laghari et al. | Scheduling Techniques of Processor Scheduling in Cellular Automaton | |
US20240126611A1 (en) | Workload-Aware Hardware Architecture Recommendations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20170104 |
WW01 | Invention patent application withdrawn after publication |