CN109992385A - GPU-internal energy consumption optimization method based on task-balanced scheduling - Google Patents

GPU-internal energy consumption optimization method based on task-balanced scheduling

Info

Publication number
CN109992385A
CN109992385A CN201910205801.4A CN201910205801A CN109992385A
Authority
CN
China
Prior art keywords
task
queue
impact factor
sequence
energy consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910205801.4A
Other languages
Chinese (zh)
Other versions
CN109992385B (en)
Inventor
黄彦辉
旷志寰
王兆基
冯雪昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910205801.4A priority Critical patent/CN109992385B/en
Publication of CN109992385A publication Critical patent/CN109992385A/en
Application granted granted Critical
Publication of CN109992385B publication Critical patent/CN109992385B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention belongs to the technical field of scheduling that balances device state against task characteristics, and discloses a GPU-internal energy consumption optimization method based on task-balanced scheduling. Using the balanced impact factor CB of the pending programs together with the SM usage-frequency information PHB, a CB-HRV task scheduling algorithm that combines the balanced impact factor with the SM usage-frequency information is proposed. Based on the task execution times, the CB-HRV task scheduling algorithm assigns a group of tasks to the corresponding SMs for execution according to each task's balanced impact factor, so that the scheduled tasks are distributed evenly across the SMs; at the same time, the SM usage-frequency information is used to improve the usage efficiency of the device resources. By jointly exploiting the information about the pending tasks and the resource information, an energy-optimized task scheduling method is realized. The present invention achieves dynamically balanced scheduling across multiple SMs and reaches the energy optimization goal of the GPU.

Description

GPU-internal energy consumption optimization method based on task-balanced scheduling
Technical field
The invention belongs to the technical field of scheduling that balances device state against task characteristics, and in particular relates to a GPU-internal energy consumption optimization method based on task-balanced scheduling.
Background art
Currently, the prior art in this field is as follows. A graphics processing unit (GPU), also known as a display core, visual processor or display chip, is a microprocessor dedicated to image computation on personal computers, workstations, game consoles and some mobile devices (such as tablet computers and smartphones). Its purpose is to convert and drive the display information required by the computer system, to provide line-scan signals to the display and to control the display correctly; it is one of the key elements connecting the display to the PC mainboard and an important device for "human-computer dialogue". The video card, as an important component of the host computer, undertakes the task of outputting the displayed graphics and is extremely important for people engaged in professional graphic design.
One: energy optimization research based on task scheduling. Energy optimization through task scheduling in a GPU can be approached from both the hardware and the software side. Software-based energy optimization strategies are considered relatively effective means of energy optimization because their hardware cost is low and their implementation is comparatively simple. In general, software task scheduling strategies are divided into static methods oriented towards program design and dynamic methods oriented towards the operating system and compiler. The static, programming-oriented methods allocate the SM resources required by the program directly at programming time; scholars have proposed static task assignment policies to realize task scheduling and thereby reduce system energy consumption. Static compilation methods require no modification of the GPU operating system or hardware, so their implementation cost is relatively small. However, because such methods take neither the resource situation of the SMs inside the GPU nor the task execution time into account, their energy optimization effect is limited; in particular, when the execution environment changes, the energy saving of such methods is hard to guarantee. To obtain the power targets of a specific environment, the program has to be pre-run before scheduling to obtain parameters, and the program code has to be rewritten manually. The second kind of method is dynamic scheduling, that is, balancing tasks during program execution using known GPU resource parameters and task execution time parameters. These parameters include the size of the processed data, the degree to which a task depends on the cache, the utilization of the SMs, and so on. Dynamic schemes perform balanced scheduling by analysing the resources inside the GPU and the characteristics of the tasks themselves, and tend to obtain better results than static scheduling. This work therefore focuses on analysing dynamic scheduling strategies and proposes an energy consumption optimization method based on task-balanced dynamic scheduling.
Two: scheduling strategies based on task execution time. GPU energy optimization research targets an operating scenario with multiple SMs. Tasks entering the GPU are distributed to the individual SMs by an SM task scheduler according to some policy. During task execution the SMs have equal status, that is, their task-carrying capacities are identical, and the task scheduler generally distributes tasks under an equality assumption. However, because the characteristics of the tasks that the scheduler assigns to each SM differ, the usage efficiency of the individual SMs is not the same.
Current task scheduling is often based on a scheduling strategy driven by the program execution time. Assume that the pending programs in the system are Pi (0 ≤ i ≤ n) and the processor resources available in the current system are SMi (0 ≤ i ≤ n); then the goal is to minimize the energy consumed by the system while executing the programs:
E_total = Σ(P_SMi × T_SMi) + P_CPU × T_CPU + P_mainboard × T_mainboard
As this formula shows, the system energy consumption can be expressed as the sum of the energy consumed by all the SMs, the CPU and the mainboard in the system, each term being the product of the corresponding power and time. For a given group of programs to be scheduled and a fixed number of SMs, the order in which the tasks are scheduled varies with the scheduling algorithm, but it does not change the power drawn while the tasks execute; in other words, the average power over the sequence of scheduled tasks remains unchanged. The system energy consumption can therefore be further expressed as the product of the average power P_avg and the time T:
E = P_avg × T
To minimize the system energy consumption when executing the programs, both P_avg and T must be made as small as possible. Since the average power P_avg is fixed across the different scheduling methods, minimizing the system energy consumption amounts to minimizing the execution time T. Many energy optimization schemes are therefore based on the execution time, that is, they use a scheduling algorithm to minimize the overall completion time of the tasks on the SMs.
Three: task scheduling strategy based on randomly assigned devices (Random Allocation Device)
In dynamic scheduling that takes the reduction of the average execution time as its target, tasks are scheduled based on the state of the devices (SMs); random allocation of devices (RAD) is a common energy optimization means, where "device" refers to an SM inside the GPU. RAD is a low-power load-distribution method whose idea is to assign a device (SM) to each pending task directly at random, without considering the state of the device. In other words, the method treats all SM devices in the GPU as equal, with no differences in characteristics or usage state. The pseudocode of this strategy is shown below.
Random_Allocation_Device Approach(RAD)
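The RAD pseudocode table itself is not reproduced in this text; the following is a minimal sketch of the idea it describes, with illustrative function and container names (an SM is picked at random for every pending task, without any look at SM state):

```cpp
#include <cstdlib>
#include <vector>

// Sketch of the RAD idea: every pending task is mapped to a randomly chosen SM,
// ignoring SM state entirely. Names are illustrative, not the patent's pseudocode.
void radSchedule(const std::vector<int>& pendingTasks, int numSM,
                 std::vector<int>& assignedSM /* out: SM index per task */) {
    assignedSM.resize(pendingTasks.size());
    for (size_t i = 0; i < pendingTasks.size(); ++i) {
        assignedSM[i] = std::rand() % numSM;   // pick an SM uniformly at random (numSM > 0 assumed)
    }
}
```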
This method is advantageous in time complexity: because it does not consider the differences between the SMs as resources, a pending task can locate an SM very quickly and execute at once. However, as the algorithm shows, the method does not consider the differences between the SMs as resources, and it cannot correlate the characteristics of the task to be executed with the SM that will execute it. For example, when a task is randomly assigned to some SM, that SM may well still be occupied by other tasks and may not be released for quite a long time. Under such a random assignment strategy, tasks cannot be executed in time and the real-time behaviour is very poor.
Four: task scheduling strategy based on device release state (Device Free Based)
Taking the usage state of the devices (SMs) into account and scheduling tasks based on the device (SM) state is a common approach. The DFB (Device Free Based) algorithm below, for example, considers the release state of the SMs. The function IsSMfree(i) queries the state of a device; if a device has been released, the released SM resource is placed into a resource pool to be used equally by all tasks.
Device_Free_based Approach(DFB)
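The DFB pseudocode is likewise not reproduced here; the following is a minimal sketch of the release-based idea, where smFree[i] stands in for the IsSMfree(i) query and the pooling details are illustrative assumptions:

```cpp
#include <vector>

// Sketch of the DFB idea: a task is only handed to an SM that has already released
// its previous work. smFree[i] plays the role of the IsSMfree(i) query; released SMs
// are pooled and served in order. Returns -1 when no SM has been released yet.
int dfbPickSM(const std::vector<bool>& smFree) {
    std::vector<int> freePool;                       // "resource pool" of released SMs
    for (int i = 0; i < static_cast<int>(smFree.size()); ++i) {
        if (smFree[i]) freePool.push_back(i);        // collect every released SM
    }
    if (freePool.empty()) return -1;                 // nothing released yet: caller must wait
    return freePool.front();                         // serve the next task from the pool
}
```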
Because it considers the state of the SMs, this scheduling method can in most cases provide better performance than random assignment. However, it cannot guarantee the consistency of task execution, because it only considers whether the current device is occupied by a previous task. In other words, the method does not consider the influence of the temporal and spatial correlation of the programs within a task; this correlation may cause frequent switching of tasks among the SMs and lead to unnecessary energy loss. At the same time, this task correlation also causes a considerable amount of task migration among the SMs, which further reduces SM execution efficiency.
Five: a further strategy refines the use of the SM state; this is the algorithm strategy based on SM history performance (Performance History Based), which uses the ratio of GPU execution times. When selecting an SM, this method reads the historical execution-time information before executing a task and computes a history ratio value (PHB), referred to as the history resource usage frequency of an SM, by the following formula:
Ratio[i] = Execution time on SM_i / Execution time on SM_(i+1)
This parameter describes each SM in terms of execution-time frequency and essentially reflects the temporal and spatial correlation of the programs executing on an SM; it can therefore improve the utilization of the whole SM resource so as to optimize energy consumption.
According to this strategy, if an SM is used more frequently, the task affinity of that SM is relatively good and its execution efficiency is relatively high, so it is given priority when new tasks are assigned. The pseudocode of this usage-frequency-based scheduling method is shown below.
Performance_history_Based Approach(PHB)
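The PHB pseudocode is not reproduced here; the sketch below illustrates the idea under the assumption that each SM's history ratio follows the Ratio[i] formula above (accumulated execution time on SM_i relative to SM_(i+1), wrapped at the last SM) and that the SM with the largest value is preferred. The bookkeeping details are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Sketch of the PHB idea: keep the accumulated execution time of each SM, derive a
// history ratio per SM from the Ratio[i] formula (here wrapped at the last SM, an
// assumption), and hand new tasks to the SM with the highest value.
int phbPickSM(const std::vector<double>& historyExecTime /* per SM, non-empty */) {
    const int numSM = static_cast<int>(historyExecTime.size());
    std::vector<double> ratio(numSM, 0.0);
    for (int i = 0; i < numSM; ++i) {
        double next = historyExecTime[(i + 1) % numSM];        // neighbouring SM, wrapped
        ratio[i] = next > 0.0 ? historyExecTime[i] / next : 0.0;
    }
    // Prefer the SM whose history usage frequency is largest.
    return static_cast<int>(std::max_element(ratio.begin(), ratio.end()) - ratio.begin());
}
```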
Although the differences between the individual SMs in the GPU are taken into account, the method fails to consider the remaining execution time of the program currently running on the corresponding SM, which leads to over-use of some SMs and under-use of others. For example, if all pending programs were assigned to the first SM according to the history resource usage frequency computed above, the first SM would be over-used and the remaining SMs would become under-used.
In conclusion problem of the existing technology is:
(1) loss caused by task migrates on each SM, such as the dispatching algorithm based on equipment release are not accounted for (device history-based algorithm) considers preferentially using the SM being released, but there is no consideration task moving on SM It moves;Probably due to being migrated repeatedly between SM for multiple tasks and generate a large amount of energy consumption, weaken the energy consumption and performance of GPU.
(2) device resource and the task of execution overall comprehensively consider;Algorithm based on the performance of SM history Tactful (Performance History Based) although in the utilization rate of SM is considered, there is no combine to appoint The migration characteristic of business cannot carry out balanced use to SM, it is inefficient to cause task execution.
The difficulties in solving the above technical problems are:
(1) Current task scheduling algorithms achieve only a low utilization of the GPU's computing units (the streaming multiprocessors, SMs) and cannot jointly consider the influence of the SMs' utilization efficiency and the tasks' characteristics on energy consumption.
(2) Migration is a common phenomenon in task scheduling, and the migration of tasks across the streaming multiprocessors causes energy loss; current scheduling strategies cannot effectively alleviate the task migration phenomenon.
(3) So far there is no effective energy-optimizing task scheduling algorithm that solves the two problems above, so a new algorithm design is needed.
The significance of solving the above technical problems is:
(1) For the problem of low computing-unit utilization, the present invention analyses the factors relevant to the task scheduling energy consumption problem in a GPU environment, creatively abstracts the balanced impact factor CB, which influences task migration, and the history utilization rate HRV of the SMs, and uses the idea of balance to solve the energy loss caused by task migration during scheduling. Through the balancing strategy, the tasks on the SMs are scheduled rationally and the utilization of the computing units is improved, thereby reducing the energy loss of the GPU.
(2) For the energy loss caused by task migration, the present invention jointly uses the computing-resource attribute (the SM history resource utilization) and the task characteristic (the balanced impact factor), and achieves a better energy optimization strategy by combining the two. The task scheduling strategy is optimized by ordering the tasks by balance and ordering the SMs by utilization, reducing the migration of tasks among the SMs and realizing a comprehensive energy consumption optimization of the GPU.
(3) For the energy-optimizing task scheduling algorithm, the present invention constructs the task-balanced scheduling algorithm, the balanced impact factor-device utilization task scheduling strategy (CB-HRV), and builds and implements its algorithmic framework.
In order to effectively alleviate the above problems and to let the GPU be applied more widely and adapt to the diversity of programs, the present invention proposes a balanced scheduling strategy and realizes a low-overhead, portable task energy optimization model. The scheduling strategy of the invention fully considers the migration characteristics of the tasks on the SMs and combines them with the device execution efficiency of the SMs to optimize the GPU-internal energy consumption; it is referred to as the task scheduling strategy based on the balanced impact factor and device utilization (CB-HRV Task Schedule Approach).
Summary of the invention
In view of the problems in the existing technology, the present invention provides a GPU-internal energy consumption optimization method based on task-balanced scheduling.
The invention is realized as follows: a GPU-internal energy consumption optimization method based on task-balanced scheduling uses the balanced impact factor CB of the pending programs together with the SM usage-frequency information PHB and proposes a CB-HRV task scheduling algorithm that combines the balanced impact factor with the SM usage-frequency information. Based on the task execution times, the CB-HRV task scheduling algorithm assigns a group of tasks to the corresponding SMs for execution according to each task's balanced impact factor, so that the scheduled tasks are distributed evenly across the SMs; at the same time, the SM usage-frequency information is used to improve the usage efficiency of the device resources. An energy-optimized task scheduling method is realized by jointly exploiting the information about the pending tasks and the resource information.
Further, the CB-HRV task scheduling algorithm specifically includes:
(1) computing the balanced impact factor CB of each task and computing the utilization rate PHB of each SM resource;
(2) re-ordering the tasks in descending order of their balanced impact factor to form a task queue;
(3) re-ordering the SMs in descending order of the SM resource utilization rate PHB;
(4) matching the sorted task queue, in order, onto the sorted SM queue.
Further, the CB-HRV task scheduling algorithm forms two queues, one being the history usage-frequency queue and the other the task balanced impact factor queue;
the task balanced impact factor queue is denoted TaskBalance[] and is sorted in descending order with the SortTask(TaskBalance[i]) function; the history usage-frequency queue is denoted SmRatio[] and is sorted in descending order with the SortSm(SmRatio[i]) function; the sorted SM history usage frequencies PHB are stored in the array SmRationIndex[], and the sorted task balanced impact factor sequence is stored in the array TaskCBIndex[].
Further, the CB-HRV task scheduling algorithm further comprises:
(1) first computing the balanced impact factor CB of each task with the CalculateBalance(P[i]) function and storing it in the TaskBalance[i] array, then computing the utilization ratio value of each SM resource with CalculateRatio(SM[i]) and storing it in the SmRatio[i] array;
(2) re-ordering the tasks in descending order of the impact factor CB value with SortTask(TaskBalance[i]) and storing the result in the TaskCBIndex[] array, forming the task queue; re-ordering the SMs in descending order of SM utilization with the SortSm(SmRatio[i]) function and storing the result in the SmRationIndex[] array;
(3) matching the sorted task queue, in order, onto the sorted SM queue.
Further, the GPU-internal energy consumption optimization method based on task-balanced scheduling further includes:
a first step of analysing the balanced impact factor information of the pending tasks and sorting it in ascending order; tasks that are more prone to migration are assigned to SMs with relatively high utilization in the GPU, and tasks that suffer little migration interference are assigned to SMs with relatively low utilization;
a second step of forming two queues, the first queue being sorted in descending order of the tasks' balanced impact factor CB (the larger a task's balanced impact factor, the more that task is affected by migration interference) and the second queue, PHB, being sorted in descending order of the SMs' usage frequency (the higher the usage-frequency value PHB, the more frequently the SM is used and the higher its execution efficiency);
a third step of assigning the task with the largest balanced impact factor to the SM with the highest execution efficiency, thereby reducing the influence caused by task migration, and then assigning the tasks, in order of decreasing balanced impact factor, to the SMs in the descending-order queue, realizing the overall optimization strategy;
a fourth step of placing tasks with high balanced impact factors on SMs with higher execution efficiency: given the time information of the pending tasks, the tasks are sorted according to their impact factor information, then the history usage frequencies of the SMs are obtained and sorted, and the task numbers recorded in the task queue of each SM are dispatched to the corresponding SM.
Another object of the present invention is to provide a graphics processor using the above GPU-internal energy consumption optimization method based on task-balanced scheduling.
In conclusion advantages of the present invention and good effect are as follows: the present invention analyzes the correlation of task based access control scheduling balance Theoretical and algorithm, and propose that bonding apparatus state balances scheduling strategy with the task of task characteristic: balanced impact factor-is set Standby utilization rate task scheduling strategy (CB-HRV);Strategy had both considered the Resource Properties of SM while having had also contemplated task in SM Migration characteristic, dynamic workload balance dispatching is realized in GPU using algorithm.Resource of the present invention to each SM Utilization rate is analyzed, and the balance dispatching of task is carried out using the characteristic of task, and the dynamic realized in multiple SM is flat Weighing apparatus scheduling, has reached the energy optimization target of GPU.
The present invention reduces the system energy consumption during task execution by distributing the tasks to the processing units reasonably. The CB-HRV task scheduling method first obtains the time data of a group of tasks and sorts the tasks according to their balanced impact factor, while also sorting the history utilization rates of the SMs that execute the tasks; it then uses the idea of balanced scheduling to assign the task numbers, in descending order, to the processing units sorted in descending order, realizing a balance-based GPU energy optimization scheduling method.
In the experimental section, the present invention compares the four dynamic allocation methods and analyses their merits. The experimental results show that the balanced impact factor-device utilization task scheduling strategy can improve the energy consumption and performance of the GPU by 15.69%. Compared with conventional scheduling schemes based on execution time, the CB-HRV scheme saves 10.38% of energy on average, so the scheduling method of the invention is more effective than the conventional methods; seen from the experimental results, the CB-HRV method proposed by the present invention is effective, reasonable and feasible.
Brief description of the drawings
Fig. 1 is a flowchart of the GPU-internal energy consumption optimization method based on task-balanced scheduling provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of the task migration phenomenon on the SMs provided by an embodiment of the present invention;
In the figure: (a) a task generates an overload phenomenon; (b) rescheduling the task generates the task migration phenomenon.
Fig. 3 is a schematic diagram of the task completion times of different allocation schemes on the SMs provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the balanced impact factors of different benchmark programs provided by an embodiment of the present invention.
Fig. 5 is the Matrix Multiplication task execution timing diagram provided by an embodiment of the present invention.
Fig. 6 is the Matrix Multiplication scheduling time distribution ratio diagram provided by an embodiment of the present invention.
Fig. 7 is the Histogram task execution timing diagram provided by an embodiment of the present invention.
Fig. 8 is the Histogram scheduling time distribution ratio diagram provided by an embodiment of the present invention.
Fig. 9 is the Scalar Products task execution timing diagram provided by an embodiment of the present invention.
Fig. 10 is the Scalar Products scheduling time distribution ratio diagram provided by an embodiment of the present invention.
Fig. 11 is the BlackScholes task execution timing diagram provided by an embodiment of the present invention.
Fig. 12 is the BlackScholes scheduling time distribution ratio diagram provided by an embodiment of the present invention.
Fig. 13 is a schematic diagram of the average energy consumption of the different scheduling schemes (test 1) provided by an embodiment of the present invention.
Fig. 14 is a schematic diagram of the average energy consumption of the different scheduling schemes (test 2) provided by an embodiment of the present invention.
Fig. 15 is a schematic diagram of the average energy consumption of the different scheduling schemes (test 3) provided by an embodiment of the present invention.
Fig. 16 is a comparison diagram of the energy consumption improvement of the CB-HRV scheduling method provided by an embodiment of the present invention.
Specific embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the embodiments. It should be understood that the specific embodiments described here are merely illustrative of the present invention and are not intended to limit it.
The present invention analyses the factors relevant to the task scheduling energy consumption problem in a GPU environment and transforms this problem into a task scheduling problem. The balanced impact factor of a task is creatively introduced, the energy loss caused by task migration during scheduling is addressed with the idea of balance, and the tasks on the SMs are scheduled rationally through the balancing strategy to reduce the energy loss of the GPU. The resource attributes and the balance characteristics of the tasks are used together, and a better energy optimization strategy is realized by combining the two for balanced scheduling. The task scheduling strategy is optimized by ordering the tasks by balance and the SMs by utilization, realizing a comprehensive energy consumption optimization of the GPU. The task-balanced scheduling algorithm, the balanced impact factor-device utilization task scheduling strategy (CB-HRV), is constructed and implemented in code.
The application principle of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the GPU-internal energy consumption optimization method based on task-balanced scheduling provided by an embodiment of the present invention includes the following steps:
S101: analysing the balanced impact factor information of the pending tasks and sorting it in ascending order; tasks that are more prone to migration are assigned to SMs with relatively high utilization in the GPU, and tasks that suffer little migration interference are assigned to SMs with relatively low utilization;
S102: forming two queues, the first queue being sorted in descending order of the tasks' balanced impact factor CB (the larger a task's balanced impact factor, the more that task is affected by migration interference) and the second queue, PHB, being sorted in descending order of the SMs' usage frequency (the higher the usage-frequency value PHB, the more frequently the SM is used and the higher its execution efficiency);
S103: assigning the task with the largest balanced impact factor to the SM with the highest execution efficiency, thereby reducing the influence caused by task migration; the tasks are then assigned, in order of decreasing balanced impact factor, to the SMs in the descending-order queue, realizing the overall optimization strategy;
S104: placing tasks with high balanced impact factors on SMs with higher execution efficiency: given the time information of the pending tasks, the tasks are sorted according to their impact factor information, then the history usage frequencies of the SMs are obtained and sorted, and the task numbers recorded in the task queue of each SM are dispatched to the corresponding SM.
The application principle of the present invention is further described below with reference to the accompanying drawings.
1. The task migration phenomenon in SM task scheduling
In task scheduling inside a GPU, some tasks cannot be completed on a single fixed SM, and the migration of tasks among the SMs is an unavoidable phenomenon. In Fig. 2 there are two tasks, and the scheduler in the SM dynamically monitors its scheduling queue. Fig. 2 shows the migration process of a task between two SMs. Part (a) is the process without task migration, in which the two tasks execute on the two SMs respectively. As can be seen from Fig. 2, task 1 has been allocated to execute on SM1 and SM2, and the scheduler allocates the second task (shown in orange) to SM2. A part of task 1 and task 2 then execute on SM2 at the same time, causing SM2 to overload. At this point, to prevent the SM from crashing due to overload, the scheduler reschedules task 2 to execute on the more lightly loaded SM1, as shown in Fig. 2.
The task scheduler can redistribute tasks in this way to avoid GPU exceptions, but at the same time task migration causes redundant overhead in the GPU, because tasks have to switch between SMs. If n tasks enter the SM components, the overhead generated by task migration keeps growing and increases the energy consumption. A task scheduling strategy should therefore avoid the migration of tasks among the SMs as far as possible.
2. Analysis of the balanced scheduling strategy
If a task has multiple warps, those warps need to execute on multiple SMs. When the execution efficiency of a task is considered, the comprehensive completion time of the task across the SMs on which it executes must be considered. Fig. 3 illustrates the execution of a task on two SMs (SM1, SM2).
Fig. 3 shows the difference produced when two different task assignment policies handle the same task on the two SMs. The task needs 100 clock cycles to complete. In the first case, the assignment policy allocates 20 cycles of the task to SM1 and hands the remaining 80 clock cycles to SM2, so this assignment scheme needs 80 clock cycles in total to finish. The second assignment scheme allocates 40 clock cycles of the task to SM1 and the other 60 clock cycles to SM2; under this scheme the whole task finishes in 60 cycles. Clearly the latter scheme (60 clock cycles) finishes faster. The second workload assignment scheme uses the SM resources inside the GPU more effectively and is the more balanced allocation strategy.
To quantify the difference between the two strategies, the balanced impact factor CB (coefficient of balance) is abstracted; it is defined from the standard deviation σ and the mean λ of the SM completion times. Computed for the schemes above, the CB value of the first allocation scheme is 72.8% and the CB value of the second scheme is 45.1%. To find the pattern of the balanced impact factor across tasks, the CB values of a group of CUDA benchmark programs were tested. The experiment selected four typical CUDA benchmark programs: Matrix Multiplication, Histogram, Scalar Products and BlackScholes. The table below details the parameters of the four benchmark programs used in the experiment.
Table 1: benchmark programs used in the experiment
These four groups of benchmark programs were tested with a simulator and their CB values were calculated; the results are shown in Fig. 4. The CB value of MM is about 40% and the CB value of HG exceeds 60%; the CB values of SP and BS are relatively small, SP being about 35% and the CB value of BS being less than 5%. During application execution the task scheduler monitors such workload imbalance and reassigns unfinished warps from an overloaded SM to an adjacent SM, using the SMs more efficiently. This shows that different task characteristics have different balanced impact factors. The experiments indicate that a task's balanced impact factor is proportional to its migration, that is, the larger the balanced impact factor of a task, the more likely it is to migrate between SMs, which affects the task completion efficiency and is ultimately reflected in the energy consumption. This observation inspired the present invention to sort all tasks in a certain way and then allocate the tasks to the SMs according to that order, so as to improve the utilization of each SM and reach the overall energy optimization target.
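The defining formula for CB is not reproduced in this text; a coefficient-of-variation form, CB = σ / λ with σ and λ as defined above, is consistent with that definition and is assumed in the following minimal sketch (the function name and the container are illustrative):

```cpp
#include <cmath>
#include <vector>

// Sketch of the balanced impact factor under the assumption CB = sigma / lambda,
// where lambda is the mean and sigma the standard deviation of the per-SM completion times.
double coefficientOfBalance(const std::vector<double>& smCompletionTimes /* non-empty */) {
    const double n = static_cast<double>(smCompletionTimes.size());
    double mean = 0.0;
    for (double t : smCompletionTimes) mean += t;
    mean /= n;                                         // lambda: average SM completion time
    double variance = 0.0;
    for (double t : smCompletionTimes) variance += (t - mean) * (t - mean);
    variance /= n;
    return std::sqrt(variance) / mean;                 // sigma / lambda
}
```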
3. Scheduling strategy based on the balanced impact factor of tasks and the utilization of the computing resources
To avoid task migration and realize a balanced task assignment policy, a balanced scheduling strategy is sought that overcomes the shortcomings of the existing scheduling strategies. It should (1) avoid unnecessary migration of tasks among the SMs as far as possible, so as to reduce the energy loss generated by task migration; (2) use resource attributes and task characteristics together: current schemes generally use either the SM state or the task characteristics alone as the basis of the scheduling strategy, whereas combining the two achieves a better energy optimization strategy; and (3) schedule the tasks on the SMs rationally through a balancing strategy to realize a comprehensive energy consumption optimization of the GPU, which requires fully considering the balance of the task schedule and using the SM computing resources more effectively.
In a heterogeneous system, assigning appropriate pending tasks to the SMs is one of the important factors that determine the system execution time. Therefore, by jointly considering the balanced impact factor CB and the SM history usage-frequency information PHB, the pending tasks can be assigned to appropriate SMs in a reasonable way, realizing balanced scheduling of the tasks, improving SM utilization and achieving the energy optimization goal.
In view of the deficiencies of the existing methods, the balanced impact factor CB of the pending programs and the SM usage-frequency information PHB are used together, and the CB-HRV task scheduling strategy (CB-HRV Scheduling Scheme), which combines the balanced impact factor with the SM usage-frequency information, is proposed. The idea of the CB-HRV task scheduling algorithm is to assign a group of tasks to the corresponding SMs for execution according to each task's balanced impact factor, based on the task execution times. Scheduling according to this strategy distributes the tasks evenly across the SMs and effectively avoids their migration among the SMs; at the same time, the SM usage-frequency information is used to improve the usage efficiency of the device resources. An energy-optimized task scheduling method is realized by jointly exploiting the information about the pending tasks and the resource information.
The specific method is to analyse the balanced impact factors of the tasks to be distributed and the history utilization of the SMs, and to use these two pieces of information together to schedule the tasks in a balanced and efficient way. First, the balanced impact factor information of the pending tasks is analysed and sorted in ascending order. The smaller the balanced impact factor, the more evenly the task is distributed across the SMs and the less it suffers from task migration; conversely, the larger the balanced impact factor, the more unevenly the task is distributed across the SMs and the more it is disturbed by task migration. Based on this analysis, tasks that are more prone to migration are assigned to SMs with relatively high utilization in the GPU, and tasks that suffer little migration interference are assigned to SMs with relatively low utilization. This reduces the overall migration of tasks among the SMs while improving SM utilization, improving the energy optimization of the GPU as a whole.
In the concrete implementation, two queues are formed, as shown in Fig. 5. The first queue is sorted in descending order of the tasks' balanced impact factor CB: the larger a task's balanced impact factor, the more that task is affected by migration interference. The second queue, PHB, is sorted in descending order of the SMs' usage frequency: the higher the usage-frequency value PHB, the more frequently the SM is used and the higher its execution efficiency. Then the task with the largest balanced impact factor is assigned to the SM with the highest execution efficiency, thereby reducing the influence caused by task migration. The tasks are assigned, in order of decreasing balanced impact factor, to the SMs in the descending-order queue, realizing the overall optimization strategy.
By placing tasks with a high balanced impact factor (tasks with a relatively high migration tendency) on SMs with higher execution efficiency (higher PHB values), a better energy consumption effect can be obtained. Concretely, given the time information of the pending tasks, the tasks are first sorted according to their impact factor information, then the history usage frequencies of the SMs are obtained and sorted, and finally the task numbers recorded in the task queue of each SM are dispatched to the corresponding SM.
4. The scheduling algorithm of the CB-HRV task scheduling strategy
The specific implementation steps of the CB-HRV task scheduling algorithm are as follows:
(1) compute the balanced impact factor CB of each task and the utilization rate PHB of each SM resource;
(2) re-order the tasks in descending order of their balanced impact factor to form a task queue;
(3) re-order the SMs in descending order of the SM resource utilization rate PHB;
(4) match the sorted task queue, in order, onto the sorted SM queue.
The CB-HRV algorithm needs to form two queues, one being the history usage-frequency queue and the other the task balanced impact factor queue, and each of them is sorted. The task balanced impact factor queue is denoted TaskBalance[] and is sorted in descending order with the SortTask(TaskBalance[i]) function; the history usage-frequency queue is denoted SmRatio[] and is sorted in descending order with the SortSm(SmRatio[i]) function. The sorted SM history usage frequencies PHB are stored in the array SmRationIndex[], and the sorted task balanced impact factor sequence is stored in the array TaskCBIndex[]. The pseudocode of the CB-HRV scheduling algorithm is given in Table 2.
Table 2: the balanced impact factor-device utilization task scheduling algorithm
CB-HRV Task Scheduling Approach
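The pseudocode table itself is not reproduced in this text, and the row numbers cited in the description that follows refer to that original table. The following is a minimal sketch of the strategy under the stated assumptions; the array names follow the description above, while the handling of surplus tasks in groups of NumSM is an illustrative reading of the second case described below:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sketch of the CB-HRV strategy: TaskBalance holds the balanced impact factor CB of each
// pending task (as produced by CalculateBalance(P[i])), SmRatio holds the history
// utilization PHB of each SM (CalculateRatio(SM[i])). Returns an SM index for every task.
std::vector<int> cbHrvSchedule(const std::vector<double>& TaskBalance,
                               const std::vector<double>& SmRatio /* at least one SM */) {
    const int numTasks = static_cast<int>(TaskBalance.size());
    const int NumSM    = static_cast<int>(SmRatio.size());

    // TaskCBIndex / SmRationIndex: task and SM indices sorted in descending order
    // of CB and of history utilization, respectively.
    std::vector<int> TaskCBIndex(numTasks), SmRationIndex(NumSM);
    std::iota(TaskCBIndex.begin(), TaskCBIndex.end(), 0);
    std::iota(SmRationIndex.begin(), SmRationIndex.end(), 0);
    std::sort(TaskCBIndex.begin(), TaskCBIndex.end(),
              [&](int a, int b) { return TaskBalance[a] > TaskBalance[b]; });
    std::sort(SmRationIndex.begin(), SmRationIndex.end(),
              [&](int a, int b) { return SmRatio[a] > SmRatio[b]; });

    // Match the sorted task queue onto the sorted SM queue: the task with the largest CB
    // is placed on the SM with the highest utilization, and so on. With more tasks than
    // SMs, tasks are consumed in groups of NumSM over the same SM ordering.
    std::vector<int> assignedSM(numTasks);
    for (int k = 0; k < numTasks; ++k) {
        assignedSM[TaskCBIndex[k]] = SmRationIndex[k % NumSM];
    }
    return assignedSM;
}
```

For example, with the illustrative values TaskBalance = {0.45, 0.73, 0.05} and SmRatio = {1.4, 0.8}, the sketch places task 1 (largest CB) on SM 0 (highest utilization), task 0 on SM 1, and task 2 back on SM 0.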
The algorithm is described as follows:
(1) Rows 3-5 first compute the balanced impact factor CB of each task with the CalculateBalance(P[i]) function and store it in the TaskBalance[i] array; rows 6-8 then compute the utilization ratio value of each SM resource with CalculateRatio(SM[i]) and store it in the SmRatio[i] array.
(2) Row 9 re-orders the tasks in descending order of the impact factor CB value with SortTask(TaskBalance[i]) and stores the result in the TaskCBIndex[] array, forming the task queue. Row 10 re-orders the SMs in descending order of SM utilization with the SortSm(SmRatio[i]) function and stores the result in the SmRationIndex[] array.
(3) Rows 11-31 match the sorted task queue, in order, onto the sorted SM queue. Two cases are distinguished according to the number of tasks (P). The first case is that the number of tasks is not greater than the number of SMs, handled by rows 11-18: the sorted tasks are mapped directly onto the corresponding SMs. The function isSMfree(i) first judges whether the SM taking part in the distribution is available; if it is, the task is assigned to the corresponding SM with the function mappingSM[i](P[i]). The second case is that the number of tasks is greater than the number of SMs, handled by rows 19-28: the tasks are divided into several groups (each of NumSM tasks) and each task is still assigned to the corresponding SM with the function mappingSM[j](P[i]). Rows 29-30 release the two index arrays for use in the next distribution.
The application effect of the present invention is explained in detail below with reference to the experiments.
To compare the RAD, DFB, PHB and CB-HRV scheduling methods, a performance comparison and analysis of the four scheduling methods was carried out. The experiment is divided into three parts. The first part is the experimental design and the description of the experimental environment parameters. The second part is the analysis of the balance of the CB-HRV algorithm: the present invention compares the balance of the task distribution on the SMs for the four scheduling algorithms RAD, DFB, PHB and CB-HRV and analyses it. The third part is the energy consumption analysis of the CB-HRV algorithm: the present invention compares and analyses the energy consumption and performance of the RAD, DFB, PHB and CB-HRV scheduling algorithms.
1. Experimental design and description of the experimental parameters
(1) The implementation steps of the experiment are as follows:
(a) Obtain the execution time of each program on the GPU. The 20 test programs are run in turn with the RAD, DFB, PHB and CB-HRV scheduling methods and the execution timing diagrams on the 4 SMs are obtained; from these, the proportion of execution time occupied on each SM is calculated.
(b) The above test programs are run in turn with the RAD, DFB, PHB and CB-HRV scheduling methods, and a power meter measures the power consumption of the whole device from the measured current and its input voltage, so as to obtain the power consumption data of the four different scheduling methods.
(2) Experimental environment
The software and hardware environment of this experiment is shown in Table 3. The hardware environment of the heterogeneous system uses an i5-7500 CPU and four NVIDIA GeForce GTX 1060 video cards. The system memory of the hardware environment is 8 GB and the GPU memory is 6 GB. The GPU uses the Pascal architecture; it has 10 SMs (streaming multiprocessors), each SM contains 128 CUDA cores, for 1280 CUDA cores in total, providing 4.4 TFLOPS of floating-point computing capability. The software environment is Windows 10, VS2015 and CUDA 9.2. The following table gives the details.
Table 3: hardware environment used in the experiment
2. Analysis of the balance of the CB-HRV scheduling algorithm
The present invention runs 20 Matrix Multiplication, Histogram, Scalar Products and BlackScholes test programs separately on the GPU and obtains, for every case, the execution timing scheme on the 4 SMs within one second of execution.
(1) Analysis of the Matrix Multiplication test program
The present invention shows the distribution of the task execution times over the 4 SMs. Fig. 5 illustrates the task execution timing diagram of the 4 SMs within one second for the case of 20 Matrix Multiplication instances and their corresponding input ranges.
To study more intuitively the proportion of execution time that Matrix Multiplication occupies on the 4 SMs, the present invention displays it with a pie chart.
As can be seen from Fig. 6, under the RAD method the SM with the largest execution time accounts for 38.71% of the whole execution time and the SM with the smallest execution time accounts for 14.66%, a difference of 24.05% of the execution time. Under the DFB method the SM with the largest execution time accounts for 34.79% of the whole execution time and the SM with the smallest accounts for 14.66%, a difference of 17.64% of the execution time. Under the PHB method the SM with the largest execution time accounts for 34.79% of the whole execution time and the SM with the smallest accounts for 18.26%, a difference of 16.53% of the execution time. Under the CB-HRV method the largest execution time accounts for 27.67% of the whole execution time and the smallest for 23.06%, a difference of 4.61% of the execution time. It can clearly be seen that the task distribution of the CB-HRV scheduling algorithm on the 4 SMs is more balanced than that of the other three scheduling methods.
(2) Analysis of the Histogram test program
The present invention shows the distribution of the task execution times over the 4 SMs. Fig. 7 illustrates the task execution timing diagram of the 4 SMs within one second for the case of 20 Histogram instances and their corresponding input ranges. To study more intuitively the proportion of execution time that Histogram occupies on the 4 SMs, the present invention displays it with a pie chart.
As can be seen from Fig. 8, under the RAD method the SM with the largest execution time accounts for 32.88% of the whole execution time and the SM with the smallest accounts for 16.78%, a difference of 16.10% of the execution time. Under the DFB method the largest accounts for 31.78% and the smallest for 18.93%, a difference of 12.85%. Under the PHB method the largest accounts for 30.76% and the smallest for 20.64%, a difference of 9.11%. Under the CB-HRV method the largest accounts for 25.12% and the smallest for 24.85%, a difference of only 0.27% of the execution time. It can clearly be seen that the task distribution of the CB-HRV scheduling algorithm on the 4 SMs is more balanced than that of the other three scheduling methods.
(3) Analysis of the Scalar Products test program
The present invention shows the distribution of the task execution times over the 4 SMs. Fig. 9 illustrates the task execution timing diagram of the 4 SMs within one second for the case of 20 Scalar Products instances and their corresponding input ranges. To study more intuitively the proportion of execution time that the test program occupies on the 4 SMs, the present invention displays it with a pie chart.
As can be seen from Fig. 10, under the RAD method the SM with the largest execution time accounts for 38.50% of the whole execution time and the SM with the smallest accounts for 15.29%, a difference of 23.21% of the execution time. Under the DFB method the largest accounts for 35.57% and the smallest for 19.35%, a difference of 16.22%. Under the PHB method the largest accounts for 34.58% and the smallest for 19.32%, a difference of 15.26%. Under the CB-HRV method the largest accounts for 29.23% and the smallest for 21.43%, a difference of 7.80% of the execution time. It can be seen that the task distribution of the CB-HRV scheduling algorithm on the 4 SMs is more balanced than that of the other three scheduling methods.
(4) Analysis of the BlackScholes test program
The present invention shows the distribution of the task execution times over the 4 SMs. Fig. 11 illustrates the task execution timing diagram of the 4 SMs within one second for the case of 20 BlackScholes instances and their corresponding input ranges.
To study more intuitively the proportion of execution time that the test program occupies on the 4 SMs, the present invention displays it with a pie chart.
As can be seen from Fig. 12, under the RAD method the SM with the largest execution time accounts for 38.54% of the whole execution time and the SM with the smallest accounts for 13.82%, a difference of 24.72% of the execution time. Under the DFB method the largest accounts for 34.61% and the smallest for 16.92%, a difference of 17.69%. Under the PHB method the largest accounts for 34.42% and the smallest for 17.42%, a difference of 17.00%. Under the CB-HRV method the largest accounts for 32.32% and the smallest for 18.36%, a difference of 13.96% of the execution time. It can be seen that the task distribution of the CB-HRV scheduling algorithm on the 4 SMs is more balanced than that of the other three scheduling methods.
From the balance analysis of the four test programs, the data show that the balance of the CB-HRV scheduling method is the best of the four scheduling schemes in every test. The order of improvement, from largest to smallest, is Histogram, Matrix Multiplication, Scalar Products and BlackScholes, which matches the task migration characteristics of the four test programs. The reason the improvement for Histogram is the best is that, among the four test programs, the task migration phenomenon is most evident in the Histogram test program.
Judging from the proportions of execution time, CB-HRV scheduling distributes tasks more evenly and reasonably than RAD, DFB and PHB scheduling. This verifies the balance characteristic of the CB-HRV scheduling method; this balance can effectively reduce the migration of tasks among the SMs and makes more effective use of the computing capability of all SMs. Ultimately, this balance effectively reduces the energy consumption of the GPU.
3. Energy consumption analysis of the CB-HRV scheduling algorithm
To verify the energy consumption characteristics of the CB-HRV scheduling algorithm, the present invention measures the average energy consumption of the four benchmark programs under different scheduling strategies and different input conditions, and compares and analyses them against the energy consumption characteristics of the RAD, DFB and PHB scheduling strategies. Table 4 details the parameters of the four benchmark programs used in the experiment.
Table 4
Present invention uses 18 kinds of situations to emulate to each execution sequence by the present invention.Due to this four programs for Energy consumption and performance have strong influence to scheduler program, therefore the present invention repeatedly tests all sequences in four programs. Then, average numerical value has been used to be compared each case.Measurement for energy consumption, in this experiment, the present invention use Power meter is as system energy consumption measuring tool.The energy consumption that it is measured is to be input to the energy consumption of heterogeneous system.
In an experiment, the present invention has selected typical dispatching method to be compared.These methods be respectively RAD, DFB and PHB.In an experiment, since the performance of RAD and DFB scheduling mode and system energy consumption are larger by the execution sequence of pending program. It is obtained for this purpose, using the method for being performed a plurality of times and averaging.Since the experimental situation of this experiment is GPU all in system All be it is identical, therefore, the sequence that program executes equally produces influences on PHB method, same using more in an experiment The secondary method averaged that executes obtains.In an experiment, the CB-HRV implementation method of this experiment is as follows, first to this experiment institute The method of proposition is realized in VS2015;After obtaining CB-HRV output result, the program sequence obtained after output is utilized Column, which reprogram, to be run.
To verify the energy-consumption characteristics of the CB-HRV scheduling algorithm, the present invention measures the average energy consumption of the four benchmark programs under different scheduling strategies and different input conditions. The methods compared are the RAD, DFB and PHB scheduling strategies, and the input conditions cover typical task counts and the typical inputs of each task. Table 5 below gives the energy consumption of RAD, DFB, PHB and CB-HRV scheduling in the three tests.
Table 5
For a clearer comparison, the present invention also shows the data of the three tests as histograms, in which the abscissa indicates the different scheduling methods and the ordinate indicates the energy consumed.
The number of incoming tasks in test 1 is 24, and the energy-consumption measurement results are shown in Figure 13. The figure shows that the RAD method consumes the most energy on average, the average energy consumed by DFB and PHB is comparable, and CB-HRV consumes the least; Figure 13 displays the energy consumption of the various methods. Overall, with 24 incoming tasks and the corresponding input ranges, the CB-HRV method achieves an average energy saving of 13.53% compared with the RAD method, 5.16% compared with DFB, and 5.60% compared with the PHB method.
The number of incoming tasks in test 2 is 40. Figure 14 shows that the RAD method consumes the most energy, with DFB and PHB next. Compared with the RAD method, the CB-HRV method achieves an average energy saving of 11.41%; compared with DFB, 5.14%; and compared with the PHB method, CB-HRV saves 3.93% of the energy on average.
The energy-consumption measurement results of test 3 are shown in Figure 15: the RAD method consumes the most energy, DFB and PHB have the same average energy consumption, and CB-HRV consumes the least. Compared with the RAD method, the CB-HRV method achieves an average energy saving of 7.97%; compared with DFB, 4.95%; and compared with the PHB method, CB-HRV saves 4.48% of the energy on average.
To show the energy-saving effect of the CB-HRV method more clearly, the present invention presents the energy-consumption results of the above three experiments as an improvement-rate comparison chart, shown in Figure 16.
In conclusion under the experiment of different number of tasks and different input range situations, the CB-HRV of this experiment proposition Method can evenly distribute task compared with RAD, DFB and PHB.From energy consumption effect, average compared with RAD method energy saving 10.97%, Compared with the average energy conservation 5.09% of DFB, compared with the average energy conservation 4.67% of PHB.
The RAD method consumes the most energy on average because it considers only the number of tasks and takes into account neither the state of the SMs nor the state of the tasks; considering only the task count while ignoring SM state and execution time therefore leads to higher energy consumption. The average energy consumed by DFB and PHB is almost the same because, in an environment with identical GPUs, the PHB scheduling method degenerates into the DFB method. Since DFB and PHB consider the usage state of the SMs in the system, they distribute tasks better than the RAD method, and in the figure the energy consumption of the DFB and PHB methods is lower than that of the RAD method.
As the figure shows, the CB-HRV method saves more energy than the other three methods, which is attributed to the balance of the CB-HRV algorithm. Because tasks are distributed among the SMs more reasonably and effectively, task migration between SMs is reduced, so the new algorithm achieves a better energy-optimization effect. The experimental results show that the CB-HRV method proposed here has better task-balance characteristics as well as better energy consumption and performance.
The above is merely a preferred embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. A GPU internal energy consumption optimization method based on task-balanced scheduling, characterized in that the GPU internal energy consumption optimization method based on task-balanced scheduling uses the balance impact factor CB of the pending programs and the SM usage-frequency information PHB to propose the CB-HRV task scheduling algorithm, which combines the balance impact factor with the SM usage-frequency information; based on the execution time of the tasks, the CB-HRV task scheduling algorithm assigns a group of tasks to the corresponding SMs for execution according to their balance impact factors; the scheduling strategy assigns tasks to the SMs in a balanced manner; meanwhile, the SM usage-frequency information is used to improve the usage efficiency of device resources; the task scheduling method achieving energy optimization is realized through the integrated application of pending-task information and resource information.
2. The GPU internal energy consumption optimization method based on task-balanced scheduling according to claim 1, characterized in that the CB-HRV task scheduling algorithm specifically comprises:
(1) calculating the balance impact factor CB of each task and the utilization rate PHB of each SM resource;
(2) re-sorting the tasks in descending order of their balance impact factor to form a task queue;
(3) re-sorting the SMs in descending order of the SM resource utilization rate PHB;
(4) matching the sorted task queue to the sorted SM queue in order; a sketch of this matching procedure is given below.
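The following is a minimal C++ sketch of these four steps (for illustration only and not part of the claims; the struct names, the ScheduleCbHrv function and the wrap-around when tasks outnumber SMs are assumptions, and the CB and PHB values are taken as already computed):

    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Task { int id; double cb;  };  // cb : balance impact factor CB of the task
    struct Sm   { int id; double phb; };  // phb: usage-frequency information PHB of the SM

    // Steps (2)-(4): sort tasks by descending CB, sort SMs by descending PHB,
    // then match the two sorted queues position by position.
    std::vector<std::pair<int, int>> ScheduleCbHrv(std::vector<Task> tasks,
                                                   std::vector<Sm> sms) {
        std::sort(tasks.begin(), tasks.end(),
                  [](const Task& a, const Task& b) { return a.cb > b.cb; });
        std::sort(sms.begin(), sms.end(),
                  [](const Sm& a, const Sm& b) { return a.phb > b.phb; });
        std::vector<std::pair<int, int>> assignment;  // (task id, SM id) pairs
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            // Wrap around if there are more tasks than SMs (an assumption,
            // not stated explicitly in the claim).
            assignment.emplace_back(tasks[i].id, sms[i % sms.size()].id);
        }
        return assignment;
    }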
3. The GPU internal energy consumption optimization method based on task-balanced scheduling according to claim 2, characterized in that the CB-HRV task scheduling algorithm forms two queues, one being the history usage-frequency queue and the other being the task balance impact factor queue;
The task balance impact factor queue is denoted TaskBalance[] and is sorted in descending order with the SortTask(TaskBalance[i]) function; the history usage-frequency queue is denoted SmRatio[] and is sorted in descending order with the SortSm(SmRatio[i]) function; the sorted SM history usage frequencies PHB are stored in the array SmRationIndex[]; the sorted task balance impact factor sequence is stored in the array TaskCBIndex[].
4. The GPU internal energy consumption optimization method based on task-balanced scheduling according to claim 2, characterized in that the CB-HRV task scheduling algorithm further comprises:
(1) first calculating the balance impact factor CB of each task with the CalculateBalance(P[i]) function and storing it in the TaskBalance[i] array, then calculating the utilization ratio of each SM resource with the CalculateRatio(SM[i]) function and storing it in the SmRatio[i] array;
(2) re-sorting the tasks in descending order of the impact factor CB value with SortTask(TaskBalance[i]) and storing the result in the TaskCBIndex[] array to form a task queue; re-sorting the SMs in descending order of SM utilization rate with the SortSm(SmRatio[i]) function and storing the result in the SmRationIndex[] array;
(3) matching the sorted task queue to the sorted SM queue in order, as in the sketch that follows.
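For illustration, the sketch below shows how the arrays and helper functions named in claims 3 and 4 might fit together (an assumption-laden illustration, not part of the claims: the claims do not define how CB and PHB are computed, so CalculateBalance and CalculateRatio return dummy values here, and the dispatch step and the wrap-around over SMs are likewise assumed):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // Placeholders for CalculateBalance(P[i]) and CalculateRatio(SM[i]);
    // the real metrics are not specified in the claims.
    double CalculateBalance(int taskId) { return static_cast<double>(taskId % 7); }
    double CalculateRatio(int smId)     { return static_cast<double>(smId % 3);  }

    void CbHrvSchedule(int numTasks, int numSms) {
        // Step (1): fill TaskBalance[] and SmRatio[].
        std::vector<double> TaskBalance(numTasks), SmRatio(numSms);
        for (int i = 0; i < numTasks; ++i) TaskBalance[i] = CalculateBalance(i);
        for (int i = 0; i < numSms;   ++i) SmRatio[i]     = CalculateRatio(i);

        // Step (2): SortTask / SortSm realised as index arrays sorted by
        // descending value, stored in TaskCBIndex[] and SmRationIndex[].
        std::vector<int> TaskCBIndex(numTasks), SmRationIndex(numSms);
        std::iota(TaskCBIndex.begin(), TaskCBIndex.end(), 0);
        std::iota(SmRationIndex.begin(), SmRationIndex.end(), 0);
        std::sort(TaskCBIndex.begin(), TaskCBIndex.end(),
                  [&](int a, int b) { return TaskBalance[a] > TaskBalance[b]; });
        std::sort(SmRationIndex.begin(), SmRationIndex.end(),
                  [&](int a, int b) { return SmRatio[a] > SmRatio[b]; });

        // Step (3): match the sorted task queue to the sorted SM queue in order.
        for (int i = 0; i < numTasks; ++i) {
            int task = TaskCBIndex[i];
            int sm   = SmRationIndex[i % numSms];  // wrap-around assumed
            (void)task; (void)sm;                  // dispatch(task, sm) would go here
        }
    }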
5. The GPU internal energy consumption optimization method based on task-balanced scheduling according to claim 1, characterized in that the GPU internal energy consumption optimization method based on task-balanced scheduling further comprises:
in the first step, analyzing the balance impact factor information of the pending tasks and sorting it from small to large; tasks that are more prone to migration are assigned to the SMs in the GPU with relatively high utilization, while tasks little affected by migration interference are assigned to the SMs with relatively low utilization;
in the second step, forming two queues: the first queue is sorted in descending order of the balance impact factor CB of the tasks, and the larger the balance impact factor of a task, the more that task is affected by migration interference; the second queue, PHB, is sorted in descending order of SM usage frequency, and the higher the usage-frequency PHB value, the more frequently the SM is used and the higher its execution efficiency;
in the third step, assigning the task with the largest balance impact factor to the SM with the highest execution efficiency for execution, thereby reducing the impact caused by task migration; the tasks are assigned in turn, in order of decreasing balance impact factor, to the SMs in their descending order, realizing an overall optimization strategy;
in the fourth step, allocating the tasks with high balance impact factors to the SMs with higher execution efficiency: given the time information of the pending tasks, the tasks are sorted according to their impact factor information, then the history usage frequency of each SM is obtained and sorted, and the task numbers recorded in the task queue of each SM are assigned to the corresponding SM.
6. A graphics processor using the GPU internal energy consumption optimization method based on task-balanced scheduling according to any one of claims 1 to 5.
CN201910205801.4A 2019-03-19 2019-03-19 GPU internal energy consumption optimization method based on task balance scheduling Expired - Fee Related CN109992385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910205801.4A CN109992385B (en) 2019-03-19 2019-03-19 GPU internal energy consumption optimization method based on task balance scheduling

Publications (2)

Publication Number Publication Date
CN109992385A true CN109992385A (en) 2019-07-09
CN109992385B CN109992385B (en) 2021-05-14

Family

ID=67130518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910205801.4A Expired - Fee Related CN109992385B (en) 2019-03-19 2019-03-19 GPU internal energy consumption optimization method based on task balance scheduling

Country Status (1)

Country Link
CN (1) CN109992385B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9268601B2 (en) * 2010-04-05 2016-02-23 Nvidia Corporation API for launching work on a processor
CN101901042A (en) * 2010-08-27 2010-12-01 上海交通大学 Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system
US9286114B2 (en) * 2012-12-13 2016-03-15 Nvidia Corporation System and method for launching data parallel and task parallel application threads and graphics processing unit incorporating the same
CN106502632A (en) * 2016-10-28 2017-03-15 武汉大学 A kind of GPU parallel particle swarm optimization methods based on self-adaptive thread beam
CN108121433A (en) * 2017-12-06 2018-06-05 中国航空工业集团公司西安航空计算技术研究所 A kind of method for scheduling task of GPU multicomputer systems
CN109445565A (en) * 2018-11-08 2019-03-08 北京航空航天大学 A kind of GPU QoS guarantee method exclusive and reserved based on stream multiple processor cores

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran: "Controlled Kernel Launch for Dynamic Parallelism in GPUs", 2017 IEEE International Symposium on High Performance Computer Architecture *
Li Junke, Guo Bing, Shen Yan, Li Deguang, Huang Yanhui, Qi Zhengwei: "Research on a power consumption data correction method for GPU built-in sensors", Journal of University of Electronic Science and Technology of China *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532071B (en) * 2019-07-12 2023-06-09 上海大学 Multi-application scheduling system and method based on GPU
CN110532071A (en) * 2019-07-12 2019-12-03 上海大学 A kind of more application schedules system and method based on GPU
CN110991732A (en) * 2019-11-28 2020-04-10 武汉理工大学 Building material equipment manufacturing process optimization scheduling method based on energy consumption clustering
CN110991732B (en) * 2019-11-28 2023-06-16 武汉理工大学 Building material equipment manufacturing process optimization scheduling method based on energy consumption clustering
CN111240461A (en) * 2020-01-09 2020-06-05 黔南民族师范学院 Task scheduling-based heterogeneous computing system low-power consumption method
CN111309475A (en) * 2020-01-21 2020-06-19 上海悦易网络信息技术有限公司 Detection task execution method and device
CN111309475B (en) * 2020-01-21 2022-12-02 上海万物新生环保科技集团有限公司 Detection task execution method and equipment
CN113157407A (en) * 2021-03-18 2021-07-23 浙大宁波理工学院 Dynamic task migration scheduling method for parallel processing of video compression in GPU
CN113157407B (en) * 2021-03-18 2024-03-01 浙大宁波理工学院 Dynamic task migration scheduling method for parallel processing video compression in GPU
CN113448705B (en) * 2021-06-25 2023-03-28 皖西学院 Unbalanced job scheduling algorithm
CN113448705A (en) * 2021-06-25 2021-09-28 皖西学院 Unbalanced job scheduling algorithm
CN115237605A (en) * 2022-09-19 2022-10-25 四川大学 Data transmission method between CPU and GPU and computer equipment
CN117573321A (en) * 2023-11-24 2024-02-20 数翊科技(北京)有限公司 Periodic scheduling task dynamic arrangement method and system based on database

Also Published As

Publication number Publication date
CN109992385B (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210514