CN110515735A

CN110515735A - A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm

Info

Publication number: CN110515735A
Application number: CN201910807351.6A
Authority: CN
Inventors: 李成严; 孙巍; 宋月
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2019-11-29

Abstract

The present invention provides a kind of based on the multiple target cloud resource dispatching method for improving Q learning algorithm.This method is constantly interacted by Agent with environment, and study obtains optimal policy.The present invention passes through Cloudsim cloud computing emulation platform, it is random to generate different task and virtual machine, using the deadline and operating cost for optimizing task simultaneously as optimization aim, it designs a kind of based on the multiple target cloud resource dispatching method for improving Q learning algorithm, the convergence rate of Q learning algorithm is accelerated using the heuristic movement selection strategy for automatically updating weight factor, while improving the optimizing ability of algorithm, to improve the utilization rate of cloud resource, user satisfaction is improved, operator's cost is reduced.

Description

A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm

Technical field

The present invention relates to cloud resource scheduling fields, and in particular to it is a kind of based on improve Q learning algorithm multiple target because of resource Dispatching method.

Background technique

Cloud resource scheduling refers to according to resource using rule, different resource user according to rule cloud service platform into The process of row resource adjustment.Reasonable scheduling of resource optimization algorithm is most important for the comprehensive performance for improving cloud computing system 's.QoS constraint in scheduling includes operating cost, deadline, safety, availability etc..In actual demand, cost constraint With the deadline respectively be influence operator and user satisfaction key factor, will reduce execute the time and reduce operation at This is simultaneously essential for dispatching algorithm as optimization aim.Therefore, present invention use to optimize execution simultaneously Time and operating cost are the multiple target cloud resource scheduling model of target.

Intensified learning is provided as a kind of non-supervisory formula intelligent search algorithm with learning ability unrelated with model in cloud Have preferable learning effect in the scheduling problem of source, therefore attempts to solve cloud resource scheduling problem using nitrification enhancement.Its In, Q learning algorithm is more stable for solving the performance of cloud resource scheduling problem, but meeting existence space is big, and convergence rate is slow The problems such as, to improve algorithm the convergence speed, the present invention combines weight factor with heuristic function, instructs every time according to Agent Return value immediately after white silk automatically updates the weight factor after different movements execute, so that it is determined that movement selection strategy, improves and calculate Method convergence rate.

Summary of the invention

In order to solve cloud resource scheduling problem, the invention discloses one kind can reduce task execution time, reduces system Operating cost and the dispatching method that algorithm the convergence speed, operational efficiency and optimizing ability are accounted for range.

For this purpose, the present invention provides the following technical scheme that

1. a kind of based on the multiple target cloud resource dispatching method for improving Q learning algorithm, which is characterized in that algorithm passes through The interaction of Agent and environment is learnt, and Agent updates Q table, more new state by movement selection strategy selection movement, Iteration above-mentioned steps obtain optimal policy until Q expression to convergence, Agent.It specifically includes:

Definition status space: state space is made of different state s, is indicated by a dynamic array, and wherein state s is used One-dimension array indicates that the subscript of s indicates that task number, the value of s indicate virtual machine serial number.For example 5 tasks distribute 3 virtually Machine, then be the shaping array of 5 elements, which empty machine is the value expression task of each element be assigned on and execute.

It defines motion space: being integer variable by action definition, i-th of task is distributed into jth platform virtual machine when executing When This move, then integer variable j is assigned to i-th of value in state s array.Such as one-dimension array [1,0,0,2,1], then table Show that the 0th task distributes to No. 1 virtual machine, the 1st task distributes to No. 0 virtual machine ...

Definition is returned immediately: r=ω * (Etc-T_i)+(1-ω)*(Cst-C_i), wherein T_iAnd C_iRespectively indicate current shape Total execution time of the allocated task of lower i-th virtual machine of state and the totle drilling cost for executing task.Etc and Cst all indicate compared with Big constant, sets total execution time of all tasks on all virtual machines for Etc herein, and all tasks are arranged in institute in Cst There is the totle drilling cost on virtual machine.

Define Q value more new formula are as follows: Wherein, (0,1) α ∈ indicates learning rate.γ indicates discount factor；Q_tIndicate the Q value of t moment.

Define the more new formula of weight factor are as follows:Wherein, s_iAnd a_iPoint It Biao Shi not need to update the state and movement of weight factor；f(s_i,a_i) indicate in state s_iLower execution acts a_iWeight factor； r_maxExpression state s_iUnder maximal rewards value；a_tIndicate Agent in current period in state s_iThe movement of lower selection；r_tIt indicates Current period is in state s_iLower execution acts a_tThe return value of feedback.

Define the update rule of heuristic function are as follows:Wherein, π^f(s_t) Expression state is s_tWhen, the optimal movement that is selected under the guidance of weight factor function f；Indicate maximum power The ratio of repeated factor and total weight factor indicates the significance level of the movement by the size of this numerical value；The size of U value indicates Power of the weight factor to the influence degree for acting selection, U is bigger, and weight factor is stronger to the directiveness for acting selection.

Definition automatically updates the heuristic movement selection strategy of weight factor are as follows: Wherein, a_randomIndicate one movement of random selection；P, q ∈ [0,1], p value determine that Agent carries out exploration probability, and p value is bigger, The probability that Agent is explored is with regard to smaller.

Include: based on the multiple target cloud resource dispatching method for improving Q learning algorithm

Step 1: the parameter and algorithm parameter of emulation platform are set；

Step 2: the random task and virtual machine for generating certain scale is set by Cloudsim emulation platform；

Step 3: initialization Q table and G table, init state space S；

Step 4: iteration executes step 4-1 to 4-5；

Step 4-1: using s as being set as current state；

Step 4-2: using the heuristic movement selection strategy for automatically updating weight factor based on ε-greedy algorithm from Selection acts in set of actions A；

Step 4-3: executing the movement chosen, and the return value immediately that the movement is executed under the state is recorded, according to formula Q_t= (1-α)Q_t+α(r+γ*Q_t+1) Q value is updated, while weight factor is updated, further according to formula Update G value；State is transferred to NextState s ' by s；

Step 4-4: calculating error=MAX (error | Q_t-Q_previous-t), wherein Q_previous-tIndicate the previous of t moment Moment value；

Step 4-5: terminate the learning process of Agent if error < θ, otherwise return step 4-1.Wherein θ is to fix Value, sets according to demand.

It is ect by the execution timing definition of task in the present invention_ij=size_i/mip_j, wherein size_iIndicate i-th The size of business, mip_jIndicate the processing speed of jth platform virtual machine.

By the total run time of jth platform virtual machine is defined as:

By a complete scheduling scheme P_iTotal execution time are as follows:

By total operating cost consumed by execution task is defined as:Wherein, cst_jIndicate the resources costs consumed by jth platform virtual machines performing tasks within the unit time.

According to defined above, then optimization aim of the invention may be defined as: min [Time (P_i),Cost(P_i)].Indicate this The target of invention is that task is always executed to time and operating cost minimum.

In order to more clearly evaluate Multiobjective Scheduling, P will be dispatched_iEvaluation function is defined as: est (P_i)=ω * logTime(P_i)+(1-ω)*logCost(P_i), wherein ω ∈ [0,1] indicates user to the pass for executing time and operating cost Note degree, by adjusting the size of ω come meet user to execute time and operating cost different demands；Time(P_i) indicate to adjust Degree scheme P_iTotal execution time；Cost(P_i) indicate scheduling scheme P_iTotal operating cost consumed by execution task.Pass through evaluation The size of function judges the quality of scheduling strategy, according to the optimization aim of the present embodiment it is found that evaluation function is smaller, dispatches plan It is slightly better.

Compared with the prior art, the invention has the following beneficial effects:

1. multiple target cloud resource scheduling model proposed by the present invention can uniformly examine operator's interests with user demand Consider, while reducing task completion time, reduces operating cost.

2. improvement Q learning algorithm proposed by the present invention is in terms of solving multiple target cloud resource scheduling problem, using based on certainly The dynamic heuristic movement selection strategy for updating weight factor, in terms of optimizing ability, convergence and load balance ability all Optimization effectively improves the overall performance of cloud resource scheduling.

Detailed description of the invention

Fig. 1 is flow chart of the invention；

Fig. 2 is comparison schematic diagram of the different work dispatching method in terms of algorithm optimizing ability in the embodiment of the present invention；

Fig. 3 is comparison schematic diagram of the different work dispatching method in terms of algorithm the convergence speed in the embodiment of the present invention；

Fig. 4 is comparison schematic diagram of the different work dispatching method in terms of algorithmic load equilibrium in the embodiment of the present invention.

Specific embodiment

In order to enable technical solution in the embodiment of the present invention to understand and be fully described by, with reference to embodiments in Attached drawing, the present invention is further described in detail

Embodiment:

As shown in Figure 1, it is a kind of based on the multiple target cloud resource dispatching method for improving Q study, it is carried out by Agent and environment Interaction, carries out the study of optimal policy, when meeting the strategy of error < θ condition, terminates Agent learning process.Specific packet It includes:

Step 1: the parameter and algorithm parameter of emulation platform are set；

Step 3: initialization Q table and G table, init state space S；

Step 4: iteration executes step 4-1 to 4-5；

Step 4-1: using s as being set as current state；

The present embodiment in step 1, is configured Cloudsim emulation platform and algorithm parameter, wherein Cloudsim Emulation platform setting such as table 1；Improve the setting of Q learning algorithm as shown in table 2, wherein α indicates that learning rate, γ indicate discount factor, ε For balancing utilization and heuristic process in ε-greedy algorithm；ω indicates concern of the user to time and operating cost is executed Degree；U indicates power of the weight factor to the influence degree for acting selection.

The setting of 1 experiment parameter of table

The setting of 2 algorithm parameter of table

The present embodiment generates data set in step 2, using Cloudsim emulation platform at random, and task size is defined on Between section [60000,120000], the processing speed of virtual machine is defined between [400,1200].Task rule in embodiment Mould 5 incremented by successively, at maximum up to 30 since 10；The quantity of virtual machine is set as 5.

It is ect by the execution timing definition of task in the present embodiment_ij=size_i/mip_j, wherein size_iIt indicates i-th The size of task, mip_jIndicate the processing speed of jth platform virtual machine.

By the total run time of jth platform virtual machine is defined as:

By a complete scheduling scheme P_iTotal execution time are as follows:

According to defined above, then the optimization aim of the present embodiment may be defined as: min [Time (P_i),Cost(P_i)].It indicates The target of the present embodiment is that task is always executed to time and operating cost minimum.

Fig. 2 be improvement of the present invention Q learning algorithm in terms of the optimizing ability in multiple target cloud resource scheduling problem with The comparison diagram of other dispatching algorithms.Following four algorithm is compared altogether:

1. task is dispensed on each virtual machine by the scheduling scheme executed in order in order, i.e., first is appointed First virtual machine is distributed in business, and second task is distributed to second virtual machine etc., indicated with Equ.

2. genetic algorithm (GA).

3.Q learning algorithm (QL).

4. based on the heuristic Q learning algorithm (WHAQL) for automatically updating weight.

In Fig. 2, abscissa indicates task scale, and ordinate indicates evaluation function value, and evaluation function value is smaller, algorithm Optimizing ability is stronger.

By Fig. 2 it is known that the obtained scheduling scheme of WHAQL algorithm used in the present invention can be minimized Evaluation function, optimizing ability is strong.

Fig. 3 be improvement of the present invention Q learning algorithm in terms of the convergence rate in multiple target cloud resource scheduling problem with The comparison diagram of other dispatching algorithms.It will be calculated based on the heuristic Q learning algorithm (WHAQL) for automatically updating weight factor and Q study Method (QL) and heuristic Q learning algorithm (HAQL) compare.

When Fig. 3 indicates that working as task scale is 20, ω=0.5, the comparison diagram of three kinds of algorithm iteration processes.Total the number of iterations is set It is set to 5000 times, every iteration 500 times are a study stage, and record is once as a result, totally 10 study stages.Wherein, abscissa Expression task scale, ordinate indicate evaluation function value.

It is learnt by Fig. 3, WHAQL algorithm used in the present invention is compared to other two kinds of algorithms, and convergence rate is faster.

Fig. 4 be improvement of the present invention Q learning algorithm in terms of the load balancing in multiple target cloud resource scheduling problem with The comparison diagram of other dispatching algorithms.It will be based on automatically updating the heuristic Q learning algorithm (WHAQL) of weight factor and hold in order Capable dispatching method (Equ), genetic algorithm (GA) and Q learning algorithm (QL) algorithm compares.

Wherein, abscissa indicates task scale, and ordinate indicates load balancing value, load balancing value closer to 1, system It loads more balanced.

The ratio of the most short execution time of virtual machine and maximum execution time are defined as to the load balancing function of system, meter Calculating formula is

It is learnt by Fig. 4, the load balance degree of WHAQL algorithm used in the present invention is imitated compared to other several algorithms Fruit is more preferable, it was demonstrated that WHAQL not only has higher utilization rate to resource, can also effectively mitigate the workload of virtual machine.

The implementation case proves that the present invention is a kind of to be sought based on the multiple target cloud resource dispatching method for improving Q learning algorithm Three excellent ability, convergence rate and load balancing aspects have preferable performance.

The above is that the embodiment of the present invention is discussed in detail in conjunction with attached drawing, and embodiments herein is only It is to be used to help understand method of the invention.For those skilled in the art, according to the thought of the present invention, having It can have some change and modify in body embodiment and application range, therefore present specification should not be construed as limiting the invention.

Claims

1. based on the multiple target cloud resource dispatching method for improving Q learning algorithm, which is characterized in that Agent with environment by carrying out Interaction selects the maximum movement of return value to execute, and in the movement choice phase, this method considers weight factor and heuristic function It combines, the return value immediately after being trained every time according to Agent, automatically updates the weight factor after different movements execute, thus It determines movement selection strategy, improves algorithm the convergence speed, detailed process is as follows:

Step 1: generating task data and virtual-machine data at random using Cloudsim emulation platform；

Step 2: defining the state space S of Q study: indicating that wherein state s is indicated with one-dimension array, s's by a dynamic array Subscript indicates that task number, the value of s indicate virtual machine serial number；

Step 3: defining the set of actions A of Q study: being integer variable by action definition, i-th of task is distributed to the when executing When j platform virtual machine This move, then integer quantity j is assigned to i-th of value in state s array；

Step 4: defining the Reward Program immediately of Q learning algorithm: r=ω * (Etc-T_i)+(1-ω)*(Cst-C_i)；Wherein, T_iWith C_iIt respectively indicates total execution time of the allocated task of lower i-th virtual machine of current state and executes the totle drilling cost of task, Etc and Cst indicates larger constant, sets total execution time of all tasks on all virtual machines, Cst for Etc herein Totle drilling cost of all tasks on all virtual machines is set；

Step 5: generation task data and virtual-machine data being adjusted using based on the Q learning algorithm for automatically updating weight factor Degree distribution.

2. the multiple target cloud resource dispatching method according to claim 1 for improving Q learning algorithm, which is characterized in that described Generation task data and virtual-machine data are scheduled using based on the Q learning algorithm for automatically updating weight factor in step 5 Distribution, specific steps are as follows:

Step 5-1: initialization Q table and G table, wherein Q table is used to store the value of a certain movement under a certain state；G table is used to deposit Store up the relevant information of weight factor；

Step 5-2: init state space S；

Step 5-3: iteration executes 5-3-1 to 5-3-6 step；

Step 5-3-1: current state is set by s；

Step 5-3-2: driven using the heuristic movement selection strategy for automatically updating weight factor based on ε-greedy algorithm Make selection in set A to act；

Step 5-3-3: executing the movement chosen, and records the return value immediately and NextState s ' that the movement is executed under the state；

Step 5-3-4: according to formula Q_t=(1- α) Q_t+α(r+γ*Q_t+1) updating Q value, wherein α ∈ (0,1) indicates learning rate； γ indicates discount factor；Q_tIndicate the Q value of t moment；Update weight factor f (s_t,a_t), further according to formulaUpdate G value；

Step 5-3-5: calculating error=MAX (error | Q_t-Q_previous-t), wherein Q_previous-tIndicate t moment it is previous when Quarter value；

Step 5-3-6: terminating the learning process of Agent if error < θ, and otherwise (wherein θ is to fix to return step 5-3-1 Value is set according to demand).