CN106228314A - Workflow scheduling method based on deep reinforcement learning - Google Patents
Workflow scheduling method based on deep reinforcement learning
- Publication number
- CN106228314A (application CN201610656579.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- dag
- directed acyclic
- workflow
- acyclic graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313—Resource planning in a project environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Biomedical Technology (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Marketing (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a workflow scheduling method based on deep reinforcement learning, comprising the steps of: step A) collecting M task-execution DAG workflow directed acyclic graphs from the actual execution environment as a sample pool; step B) performing MDP Markov decision process modeling on each DAG workflow directed acyclic graph to generate a task state set S; step C) according to the neural-network training method DQN, taking the task state sets S generated from the M DAG workflow directed acyclic graphs and the corresponding known action set A as input, substituting them into the deep neural network formula, and solving for the value of the neural network parameter matrix θ. Through the above method, the invention overcomes the defects of long execution time and poor generalization of current workflow scheduling methods in distributed environments, guarantees the time efficiency of the algorithm, increases the generalization ability of the algorithm itself, and allows the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
Description
Technical field
The present invention relates to the field of computer software, and in particular to a workflow scheduling method based on deep reinforcement learning.
Background technology
In a distributed computing environment, the workflow scheduling problem has long been one of the classic optimization problems in computer science. Workflow scheduling is, in essence, the task of producing a scheduling scheme that dispatches the tasks of a workflow, in some order, onto suitable execution nodes so as to minimize the execution cost. Its mathematical model is as follows:
A concrete computing application can be represented by a directed acyclic graph (DAG) G(T, E), where T is the set of n tasks {t1, t2, ..., tn} and E is the set of dependences between tasks. Each dependence e(i, j) ∈ E means that task tj can only start executing after task ti has finished. A finite machine set M is given, comprising m nodes {m1, m2, ..., mm}. Let χ denote the set of all possible assignments; an element x ∈ χ can be expressed as a |T| × |M| matrix representing one assignment plan. Assume further a cost function C: χ → [0, +∞], which can be taken as the total execution time (makespan); for each component there is Cij: T × M → [0, +∞], where Cij denotes the cost of executing task ti after it is assigned to machine mj. In summary, the scheduling problem can be abstracted as finding an assignment x ∈ χ that minimizes C(x), that is to say, such that no assignment y exists with C(y) < C(x). For this scheduling optimization problem, two main aspects must be considered: one is the occupancy of system resources, and the other is the total time for the whole job to complete.
The workflow scheduling problem has been proved to be NP-complete, so no solution of polynomial time complexity can be found. Existing approaches to NP-complete problems generally use heuristic algorithms, genetic algorithms, or the Q-learning algorithm. Heuristic algorithms can only compute suboptimal solutions; genetic algorithms need several rounds of iteration before a reasonably good solution is found, and the whole iterative process takes so long that on a real-time big-data computing platform the time cost of the optimization may be unacceptable; the Q-learning algorithm performs well in both generalization and quality, but once the number of tasks is too large, the number of states explodes and the dimension of the Q-value matrix becomes so high that the computer cannot store the whole Q-value matrix Q(s, a).
Summary of the invention
The present invention overcomes the deficiencies of the prior art by providing a workflow scheduling method based on deep reinforcement learning. It remedies the long execution time and poor generalization of current workflow scheduling methods in distributed environments, guarantees the time efficiency of the algorithm, increases the generalization ability of the algorithm itself, and allows the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
The technical scheme adopted by the present invention to solve the above problems is a workflow scheduling method based on deep reinforcement learning, comprising the following steps:
Step A) Collect M task-execution DAG workflow directed acyclic graphs from the actual execution environment as a sample pool;
Step B) Perform MDP Markov decision process modeling on each DAG workflow directed acyclic graph to generate a task state set S;
Step C) According to the neural-network training method DQN, take the task state sets S generated from the M DAG workflow directed acyclic graphs and the corresponding known action set A as input, substitute them into the deep neural network formula Q(s, a; θi), and solve for the value of the neural network parameter matrix θi when task i is executed, where Q is the action-value function, s is an element of the task state set S, and a is a scheduling scheme in the action set A;
Step D) Judge whether the task state sets S generated from the DAG workflow directed acyclic graphs have all been substituted, in turn, into the neural-network training method DQN; if they have all been substituted, output the final value of the neural network parameter matrix θi; if not, continue executing the neural-network training method DQN.
Step E) For a newly input DAG workflow directed acyclic task graph, likewise perform MDP Markov decision process modeling to generate an initial task state S0; substitute it into the deep neural network formula Q(s, a; θi) of step D), where the value of θi is the value calculated in step D), so as to obtain a concrete scheduling scheme a from the final action set A; and input the task states and the scheduling result of this input DAG workflow directed acyclic graph into the sample pool.
On the basis of the Q-learning algorithm, namely within deep reinforcement learning, the present scheme introduces a deep neural network to approximate Q(s, a), i.e. Q(s, a; θ) ≈ Q(s, a), where θ is the parameter matrix of the neural network. θ is calculated by the neural-network training method DQN (by minimizing the loss), s is obtained by mathematical modeling, and substituting the trained θ and s into the formula Q(s, a; θ) yields the optimal scheduling scheme a. One concrete deep neural network of this scheme is shown in Fig. 2: the network comprises many layers, each layer comprises several neurons, and a neuron can essentially be described as a function that receives the outputs of the neurons of the previous layer, computes through the function, and passes its output to the neurons of the next layer. The function used inside a neuron is called the activation function; typically the ReLU (Rectified Linear Units) activation function is used. Specifically, for an input vector s, where s is one of the task states in the MDP model, we have
f(θ·s) = max(0, θ·s)
For a single neuron in a layer, θ is essentially a 1×n vector. According to this one-to-one mapping of the neural network, once the value of θ is obtained, the corresponding value of Q(s, a; θ) can be found, and substituting the s obtained from the mathematical modeling yields the optimal scheduling scheme a. This scheduling scheme a dispatches the tasks of the workflow, in some order, onto suitable execution nodes so as to minimize the execution cost. The method avoids the defect of the Q-learning algorithm that, because the state set S and the action set A may be very large, the Q-value matrix often cannot be stored by the computer.
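As an illustrative sketch of the ReLU rule f(θ·s) = max(0, θ·s) described above (the function names here are invented for illustration, and the layer sizes are arbitrary, not the patent's actual network):

```python
import numpy as np

def relu_neuron(theta, s):
    """One neuron with the ReLU activation of the text:
    f(theta . s) = max(0, theta . s), where theta is a 1*n
    weight vector and s is the state vector from the MDP model."""
    return max(0.0, float(np.dot(theta, s)))

def relu_layer(W, s):
    """A layer applies the same rule row by row: each row of W is
    one neuron's theta; the result feeds the next layer's neurons."""
    return np.maximum(0.0, W @ s)
```

A full Q-network would stack several such layers, with a final linear layer producing one Q-value per candidate scheduling action.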
In other words, unlike common heuristic algorithms, the technical scheme of the present invention uses the same model to schedule different DAG inputs; and unlike genetic algorithms, it does not perform many iterations while computing the scheduling result, which consumes a great deal of time. At the same time, instead of using the Q-learning algorithm alone, DQN (the Q-learning algorithm combined with a deep neural network) is used for workflow scheduling. Since the computation based on the deep neural network can be accelerated by GPU, the time efficiency of the algorithm is guaranteed while the generalization ability of the computation scheme itself is increased, allowing the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
Further preferably, the method also includes step F): after the sample pool has accumulated to a certain extent, repeat step C) to recompute the deep neural network formula Q(s, a; θi), obtaining a new value of θi and a new deep neural network Q'(s, a; θi) to be used for the scheduling computation of subsequently input DAG workflow directed acyclic graphs.
"The sample pool has accumulated to a certain extent" means that the number of DAG workflow directed acyclic graph samples accumulated in the sample pool exceeds 100; the computation of step C) is then started, and the training samples can be obtained by randomly sampling 100 DAG workflow directed acyclic graphs from the sample pool. As the DAG workflow directed acyclic graphs in the sample pool keep increasing, the neural-network training method DQN is used to continually update the value of θi, after which the optimal scheduling scheme is computed, allowing the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
Preferably, the steps of the neural-network training method DQN derived from the Markov decision process in step C) are as follows:
Step C1) For each time point t there is a transition et = (st, at, rt, st+1); define a replay memory D = e1, e2, ..., eN and initialize its size to N; initialize the action-value function Q; the number of gradient-descent iterations is M, M being the number of DAG workflow directed acyclic graphs; initialize the task state set with st, where φ1 = φ(s1) is the mapping function corresponding to s1, and accordingly φt = φ(st) is the mapping function corresponding to st;
Step C2) After executing for a period of time, obtain the scheduling scheme at of the corresponding time point t: if the scheduling scheme is unique, at is that scheme itself; otherwise at = argmaxa Q(φ(st), a; θ). Obtain the task state set φt+1 = φ(st+1) of time point (t+1), and store the transition (φt, at, rt, φt+1) in the replay memory D;
Step C3) Let i be the number of DAG workflow directed acyclic graphs in the replay memory. If i is the last DAG workflow directed acyclic graph of the loop, let yi = ri, where ri is the value of the feedback function R(St, at) at the corresponding moment; if i is not the last DAG workflow directed acyclic graph of the loop, let yi = ri + γ·maxa' Q(φt+1, a'; θ), where ri is the value of the feedback function R(St, at) at the corresponding moment, φt+1 is the mapping function corresponding to st+1, a' is the scheduling scheme of the corresponding time point, and γ is the discount coefficient, whose value ranges between 0 and 1.
Step C4) According to the gradient descent algorithm, iterate M times on the loss function of the deep neural network Li(θi) = E_{s,a~ρ(·)}[(yi − Q(s, a; θi))²], i.e. take the partial derivative ∇θi Li(θi) of this loss function to seek its minimum, and apply the update formula θi+1 = θi − α·∇θi Li(θi) to update θi, where α is a constant called the learning rate, here taken as 0.001, until θi converges or the iterations are complete, thereby obtaining the value of the neural network parameter matrix θi.
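The training loop of steps C1)-C4) can be sketched in miniature as follows. This is a toy illustration under simplifying assumptions, not the patent's implementation: the Q-network is reduced to a single linear layer Q(s, a; θ) = (θ·s)[a], and the function name and episode format are invented for illustration. It does, however, contain the same ingredients: a bounded replay memory D, targets y = r + γ·max_a' Q(s', a'), and a gradient step on the squared loss (y − Q(s, a))².

```python
import random
import numpy as np

def train_dqn(transitions, n_state, n_actions,
              alpha=0.001, gamma=0.9, capacity=100, batch=8, seed=0):
    """Toy DQN sketch: linear Q(s, a; theta) = (theta @ s)[a].
    `transitions` is an iterable of (s, a, r, s_next, done) tuples."""
    rng = random.Random(seed)
    theta = np.zeros((n_actions, n_state))   # parameter matrix
    D = []                                   # replay memory
    for (s, a, r, s2, done) in transitions:
        D.append((s, a, r, s2, done))
        if len(D) > capacity:                # keep memory bounded
            D.pop(0)
        # sample a minibatch and take one gradient step per sample
        for (si, ai, ri, s2i, di) in rng.sample(D, min(batch, len(D))):
            y = ri if di else ri + gamma * float(np.max(theta @ s2i))
            q = float(theta[ai] @ si)
            # grad of (y - q)^2 w.r.t. theta[ai] is -2*(y - q)*si
            theta[ai] += alpha * 2.0 * (y - q) * si
    return theta
```

In the patent's setting, the transitions would be the (φt, at, rt, φt+1) tuples produced by scheduling steps on sampled DAG workflows, and θ would parameterize a multi-layer ReLU network rather than a single linear map.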
The steps of performing MDP Markov decision process modeling on each DAG workflow directed acyclic graph in step B) are as follows:
Step B1) Store the DAG workflow directed acyclic graph as a two-dimensional array G(T, E), where T denotes the set of task nodes in the DAG workflow directed acyclic graph and E denotes the edge set of the DAG workflow directed acyclic graph;
Step B2) Define (Ti, Tj) ∈ E to mean that task j may only execute after task i has executed; define V(Ti, Ti) > 0 as the estimated execution time of task Ti itself; define V(Ti, Tj) as the communication time if task i and task j are assigned to different nodes, with V(Ti, Tj) = ∞ indicating that there is no dependence between task i and task j;
Step B3) Represent the assignment state S of the whole workflow by the two-dimensional N × (N+1) matrix V_{N×(N+1)}, where N denotes the total number of task nodes in the DAG, and the last column of the matrix, i.e. the (N+1)-th column, denotes the assignment status of each task;
Step B4) Define the feedback function R(St) = C / t(St), where t(St) denotes the total execution time under task state St, and C denotes the total execution time under the initial state S0.
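The state matrix and feedback function of steps B1)-B4) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `make_state` and `reward` are invented for illustration, and -1 is used here as the "not yet assigned" marker in the last column (the worked example later uses this column to record assignment status).

```python
import math

def make_state(num_tasks, exec_times, comm_times):
    """Build the N x (N+1) state matrix V of steps B1)-B4):
    V[i][i] holds task i's estimated execution time, V[i][j] the
    communication time if i and j run on different nodes (math.inf
    when there is no dependence), and the last column the
    assignment status of each task (-1: unassigned)."""
    N = num_tasks
    V = [[math.inf] * (N + 1) for _ in range(N)]
    for i, t in enumerate(exec_times):
        V[i][i] = t
    for (i, j), c in comm_times.items():
        V[i][j] = c
    for i in range(N):
        V[i][N] = -1          # last column: not yet assigned
    return V

def reward(C, makespan):
    """Feedback function R(S_t) = C / t(S_t), where C is the total
    execution time of the initial (unoptimized) state S0."""
    return C / makespan
```

With this reward, a schedule that shortens the makespan relative to the initial state yields a value above 1, matching the 66/62 = 1.06 feedback computed in the worked example.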
In summary, the beneficial effects of the invention are: unlike common heuristic algorithms, this scheme uses the same model to schedule different DAG inputs; unlike genetic algorithms, it does not perform many iterations while computing the scheduling result, which consumes a great deal of time; the Q-learning algorithm combined with a deep neural network is used for workflow scheduling, which guarantees the time efficiency of the algorithm while increasing the generalization ability of the algorithm itself, allowing the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
Brief description of the drawings
Fig. 1 is the flow chart of the operation of the present invention based on the deep neural network;
Fig. 2 is the deep neural network Q(s, a; θ) of the present invention;
Fig. 3 is the DAG task scheduling graph of the present invention.
Detailed description of the invention
The present invention is further described in detail below in conjunction with embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Abbreviations used in this scheme: DAG, the workflow directed acyclic graph; MDP, the Markov decision process; DQN, the deep Q-value network, which is the core of deep reinforcement learning and is essentially a deep neural network used to replace the Q-value matrix mentioned above.
Embodiment 1:
As shown in Figs. 1-3, the present invention includes a workflow scheduling method based on deep reinforcement learning, comprising the following steps:
Step A) Collect M task-execution DAG workflow directed acyclic graphs from the actual execution environment as a sample pool;
Step B) Perform MDP Markov decision process modeling on each DAG workflow directed acyclic graph to generate a task state set S;
Step C) According to the neural-network training method DQN, take the task state sets S generated from the M DAG workflow directed acyclic graphs and the corresponding known action set A as input, substitute them into the deep neural network formula Q(s, a; θi), and solve for the value of the neural network parameter matrix θi when task i is executed, where Q is the action-value function, s is an element of the task state set S, and a is a scheduling scheme in the action set A;
Step D) Judge whether the task state sets S generated from the DAG workflow directed acyclic graphs have all been substituted, in turn, into the neural-network training method DQN; if they have all been substituted, output the final value of the neural network parameter matrix θi; if not, continue executing the neural-network training method DQN.
Step E) For a newly input DAG workflow directed acyclic task graph, likewise perform MDP Markov decision process modeling to generate an initial task state S0; substitute it into the deep neural network formula Q(s, a; θi) of step D), where the value of θi is the value calculated in step D), so as to obtain a concrete scheduling scheme a from the final action set A; and input the task states and the scheduling result of this input DAG workflow directed acyclic graph into the sample pool.
The steps of the neural-network training method DQN derived from the Markov decision process in step C) are as follows:
Step C1) For each time point t there is a transition et = (st, at, rt, st+1); define a replay memory D = e1, e2, ..., eN and initialize its size to N; initialize the action-value function Q; the gradient-descent iterations run from 1 to M, M being the number of DAG workflow directed acyclic graphs; initialize the task state set with st, where φ1 = φ(s1) is the mapping function corresponding to s1, and accordingly φt = φ(st) is the mapping function corresponding to st;
Step C2) After executing for a period of time, obtain the scheduling scheme at of the corresponding time point t: if the scheduling scheme is unique, at is that scheme itself; otherwise at = argmaxa Q(φ(st), a; θ). Obtain the task state set φt+1 = φ(st+1) of time point (t+1), and store the transition (φt, at, rt, φt+1) in the replay memory D;
Step C3) Let i be the number of DAG workflow directed acyclic graphs in the replay memory. If i is the last DAG workflow directed acyclic graph of the loop, let yi = ri, where ri is the value of the feedback function R(St, at) at the corresponding moment; if i is not the last DAG workflow directed acyclic graph of the loop, let yi = ri + γ·maxa' Q(φt+1, a'; θ), where ri is the value of the feedback function R(St, at) at the corresponding moment, φt+1 is the mapping function corresponding to st+1, a' is the scheduling scheme of the corresponding time point, and γ is the discount coefficient, whose value ranges between 0 and 1.
Step C4) According to the gradient descent algorithm, iterate M times on the loss function of the deep neural network Li(θi) = E_{s,a~ρ(·)}[(yi − Q(s, a; θi))²], i.e. take the partial derivative ∇θi Li(θi) of this loss function to seek its minimum, and apply the update formula θi+1 = θi − α·∇θi Li(θi) to update θi, where α is a constant called the learning rate, here taken as 0.001, until θi converges or the iterations are complete, thereby obtaining the value of the neural network parameter matrix θi.
The steps of performing MDP Markov decision process modeling on each DAG workflow directed acyclic graph in step B) are as follows:
Step B1) Store the DAG workflow directed acyclic graph as a two-dimensional array G(T, E), where T denotes the set of task nodes in the DAG workflow directed acyclic graph and E denotes the edge set of the DAG workflow directed acyclic graph;
Step B2) Define (Ti, Tj) ∈ E to mean that task j may only execute after task i has executed; define V(Ti, Ti) > 0 as the estimated execution time of task Ti itself; define V(Ti, Tj) as the communication time if task i and task j are assigned to different nodes, with V(Ti, Tj) = ∞ indicating that there is no dependence between task i and task j;
Step B3) Represent the assignment state S of the whole workflow by the two-dimensional N × (N+1) matrix V_{N×(N+1)}, where N denotes the total number of task nodes in the DAG, and the last column of the matrix, i.e. the (N+1)-th column, denotes the assignment status of each task;
Step B4) Define the feedback function R(St) = C / t(St), where t(St) denotes the total execution time under task state St, and C denotes the total execution time under the initial state S0.
Current workflow scheduling generally uses the following algorithms: heuristic algorithms, genetic algorithms, and the Q-learning algorithm.
The first, solving the workflow scheduling problem with a heuristic algorithm, is the more classical scheduling approach. A list scheduling algorithm computes the priority of each task according to some rule, and then determines from the priorities which task to dispatch first and which to dispatch next. Essentially, this class of algorithms proceeds in two steps: the first step obtains the priorities of all tasks, and the second step allocates resources to the tasks for scheduling. Different list scheduling algorithms differ in their priority computation methods and their resource selection methods.
The time complexity of solving with a list scheduling algorithm is typically O(n²), which is relatively fast. But the algorithm itself is heuristic and can only obtain a suboptimal solution. Moreover, its generalization ability is poor: once the scheduling scene changes, the result the algorithm obtains is not necessarily a good solution.
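The two-step shape of a list scheduling algorithm (priority computation, then resource selection) can be sketched as follows. This is a hypothetical illustration, not any specific published algorithm: the priority rule here (execution time plus the maximum successor priority, akin to an upward rank) and the "earliest-free machine" placement are simplifying assumptions, and the sketch ignores communication times and precedence-constrained start times.

```python
def list_schedule(exec_times, deps, n_machines):
    """Minimal list-scheduling sketch: rank tasks, then greedily
    place each task, in rank order, on the machine that frees up
    earliest. `deps` is a list of (i, j) precedence pairs."""
    n = len(exec_times)
    succ = {i: [] for i in range(n)}
    for (i, j) in deps:
        succ[i].append(j)
    prio = {}
    def rank(i):
        # priority = own time + best priority among successors
        if i not in prio:
            prio[i] = exec_times[i] + max((rank(j) for j in succ[i]),
                                          default=0)
        return prio[i]
    order = sorted(range(n), key=rank, reverse=True)
    free = [0.0] * n_machines          # when each machine frees up
    assign = {}
    for t in order:
        m = free.index(min(free))      # earliest-free machine
        assign[t] = m
        free[m] += exec_times[t]
    return assign, max(free)
```

Even in this toy form the two weaknesses noted above are visible: the result depends entirely on the fixed priority rule, and a change of scene (different communication costs, heterogeneous machines) would require hand-crafting a new rule.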
The second, the genetic algorithm, compares well with common heuristic algorithms in generalization and has strong universality, but its shortcoming is also obvious: a genetic algorithm needs several rounds of iteration before a reasonably good solution is found, the whole iterative process takes a long time to execute, and on a real-time big-data computing platform the time cost of the optimization may even be unacceptable.
The third uses the Q-learning algorithm (Q-Learning): for the task scheduling problem in collaborative work, a corresponding Markov decision process model is established, and on this basis an improved Q-learning algorithm based on simulated annealing is proposed. By introducing the simulated annealing algorithm, combining it with a greedy strategy, and screening and judging on the state space, this algorithm considerably improves the convergence speed and shortens the execution time. For the detailed scheme of training with the Q-learning algorithm, see "Optimization of task scheduling problem based on Q-learning" (Journal of Graphics, March 2012).
This approach makes use of reinforcement learning technology to abstract the DAG scheduling problem. The algorithm performs well in both generalization and quality, but once the number of tasks is too large, the number of states explodes, the dimension of the Q-value matrix becomes too high, and the computer cannot store the whole Q-value matrix Q(s, a).
Deep reinforcement learning is a brand-new class of algorithms that combines deep learning with reinforcement learning to realize end-to-end learning from perception (Perception) to action (Action). Deep reinforcement learning works very well on decision problems, and it is a technology with the potential to make robots achieve truly fully autonomous learning of one or even multiple skills.
The theoretical basis of deep reinforcement learning is the Markov decision process. A typical Markov decision process is composed of a quintuple <S, A, P, R, γ>, where S denotes the state set, A denotes the action set, P denotes the state-transition probability matrix, R denotes the reward value, and γ is the discount coefficient with value range 0 to 1.
Define a policy π(a|s) = P[At = a | St = s] as the probability of selecting action At in state St, and define the action-value function Qπ(s, a) = Eπ[Gt | St = s, At = a], where Gt denotes the accumulated return, Gt = Σ_{k=0}^{∞} γ^k R_{t+k+1}, in which the subscript t denotes time and k is the summation index. Eπ[Gt | St = s, At = a] denotes the expectation of the accumulated return obtained by taking action a in state s under policy π. The problem that reinforcement learning needs to solve is then, over all policies, to find the action-value function with the maximum value, i.e.:
Q*(s, a) = maxπ Qπ(s, a);
Expanding the above formula gives:
Q*(s, a) = maxπ Eπ[Rt | St = s, At = a];
This problem could in principle be solved by dynamic programming, converting the above formula into the following recursive (Bellman) equation:
Q*(s, a) = E_{s'~ε}[r + γ·maxa' Q*(s', a') | s, a];
To solve this recursive equation one could proceed in the manner of dynamic programming, i.e. store the value of every Q(s, a); but in practice this realization is impractical, because the state set S and the action set A may be very large, and the computer often cannot store them.
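A back-of-the-envelope count makes the storage problem concrete. As a rough illustrative model (an assumption for illustration, not a formula from the patent): if each of N tasks can be either unassigned or placed on one of M machines, the state space alone has about (M+1)^N entries, and each state pairs with roughly N·M assignment actions.

```python
def q_table_entries(n_tasks, n_machines):
    """Rough count of tabular Q(s, a) entries: each task is
    unassigned or on one of n_machines nodes, so about
    (n_machines + 1) ** n_tasks states, times roughly
    n_tasks * n_machines assignment actions per state."""
    states = (n_machines + 1) ** n_tasks
    actions = n_tasks * n_machines
    return states * actions
```

Even the small 9-task, 4-node example later in the description already yields tens of millions of table entries under this count, and a 30-task workflow exceeds 10^20 — which is why the scheme replaces the table with a parameterized network Q(s, a; θ).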
So, on the basis of the Q-learning algorithm, namely within deep reinforcement learning, this scheme introduces a deep neural network to approximate Q(s, a), i.e. Q(s, a; θ) ≈ Q(s, a), where θ is the parameter matrix of the neural network. θ is calculated by the neural-network training method DQN (by minimizing the loss), s is obtained by mathematical modeling, and substituting the trained θ and s into the formula Q(s, a; θ) yields the optimal scheduling scheme a. One concrete deep neural network of this scheme is shown in Fig. 2: the network comprises many layers, each layer comprises several neurons, and a neuron can essentially be described as a function that receives the outputs of the neurons of the previous layer, computes through the function, and passes its output to the neurons of the next layer. The function used inside a neuron is called the activation function; typically the ReLU (Rectified Linear Units) activation function is used. Specifically, for an input vector s, where s is one of the task states in the MDP model, we have
f(θ·s) = max(0, θ·s)
For a single neuron in a layer, θ is essentially a 1×n vector. According to this one-to-one mapping of the neural network, once the value of θ is obtained, the corresponding value of Q(s, a; θ) can be found, and substituting the s obtained from the mathematical modeling yields the optimal scheduling scheme a. This scheduling scheme a dispatches the tasks of the workflow, in some order, onto suitable execution nodes so as to minimize the execution cost. The method avoids the defect of the Q-learning algorithm that, because the state set S and the action set A may be very large, the Q-value matrix often cannot be stored by the computer.
In other words, unlike common heuristic algorithms, the technical scheme of the present invention uses the same model to schedule different DAG inputs; and unlike genetic algorithms, it does not perform many iterations while computing the scheduling result, which consumes a great deal of time. At the same time, instead of using the Q-learning algorithm alone, DQN (the Q-learning algorithm combined with a deep neural network) is used for workflow scheduling. Since the computation based on the deep neural network can be accelerated by GPU, the time efficiency of the algorithm is guaranteed while the generalization ability of the algorithm itself is increased, allowing the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
When performing workflow scheduling: initialize the state S0 of the DAG to be scheduled through the MDP modeling process; take state S0 as the input of the deep neural network to obtain the Q-values Q(s, a) of the scheduling strategies, choose the scheduling scheme with the maximum Q-value, and update the DAG state; judge whether the DAG state indicates that all tasks have been scheduled: if all tasks have been scheduled, output the scheduling result; if not, continue the iterative process.
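The scheduling loop just described can be sketched generically. This is a minimal sketch: `q_fn`, `step_fn` and `done_fn` are invented placeholder names standing in for the trained network, the DAG-state update, and the "all tasks scheduled" check.

```python
def schedule(q_fn, s0, actions, step_fn, done_fn, max_steps=1000):
    """Greedy scheduling loop: from state S0, pick the action with
    the largest Q-value, apply it to update the DAG state, and
    repeat until every task has been dispatched."""
    s, plan = s0, []
    for _ in range(max_steps):
        if done_fn(s):               # all tasks scheduled?
            break
        a = max(actions, key=lambda act: q_fn(s, act))
        plan.append(a)               # record chosen scheduling step
        s = step_fn(s, a)            # update the DAG state
    return plan
```

Each appended action corresponds to one assignment of a task to a node, as in the T1-to-M1 and T2-to-M1 steps of the worked example below.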
As shown in Fig. 3 and Tables 1-3, the method described in this scheme assigns the tasks T1-T9 corresponding to the DAG workflow directed acyclic task graph to execute on the 4 nodes M1-M4:
Fig. 3 of the description shows the DAG task scheduling graph: inside each circle are the task to execute and the time that task needs, an arrow indicates the direction of the next execution step, and the number on an arrow indicates the transfer time between the two tasks. For example: task T1's own execution takes 2 seconds; after it finishes, transfer to task T2 takes 4 seconds, transfer to each of tasks T3, T4 and T5 takes 1 second, and transfer to T7 takes 10 seconds; T2's own execution takes 3 seconds, and after it finishes it in turn transfers to task T6; and so on, execution proceeds in this way until task T9 has been executed.
In state S0, before any task is assigned, the matrix V_{N×(N+1)} is as shown in Table 1 below, where C = 66 is the total execution time under the initial state S0, and the number at the intersection of an abscissa and an ordinate denotes the time required between the two corresponding tasks; e.g. the entry T1T1 = 2 means T1's own execution takes 2 seconds, and so on. The M column indicates whether a task has been executed on a node, with -1 meaning not yet executed and 1 meaning executed;
After task T1 is scheduled onto node M1, the task state changes to S1, whose feedback value is 66/66 = 1, as shown in Table 2 below;
Then T2 is scheduled to execute on node M1; the task state S2 is shown in Table 3 below, and the computed feedback value is 66/62 = 1.06;
Since T1 and T2 execute on the same node, the value V(T1, T2) becomes 0, i.e. the communication time overhead between task 1 and task 2 becomes 0.
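The same-node rule just stated can be sketched with the edge data quoted from Fig. 3. Only the execution and transfer times explicitly mentioned in the text are included here; the function name `comm_cost` and the dictionary layout are invented for illustration.

```python
import math

def comm_cost(comm, assign, i, j):
    """Communication cost between dependent tasks i and j: the
    tabulated transfer time if they run on different nodes, 0 if
    they share a node (as with T1 and T2 both on M1)."""
    if assign.get(i) == assign.get(j) and assign.get(i) is not None:
        return 0
    return comm.get((i, j), math.inf)

# Partial data read off Fig. 3 (only the edges quoted in the text)
exec_time = {"T1": 2, "T2": 3}
comm = {("T1", "T2"): 4, ("T1", "T3"): 1, ("T1", "T4"): 1,
        ("T1", "T5"): 1, ("T1", "T7"): 10}
```

Zeroing V(T1, T2) after co-locating T1 and T2 is what shortens the makespan from 66 to 62 in the example, producing the feedback value 1.06.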
Once the model has been established, we have the input Q(s, a) for the DQN algorithm; then, for such a determined model, many rounds of iterative learning let us train our deep Q network, guaranteeing the time efficiency of the algorithm while increasing the generalization ability of the algorithm itself, and allowing the scheduling machine to learn a scheduling strategy autonomously according to the characteristics of the actual scene.
Embodiment 2:
The present embodiment is a preferred refinement on the basis of Embodiment 1: it further includes step F) after the sample pool has accumulated to a certain extent, step C) is repeated to recompute the deep neural network formula Q(s, a; θi), yielding a new value of θi and a new deep neural network Q(s, a; θi) for the scheduling computation of subsequently input DAG workflow directed acyclic graphs.
The sample pool accumulating to a certain extent means that once the number of DAG workflow directed acyclic graph samples accumulated in the sample pool exceeds 100, the computation of step C) is started; the training samples can be 100 DAG workflow directed acyclic graphs randomly drawn from the sample pool.
As the number of DAG workflow directed acyclic graphs in the sample pool keeps growing, the DQN neural-network training method is used to continually update the value of θi, after which the optimal scheduling scheme is recomputed, so that the scheduler can learn the scheduling policy autonomously from the characteristics of the actual scenario.
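The accumulate-then-retrain behaviour of step F) can be sketched as a simple sample pool with a retraining threshold. This is a minimal illustration, not the patent's implementation; the class and method names are invented for the sketch:

```python
import random

class SamplePool:
    """Sketch of step F): retraining is triggered once more than 100 DAG
    samples have accumulated, using a random batch of 100 graphs."""

    def __init__(self, threshold=100):
        self.graphs = []
        self.threshold = threshold

    def add(self, dag):
        # Step E feeds each scheduled DAG's states and result back in.
        self.graphs.append(dag)

    def ready(self):
        # "Accumulated to a certain extent": strictly more than the threshold.
        return len(self.graphs) > self.threshold

    def training_batch(self):
        # 100 DAG samples drawn at random from the pool.
        return random.sample(self.graphs, self.threshold)

pool = SamplePool()
for i in range(150):              # pretend 150 scheduled DAGs flowed back in
    pool.add({"id": i})
if pool.ready():
    batch = pool.training_batch() # batch for re-running step C)
print(len(batch))
```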
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form; any simple modification or equivalent variation of the above embodiments made according to the technical spirit of the present invention falls within the protection scope of the present invention.
Claims (5)
1. A workflow scheduling method based on deep reinforcement learning, characterized in that it comprises the following steps:
Step A) collecting M task-execution DAG workflow directed acyclic graphs from the actual execution environment as a sample pool;
Step B) performing MDP Markov decision process modeling on each DAG workflow directed acyclic graph to generate a task state set S;
Step C) according to the DQN neural-network training method, taking the task state sets S generated from the M DAG workflow directed acyclic graphs and the corresponding known action set A as input, substituting them into the deep neural network formula Q(s, a; θi), and solving for the value of the neural network parameter matrix θi when task i is executed, where Q is the action-value function, s is a state in the task state set S, and a is a scheduling scheme in the action set A;
Step D) judging whether the task state sets S generated by the DAG workflow directed acyclic graphs have all been substituted in turn into the DQN neural-network training method; if all have been substituted, outputting the final value of the neural network parameter matrix θi; otherwise, continuing to execute the DQN neural-network training method;
Step E) for a newly input DAG workflow task, likewise performing MDP Markov decision process modeling to generate the initial task state S0 and substituting it into the deep neural network formula Q(s, a; θi) of step D), where the value of θi is the value computed in step D); a scheduling scheme a in the final action set A is thereby obtained, and the task states and scheduling result of this specific input DAG workflow directed acyclic graph are input into the sample pool.
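Steps A) through E) describe a train-then-serve loop. A compressed sketch follows, using a plain dictionary as a stand-in for the deep network Q(s, a; θ); every helper name here is invented for illustration and the "training" is a dummy update, not real DQN learning:

```python
# Hedged sketch of the claim-1 pipeline: collect a sample pool (A), model each
# DAG as an MDP (B), fit a Q table (C/D), then schedule a new DAG and feed the
# result back into the pool (E).
def model_as_mdp(dag):
    # Stand-in for MDP modelling: states are frozensets of finished tasks.
    return [frozenset(dag[:i]) for i in range(len(dag) + 1)]

def train(sample_pool, actions):
    # Stand-in for DQN training: accumulate a score per (state, action) pair.
    q = {}
    for dag in sample_pool:
        for s in model_as_mdp(dag):
            for a in actions:
                q[(s, a)] = q.get((s, a), 0.0) + 1.0  # dummy update
    return q

def schedule(dag, q, actions, sample_pool):
    s0 = model_as_mdp(dag)[0]                  # initial task state S0
    best = max(actions, key=lambda a: q.get((s0, a), 0.0))
    sample_pool.append(dag)                    # step E: feed back into the pool
    return best

actions = ["node-M1", "node-M2"]               # action set A (illustrative)
pool = [["T1", "T2"], ["T1", "T3"]]            # step A: sample pool
q = train(pool, actions)                       # steps B-D
choice = schedule(["T1", "T4"], q, actions, pool)  # step E
print(choice)
```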
2. The workflow scheduling method based on deep reinforcement learning according to claim 1, characterized in that it further comprises step F) after the sample pool has accumulated to a certain extent, repeating step C) to recompute the deep neural network formula Q(s, a; θi), yielding a new value of θi and a new deep neural network Q′(s, a; θi) for the scheduling computation of subsequently input DAG workflow directed acyclic graphs.
3. The workflow scheduling method based on deep reinforcement learning according to claim 2, characterized in that the sample pool accumulating to a certain extent means that once the number of DAG workflow directed acyclic graph samples accumulated in the sample pool exceeds 100, the computation of step C) is started, and the training samples can be 100 DAG workflow directed acyclic graphs randomly drawn from the sample pool.
4. The workflow scheduling method based on deep reinforcement learning according to claim 2 or 3, characterized in that the steps of the DQN neural-network training method derived from the Markov decision process in step C) are as follows:
Step C1) for each time point t there is a Markov transition et = (st, at, rt, st+1); define the storage pool D = e1, e2 … eN and initialize its size to N; initialize the action-value function Q; the number of gradient-descent iterations is M, where M is the number of DAG workflow directed acyclic graphs; initialize the task state set st, where φ1 = φ(s1) is the mapping function corresponding to s1, and accordingly φt = φ(st) is the mapping function corresponding to st;
Step C2) after executing for a period of time, obtain the scheduling scheme at of the corresponding time point t; if there is only one scheduling scheme, at is that scheme itself, otherwise at = argmaxa Q(φ(st), a; θ); obtain the task state set φt+1 of time point (t+1), and store the transition (φt, at, rt, φt+1) into the storage pool D;
Step C3) let i index the DAG workflow directed acyclic graphs in the storage pool; if i is the last DAG workflow directed acyclic graph of the loop, set yi = ri, where ri is the value of the feedback function R(St, at) at the corresponding moment; if i is not the last DAG workflow directed acyclic graph of the loop, set yi = ri + γ maxa′ Q(φi+1, a′; θ), where ri is the value of the feedback function R(St, at) at the corresponding moment, φ(st) is the mapping function corresponding to st, a′ is the scheduling scheme of the corresponding time point t, and γ is the decay (discount) coefficient;
Step C4) according to the gradient descent algorithm, iterate M times over the loss function of the deep neural network Li(θi) = Es,a∼ρ(·)[(yi − Q(s, a; θi))2]; that is, take the partial derivative of this loss function with respect to θi, find the minimum of this loss function, and update θi according to θi+1 = θi − α∇θi Li(θi), where the learning rate α is a constant; once θi converges or the iterations complete, the value of the neural network parameter matrix θi is obtained.
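The gradient-descent update of step C4) can be illustrated numerically. This sketch replaces the deep network with a one-parameter linear stand-in Q(s, a; θ) = θ·x, an assumption made purely so the update θ ← θ − α∇L is visible in a few lines:

```python
# Minimal numeric sketch of step C4): gradient descent on the squared
# TD loss L(theta) = (y - Q(s, a; theta))^2, with Q(s, a; theta) = theta * x.
def grad(theta, x, y):
    # dL/dtheta for L = (y - theta*x)^2  ->  -2x(y - theta*x)
    return -2.0 * x * (y - theta * x)

theta, alpha = 0.0, 0.1   # parameter and constant learning rate alpha
x, y = 1.0, 3.0           # feature and TD target y_i from step C3)
for _ in range(100):      # iterate until theta converges
    theta -= alpha * grad(theta, x, y)
print(round(theta, 3))    # converges toward the target y/x = 3.0
```

Each iteration shrinks the error (y − θx) by a constant factor, so θ converges geometrically to the minimizer of the loss, which is the behaviour the claim describes for the full parameter matrix θi.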
5. The workflow scheduling method based on deep reinforcement learning according to claim 1, characterized in that in step B) the MDP Markov decision process modeling procedure for each DAG workflow directed acyclic graph is as follows:
Step B1) store the DAG workflow directed acyclic graph as a two-dimensional array G(T, E), where T represents the set of task nodes in the DAG workflow directed acyclic graph and E represents the set of edges in the DAG workflow directed acyclic graph;
Step B2) define (Ti, Tj) ∈ E to mean that task j can only execute after task i has executed; define V(Ti, Ti) > 0 to represent the estimated execution time of task Ti itself; define V(Ti, Tj) to represent the communication time when task i and task j are assigned to different nodes for execution, with V(Ti, Tj) = ∞ indicating that there is no dependency between task i and task j;
Step B3) represent the allocation state S of the whole workflow by the N × (N+1) two-dimensional matrix VN×(N+1), where N represents the total number of task nodes in the DAG and the last column of the matrix, i.e. column N+1, represents the allocation status of each task;
Step B4) define the feedback function R(St) = C / t(St), where t(St) represents the total execution time under task state St and C represents the total execution time under the initial state S0.
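The conventions of step B2) can be expressed directly in code. A minimal sketch follows, with a hypothetical 3-task matrix; `depends` and `comm_time` are illustrative helper names, not from the patent:

```python
import math

# Step-B2 conventions: V[i][i] > 0 is task i's own estimated execution time,
# V[i][j] is the communication time when tasks i and j land on different
# nodes, and math.inf marks the absence of a dependency.
INF = math.inf
V = [
    [2.0, 4.0, INF],
    [INF, 3.0, 1.0],
    [INF, INF, 1.0],
]

def depends(i, j):
    """(Ti, Tj) in E: task j may only run after task i."""
    return i != j and math.isfinite(V[i][j])

def comm_time(i, j, same_node):
    """Communication overhead, which vanishes when both tasks share a node."""
    if not depends(i, j):
        return 0.0
    return 0.0 if same_node else V[i][j]

print(comm_time(0, 1, same_node=False))  # cross-node: pays V[0][1]
print(comm_time(0, 1, same_node=True))   # co-located: overhead is 0
```

This mirrors the Embodiment 1 example, where scheduling T1 and T2 onto the same node M1 sets V(T1, T2) to 0.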
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610656579.6A CN106228314A (en) | 2016-08-11 | 2016-08-11 | Workflow scheduling method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610656579.6A CN106228314A (en) | 2016-08-11 | 2016-08-11 | Workflow scheduling method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106228314A true CN106228314A (en) | 2016-12-14 |
Family
ID=57547189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610656579.6A Pending CN106228314A (en) | 2016-08-11 | 2016-08-11 | Workflow scheduling method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106228314A (en) |
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106843225A (en) * | 2017-03-15 | 2017-06-13 | 宜宾学院 | A kind of Intelligent Mobile Robot path planning system |
CN107145387A (en) * | 2017-05-23 | 2017-09-08 | 南京大学 | A kind of method for scheduling task learnt under vehicle-mounted net environment based on deeply |
CN107315572A (en) * | 2017-07-19 | 2017-11-03 | 北京上格云技术有限公司 | Build control method, storage medium and the terminal device of Mechatronic Systems |
CN107798388A (en) * | 2017-11-23 | 2018-03-13 | 航天天绘科技有限公司 | The method of TT&C Resources dispatching distribution based on Multi Agent and DNN |
CN108021028A (en) * | 2017-12-22 | 2018-05-11 | 重庆邮电大学 | A kind of various dimensions cooperative control method converted based on relevant redundancy with strengthening study |
CN108197871A (en) * | 2018-01-19 | 2018-06-22 | 顺丰科技有限公司 | The mission planning method and system that express delivery receipts are dispatched officers |
CN108282587A (en) * | 2018-01-19 | 2018-07-13 | 重庆邮电大学 | Mobile customer service dialogue management method under being oriented to strategy based on status tracking |
CN108322541A (en) * | 2018-02-09 | 2018-07-24 | 杭州顺网科技股份有限公司 | A kind of adaptive Distributed architecture |
CN108334439A (en) * | 2018-03-14 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | A kind of method for testing pressure, device, equipment and storage medium |
CN108494576A (en) * | 2018-01-29 | 2018-09-04 | 中山大学 | A kind of distributed parameters server updating method based on genetic algorithm |
CN108596335A (en) * | 2018-04-20 | 2018-09-28 | 浙江大学 | A kind of adaptive crowdsourcing method based on deeply study |
CN108897608A (en) * | 2018-05-31 | 2018-11-27 | 中国科学院软件研究所 | A kind of intelligent universal task scheduling system that data-driven is expansible |
CN108960433A (en) * | 2018-06-26 | 2018-12-07 | 第四范式(北京)技术有限公司 | For running the method and system of machine learning modeling process |
CN108964042A (en) * | 2018-07-24 | 2018-12-07 | 合肥工业大学 | Regional power grid operating point method for optimizing scheduling based on depth Q network |
CN108958916A (en) * | 2018-06-29 | 2018-12-07 | 杭州电子科技大学 | Workflow unloads optimization algorithm under a kind of mobile peripheral surroundings |
CN109101339A (en) * | 2018-08-15 | 2018-12-28 | 北京邮电大学 | Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group |
CN109709916A (en) * | 2018-12-20 | 2019-05-03 | 宁波大学 | A kind of dispatching method based on Gibbs sampling method |
CN109754075A (en) * | 2019-01-16 | 2019-05-14 | 中南民族大学 | Dispatching method, equipment, storage medium and the device of wireless sensor network node |
CN109815537A (en) * | 2018-12-19 | 2019-05-28 | 清华大学 | A kind of high-throughput material simulation calculation optimization method based on time prediction |
CN110008002A (en) * | 2019-04-09 | 2019-07-12 | 中国科学院上海高等研究院 | Job scheduling method, device, terminal and medium based on Stationary Distribution probability |
CN110020767A (en) * | 2017-11-30 | 2019-07-16 | 西门子股份公司 | Intervene the automatically coherent property inspection method after the workflow based on BPMN executes manually |
CN110195660A (en) * | 2019-06-19 | 2019-09-03 | 南京航空航天大学 | Aero-engine control device based on depth Q study |
CN110489223A (en) * | 2019-08-26 | 2019-11-22 | 北京邮电大学 | Method for scheduling task, device and electronic equipment in a kind of isomeric group |
WO2020009139A1 (en) * | 2018-07-04 | 2020-01-09 | 株式会社Preferred Networks | Learning method, learning device, learning system, and program |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
WO2020037156A1 (en) * | 2018-08-16 | 2020-02-20 | EMC IP Holding Company LLC | Workflow optimization |
CN110888401A (en) * | 2018-09-11 | 2020-03-17 | 北京京东金融科技控股有限公司 | Combustion control optimization method and device for thermal generator set and readable storage medium |
CN111191934A (en) * | 2019-12-31 | 2020-05-22 | 北京理工大学 | Multi-target cloud workflow scheduling method based on reinforcement learning strategy |
CN111343651A (en) * | 2020-02-18 | 2020-06-26 | 电子科技大学 | Service chain deployment method and system for serving crowd-sourcing computing environment |
CN111445081A (en) * | 2020-04-01 | 2020-07-24 | 浙江大学 | Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation |
CN111465031A (en) * | 2020-03-26 | 2020-07-28 | 南京理工大学 | Dynamic node scheduling method based on DQN algorithm in wireless body area network |
CN111510319A (en) * | 2020-03-06 | 2020-08-07 | 重庆邮电大学 | Network slice resource management method based on state perception |
CN111506405A (en) * | 2020-04-08 | 2020-08-07 | 北京交通大学 | Edge calculation time slice scheduling method based on deep reinforcement learning |
CN111525587A (en) * | 2020-04-01 | 2020-08-11 | 中国电力科学研究院有限公司 | Reactive load situation-based power grid reactive voltage control method and system |
CN111756653A (en) * | 2020-06-04 | 2020-10-09 | 北京理工大学 | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network |
CN111812519A (en) * | 2020-07-15 | 2020-10-23 | 南京航空航天大学 | Battery parameter identification method and system |
CN112204580A (en) * | 2018-03-27 | 2021-01-08 | 诺基亚通信公司 | Method and apparatus for facilitating resource pairing using deep Q networks |
CN112256961A (en) * | 2020-10-19 | 2021-01-22 | 平安科技(深圳)有限公司 | User portrait generation method, device, equipment and medium |
CN112685165A (en) * | 2021-01-08 | 2021-04-20 | 北京理工大学 | Multi-target cloud workflow scheduling method based on joint reinforcement learning strategy |
CN112809678A (en) * | 2021-01-15 | 2021-05-18 | 合肥工业大学 | Cooperative control method for production line system of multi-robot workstation |
CN113033928A (en) * | 2019-12-09 | 2021-06-25 | 南京行者易智能交通科技有限公司 | Design method, device and system of bus shift scheduling model based on deep reinforcement learning |
CN113487165A (en) * | 2021-07-01 | 2021-10-08 | 福州大学 | Intelligent factory production operation scheduling method and system based on deep reinforcement learning |
CN113824650A (en) * | 2021-08-13 | 2021-12-21 | 上海光华智创网络科技有限公司 | Parameter transmission scheduling algorithm and system in distributed deep learning system |
CN113888136A (en) * | 2021-10-21 | 2022-01-04 | 北京航空航天大学 | Workflow scheduling method based on DQN algorithm principle |
CN114545884A (en) * | 2022-03-16 | 2022-05-27 | 温州大学 | Equivalent parallel machine dynamic intelligent scheduling method based on enhanced topological neural evolution |
CN114860398A (en) * | 2022-04-21 | 2022-08-05 | 郑州大学 | Task scheduling method, device and equipment of intelligent cloud platform |
WO2023241000A1 (en) * | 2022-06-15 | 2023-12-21 | 苏州元脑智能科技有限公司 | Dag task scheduling method and apparatus, device, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226759A (en) * | 2013-04-25 | 2013-07-31 | 中山大学 | Dynamic cloud workflow scheduling method based on genetic algorithm |
CN103412792A (en) * | 2013-07-18 | 2013-11-27 | 成都国科海博计算机系统有限公司 | Dynamic task scheduling method and device under cloud computing platform environment |
CN104657221A (en) * | 2015-03-12 | 2015-05-27 | 广东石油化工学院 | Multi-queue peak-alternation scheduling model and multi-queue peak-alteration scheduling method based on task classification in cloud computing |
2016-08-11: application CN201610656579.6A filed in China; publication CN106228314A (en), legal status: Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226759A (en) * | 2013-04-25 | 2013-07-31 | 中山大学 | Dynamic cloud workflow scheduling method based on genetic algorithm |
CN103412792A (en) * | 2013-07-18 | 2013-11-27 | 成都国科海博计算机系统有限公司 | Dynamic task scheduling method and device under cloud computing platform environment |
CN104657221A (en) * | 2015-03-12 | 2015-05-27 | 广东石油化工学院 | Multi-queue peak-alternation scheduling model and multi-queue peak-alteration scheduling method based on task classification in cloud computing |
Non-Patent Citations (2)
Title |
---|
VOLODYMYR MNIH等: "《Playing Atari with Deep Reinforcement Learning》", 19 December 2013 * |
CHEN Shenglei, et al.: "Multi-Step Q-Learning Algorithm for Collaborative Design Task Scheduling", Journal of Computer-Aided Design & Computer Graphics *
Cited By (81)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106843225A (en) * | 2017-03-15 | 2017-06-13 | 宜宾学院 | A kind of Intelligent Mobile Robot path planning system |
CN107145387A (en) * | 2017-05-23 | 2017-09-08 | 南京大学 | A kind of method for scheduling task learnt under vehicle-mounted net environment based on deeply |
CN107145387B (en) * | 2017-05-23 | 2019-09-10 | 南京大学 | A kind of method for scheduling task based on deeply study under vehicle-mounted net environment |
CN107315572B (en) * | 2017-07-19 | 2020-08-11 | 北京上格云技术有限公司 | Control method of building electromechanical system, storage medium and terminal equipment |
CN107315572A (en) * | 2017-07-19 | 2017-11-03 | 北京上格云技术有限公司 | Build control method, storage medium and the terminal device of Mechatronic Systems |
CN107798388A (en) * | 2017-11-23 | 2018-03-13 | 航天天绘科技有限公司 | The method of TT&C Resources dispatching distribution based on Multi Agent and DNN |
CN107798388B (en) * | 2017-11-23 | 2022-02-08 | 航天天绘科技有限公司 | Measurement and control resource scheduling and allocation method based on Multi-Agent and DNN |
CN110020767A (en) * | 2017-11-30 | 2019-07-16 | 西门子股份公司 | Intervene the automatically coherent property inspection method after the workflow based on BPMN executes manually |
CN108021028A (en) * | 2017-12-22 | 2018-05-11 | 重庆邮电大学 | A kind of various dimensions cooperative control method converted based on relevant redundancy with strengthening study |
CN108021028B (en) * | 2017-12-22 | 2019-04-09 | 重庆邮电大学 | It is a kind of to be converted based on relevant redundancy and enhance the various dimensions cooperative control method learnt |
CN108197871A (en) * | 2018-01-19 | 2018-06-22 | 顺丰科技有限公司 | The mission planning method and system that express delivery receipts are dispatched officers |
CN108282587B (en) * | 2018-01-19 | 2020-05-26 | 重庆邮电大学 | Mobile customer service conversation management method based on state tracking and policy guidance |
CN108282587A (en) * | 2018-01-19 | 2018-07-13 | 重庆邮电大学 | Mobile customer service dialogue management method under being oriented to strategy based on status tracking |
CN108494576A (en) * | 2018-01-29 | 2018-09-04 | 中山大学 | A kind of distributed parameters server updating method based on genetic algorithm |
CN108322541B (en) * | 2018-02-09 | 2021-04-06 | 杭州顺网科技股份有限公司 | Self-adaptive distributed system architecture |
CN108322541A (en) * | 2018-02-09 | 2018-07-24 | 杭州顺网科技股份有限公司 | A kind of adaptive Distributed architecture |
CN108334439B (en) * | 2018-03-14 | 2021-06-04 | 百度在线网络技术(北京)有限公司 | Pressure testing method, device, equipment and storage medium |
CN108334439A (en) * | 2018-03-14 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | A kind of method for testing pressure, device, equipment and storage medium |
CN112204580A (en) * | 2018-03-27 | 2021-01-08 | 诺基亚通信公司 | Method and apparatus for facilitating resource pairing using deep Q networks |
CN112204580B (en) * | 2018-03-27 | 2024-04-12 | 诺基亚通信公司 | Method and apparatus for facilitating resource pairing using deep Q networks |
CN108596335B (en) * | 2018-04-20 | 2020-04-17 | 浙江大学 | Self-adaptive crowdsourcing method based on deep reinforcement learning |
CN108596335A (en) * | 2018-04-20 | 2018-09-28 | 浙江大学 | A kind of adaptive crowdsourcing method based on deeply study |
CN108897608B (en) * | 2018-05-31 | 2021-09-07 | 中国科学院软件研究所 | Data-driven extensible intelligent general task scheduling system |
CN108897608A (en) * | 2018-05-31 | 2018-11-27 | 中国科学院软件研究所 | A kind of intelligent universal task scheduling system that data-driven is expansible |
CN108960433A (en) * | 2018-06-26 | 2018-12-07 | 第四范式(北京)技术有限公司 | For running the method and system of machine learning modeling process |
CN108960433B (en) * | 2018-06-26 | 2022-04-05 | 第四范式(北京)技术有限公司 | Method and system for running machine learning modeling process |
CN108958916B (en) * | 2018-06-29 | 2021-06-22 | 杭州电子科技大学 | Workflow unloading optimization method under mobile edge environment |
CN108958916A (en) * | 2018-06-29 | 2018-12-07 | 杭州电子科技大学 | Workflow unloads optimization algorithm under a kind of mobile peripheral surroundings |
WO2020009139A1 (en) * | 2018-07-04 | 2020-01-09 | 株式会社Preferred Networks | Learning method, learning device, learning system, and program |
JPWO2020009139A1 (en) * | 2018-07-04 | 2021-07-08 | 株式会社Preferred Networks | Robot control devices, systems, robot control methods, policy update methods, and neural networks |
JP7398373B2 (en) | 2018-07-04 | 2023-12-14 | 株式会社Preferred Networks | Control device, system, control method, and program |
CN108964042A (en) * | 2018-07-24 | 2018-12-07 | 合肥工业大学 | Regional power grid operating point method for optimizing scheduling based on depth Q network |
CN108964042B (en) * | 2018-07-24 | 2021-10-15 | 合肥工业大学 | Regional power grid operating point scheduling optimization method based on deep Q network |
CN109101339A (en) * | 2018-08-15 | 2018-12-28 | 北京邮电大学 | Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group |
CN109101339B (en) * | 2018-08-15 | 2019-05-31 | 北京邮电大学 | Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group |
WO2020037156A1 (en) * | 2018-08-16 | 2020-02-20 | EMC IP Holding Company LLC | Workflow optimization |
US11868890B2 (en) | 2018-08-16 | 2024-01-09 | Landmark Graphics Corporation | Workflow optimization |
GB2587979A (en) * | 2018-08-16 | 2021-04-14 | Landmark Graphics Corp | Workflow optimization |
US11315014B2 (en) | 2018-08-16 | 2022-04-26 | EMC IP Holding Company LLC | Workflow optimization |
CN110888401A (en) * | 2018-09-11 | 2020-03-17 | 北京京东金融科技控股有限公司 | Combustion control optimization method and device for thermal generator set and readable storage medium |
CN109815537A (en) * | 2018-12-19 | 2019-05-28 | 清华大学 | A kind of high-throughput material simulation calculation optimization method based on time prediction |
CN109709916A (en) * | 2018-12-20 | 2019-05-03 | 宁波大学 | A kind of dispatching method based on Gibbs sampling method |
CN109754075A (en) * | 2019-01-16 | 2019-05-14 | 中南民族大学 | Dispatching method, equipment, storage medium and the device of wireless sensor network node |
CN110008002A (en) * | 2019-04-09 | 2019-07-12 | 中国科学院上海高等研究院 | Job scheduling method, device, terminal and medium based on Stationary Distribution probability |
CN110008002B (en) * | 2019-04-09 | 2022-11-29 | 中国科学院上海高等研究院 | Job scheduling method, device, terminal and medium based on stable distribution probability |
CN110195660B (en) * | 2019-06-19 | 2020-04-21 | 南京航空航天大学 | Aero-engine control device based on deep Q learning |
CN110195660A (en) * | 2019-06-19 | 2019-09-03 | 南京航空航天大学 | Aero-engine control device based on depth Q study |
CN110489223A (en) * | 2019-08-26 | 2019-11-22 | 北京邮电大学 | Method for scheduling task, device and electronic equipment in a kind of isomeric group |
CN110489223B (en) * | 2019-08-26 | 2022-03-29 | 北京邮电大学 | Task scheduling method and device in heterogeneous cluster and electronic equipment |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
CN113033928A (en) * | 2019-12-09 | 2021-06-25 | 南京行者易智能交通科技有限公司 | Design method, device and system of bus shift scheduling model based on deep reinforcement learning |
CN113033928B (en) * | 2019-12-09 | 2023-10-31 | 南京行者易智能交通科技有限公司 | Method, device and system for designing bus shift model based on deep reinforcement learning |
CN111191934A (en) * | 2019-12-31 | 2020-05-22 | 北京理工大学 | Multi-target cloud workflow scheduling method based on reinforcement learning strategy |
CN111191934B (en) * | 2019-12-31 | 2022-04-15 | 北京理工大学 | Multi-target cloud workflow scheduling method based on reinforcement learning strategy |
CN111343651B (en) * | 2020-02-18 | 2021-11-16 | 电子科技大学 | Service chain deployment method and system for serving crowd-sourcing computing environment |
CN111343651A (en) * | 2020-02-18 | 2020-06-26 | 电子科技大学 | Service chain deployment method and system for serving crowd-sourcing computing environment |
CN111510319B (en) * | 2020-03-06 | 2022-07-08 | 重庆邮电大学 | Network slice resource management method based on state perception |
CN111510319A (en) * | 2020-03-06 | 2020-08-07 | 重庆邮电大学 | Network slice resource management method based on state perception |
CN111465031B (en) * | 2020-03-26 | 2022-10-14 | 南京理工大学 | Dynamic node scheduling method based on DQN algorithm in wireless body area network |
CN111465031A (en) * | 2020-03-26 | 2020-07-28 | 南京理工大学 | Dynamic node scheduling method based on DQN algorithm in wireless body area network |
CN111525587B (en) * | 2020-04-01 | 2022-10-25 | 中国电力科学研究院有限公司 | Reactive load situation-based power grid reactive voltage control method and system |
CN111445081A (en) * | 2020-04-01 | 2020-07-24 | 浙江大学 | Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation |
CN111525587A (en) * | 2020-04-01 | 2020-08-11 | 中国电力科学研究院有限公司 | Reactive load situation-based power grid reactive voltage control method and system |
CN111506405A (en) * | 2020-04-08 | 2020-08-07 | 北京交通大学 | Edge calculation time slice scheduling method based on deep reinforcement learning |
CN111756653A (en) * | 2020-06-04 | 2020-10-09 | 北京理工大学 | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network |
CN111812519A (en) * | 2020-07-15 | 2020-10-23 | 南京航空航天大学 | Battery parameter identification method and system |
CN112256961A (en) * | 2020-10-19 | 2021-01-22 | 平安科技(深圳)有限公司 | User portrait generation method, device, equipment and medium |
CN112256961B (en) * | 2020-10-19 | 2024-04-09 | 平安科技(深圳)有限公司 | User portrait generation method, device, equipment and medium |
CN112685165B (en) * | 2021-01-08 | 2022-08-23 | 北京理工大学 | Multi-target cloud workflow scheduling method based on joint reinforcement learning strategy |
CN112685165A (en) * | 2021-01-08 | 2021-04-20 | 北京理工大学 | Multi-target cloud workflow scheduling method based on joint reinforcement learning strategy |
CN112809678A (en) * | 2021-01-15 | 2021-05-18 | 合肥工业大学 | Cooperative control method for production line system of multi-robot workstation |
CN113487165B (en) * | 2021-07-01 | 2024-05-03 | 福州大学 | Intelligent factory production job scheduling method and system based on deep reinforcement learning |
CN113487165A (en) * | 2021-07-01 | 2021-10-08 | 福州大学 | Intelligent factory production operation scheduling method and system based on deep reinforcement learning |
CN113824650B (en) * | 2021-08-13 | 2023-10-20 | 上海光华智创网络科技有限公司 | Parameter transmission scheduling algorithm and system in distributed deep learning system |
CN113824650A (en) * | 2021-08-13 | 2021-12-21 | 上海光华智创网络科技有限公司 | Parameter transmission scheduling algorithm and system in distributed deep learning system |
CN113888136A (en) * | 2021-10-21 | 2022-01-04 | 北京航空航天大学 | Workflow scheduling method based on DQN algorithm principle |
CN114545884B (en) * | 2022-03-16 | 2023-12-05 | 温州大学 | Equivalent parallel machine dynamic intelligent scheduling method based on enhanced topological neural evolution |
CN114545884A (en) * | 2022-03-16 | 2022-05-27 | 温州大学 | Equivalent parallel machine dynamic intelligent scheduling method based on enhanced topological neural evolution |
CN114860398A (en) * | 2022-04-21 | 2022-08-05 | 郑州大学 | Task scheduling method, device and equipment of intelligent cloud platform |
CN114860398B (en) * | 2022-04-21 | 2024-09-06 | 郑州大学 | Intelligent cloud platform task scheduling method, device and equipment |
WO2023241000A1 (en) * | 2022-06-15 | 2023-12-21 | 苏州元脑智能科技有限公司 | Dag task scheduling method and apparatus, device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106228314A (en) | Workflow scheduling method based on deep reinforcement learning | |
CN110489223B (en) | Task scheduling method and device in heterogeneous cluster and electronic equipment | |
Wang et al. | Learning scheduling policies for multi-robot coordination with graph attention networks | |
Caldeira et al. | A Pareto based discrete Jaya algorithm for multi-objective flexible job shop scheduling problem | |
Baer et al. | Multi-agent reinforcement learning for job shop scheduling in flexible manufacturing systems | |
Fattahi et al. | Dynamic scheduling in flexible job shop systems by considering simultaneously efficiency and stability | |
Li et al. | An effective shuffled frog-leaping algorithm for multi-objective flexible job shop scheduling problems | |
CN111756653B (en) | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network | |
CN109388484A (en) | A kind of more resource cloud job scheduling methods based on Deep Q-network algorithm | |
CN104408518B (en) | Based on the neural network learning optimization method of particle swarm optimization algorithm | |
CN101520858B (en) | Ant colony optimization-differential evolution fusion method for solving traveling salesman problems | |
Nilakantan et al. | Design of energy efficient RAL system using evolutionary algorithms | |
CN111325356A (en) | Neural network search distributed training system and training method based on evolutionary computation | |
Brajevic | Artificial bee colony algorithm for the capacitated vehicle routing problem | |
CN112711475B (en) | Workflow scheduling method and system based on graph convolution neural network | |
Ghassemi et al. | Decentralized dynamic task allocation in swarm robotic systems for disaster response | |
Li et al. | Colored traveling salesman problem and solution | |
Mondal et al. | A genetic algorithm-based approach to solve a new time-limited travelling salesman problem | |
Zhang et al. | Modelling and simulation of the task scheduling behavior in collaborative product development process | |
Gonzalez-Pardo et al. | A new CSP graph-based representation for ant colony optimization | |
Cao et al. | An adaptive multi-strategy artificial bee colony algorithm for integrated process planning and scheduling | |
Chica et al. | A new diversity induction mechanism for a multi-objective ant colony algorithm to solve a real-world time and space assembly line balancing problem | |
El-Ghamrawy et al. | An agent decision support module based on granular rough model | |
Gao et al. | Collaborative scheduling with adaptation to failure for heterogeneous robot teams | |
NADER et al. | A multi-mode resource-constrained optimization of time-cost trade-off problems in project scheduling using a genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161214 |
RJ01 | Rejection of invention patent application after publication |