CN103970613B - Multi-copy task fault tolerance scheduling method of heterogeneous distributed system - Google Patents

Multi-copy task fault tolerance scheduling method of heterogeneous distributed system

Publication number
CN103970613B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410216137.0A
Other languages
Chinese (zh)
Other versions
CN103970613A (en)
Inventor
门朝光
何忠政
李香
蒋庆丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Application filed by Harbin Engineering University
Priority to CN201410216137.0A
Publication of CN103970613A
Application granted
Publication of CN103970613B

Abstract

The invention belongs to the field of computers, and particularly relates to a multi-copy task fault-tolerant scheduling method for a heterogeneous distributed system. According to the load of each task and the execution speed of each node in the system, the method calculates the average execution time of each task over all processor nodes and the average communication time of each communication message over all links; calculates the bottom-level priority of every task in the task set using the bottom-level priority method; adds the tasks that are allowed to be scheduled to a scheduling queue in non-increasing order of priority; and selects the highest-priority task from the tasks allowed to be scheduled in the scheduling queue. The method can further shorten the start time of the copies of the task currently being scheduled, and can therefore further reduce the scheduling Makespan.

Description

A multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems
Technical field
The invention belongs to the field of computers, and in particular relates to a multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems.
Background
With the advent of high-speed networks, it has become feasible to connect distributed, low-cost, and possibly heterogeneous resources into a computing environment. The heterogeneity of computers in distributed systems (such as cloud computing, grid computing, and distributed mobile computing) will therefore steadily increase, giving rise to a computing platform known as a heterogeneous distributed computing (HDC) system. HDC systems have become popular platforms for high-performance computing and information processing, and are increasingly adopted by critical systems. HDC systems usually offer high throughput and availability and can efficiently access large-scale distributed network information. They are, however, more complex than homogeneous systems and centrally controlled systems, and the extra complexity can cause more system failures. In an HDC system, a safety-critical application must not only tolerate faults when failures occur, but also satisfy its timing constraints.
In large-scale heterogeneous distributed computing systems, an effective task scheduling algorithm plays a key role in meeting user or system requirements and achieving high performance. Task scheduling maps tasks to processors and sets their start times so that the execution order satisfies the dependencies between tasks, while maximizing scheduling reliability and minimizing the scheduling Makespan. Most existing work on task scheduling addresses independent tasks and ignores the data dependencies and precedence constraints between tasks. Moreover, minimizing the scheduling Makespan and minimizing the application failure probability often conflict, so a scheduling algorithm is needed that considers Makespan and reliability together.
Fault-tolerant scheduling has been studied extensively. A fault-tolerant scheduling method can improve reliability with the passive replication (Primary/Backup, PB) mechanism or the active replication mechanism. The PB mechanism assigns two versions of each task to different processors; when the processor of the primary version fails, the task is executed on the processor of the backup version. The PB mechanism can tolerate only a single failure, and when a failure occurs the task execution time becomes longer, so the timing requirements of real-time tasks may well not be met. The active replication mechanism is based on spatial redundancy: multiple copies of a task are scheduled onto different processors, and fault tolerance is achieved by executing the copies in parallel. Multi-copy task scheduling comes in two forms: strict scheduling and general scheduling. Under strict scheduling, a task can start only after all copies of all its direct predecessors have finished and their dependency messages have arrived at the node to which the current task is mapped. Most copy-based fault-tolerant scheduling methods use this mode. Under general scheduling, the current task can start as soon as, for each direct predecessor, at least one copy has finished and that copy's message has been delivered to the node to which the current task is mapped. Strict scheduling is clearly a special case of general scheduling.
Under strict scheduling, the start time of the copy being scheduled must be the maximum, over all predecessor copies, of completion time plus communication time. Under general scheduling, the start time can be the maximum over only a subset of the predecessor copies. Not all inter-task messages need to be sent; it suffices to send the dependency messages of the required subset of copies. General scheduling can therefore further reduce the start time of the copy being scheduled, so its Makespan is likely to be smaller than under strict scheduling. Reliability is also computed differently in the two modes: under strict scheduling, the reliability of a task copy must account for all copies of all predecessors of the task, whereas under general scheduling it accounts only for those predecessor copies whose completion time plus message communication time is less than the start time of the copy being scheduled.
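The difference between the two modes can be illustrated with a minimal Python sketch. It assumes that the arrival time of each predecessor copy's message (completion time plus communication time) is already known. The function names and the sample numbers are hypothetical and do not come from the patent.

```python
from typing import Dict, List


def start_strict(arrivals: Dict[str, List[float]], node_ready: float) -> float:
    """Strict scheduling: wait for the message of every copy of every predecessor."""
    latest = max(max(times) for times in arrivals.values())
    return max(node_ready, latest)


def start_general(arrivals: Dict[str, List[float]], node_ready: float) -> float:
    """General scheduling: wait only for the first-arriving copy of each predecessor."""
    latest_first = max(min(times) for times in arrivals.values())
    return max(node_ready, latest_first)


# Hypothetical arrival times of the messages from two predecessors,
# each replicated twice; the task must have at least one predecessor.
arrivals = {"v1": [4.0, 9.0], "v2": [6.0, 7.5]}
print(start_strict(arrivals, node_ready=3.0))   # 9.0
print(start_general(arrivals, node_ready=3.0))  # 6.0
```

The general start time can never exceed the strict one, because the minimum over a predecessor's copies is never larger than the maximum.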
Most active-replication fault-tolerant scheduling mechanisms blindly replicate each task a fixed number of times in order to tolerate a specified number of processor failures. A. Girault et al., in "An algorithm for automatically obtaining distributed and fault-tolerant static schedules" (Dependable Systems and Networks, 2003), and Anne Benoit et al., in "Fault tolerant scheduling of precedence task graphs on heterogeneous platforms" (Parallel and Distributed Processing, 2008), proposed the FTBAR and FTSA algorithms, respectively. To tolerate ε processor failures, these algorithms schedule each task onto the ε+1 processors with the smallest schedule pressure and with the earliest completion time, respectively. Neither algorithm provides a scheduling reliability analysis. Laiping Zhao et al., in "Reliable workflow scheduling with less resource redundancy" (Parallel Computing, 2013), proposed an active-replication fault-tolerant scheduling algorithm that minimizes resource overhead. The algorithm selects the most reliable nodes to execute task copies and does not consider copy completion time, so a task may be scheduled onto a node that is highly reliable but slow, which is bad for system load balancing. The method is also based on strict scheduling, so its Makespan is longer than under general scheduling. Antonios Litke et al., in "Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments" (Future Generation Computer Systems, 2007), proposed a fault-tolerant scheduling mechanism for mobile grid computing environments, but it targets independent tasks and does not consider the scheduling Makespan. Alain Girault et al., in "Reliability versus performance for critical applications" (Journal of Parallel and Distributed Computing, 2009), proposed a two-phase method that optimizes reliability and schedule length simultaneously. However, that algorithm uses strict scheduling, so its Makespan is long; it does not consider link failures; and the nodes for task copies are chosen at random. Laiping Zhao et al., in "A Resource Minimizing Scheduling Algorithm with Ensuring the Deadline and Reliability in Heterogeneous Systems" (Advanced Information Networking and Applications, 2011), studied fault-tolerant task scheduling with the active replication mechanism. Their algorithm considers both node and link failures but does not use general scheduling.
Random search algorithms combine information from existing search results with some randomness to produce new results. The genetic algorithm (GA) searches for optima in high-dimensional spaces using the ideas of natural selection and evolution. It is simple, fast, and robust, and tends to yield good solutions. Because it uses a guided, non-exhaustive random search mechanism, it can converge quickly to a near-optimal global solution. The running cost of a GA is usually high, but this is acceptable for long-running tasks, and computation can be accelerated with parallel GA techniques. SungHo Chin et al., in "Genetic Algorithm based Scheduling Method for Efficiency and Reliability in Mobile Grid" (Ubiquitous Information Technologies & Applications, 2009), used a GA to schedule task copies in a heterogeneous mobile grid environment to improve task reliability, but addressed only independent tasks. The bi-objective genetic algorithm (BGA) proposed by Atakan Dogan et al. in "Biobjective scheduling algorithms for execution time-reliability trade-off in heterogeneous computing systems" (The Computer Journal, 2005) optimizes scheduling Makespan and reliability simultaneously, but during evolution it may produce invalid solutions that violate the dependencies between tasks. Xiaofeng Wang et al., in "Optimizing the makespan and reliability for workflow applications with reputation and a look-ahead genetic algorithm" (Future Generation Computer Systems, 2011), used a GA with a two-stage strategy to optimize Makespan and reliability simultaneously while satisfying task dependencies, but the algorithm does not use task replication to improve reliability, so its reliability gain is limited.
Scheduling copies so as to optimize reliability in a heterogeneous computing system is an NP-complete problem, so no polynomial-time algorithm is known that maximizes reliability. This method therefore takes an alternative approach: the schedule only has to satisfy the task reliability requirement; it need not maximize task reliability. For both transient and permanent faults, evaluating the reliability of a general schedule has been proven to be #P-complete, which is at least as hard as an NP-complete problem. Hence, even when a schedule for the task set is given, no polynomial-time method is known for computing the reliability of the task set. This method therefore computes a reliability requirement for each task and, provided that the task reliability requirements and the dependency constraints between tasks are satisfied, further optimizes the scheduling Makespan. The reliability of a schedule is computed differently under strict scheduling and under general scheduling. Task reliability is tightly coupled to task start time, because the start time determines how many predecessor messages the task can receive. Most existing scheduling approaches do not consider the choice of the start-time position of a task copy on a node, and therefore fall short in optimizing completion time. This method thus uses a multi-copy fault-tolerant scheduling mechanism based on general scheduling and, subject to the reliability requirement, uses a genetic algorithm to further optimize the scheduling Makespan.
Summary of the invention
The object of the invention is to provide a multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems based on the active replication mechanism.
The object of the invention is achieved as follows:
(1) According to the load of each task and the execution speed of each node in the system, calculate the execution time ET(v_j, p_k) of each task v_j of the application when it is scheduled on each node p_k of the system. The application with dependency constraints is G = <V, E>, where V = {v_1, v_2, ..., v_N} is the task set, N = |V| is the number of tasks, and E is the set of directed, weighted communication edges between tasks in V. The system model is the undirected graph GS = <P, L>, where P = {p_1, p_2, ..., p_M} is the set of M heterogeneous nodes, M = |P|, and L is the set of |L| communication links. The reliability requirement of the task set is R;
(2) Calculate the average execution time of each task over all processor nodes and the average communication time of each communication message over all links;
(3) Use the bottom-level priority method to calculate the bottom-level priority bl(v_j) of every task v_j in the task set [formula not reproduced]:
In the formula, succ(v_j) is the set of direct successors of task v_j; the average execution time of task v_j is taken over all nodes in node set P; and the average transmission time of message e_{j,i} is taken over all links in link set L;
(4) Add the tasks that are allowed to be scheduled to the scheduling queue in non-increasing order of priority;
(5) Select the highest-priority task v_j from the tasks allowed to be scheduled in the scheduling queue and calculate its reliability requirement r_x, where x is the position of the task in the priority queue [formula not reproduced]:
In the formula, 1 ≤ x ≤ n, and x follows the priority order of the tasks; R is the reliability requirement of the task set; r'_i is the reliability actually achieved by the task at position i in the priority queue, with r'_0 = 1. If the task is the highest-priority task, i.e., the entry task, its reliability requirement is given by a separate expression [not reproduced];
(6) If the reliability requirement is invalid, i.e., r_x ≥ 1 for task v_j, refuse to schedule the task and return; otherwise, call the multi-copy general scheduling method to calculate the nodes and start times of the task's copies;
(7) Delete the scheduled task from the scheduling queue and add the newly schedulable tasks to the queue according to priority; then select the next highest-priority task in the queue for scheduling, repeating steps (5)-(7) until all tasks have been scheduled.
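The priority computation in steps (2)-(4) can be sketched in Python. The formula for bl(v_j) is not reproduced in this text, so the sketch assumes the standard bottom-level recurrence that matches the definitions given: the average execution time of the task plus the maximum, over its direct successors, of the average message time plus the successor's bottom level. All names and the sample DAG are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def bottom_levels(avg_exec: Dict[str, float],
                  avg_comm: Dict[Tuple[str, str], float],
                  succ: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed recurrence: bl(v) = avg_exec[v] + max over s in succ(v) of (avg_comm[v, s] + bl(s))."""

    @lru_cache(maxsize=None)
    def bl(v: str) -> float:
        # Exit tasks have no successors, so the tail term defaults to 0.
        tail = max((avg_comm[(v, s)] + bl(s) for s in succ.get(v, ())), default=0.0)
        return avg_exec[v] + tail

    return {v: bl(v) for v in avg_exec}


def priority_order(levels: Dict[str, float]) -> List[str]:
    """Tasks in non-increasing order of bottom-level priority."""
    return sorted(levels, key=lambda v: -levels[v])


# Hypothetical diamond-shaped DAG: v1 -> {v2, v3} -> v4.
avg_exec = {"v1": 2.0, "v2": 3.0, "v3": 4.0, "v4": 1.0}
avg_comm = {("v1", "v2"): 1.0, ("v1", "v3"): 2.0, ("v2", "v4"): 1.0, ("v3", "v4"): 1.0}
succ = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"]}

levels = bottom_levels(avg_exec, avg_comm, succ)
print(levels)                  # {'v1': 10.0, 'v2': 5.0, 'v3': 6.0, 'v4': 1.0}
print(priority_order(levels))  # ['v1', 'v3', 'v2', 'v4']
```

With positive execution times, a task always has a higher bottom level than any of its successors, so processing tasks in this order respects the dependency constraints.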
The multi-copy general scheduling method for a task is as follows:
(6.1) Initialize: set the number of copies of task v_j to 0, set its mapped nodes to empty, and set the idle node set to the node set P;
(6.2) If task v_j is an entry task, select the node in the idle node queue with the earliest completion time to execute a copy of the task, and calculate the reliability of task v_j [formula not reproduced]:
In the formula, proc(v_j) is the set of nodes to which task v_j is mapped; λ_{p_n} is the permanent-failure probability of processor node p_n; w(v_j) is the load of task v_j; and w(p_n) is the amount of computation that node p_n can perform per unit time. If the task reliability requirement is not met, continue selecting the node with the earliest completion time from the idle queue to execute another copy and recalculate the task reliability, until the requirement is met. If the idle node queue becomes empty and the requirement is still not met, the reliability shortfall is made up through the reliability calculation formula when the copies of subsequent tasks are scheduled;
(6.3) If task v_j has predecessors, call the genetic-algorithm-based multi-copy general scheduling method to schedule its copies.
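Step (6.2) can be sketched as a greedy loop. The reliability formula is not reproduced in this text, so the sketch assumes a common exponential model: a copy on node p_n survives with probability exp(-λ_{p_n} · w(v_j) / w(p_n)), treating λ as a failure rate, and the task succeeds if at least one copy survives. The function name, the tuple layout, and the sample nodes are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def replicate_entry_task(load: float,
                         nodes: Dict[str, Tuple[float, float, float]],
                         required: float) -> Tuple[List[str], float]:
    """Greedily add copies on the earliest-finishing idle node until `required` is met.

    nodes maps a node name to (speed, failure_rate, ready_time).
    Returns the chosen nodes and the reliability achieved (assumed model).
    """
    idle = dict(nodes)
    chosen: List[str] = []
    all_copies_fail = 1.0  # probability that every copy placed so far fails
    while idle and 1.0 - all_copies_fail < required:
        # Earliest completion time = node ready time + load / speed.
        # Ties go to the first node found here; the text picks one at random.
        name = min(idle, key=lambda n: idle[n][2] + load / idle[n][0])
        speed, rate, _ready = idle.pop(name)
        all_copies_fail *= 1.0 - math.exp(-rate * load / speed)
        chosen.append(name)
    return chosen, 1.0 - all_copies_fail


# Hypothetical nodes: (speed, failure rate, ready time).
nodes = {"p1": (2.0, 0.01, 0.0), "p2": (1.0, 0.005, 0.0), "p3": (5.0, 0.05, 4.0)}
copies, achieved = replicate_entry_task(load=10.0, nodes=nodes, required=0.99)
print(copies, round(achieved, 4))  # ['p1', 'p3'] 0.9954
```

If the loop runs out of idle nodes, the function returns a reliability below `required`; the caller would then carry the shortfall forward, as the text describes for subsequent tasks.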
The genetic-algorithm-based multi-copy general scheduling method is as follows:
(6.3.1) Initialize the crossover probability p_c, the mutation probability p_m, the population size GN, and the number of generations EN;
(6.3.2) Generate the initial population:
For the copy of a predecessor v_i of the current task that is mapped on node p_k, calculate the time at which its message arrives at node p_n [formula not reproduced]:
In the formula, FT(v_i, p_k) is the completion time of task v_i on node p_k; rdy(l_{k,n}) is the time at which link l_{k,n} is ready to communicate, i.e., the completion time of the last message on the link; w(e_{i,j}) is the size of the communication message e_{i,j} between task v_i and task v_j; and w(l_{k,n}) is the amount of data that link l_{k,n} between node p_k and node p_n can transmit per unit time. If the mapped nodes are the same, i.e., p_k = p_n, then rdy(l_{k,n}) is 0 and the communication overhead is 0.
The task encoding scheme encodes, for each node, every position between the minimum valid start-time position and the maximum valid start-time position as a gene of the individual. The minimum valid start-time position EST(v_j, p_n) of task v_j on processor p_n is calculated as follows [formula not reproduced]:
In the formula, pred(v_i) is the set of direct predecessors of task v_i; rep(v_i) is the set of copies of task v_i; and rdy(p_n) is, under the current schedule, the completion time PFT(p_n) of the last task mapped on node p_n [formula not reproduced]:
In the formula, proc(v_i) is the set of processors to which task v_i is mapped.
The maximum valid start-time position LST(v_j, p_n) of task v_j on processor p_n is calculated as follows [formula not reproduced]:
Select a processor node from the idle node queue, select a valid start-time position on that node, and map a copy of the current task there. Calculate the reliability of the task copy. If the reliability of the task does not meet the requirement, continue selecting processor nodes from the idle node queue and valid start-time positions on them until the requirement is met, and take the resulting copy mapping scheme as an individual of the population. Repeat to generate individuals until the population size is reached. If the number of copies reaches M and the reliability of the task still falls short of the requirement, the copy mapping scheme is still taken as an individual, because the reliability shortfall can be compensated to an appropriate extent when subsequent tasks are scheduled [formula not reproduced]:
In the formula, the reliability being computed is that of the copy of task v_j mapped on node p_n; the formula also involves the task copies executed on node p_n before the copy being scheduled, where Prepn is the set of task copies executed on node p_n; ST(v_j, p_n) is the start time of task v_j on node p_n; et_{p,q} is the time at which the message between tasks v_p and v_q starts to be transmitted; ON(l_{k,n}) is the set of all communications on link l_{k,n}; et_{p,q} ≤ et_{l,j} (v_p, v_q ∈ V) means that on link l_{k,n} message e_{p,q} starts no later than message e_{l,j}; and λ_{l_{k,n}} is the failure probability of link l_{k,n} between node p_k and node p_n. If two task copies are mapped on the same node, the link communication time is 0 and the reliability of the message is 1;
In the encoding, the gene corresponding to a valid start-time position at which a task copy is mapped has the value 1, and the genes of positions with no mapped copy have the value 0. In a mapping, at most one position among the genes of each node has the value 1; all other positions are 0;
The encoding also records the number of valid mapping positions of each node, held in an array s. If task v_j is assigned to the k-th valid start-time position of node p_n, then the l-th gene of individual g_j is set to g_{j,l} = 1, where the index l is given by an expression [not reproduced]; |s_i| is the number of valid mapping positions of node p_i represented by array element s_i, with |s_0| = 0. The length of an encoded individual and the set of genes of individual g_j corresponding to array element s_i are likewise given by expressions [not reproduced].
(6.3.3) Apply the crossover operation to the individuals of the population according to the crossover probability p_c:
For two selected individuals, if a random number is less than the crossover probability p_c, select nodes in array s whose corresponding genes differ between the two individuals, swap the genes corresponding to all the selected nodes between the two individuals, and add the newly generated individuals to the population;
(6.3.4) Apply the mutation operation to the individuals of the population according to the mutation probability p_m:
Add the newly generated individuals to the population;
(6.3.5) Use the completion-time evaluation function F_Tim and the reliability evaluation function F_Rel to calculate the fitness of each individual g_i in the population, and sort all individuals in descending order of their F_Tim and F_Rel values to obtain two ordered queues of individuals;
(6.3.6) Select individuals from the two queues using the RR mechanism as the individuals of the new population, until the population size is reached;
(6.3.7) If the stop condition is not met, repeat steps (6.3.3)-(6.3.6); stop when neither the reliability nor the Makespan has improved within the specified number of generations.
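The encoding, crossover, and selection steps can be sketched in Python under several labeled assumptions: genes are stored as a flat 0/1 list with one contiguous segment per node, whose lengths are the position counts in array s; indices are 0-based; the crossover swaps the segment of a single differing node (the text swaps all selected nodes); "RR" is read as round-robin; and both queues are ordered best first. All function names are hypothetical.

```python
import random
from itertools import accumulate
from typing import List, Sequence, Tuple


def segments(s: Sequence[int]) -> List[Tuple[int, int]]:
    """(start, end) gene ranges, one per node, derived from the position counts in s."""
    ends = list(accumulate(s))
    return [(end - count, end) for count, end in zip(s, ends)]


def gene_index(s: Sequence[int], node: int, k: int) -> int:
    """Flat index of the k-th valid start position of `node` (both 0-based)."""
    return sum(s[:node]) + k


def crossover(a: List[int], b: List[int], s: Sequence[int], rng=random):
    """Swap the whole gene segment of one node whose genes differ between a and b."""
    differing = [(lo, hi) for lo, hi in segments(s) if a[lo:hi] != b[lo:hi]]
    if not differing:
        return a[:], b[:]
    lo, hi = rng.choice(differing)
    return a[:lo] + b[lo:hi] + a[hi:], b[:lo] + a[lo:hi] + b[hi:]


def round_robin_select(by_time: list, by_reliability: list, size: int) -> list:
    """Alternate between the two ranked queues, skipping repeats, until `size` is reached."""
    picked = []
    for pair in zip(by_time, by_reliability):
        for individual in pair:
            if individual not in picked and len(picked) < size:
                picked.append(individual)
    return picked


s = [2, 3, 1]                  # hypothetical: node 0 has 2 positions, node 1 has 3, node 2 has 1
parent_a = [1, 0, 0, 1, 0, 0]  # copies at node 0 position 0 and node 1 position 1
parent_b = [1, 0, 0, 0, 1, 1]
child_a, child_b = crossover(parent_a, parent_b, s, random.Random(1))
print(child_a, child_b)
print(round_robin_select(["g1", "g2", "g3", "g4"], ["g3", "g1", "g4", "g2"], 3))
```

Swapping whole node segments keeps the constraint that each node carries at most one copy of the task, since every segment in a child comes intact from one parent. Whether a child still meets the reliability requirement would have to be checked separately.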
The beneficial effects of the invention are:
The general scheduling mode of the multi-copy task fault-tolerant scheduling method allows the start time of the copy being scheduled to be the maximum, over a subset of the predecessor copies, of completion time plus communication time. Not all inter-task messages need to be sent; it suffices to send the dependency messages of the required subset of copies. The method can therefore further reduce the start time of the copy being scheduled, and hence the scheduling Makespan of the tasks. Subject to meeting the reliability requirement of the task set, the method applies repeated crossover and mutation operations of a genetic algorithm to further optimize the scheduling Makespan, and its reliability calculation considers both node failure and link failure. In addition, the method does not produce idle task copy pairs during the evolution of the genetic algorithm.
Brief description of the drawings
Fig. 1 Flow chart of the multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems;
Fig. 2 DAG structure of the tasks to be scheduled;
Fig. 3 Configuration parameters of the nodes and links in the system;
Fig. 4 First individual generated by population initialization when mapping task v_3;
Fig. 5 Second individual generated by population initialization when mapping task v_3;
Fig. 6 Third individual generated by population initialization when mapping task v_3;
Fig. 7 Fourth individual generated by population initialization when mapping task v_3;
Fig. 8 Fifth individual, generated by the first mutation of individual g_3, when mapping task v_3;
Fig. 9 First individual generated by population initialization when mapping task v_4;
Fig. 10 Second individual generated by population initialization when mapping task v_4;
Fig. 11 Third individual generated by population initialization when mapping task v_4;
Fig. 12 Fourth individual generated by population initialization when mapping task v_4;
Fig. 13 Fifth individual, generated by crossover, when mapping task v_4;
Fig. 14 Sixth individual, generated by mutation, when mapping task v_4;
Fig. 15 The final schedule.
Specific embodiments
The invention is described in more detail below with reference to the accompanying drawings:
Blind replication wastes resources; other reliability-aware scheduling methods ignore the scheduling Makespan and the failure probability of the links that carry inter-task dependencies; and strict scheduling yields a longer Makespan. To address these shortcomings, the object of the invention is to provide a multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems based on the active replication mechanism. The method uses a multi-copy fault-tolerance mechanism based on general scheduling and, subject to the task reliability requirements, further optimizes the scheduling Makespan of the task set through the crossover and mutation operations of a genetic algorithm.
The specific steps of the multi-copy task fault-tolerant scheduling method are as follows:
For an application with dependency constraints G = <V, E>, V = {v_1, v_2, ..., v_N} is the task set, N = |V| is the number of tasks, and E is the set of directed, weighted communication edges between tasks in V. The system model is the undirected graph GS = <P, L>, where P = {p_1, p_2, ..., p_M} is the set of M heterogeneous nodes, M = |P|, and L is the set of |L| communication links. The reliability requirement of the task set is R.
1. First, according to the load of each task and the execution speed of each node in the system, calculate the execution time ET(v_j, p_k) of each task v_j of the application when it is scheduled on each node p_k of the system.
2. Calculate the average execution time of each task over all processor nodes and the average communication time of each communication message over all links.
3. Use the bottom-level priority method to calculate the bottom-level priority bl(v_j) of every task v_j in the task set according to formula (1).
In formula (1), succ(v_j) is the set of direct successors of task v_j; the average execution time of task v_j is taken over all nodes in node set P; and the average transmission time of message e_{j,i} is taken over all links in link set L.
4. Add the tasks that are allowed to be scheduled to the scheduling queue in non-increasing order of priority. A task is allowed to be scheduled if all its predecessors have been scheduled or if it has no predecessors.
5. Select the highest-priority task v_j from the tasks allowed to be scheduled in the scheduling queue and calculate its reliability requirement r_x according to formula (2), where x is the position of the task in the priority queue.
In formula (2), 1 ≤ x ≤ n, and x follows the priority order of the tasks; R is the reliability requirement of the task set; r'_i is the reliability actually achieved by the task at position i in the priority queue, with r'_0 = 1. If the task is the highest-priority task (i.e., the entry task), its reliability requirement is given by a separate expression [not reproduced].
6. If the reliability requirement is invalid, i.e., r_x ≥ 1 for task v_j, refuse to schedule the task and return. Otherwise, call the multi-copy general scheduling method to calculate the nodes and start times of the task's copies.
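Formula (2) is not reproduced in this text. The Python sketch below uses an assumed form for illustration only: if the task-set reliability is the product of the task reliabilities, the remaining budget R / (r'_1 · ... · r'_{x-1}) can be split evenly, in the geometric sense, over the n - x + 1 tasks still to be scheduled. Under that assumption the entry task gets R^(1/n), and a value of 1 or more signals that the target can no longer be met, which matches the rejection rule in step 6. The function name and the numbers are hypothetical.

```python
from typing import Sequence


def reliability_requirement(R: float, achieved: Sequence[float], n: int) -> float:
    """Assumed form: r_x = (R / prod(achieved)) ** (1 / (n - x + 1)), x = len(achieved) + 1."""
    x = len(achieved) + 1
    if x > n:
        raise ValueError("all tasks have already been scheduled")
    product = 1.0  # plays the role of r'_0 = 1 when nothing has been scheduled
    for r in achieved:
        product *= r
    return (R / product) ** (1.0 / (n - x + 1))


# Hypothetical task set of n = 3 tasks with a required reliability of R = 0.9.
r1 = reliability_requirement(0.9, [], 3)            # requirement of the entry task
r2 = reliability_requirement(0.9, [0.97], 3)        # first task over-achieved, so r2 < r1
r3 = reliability_requirement(0.9, [0.95, 0.94], 3)  # budget already spent, so r3 >= 1
print(round(r1, 4), round(r2, 4), r3 >= 1.0)
```

Because the requirement is recomputed from what earlier tasks actually achieved, a task that falls short raises the requirement for the tasks after it, which is one way to read the compensation described for subsequent tasks.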
The multi-copy general scheduling method for a task is implemented as follows:
Given the task v_j to be scheduled, the node set P of the system, and the task reliability requirement r_x:
(1) First initialize: set the number of copies of task v_j to 0 and its mapped nodes to empty, and set the idle node set to the node set P.
(2) If task v_j is an entry task, first select the node in the idle node queue with the earliest completion time to execute a copy of the task (if two nodes have the same completion time, pick either one at random). Calculate the reliability of task v_j according to formula (3). An entry task only needs the combined reliability of all its copies to satisfy the requirement; inter-task dependency messages need not be considered.
In formula (3), proc(v_j) is the set of nodes to which task v_j is mapped; λ_{p_n} is the permanent-failure probability of processor node p_n; w(v_j) is the load of task v_j; and w(p_n) is the amount of computation that node p_n can perform per unit time.
If the task reliability requirement is not met, continue selecting the node with the earliest completion time from the idle queue to execute another copy, and recalculate the reliability of the task according to formula (3), until the requirement is met. If the idle node queue becomes empty and the requirement is still not met, the reliability shortfall can be made up through the reliability calculation formula when the copies of subsequent tasks are scheduled, so that the reliability requirement of the task set is still satisfied.
(3) If task v_j has predecessors, call the genetic-algorithm-based multi-copy general scheduling method to schedule its copies.
The specific steps of the genetic-algorithm-based multi-copy general scheduling method are as follows:
1) First initialize the crossover probability p_c, the mutation probability p_m, the population size GN, and the number of generations EN.
2) Then generate the initial population.
To preserve the dependencies between tasks, only valid start-time positions are included in the encoding. A valid start position must guarantee that the task can receive the messages sent from the nodes to which its predecessors are mapped. For the copy of any predecessor v_i of the current task that is mapped on node p_k, the time at which its message arrives at node p_n is calculated according to formula (4).
In formula (4), FT(v_i, p_k) is the completion time of task v_i on node p_k; rdy(l_{k,n}) is the time at which link l_{k,n} is ready to communicate, i.e., the completion time of the last message on the link; w(e_{i,j}) is the size of the communication message e_{i,j} between task v_i and task v_j; and w(l_{k,n}) is the amount of data that link l_{k,n} between node p_k and node p_n can transmit per unit time. If the mapped nodes are the same, i.e., p_k = p_n, then rdy(l_{k,n}) is 0 and the communication overhead is 0.
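Formula (4) is not reproduced in this text. One reading that is consistent with the definitions above is that transmission starts once the predecessor copy has finished and the link is free, and then takes w(e_{i,j}) / w(l_{k,n}) time units. The Python sketch below encodes that assumption; the function and parameter names are illustrative, not the patent's.

```python
def message_arrival(finish_time: float, link_ready: float,
                    msg_size: float, link_rate: float,
                    same_node: bool = False) -> float:
    """Assumed form: arrival = max(FT(v_i, p_k), rdy(l_kn)) + w(e_ij) / w(l_kn).

    When both copies are on the same node there is no transfer, so the
    message is available as soon as the predecessor copy finishes.
    """
    if same_node:
        return finish_time
    return max(finish_time, link_ready) + msg_size / link_rate


# Hypothetical values: the predecessor copy finishes at t = 5, the link is busy
# until t = 7, and a 6-unit message crosses a link that moves 3 units per time unit.
print(message_arrival(5.0, 7.0, 6.0, 3.0))                  # 9.0
print(message_arrival(5.0, 7.0, 6.0, 3.0, same_node=True))  # 5.0
```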
The task encoding scheme encodes, for each node, every position between the minimum and the maximum valid start-time position as a gene of the individual. The minimum valid start-time position EST(v_j, p_n) of task v_j on processor p_n is calculated according to formula (5):

$$EST(v_j, p_n) = \max\Big\{ \max_{v_i \in pred(v_j)} \Big\{ \min_{v_i^k \in rep(v_i)} \{ ave(v_i^k, p_n) \} \Big\},\ rdy(p_n) \Big\} \qquad (5)$$

In the formula, pred(v_j) is the set of direct predecessors of task v_j; rep(v_i) is the set of copies of task v_i; and rdy(p_n) is, under the current schedule, the completion time PFT(p_n) of the last task mapped to node p_n, calculated as shown in formula (6):

$$PFT(p_k) = \max_{v_i \in V,\ p_k \in proc(v_i)} \{ FT(v_i, p_k) \} \qquad (6)$$

In the formula, proc(v_i) is the set of processors to which task v_i is mapped.
The maximum valid start-time position LST(v_j, p_n) of task v_j on processor p_n is calculated according to formula (7):

$$LST(v_j, p_n) = \max\Big\{ \max_{v_i \in pred(v_j)} \Big\{ \max_{v_i^k \in rep(v_i)} \{ ave(v_i^k, p_n) \} \Big\},\ rdy(p_n) \Big\} \qquad (7)$$
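The window between EST and LST can be sketched in a few lines of Python. This is only an illustration of formulas (4), (5) and (7) as reconstructed from the claims; the tuple layout and the function names `arrival_time` and `start_window` are assumptions, and the link parameters in the usage note are hypothetical values chosen only to reproduce the window [18, 38] quoted later for task v_3 on node p_1.

```python
def arrival_time(finish, src, dst, link_ready, msg_size, bandwidth):
    """ave(v_i^k, p_n): when the message of a predecessor copy reaches node dst."""
    if src == dst:
        # Same node: no link is used, so the data is available at FT(v_i, p_k).
        return finish
    return max(finish, link_ready) + msg_size / bandwidth


def start_window(pred_copies, dst, node_ready):
    """Return (EST, LST) of a task on node dst.

    pred_copies maps each direct predecessor to a list of tuples
    (finish, src_node, link_ready, msg_size, bandwidth), one per copy.
    node_ready is rdy(p_n), the finish time of the last task mapped on dst.
    """
    first_arrivals, last_arrivals = [], []
    for copies in pred_copies.values():
        arrivals = [arrival_time(ft, src, dst, rdy, size, bw)
                    for ft, src, rdy, size, bw in copies]
        first_arrivals.append(min(arrivals))  # earliest copy of this predecessor
        last_arrivals.append(max(arrivals))   # latest copy of this predecessor
    est = max(max(first_arrivals), node_ready)
    lst = max(max(last_arrivals), node_ready)
    return est, lst
```

For instance, with copies of v_1 finishing at 18 on p_1 and p_4 and a hypothetical transfer time of 20 from p_4 to p_1, `start_window({"v1": [(18, 1, 0, 0, 1), (18, 4, 0, 20, 1)]}, dst=1, node_ready=18)` gives the window (18, 38).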
A processor node is selected at random from the idle node queue, and a valid start-time position is selected at random on that node to map a copy of the currently scheduled task. The reliability of the task is calculated according to formula (8). If the reliability of the task does not meet the requirement, further processor nodes are chosen from the idle node queue and a valid start-time position is chosen on each, until the reliability of the task meets the requirement; the resulting copy mapping scheme becomes one individual of the population. Individuals are generated repeatedly until the population size is reached. If the number of copies has reached M and the reliability of the task still falls short of the requirement, the copy mapping scheme is nevertheless accepted as an individual of the population, because the reliability loss of this task can be compensated when subsequent tasks are scheduled.

$$P[E_{v_j}] = 1 - \prod_{p_n \in proc(v_j)} \left(1 - AR_{v_j}^{p_n}\right) \qquad (8)$$

$$AR_{v_j}^{p_n} = \left( \prod_{v_i^n \in Prep_n \,\cap\, tv_i^n \le tv_j^n} e^{-\lambda_{p_n} \cdot w(v_i)/w(p_n)} \right) \times \prod_{v_l \in pred(v_j)} \left( 1 - \prod_{\substack{p_k \in proc(v_l) \\ ave(v_l^k, p_n) \le ST(v_j, p_n)}} \left( 1 - \prod_{et_{p,q} \in ON(l_{k,n}) \,\cap\, et_{p,q} \le et_{l,j}} e^{-\lambda_{l_{k,n}} \cdot w(e_{p,q})/w(l_{k,n})} \right) \right)$$

In the formula, AR_{v_j}^{p_n} is the reliability of the copy v_j^n of task v_j mapped on node p_n; the condition tv_i^n ≤ tv_j^n selects the task copies v_i^n that execute on node p_n no later than the currently scheduled copy v_j^n; Prep_n is the set of task copies executed on node p_n; ST(v_j, p_n) is the start time of task v_j on node p_n; et_{p,q} is the time at which the message between tasks v_p and v_q starts to be transmitted; ON(l_{k,n}) is the set of all communications that take place on link l_{k,n}; the condition et_{p,q} ≤ et_{l,j} (v_p, v_q ∈ V) selects the messages e_{p,q} on link l_{k,n} whose transmission starts no later than that of message e_{l,j}; and λ_{l_{k,n}} is the failure rate of link l_{k,n} between nodes p_k and p_n. If a copy and the predecessor copy it depends on are mapped to the same node, the link communication time is 0 and the reliability of the message is 1.
The gene corresponding to the valid start-time position at which a copy is mapped has the value 1, and the genes of positions with no mapped copy have the value 0. To prevent a task from being mapped to the same node more than once, at most one position among the genes of each node may have the value 1 and all other positions are 0; that is, at most one copy of a given task may be mapped to a given node.
The encoding scheme also records the number of valid mapping positions of each node in the individual; these counts are held in array s, where |s_i| is the number of valid mapping positions of the node p_i represented by element s_i, and |s_0| = 0. If task v_j is assigned to the k-th valid start-time position of node p_n, the l-th gene of individual g_j is set to g_{j,l} = 1, with l = Σ_{q=0}^{n-1} |s_q| + k. The length of an encoded individual is Σ_{i=1}^{M} |s_i|, and the genes of individual g_j that belong to array element s_i are the genes g_{j,p} with Σ_{q=0}^{i-1} |s_q| < p < 1 + Σ_{l=0}^{i} |s_l|.
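The encoding can be illustrated with a small Python sketch. It assumes the flat layout described above (the genes of node p_1 first, then those of p_2, and so on) and uses hypothetical helper names `encode` and `decode`; the per-node position counts in the usage note are made up for illustration.

```python
def encode(positions_per_node, mapping):
    """Build the 0/1 gene list of one individual.

    positions_per_node: [|s_1|, ..., |s_M|], valid start-time positions per node.
    mapping: {node index n (1-based): position k (1-based)} for mapped copies.
    A dict holds at most one position per node, which enforces the rule that
    a node carries at most one copy of the task.
    """
    offsets = [0]
    for count in positions_per_node:
        offsets.append(offsets[-1] + count)
    genes = [0] * offsets[-1]
    for node, k in mapping.items():
        if not 1 <= k <= positions_per_node[node - 1]:
            raise ValueError(f"node {node} has no valid position {k}")
        genes[offsets[node - 1] + k - 1] = 1
    return genes


def decode(positions_per_node, genes):
    """Recover {node: position} from a gene list, checking the one-copy rule."""
    mapping, offset = {}, 0
    for node, count in enumerate(positions_per_node, start=1):
        segment = genes[offset:offset + count]
        if sum(segment) > 1:
            raise ValueError(f"node {node} holds more than one copy")
        if 1 in segment:
            mapping[node] = segment.index(1) + 1
        offset += count
    return mapping
```

With hypothetical counts `[2, 2, 2, 2, 1]`, mapping copies to the first position of p_1 and of p_2 yields `[1, 0, 1, 0, 0, 0, 0, 0, 0]`, and `decode` returns the original mapping.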
3) Apply the crossover operation to the individuals of the population according to the crossover probability p_c. If the random number is less than p_c, then for the two selected individuals one or more nodes of array s are chosen at random from those whose genes differ between the two individuals, and the genes of all chosen nodes are swapped between the two individuals. Finally, the newly generated individuals are added to the population.
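A minimal sketch of this node-wise swap follows. For brevity it represents an individual as a dictionary from node index to start time instead of the 0/1 gene string, which is an assumption made here for illustration; the function name `crossover` is likewise hypothetical.

```python
def crossover(parent_a, parent_b, nodes):
    """Swap the genes of the chosen nodes between two individuals.

    Each individual is {node index: start time of the copy on that node};
    a node that is absent from the dict carries no copy.
    """
    child_a, child_b = dict(parent_a), dict(parent_b)
    for node in nodes:
        gene_a, gene_b = parent_a.get(node), parent_b.get(node)
        # Each child receives the other parent's gene segment for this node.
        for child, gene in ((child_a, gene_b), (child_b, gene_a)):
            if gene is None:
                child.pop(node, None)
            else:
                child[node] = gene
    return child_a, child_b
```

Applied to the worked example for task v_4 further below, swapping nodes p_2 and p_3 between g_1 = {p_1: 40, p_2: 38} and g_4 = {p_3: 34, p_4: 35} yields {p_1: 40, p_3: 34}, which coincides with the existing individual g_2, and {p_2: 38, p_4: 35}, which is the new individual g_5.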
4) Apply the mutation operation to the individuals of the population according to the mutation probability p_m. If the random number is less than p_m, the selected individual is mutated at a randomly selected node position in array s. Mutation takes two forms: changing the start-time position of a copy on its mapping node, and changing the node to which a copy is mapped.
If the reliability of the individual is lower than the reliability requirement of the currently scheduled task and the selected node already holds a copy, the start time of the copy on the selected node is postponed in order to raise the task reliability.
If the reliability of the individual is lower than the reliability requirement of the currently scheduled task and the selected node holds no copy, the gene of a start-time position on that node is set to 1, adding a new copy to improve the reliability.
If the reliability of the individual is higher than the reliability requirement of the currently scheduled task and a valid start-time position exists that is earlier than the current start time of the copy, the start time of the copy on the selected node is moved earlier. This may lower the task reliability, which is acceptable as long as the reliability requirement is still met.
If the reliability of the individual is higher than the reliability requirement of the currently scheduled task and the copy already starts at the earliest valid start time of its node, the gene of that start-time position is set to 0, cancelling the copy on the selected node. This may also lower the reliability, which is acceptable as long as the task reliability requirement is still met.
Finally, the newly generated individuals are added to the population.
5) Calculate the fitness of each individual g_i in the population with the completion-time evaluation function F_Tim of formula (9) and the reliability evaluation function F_Rel of formula (10). Sort all individuals in descending order of F_Tim and of F_Rel to obtain two sorted queues of individuals.
6) Select individuals from the two queues with a round-robin (RR) mechanism as the individuals of the new population, until the population size is reached.
7) If the stop condition is not met, repeat steps 3)-6). The search stops after the set number of generations, or when the algorithm has converged (neither the reliability nor the Makespan improves noticeably within the prescribed number of generations).
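The round-robin selection of step 6) can be sketched as follows. The text does not say how an individual that appears near the top of both queues is handled; this sketch assumes it is taken once and skipped afterwards, an assumption that reproduces the selections listed in the worked example below. The function name `round_robin_select` is hypothetical.

```python
def round_robin_select(queues, size):
    """Alternate between sorted queues, taking the best not-yet-chosen
    individual from each in turn, until `size` individuals are selected."""
    selected = []
    cursors = [0] * len(queues)
    turn = 0
    while len(selected) < size and any(c < len(q) for c, q in zip(cursors, queues)):
        idx = turn % len(queues)
        queue, cursor = queues[idx], cursors[idx]
        # Assumption: skip individuals already taken from the other queue.
        while cursor < len(queue) and queue[cursor] in selected:
            cursor += 1
        if cursor < len(queue):
            selected.append(queue[cursor])
            cursor += 1
        cursors[idx] = cursor
        turn += 1
    return selected
```

Passing the completion-time queue first and the reliability queue second means the fastest individual is always kept, followed by the most reliable one.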
7. Delete the scheduled task from the scheduling queue and add the newly schedulable tasks to the scheduling queue according to their priority. Then select the next highest-priority task in the scheduling queue for scheduling, repeating steps 5-7 until all tasks have been scheduled.
Fig. 1 shows the flow chart of the multi-copy task fault-tolerant scheduling method for heterogeneous distributed systems. The implementation of the method is described in detail below with reference to the flow chart and an example.
The example shows how the task set V = {v_1, v_2, v_3, v_4} of Fig. 2 is scheduled onto a heterogeneous distributed system whose node set P = {p_1, p_2, p_3, p_4, p_5} has the configuration parameters of Fig. 3. The reliability requirement R is 0.999.
1. First, from the load of each task and the execution speed of each node in the system, calculate the execution time of each task on each node. The results are: task v_1 takes {18, 9, 9, 18, 6} on the five nodes, task v_2 takes {20, 10, 10, 20, 6.7}, task v_3 takes {22, 11, 11, 22, 7.3} and task v_4 takes {24, 12, 12, 24, 8}.
2. Calculate the average execution time of each task over all processor nodes: 12 for task v_1, 13.3 for task v_2, 14.7 for task v_3 and 16 for task v_4. Calculate the average communication time of each message over all links: 8.5 for message e_{1,2}, 10.6 for message e_{1,3}, 6.4 for message e_{2,4} and 12.8 for message e_{3,4}.
3. Calculate the priority of each task with the bottom-level priority method according to formula (1): bl(v_1) is 54.1, bl(v_2) is 35.7, bl(v_3) is 43.5 and bl(v_4) is 16. The tasks in priority order are {v_1, v_3, v_2, v_4}.
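The priority computation of steps 2 and 3 can be sketched in Python. The recurrence used here, a task's average execution time plus the largest "average communication time plus bottom level" over its direct successors, is the usual bottom-level definition and is an assumption about formula (1). Under it, v_2, v_3 and v_4 reproduce the quoted values and the priority order is the same; v_1 evaluates to 66.1, whereas the quoted 54.1 equals its successor term alone. The function name `bottom_levels` is hypothetical.

```python
from functools import lru_cache


def bottom_levels(avg_exec, avg_comm):
    """avg_exec: {task: average execution time};
    avg_comm: {(task, successor): average communication time}."""
    successors = {}
    for (src, dst), cost in avg_comm.items():
        successors.setdefault(src, []).append((dst, cost))

    @lru_cache(maxsize=None)
    def bl(task):
        # Exit tasks have no successors, so their bottom level is their own time.
        tail = max((cost + bl(dst) for dst, cost in successors.get(task, [])),
                   default=0.0)
        return avg_exec[task] + tail

    return {task: bl(task) for task in avg_exec}


avg_exec = {"v1": 12, "v2": 13.3, "v3": 14.7, "v4": 16}
avg_comm = {("v1", "v2"): 8.5, ("v1", "v3"): 10.6,
            ("v2", "v4"): 6.4, ("v3", "v4"): 12.8}
levels = bottom_levels(avg_exec, avg_comm)
# Non-increasing bottom level gives the scheduling priority order.
order = sorted(levels, key=levels.get, reverse=True)
```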
4. Add the schedulable task v_1 to the scheduling queue according to its priority.
5. Select the highest-priority task v_1 from the schedulable tasks in the scheduling queue and calculate its reliability requirement r_1 according to formula (2).
6. Call the multi-copy task scheduling method to determine the nodes and start times of the copies of task v_1.
(1) First set the number of copies of task v_1 to 0 and its set of mapping nodes to empty. Set the idle node set to the node set P.
(2) Choose the node with the earliest completion time in the idle node queue, p_1, to execute a copy. According to formula (3) the reliability of task v_1 is 0.998202, which does not meet the task reliability requirement, so the node with the earliest completion time among the remaining idle nodes, p_4, is chosen to execute a further copy. According to formula (3) the reliability of the task is then 0.999981, which meets the requirement. Hence r'_1 is 0.999981.
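The entry-task procedure can be sketched as below, using formula (3) as it appears in the claims. The failure rates in the usage note are hypothetical: the parameters of Fig. 3 are not reproduced in the text, and a rate of 1e-4 for p_1 is assumed only because it reproduces the quoted single-copy value 0.998202 for an execution time of 18. The function names are also assumptions.

```python
import math


def replicated_reliability(load, nodes):
    """Formula (3): 1 - prod(1 - exp(-rate * load / speed)) over mapped nodes.

    nodes: iterable of (failure_rate, speed) pairs, one per copy.
    """
    all_fail = 1.0
    for rate, speed in nodes:
        all_fail *= 1.0 - math.exp(-rate * load / speed)
    return 1.0 - all_fail


def schedule_entry_task(load, idle_nodes, requirement):
    """Add copies on idle nodes (already ordered by earliest completion time)
    until the requirement is met or the idle queue is exhausted."""
    chosen = []
    for node in idle_nodes:
        chosen.append(node)
        if replicated_reliability(load, chosen) >= requirement:
            break
    return chosen, replicated_reliability(load, chosen)
```

With a load of 18, a unit-speed node of rate 1e-4 alone gives about 0.998202; adding a second unit-speed node with a hypothetical rate of 6e-4 lifts the result above a requirement of 0.99975, so the loop stops after two copies.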
7. Delete the scheduled task v_1 from the scheduling queue and add the newly schedulable tasks v_2 and v_3 to the scheduling queue according to their priority. Continue by selecting the highest-priority task v_3 for scheduling.
8. Select the highest-priority task v_3 from the schedulable tasks in the scheduling queue and calculate its reliability requirement r_2 according to formula (2).
9. Call the multi-copy task scheduling method to determine the nodes and start times of the copies of task v_3.
(1) First set the number of copies of task v_3 to 0 and its set of mapping nodes to empty. Set the idle node set to the node set P.
(2) Call the genetic-algorithm-based multi-copy task scheduling algorithm to schedule the copies, as follows:
1) First, initialize the crossover probability p_c = 0.5, the mutation probability p_m = 0.25, the population size GN = 4 and the number of generations EN = 3.
2) Then generate the initial population.
First, individual g_1 is generated by mapping task v_3 to node p_1. Here EST(v_3, p_1) is 18 and LST(v_3, p_1) is 38. The copy v_3^1 is placed at start-time position 18 (each start-time position of a copy of task v_j on node p_n corresponds to the arrival of the message from one copy of a predecessor task). The reliability of task v_3 is now 0.996008. Node p_2 is then chosen to map a further copy of v_3. Here EST(v_3, p_2) is 23 and LST(v_3, p_2) is 28. The copy v_3^2 is placed at start-time position 23. The reliability of task v_3 is now 0.999967. The encoding scheme of individual g_1 for task v_3 is shown in Fig. 4.
The second individual g_2 is generated by mapping task v_3 to node p_1. Here EST(v_3, p_1) is 18 and LST(v_3, p_1) is 38. The copy v_3^1 is placed at start-time position 18. The reliability of task v_3 is now 0.996008. Node p_4 is then chosen to map a further copy of v_3. Here EST(v_3, p_4) is 18 and LST(v_3, p_4) is 38. The copy v_3^4 is placed at start-time position 18. The reliability of task v_3 is now 0.999905. The encoding scheme of individual g_2 for task v_3 is shown in Fig. 5.
The third individual g_3 is generated by mapping task v_3 to node p_2. Here EST(v_3, p_2) is 23 and LST(v_3, p_2) is 28. The copy v_3^2 is placed at start-time position 23. The reliability of task v_3 is now 0.991635. Node p_3 is then chosen to map a further copy of v_3. Here EST(v_3, p_3) is 23 and LST(v_3, p_3) is 28. The copy v_3^3 is placed at start-time position 23. The reliability of task v_3 is now 0.999946. The encoding scheme of individual g_3 for task v_3 is shown in Fig. 6.
Finally, the fourth individual g_4 is generated by mapping task v_3 to node p_2. Here EST(v_3, p_2) is 23 and LST(v_3, p_2) is 28. The copy v_3^2 is placed at start-time position 23. The reliability of task v_3 is now 0.991635. Node p_4 is then chosen to map a further copy of v_3. Here EST(v_3, p_4) is 18 and LST(v_3, p_4) is 38. The copy v_3^4 is placed at start-time position 18. The reliability of task v_3 is now 0.999802. The encoding scheme of individual g_4 for task v_3 is shown in Fig. 7.
3) Apply the first crossover operation to the individuals of the population with crossover probability p_c = 0.5. Assume that the random number drawn for every crossover is not less than p_c, so that no crossover is performed.
4) Apply the mutation operation to the individuals of the population with mutation probability p_m = 0.25. Assume that the random number drawn for the third individual g_3 is less than p_m, so that the third individual is mutated. The mapping position of node p_2 in the third individual is chosen at random for mutation. The reliability of the individual is higher than the reliability requirement of task v_3, and the copy v_3^2 already starts at the earliest valid start time of its node, so the gene of that start-time position on node p_2 is set to 0, cancelling the copy v_3^2 on the mutated node p_2 and generating individual g_5. The encoding scheme of individual g_5, generated for task v_3 by mutation, is shown in Fig. 8. The newly generated individual is added to the population.
5) Calculate the fitness of each individual in the population with the completion-time evaluation function F_Tim of formula (9) and the reliability evaluation function F_Rel of formula (10). Sorting in descending order of F_Tim and of F_Rel gives two sorted queues of individuals. Completion-time queue: g_3, g_5, g_1, g_2, g_4. Reliability queue: g_1, g_3, g_2, g_4, g_5.
6) Select individuals from the two queues with the round-robin (RR) mechanism as the individuals of the new population, until the population size is reached. The selected individuals are g_3, g_1, g_5, g_2. The first generation is complete.
7) The remaining generations proceed as in steps 3)-6). In the mutation operation of the second generation, assume that individual g_5 meets the mutation condition and that the mutation adds a copy of task v_3 on node p_5. After three generations the final copy mapping scheme of task v_3 consists of two copies, one starting at 23 and the other starting at 21.3; the completion time of task v_3 is 34 and its reliability is 0.999970. Hence r'_2 is 0.999970.
10. Delete the scheduled task v_3 from the scheduling queue and select the next highest-priority task in the queue, v_2, for scheduling. Calculate the reliability requirement r_3 of task v_2 according to formula (2).
11. Call the multi-copy task scheduling method to determine the nodes and start times of the copies of task v_2. The calculation is similar to that for task v_3 and is not repeated here. After three generations the final copy mapping scheme of task v_2 consists of two copies, one starting at 18 and the other starting at 22; the completion time of task v_2 is 38 and its reliability is 0.999973. Hence r'_3 is 0.999973.
12. Delete the scheduled task v_2 from the scheduling queue and add the newly schedulable task v_4 to the scheduling queue. Continue by selecting the highest-priority task v_4 for scheduling.
13. Select the highest-priority task v_4 from the schedulable tasks in the scheduling queue and calculate its reliability requirement r_4 according to formula (2): r_4 = R / (r'_1 · r'_2 · r'_3) = 0.99907593.
14. Call the multi-copy task scheduling method to determine the nodes and start times of the copies of task v_4.
(1) First set the number of copies of task v_4 to 0 and its set of mapping nodes to empty. Set the idle node set to the node set P.
(2) Call the genetic-algorithm-based multi-copy task scheduling algorithm to schedule the copies, as follows:
1) First, initialize the crossover probability p_c = 0.5, the mutation probability p_m = 0.25, the population size GN = 4 and the number of generations EN = 3.
2) Then generate the initial population.
First, individual g_1 is generated by mapping task v_4 to node p_1. Here EST(v_4, p_1) is 40 and LST(v_4, p_1) is 40.6. The copy v_4^1 is placed at start-time position 40. The reliability of task v_4 is now 0.992627. Node p_2 is then chosen to map a further copy of v_4. Here EST(v_4, p_2) is 38 and LST(v_4, p_2) is 52.6. The copy v_4^2 is placed at start-time position 38. The reliability of task v_4 is now 0.999927. The encoding scheme of individual g_1 for task v_4 is shown in Fig. 9.
The second individual g_2 is generated by mapping task v_4 to node p_1. Here EST(v_4, p_1) is 40 and LST(v_4, p_1) is 40.6. The copy v_4^1 is placed at start-time position 40. The reliability of task v_4 is now 0.992627. Node p_3 is then chosen to map a further copy of v_4. Here EST(v_4, p_3) is 34 and LST(v_4, p_3) is 52.6. The copy v_4^3 is placed at start-time position 34. The reliability of task v_4 is now 0.999911. The encoding scheme of individual g_2 for task v_4 is shown in Fig. 10.
The third individual g_3 is generated by mapping task v_4 to node p_2. Here EST(v_4, p_2) is 38 and LST(v_4, p_2) is 52.6. The copy v_4^2 is placed at start-time position 38. The reliability of task v_4 is now 0.990050. Node p_3 is then chosen to map a further copy of v_4. Here EST(v_4, p_3) is 34 and LST(v_4, p_3) is 52.6. The copy v_4^3 is placed at start-time position 41. The reliability of task v_4 is now 0.999886. The encoding scheme of individual g_3 for task v_4 is shown in Fig. 11.
Finally, the fourth individual g_4 is generated by mapping task v_4 to node p_3. Here EST(v_4, p_3) is 34 and LST(v_4, p_3) is 52.6. The copy v_4^3 is placed at start-time position 34. The reliability of task v_4 is now 0.987973. Node p_4 is then chosen to map a further copy of v_4. Here EST(v_4, p_4) is 35 and LST(v_4, p_4) is 50. The copy v_4^4 is placed at start-time position 35. The reliability of task v_4 is now 0.999603. The encoding scheme of individual g_4 for task v_4 is shown in Fig. 12.
3) Apply the first crossover operation to the individuals of the population with crossover probability p_c = 0.5. Assume that only for the pair g_1 and g_4 the random number is less than p_c, so that crossover is performed on this pair. The two positions of array s chosen at random for the crossover cover the mapping positions of array elements s_2 and s_3, so the genes of nodes p_2 and p_3 are swapped between the two individuals, producing the new individual g_5. Individual g_5 maps task v_4 to nodes p_2 and p_4. Here EST(v_4, p_2) is 38 and LST(v_4, p_2) is 52.6, and the copy v_4^2 is placed at start-time position 38; EST(v_4, p_4) is 35 and LST(v_4, p_4) is 50, and the copy v_4^4 is placed at start-time position 35. The reliability of task v_4 is now 0.999671. The newly generated individual g_5 is added to the population. The encoding scheme of individual g_5, generated for task v_4 by crossover, is shown in Fig. 13.
4) Apply the mutation operation to the individuals of the population with mutation probability p_m = 0.25. Assume that the random number drawn for the third individual g_3 is less than p_m, so that the third individual is mutated. The mapping position of node p_3 in the third individual is chosen at random for mutation. The reliability of the individual is higher than the reliability requirement of task v_4, and a valid start-time position exists before the start time of copy v_4^3, so the start-time position on node p_3 is moved earlier, placing the copy v_4^3 at start-time position 34. This generates individual g_6, whose reliability is 0.999880; the newly generated individual g_6 is added to the population. The encoding scheme of individual g_6, generated for task v_4 by mutation, is shown in Fig. 14.
5) Calculate the fitness of each individual in the population with the completion-time evaluation function F_Tim of formula (9) and the reliability evaluation function F_Rel of formula (10). Sorting in descending order of F_Tim and of F_Rel gives two sorted queues of individuals. Completion-time queue: g_6, g_3, g_4, g_5, g_1, g_2. Reliability queue: g_1, g_2, g_3, g_6, g_5, g_4.
6) Select individuals from the two queues with the round-robin (RR) mechanism as the individuals of the new population, until the population size is reached. The selected individuals are g_6, g_1, g_3, g_2. The first generation is complete.
7) The remaining generations proceed as in steps 3)-6). After three generations the final copy mapping scheme of task v_4 consists of two copies, one starting at 38 and the other starting at 34; the completion time of task v_4 is 50 and its reliability is 0.999880.
15. The final scheduling scheme is shown in Fig. 15. The scheduling Makespan of the task set is 50 and its reliability is 0.99980401.
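The reliability bookkeeping of the example can be checked with a short Python sketch of the requirement formula (2) as given in claim step (5): the requirement of the x-th task is the (n - x + 1)-th root of R divided by the product of the reliabilities already achieved. The function name `reliability_requirement` is an assumption; the numbers are those quoted in the example.

```python
import math


def reliability_requirement(task_set_requirement, achieved, n):
    """r_x = (R / prod(r'_0 .. r'_{x-1})) ** (1 / (n - x + 1)), with r'_0 = 1.

    achieved: reliabilities r'_1 .. r'_{x-1} of the tasks scheduled so far.
    n: total number of tasks.
    """
    x = len(achieved) + 1
    return (task_set_requirement / math.prod(achieved)) ** (1.0 / (n - x + 1))


R = 0.999
achieved = [0.999981, 0.999970, 0.999973]        # r'_1, r'_2, r'_3 from the example
r1 = reliability_requirement(R, [], 4)            # entry task: fourth root of R
r4 = reliability_requirement(R, achieved, 4)      # last task: plain quotient
task_set_reliability = math.prod(achieved + [0.999880])
```

The first requirement lies between the single-copy value 0.998202 and the two-copy value 0.999981 obtained for v_1, which is why a second copy was needed; r4 evaluates to about 0.99907593 and the product of the four achieved reliabilities to about 0.99980401, above R.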

Claims (1)

1. A multi-copy task fault-tolerant scheduling method for a heterogeneous distributed system, characterized in that:
(1) according to the load of each task and the execution speed of each node in the system, the execution time ET(v_j, p_k) of each task v_j of the application program on each node p_k of the system is calculated; the application program with dependency constraints is G = <V, E>, where V = {v_1, v_2, ..., v_N} is the task set, N = |V| is the number of tasks, and E is the set of directed, weighted communication edges between the tasks in V; the system model is the undirected graph GS = <P, L>, where P = {p_1, p_2, ..., p_M} is the set of M heterogeneous nodes, M = |P|, and L is the set of |L| communication links; the reliability requirement of the task set is R;
(2) the average execution time of each task over all processor nodes and the average communication time of each message over all links are calculated;
(3) the bottom-level priority bl(v_j) of every task v_j in the task set is calculated with the bottom-level priority method:

$$bl(v_j) = \overline{w(v_j)} + \max_{v_i \in succ(v_j)} \left\{ \overline{w(e_{j,i})} + bl(v_i) \right\}$$

where succ(v_j) is the set of direct successors of task v_j, \overline{w(v_j)} is the average execution time of task v_j over all nodes of the node set P, and \overline{w(e_{j,i})} is the average transmission time of message e_{j,i} over all links of the link set L of the system;
(4) the schedulable tasks are added to the scheduling queue in non-increasing order of priority;
(5) the highest-priority task is selected from the schedulable tasks in the scheduling queue, and the reliability requirement r_x of the highest-priority task v_j is calculated, x being the position of the task in the priority queue:

$$r_x = \sqrt[\,n-x+1\,]{\frac{R}{\prod_{i=0}^{x-1} r'_i}}$$

where 1 ≤ x ≤ n follows the priority order of the tasks; R is the reliability requirement of the task set; r'_i is the reliability actually achieved by the task at position i of the priority queue, with r'_0 = 1; if the highest-priority task is the entry task, its reliability requirement is r_1 = \sqrt[n]{R};
(6) if the reliability requirement is invalid, i.e. the reliability requirement of task v_j satisfies r_x ≥ 1, the task is refused and the method returns; otherwise the multi-copy task scheduling method is called to determine the nodes and start times of the copies of the task;
(7) the scheduled task is deleted from the scheduling queue, and the newly schedulable tasks are added to the scheduling queue according to their priority; the next highest-priority task in the scheduling queue is then selected for scheduling, and steps (5)-(7) are repeated until all tasks have been scheduled;
The multi-copy task scheduling method is:
(6.1) the relevant information is initialized: the number of copies of task v_j is set to 0, its set of mapping nodes is set to empty, and the idle node set is set to the node set P;
(6.2) if task v_j is an entry task, the node with the earliest completion time in the idle node queue is chosen to execute a copy, and the reliability P[E_{v_j}] of task v_j is calculated:

$$P[E_{v_j}] = 1 - \prod_{p_n \in proc(v_j)} \left( 1 - \exp\{ -\lambda_{p_n} \cdot w(v_j)/w(p_n) \} \right)$$

where proc(v_j) is the set of nodes to which task v_j is mapped, λ_{p_n} is the permanent failure rate of processor node p_n, w(v_j) is the load of task v_j, and w(p_n) is the amount of computation that node p_n can perform per unit time; if the task reliability requirement is not met, the node with the earliest completion time among the remaining idle nodes is chosen to execute a further copy and the reliability of the task is recalculated, until the task reliability requirement is met; if the idle node queue becomes empty and the task reliability still does not meet the requirement, the shortfall is made up through the reliability requirement formula when the copies of subsequent tasks are scheduled;
(6.3) if task v_j has predecessor tasks, the genetic-algorithm-based multi-copy task scheduling method is called to schedule its copies;
The genetic-algorithm-based multi-copy task scheduling method is:
(6.3.1) the crossover probability p_c, the mutation probability p_m, the population size GN and the number of generations EN are initialized;
(6.3.2) the initial population is generated:
the time ave(v_i^k, p_n) at which the message of the copy v_i^k of a predecessor task v_i of the currently scheduled task, mapped on node p_k, reaches node p_n is calculated:

$$ave(v_i^k, p_n) = \max\{FT(v_i, p_k),\ rdy(l_{k,n})\} + \frac{w(e_{i,j})}{w(l_{k,n})}$$

where FT(v_i, p_k) is the completion time of task v_i on node p_k; rdy(l_{k,n}) is the time at which link l_{k,n} is ready to communicate, i.e. the completion time of the last message sent over that link; w(e_{i,j}) is the size of the message e_{i,j} between task v_i and task v_j; w(l_{k,n}) is the amount of data that link l_{k,n} between nodes p_k and p_n can transmit per unit time; if the mapping nodes are the same, i.e. p_k = p_n, then rdy(l_{k,n}) is 0 and the communication overhead is 0;
the task encoding scheme encodes, for each node, every position between the minimum and the maximum valid start-time position as a gene of the individual; the minimum valid start-time position EST(v_j, p_n) of task v_j on processor p_n is calculated as:

$$EST(v_j, p_n) = \max\Big\{ \max_{v_i \in pred(v_j)} \Big\{ \min_{v_i^k \in rep(v_i)} \{ ave(v_i^k, p_n) \} \Big\},\ rdy(p_n) \Big\}$$

where pred(v_j) is the set of direct predecessors of task v_j; rep(v_i) is the set of copies of task v_i; rdy(p_n) is, under the current schedule, the completion time PFT(p_n) of the last task mapped to node p_n:

$$PFT(p_k) = \max_{v_i \in V,\ p_k \in proc(v_i)} \{ FT(v_i, p_k) \}$$

where proc(v_i) is the set of processors to which task v_i is mapped;
the maximum valid start-time position LST(v_j, p_n) of task v_j on processor p_n is:

$$LST(v_j, p_n) = \max\Big\{ \max_{v_i \in pred(v_j)} \Big\{ \max_{v_i^k \in rep(v_i)} \{ ave(v_i^k, p_n) \} \Big\},\ rdy(p_n) \Big\}$$
a processor node is chosen from the idle node queue and a valid start-time position is chosen on that node to map a copy of the currently scheduled task, and the reliability of the task is calculated; if the reliability of the task does not meet the requirement, further processor nodes are chosen from the idle node queue and a valid start-time position is chosen on each, until the reliability of the task meets the requirement; the copy mapping scheme is taken as an individual of the population, and individuals are generated repeatedly until the population size is reached; if the number of copies has reached M and the reliability of the task still falls short of the requirement, the copy mapping scheme is nevertheless taken as an individual of the population, because the reliability loss of this task can be compensated when subsequent tasks are scheduled;

$$P[E_{v_j}] = 1 - \prod_{p_n \in proc(v_j)} \left(1 - AR_{v_j}^{p_n}\right)$$

$$AR_{v_j}^{p_n} = \left( \prod_{v_i^n \in Prep_n \,\cap\, tv_i^n \le tv_j^n} e^{-\lambda_{p_n} \cdot w(v_i)/w(p_n)} \right) \times \prod_{v_l \in pred(v_j)} \left( 1 - \prod_{\substack{p_k \in proc(v_l) \\ ave(v_l^k, p_n) \le ST(v_j, p_n)}} \left( 1 - \prod_{et_{p,q} \in ON(l_{k,n}) \,\cap\, et_{p,q} \le et_{l,j}} e^{-\lambda_{l_{k,n}} \cdot w(e_{p,q})/w(l_{k,n})} \right) \right)$$

where AR_{v_j}^{p_n} is the reliability of the copy v_j^n of task v_j mapped on node p_n; the condition tv_i^n ≤ tv_j^n selects the task copies v_i^n that execute on node p_n no later than the currently scheduled copy v_j^n; Prep_n is the set of task copies executed on node p_n; ST(v_j, p_n) is the start time of task v_j on node p_n; et_{p,q} is the time at which the message between tasks v_p and v_q starts to be transmitted; ON(l_{k,n}) is the set of all communications that take place on link l_{k,n}; the condition et_{p,q} ≤ et_{l,j} (v_p, v_q ∈ V) selects the messages e_{p,q} on link l_{k,n} whose transmission starts no later than that of message e_{l,j}; λ_{l_{k,n}} is the failure rate of link l_{k,n} between nodes p_k and p_n; if a copy and the predecessor copy it depends on are mapped to the same node, the link communication time is 0 and the reliability of the message is 1;
the gene corresponding to the valid start-time position at which a copy is mapped has the value 1 and the genes of positions with no mapped copy have the value 0; within the mapping of a task, at most one position among the genes of each node has the value 1, and all other positions are 0;
the encoding also records the number of valid mapping positions of each node in the individual, held in array s, where |s_i| is the number of valid mapping positions of the node p_i represented by element s_i and |s_0| = 0; if task v_j is assigned to the k-th valid start-time position of node p_n, the l-th gene of individual g_j is set to g_{j,l} = 1, with l = Σ_{q=0}^{n-1} |s_q| + k; the length of an encoded individual is Σ_{i=1}^{M} |s_i|, and the genes of individual g_j that belong to array element s_i are the genes g_{j,p} with Σ_{q=0}^{i-1} |s_q| < p < 1 + Σ_{l=0}^{i} |s_l|;
(6.3.3) the crossover operation is applied to the individuals of the population according to the crossover probability p_c:
if the random number is less than the crossover probability p_c, then for the two selected individuals, nodes of array s whose genes differ between the two individuals are chosen, the genes of all chosen nodes are swapped between the two individuals, and the newly generated individuals are added to the population;
(6.3.4) the mutation operation is applied to the individuals of the population according to the mutation probability p_m:
the newly generated individuals are added to the population;
(6.3.5) the fitness of each individual g_i in the population is calculated with the completion-time evaluation function F_Tim and the reliability evaluation function F_Rel, and all individuals are sorted in descending order of F_Tim and of F_Rel to obtain two sorted queues of individuals:

$$F_{Tim}(g_i) = 1 - \max_{1 \le k \le M} \Big\{ FT(v_j, p_k) \ \Big|\ \sum_{\sum_{q=0}^{k-1}|s_q| \,<\, p \,<\, 1+\sum_{l=0}^{k}|s_l|} g_{i,p} = 1 \Big\}$$

$$F_{Rel}(g_i) = P[E_{v_j}] = 1 - \prod_{\substack{1 \le k \le M \\ \sum_{\sum_{q=0}^{k-1}|s_q| \,<\, p \,<\, 1+\sum_{l=0}^{k}|s_l|} g_{i,p} = 1}} \left(1 - AR_{v_j}^{p_k}\right);$$

(6.3.6) individuals are selected from the two queues with a round-robin (RR) mechanism as the individuals of the new population, until the population size is reached;
(6.3.7) if the stop condition is not met, steps (6.3.3)-(6.3.6) are repeated; the search stops when neither the reliability nor the Makespan improves within the prescribed number of generations.
CN201410216137.0A 2014-05-21 2014-05-21 Multi-copy task fault tolerance scheduling method of heterogeneous distributed system Active CN103970613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410216137.0A CN103970613B (en) 2014-05-21 2014-05-21 Multi-copy task fault tolerance scheduling method of heterogeneous distributed system

Publications (2)

Publication Number Publication Date
CN103970613A CN103970613A (en) 2014-08-06
CN103970613B true CN103970613B (en) 2017-05-24

Family

ID=51240145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410216137.0A Active CN103970613B (en) 2014-05-21 2014-05-21 Multi-copy task fault tolerance scheduling method of heterogeneous distributed system

Country Status (1)

Country Link
CN (1) CN103970613B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628708A (en) * 2017-03-20 2018-10-09 中兴通讯股份有限公司 Cloud computing fault-tolerance approach and device
CN108108233B (en) * 2017-11-29 2021-10-01 上海交通大学 Cluster job scheduling method and system for task multi-copy execution
CN109254841B (en) * 2018-09-30 2021-11-26 湘潭大学 Dual-objective optimization task scheduling method for distributed system
CN109976890B (en) * 2019-03-28 2023-05-30 东南大学 Variable frequency method for minimizing heterogeneous private cloud computing resource energy consumption
CN111090783B (en) * 2019-12-18 2023-10-03 北京百度网讯科技有限公司 Recommendation method, device and system, graph embedded wandering method and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799474A (en) * 2012-06-21 2012-11-28 浙江工商大学 Cloud resource fault-tolerant scheduling method based on reliability drive

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Resource Minimizing Scheduling Algorithm with Ensuring the Deadline and Reliability in Heterogeneous Systems; Laiping Zhao et al.; "2011 IEEE International Conference on Advanced Information Networking and Application"; 2011-12-31; entire document *
Genetic Algorithm based Scheduling Method for Efficiency and Reliability in Mobile Grid; SungHo Chin et al.; "Proceedings of the 4th International Conference on Ubiquitous Information Technologies & Applications, 2009"; 2009-12-31; entire document *
Optimizing Makespan and Reliability for Workflow Applications with Reputation and Look-ahead Genetic Algorithm; Wang X et al.; "Future Generation Computer Systems"; 2011-03-15; Vol. 27, No. 8; entire document *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant