CN114925935A - Multi-workflow scheduling method for time delay constraint in cloud edge environment - Google Patents
- Publication number
- CN114925935A CN114925935A CN202210702160.5A CN202210702160A CN114925935A CN 114925935 A CN114925935 A CN 114925935A CN 202210702160 A CN202210702160 A CN 202210702160A CN 114925935 A CN114925935 A CN 114925935A
- Authority
- CN
- China
- Prior art keywords
- server
- task
- workflow
- time
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0633—Workflow analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a deadline-constrained multi-workflow scheduling method for cloud-edge environments that minimizes the execution cost of multiple workflows with a differential evolution algorithm while satisfying the deadline of every workflow. To improve the rationality and diversity of the population during evolution, individuals are encoded as two-dimensional discrete particles, and the differential evolution algorithm is optimized with a selection operator that acts on the whole population, so that the fitness of the population improves faster and the algorithm searches the solution space more quickly while avoiding premature convergence.
Description
Technical Field
The invention belongs to the technical field of cloud computing and edge computing, and particularly relates to a time delay constrained multi-workflow scheduling method in a cloud edge environment.
Background
With the rapid development of technologies such as 5G and artificial intelligence, the market has grown steadily and the volume of heterogeneous data to be processed has exploded, while users increasingly demand "instant" interactive services. Traditional cloud computing cannot meet these growing demands: because all data must be uploaded to a cloud data center for processing, the geographic distribution of data centers causes high transmission delay, and the lack of data preprocessing puts enormous pressure on network bandwidth. To address these pain points of traditional cloud services, edge computing, characterized by low transmission delay and low network bandwidth pressure, has emerged rapidly. Edge computing is a new computing paradigm that processes data using information service resources at the edge of the core network, sinking computing resources and services toward the edge and therefore closer to the user. In computing capability, edge resources and services are weaker than the cloud but clearly stronger than terminals, which effectively mitigates the limited computing power of terminal devices. In transmission delay, because edge service nodes are close to users, data that does not require heavy computing resources can be processed quickly at an edge node, which greatly reduces the volume of data sent to the cloud, relieves congestion in the data transmission network, and significantly cuts the energy consumption of network-edge terminals.
To combine the respective advantages of cloud and edge platforms, the computing community has proposed a new mode built on both, namely cloud-edge collaborative computing: processing of data with low computing-resource requirements is pushed to the edge of the Internet, near the data source, while data with high computing-resource requirements is handed to the central cloud. This raises computing density, reduces delay as much as possible, and effectively improves the availability and scalability of application systems. As an important research problem in the cloud-edge environment, task scheduling optimization directly affects the service efficiency of cloud and edge platform resources and the user's service experience, which poses new requirements and new challenges for scheduling research.
In a cloud-edge environment, the structural complexity of workflows and the data dependencies among subtasks make it challenging to complete a workflow within a reasonable deadline even on high-performance infrastructure. Workflow execution also requires a large amount of cross-server data transmission, which conflicts sharply with the limited network bandwidth between servers and leads to serious transmission delay and high execution cost. Therefore, reasonably scheduling workflows under an appropriate deadline constraint in the cloud-edge environment can reduce workflow completion time, improve resource utilization, and effectively cut the cost users pay to execute their workflows.
Disclosure of Invention
To fill the gaps and remedy the shortcomings of the prior art, the invention provides a deadline-constrained multi-workflow scheduling method for cloud-edge environments. Subject to the deadline of every workflow, the execution cost of the multi-workflow application is minimized with a differential evolution algorithm. To improve the rationality and diversity of population evolution, individuals are encoded as two-dimensional discrete particles, and the basic differential evolution algorithm is optimized with a selection operator over the whole population, so that the fitness of the population improves faster and the solution space is searched more quickly while premature convergence is avoided. Multiple groups of comparative simulation experiments show that the differential-evolution-based multi-workflow scheduling algorithm outperforms other scheduling algorithms across deadlines and multi-workflow scales, and effectively reduces the execution cost of multiple workflows in a cloud-edge environment.
The invention specifically adopts the following technical scheme:
a deadline-constrained multi-workflow scheduling method in a cloud-edge environment, characterized in that: subject to the deadline of every workflow, the execution cost of the multiple workflows is minimized with a differential evolution algorithm; to improve the rationality and diversity of population evolution, individuals are encoded as two-dimensional discrete particles, and the differential evolution algorithm is optimized with a selection operator over the whole population, so that the fitness of the population improves faster and the solution space is searched more quickly while premature convergence is avoided.
Further, the multi-workflow deadline-constrained optimization problem is expressed as:

min c_e, s.t. t_i^f ≤ d_i for every w_i ∈ W

wherein c_e is the execution cost of the multi-workflow application and t_i^f is the element of the multi-workflow completion-time set T_f belonging to workflow w_i.
Further, the construction process of the multi-workflow deadline constraint representation is as follows:
assuming that the time intervals at which users submit different workflows to the cloud-edge environment, i.e. the arrival times of different workflows, approximately obey a Poisson distribution P(λ), where λ is the arrival rate of workflows, these workflows are represented by an infinite set:

W = {w_1, w_2, …}   formula (1)

wherein each workflow is represented by a triplet:

w_i = (α_i, d_i, G_i)   formula (2)

whose elements are, in order, the arrival time α_i, the deadline d_i, and the structure G_i;

the structure of a workflow is represented by a directed acyclic graph:

G_i = {T_i, E_i}   formula (3)

wherein

T_i = {t_{i,1}, t_{i,2}, …, t_{i,N}}   formula (4)

is the task set and N is the number of tasks; t_{i,j} is the j-th task in the i-th workflow; E_i is the set of edges between tasks; a directed edge (t_{i,p}, t_{i,j}) ∈ E_i denotes data transmission between t_{i,p} and t_{i,j}, i.e. t_{i,p} is a predecessor task of t_{i,j} and t_{i,j} is a successor task of t_{i,p};

pre(t_{i,j}) = {t_{i,p} | (t_{i,p}, t_{i,j}) ∈ E_i}   formula (5)

is the predecessor task set of t_{i,j};

suc(t_{i,j}) = {t_{i,q} | (t_{i,j}, t_{i,q}) ∈ E_i}   formula (6)

is the successor task set of t_{i,j};
because of the data dependencies within a workflow, a task can be dispatched to a server for execution only after all of its predecessor nodes have finished executing and all data generated by those predecessors has been transmitted;
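The workflow model above (arrival time, deadline, DAG structure, predecessor/successor sets, and the rule that a task becomes schedulable only once all its predecessors have finished) can be sketched as follows; the class and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    """Sketch of w_i = (arrival, deadline, DAG) with edge set E_i."""
    arrival: float                            # alpha_i, arrival time
    deadline: float                           # d_i, deadline
    tasks: list                               # task ids t_{i,1}..t_{i,N}
    edges: set = field(default_factory=set)   # directed edges (t_ip, t_ij)

    def pre(self, t):
        """Predecessor task set of t (formula (5))."""
        return {p for (p, q) in self.edges if q == t}

    def suc(self, t):
        """Successor task set of t (formula (6))."""
        return {q for (p, q) in self.edges if p == t}

    def ready(self, done):
        """Tasks whose predecessors have all finished, i.e. dispatchable now."""
        return [t for t in self.tasks if t not in done and self.pre(t) <= done]

# a tiny fork-shaped workflow: t1 feeds both t2 and t3
w = Workflow(arrival=0.0, deadline=50.0, tasks=["t1", "t2", "t3"],
             edges={("t1", "t2"), ("t1", "t3")})
```

With this structure, `ready(set())` yields only the entry task, and after `t1` completes both successors become schedulable, mirroring the dependency rule stated above.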
in the scheduling process of the multi-workflow application, the cloud side environment provides computing resources and data transmission services for users;
the cloud-edge environment:

S = {S_cloud, S_edge}   formula (7)

consists of a cloud part and an edge part, wherein the cloud contains m cloud servers:

S_cloud = {s_1, s_2, …, s_m}   formula (8)

and the edge contains n edge servers:

S_edge = {s_{m+1}, s_{m+2}, …, s_{m+n}}   formula (9)

In the resource model, any type of server can be leased or released at any time, provided the number of servers is sufficient; a server s_k is expressed as:

s_k = (p_k, u_k, c_k^u, f_k)   formula (10)

wherein p_k is the computing performance of server s_k; u_k is the billing time unit at which server s_k prices its service; c_k^u is the unit computation cost of server s_k per billing unit u_k, approximately proportional to its computing performance; f_k ∈ {0, 1} is the platform type of server s_k: when f_k = 0, s_k belongs to the cloud platform and has strong computing performance; when f_k = 1, s_k belongs to the edge platform and has ordinary computing performance. According to the platform types of the servers, the bandwidth β_{r,t} between servers s_r and s_t in the cloud-edge environment is expressed as:

β_{r,t} = (b_{r,t}, c_{r,t}^d)   formula (11)

wherein b_{r,t} is the bandwidth value of β_{r,t} and c_{r,t}^d is the data transmission cost incurred by transmitting 1 GB of data from server s_r to server s_t;
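The resource model can be sketched as below. The field names and the concrete bandwidth values are illustrative assumptions; the patent only requires that bandwidth depend on the platform types of the two endpoint servers:

```python
from dataclasses import dataclass

@dataclass
class Server:
    """Sketch of s_k = (p_k, u_k, c_k^u, f_k) from formula (10)."""
    perf: float       # p_k, computing performance
    unit_time: float  # u_k, billing time unit
    unit_cost: float  # c_k^u, cost per billing unit (roughly proportional to perf)
    is_edge: bool     # f_k: False = cloud platform, True = edge platform

def bandwidth(r: Server, t: Server,
              cloud_cloud=1.0, cross=0.1, edge_edge=0.5):
    """Illustrative b_{r,t} from formula (11): intra-cloud links fastest,
    cloud-edge links slowest (numbers are assumed, in GB/s)."""
    if not r.is_edge and not t.is_edge:
        return cloud_cloud
    if r.is_edge and t.is_edge:
        return edge_edge
    return cross

cloud = Server(perf=4.0, unit_time=1.0, unit_cost=0.4, is_edge=False)
edge = Server(perf=1.0, unit_time=1.0, unit_cost=0.1, is_edge=True)
```

Note how the cloud server is both faster and more expensive per unit time than the edge server, matching the proportionality assumption in the text.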
in a cloud-edge environment, a multi-workflow scheduling scheme assigns the task nodes of the multiple workflows to specific servers, embodying the correspondence between each task of the multi-workflow application and a server;
the multi-workflow scheduling scheme is represented as:

Γ = (W, S, M, c_e, T_f)   formula (12)

wherein M, formula (13), is the mapping of the multi-workflow application W onto the cloud-edge environment S; c_e is the execution cost of the multi-workflow application W in the cloud-edge environment S; and

T_f = {t_1^f, t_2^f, …, t_{|W|}^f}   formula (14)

is the set of completion times of the multi-workflow application;

for the two kinds of elements in the mapping M, (v_{i,j}, s_k) denotes that task v_{i,j} executes on server s_k, and (e_{p,j}^i, s_r → s_t) denotes that data edge e_{p,j}^i is transmitted from server s_r to server s_t; once the task submapping of M:

M_task = {(v_{i,j}, s_k)}   formula (15)

is determined, the data submapping:

M_data = {(e_{p,j}^i, s_r → s_t)}   formula (16)

is determined accordingly; thus the mapping M is equivalent to M_task, formula (17);
In the cloud-edge environment, the deadline is selected as the constraint condition for studying the cost-minimization problem under delay constraints; the scheduler is cost-driven, and its goal is to minimize the execution cost through reasonable scheduling. The problem the cost scheduler solves under the delay constraint is to minimize the execution cost of the multiple workflows on the premise that the deadline of every workflow is met. Each server is assumed to have sufficient storage space for the data generated or transmitted during execution. The task computation time t_tc measures the computing power of a server and the data transmission time t_dt measures the data transmission capacity between servers; they are calculated as:

t_tc(v_{i,j}, s_k) = size(v_{i,j}) / p_k   formula (18)

t_dt(e_{p,j}^i, s_r, s_t) = data(e_{p,j}^i) / b_{r,t}   formula (19)

wherein formula (18) is the computation time of task v_{i,j} on server s_k (size(v_{i,j}) denoting the task workload) and formula (19) is the transmission time of data edge e_{p,j}^i from server s_r to server s_t (data(e) denoting the transmitted data volume); when a data edge connects a server to itself, the data transmission time is 0;
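Formulas (18) and (19) translate directly into two helper functions; the workload and data-volume arguments are illustrative parameter names:

```python
def t_tc(task_size: float, perf: float) -> float:
    """Task computation time, formula (18): workload divided by the
    server's computing performance p_k."""
    return task_size / perf

def t_dt(data_gb: float, r: int, t: int, b) -> float:
    """Data transmission time, formula (19): data volume over bandwidth
    b_{r,t}; zero when source and destination are the same server."""
    if r == t:
        return 0.0
    return data_gb / b[r][t]

# symmetric 2-server bandwidth table (GB/s), values assumed
b = {0: {1: 0.5}, 1: {0: 0.5}}
```

For example, an 8-unit task on a 4-unit/s server takes 2 s, and shipping 1 GB over a 0.5 GB/s link takes 2 s, while a transfer within one server is free.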
In the latency-constrained cost scheduler, for a scheduling scheme Γ, once its mapping M is determined, the boot time t_boot(s_k) of every server s_k is determined as well; to compute the execution cost c_e and completion times T_f of the multi-workflow application from the mapping M of W onto the cloud-edge environment S, the relevant variables are defined as follows:
t_start(v_{i,j}, s_k): the start time of task v_{i,j} on server s_k, determined by the current idle time of server s_k and the completion times of all predecessor tasks of v_{i,j}, as in formula (20):

t_start(v_{i,j}, s_k) = max( idle(s_k), max_{v_{i,p} ∈ pre(v_{i,j})} [ t_end(v_{i,p}, s_r) + t_dt(e_{p,j}^i, s_r, s_k) ] )   formula (20)

(idle(s_k) denoting the time at which server s_k becomes idle);

t_end(v_{i,j}, s_k): the completion time of task v_{i,j} on server s_k, equal to its start time plus its computation time on s_k, as in formula (21):

t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k), (v_{i,j}, s_k) ∈ M   formula (21)

t_shut(s_k): the shutdown time of server s_k, equal to the completion time of the last task executed on the server, as in formula (22):

t_shut(s_k) = max_{(v_{i,j}, s_k) ∈ M} t_end(v_{i,j}, s_k)   formula (22)

c_com(s_k): the task computation cost of server s_k in the cloud-edge environment, determined by the running time of the server, as in formula (23):

c_com(s_k) = c_k^u · ⌈( t_shut(s_k) - t_boot(s_k) ) / u_k⌉   formula (23)

c_tran(w_i): the data transmission cost of workflow application w_i under a given scheduling scheme Γ, as in formula (24):

c_tran(w_i) = Σ_{(e_{p,j}^i, s_r → s_t) ∈ M, r ≠ t} c_{r,t}^d · data(e_{p,j}^i)   formula (24)

d_i: the deadline constraint of workflow application w_i under a given scheduling scheme Γ, as in formula (25):

d_i = α_i + baseline · |W| · HEFT(w_i)   formula (25)

wherein HEFT(w_i) is the execution time required to schedule workflow w_i with the HEFT algorithm; the parameter baseline is defined by the equation group numbered (26);

based on the above definitions, the execution cost c_e and completion-time set T_f of the multi-workflow application are obtained as in formulas (27) and (28):

c_e = Σ_{s_k ∈ S} c_com(s_k) + Σ_{w_i ∈ W} c_tran(w_i)   formula (27)

T_f = {t_i^f}, where t_i^f is the completion time of the last task of workflow w_i   formula (28)
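The cost and deadline definitions above can be sketched as small functions. Billing per started billing unit (the ceiling in `server_cost`) is an assumption; the patent only states that computation cost is determined by server running time:

```python
import math

def server_cost(t_boot, t_shut, unit_time, unit_cost):
    """c_com(s_k), formula (23): the server is billed per started billing
    unit u_k for its running time (ceiling billing is an assumption)."""
    return unit_cost * math.ceil((t_shut - t_boot) / unit_time)

def total_cost(com_costs, tran_costs):
    """c_e, formula (27): computation cost of all leased servers plus the
    data transmission cost of all workflows."""
    return sum(com_costs) + sum(tran_costs)

def deadline(arrival, baseline, n_workflows, heft_time):
    """d_i, formula (25): d_i = alpha_i + baseline * |W| * HEFT(w_i)."""
    return arrival + baseline * n_workflows * heft_time
```

A server that runs for 2.5 time units under a 1.0-unit billing period is therefore billed for 3 units, and a workflow's deadline grows with both the HEFT makespan and the number of workflows in the batch.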
Further, encoding individuals with two-dimensional discrete particles is specifically as follows:

a particle consists of task priorities and server numbers; one individual in the population corresponds to one potential scheduling scheme of the multiple workflows in the cloud-edge environment; in the G-th generation of evolution, the k-th individual in the population is represented as in formula (30):

X_k^G = (μ_k^G, s_k^G), k = 1, 2, …, NP   formula (30)

wherein NP is the population size, and μ_{i,j}^{G,k} and s_{i,j}^{G,k} are, respectively, the priority code and the server code of the j-th task v_{i,j} in the i-th workflow application; during initialization, the 0-th generation individuals are generated as in formula (33):

μ_{i,j}^{0,k} = rand(0, 1), s_{i,j}^{0,k} = randint(1, m + n)   formula (33)

wherein

i = 1, 2, …, |W|, j = 1, 2, …, |V_i|, k = 1, 2, …, NP   formula (34)

rand() randomly selects a decimal in the given interval, and randint() randomly selects an integer in the given interval;

in the pair (μ_{i,j}, s_{i,j}), μ_{i,j} is a real number representing the priority code of the task in the multi-workflow application, and s_{i,j} is an integer representing its server code; for an element μ_{i,j} of the priority code, the value indicates the scheduling priority of the corresponding task in the scheduling scheme, and if two tasks have the same value, the task received earlier by the platform has the higher priority; for an element s_{i,j} of the server code, the value is the number of the server that executes the task.
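The initialization in formulas (30), (33), and (34) can be sketched as follows; for simplicity the sketch numbers servers from 0 rather than 1, and the tie-breaking rule in `schedule_order` follows the text's "earlier-received task wins" convention:

```python
import random

def init_particle(workflows, n_servers, seed=None):
    """One individual X = (mu, s): mu holds a real priority code in (0, 1)
    for every task, s holds a random server number (formula (33)).
    `workflows` is a list of task counts, one per workflow."""
    rng = random.Random(seed)
    mu, srv = {}, {}
    for i, n_tasks in enumerate(workflows):
        for j in range(n_tasks):
            mu[(i, j)] = rng.random()               # priority code, rand(0, 1)
            srv[(i, j)] = rng.randrange(n_servers)  # server code (0-based here)
    return mu, srv

def schedule_order(mu):
    """Decode priorities into a dispatch order: larger mu runs first; ties
    are broken in favor of the task with the smaller (i, j) index, standing
    in for 'received earlier by the platform'."""
    return sorted(mu, key=lambda t: (-mu[t], t))

# two workflows with 2 and 1 tasks, three servers
mu, srv = init_particle([2, 1], n_servers=3, seed=42)
```

Each key `(i, j)` identifies task v_{i,j}, so one particle fixes both the dispatch order and the task-to-server assignment, i.e. one candidate scheduling scheme.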
Further, optimizing the differential evolution algorithm with a selection operator over the whole population specifically includes:

first, the N offspring generated by the N parent individuals are all preserved; that is, when a parent produces one offspring, the algorithm does not immediately perform one-to-one elimination selection, but marks every new individual produced by mutation and crossover, i ∈ {1, 2, …, N}, and retains it temporarily, so that there are N offspring alongside the original N parents; thus, after one round of evolution, 2N individuals are temporarily retained; next, the fitness function values of the 2N individuals in the current pool are calculated, and the 2N individuals are sorted by fitness value from large to small; finally, the first N individuals in the sorted queue are selected as the evolution result of the current generation and used as the parents for the next generation of evolution.
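The population-wide selection step described above (pool parents and offspring, rank all 2N, keep the best N) can be sketched as:

```python
def select_next_generation(parents, offspring, fitness):
    """(mu + lambda)-style selection: pool the N parents with the N
    offspring, sort the 2N individuals by fitness value from large to
    small, and keep the first N as the next generation."""
    pool = parents + offspring
    pool.sort(key=fitness, reverse=True)  # larger fitness value is better
    return pool[:len(parents)]
```

Unlike one-to-one elimination, a strong offspring can displace any weak parent, not just its own, which is how the operator speeds up the rise of the population's overall fitness.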
Further, the fitness function compares two candidate solutions and is defined as follows:

(1) if both individuals are feasible solutions, the individual with the lower c_e is selected; the fitness function for this case is defined as in formula (35);

if at least one of the two individuals is an infeasible solution, the fitness value is determined by the number of workflows in each solution that satisfy the constraint condition, defined as follows:

(2.1) if the two individuals satisfy the constraint condition for the same number of workflows, the individual with the lower c_e is selected;

(2.2) if the two individuals satisfy the constraint condition for different numbers of workflows, the individual satisfying the constraint for more workflows is selected;

wherein an event function expresses the result of the constraint t_i^f ≤ d_i: it takes the value 1 when the constraint is satisfied and 0 otherwise.
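The comparison rules above can be sketched as a single pairwise function. The exact fitness formulas (35) and following are not reproduced here; each individual is summarized as a pair `(deadlines_met, cost)`, which is an illustrative simplification:

```python
def better(a, b):
    """Pairwise comparison following rules (1), (2.1), (2.2): the
    individual satisfying more workflow deadlines wins; among equally
    feasible individuals, the one with the lower execution cost c_e wins.
    `a` and `b` are (deadlines_met, cost) pairs."""
    met_a, cost_a = a
    met_b, cost_b = b
    if met_a != met_b:                    # rule (2.2): feasibility dominates
        return a if met_a > met_b else b
    return a if cost_a <= cost_b else b   # rules (1)/(2.1): lower c_e wins
```

This is the classic constraint-handling ordering: feasibility count first, objective value second, so infeasible individuals are pushed toward feasibility before cost is optimized.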
Further, the specific implementation process of the differential evolution algorithm is as follows:
step S1: determine the control parameters of the differential evolution algorithm and the fitness function; the control parameters of the differential evolution algorithm comprise the population size NP, the scaling factor F, and the crossover probability CR;
step S2: randomly generating an initial population;
step S3: evaluating an initial population and calculating the fitness value of each individual in the initial population;
step S4: judge whether the termination condition is reached or the generation counter has reached its maximum; if so, terminate the evolution and output the best individual obtained as the optimal solution; if not, continue;
step S5: carrying out mutation and cross operation to obtain an intermediate population;
step S6: selecting individuals from the original population and the intermediate population to obtain a new generation of population;
step S7: set the generation counter g = g + 1 and go to step S4;
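Steps S1 to S7 trace the classic differential evolution loop. The sketch below assumes DE/rand/1 mutation and binomial crossover on a real-valued vector (only the priority half of the patent's encoding) and, for brevity, uses the textbook one-to-one selection in step S6 rather than the patent's population-wide operator:

```python
import random

def differential_evolution(fitness, dim, np_=20, f=0.5, cr=0.9,
                           max_gen=100, seed=0):
    """Sketch of steps S1-S7; fitness is maximized. NP, F and CR are the
    control parameters named in step S1."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(np_)]  # S2
    for g in range(max_gen):                                        # S4/S7
        for k in range(np_):
            # pick three distinct individuals other than pop[k]
            a, b, c = rng.sample([p for i, p in enumerate(pop) if i != k], 3)
            trial = list(pop[k])
            jr = rng.randrange(dim)          # force at least one mutated gene
            for j in range(dim):             # S5: DE/rand/1 + binomial cross
                if rng.random() < cr or j == jr:
                    trial[j] = a[j] + f * (b[j] - c[j])
            if fitness(trial) >= fitness(pop[k]):                   # S6
                pop[k] = trial
    return max(pop, key=fitness)

# demo: maximize -sum(x^2), whose optimum is the zero vector
best = differential_evolution(lambda x: -sum(v * v for v in x), dim=3)
```

In the invention, step S6 would be replaced by the population-wide selection over all 2N parents and offspring described earlier.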
the mapping of the individuals of the population to the multi-workflow scheduling scheme is realized by the following algorithm:
the input of Algorithm 1 comprises the multi-workflow application W, the cloud-edge environment S, and an encoded particle X, and the output is the scheduling scheme Γ = (W, S, M, c_e, T_f) corresponding to the encoded particle X. First, the mapping M is initialized to the empty set, the to-be-executed queues Q = (Q_1, Q_2, …, Q_{|S|}) are initialized to empty queues, and the data transfer cost c_tran is initialized to 0; the scheduling of the multi-workflow application W then starts, in two steps:
(1) calling an algorithm 2 to monitor the arrival of the multi-workflow application W in real time and perform task allocation of the multi-workflow application;
(2) calling an algorithm 3 to execute the multi-workflow application on queues to be executed of all servers;
after scheduling finishes, all opened servers are shut down, and the execution cost c_e and completion times T_f are calculated according to formula (27) and formula (28); after the calculation, if the completion time of any workflow application exceeds its deadline, the scheduling scheme does not satisfy the deadline constraint, and the encoded particle X is marked as an infeasible solution; finally, the scheduling scheme Γ = (W, S, M, c_e, T_f) of the workflows is returned;
In the execution process of the algorithm 1, the arrival of the multi-workflow application W needs to be monitored in real time, and the task allocation of the multi-workflow application is carried out, wherein the process is shown as an algorithm 2, and input parameters comprise the multi-workflow application W, a cloud edge environment S and a coded particle X; during the operation of the algorithm, if the workflow applies w i If so, calculating the task calculation time t according to the formula (18) and the formula (19) respectively tc [|V i |×|S|]And data transmission time t dt [|E i |,|S|×|S|]And recording its arrival time alpha i (ii) a Traversing workflow applications w i All tasks in, if task v i,j For entering a task, i.e. the task does not have a predecessor task, then the value s is determined according to the server code i,j V. task i,j Put into the server s i,j The queue to be executed; otherwise, the task v i,j Put into the server s i,j The task waiting pool of (1); otherwise, waiting for the arrival of a certain workflow application; until all workflow applications have arrived, the algorithm ends;
during the execution of Algorithm 1, the multi-workflow application must also be dispatched from the to-be-executed queues of the servers; this process is Algorithm 3, whose input comprises a server s_k, the to-be-executed queue Q_k of server s_k, the mapping M, and the data transmission cost c_tran. While the algorithm runs, if server s_k is in the off state, server s_k is booted and its boot time t_boot(s_k) is set to the current time; if the to-be-executed queue Q_k of server s_k is not empty, then, according to the priority codes μ, the task v_{i,j} with the highest priority in Q_k is dispatched to server s_k, the corresponding mapping relation (v_{i,j}, s_k) is added to the mapping M, and Algorithm 4 is called to perform the task computation and data transmission processes; otherwise, the algorithm waits for Q_k to become non-empty; the algorithm ends once all workflow applications have been executed;
during the execution of Algorithm 3, the task computation and data transmission of the workflow application are simulated by Algorithm 4, whose input comprises a task v_{i,j} and a server s_k and whose output is the transmission cost of the currently generated data. First, this transmission cost is initialized to 0; second, the start time t_start(v_{i,j}, s_k) of task v_{i,j} is recorded, and its completion time t_end(v_{i,j}, s_k) is calculated according to t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k); finally, the successor tasks of v_{i,j} are traversed, and, according to the server code s_{i,s} of each successor task v_{i,s}, the data are transmitted to the server s_{i,s} that executes v_{i,s}, and the correspondingly generated data transmission cost is calculated; at this point, if a successor task v_{i,s} has completed the reception of all its predecessor-task data, task v_{i,s} is moved from the task waiting pool of server s_{i,s} into the to-be-executed queue.
Further, the maximum number of evolution generations is set to k = 1000 and used as the termination condition of the differential evolution algorithm; that is, the algorithm ends when the 1000th generation of evolution completes.
For the multi-workflow scheduling problem, the invention and its preferred schemes provide a differential-evolution-based multi-workflow scheduling method under deadline constraints, minimizing the execution cost of the multiple workflows with a differential evolution algorithm while meeting the deadline of every workflow. To improve the rationality and diversity of population evolution, individuals are encoded as two-dimensional discrete particles, and the basic differential evolution algorithm is optimized with a selection operator over the whole population, so that the fitness of the population improves faster and the solution space is searched more quickly while premature convergence is avoided. Multiple groups of comparative simulation experiments show that the differential-evolution-based multi-workflow scheduling algorithm outperforms other scheduling algorithms across deadlines and multi-workflow scales, and effectively reduces the execution cost of multiple workflows in a cloud-edge environment.
Drawings
Fig. 1 is a diagram of an example of coding applied to multi-workflow scheduling according to an embodiment of the present invention.
Fig. 2 is a flowchart of a basic differential evolution algorithm according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a scheduling result of a small-sized multi-workflow under different deadlines and different optimization algorithms according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a scheduling result of a multi-workflow in different deadlines and different optimization algorithms according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a scheduling result of a large-scale multi-workflow under different deadlines and different optimization algorithms according to an embodiment of the present invention.
Detailed Description
In order to make the features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail as follows:
it should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
1 model construction
1.1 Multi-workflow model
In the cloud-edge environment, in order to simulate an actual interaction scenario, it is assumed that the time intervals between the workflows a user submits to the cloud-edge environment (that is, the arrival times of the different workflows) approximately follow a Poisson distribution P(λ) (where λ denotes the workflow arrival rate), and the workflows can be represented by an infinite set:
W = {w_1, w_2, …}    Formula (1)
Each workflow can be represented by a triplet:
w_i = (α_i, d_i, G_i)    Formula (2)
where the elements denote, in order, the arrival time, the deadline, and the structure.
The structure of a workflow is typically represented by a DAG (Directed Acyclic Graph):
G_i = {T_i, E_i}    Formula (3)
where T_i = {t_{i,1}, t_{i,2}, …, t_{i,N}} is the set of tasks, N represents the number of tasks, and t_{i,j} represents the j-th task in the i-th workflow. E_i is the set of edges between tasks. A directed edge e(t_{i,p}, t_{i,j}) represents data transmission between t_{i,p} and t_{i,j}: t_{i,p} is a predecessor task of t_{i,j}, and t_{i,j} is a successor task of t_{i,p}.
The predecessor task set of t_{i,j} is the set of all tasks with a directed edge into t_{i,j}.
The successor task set of t_{i,j} is the set of all tasks with a directed edge out of t_{i,j}.
Owing to the data-flow nature of a workflow, a task can be allocated to a server for execution only after all of its predecessor tasks have finished and all data generated by those predecessors have been transmitted.
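As an illustration of this readiness rule, the following minimal Python sketch (task identifiers and data structures are hypothetical, not part of the invention) checks whether a task may enter a server's execution queue:

```python
# Toy DAG: t2 depends on t1 (hypothetical task ids for illustration only).
PRED = {"t2": {"t1"}}

def is_ready(task, finished, received):
    """A task is ready only when every predecessor has finished AND its
    output data has been received (the rule stated in Section 1.1)."""
    preds = PRED.get(task, set())
    return preds <= finished and preds <= received.get(task, set())
```

An entry task (no predecessors) is ready immediately; a task with a pending predecessor, or whose predecessor data has not yet arrived, must wait in the task waiting pool.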
1.2 cloud edge Environment
In the scheduling process of the multi-workflow application, the cloud-edge environment provides computing resources and data transmission services to users. The service model of this embodiment is Infrastructure as a Service (IaaS); it is assumed that the environment provides users with an elastic cloud computing service similar to Amazon EC2 and an elastic block storage service similar to Amazon EBS.
The cloud-edge environment is:
S = {S_cloud, S_edge}    Formula (7)
It consists of a cloud and an edge, where the cloud comprises m cloud servers:
S_cloud = {s_1, s_2, …, s_m}    Formula (8)
and the edge comprises n edge servers:
S_edge = {s_{m+1}, s_{m+2}, …, s_{m+n}}    Formula (9)
Different types of servers have different hardware configurations, such as the CPU, chipset, memory, and disk system, corresponding to performance parameters of different specifications; the better a server's performance, the higher its rent. In the resource model assumed in this embodiment, any type of server can be leased or released at any time, and the number of servers is assumed to be unlimited. A server s_k can be expressed as the tuple:
s_k = (p_k, u_k, c_k, f_k)    Formula (10)
where p_k represents the computing performance of server s_k; u_k represents the billing time unit at which server s_k prices its service; c_k represents the computation cost of server s_k per unit time u_k, which is approximately proportional to its computing performance (the resource model is priced pay-as-you-go, i.e., by the number of billing units of the leased virtual machine; in general, an incomplete unit of use is still charged as a full unit of time); and f_k ∈ {0, 1} represents the platform type of server s_k: when f_k = 0, s_k belongs to the cloud platform and has strong computing performance; when f_k = 1, s_k belongs to the edge platform and has ordinary computing performance. According to the platform types of the servers, the bandwidth β_{r,t} between servers s_r and s_t in the cloud-edge environment can be expressed as:
β_{r,t} = (b_{r,t}, c_{r,t})    Formula (11)
where b_{r,t} represents the bandwidth value of β_{r,t}, and c_{r,t} represents the data transmission cost of transferring 1 GB of data from server s_r to server s_t.
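The server tuple and per-link pricing above can be sketched as follows; the field names and the `transfer_cost` helper are illustrative assumptions, not the patent's notation:

```python
from dataclasses import dataclass

@dataclass
class Server:
    perf: float       # p_k, computing performance
    unit_time: float  # u_k, billing time unit (seconds)
    unit_cost: float  # c_k, cost per billing unit (roughly proportional to perf)
    is_edge: int      # f_k: 0 = cloud platform, 1 = edge platform

def transfer_cost(data_gb, unit_price_per_gb, same_server=False):
    # Transfers within a single server generate no transmission cost.
    return 0.0 if same_server else data_gb * unit_price_per_gb

# Example: a hypothetical high-performance cloud server.
s5 = Server(perf=10.0, unit_time=60.0, unit_cost=1.0, is_edge=0)
```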
1.3 Multi-workflow scheduling scheme
In a cloud-edge environment, the scheduling scheme of the multiple workflows solves the problem of allocating the task nodes of the multiple workflows to specific servers, and embodies the correspondence between each task in the multi-workflow application and a server.
Therefore, the multi-workflow scheduling scheme can be expressed as:
Γ = (W, S, M, c_e, T_f)    Formula (12)
where M represents the mapping of the multi-workflow application W onto the cloud-edge environment S, c_e represents the execution cost of the multi-workflow application W in the cloud-edge environment S, and T_f represents the completion time of the multi-workflow application.
The mapping M contains two types of elements: (v_{i,j}, s_k) indicates that task v_{i,j} is executed on server s_k, and an edge element indicates that the data of a directed edge are transmitted from server s_r to server s_t. Observe that once the task-to-server sub-mapping of M is determined, the edge-transmission sub-mapping is determined accordingly. Thus, the mapping M is equivalent to its task-to-server sub-mapping.
1.4 Cost scheduler under time delay constraint
In a cloud-edge environment, the delay is mainly divided into transmission delay and deadline delay; the deadline delay is selected as the constraint condition to study the delay-minimization problem. The objective of a cost scheduler, i.e., a cost-driven scheduler, is to minimize the optimization objective, the execution cost, through reasonable scheduling according to the scheduling scheme. Therefore, the problem to be solved by the cost scheduler under the delay constraint in this embodiment is to minimize the execution cost of the multiple workflows on the premise of satisfying the deadlines of all workflows. The main research objects are the task computation time generated by server computation and the data transmission delay generated by data transmission; it is assumed that each server has enough storage space to store the data generated or transmitted during execution. This embodiment uses the task computation time t_tc to measure the computing capability of a server and the data transmission time t_dt to measure the data transmission capability between servers; the specific calculation is as follows:
where Formula (18) represents the computation time of task v_{i,j} on server s_k, and Formula (19) represents the transmission time generated when a data edge is transferred from server s_r to server s_t. In particular, when both endpoints of a data transfer edge map to the same server, the data transmission time is 0.
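Under the definitions of Formulas (18) and (19), a hedged sketch of the two time measures might look like this (the workload and bandwidth units are assumptions for illustration, since the formulas themselves are not reproduced above):

```python
def task_compute_time(workload, perf):
    # t_tc: the task's computation amount divided by the server's
    # computing performance p_k (units are illustrative).
    return workload / perf

def data_transfer_time(data_size, bandwidth, same_server=False):
    # t_dt: 0 when both endpoints of the data edge map to one server,
    # otherwise the edge's data volume divided by the link bandwidth b_{r,t}.
    return 0.0 if same_server else data_size / bandwidth
```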
In the latency-constrained cost scheduler, for a scheduling scheme Γ, once its mapping M is determined, the boot time t_boot(s_k) of each server s_k is determined accordingly. To compute the execution cost c_e and the completion time T_f of the multi-workflow application, according to the mapping M of the multi-workflow application W onto the cloud-edge environment S, the related variables are defined as follows:
t_start(v_{i,j}, s_k): the start time of task v_{i,j} on server s_k, determined by the current idle time of server s_k and the completion times of all predecessor tasks of v_{i,j}, as shown in Formula (20).
t_end(v_{i,j}, s_k): the completion time of task v_{i,j} on server s_k, equal to the start time of task v_{i,j} on server s_k plus its computation time there, as shown in Formula (21):
t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k), (v_{i,j}, s_k) ∈ M    Formula (21)
t_shut(s_k): the shutdown time of server s_k, equal to the completion time of the task executed latest on the server, as shown in Formula (22).
c_com(s_k): the computation cost of the tasks on server s_k in the cloud-edge environment, determined by the running time of the server; the calculation is shown in Formula (23).
c_tran(w_i): the data transmission cost of workflow application w_i under scheduling scheme Γ, calculated as shown in Formula (24).
d_i: the deadline constraint of workflow application w_i under scheduling scheme Γ, calculated as shown in Formula (25):
d_i = α_i + baseline · |W| · HEFT(w_i)    Formula (25)
where HEFT(w_i) denotes the execution time required to schedule workflow w_i with the HEFT algorithm, and the parameter baseline is defined by Formula (26):
based on the above definition, the execution cost c of the multi-workflow application can be obtained e And completion timeAs shown in equations (27) and (28).
In actual workflow operation there are also operating costs generated by services such as data storage and security verification; compared with the execution cost generated by the computing and data transmission services, their influence on the overall cost is negligible, so this embodiment only considers the execution cost generated by using the computing service and the data transmission service.
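As a rough illustration of the pay-as-you-go accounting described above, the following sketch assumes (since Formulas (23), (27), and (28) are not reproduced here) that partial billing units are charged in full and that the total cost sums per-server computation costs and per-workflow transmission costs:

```python
import math

def server_compute_cost(t_boot, t_shut, unit_time, unit_cost):
    # Pay-as-you-go: an incomplete billing unit is charged as a full unit
    # (rounding rule assumed, consistent with the pricing note above).
    return math.ceil((t_shut - t_boot) / unit_time) * unit_cost

def total_execution_cost(compute_costs, transfer_costs):
    # c_e: computation costs of all rented servers plus the data
    # transmission costs of all workflows (aggregation form assumed).
    return sum(compute_costs) + sum(transfer_costs)
```

For example, a server running 90 s with a 60 s billing unit is charged for 2 units.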
2 problem definition
In summary, the deadline-constrained scheduling problem of multiple workflows studied in this embodiment can be abstracted as:
3 Algorithm design
3.1 population initialization
To better fit the multi-workflow scheduling problem in a real-world environment, this embodiment encodes the workflow using two-dimensional discrete particles, each consisting of a task priority and a server number. One individual in the population corresponds to one potential scheduling scheme of the multiple workflows in the cloud-edge environment. For the G-th evolution, the k-th individual in the population is represented by Formula (30).
where NP represents the size of the population, and the two components represent, respectively, the priority code and the server code of the j-th task v_{i,j} in the i-th workflow application. In the initialization process, the encoding of the 0th-generation individuals is initialized as shown in Formula (33).
where i = 1, 2, …, |W|, j = 1, 2, …, |V_i|, k = 1, 2, …, NP    Formula (34)
rand() represents a random decimal selected from a given interval, and randint() represents a random integer selected from a given interval.
In the binary pair, the first component is a real number representing the priority code of the multi-workflow application, and the second is an integer representing its server code. For an element of the priority code, the value indicates the scheduling priority of the corresponding task in the scheduling scheme; if two tasks have the same priority value, the task that arrived at the platform earlier has the higher priority. For an element of the server code, the value denotes the number of the server that executes the task.
FIG. 1 illustrates the encoding method for a multi-workflow application with 2 workflows in a cloud-edge environment. The multi-workflow consists of two workflows w_1 and w_2, each containing 4 tasks; the cloud-edge environment consists of 1 cloud server s_1 and 2 edge servers s_2 and s_3. At some point during execution, if tasks v_{1,3} and v_{2,1} are both in the queue to be executed of server s_3, then according to task priority, server s_3 will preferentially execute task v_{2,1}; after task v_{2,1} finishes, the next task is executed according to the priorities of the tasks in the queue to be executed.
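The initialization of Formula (33) can be sketched as below; function and variable names are illustrative, with rand() and randint() standing in for the operators named above:

```python
import random

def init_individual(task_counts, num_servers, rng=random.Random(0)):
    """Build one two-dimensional discrete particle: per task, a real-valued
    priority code (rand()) and an integer server code (randint())."""
    mu, srv = [], []
    for n_tasks in task_counts:  # one entry per workflow
        mu.append([rng.random() for _ in range(n_tasks)])
        srv.append([rng.randint(1, num_servers) for _ in range(n_tasks)])
    return mu, srv

# Two workflows of 4 tasks each, 3 servers, as in the FIG. 1 example.
mu, srv = init_individual([4, 4], num_servers=3)
```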
3.2 fitness function
The research objective of this embodiment is to optimize the scheduling policy of the workflows in the cloud-edge environment so as to reduce their operating cost. Therefore, for an individual, the fitness function value is the operating cost of the scheduling scheme corresponding to that individual, and the lower the operating cost, the better the individual. However, the encoding strategy proposed in the previous section may produce infeasible solutions that do not meet the deadline constraints. Thus, the fitness function for comparing two candidate solutions is defined as follows:
(1) When both individuals are feasible solutions, the individual with the lower c_e is selected; the fitness function is defined as shown in Formula (35).
(2) When at least one of the two individuals is an infeasible solution, the fitness function value is updated according to the number of workflows in each individual that satisfy the constraint conditions, defined as follows:
(2.1) if the number of workflows meeting the constraint condition in the two individuals is the same, then:
(2.2) if the numbers of workflows meeting the constraint condition in the two individuals are different:
where the event (indicator) function of the constraint takes the value 1 when the constraint condition is satisfied, and 0 otherwise.
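A hedged sketch of this pairwise comparison, representing each candidate as a (cost, satisfied-deadline count, feasible) triple; this is an assumed simplification of Formulas (35) to (37), not the patent's exact fitness expressions:

```python
def better(a, b):
    """Return the preferred candidate. Each candidate is
    (cost, met, feasible): execution cost c_e, the number of workflows
    meeting their deadline, and overall feasibility."""
    cost_a, met_a, feas_a = a
    cost_b, met_b, feas_b = b
    if feas_a and feas_b:              # (1) both feasible: lower cost wins
        return a if cost_a <= cost_b else b
    if met_a != met_b:                 # (2.2) prefer more satisfied deadlines
        return a if met_a > met_b else b
    return a if cost_a <= cost_b else b  # (2.1) same count: fall back to cost
```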
3.3 update strategy for populations
3.3.1 mutation operator
The mutation operation of the differential evolution algorithm is realized through a difference strategy: two distinct individuals are randomly selected from the population, their vector difference is scaled, and the result is combined with the individual to be mutated by vector addition, as shown in Formula (38).
where F, called the scaling factor, is a fixed constant. Although F ∈ (0, 2) is theoretically acceptable, long-term practice in the literature has shown that F ∈ (0, 1) is more effective in applications.
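A minimal sketch of the difference strategy of Formula (38), using the individual to be mutated as the base vector (vector names are illustrative):

```python
def mutate(x, r1, r2, F=0.5):
    # v = x + F * (r1 - r2): scale the difference of two randomly chosen
    # distinct individuals and add it to the individual to be mutated.
    return [xi + F * (a - b) for xi, a, b in zip(x, r1, r2)]
```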
3.3.2 crossover operator
The purpose of the crossover operation is to recombine components between the parent and the mutant individual, controlled by the crossover parameter CR ∈ [0, 1]. The crossover operation can be performed in two ways: the binomial method and the exponential method. The binomial method first generates, for each of the d components, a uniformly distributed random number r_i ∈ [0, 1]; comparing r_i with the crossover probability CR determines whether the new individual takes that component from the mutant, so X_i may be expressed as:
in this way, it can be randomly decided whether to exchange a certain component with the variant individual.
In the exponential method, the algorithm selects a contiguous segment of the mutant individual's genes; the segment starts at a random integer k, has a random length L, and may cover several components. Mathematically, k ∈ [0, d−1] and L ∈ [1, d] are selected at random, so X_i may be expressed as:
in this embodiment, a binomial method is selected to implement the crossover operator.
3.3.3 selection operator
To further improve the optimization effect of the differential evolution algorithm, this embodiment introduces a new selection mechanism. First, the algorithm keeps all N offspring generated by the N parent individuals: when a parent generates one offspring, the algorithm does not immediately perform one-to-one elimination selection; instead, all new individuals produced by mutation and crossover, i ∈ {1, 2, …, N}, are marked and temporarily retained, giving N offspring alongside the original N parents. Thus, after one round of evolution, 2N individuals are temporarily retained. Next, the algorithm calculates the fitness function values of the 2N individuals in the current pool and sorts them by fitness from best to worst. The first N individuals in the sorted queue are then selected as the final evolution result of the current generation and serve as the parents of the next generation. Compared with the one-to-one elimination mechanism of the traditional differential evolution selection operator, this selection method exhibits a stronger and more comprehensive evolutionary capability. Selection over the whole population makes the evolution process more reasonable and diversified, improves the overall fitness of the population more quickly, and accelerates convergence in searching the solution space.
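This whole-population (2N-pool) selection can be sketched as follows; individuals and the fitness callable are placeholders (here, lower fitness is better, matching the cost-based fitness of Section 3.2):

```python
def select_population(parents, offspring, fitness):
    """Pool the N parents with their N offspring, rank the 2N individuals
    by fitness (lower is better), and keep the best N as the next generation."""
    pool = parents + offspring
    pool.sort(key=fitness)
    return pool[:len(parents)]
```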
3.4 mapping of population individuals to a Multi-workflow scheduling scheme
In the cloud-edge environment, the mapping from an encoded particle to a multi-workflow application scheduling scheme is shown as Algorithm 1. For simplicity of presentation, the superscripts and subscripts of the encoded particle are omitted in this section, i.e., the particle is written simply as X = (μ, π).
The input of the algorithm comprises the multi-workflow application W, the cloud-edge environment S, and an encoded particle X; the output is the scheduling scheme Γ = (W, S, M, c_e, T_f) corresponding to the encoded particle X. First, the mapping M is initialized to the empty set, the queues to be executed Q = (Q_1, Q_2, …, Q_{|S|}) are initialized to empty queues, and the data transmission cost c_tran is initialized to 0 (line 1). Then the scheduling of the multi-workflow application W is started (lines 2-9); the process is divided into two steps:
(1) call Algorithm 2, Workflow_Applications_Installing(W, S, X), to monitor the arrival of the multi-workflow application W in real time and perform task allocation of the multi-workflow application (line 5);
(2) on the queues to be executed of all servers, call Algorithm 3, Workflow_Applications_Execution(s_k, Q_k, M, c_tran), to perform the execution of the multi-workflow application (lines 6-9).
After scheduling ends, all opened servers are shut down, and the execution cost c_e and the completion time T_f are calculated according to Formulas (27) and (28) (lines 11, 12). After the calculation, if the completion time of some workflow application exceeds its deadline, the scheduling scheme does not satisfy the deadline constraint and the encoded particle X is marked as an infeasible solution (lines 14-16). Finally, the scheduling scheme of the workflows, Γ = (W, S, M, c_e, T_f), is returned (line 18).
During the execution of Algorithm 1, the arrival of the multi-workflow application W must be monitored in real time and the tasks of the multi-workflow application allocated; this process is shown as Algorithm 2. The input parameters of the algorithm include the multi-workflow application W, the cloud-edge environment S, and the encoded particle X. During operation, if a workflow application w_i arrives, the task computation times t_tc[|V_i| × |S|] and data transmission times t_dt[|E_i|, |S| × |S|] are calculated according to Formulas (18) and (19), and its arrival time α_i is recorded (lines 3-4). All tasks in workflow application w_i are then traversed: if task v_{i,j} is an entry task, i.e., it has no predecessor task, then according to the server code s_{i,j}, task v_{i,j} is put into the queue to be executed of server s_{i,j}; otherwise, task v_{i,j} is put into the task waiting pool of server s_{i,j} (lines 5-11). Otherwise, the algorithm waits for the arrival of a workflow application (lines 14-15). The algorithm ends when all workflow applications have arrived.
During the execution of Algorithm 1, the multi-workflow application must also be executed on each server's queue to be executed; this process is shown as Algorithm 3. The input of the algorithm is a server s_k, the queue to be executed Q_k of server s_k, the mapping M, and the data transmission cost c_tran. During operation, if server s_k is in the off state, server s_k is booted and its boot time t_boot(s_k) is set to the current time (lines 2-4). If the queue to be executed Q_k of server s_k is not empty, then according to the priority code μ, the task v_{i,j} with the highest priority in Q_k is dispatched to server s_k, the corresponding mapping relation (v_{i,j}, s_k) is added to the mapping M, and Algorithm 4 is called to perform the task computation and data transmission processes (lines 10-12); otherwise, the algorithm waits for Q_k to become non-empty (lines 13-14). The algorithm ends when all workflow applications have been executed.
During the execution of Algorithm 3, the task computation and data transmission process of the simulated workflow application is shown as Algorithm 4. The inputs of the algorithm include a task v_{i,j} and a server s_k; the output is the transmission cost of the currently generated data, which is first initialized to 0 (line 1). Second, the start time t_start(v_{i,j}, s_k) of task v_{i,j} is recorded, and the completion time t_end(v_{i,j}, s_k) of task v_{i,j} is calculated according to t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k) (lines 2-3). Finally, the successor tasks of v_{i,j} are traversed: according to the server code s_{i,s}, the data are transmitted to the server s_{i,s} executing the successor task v_{i,s}, and the correspondingly generated data transmission cost is calculated; at this point, if task v_{i,s} has received all of its predecessor task data, task v_{i,s} is moved from the task waiting pool of server s_{i,s} into the queue to be executed (lines 4-10).
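A hedged sketch of the completion-and-notification step of Algorithm 4 (data structures are assumptions, and transmission costs are omitted for brevity):

```python
def finish_task(task, start, compute_time, succs, preds, received, finished):
    """Record t_end = t_start + t_tc for `task`, hand its output data to
    each successor, and return the successors that became ready, i.e.,
    those that have now received data from all of their predecessors."""
    end = start + compute_time
    finished.add(task)
    ready = []
    for s in succs.get(task, []):
        received.setdefault(s, set()).add(task)
        if preds.get(s, set()) <= received[s]:  # all predecessor data arrived
            ready.append(s)
    return end, ready

# Toy DAG: task "c" depends on both "a" and "b" (hypothetical ids).
succs = {"a": ["c"], "b": ["c"]}
preds = {"c": {"a", "b"}}
received, finished = {}, set()
end_a, ready_a = finish_task("a", 0.0, 2.0, succs, preds, received, finished)
end_b, ready_b = finish_task("b", 0.0, 3.0, succs, preds, received, finished)
```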
3.5 end conditions
There are generally two types of termination conditions for the differential evolution algorithm: one limits the maximum number of evolution generations; the other terminates the algorithm when the objective function value falls below a certain threshold, usually chosen as 10^-6 in general research. In this embodiment, the maximum number of evolution iterations k = 1000 is used as the termination condition; that is, the algorithm ends when the 1000th generation of evolution is completed.
3.6 Algorithm flow-chart
The algorithm flow is shown in fig. 2:
(1) determine the control parameters of the differential evolution algorithm and the fitness function. The control parameters of the differential evolution algorithm include the population size NP, the scaling factor F, and the crossover probability CR;
(2) randomly generating an initial population;
(3) evaluating an initial population and calculating the fitness value of each individual in the initial population;
(4) judge whether the termination condition is reached or the evolution generation has reached its maximum. If so, terminate the evolution and output the best individual obtained as the optimal solution; if not, continue;
(5) carrying out variation and cross operation to obtain an intermediate population;
(6) selecting individuals from the original population and the intermediate population to obtain a new generation of population;
(7) set the evolution generation g = g + 1 and go to step (4).
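Steps (1) to (7) can be combined into a minimal driver loop, shown here on a toy continuous test function rather than the scheduling problem itself; all names are illustrative, and the selection step uses the 2N-pool mechanism of Section 3.3.3:

```python
import random

def differential_evolution(fitness, dim, NP=10, F=0.5, CR=0.5,
                           max_gen=100, rng=random.Random(42)):
    # (1)-(2): parameters fixed above; random initial population.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(NP)]
    for _ in range(max_gen):                      # (4): generation limit
        trials = []
        for i, x in enumerate(pop):               # (5): mutation + crossover
            r1, r2 = rng.sample([p for j, p in enumerate(pop) if j != i], 2)
            v = [xi + F * (a - b) for xi, a, b in zip(x, r1, r2)]
            u = [vj if rng.random() < CR else xj for xj, vj in zip(x, v)]
            trials.append(u)
        pool = pop + trials                       # (6): 2N-pool selection
        pool.sort(key=fitness)
        pop = pool[:NP]                           # (7): next generation
    return min(pop, key=fitness)

# Toy fitness: the sphere function, minimized at the origin.
best = differential_evolution(lambda x: sum(xi * xi for xi in x), dim=2)
```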
4 Algorithm evaluation
4.1 design of the experiment
All experiments were run on a Windows 10 system with 8 GB memory and a 2.60 GHz Intel Core i7-6700HQ CPU, in a Python 3.10 environment.
4.1.1 workflow example
The workflows used for testing come from the 5 scientific workflows of Bharathi et al., covering 5 intensively studied scientific fields: CyberShake in seismology, Epigenomics in biogenetics, LIGO in gravitational physics, Montage in astronomy, and SIPHT in bioinformatics. Each workflow has different attributes such as structure and number of tasks, and related information such as computation requirements and data transmission volumes is stored in corresponding XML files. For each scientific workflow, this example selects 3 scales: micro (about 10 tasks), mini (about 30 tasks), and mid (about 50 tasks); the scale of a workflow application submitted by a user is one of these 3 scales. For the multi-workflows, 3 scales were likewise chosen: small (about 20 workflows), medium (about 30 workflows), and large (about 50 workflows).
4.1.2 resource instances
Currently mainstream commercial cloud services generally bill the priced time p_i in units of 60 seconds or 1 hour. In this experiment, payment in units of 60 seconds was selected.
The bandwidth and unit data transmission cost between two servers s_i and s_j are set according to the environments to which the two belong (determined by the platform types f_i and f_j), as in Table 1.
Table 1. Bandwidth and unit data transmission cost between s_i and s_j
4.1.3 Experimental parameter settings
In Section 3.1 it is assumed that the workflow inter-arrival times obey a Poisson distribution P(λ); here it is set that the user submits a workflow application to the cloud-edge environment every 2.5 s on average, i.e., λ = 2.5, so the arrival rate of the workflows is 1/λ = 0.4. Thus, for workflow w_i, its arrival time α_i is as shown in Formula (41).
Where rand (exp (λ)) is used to generate a poisson-distributed random number with parameter λ.
The cloud-edge environment consists of 5 cloud servers (s_1, s_2, …, s_5) and 5 edge servers (s_6, s_7, …, s_10). The computing capacities of cloud servers s_1 to s_5 are 2.5, 3.5, 5.0, 7.5, and 10.0 Mbps, respectively, and those of edge servers s_6 to s_10 are 2.5, 2.6, 2.2, 2.3, and 2.7 Mbps, respectively. Cloud server s_5 has the highest computing capacity, with a lease cost per unit time of 5/24 $/min (12.5 $/h). Taking the unit-time computation cost of cloud server s_5 as the benchmark, the lease costs per unit time of the remaining servers are proportional to their computing capacities.
For the control parameters of the differential evolution algorithm, the scaling factor F is set to 0.5, the crossover probability CR is set to 0.5, and the population size NP is set to 10.
4.2 Experimental results and analysis
To test the workflow-scheduling performance of the improved differential evolution algorithm in the cloud-edge environment, 10 groups of experiments were carried out on multi-workflows with different numbers of workflows; after infeasible solutions were eliminated, the average was taken as the algorithm's result at the current scale, and the cost-optimization advantage of the differential evolution algorithm in workflow scheduling was analyzed. For multi-workflows of different scales, Figs. 3, 4, and 5 intuitively show the optimal workflow execution costs (unit: $) of the three scales of multi-workflow under the scheduling strategies of the different algorithms.
The scheduling results of the small multi-workflow under different deadlines and different optimization algorithms are shown in Fig. 3. For small multi-workflows, differential evolution outperforms sequential scheduling by 43.2% on average. This is because this embodiment improves the selection operator of the traditional differential evolution algorithm, avoiding entrapment in locally optimal solutions and obtaining a better scheduling strategy. In addition, the solution space of the multi-workflow scheduling problem is generally exponential in size, while a purely random strategy is inefficient: with a limited population size and a limited number of searches, it is difficult to find a high-quality solution, or even a feasible one.
The scheduling results of the medium multi-workflow under different deadlines and different optimization algorithms are shown in Fig. 4. DE yields the optimal solution at all deadlines. Furthermore, the average cost of DE is up to 44.9% better than that of sequential scheduling. Notably, the medium multi-workflow includes a large number of data-intensive and computation-intensive tasks and has a complex structure; that is, DE performs better for scheduling composite workflows.
The scheduling results of the large multi-workflow under different deadlines and different optimization algorithms are shown in Fig. 5. As with the small multi-workflow, DE achieves the best average cost at all deadlines and is on average better than the sequential algorithm. Therefore, the DE algorithm obtains better scheduling performance on multi-workflows with larger task sizes and shows better robustness.
Combining Figs. 3, 4, and 5, the execution costs of DE and Sequence decrease as the deadline constraint is relaxed. This is because relaxing the deadline of a workflow gives each task in the workflow a more relaxed execution window, since tasks only need to finish within the workflow deadline. Tasks can therefore be allocated to cheaper servers, and more tasks can be allocated to the same server, reducing the number of rented virtual machines and thus the server rental cost. Meanwhile, the resource utilization of the four algorithms increases as the deadline increases: with a longer deadline, more parallel tasks can share the same virtual machine, compressing and reducing the idle time on these virtual machines.
THE ADVANTAGES OF THE PRESENT INVENTION
For the scheduling problem of multiple workflows, the invention provides a differential-evolution-based multi-workflow scheduling algorithm under the deadline constraint: on the premise of satisfying the deadline constraints of the multiple workflows, the execution cost of the multiple workflows is minimized by the differential evolution algorithm. In order to improve the rationality and diversity of the population evolution process, two-dimensional discrete particles are introduced to encode individuals, and the basic differential evolution algorithm is optimized with a selection operator based on the whole population, so that the fitness of the whole population improves more quickly while premature convergence is avoided, and the speed at which the algorithm searches the solution space increases. Multiple groups of simulation comparison experiments show that the performance of the differential-evolution-based multi-workflow scheduling algorithm is superior to that of other scheduling algorithms across deadlines and multi-workflow scales, and that it can effectively reduce the execution cost of multiple workflows in the cloud-edge environment.
While the invention has been described above with reference to preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. Any simple modification, equivalent change, or variation of the above embodiments according to the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.
The present invention is not limited to the above-mentioned preferred embodiments, and any other various types of multi-workflow scheduling methods with time delay constraint under the cloud-edge environment can be derived from the teaching of the present invention.
Claims (8)
1. A time delay constrained multi-workflow scheduling method under a cloud edge environment is characterized by comprising the following steps: on the premise of meeting the multiple workflow deadline constraint, minimizing the execution cost of the multiple workflows by using a differential evolution algorithm; in order to improve the rationality and diversity of the population evolution process, two-dimensional discrete particles are introduced to encode individuals, and the differential evolution algorithm is optimized by using a selection operator based on the whole population, so that the fitness value of the whole population is improved more quickly and the speed of searching a solution space by the algorithm is increased on the premise of avoiding premature convergence.
2. The method for scheduling multiple workflows with time delay constraints in a cloud-edge environment according to claim 1, wherein the multi-workflow deadline constraint is expressed as: the completion time of each workflow application w_i must not exceed its deadline d_i, i.e., T_f(w_i) ≤ d_i for every workflow w_i ∈ W.
3. The method for scheduling multiple workflows with time delay constraint in cloud-edge environment according to claim 2, wherein the method comprises the following steps: the construction process of the multi-workflow deadline constraint representation is as follows:
assuming that the time intervals at which users submit different workflows to the cloud-edge environment, i.e., the arrival times of the different workflows, approximately obey a Poisson distribution P(λ), where λ denotes the arrival rate of the workflows, these workflows are represented by an infinite set:
W = {w_1, w_2, …}  equation (1)
wherein each workflow is represented by a triple:
w_i = (α_i, d_i, G_i)  equation (2)
whose elements denote, in order, the arrival time, the deadline, and the structure;
the structure of a workflow is represented by a directed acyclic graph:
G_i = {T_i, E_i}  equation (3)
wherein T_i = {t_i1, t_i2, …, t_iN} (equation (4)) is the set of tasks and N denotes the number of tasks; t_ij denotes the j-th task in the i-th workflow; E_i is the set of edges between tasks; a directed edge e = (t_ip, t_ij) indicates that data are transmitted between t_ip and t_ij, t_ip being a predecessor task of t_ij and t_ij a successor task of t_ip;
pred(t_ij) (equation (5)) is the predecessor-task set of t_ij;
succ(t_ij) (equation (6)) is the successor-task set of t_ij;
owing to the flow (precedence) constraints of the workflow, a task can be assigned to a server for execution only after all of its predecessor tasks have finished executing and all data generated by those predecessors have been transmitted;
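The workflow model of equations (1)-(6) and the predecessor/successor relations above can be sketched as follows; the class and method names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    arrival: float   # alpha_i, arrival time
    deadline: float  # d_i, deadline
    edges: list      # E_i: list of (pred, succ) task-index pairs
    n_tasks: int     # |T_i|, number of tasks N

    def pred(self, j):
        """Predecessor-task set pred(t_ij)."""
        return {p for (p, q) in self.edges if q == j}

    def succ(self, j):
        """Successor-task set succ(t_ij)."""
        return {q for (p, q) in self.edges if p == j}

    def ready(self, j, finished):
        """A task may start only after all its predecessors have finished."""
        return self.pred(j) <= set(finished)

# Diamond-shaped DAG: 0 -> {1, 2} -> 3
w = Workflow(arrival=0.0, deadline=10.0,
             edges=[(0, 1), (0, 2), (1, 3), (2, 3)], n_tasks=4)
```

Here `w.ready(3, {0, 1})` is false because task 2 has not yet finished, matching the precedence rule stated in the claim.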
in the scheduling process of the multi-workflow application, the cloud side environment provides computing resources and data transmission services for users;
cloud-edge environment:
S = {S_cloud, S_edge}  equation (7)
which consists of a cloud and an edge, wherein the cloud comprises m cloud servers:
S_cloud = {s_1, s_2, …, s_m}  equation (8)
and the edge comprises n edge servers:
S_edge = {s_{m+1}, s_{m+2}, …, s_{m+n}}  equation (9)
in the resource model, any type of server can be leased or released at any time, on the assumption that the number of servers is sufficient; a server s_k is expressed as:
s_k = (p_k, u_k, c_k^u, f_k)  equation (10)
wherein p_k denotes the computing performance of server s_k; u_k denotes the billing unit time set by server s_k for providing its service; c_k^u denotes the unit computation cost of server s_k per unit time u_k, which is approximately proportional to its computing performance; f_k ∈ {0, 1} denotes the platform type of server s_k: when f_k = 0, s_k belongs to the cloud platform and has strong computing performance; when f_k = 1, s_k belongs to the edge platform and has ordinary computing performance; according to the platform types of the servers, the bandwidth β_{r,t} between servers s_r and s_t in the cloud-edge environment is expressed as:
β_{r,t} = (b_{r,t}, c_{r,t}^{tran})  equation (11)
wherein b_{r,t} denotes the value of the bandwidth β_{r,t}, and c_{r,t}^{tran} denotes the data transmission cost incurred by transmitting 1 GB of data from server s_r to server s_t;
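The server tuple of equation (10) and the platform-dependent bandwidth of equation (11) can be sketched as below; the field names and the concrete bandwidth/cost values are placeholders chosen only to show the structure, not values from the patent:

```python
from dataclasses import dataclass

@dataclass
class Server:
    p: float  # p_k: computing performance
    u: float  # u_k: billing unit-time price
    f: int    # f_k: 0 = cloud platform, 1 = edge platform

def bandwidth(r: Server, t: Server) -> tuple:
    """beta_{r,t} = (b_{r,t}, transfer cost per GB); values are illustrative."""
    if r is t:
        return (float("inf"), 0.0)  # same server: no transmission needed
    if r.f == 0 and t.f == 0:
        return (1.0, 0.09)          # cloud <-> cloud link (placeholder values)
    return (0.1, 0.0)               # links involving an edge server (placeholder)

cloud = Server(p=3.0, u=0.5, f=0)
edge = Server(p=1.5, u=0.2, f=1)
b, cost_per_gb = bandwidth(cloud, edge)
```

The point of the tuple return is that, as in equation (11), a link carries both a bandwidth value and a per-GB transfer cost that depend on the platform types of the two endpoints.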
in the cloud-edge environment, a scheduling scheme for the multiple workflows determines the assignment of the task nodes of the multiple workflows to specific servers, i.e., the correspondence between each task of the multi-workflow application and a server;
the multi-workflow scheduling scheme is represented as:
Γ = (W, S, M, c_e, T_f)  equation (12)
wherein M denotes the mapping of the multi-workflow application W onto the cloud-edge environment S, c_e denotes the execution cost of the multi-workflow application W in the cloud-edge environment S, and T_f denotes the completion time of the multi-workflow application; for the two types of elements in the mapping M, (v_{i,j}, s_k) denotes that task v_{i,j} is executed on server s_k, and (e, s_r → s_t) denotes that the data edge e is transmitted from server s_r to server s_t; once the task-to-server sub-mapping of M is determined, the edge-transmission sub-mapping is also determined accordingly; the mapping M is therefore equivalent to its task-to-server sub-mapping;
in the cloud-edge environment, the deadline (delay) is selected as the constraint condition, and the cost-minimization problem under this constraint is studied; the scheduler is cost-driven and aims to minimize the execution cost of the optimization target through a reasonable scheduling scheme; the problem to be solved by the cost scheduler under the delay constraint is to minimize the execution cost of the multiple workflows on the premise that the deadlines of all workflows are met; each server is assumed to have sufficient storage capacity to store the data generated or transmitted during execution; the computing power of a server is measured by the task computation time t_tc, and the data transmission capability between servers is measured by the data transfer time t_dt, which are calculated as follows:
wherein equation (18) gives the computation time of task v_{i,j} on server s_k, and equation (19) gives the transmission time incurred by sending a data edge from server s_r to server s_t; when both endpoints of a data transmission edge are mapped to the same server, the data transmission time is 0;
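A minimal sketch of equations (18) and (19), assuming the usual forms (computation time as workload over server performance, transmission time as data volume over link bandwidth, zero when both endpoints share a server); the parameter names are assumptions:

```python
def t_tc(workload: float, p_k: float) -> float:
    """Equation (18): computation time of a task on server s_k
    with computing performance p_k."""
    return workload / p_k

def t_dt(data_gb: float, bandwidth_gbps: float, same_server: bool) -> float:
    """Equation (19): transfer time of a data edge from s_r to s_t;
    zero when both tasks are mapped to the same server."""
    if same_server:
        return 0.0
    return data_gb / bandwidth_gbps
```

For example, a 6-unit workload on a server with performance 3 takes 2 time units, and a 1 GB edge over a 0.5 GB/s link takes 2 time units unless both tasks share a server.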
in the delay-constrained cost scheduler, for a scheduling scheme Γ, once its mapping M is determined, the boot time t_boot(s_k) of each server s_k is determined accordingly; to calculate the execution cost c_e and the completion time T_f of the multi-workflow application, the relevant variables are defined as follows, according to the mapping M of the multi-workflow application W onto the cloud-edge environment S:
t_start(v_{i,j}, s_k): the start time of task v_{i,j} on server s_k, determined by the current idle time of server s_k and the completion times of all predecessor tasks of v_{i,j}, as shown in equation (20);
t_end(v_{i,j}, s_k): the finish time of task v_{i,j} on server s_k, equal to the sum of the start time of task v_{i,j} and its computation time on server s_k, as shown in equation (21);
t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k), (v_{i,j}, s_k) ∈ M  equation (21)
t_shut(s_k): the shutdown time of server s_k, equal to the completion time of the last task executed on the server, as shown in equation (22);
c_com(s_k): the task computation cost of server s_k in the cloud-edge environment, determined by the running time of the server and calculated as shown in equation (23);
c_tran(w_i): the data transmission cost of workflow application w_i under scheduling scheme Γ, calculated as shown in equation (24);
d_i: the deadline constraint of workflow application w_i under scheduling scheme Γ, calculated as shown in equation (25);
d_i = α_i + baseline × |W| × HEFT(w_i)  equation (25)
wherein HEFT(w_i) denotes the execution time required to schedule workflow w_i with the HEFT algorithm; the parameter baseline is defined by equation (26):
based on the above definitions, the execution cost c_e and the completion time T_f of the multi-workflow application are obtained as shown in equation (27) and equation (28);
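The timing and cost recurrences of equations (20)-(23) can be sketched as follows, assuming a task starts at the later of the server's idle time and its latest predecessor-data-ready time, and assuming billing by whole intervals of length u_k's unit time; the helper names are illustrative:

```python
import math

def t_start(server_idle: float, pred_ready: float) -> float:
    """Equation (20): start time on s_k is the later of the server's
    current idle time and the readiness of all predecessor data."""
    return max(server_idle, pred_ready)

def t_end(start: float, compute_time: float) -> float:
    """Equation (21): finish time = start time + t_tc."""
    return start + compute_time

def c_com(t_boot: float, t_shut: float, u_k_price: float,
          unit: float = 1.0) -> float:
    """Equation (23) under the assumption of whole-interval billing:
    rental cost = ceil(running span / billing unit) * unit price."""
    return math.ceil((t_shut - t_boot) / unit) * u_k_price
```

E.g., a server booted at 0 and shut down at 2.5 with a unit price of 0.5 per interval is billed for 3 intervals, costing 1.5 — illustrating why packing tasks to shorten the running span reduces rental cost.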
4. The method according to claim 3, wherein encoding individuals with two-dimensional discrete particles specifically comprises:
a particle consists of task priorities and server numbers; one individual in the population corresponds to one potential scheduling scheme of the multiple workflows in the cloud-edge environment; for the G-th generation of evolution, the k-th individual X_k^G in the population is represented by equation (30);
wherein NP denotes the size of the population, and μ_{i,j} and s_{i,j} respectively denote the priority code and the server code of the j-th task v_{i,j} in the i-th workflow application; during initialization, the 0-th generation individual X_k^0 is generated as shown in equation (33):
wherein
i = 1, 2, …, |W|; j = 1, 2, …, |V_i|; k = 1, 2, …, NP  equation (34)
rand() denotes randomly selecting a real number in a given interval, and randint() denotes randomly selecting an integer in a given interval;
in the binary tuple (μ_{i,j}, s_{i,j}), μ_{i,j} is a real number representing a priority code of the multi-workflow application and s_{i,j} is an integer representing a server code of the multi-workflow application; for an element μ_{i,j} of the priority code, its value indicates the scheduling priority of the corresponding task in the scheduling scheme, and if the values of two tasks are equal, the task that arrived at the platform earlier has the higher priority; for an element s_{i,j} of the server code, its value denotes the number of the server that executes the task.
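The initialization of equations (33)-(34) — a real-valued priority code drawn with rand() and an integer server code drawn with randint() per task — can be sketched as below; the interval bounds are assumptions:

```python
import random

def init_individual(n_tasks: int, n_servers: int, seed=None):
    """0th-generation individual: one (priority, server) pair per task.
    Priorities are reals in [0, 1); server codes are integers in
    [0, n_servers - 1]. Bounds are illustrative assumptions."""
    rng = random.Random(seed)
    priority = [rng.random() for _ in range(n_tasks)]                  # mu_{i,j}
    servers = [rng.randint(0, n_servers - 1) for _ in range(n_tasks)]  # s_{i,j}
    return priority, servers

mu, s = init_individual(n_tasks=5, n_servers=3, seed=1)
```

Each individual is thus a two-row (two-dimensional) discrete particle: row one orders the tasks, row two places them on servers.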
5. The method for scheduling multiple workflows with time delay constraints in a cloud-edge environment according to claim 4, wherein the method comprises the following steps: the optimization of the differential evolution algorithm by using a selection operator based on the whole population specifically comprises the following steps:
first, the N offspring generated by the N parent individuals are all preserved; that is, when a parent individual generates one offspring, the algorithm does not immediately perform one-to-one elimination selection, but temporarily retains every new individual generated by mutation and crossover, so that the N offspring coexist with the original N parents; thus, after one round of evolution, 2N individuals are temporarily retained; then the fitness function values of the 2N individuals in the current individual pool are calculated, and the 2N individuals are sorted by fitness value in descending order; finally, the first N individuals in the sorted queue are selected as the final evolution result of the current generation and used as the parents of the next generation of evolution.
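The population-wide selection described above is essentially (μ + λ) truncation selection: pool the N parents with the N offspring, rank all 2N by fitness, and keep the best N. A minimal sketch, assuming a higher fitness value is better as in the claim's descending sort:

```python
def select_next_generation(parents, offspring, fitness):
    """Population-wide selection: rank the 2N temporarily retained
    individuals by fitness (descending) and keep the best N as the
    parents of the next generation."""
    pool = parents + offspring          # 2N individuals after one round
    pool.sort(key=fitness, reverse=True)
    return pool[:len(parents)]          # first N of the sorted queue

survivors = select_next_generation([4.0, 2.0, 9.0], [1.0, 7.0, 3.0],
                                   fitness=lambda x: x)
```

Unlike classic one-to-one DE selection, a strong offspring can here displace a weak parent other than its own, which is what speeds up the rise of the whole population's fitness.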
6. The method for scheduling multiple workflows with time delay constraints in a cloud-edge environment according to claim 5, wherein the fitness function compares two candidate solutions and is defined as follows:
(1) if both individuals are feasible solutions, the individual with the lower execution cost c_e is selected, and the fitness function is defined as shown in equation (35);
(2) if at least one of the two individuals is an infeasible solution, the fitness value is updated according to the number of workflows in each solution that satisfy the constraint condition, defined as follows:
(2.1) if the numbers of workflows satisfying the constraint condition in the two individuals are the same, then:
(2.2) if the numbers of workflows satisfying the constraint condition in the two individuals are different:
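The pairwise comparison implied by equation (35) and cases (2.1)-(2.2) can be sketched as one function; the tuple encoding of a candidate is an assumption made for illustration:

```python
def better(a, b):
    """Compare two candidates a, b, each encoded as
    (execution_cost, n_workflows_meeting_deadline, n_workflows_total).
    Returns the preferred candidate."""
    cost_a, ok_a, total_a = a
    cost_b, ok_b, total_b = b
    feasible_a = ok_a == total_a
    feasible_b = ok_b == total_b
    if feasible_a and feasible_b:
        return a if cost_a <= cost_b else b  # (1)/(35): cheaper feasible wins
    if ok_a != ok_b:
        return a if ok_a > ok_b else b       # (2.2): more satisfied deadlines wins
    return a if cost_a <= cost_b else b      # (2.1): same count, fall back to cost
```

This gives infeasible individuals a gradient toward feasibility (satisfy more deadlines first) before cost is compared, which is the usual purpose of such constraint-handling rules.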
7. The method of claim 6, wherein the method comprises:
the specific implementation process of the differential evolution algorithm is as follows:
step S1: determining the control parameters of the differential evolution algorithm and the fitness function; the control parameters of the differential evolution algorithm comprise the population size NP, the scaling factor F, and the crossover probability CR;
step S2: randomly generating an initial population;
step S3: evaluating an initial population and calculating the fitness value of each individual in the initial population;
step S4: judging whether a termination condition is reached or an evolution algebra reaches a maximum value; if so, terminating the evolution, and outputting the obtained optimal individual as an optimal solution; if not, continuing;
step S5: carrying out variation and cross operation to obtain an intermediate population;
step S6: selecting individuals from the original population and the intermediate population to obtain a new generation population;
step S7: setting the evolution generation g = g + 1 and returning to step S4;
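Steps S1-S7 can be sketched as a generic differential-evolution loop. For brevity this sketch is DE/rand/1/bin on a real vector with classical one-to-one selection at step S6; the patent instead uses the two-dimensional discrete encoding of claim 4 and the population-wide selection of claim 5:

```python
import random

def de_minimise(fitness, dim, bounds, NP=20, F=0.5, CR=0.9,
                max_gen=100, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # S1 parameters are the arguments; S2: random initial population
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(NP)]
    fit = [fitness(x) for x in pop]                     # S3: evaluate
    for _ in range(max_gen):                            # S4/S7: generation loop
        for i in range(NP):
            a, b, c = rng.sample([j for j in range(NP) if j != i], 3)
            # S5: mutation + binomial crossover, clipped to the bounds
            trial = [pop[i][d] if rng.random() > CR
                     else min(hi, max(lo, pop[a][d] + F * (pop[b][d] - pop[c][d])))
                     for d in range(dim)]
            f_trial = fitness(trial)
            if f_trial < fit[i]:                        # S6: selection (one-to-one here)
                pop[i], fit[i] = trial, f_trial
    best = min(range(NP), key=lambda i: fit[i])
    return pop[best], fit[best]

x_best, f_best = de_minimise(lambda v: sum(t * t for t in v),
                             dim=3, bounds=(-5.0, 5.0))
```

On the 3-dimensional sphere function the loop converges toward the zero vector, illustrating how the S4-S7 cycle drives the fitness down generation by generation.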
the mapping of the individuals of the population to the multi-workflow scheduling scheme is realized by the following algorithm:
the input of the algorithm 1 comprises a multi-workflow application W, a cloud edge environment S and a coded particle X, and the output is a coded particle X [2 ]]The corresponding scheduling scheme Γ ═ W, S, M, c e ,T f ) (ii) a First, map M is initialized to empty set null, and queue to be executed Q ═ Q (Q) 1 ,Q 2 ,...,Q S ) Initialized to empty queue null, data transfer cost c tran Initialization is 0; the scheduling of the multi-workflow application W then starts, the process being divided into two steps:
(1) calling Algorithm 2 to monitor the arrival of the multi-workflow application W in real time and to allocate the tasks of the multi-workflow application;
(2) calling Algorithm 3 to execute the multi-workflow application on the to-be-executed queues of all servers;
after the scheduling ends, all opened servers are shut down, and the execution cost c_e and the completion time T_f are calculated according to equation (27) and equation (28); after the calculation, if the completion time of some workflow application exceeds its deadline, the scheduling scheme does not satisfy the deadline constraint and the coded particle X is marked as an infeasible solution; finally, the scheduling scheme Γ = (W, S, M, c_e, T_f) of the workflows is returned;
During the execution of Algorithm 1, the arrival of the multi-workflow application W is monitored in real time and tasks are allocated, as shown in Algorithm 2, whose input parameters comprise the multi-workflow application W, the cloud-edge environment S, and the coded particle X; during the operation of the algorithm, if a workflow application w_i arrives, the task computation times t_tc[|V_i| × |S|] and the data transmission times t_dt[|E_i|, |S| × |S|] are calculated according to equation (18) and equation (19) respectively, and its arrival time α_i is recorded; all tasks in workflow application w_i are then traversed: if task v_{i,j} is an entry task, i.e., it has no predecessor task, it is put into the to-be-executed queue of server s_{i,j} according to its server code s_{i,j}; otherwise, task v_{i,j} is put into the task waiting pool of server s_{i,j}; if no workflow application has arrived, the algorithm waits; the algorithm ends when all workflow applications have arrived;
during the execution of Algorithm 1, the multi-workflow application must also be executed on the to-be-executed queue of each server, as shown in Algorithm 3, whose input comprises a server s_k, the to-be-executed queue Q_k of server s_k, the mapping M, and the data transmission cost c_tran; during the operation of the algorithm, if server s_k is in the shut-down state, server s_k is booted and its boot time t_boot(s_k) is set to the current time; if the to-be-executed queue Q_k of server s_k is not empty, the task v_{i,j} with the highest priority in Q_k (according to the priority code μ) is dispatched to server s_k, the corresponding mapping relation (v_{i,j}, s_k) is added to the mapping M, and Algorithm 4 is called to execute the task computation process and the data transmission process; otherwise, the algorithm waits until Q_k is non-empty; the algorithm ends when all workflow applications have been executed;
during the execution of Algorithm 3, the task computation and data transmission process of the workflow application is simulated as shown in Algorithm 4, whose input comprises a task v_{i,j} and a server s_k, and whose output is the currently generated data transmission cost; first, this cost is initialized to 0; second, the start time t_start(v_{i,j}, s_k) of task v_{i,j} is recorded, and its finish time t_end(v_{i,j}, s_k) is computed according to t_end(v_{i,j}, s_k) = t_start(v_{i,j}, s_k) + t_tc(v_{i,j}, s_k); finally, the successor tasks of v_{i,j} are traversed, the data are transmitted to the server s_{i,s} that executes each successor task v_{i,s} according to its server code s_{i,s}, and the correspondingly generated data transmission cost is calculated; at this point, if a task v_{i,s} has finished receiving all the data of its predecessor tasks, task v_{i,s} is moved from the task waiting pool of server s_{i,s} into the to-be-executed queue.
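The interplay of Algorithms 2-4 — entry tasks go straight to the encoded server's ready queue, each server pops its highest-priority task, and a finished task releases successors whose predecessors are all done — can be compressed into one sketch. The data structures are deliberate simplifications of the patent's queues and waiting pools, ignoring timing and cost:

```python
import heapq

def simulate(tasks, edges, server_of, priority_of):
    """tasks: iterable of task ids; edges: (pred, succ) pairs;
    server_of / priority_of: dicts giving the particle's server code
    and priority code per task. Returns the execution order."""
    pending = {t: {p for p, q in edges if q == t} for t in tasks}
    queues = {s: [] for s in set(server_of.values())}
    done, order = set(), []
    for t in tasks:  # Algorithm 2: entry tasks enter their server's queue
        if not pending[t]:
            heapq.heappush(queues[server_of[t]], (-priority_of[t], t))
    while any(queues.values()):  # Algorithm 3: each server pops by priority
        for s in list(queues):
            if queues[s]:
                _, t = heapq.heappop(queues[s])
                done.add(t)
                order.append(t)
                for p, succ in edges:  # Algorithm 4: release ready successors
                    if p == t and pending[succ] <= done:
                        heapq.heappush(queues[server_of[succ]],
                                       (-priority_of[succ], succ))
    return order

# Diamond DAG 0 -> {1, 2} -> 3, one server; priorities make task 2 run before 1
order = simulate([0, 1, 2, 3], [(0, 1), (0, 2), (1, 3), (2, 3)],
                 {0: 0, 1: 0, 2: 0, 3: 0}, {0: 4, 1: 1, 2: 3, 3: 2})
```

Note how the priority code only breaks ties among *ready* tasks: task 3 cannot run until both 1 and 2 have finished, regardless of its priority.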
8. The method according to claim 7, wherein the maximum number of evolution iterations k is 1000, i.e., the algorithm terminates when the 1000th evolution is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210702160.5A CN114925935A (en) | 2022-06-21 | 2022-06-21 | Multi-workflow scheduling method for time delay constraint in cloud edge environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114925935A true CN114925935A (en) | 2022-08-19 |
Family
ID=82814883
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210702160.5A Pending CN114925935A (en) | 2022-06-21 | 2022-06-21 | Multi-workflow scheduling method for time delay constraint in cloud edge environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114925935A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107656799A (en) * | 2017-11-06 | 2018-02-02 | 福建师范大学 | The workflow schedule method of communication and calculation cost is considered under a kind of more cloud environments |
CN108133260A (en) * | 2018-01-17 | 2018-06-08 | 浙江理工大学 | The workflow schedule method of multi-objective particle swarm optimization based on real-time status monitoring |
CN109597682A (en) * | 2018-11-26 | 2019-04-09 | 华南理工大学 | A kind of cloud computing workflow schedule method using heuristic coding strategy |
JPWO2020235649A1 (en) * | 2019-05-21 | 2020-11-26 |
Non-Patent Citations (1)
Title |
---|
LIN Chaowei et al., "Research on Fuzzy-Theory-Based Scientific Workflow Scheduling in Edge Environments", Computer Science, vol. 49, no. 2, 28 February 2022 (2022-02-28), pages 312-320 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||