CN111736959B - Spark task scheduling method considering data affinity under heterogeneous cluster - Google Patents

Spark task scheduling method considering data affinity under heterogeneous cluster

Info

Publication number
CN111736959B
Authority
CN
China
Prior art keywords
stage
task
tasks
time
spark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010683860.5A
Other languages
Chinese (zh)
Other versions
CN111736959A (en)
Inventor
文建璋
陈祥军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Nansoft Technology Co ltd
Original Assignee
Nanjing Nansoft Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Nansoft Technology Co ltd
Priority to CN202010683860.5A
Publication of CN111736959A
Application granted
Publication of CN111736959B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a Spark task scheduling method that considers data affinity in a heterogeneous cluster and, from the user's perspective, minimizes the maximum completion time of a Spark application. A Spark application submitted by a user is decomposed into a task scheduling sequence, the tasks are assigned to suitable virtual machines to obtain an initial solution, and the initial solution is then further optimized by adjusting the task scheduling sequence to obtain the final, optimized scheduling result. The method optimizes the maximum completion time of the Spark application by dynamically allocating suitable resources.

Description

Spark task scheduling method considering data affinity under heterogeneous cluster
Technical Field
The invention relates to a Spark task scheduling method considering data affinity under a heterogeneous cluster, and belongs to the technical field of cloud computing resource scheduling.
Background
In recent years, with the rapid development of social networks, the Internet of Things, and other technologies, many fields such as banking, medical care, business forecasting, and scientific exploration have come to need large-scale data analysis, and big data processing has become crucial. The Spark framework is widely used for big data processing.
Spark has two default task scheduling strategies: FIFO mode and Fair sharing mode. In FIFO mode, Spark by default prefers moving computation to moving data: it assigns each task in a Stage, as far as possible, to a node that stores the input data the task needs. As a result, some nodes in the cluster may be overloaded while others sit idle, which wastes the cluster's computing resources and increases the total completion time of the application. In addition, as computer hardware continues to develop, machines in data centers are replaced frequently, so data center server clusters are no longer homogeneous, yet Spark's default scheduling policies are designed for task scheduling on homogeneous clusters. It is therefore necessary to study the Spark task scheduling problem on heterogeneous clusters.
A Spark application consists of Jobs subject to partial-order constraints, which together form a DAG. Each Job can be divided into multiple Stages; the Stages within a Job are also subject to partial-order constraints and likewise form a DAG. In addition, each Stage contains multiple independent tasks, and the data a task needs may come from the original input data or from intermediate data generated by tasks in its predecessor Stages. The invention comprehensively considers the coupling between a task and the original and intermediate data it needs, that is, data affinity. Data affinity aims to place tasks as close as possible to their data in order to reduce the network transmission cost of the data.
Disclosure of Invention
Purpose of the invention: in view of the problems and shortcomings of the prior art, the invention provides a Spark task scheduling method considering data affinity in a heterogeneous cluster. It addresses Spark task scheduling on a cluster formed by heterogeneous data center servers, takes into account the characteristics of Spark application workflows and the data affinity of virtual machines, improves the existing Spark task scheduler, constructs a Spark workflow scheduling system architecture, and minimizes the maximum completion time of Spark applications.
Technical scheme: a Spark task scheduling method considering data affinity in a heterogeneous cluster comprises the following steps:
step 1, calculating the time parameters of all Stages in the Spark application according to the partial-order properties of the Spark workflow, and generating a Stage scheduling queue RSQ according to a Stage ordering model;
step 2, taking Stages out of the Stage scheduling queue RSQ in order, and generating a task scheduling queue TQ for the parallel tasks in each Stage according to a task ordering model;
step 3, taking tasks out of the task scheduling queue TQ in order, maintaining an earliest-available virtual machine list (List) for each task, then calculating the data affinity of every virtual machine in that List, and adding as many of the virtual machines with the highest data affinity to the VMList as required;
step 4, for each task in the task scheduling queue TQ, searching the VMList for a virtual machine according to a virtual machine search strategy, and assigning the task to that virtual machine;
step 5, repeating step 3 and step 4 until the task scheduling queue TQ is empty, which indicates that all tasks in the Stage have been scheduled;
step 6, invoking the RSA algorithm to update the elements in the Stage scheduling queue RSQ, and returning to step 2 until the Stage scheduling queue RSQ is empty, at which point an initial scheduling solution is obtained;
step 7, improving the initial scheduling solution by adjusting the task scheduling sequence to obtain the final scheduling result, which ends the method.
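The control flow of steps 1 to 6 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the patented implementation: the `VM` class, the `schedule` function, and the input format are hypothetical; data transfer and the affinity-based VMList filter are omitted (every VM is treated as a candidate); and one fixed rule is used at each decision point (earliest start time for Stages, instruction count for tasks, earliest completion time for VMs).

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    speed: float            # instructions per time unit (hypothetical unit)
    available: float = 0.0  # time at which the VM becomes free

def schedule(stages, preds, vms):
    """stages: {stage: [task instruction counts]}; preds: {stage: set of predecessor stages}."""
    remaining = {s: set(p) for s, p in preds.items()}
    finish = {}                                   # stage -> completion time
    rsq = [s for s in stages if not remaining[s]]  # step 1: initially ready Stages
    plan = []
    while rsq:                                    # loop of steps 2-6
        stage = rsq.pop(0)                        # step 2: highest-priority Stage
        ready_at = max((finish[p] for p in preds[stage]), default=0.0)
        tq = sorted(stages[stage], reverse=True)  # task queue, largest task first
        end = ready_at
        for length in tq:                         # steps 3-5: place every task
            # Earliest completion time strategy over all VMs.
            vm = min(vms, key=lambda v: max(v.available, ready_at) + length / v.speed)
            start = max(vm.available, ready_at)
            vm.available = start + length / vm.speed
            plan.append((stage, length, vm.name, start, vm.available))
            end = max(end, vm.available)
        finish[stage] = end
        for s in stages:                          # step 6: add newly ready Stages
            if stage in remaining[s]:
                remaining[s].discard(stage)
                if not remaining[s]:
                    rsq.append(s)
        # Re-sort the ready queue by earliest start time.
        rsq.sort(key=lambda s: max((finish[p] for p in preds[s]), default=0.0))
    return plan, max(finish.values())
```

With `stages = {"S1": [4, 2], "S2": [6], "S3": [2]}`, `preds = {"S1": set(), "S2": {"S1"}, "S3": {"S1"}}`, and one VM of speed 2 plus one of speed 1, the sketch places four tasks and reports a maximum completion time of 5 time units.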
The partial-order property of the Spark workflow in step 1 is considered at two levels: the Spark workflow is composed of Jobs related by a partial order, and each Job is composed of Stages related by a partial order, so partial-order relations exist at both the Job level and the Stage level. Jobs and Stages are merged, and the Spark workflow is represented as a directed acyclic graph (DAG) of Stages.
Spark workflow: $G = \{S_1, S_2, \ldots, S_n\}$ is a DAG consisting of n Stages, and each Stage contains multiple tasks that can be executed in parallel.
In step 1, the Stage ordering model is as follows:
(1) earliest start time priority rule: calculate the earliest start time EST of each Stage, and arrange the Stages in the Stage scheduling queue RSQ in increasing order of EST;
(2) maximum estimated processing time priority rule: calculate the estimated processing time EDT of each Stage, and arrange the Stages in the Stage scheduling queue RSQ in descending order of EDT;
(3) minimum float time priority rule: calculate the float time FL of each Stage (the difference between its latest start time and its earliest start time), and arrange the Stages in the Stage scheduling queue RSQ in descending order of FL;
(4) random rule: as a baseline for comparison with the rules above, a Stage in the Stage scheduling queue RSQ is selected at random as the Stage with the highest priority.
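The four Stage ordering rules can be expressed as sort keys. The sketch below is illustrative only: `StageInfo`, `order_rsq`, and the rule names are hypothetical. For the minimum float time rule, the sketch puts the Stage with the smallest float first, which follows the rule's name and the worked example in which a Stage with float 0 receives the highest priority; the text above words the arrangement as descending, so this reading is an assumption.

```python
import random
from dataclasses import dataclass

@dataclass
class StageInfo:
    name: str
    est: float  # earliest start time (EST)
    lst: float  # latest start time
    edt: float  # estimated processing time (EDT)

def order_rsq(stages, rule, rng=None):
    """Return the ready Stages ordered from highest to lowest priority."""
    if rule == "earliest_start":
        return sorted(stages, key=lambda s: s.est)
    if rule == "max_processing_time":
        return sorted(stages, key=lambda s: s.edt, reverse=True)
    if rule == "min_float":
        # Assumption: smallest float FL = LST - EST gets the highest priority.
        return sorted(stages, key=lambda s: s.lst - s.est)
    if rule == "random":
        shuffled = list(stages)
        (rng or random.Random()).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown rule: {rule}")
```

Keeping each rule as a pure ordering function makes it easy to swap rules when comparing them against the random baseline.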
The task ordering model in step 2 is as follows:
(1) instruction count priority rule: all tasks in the i-th Stage $S_i$ are sorted in non-increasing order of task instruction count;
(2) transmission time priority rule: all tasks in the i-th Stage $S_i$ are sorted in non-increasing order of estimated transmission time;
(3) processing time priority rule: all tasks in the i-th Stage $S_i$ are sorted in non-increasing order of estimated task processing time.
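All three task ordering rules are non-increasing sorts that differ only in the key. A minimal sketch, in which the `Task` fields, the `TASK_RULES` table, and `build_tq` are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    instructions: float          # task instruction count
    est_transfer_time: float     # estimated transmission time
    est_processing_time: float   # estimated processing time

# One sort key per rule; every rule sorts in non-increasing order.
TASK_RULES = {
    "instructions": lambda t: t.instructions,
    "transfer_time": lambda t: t.est_transfer_time,
    "processing_time": lambda t: t.est_processing_time,
}

def build_tq(tasks, rule):
    """Build the task scheduling queue TQ for one Stage under the given rule."""
    return sorted(tasks, key=TASK_RULES[rule], reverse=True)
```

Placing the largest tasks first is a common heuristic for parallel machines, since small tasks are easier to fit in around them later.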
The data affinity of a virtual machine in step 3 is calculated by a formula that appears as an image in the original publication and is not reproduced in this text. The quantities defined for that formula are: the set of all direct predecessor Stages of $S_j$; the amount of data to be transferred between two tasks; an indicator of whether a task is located on a given server; the amount of raw data required by the task that is stored on that server; the amount of data required by the task that is stored on that server; and the total amount of data required by the task. Data is stored on servers, and data transmission between virtual machines on the same server is not considered, so all virtual machines on the same server have the same data affinity. The data affinity of a virtual machine is given by a second formula, which is likewise an image in the original publication.
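Because the affinity formulas are not available in this text, the following Python sketch shows only one plausible reading of the quantities defined above, and it should not be taken as the patented formula. It assumes that a server's affinity for a task is the fraction of the task's required data (raw data plus intermediate data from predecessor tasks) already stored on that server, and that every virtual machine inherits the affinity of its host server. All function and parameter names are hypothetical.

```python
def data_affinity(server, raw_on_server, intermediate, placement):
    """Fraction of a task's required data that already resides on `server`.

    raw_on_server: {server: amount of the task's raw input stored there}
    intermediate:  {predecessor task: amount of data it produces for this task}
    placement:     {predecessor task: server on which it ran}
    """
    local = raw_on_server.get(server, 0.0) + sum(
        amount for pred, amount in intermediate.items() if placement[pred] == server
    )
    total = sum(raw_on_server.values()) + sum(intermediate.values())
    return local / total if total else 0.0

def top_vms(candidates, vm_server, server_affinity, k):
    """Keep the k candidate VMs whose host servers have the highest affinity."""
    ranked = sorted(candidates, key=lambda vm: server_affinity[vm_server[vm]], reverse=True)
    return ranked[:k]
```

For example, if a task needs 30 units of raw data on server A, 10 on server B, 40 units from a predecessor that ran on A, and 20 from one that ran on B, this reading gives affinities of 0.7 for A and 0.3 for B.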
The virtual machine search strategy in step 4 is one of the following:
(1) fastest speed priority strategy: considering the processing speed of the virtual machines, the task is preferentially assigned to the virtual machine with the highest processing speed in the VMList, shortening the task's execution time as much as possible;
(2) earliest available time priority strategy: considering the earliest available time of the virtual machines in the VMList, the task is assigned to the virtual machine that becomes available earliest;
(3) earliest completion time priority strategy: considering the task's start time and execution time, the task is assigned to the virtual machine in the VMList that gives the task the earliest completion time;
(4) random strategy: as a baseline for comparison with the strategies above, a virtual machine is selected at random from the VMList and the task is assigned to it.
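The four search strategies differ only in the quantity they optimize over the VMList. The sketch below is a hypothetical illustration: the `VM` fields, `pick_vm`, and the strategy names are assumptions, and execution time is modeled simply as instruction count divided by VM speed, with no data transfer time.

```python
import random
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    speed: float         # instructions per time unit
    available_at: float  # earliest time the VM is free

def pick_vm(vm_list, task_instructions, ready_time, strategy, rng=None):
    """Select one VM from the VMList for a task under the given strategy."""
    def completion(vm):
        # The task starts once both the VM and the task are ready.
        return max(vm.available_at, ready_time) + task_instructions / vm.speed

    if strategy == "fastest":
        return max(vm_list, key=lambda vm: vm.speed)
    if strategy == "earliest_available":
        return min(vm_list, key=lambda vm: vm.available_at)
    if strategy == "earliest_completion":
        return min(vm_list, key=completion)
    if strategy == "random":
        return (rng or random.Random()).choice(vm_list)
    raise ValueError(f"unknown strategy: {strategy}")
```

The three deterministic strategies can disagree: a fast but busy VM wins under the first, an idle but slow VM under the second, and a VM that balances both under the third.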
The RSA (Ready Stage Addition) algorithm in step 6 is as follows:
(1) the input is the i-th Stage $S_i$ (all of whose tasks have been scheduled) and the sorted ready Stage scheduling queue RSQ; the output is the Stage scheduling queue RSQ after newly ready Stages have been added and the queue has been re-sorted;
(2) for each Stage $S_i'$ in the direct successor set of the i-th Stage $S_i$, $S_i$ is deleted from the direct predecessor set of $S_i'$; if the direct predecessor set of $S_i'$ is then empty, $S_i'$ is inserted into the Stage scheduling queue RSQ;
(3) the elements in the Stage scheduling queue RSQ are re-sorted according to the Stage ordering model.
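The three parts of the RSA algorithm map directly onto a short function. In this sketch the function name, the dictionary-based graph representation, and the `priority` callback (standing in for whichever Stage ordering rule is in use) are illustrative assumptions.

```python
def ready_stage_addition(done, successors, predecessors, rsq, priority):
    """Update the ready queue after Stage `done` has been fully scheduled.

    successors:   {stage: iterable of direct successor stages}
    predecessors: {stage: set of direct predecessors not yet scheduled}
    rsq:          list of ready stages (modified in place and returned)
    priority:     key function; a smaller key means a higher priority
    """
    for nxt in successors.get(done, ()):
        predecessors[nxt].discard(done)   # remove S_i from the predecessor set
        if not predecessors[nxt]:         # all predecessors scheduled: now ready
            rsq.append(nxt)
    rsq.sort(key=priority)                # re-sort by the Stage ordering model
    return rsq
```

For instance, if S1 has successors S2 and S3, and S3 still waits on another Stage, calling the function for S1 adds only S2 to the queue.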
The method for adjusting the task scheduling sequence in step 7 is as follows:
for the tasks of each Stage on the critical path, the completion time of the Stage is determined by its latest-finishing task; an idle time gap between two tasks on a virtual machine is searched for, and the latest-finishing task is moved into that gap, reducing the completion time of the Stage as much as possible, that is, optimizing the Stage's completion time;
a Stage whose completion time has been optimized is marked as true and is not optimized again, even if it is still on the critical path the next time;
after a Stage has been optimized, the start and completion times of its successor Stages change, so the critical path may change and must be recomputed.
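One adjustment move, relocating a Stage's latest-finishing task into an idle gap, might look like the sketch below. It makes several assumptions that the text does not specify: each VM's timeline is a list of `(start, end, task)` tuples, the task's duration on a VM is its instruction count divided by that VM's speed, data transfer time is ignored, and the move is accepted only if it finishes strictly earlier. All names are hypothetical.

```python
def earliest_gap_start(busy, ready, duration):
    """Earliest start >= ready inside a gap between two consecutive busy slots."""
    for (_, prev_end), (next_start, _) in zip(busy, busy[1:]):
        start = max(prev_end, ready)
        if start + duration <= next_start:
            return start
    return None

def improve_stage(schedule, stage_tasks, ready, speed, instructions):
    """Try to move the latest-finishing task of a Stage into an idle gap.

    schedule: {vm: [(start, end, task), ...]}; modified in place.
    Returns True if the task was moved to an earlier finishing slot.
    """
    # Locate the task of this Stage that finishes last.
    vm0, (s0, e0, task) = max(
        ((vm, slot) for vm, slots in schedule.items() for slot in slots
         if slot[2] in stage_tasks),
        key=lambda item: item[1][1],
    )
    best = None
    for vm, slots in schedule.items():
        busy = sorted((s, e) for s, e, t in slots if t != task)
        duration = instructions[task] / speed[vm]
        start = earliest_gap_start(busy, ready, duration)
        if start is not None and start + duration < e0:
            if best is None or start + duration < best[2]:
                best = (vm, start, start + duration)
    if best is None:
        return False
    schedule[vm0].remove((s0, e0, task))
    vm, start, end = best
    schedule[vm].append((start, end, task))
    schedule[vm].sort()
    return True
```

A full implementation would repeat this move per critical-path Stage, mark the Stage as optimized, and recompute the critical path afterward, as the text describes.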
Drawings
FIG. 1 is a block diagram of a Spark workflow;
FIG. 2 is a block diagram of a method of performing an embodiment of the invention;
FIG. 3 is a flow chart of an implementation of a method of an embodiment of the present invention;
FIG. 4 is an initial state diagram of an example RSA algorithm prior to scheduling;
FIG. 5 is a state diagram of an example RSA algorithm after execution of S1;
FIG. 6 is a state diagram of an example RSA algorithm after execution of S3;
FIG. 7 is a diagram of the process by which DAGSParser merges Jobs and Stages.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
As shown in fig. 2, the data center in this embodiment includes two kinds of servers: a master server (Master) and ordinary servers (Server). The master server is mainly responsible for task scheduling, while the ordinary servers execute the tasks; each ordinary server hosts a varying number of heterogeneous virtual machines.
In this embodiment, the Spark task scheduling method considering data affinity in a heterogeneous cluster proceeds as follows. First, Jobs and Stages are merged to obtain a Stage-based DAG. The execution speed of the virtual machines is estimated with a maximum-value method, the execution time and data transmission time of each task are then estimated, and from these the processing time of each Stage is estimated; on this basis the time parameters of each Stage (its earliest start time, earliest completion time, latest start time, and latest completion time) are calculated. Second, a priority queue of ready Stages, the Stage scheduling queue RSQ, is created to hold the ready Stages. Because the start time, execution time, and completion time of a Stage have a crucial influence on Spark task scheduling, four Stage ordering rules (the Stage ordering model) are proposed: the earliest start time priority rule, the maximum estimated processing time priority rule, the minimum float time priority rule, and the random rule. The Stages in the RSQ are sorted according to the Stage ordering rules to obtain a topological order of Stages. During task scheduling, the Stage with the highest priority is taken out of the RSQ each time, and all parallel tasks in that Stage are sorted according to the three task ordering rules (the task ordering model), namely the instruction count priority rule, the transmission time priority rule, and the processing time priority rule, to obtain the task scheduling sequence TQ.
Resources are then allocated following the task scheduling sequence. Taking both data affinity and load balance into account, a list of virtual machines is maintained in which every virtual machine is available and has high data affinity. A virtual machine is selected for the task according to the four designed virtual machine search strategies: the fastest speed priority strategy, the earliest available time priority strategy, the earliest completion time priority strategy, and the random strategy. After all tasks in the current Stage have been scheduled, the RSA algorithm is invoked to add the newly ready Stages to the RSQ, and the time parameters of the remaining unscheduled Stages are recalculated. These steps are repeated until the RSQ is empty, which yields the initial scheduling solution. Finally, the initial scheduling solution is improved by adjusting the task scheduling sequence to obtain the final scheduling solution.
As shown in fig. 3, the specific implementation steps of the Spark task scheduling method in the heterogeneous cluster environment according to the embodiment of the present invention are as follows:
Step s201, the Spark application G is composed of h Jobs with partial-order constraints. Each Job $J_i$ (the i-th Job in the application) contains $h_i$ Stages with partial-order relations among them, and $S_{i,l}$ denotes the l-th Stage of $J_i$. DAGSParser merges the Jobs and Stages, converting the Spark application into a DAG of Stages with partial-order relations; $S_{i,l}$ is converted into $S_j$ by a mapping formula that appears as an image in the original publication and is not reproduced in this text. Fig. 7 shows how DAGSParser merges Jobs and Stages for a simple Spark application G containing h Jobs. $J_1$ denotes the first Job, which contains 3 Stages. $J_1$ and $J_2$ are related by a partial order, specifically between $S_{1,3}$ and $S_{2,1}$. DAGSParser simplifies this by dropping the Job-level partial order and expressing it directly at the Stage level: under the mapping formula, the original $S_{1,3}$ becomes $S_3$ and the original $S_{2,1}$ becomes $S_4$.
Step s202, calculating the time parameters of all Stages in G, namely the earliest start time, earliest completion time, latest start time, and latest completion time.
Step s203, establishing the Stage scheduling queue RSQ according to the Stage ordering model, and adding the ready Stages $S_i$ to the RSQ in priority order.
Step s204, judging whether the RSQ is empty; if it is empty, step s210 is executed, otherwise step s205 is executed.
Step s205, taking the Stage with the highest priority out of the RSQ, adding all of its tasks to a queue TQ, sorting the tasks in the TQ according to the task ordering model, and maintaining an earliest-available virtual machine list (List) for each task in the TQ.
Step s206, taking one task out of the TQ, obtaining the virtual machine List for that task, calculating the data affinity of every virtual machine in the List, and adding as many of the virtual machines with higher data affinity to the VMList as required.
Step s207, searching the VMList for a virtual machine for the current task according to the virtual machine search strategy, and assigning the task to that virtual machine for execution.
Step s208, judging whether the TQ is empty; if it is empty, step s209 is executed, otherwise the method returns to step s206.
Step s209, invoking the RSA algorithm to add the newly ready Stages and obtain a newly sorted RSQ, then returning to step s204.
Step s210, the preceding steps yield an initial scheduling solution; the task scheduling sequence of the Stages on the critical path is then adjusted according to the Stage ordering model to further reduce the completion time of the Spark application. Here, once the current Stage has completed scheduling, any Stage whose in-degree is 0 is a ready Stage.
Step s211, obtaining the maximum completion time of the entire Spark application.
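Returning to the Job and Stage merge of step s201: the mapping formula itself did not survive in this text, but the Fig. 7 example ($S_{1,3}$ becomes $S_3$ and $S_{2,1}$ becomes $S_4$, with $J_1$ holding 3 Stages) is consistent with offsetting the local index l by the Stage counts of all earlier Jobs. The sketch below encodes that inferred reading, which is an assumption and not the published formula; `flatten_index` and `stage_counts` are hypothetical names.

```python
def flatten_index(stage_counts, i, l):
    """Map Stage l of Job i (both 1-based) to a global Stage index j.

    stage_counts[k] is the number of Stages h_(k+1) in Job k+1.
    Assumed mapping, inferred from the Fig. 7 example: j = h_1 + ... + h_(i-1) + l.
    """
    if not 1 <= i <= len(stage_counts) or not 1 <= l <= stage_counts[i - 1]:
        raise ValueError("Job or Stage index out of range")
    return sum(stage_counts[: i - 1]) + l
```

With `stage_counts = [3, 2]` (the second count is made up for illustration), `flatten_index(stage_counts, 1, 3)` gives 3 and `flatten_index(stage_counts, 2, 1)` gives 4, matching the example.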
The RSA algorithm is specifically:
(1) the input is the i-th Stage $S_i$ (all of whose tasks have been scheduled) and the sorted ready Stage scheduling queue RSQ; the output is the Stage scheduling queue RSQ after newly ready Stages have been added and the queue has been re-sorted;
(2) for each Stage $S_i'$ in the direct successor set of the i-th Stage $S_i$, $S_i$ is deleted from the direct predecessor set of $S_i'$; if the direct predecessor set of $S_i'$ is then empty, $S_i'$ is inserted into the Stage scheduling queue RSQ;
(3) the elements in the Stage scheduling queue RSQ are re-sorted according to the Stage ordering model.
As shown in figs. 4-6, each vertex of the directed graph represents a Stage, and the 5-tuple above each vertex gives the Stage's currently estimated earliest start time, earliest completion time, latest start time, latest completion time, and float time. The float time is obtained by subtracting the earliest start time from the latest start time, and a float time of 0 means the Stage is on the critical path. Fig. 4 shows the initial state before scheduling: S1, S3, S5, S6, and S7 are on the critical path (their float times are all 0), the float times of S2 and S4 are both 1, and S1 is ready (all of its direct predecessors have been scheduled) and has been added to the ready Stage priority queue RSQ. S1 is scheduled onto a resource with a sufficiently high processing rate, so it completes earlier than expected (it was expected to complete in two time units and actually completes in one). After S1 completes, the earliest start time and earliest completion time of its direct successors S2 and S3 are updated, and then the earliest start time and earliest completion time of all Stages are updated from front to back in topological order. S7 is the last Stage, so its latest start time equals its earliest start time and its latest completion time equals its earliest completion time; starting from S7, the latest start time and latest completion time of all unscheduled Stages are computed backward, and finally the float time of each Stage is calculated. The result is shown in fig. 5. At this point S2 and S3 become ready and are added to the ready Stage queue RSQ; under the minimum float time priority rule, S3 has a float time of 0 and therefore has the highest priority among all ready Stages.
S3 is then scheduled. No sufficiently fast resource is available, so S3 completes later than expected (it was expected to complete within 1 time unit and actually takes 2). The earliest start time, earliest completion time, latest start time, latest completion time, and float time are updated by the method above, and the result is shown in fig. 6.
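The forward and backward update of time parameters described in this example is the standard critical-path calculation. The sketch below illustrates it on a made-up four-Stage graph; the durations are hypothetical, since the figures are not available here, and pinning the latest completion time of every sink Stage to the overall completion time is an assumption that matches the single final Stage S7 in the example.

```python
from graphlib import TopologicalSorter

def time_parameters(duration, preds):
    """Return {stage: (EST, EFT, LST, LFT, float)} for a Stage DAG.

    duration: {stage: estimated processing time}
    preds:    {stage: list of direct predecessor stages}
    """
    order = list(TopologicalSorter(preds).static_order())
    est, eft = {}, {}
    for s in order:                          # forward pass in topological order
        est[s] = max((eft[p] for p in preds.get(s, ())), default=0.0)
        eft[s] = est[s] + duration[s]
    makespan = max(eft.values())
    succs = {s: [] for s in order}
    for s, ps in preds.items():
        for p in ps:
            succs[p].append(s)
    lst, lft = {}, {}
    for s in reversed(order):                # backward pass from the last Stage
        lft[s] = min((lst[n] for n in succs[s]), default=makespan)
        lst[s] = lft[s] - duration[s]
    # Float = latest start - earliest start; 0 means on the critical path.
    return {s: (est[s], eft[s], lst[s], lft[s], lst[s] - est[s]) for s in order}
```

Calling this again after each Stage is scheduled, with the actual duration substituted for the estimate, reproduces the kind of update shown between figs. 4, 5, and 6.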

Claims (5)

1. A Spark task scheduling method considering data affinity under a heterogeneous cluster is characterized by comprising the following steps:
step 1, calculating the time parameters of all Stages in the Spark application according to the partial-order properties of the Spark workflow, and generating a Stage scheduling queue RSQ according to a Stage ordering model;
step 2, taking out stages from the Stage scheduling queue RSQ in sequence, and generating a task scheduling queue TQ for parallel tasks in the stages according to a task sequencing model;
step 3, taking out the tasks from the task scheduling queue TQ in sequence, establishing an earliest available virtual machine List for each task, then calculating the data affinity of all the virtual machines in the earliest available virtual machine List, and adding a plurality of virtual machines with the highest data affinity into the VMList according to the requirement;
step 4, searching the virtual machines from the VMList table according to the virtual machine search strategy for the tasks in the task scheduling queue TQ, and distributing the tasks to the searched virtual machines;
step 5, repeating the step 3 and the step 4 until the task scheduling queue TQ is empty;
step 6, invoking an RSA algorithm, updating elements in the Stage scheduling queue RSQ, turning to step 2 until the Stage scheduling queue RSQ is empty, and obtaining an initial scheduling solution;
step 7, adjusting a task scheduling sequence to obtain a final scheduling result;
the data affinity of the virtual machine in the step 3 is calculated by a formula that appears as an image in the original publication and is not reproduced in this text, wherein the quantities defined for that formula are: the set of all direct predecessor Stages of $S_j$; the amount of data to be transferred between two tasks; an indicator of whether a task is located on a given server; the amount of raw data required by the task that is stored on that server; the amount of data required by the task that is stored on that server; and the total amount of data required by the task; data is stored on servers, and data transmission between virtual machines on the same server is not considered, so that all virtual machines on the same server have the same data affinity, and the data affinity of a virtual machine is given by a second formula, which is likewise an image in the original publication;
in the step 1, the Stage sorting model is executed according to the following rules:
earliest start time priority rule: calculating the earliest start time EST for each Stage, and arranging stages in the Stage scheduling queue RSQ in an increasing order according to the calculated earliest start time EST;
maximum estimated processing time precedence rule: calculating estimated processing time EDT for each Stage, and sequencing stages in the Stage scheduling queue RSQ in descending order according to the calculated estimated processing time EDT;
minimum float time priority rule: calculating the difference FL between the latest starting time and the earliest starting time for each Stage, and arranging the stages in the Stage scheduling queue RSQ in a descending order according to the calculated difference FL between the latest starting time and the earliest starting time;
random rule: for comparison with the above rules, randomly selecting a Stage in the Stage scheduling queue RSQ as the Stage with the highest priority;
the virtual machine search strategy in step 4 is as follows:
fastest speed priority strategy: according to the processing speeds of the virtual machines, preferentially allocating the task to the virtual machine with the highest processing speed in the VMList table for execution;
earliest available time priority strategy: according to the earliest available times of the virtual machines in the VMList table, allocating the task to the virtual machine that becomes available earliest for execution;
earliest completion time priority strategy: according to the start time and the execution time of the task, allocating the task to the virtual machine in the VMList table that yields the earliest task completion time for execution;
random strategy: serving as a baseline for comparison with the above virtual machine search strategies, randomly selecting a virtual machine from the VMList table and allocating the task to that virtual machine for execution;
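The four virtual machine search strategies can be sketched as follows. This is an illustrative sketch only: `VM`, `finish_time`, `select_vm` and the strategy labels are hypothetical names, the task length is assumed to be an instruction count, and data transmission time is deliberately left out even though the method as a whole takes it into account.

```python
import random
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    speed: float         # processing speed (instructions per time unit)
    available_at: float  # earliest time at which the VM is free


def finish_time(vm, task_length, ready_time):
    # The task starts once both the VM and the task are ready.
    start = max(vm.available_at, ready_time)
    return start + task_length / vm.speed


def select_vm(vm_list, strategy, task_length=0.0, ready_time=0.0, rng=None):
    """Pick one VM from vm_list according to the named search strategy."""
    if strategy == "FASTEST":
        return max(vm_list, key=lambda v: v.speed)
    if strategy == "EARLIEST_AVAILABLE":
        return min(vm_list, key=lambda v: v.available_at)
    if strategy == "EARLIEST_FINISH":
        return min(vm_list, key=lambda v: finish_time(v, task_length, ready_time))
    if strategy == "RANDOM":  # baseline strategy
        return (rng or random).choice(vm_list)
    raise ValueError(f"unknown strategy: {strategy}")


vm_list = [VM("v1", 2.0, 10.0), VM("v2", 1.0, 0.0), VM("v3", 4.0, 20.0)]
print(select_vm(vm_list, "EARLIEST_FINISH", task_length=100.0).name)
```

On ties, `min` and `max` return the first matching VM in list order, so the order of the VMList table acts as an implicit tie-breaker in this sketch.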
the RSA algorithm in step 6 is as follows:
the input of the RSA algorithm is the i-th Stage S_i and the Stage scheduling queue RSQ that holds the ordered ready Stages; the output of the RSA algorithm is the Stage scheduling queue RSQ, reordered after the newly ready Stages have been added; wherein all tasks in S_i have been scheduled;
for each Stage S_i' in the direct successor set of the i-th Stage S_i, deleting S_i from the direct predecessor set of S_i', and then determining whether the direct predecessor set of S_i' is an empty set; if so, inserting S_i' into the Stage scheduling queue RSQ;
and reordering the elements in the Stage scheduling queue RSQ according to the Stage ordering model.
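The RSA steps above can be sketched in a few lines of Python. The function name `rsa`, the dictionary-based successor and predecessor sets, and the `sort_key` parameter (standing in for the Stage sorting model) are assumptions made for illustration; removing the finished Stage from RSQ is assumed to happen elsewhere.

```python
def rsa(finished, rsq, successors, predecessors, sort_key):
    """Update RSQ after all tasks of Stage `finished` have been scheduled.

    successors:   dict mapping a Stage to its direct successor Stages
    predecessors: dict mapping a Stage to the set of its direct predecessors
                  (mutated in place as predecessors complete)
    sort_key:     key function standing in for the Stage sorting model
    """
    for succ in successors.get(finished, ()):
        # Delete the finished Stage from the successor's predecessor set.
        predecessors[succ].discard(finished)
        # A Stage with no remaining predecessors becomes ready.
        if not predecessors[succ]:
            rsq.append(succ)
    # Reorder the queue according to the Stage sorting model.
    rsq.sort(key=sort_key)
    return rsq


successors = {"S1": ["S2", "S3"], "S2": ["S4"], "S3": ["S4"]}
predecessors = {"S2": {"S1"}, "S3": {"S1"}, "S4": {"S2", "S3"}}
est = {"S1": 0, "S2": 5, "S3": 3, "S4": 9}
print(rsa("S1", [], successors, predecessors, est.get))
```

In this example the earliest start time is used as the sort key, so the queue after `S1` finishes lists `S3` before `S2`.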
2. The Spark task scheduling method considering data affinity under the heterogeneous cluster according to claim 1, wherein the partial ordering property of Spark workflow in step 1 includes a partial ordering relationship at a Job level and a partial ordering relationship at a Stage level; and merging the partial order relationship of the Job level and the partial order relationship of the Stage level, so that the Spark workflow is represented as a directed acyclic graph about the Stage.
3. The Spark task scheduling method considering data affinity under the heterogeneous cluster according to claim 1, wherein the Spark workflow is a directed acyclic graph composed of n Stages, denoted as G = {S_1, S_2, ..., S_n}, and each Stage contains multiple tasks that can be executed in parallel.
4. The Spark task scheduling method considering data affinity under the heterogeneous cluster according to claim 1, wherein the task ordering model in step 2 is executed according to the following rules:
(1) instruction number priority rule: all tasks in the i-th Stage S_i are sorted in non-increasing order of task instruction count;
(2) transmission time priority rule: all tasks in the i-th Stage S_i are sorted in non-increasing order of estimated task data transmission time;
(3) processing time priority rule: all tasks in the i-th Stage S_i are sorted in non-increasing order of estimated task processing time.
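The three task ordering rules differ only in the quantity used as the sort key, which the following sketch makes explicit. `Task`, `TASK_RULES` and `order_tasks` are hypothetical names, and the estimated transmission and processing times are assumed to be supplied by the scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    instructions: float     # task instruction count
    transfer_time: float    # estimated data transmission time
    processing_time: float  # estimated processing time


# One key function per rule; every rule sorts in non-increasing order.
TASK_RULES = {
    "INSTRUCTIONS": lambda t: t.instructions,
    "TRANSFER": lambda t: t.transfer_time,
    "PROCESSING": lambda t: t.processing_time,
}


def order_tasks(tasks, rule):
    """Return the tasks of one Stage sorted by the chosen rule, largest first."""
    return sorted(tasks, key=TASK_RULES[rule], reverse=True)


tasks = [
    Task("t1", 100, 2.0, 5.0),
    Task("t2", 300, 1.0, 9.0),
    Task("t3", 200, 3.0, 4.0),
]
print([t.name for t in order_tasks(tasks, "INSTRUCTIONS")])
```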
5. The Spark task scheduling method considering data affinity under the heterogeneous cluster according to claim 1, wherein the method for adjusting the task scheduling sequence in step 7 is:
for all tasks in a Stage on the critical path, searching for a time gap between two tasks on a virtual machine, and migrating some of the tasks into the time gap for execution;
marking the optimized Stage as true, so that the Stage is not optimized again if it is still on the critical path the next time;
and obtaining the critical path again after the previous Stage has been optimized.
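The gap search at the heart of this adjustment can be sketched as below. This covers only the search for idle gaps on one virtual machine and a first-fit placement test; the critical path computation and the choice of which tasks to migrate are not shown. `find_gaps` and `earliest_fit` are hypothetical names, and the busy intervals on a virtual machine are assumed not to overlap.

```python
def find_gaps(busy):
    """Return the idle (start, end) intervals between consecutive tasks.

    busy: list of (start, end) intervals already scheduled on one VM,
          assumed to be non-overlapping.
    """
    ordered = sorted(busy)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


def earliest_fit(gaps, ready_time, duration):
    """Return the earliest start time inside a gap at which a task of the
    given duration fits without starting before ready_time, or None."""
    for gap_start, gap_end in gaps:
        start = max(gap_start, ready_time)
        if start + duration <= gap_end:
            return start
    return None


busy = [(0, 4), (10, 12), (6, 7)]
gaps = find_gaps(busy)
print(gaps, earliest_fit(gaps, ready_time=0, duration=3))
```

A task would be migrated only when `earliest_fit` returns a start time earlier than its current one; that comparison is left to the caller.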
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683860.5A CN111736959B (en) 2020-07-16 2020-07-16 Spark task scheduling method considering data affinity under heterogeneous cluster

Publications (2)

Publication Number Publication Date
CN111736959A (en) 2020-10-02
CN111736959B (en) 2020-11-27

Family

ID=72654738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010683860.5A Active CN111736959B (en) 2020-07-16 2020-07-16 Spark task scheduling method considering data affinity under heterogeneous cluster

Country Status (1)

Country Link
CN (1) CN111736959B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015539B (en) * 2020-10-29 2021-02-02 北京世纪好未来教育科技有限公司 Task allocation method, device and computer storage medium
CN116430738B (en) * 2023-06-14 2023-08-15 北京理工大学 Self-adaptive dynamic scheduling method of hybrid key system

Citations (4)

Publication number Priority date Publication date Assignee Title
US20180089272A1 (en) * 2016-09-26 2018-03-29 Splunk Inc. Techniques for generating structured metrics from ingested events
US20180300174A1 (en) * 2017-04-17 2018-10-18 Microsoft Technology Licensing, Llc Efficient queue management for cluster scheduling
CN109857526A (en) * 2018-12-27 2019-06-07 曙光信息产业(北京)有限公司 A kind of scheduling system towards mixing computation frame
CN111209104A (en) * 2020-04-21 2020-05-29 南京南软科技有限公司 Energy perception scheduling method for Spark application under heterogeneous cluster

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108804211A (en) * 2018-04-27 2018-11-13 西安华为技术有限公司 Thread scheduling method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111736959A (en) 2020-10-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant