CN114077486A - MapReduce task scheduling method and system - Google Patents


Info

Publication number
CN114077486A
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN202111386374.8A
Other languages
Chinese (zh)
Other versions
CN114077486B (en)
Inventor
高永强
张凯丰
Current Assignee
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University filed Critical Inner Mongolia University
Priority to CN202111386374.8A
Publication of CN114077486A
Application granted
Publication of CN114077486B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/5038 Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5044 Allocation of resources to service a request, considering hardware capabilities


Abstract

The invention provides a MapReduce task scheduling method and system. By introducing a preemption mechanism based on Docker containers, it remedies the drawback of Yarn's existing Kill-based preemption mechanism, which simply kills running tasks. The Docker-based mechanism releases the resources a task occupies while preserving its progress. Combined with a service-level-agreement-aware task policy, it allows high-priority tasks to preempt the running resources of other tasks and ensures that job completion times meet the Service Level Agreement (SLA) target.

Description

MapReduce task scheduling method and system
Technical Field
The invention relates to the technical field of task scheduling in a heterogeneous cluster environment, in particular to a MapReduce task scheduling method and system.
Background
With the development of Internet technology, the scale of data that must be computed and processed in daily production and life keeps growing, and distributed computing systems are widely used to process large-scale data. Within such systems the scheduler is a vital component: a well-designed scheduling strategy can match program demands to available cluster resources effectively and reduce the operating cost of a data center. The most widely used distributed computing framework today is Apache's flagship project Hadoop, whose programming model is MapReduce. Hadoop extracted its resource-management part into an independent framework, Yarn, a general-purpose resource management platform that provides the resources needed to run computing programs such as MapReduce.
Yarn currently implements three schedulers based on different scheduling strategies: FIFO, Capacity, and Fair. Although these three strategies can improve cluster utilization and optimize cluster performance to a certain extent, scheduling jobs with different resource requirements and QoS constraints in a complex heterogeneous cluster environment remains an urgent open problem. By completion time, jobs can be divided into short jobs and long jobs. Short jobs generally require low latency, while long jobs can tolerate higher latency but carry quality-of-service requirements. Short jobs therefore need to be scheduled immediately after submission to avoid queuing delay; for long jobs, the scheduler should let them use cluster resources whenever the cluster has free resources, which improves the cluster's resource utilization.
In real working environments, long and short jobs are usually mixed together for scheduling, and existing solutions either forcibly terminate a running long job to guarantee low latency for short jobs, or forbid resource preemption entirely to raise cluster resource utilization. Such simple scheduling strategies cannot handle jobs with different resource requirements in a complex heterogeneous environment. The goal is to trade off resource utilization against job queuing delay: to improve hardware resource utilization and efficiency while reducing job queuing delay as much as possible, thereby meeting the service level agreement.
To solve this problem, a brand-new scheduling strategy urgently needs to be developed to meet the needs of actual work.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a MapReduce task scheduling method and system for heterogeneous Yarn cluster environments, which maintain high cluster resource utilization while also providing low latency and immediate response for jobs.
The MapReduce task scheduling method provided by the invention comprises the following steps:
S1: the client creates a JobSubmitter instance, which computes the job's input splits with its internal methods, copies the resources needed to run the job into a distributed file system, and submits the MapReduce job to the resource scheduler;
S2: after receiving the job-submission message, the resource scheduler forwards the request to the central resource scheduler, which analyzes the job's details with an internal job-analysis method and derives the latest deadline that still meets the service level agreement;
S3: the central resource scheduler adds the new tasks to a central task queue and reorders all tasks by deadline, from nearest to farthest;
S4: the central resource scheduler receives heartbeat information from the node resource schedulers, obtains the number of tasks assigned to each of them, selects the node with the fewest tasks, and dispatches to it the task whose deadline is currently nearest for execution;
S5: after receiving the new task, the node resource scheduler adds it to a local task queue and reorders that queue by deadline;
S6: the node resource scheduler checks the new task's position in the queue; if the new task's deadline is nearer than that of the task currently executing, the new task preempts it.
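Steps S3 to S6 combine earliest-deadline-first ordering at the central scheduler with least-loaded node selection. The sketch below illustrates that core logic; the class and function names (`CentralScheduler`, `submit`, `dispatch`) are illustrative and not taken from the patent:

```python
import heapq

class CentralScheduler:
    """Keeps a central task queue ordered by deadline (S3) and
    dispatches to the node with the fewest assigned tasks (S4)."""

    def __init__(self):
        self.queue = []  # min-heap of (deadline, task_id)

    def submit(self, task_id, deadline):
        # S3: the heap keeps tasks ordered from nearest to farthest deadline
        heapq.heappush(self.queue, (deadline, task_id))

    def dispatch(self, node_task_counts):
        # S4: using heartbeat task counts, pick the least-loaded node and
        # hand it the task at the head of the deadline queue
        if not self.queue:
            return None
        node = min(node_task_counts, key=node_task_counts.get)
        deadline, task_id = heapq.heappop(self.queue)
        node_task_counts[node] += 1
        return node, task_id

sched = CentralScheduler()
sched.submit("t1", deadline=300)
sched.submit("t2", deadline=100)   # nearer deadline, dispatched first
nodes = {"n1": 4, "n2": 1}
print(sched.dispatch(nodes))       # ('n2', 't2')
```

The node-local reordering of S5/S6 would apply the same deadline ordering to each node's own queue.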
Further, in step S3, the central resource scheduler obtains the total amount of CPU resources C and the total amount of memory resources M, derives the job share of long jobs from the number of jobs, and periodically calculates the resource share of each job in the central task queue according to the fairness principle. (The two formulas appear in the source only as image placeholders.)
Further, in step S4, after receiving a job's resource request, the central resource scheduler analyzes whether the job can finish before its deadline, combining the deadline constraint, the cluster's resource situation, and the job's resource requirements. If it determines that the job can finish before its deadline, it adds the job to the central task queue; otherwise it refuses to execute the job.
Further, in step S4, when a job arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources the job requests from the history logs of previous runs; if the job has never run on the cluster, the scheduler executes it on a small part of the original data set as a pre-test set.
Further, in step S4, if the amount of resources the job requests does not exceed the amount available in the cluster, the central resource scheduler adds the job to the central task queue;
otherwise two cases must be distinguished: in one case, if the job, by directly preempting the resources of currently running jobs and executing immediately, can still finish in time, it is executed;
in the other case, the job cannot meet its deadline even after preempting other jobs' resources, and the central resource scheduler directly refuses to execute it.
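The admission decision described in the preceding paragraphs reduces to a three-way choice: enqueue, enqueue with preemption, or reject. The sketch below is an illustrative reading; the names and the simplified `can_meet_deadline` test are placeholders for the patent's deadline analysis, not its actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Job:
    requested: float     # total resources the job asks for
    deadline: float      # SLA deadline (seconds from now)
    est_runtime: float   # estimated runtime if started immediately

def can_meet_deadline(job: Job) -> bool:
    # Simplified stand-in for the scheduler's deadline analysis:
    # the job meets its SLA if running it right now finishes in time.
    return job.est_runtime <= job.deadline

def admit(job: Job, available: float) -> str:
    if job.requested <= available:
        return "enqueue"                  # fits in currently free resources
    if can_meet_deadline(job):
        return "enqueue-with-preemption"  # preempt running jobs, still on time
    return "reject"                       # even preemption cannot meet the SLA

print(admit(Job(requested=4,  deadline=60, est_runtime=50), available=8))  # enqueue
print(admit(Job(requested=16, deadline=60, est_runtime=50), available=8))  # enqueue-with-preemption
print(admit(Job(requested=16, deadline=40, est_runtime=50), available=8))  # reject
```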
Further, in step S4, the amount of resources to be preempted under the service level agreement is determined as follows. Once W map tasks have finished executing, the reduce tasks begin, and T_up denotes the upper bound on the execution time of those W map tasks, where M_avg is the average execution time of the map tasks in job j, N_j^map is the number of map tasks in job j, and M_max is the maximum execution time of a map task in job j. Suppose Q jobs can finish before the time bound T_up; the amount of resources released after these jobs complete is R, where j denotes a job and N_j^reduce denotes the number of reduce tasks of job j. The amount of resources required in the reduce phase is E, where C_r denotes the amount of resources available in the cluster at the current time and R_j^map denotes the amount of resources required by the map tasks of job j. (The formulas for T_up, R, and E appear in the source only as image placeholders.)
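Because the formulas for T_up, R, and E survive only as image placeholders, the exact expressions cannot be recovered from this text. The sketch below is therefore only one plausible reading of the surrounding prose: jobs expected to finish before the bound contribute their held resources to R, and whatever the reduce phase needs beyond free plus soon-released resources must be preempted. All names and the shortfall formula are assumptions, not the patent's equations:

```python
def resources_released_before(t_up, running_jobs):
    """R: resources freed by jobs expected to finish before the bound t_up.

    running_jobs: list of (expected_finish_time, held_resources) pairs.
    """
    return sum(held for finish, held in running_jobs if finish <= t_up)

def preemption_needed(reduce_demand, available, released):
    """Resources to preempt so the reduce phase can start on time: the part
    of the demand not covered by free plus soon-released resources."""
    return max(0.0, reduce_demand - (available + released))

running = [(30.0, 4.0), (80.0, 6.0), (50.0, 2.0)]
r = resources_released_before(60.0, running)   # jobs finishing by 60s free 4 + 2 = 6.0
print(preemption_needed(reduce_demand=12.0, available=3.0, released=r))  # 3.0
```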
Further, in step S6, when job preemption is required, the scheduler first calculates the extra resource share occupied by the job k to be preempted, Δ_k = U_k − F_k, where U_k denotes the resources job k actually occupies during execution and F_k denotes the amount of resources job k should receive under the fair resource-allocation principle; it then obtains the resource share R_j requested by the job j that must be executed. If R_j ≤ Δ_k, the resources to be preempted are simply R_j. (The symbols here stand in for formulas that appear in the source only as image placeholders.)
Further, in step S6, if the requested share R_j exceeds the extra share Δ_k that the job k to be preempted additionally occupies, the resources to be preempted are calculated by an algorithm: the CPU and memory resources are compared and classified into a main resource and a secondary resource, and the amount of the secondary resource to reclaim is derived from the reclamation of the main resource. Let C_j, M_j denote the amounts of CPU and memory resources requested by job j, and C_a, M_a the amounts of CPU and memory resources actually additionally occupied in the cluster by the current job k. If the CPU resource requested by job j is the main resource, all the CPU additionally occupied by job k is preempted and the memory additionally occupied by job k is preempted in proportion; otherwise memory is the main resource requested by job j, so all the memory additionally occupied by job k is preempted and the CPU additionally occupied by job k is preempted in proportion. (The dominance test and the proportional-reclamation formulas appear in the source only as image placeholders.)
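The main/secondary resource rule described above can be sketched as follows. Since the source shows the formulas only as image placeholders, the dominance test and the proportional reclamation below are one plausible interpretation; the symbols c_j, m_j, c_a, m_a mirror the text's Cj, Mj, Ca, Ma, and everything else is an assumption:

```python
def preempt_split(c_j, m_j, c_a, m_a):
    """How much CPU and memory to preempt from job k on behalf of job j.

    c_j, m_j: CPU / memory requested by job j (Cj, Mj in the patent).
    c_a, m_a: CPU / memory additionally occupied by job k (Ca, Ma).
    The dominance test and the proportionality rule are assumptions,
    since the patent's formulas are not reproduced in this text.
    """
    if c_j / c_a >= m_j / m_a:
        # CPU is job j's main resource: reclaim all the extra CPU, and
        # memory in proportion to job j's memory-per-CPU demand ratio.
        return c_a, min(m_a, c_a * m_j / c_j)
    # Memory is the main resource: reclaim all extra memory, CPU proportionally.
    return min(c_a, m_a * c_j / m_j), m_a

print(preempt_split(c_j=8, m_j=4, c_a=4, m_a=16))  # CPU dominant: (4, 2.0)
```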
Further, the MapReduce task scheduling method follows a scheduling policy based on the service level agreement, which comprises the following steps:
when a job j arrives, its deadline, required throughput, and required amount of resources are analyzed;
the central resource scheduler analyzes whether the cluster's current amount of resources meets job j's resource demand; if so, job j is added to the central task queue;
if not, it judges whether the cluster's resources can meet the resource demand of job j's map tasks, and whether the resources released after the map tasks finish can meet the resource demand of the reduce tasks;
if both conditions are met, job j is added to the central task queue and marked high priority, so that during execution it may preempt the resources of other jobs; if the two conditions cannot both be met, the central resource scheduler refuses to execute job j;
the central resource scheduler sorts the jobs in the central task queue by deadline and traverses them one by one. For a job j in the queue, it checks whether all of j's map tasks have finished; if not, it checks j's priority: if j is a high-priority job, the central resource scheduler immediately communicates with the node resource schedulers to preempt the designated resources from the cluster and execute j's map tasks, otherwise it waits for the cluster to free idle resources and allocates them to j's map tasks;
the node resource schedulers report task execution states to the central resource scheduler via heartbeat information. Once the map tasks have finished, the central resource scheduler checks whether the number of completed map tasks exceeds the threshold W; if so, it starts executing job j's reduce tasks and again checks j's priority: if j executes with priority, the node resource scheduler completes the job preemption, otherwise the job waits for idle resources to be allocated.
The invention also provides a MapReduce task scheduling system that applies the above MapReduce task scheduling method, comprising a distributed data-center cluster that contains a central resource scheduler and a plurality of node resource schedulers;
the central resource scheduler maintains a central task queue; when a new job arrives, it analyzes the job's characteristics to obtain its running time and deadline;
each node resource scheduler maintains a running-task queue and a paused-task queue, both sorted by deadline, and continuously reports the task information and resource usage on its node to the central resource scheduler through a heartbeat mechanism.
In the MapReduce task scheduling method for heterogeneous Yarn cluster environments provided by the invention, the central resource scheduler runs as a daemon on the ResourceManager. It receives task information from the node resource schedulers; periodically checks the current task scheduling policy, the resource availability of each worker node, and the resource demands of newly arrived tasks; infers which queues occupy surplus resources and which are under-allocated; calculates the amount of resources to preempt; derives the optimal resource allocation for the task queues in each time period; and sends the scheduling decisions to the node resource schedulers for execution.
The scheduling method introduces a Docker-container-based preemption mechanism that overcomes the drawback of Yarn's existing Kill-based mechanism, which simply kills tasks. The Docker-based mechanism releases the resources a task occupies while preserving its progress; combined with a service-level-agreement-aware task policy, it lets high-priority tasks preempt the running resources of other tasks and ensures that job completion times meet the Service Level Agreement (SLA) target.
Drawings
The invention will be described in more detail hereinafter on the basis of embodiments and with reference to the accompanying drawings. Wherein:
FIG. 1 is a schematic flow chart of a MapReduce task scheduling method in the present invention;
FIG. 2 is a system architecture diagram of a MapReduce task scheduling method in the present invention;
FIG. 3 is a deployment diagram of an example of the MapReduce task scheduling method in the present invention.
Detailed Description
In order to clearly illustrate the inventive content of the present invention, the present invention will be described below with reference to examples.
In the description of the present invention, it should be noted that the terms "upper", "lower", "horizontal", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Referring to FIGS. 1-3, the central resource scheduler in the present invention is a daemon running on the ResourceManager. It receives task information from the node resource schedulers, periodically checks the current task scheduling policy and the resource availability of each worker node, and acquires the resource demands of newly arrived tasks, so as to infer which queues occupy surplus resources and which are under-allocated; it then calculates the amount of resources to preempt, derives the optimal resource allocation for the task queues in each time period, and sends the scheduling decisions to the node resource schedulers for execution.
The node resource scheduler is a daemon running on a worker NodeManager. It integrates Docker containers with the Yarn framework, solving the problem that native Yarn preemption simply kills a task's container. After receiving a task request, the node resource scheduler loads the task into a Docker container and configures the container according to the task's resource request. The node resource scheduler is also responsible for container suspension and container resumption, pausing or resuming containers and reclaiming their resources as needed.
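The key difference from Yarn's Kill-based preemption is that freezing a container (for example with the real Docker CLI commands `docker pause` / `docker unpause`, which use the cgroup freezer) releases CPU while keeping the task's state. The toy model below only illustrates this effect on task progress; it is not the patent's implementation:

```python
class Task:
    """A running task with some completed progress."""
    def __init__(self):
        self.progress = 0.6      # fraction of work already done
        self.state = "running"

def kill_preempt(task):
    # Yarn's Kill-based preemption: resources are freed, progress is lost.
    task.state = "killed"
    task.progress = 0.0

def pause_preempt(task):
    # Container-based preemption (cf. `docker pause`, which freezes the
    # container's processes): resources are released, progress is kept.
    task.state = "paused"

def resume(task):
    # cf. `docker unpause`: the task continues where it stopped.
    if task.state == "paused":
        task.state = "running"

a, b = Task(), Task()
kill_preempt(a)                 # restarting a would redo all of its work
pause_preempt(b); resume(b)     # b continues from 60% completion
print(a.progress, b.progress)   # 0.0 0.6
```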
In actual job scheduling, which job to preempt must be decided before the preemption operation; the invention designs a preemptive job-scheduling strategy that preserves QoS to the greatest extent while satisfying the SLA service level agreement. The idea of the strategy is to execute the job with the earliest deadline first, which minimizes the number of jobs missing their deadlines and greatly improves execution outcomes. Specifically, after receiving a job's resource request, the central resource scheduler analyzes whether the job can finish before its deadline, combining the deadline constraint, the cluster's resource status, and the job's resource requirements. If the central resource scheduler determines that a job can finish before its deadline, it adds the job to the job queue; otherwise it refuses to execute the job.
When a job j arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources job j requests from the history logs of previous runs; if job j has never run on the cluster, the scheduler executes it on a small part of the original data set as a pre-test set, so as to obtain the job's performance indicators.
Let R_j^total denote the total amount of resources required by job j, R_j^map the amount required by job j's map tasks, and R_j^reduce the amount required by job j's reduce tasks; C_r denotes the amount of resources available in the cluster at the current time. If the amount of resources the job requests does not exceed the amount available in the cluster, i.e. R_j^total ≤ C_r, the central resource scheduler adds job j to the job queue. Otherwise (R_j^total > C_r), two cases must be distinguished: if job j, by directly preempting the resources of the currently running job k and executing immediately, can still finish in time, job j is executed; alternatively, if job j cannot meet its deadline even after preempting other jobs' resources, the central resource scheduler directly refuses to execute job j.
The MapReduce task scheduling method follows a scheduling strategy based on the Service Level Agreement (SLA); the specific deployment proceeds in the following steps:
Step 1: in the Yarn cluster, the ResourceManager node is integrated with the central resource scheduler provided by the invention, and the remaining NodeManager nodes are integrated with the node resource schedulers provided by the invention.
Step 2: when a user submits a batch of jobs to the cluster, the central resource scheduler analyzes each job: the size of its input data, the amount of CPU, memory, and other resources it requires, and the deadline the user has specified.
Step 3: the central resource scheduler collects the node state information sent by each node resource scheduler and tallies the execution progress of the currently running jobs and the utilization of the various resources in the cluster.
Step 4: the central resource scheduler combines the cluster's currently available resources with the characteristics of the jobs to be executed and analyzes whether the current cluster resources can meet job j's resource demand; if so, job j is added to the queue of jobs to be executed. Otherwise it judges whether the cluster resources can meet the resource demand of job j's map tasks, and whether the resources released after the map tasks finish can meet the resource demand of the reduce tasks; if both conditions are met, job j is added to the job queue and set to high priority.
Step 5: the central resource scheduler sorts the jobs in the job queue by deadline and traverses them one by one. For a job j in the queue, it checks whether all of j's map tasks have finished; if not, it checks j's priority: if j is a high-priority job, the central resource scheduler immediately communicates with the node resource schedulers to preempt the designated resources from the cluster and execute j's map tasks; otherwise it waits for the cluster to free idle resources and allocates them to j's map tasks.
Step 6: the node resource schedulers report task execution states to the central resource scheduler through heartbeat information. Once the map tasks have finished, the central resource scheduler checks whether the number of completed map tasks exceeds the threshold W. If it does, the reduce tasks of job j start executing and j's priority is checked again: if the job executes with priority, the preemption is completed together with the node resource scheduler; otherwise the job waits for idle resources to be allocated.
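Steps 5 and 6 gate the reduce phase on a threshold W of completed map tasks and treat high-priority jobs differently from ordinary ones. A compact sketch of that decision (all names are illustrative, and the logic is a simplification of the steps above):

```python
def next_action(maps_done, maps_total, w, high_priority, idle):
    """Scheduler's next move for one job, per steps 5-6 (names illustrative).

    Below the threshold w of finished map tasks the job stays in its map
    phase; once w map tasks are done, the reduce phase may start.  In either
    phase a high-priority job preempts resources; others wait for idle ones.
    """
    if maps_done < maps_total and maps_done < w:
        # still in the map phase, below the reduce threshold
        if high_priority:
            return "preempt-for-map"
        return "run-map" if idle else "wait-for-map-resources"
    # at least w map tasks finished: the reduce phase may begin
    if high_priority:
        return "preempt-for-reduce"
    return "run-reduce" if idle else "wait-for-reduce-resources"

print(next_action(maps_done=2, maps_total=10, w=5, high_priority=True, idle=False))   # preempt-for-map
print(next_action(maps_done=6, maps_total=10, w=5, high_priority=False, idle=True))   # run-reduce
```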
Based on its Service Level Agreement (SLA) scheduling strategy, the method enables high-priority tasks to preempt the running resources of other tasks and ensures that job completion times reach the SLA target. The MapReduce task scheduling system balances resource utilization against job queuing delay, effectively improving hardware resource utilization and efficiency while greatly reducing job queuing delay, thereby achieving the service-level-agreement target.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (10)

1. A MapReduce task scheduling method is characterized by comprising the following steps:
s1: the client creates a JobSummiter instance, calculates the input fragment of the job by an internal method of the JobSummiter, copies the resource required by the job operation into a distributed file system, and submits the MapReduce job to a resource scheduler Resourcemanager;
s2: after receiving the job submission message, the resource scheduler transmits the request message to the central resource scheduler, and the central resource scheduler analyzes the detailed information related to the job by an internal job analysis method and analyzes the latest deadline required by reaching a service level agreement;
s3: the central resource scheduler adds the new tasks into a central task queue, and reorders all the tasks from near to far according to the deadline of each task according to the different deadline dates;
s4: the central resource scheduler receives the heartbeat information from the node resource scheduler, obtains the number of tasks allocated by each node resource scheduler, sequentially selects the node with the minimum task number from the number of tasks, and assigns the task with the latest current deadline to be executed in the past;
s5: after receiving the new task, the node resource scheduler adds the new task into a local task queue and reorders the task queue according to the deadline;
s6: the node resource scheduler checks the position of the new task in the task queue and if the deadline of the new task is closer than the task being executed, the new task preempts the task being executed.
2. The MapReduce task scheduling method of claim 1, wherein in step S2, the central resource scheduler obtains the total amount of CPU resources C and the total amount of memory resources M, and obtains the job share of long jobs according to the number of jobs [formula shown as an image in the original]; according to the fairness principle, it periodically computes the resource share of each job in the central task queue [formulas shown as images in the original].
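Since the claimed formulas appear only as images, the following sketch assumes the simplest reading of the fairness principle, an even division of the cluster totals C and M over the jobs in the central queue; the actual claimed formulas may differ:

```python
def fair_share(total_cpu, total_mem, n_jobs):
    # Assumed reading of claim 2: cluster totals C and M divided evenly
    # over the n jobs currently in the central task queue.
    if n_jobs == 0:
        return (0.0, 0.0)
    return (total_cpu / n_jobs, total_mem / n_jobs)
```
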
3. The MapReduce task scheduling method according to claim 2, wherein in step S3, after receiving a resource request of a job, the central resource scheduler analyzes whether the job can be completed before its deadline by combining the deadline constraint, the resource condition of the cluster, and the resource requirement of the job; if the central resource scheduler determines that the job can be completed before the deadline, it adds the job to the central task queue; otherwise, it refuses to execute the job.
4. The MapReduce task scheduling method according to claim 1, wherein in step S4, when a job arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources the job requests from the history log of previous runs of the job; if the job has not been executed on the cluster before, the scheduler runs the job on a small portion of the original data set as a pre-test set.
5. The MapReduce task scheduling method of claim 4, wherein in step S4, if the resource amount of the job request does not exceed the available resource amount in the cluster, the central resource scheduler adds the job to a central task queue;
otherwise, two cases are distinguished: in one case, if the job, executed immediately by directly preempting the resources of currently running jobs, can still be completed on time, the job is executed;
in the other case, the job cannot meet its deadline even if it preempts the resources of other jobs, and the central resource scheduler directly refuses to execute it.
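The claim-4/claim-5 admission decision can be sketched as a three-way branch; the scalar `job_demand` and the boolean `can_finish_with_preemption` are illustrative stand-ins for the analysis described above:

```python
def admit(job_demand, available, can_finish_with_preemption):
    # Sketch of the claim-5 admission decision; names are illustrative.
    if job_demand <= available:
        return "enqueue"                  # fits into idle cluster resources
    if can_finish_with_preemption:
        return "enqueue-with-preemption"  # preempt running jobs, still on time
    return "reject"                       # deadline unreachable even with preemption
```
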
6. The MapReduce task scheduling method of claim 4, wherein in step S4, the amount of resources preempted based on the service level agreement is determined by the following scheme:
when the W map tasks have finished executing, the reduce tasks start executing; with T_up denoting the upper limit of the execution time of the W map tasks, the following can be obtained:
[formula shown as an image in the original]
where M_avg is the average execution time of the map tasks in job j, N_m^j is the number of map tasks in job j, and M_max is the maximum execution time of the map tasks in job j (symbols that appear only as images in the original are written here as N_m^j, N_r^j and R_m^j); suppose there are Q jobs that can finish before the time upper limit T_up, and the amount of resources released after these jobs complete is R, whose value can be calculated by the following formula:
[formula shown as an image in the original]
where j denotes a job and N_r^j denotes the number of reduce tasks of job j;
the amount of resources required in the reduce phase is E, and the value of E can be calculated by the following formula:
[formula shown as an image in the original]
where C_r denotes the amount of resources available in the cluster at the current time and R_m^j denotes the amount of resources required by the map tasks of job j.
7. The MapReduce task scheduling method of claim 6, wherein in step S6, when job preemption is required, the method calculates the extra resource share occupied by the job k to be preempted:
[formula shown as an image in the original]
where the two terms denote, respectively, the resources actually occupied by job k during execution and the amount of resources job k should receive under the fair resource allocation principle; the method then obtains the resource share requested by the job j that needs to be executed; if the condition [shown as an image in the original] holds, the resources that need to be preempted can be calculated:
[formula shown as an image in the original]
8. The MapReduce task scheduling method of claim 7, wherein in step S6, if the condition [shown as an image in the original] holds, the resources that need to be preempted are then calculated by the algorithm. The calculation of the resources that need to be preempted includes: comparing the CPU resources and the memory resources and dividing them into a primary resource and a secondary resource, then deriving the amount of the secondary resource to reclaim from the amount of the primary resource reclaimed; the calculation formulas include:
[formulas shown as images in the original]
where C_j and M_j respectively denote the amount of CPU resources and the amount of memory resources requested by job j, C_a and M_a respectively denote the amount of CPU resources and the amount of memory resources actually additionally occupied by the current job k in the cluster, and a symbol [shown as an image in the original] denotes the amount of resources that need to be preempted; if the condition [shown as an image in the original] holds, the CPU resources requested by job j are the primary resource, all CPU resources additionally occupied by job k are preempted, and the memory resources additionally occupied by job k are preempted in proportion; otherwise, memory is regarded as the primary resource requested by job j, all memory resources additionally occupied by job k are preempted, and the CPU resources additionally occupied by job k are preempted in proportion.
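Because the comparison and proportionality formulas in claim 8 appear only as images, the sketch below assumes one plausible reading: the primary resource is the one job j demands more of relative to job k's surplus, all of the primary surplus is preempted, and the secondary resource is preempted in job j's requested CPU:memory proportion. The actual claimed formulas may differ:

```python
def preemption_amounts(cj, mj, ca, ma):
    # cj, mj: CPU / memory requested by job j.
    # ca, ma: CPU / memory additionally occupied by job k.
    # Assumption: primary resource = the one job j demands more of relative
    # to job k's surplus; take all of the primary surplus, and take the
    # secondary resource in job j's requested CPU:memory proportion.
    if cj / ca >= mj / ma:                   # CPU is the primary resource
        return ca, min(ma, ca * mj / cj)
    return min(ca, ma * cj / mj), ma         # memory is the primary resource
```
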
9. The MapReduce task scheduling method of claim 8, wherein the MapReduce task scheduling method is performed according to a service level agreement-based scheduling policy, and the service level agreement-based scheduling policy comprises:
when job j arrives, analyzing the expiration date of the job, the required throughput and the required resource amount;
the central resource scheduler analyzes whether the current resource amount of the cluster meets the resource demand of job j, and if so, adds job j to the central task queue;
if not, judging whether the resource quantity of the cluster can meet the resource demand quantity of the map task of the job j or not, and whether the resource released after the execution of the map task can meet the resource demand quantity of the reduce task or not;
if the two conditions are met, adding the job j into the central task queue, and marking the job j as a high priority, so that the job j can occupy the resources of other jobs in the execution process; if the two conditions cannot be met simultaneously, the central resource scheduler refuses to execute the job j;
the central resource scheduler sorts the jobs in the central task queue according to the deadline, and performs traversal processing on each job respectively; for the job j in the central task queue, the central resource scheduler judges whether the map task of the job j is completely executed or not, if not, the priority of the job j is judged, if the job is a high-priority job, the central resource scheduler immediately communicates with the node resource scheduler to preempt the appointed resource from the cluster so as to execute the map task of the job j, otherwise, the central resource scheduler waits for the cluster to generate idle resources and allocates the idle resources to the map task of the job j;
the node resource scheduler reports the task execution state to the central resource scheduler through heartbeat information; as map tasks finish executing, the central resource scheduler judges whether the number of executed map tasks exceeds the threshold W; if so, it starts executing the reduce tasks of job j and likewise judges the priority of job j: if it is a high-priority job, job preemption is completed with the node resource scheduler; otherwise the job waits for idle resources to be allocated.
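A hedged sketch of the claim-9 admission rules, treating resources as a single scalar amount for simplicity (parameter names are illustrative, and the reduce-phase check is one possible reading of the released-resource condition):

```python
def sla_admission(job_demand, map_demand, reduce_demand, cluster_free,
                  released_by_maps):
    # Claim-9 sketch: admit normally if the whole demand fits; otherwise
    # admit as high priority if the map phase fits now and the resources
    # released by finished map tasks cover the reduce phase; else reject.
    if job_demand <= cluster_free:
        return ("enqueue", "normal")
    if map_demand <= cluster_free and reduce_demand <= released_by_maps:
        return ("enqueue", "high")   # may preempt other jobs while running
    return ("reject", None)
```
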
10. A MapReduce task scheduling system using the MapReduce task scheduling method of any one of claims 1 to 9, comprising: the distributed data center cluster comprises a central resource scheduler and a plurality of node resource schedulers;
a central task queue is maintained in the central resource scheduler, and when a new job arrives, the central resource scheduler analyzes job characteristics to obtain the operation time and the deadline of the job;
the node resource scheduler maintains a running task queue and a paused task queue, both sorted by deadline, and continuously reports the task information and resource usage on its node to the central resource scheduler through a heartbeat mechanism.
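The claim-10 node-scheduler state might be sketched as follows; the field and payload names are assumptions, not the patent's:

```python
from dataclasses import dataclass, field

@dataclass
class NodeState:
    # Claim-10 sketch: a running queue and a paused queue, both kept
    # sorted by deadline; entries are (deadline, task) pairs.
    running: list = field(default_factory=list)
    paused: list = field(default_factory=list)

    def heartbeat(self):
        # Payload periodically reported to the central resource scheduler.
        self.running.sort()
        self.paused.sort()
        return {"running": len(self.running), "paused": len(self.paused)}
```
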
CN202111386374.8A 2021-11-22 2021-11-22 MapReduce task scheduling method and system Active CN114077486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111386374.8A CN114077486B (en) 2021-11-22 2021-11-22 MapReduce task scheduling method and system


Publications (2)

Publication Number Publication Date
CN114077486A true CN114077486A (en) 2022-02-22
CN114077486B CN114077486B (en) 2024-03-29

Family

ID=80284249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111386374.8A Active CN114077486B (en) 2021-11-22 2021-11-22 MapReduce task scheduling method and system

Country Status (1)

Country Link
CN (1) CN114077486B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860397A (en) * 2022-04-14 2022-08-05 深圳清华大学研究院 Task scheduling method, device and equipment
WO2024103463A1 (en) * 2022-11-18 2024-05-23 深圳先进技术研究院 Elastic deep learning job scheduling method and system, and computer device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104991830A (en) * 2015-07-10 2015-10-21 山东大学 YARN resource allocation and energy-saving scheduling method and system based on service level agreement
WO2020248226A1 (en) * 2019-06-13 2020-12-17 东北大学 Initial hadoop computation task allocation method based on load prediction
CN112395052A (en) * 2020-12-03 2021-02-23 华中科技大学 Container-based cluster resource management method and system for mixed load


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Yuefeng; Wang Xibo: "Design of a locality scheduling algorithm integrating a preemptive scheduling policy in a Hadoop cluster environment", Computer Science, no. 1, 31 December 2017 (2017-12-31), pages 567-570 *


Also Published As

Publication number Publication date
CN114077486B (en) 2024-03-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant