CN108170417B

CN108170417B - Method and device for integrating high-performance job scheduling framework in MESOS cluster

Info

Publication number: CN108170417B
Application number: CN201711476493.6A
Authority: CN
Inventors: 郝文静; 张涛; 原帅; 吕灼恒; 王家尧; 李媛
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Guoke Jinyun Technology Co ltd
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2022-02-11
Anticipated expiration: 2037-12-29
Also published as: CN108170417A

Abstract

The invention discloses a method and a device for integrating a high-performance job scheduling framework in a MESOS cluster. The method comprises the following steps: acquiring job information of a job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework; matching the job information with the available resource information in the MESOS cluster; and, after the job information is successfully matched with the available resource information in the MESOS cluster, synchronizing the resource occupation information of the job into the MESOS cluster, thereby updating the available resource information in the MESOS cluster. With this technical scheme, a high-performance job scheduling framework such as Slurm/PBS is integrated in the MESOS cluster, so that high-performance jobs can run in the MESOS cluster and synchronize their resource occupation to the MESOS cluster. A hyper-converged scheduling framework is thereby realized, in which high-performance jobs and other jobs run in the same cluster without affecting each other.

Description

Method and device for integrating high-performance job scheduling framework in MESOS cluster

Technical Field

The invention relates to the field of computers, in particular to a method and a device for integrating a high-performance job scheduling framework in a MESOS cluster.

Background

In order to build a hyper-converged cluster, several types of computing frameworks need to be deployed and their available resources need to be managed uniformly, so the MESOS resource management system (a resource manager for distributed computing frameworks, hereinafter also called the MESOS cluster) is used to allocate resources uniformly. MESOS periodically sends the resource status of the nodes to each computing framework, and each computing framework accepts or rejects the offered resources according to the resources its jobs actually require, so that the computing frameworks use the resources provided by MESOS.
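The accept-or-reject behaviour described above can be illustrated with a minimal Python sketch. It is not the MESOS API: the `Offer` and `Demand` types and the `respond_to_offer` function are hypothetical names introduced only for illustration, and the sketch assumes that an offer is characterised by CPU count and memory alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Resources offered on one node (hypothetical structure)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Demand:
    """Resources a framework needs for its next job (hypothetical structure)."""
    cpus: float
    mem_mb: int


def respond_to_offer(offer: Offer, demand: Demand) -> str:
    """Accept the offer only if it covers the demand; otherwise decline it."""
    if offer.cpus >= demand.cpus and offer.mem_mb >= demand.mem_mb:
        return "accept"
    return "decline"


offers = [Offer("node1", 4, 8192), Offer("node2", 1, 1024)]
demand = Demand(cpus=2, mem_mb=4096)
decisions = {o.node: respond_to_offer(o, demand) for o in offers}
print(decisions)
```

In this sketch a declined offer simply stays available to be offered to other frameworks.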

In addition, in the current method for integrating big data frameworks such as Hadoop in the MESOS cluster, a job is submitted to the JobTracker (the process responsible for task scheduling) and waits for the Framework plug-in to extract the job information. When the MESOS cluster allocates resources to the Framework plug-in, the job information is extracted and transmitted to a node, and the job is started through an Executor, which completes the starting and state updating of the job.

However, although hyper-converged clusters combining big data, high-performance computing and containers have been built, so that multiple types of jobs can run in one cluster at the same time, many problems remain:

1. The resources used by different types of jobs are not managed uniformly, so jobs preempt each other's resources, which makes the cluster unstable and wastes resources;

2. MESOS does not officially integrate Slurm (Simple Linux Utility for Resource Management) or PBS (Portable Batch System, a software package for managing batch jobs and computer system resources), both of which are high-performance job scheduling frameworks, so jobs started by other frameworks in the cluster compete with the high-performance job scheduling framework for resources while part of the resources sit idle;

3. The Slurm/PBS job submission process differs substantially from that of Hadoop (a distributed system infrastructure) and similar frameworks, so the plug-in integration principle used for Hadoop and MESOS cannot be applied directly; that is, it is not possible to submit an Executor through MESOS and start the high-performance job inside that Executor.

No effective solution to these problems in the related art has been proposed so far.

Disclosure of Invention

Aiming at the problems in the related art, the invention provides a method and a device for integrating a high-performance job scheduling framework in a MESOS cluster.

The technical scheme of the invention is realized as follows:

According to one aspect of the invention, a method for integrating a high-performance job scheduling framework in a MESOS cluster is provided.

The method for integrating the high-performance job scheduling framework in the MESOS cluster comprises the following steps: acquiring job information of a job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework; matching the job information with the available resource information in the MESOS cluster; and, after the job information is successfully matched with the available resource information in the MESOS cluster, synchronizing the resource occupation information of the job into the MESOS cluster, thereby updating the available resource information in the MESOS cluster.

According to an embodiment of the present invention, matching the job information with the available resource information in the MESOS cluster includes: matching, through a plug-in, the collected job information of all the jobs on the job scheduling framework with the available resource information in the MESOS cluster.

According to an embodiment of the present invention, synchronizing the resource occupation information of the job to the MESOS cluster after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster, includes: after the job information is successfully matched with the available resource information in the MESOS cluster, the plug-in submits a task to the MESOS cluster according to the resource occupation information, so that the available resource information in the MESOS cluster is updated; and monitoring the running state of the job through the ID number of the job.

According to an embodiment of the present invention, the method further comprises: the job scheduling framework updates the state of the task and releases the resources according to the running state of the job.

According to another aspect of the present invention, an apparatus for integrating a high performance job scheduling framework in a MESOS cluster is provided.

The device for integrating the high-performance job scheduling framework in the MESOS cluster comprises: an acquisition module, used for acquiring job information of a job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework; a matching module, used for matching the job information with the available resource information in the MESOS cluster; and an updating module, used for synchronizing the resource occupation information of the job into the MESOS cluster after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster.

According to one embodiment of the invention, the matching module comprises: a matching submodule, used for matching, through a plug-in, the collected job information of all the jobs on the job scheduling framework with the available resource information in the MESOS cluster.

According to one embodiment of the invention, the updating module comprises: an update submodule, used by the plug-in to submit a task to the MESOS cluster according to the resource occupation information after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster; and a monitoring module, used for monitoring the running state of the job through the ID number of the job.

According to an embodiment of the present invention, the device further comprises: an update release module, used by the job scheduling framework to update the state of the task and release the resources according to the running state of the job.

The invention has the beneficial technical effects that:

According to the method and the device, the job information of the job scheduling framework is acquired and matched with the available resource information in the MESOS cluster, and after the job information is successfully matched with the available resource information in the MESOS cluster, the resource occupation information of the job is synchronized into the MESOS cluster, so that the available resource information in the MESOS cluster is updated. A high-performance job scheduling framework such as Slurm/PBS is thereby integrated in the MESOS cluster, high-performance jobs can run in the MESOS cluster, and their resource occupation is synchronized into the MESOS cluster. A hyper-converged scheduling framework is thus realized, in which high-performance jobs and other jobs run in the same cluster without affecting each other.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow diagram of a method of integrating a high performance job scheduling framework in a MESOS cluster according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a resource allocation of a MESOS cluster according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of integrating Slurm/PBS in a MESOS cluster according to an embodiment of the invention;

FIG. 4 is a flow diagram of a design of a plug-in according to an embodiment of the present invention;

FIG. 5 is a block diagram of an apparatus for integrating a high performance job scheduling framework in a MESOS cluster according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.

According to an embodiment of the invention, a method for integrating a high-performance job scheduling framework in a MESOS cluster is provided.

As shown in fig. 1, the method for integrating a high-performance job scheduling framework in a MESOS cluster according to an embodiment of the present invention includes: step S101, acquiring job information of a job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework; step S103, matching the job information with the available resource information in the MESOS cluster; step S105, after the job information is successfully matched with the available resource information in the MESOS cluster, synchronizing the resource occupation information of the job to the MESOS cluster, thereby updating the available resource information in the MESOS cluster.
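Steps S101, S103 and S105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Job` type, the functions `acquire_jobs`, `matches` and `synchronize`, the node names and the resource figures are all hypothetical, and the "available resource information" is modelled as a plain dictionary rather than real MESOS offers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    node: str
    cpus: int
    mem_mb: int


def acquire_jobs() -> List[Job]:
    """S101: stand-in for querying the job scheduling framework."""
    return [Job("101", "node1", 8, 16384), Job("102", "node2", 64, 1024)]


def matches(job: Job, available: Dict[str, Dict[str, int]]) -> bool:
    """S103: the job matches if its node has enough free resources."""
    free = available.get(job.node)
    return (free is not None
            and free["cpus"] >= job.cpus
            and free["mem_mb"] >= job.mem_mb)


def synchronize(job: Job, available: Dict[str, Dict[str, int]]) -> None:
    """S105: record the job's occupation by reducing the free resources."""
    free = available[job.node]
    free["cpus"] -= job.cpus
    free["mem_mb"] -= job.mem_mb


available = {
    "node1": {"cpus": 16, "mem_mb": 32768},
    "node2": {"cpus": 16, "mem_mb": 32768},
}
synced = []
for job in acquire_jobs():
    if matches(job, available):
        synchronize(job, available)
        synced.append(job.job_id)
print(synced, available)
```

Job 102 asks for more CPUs than node2 has free, so in this sketch it is not matched and node2's available resources stay unchanged.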

By means of this technical scheme, the job information of the job scheduling framework is acquired and matched with the available resource information in the MESOS cluster, and after the job information is successfully matched with the available resource information in the MESOS cluster, the resource occupation information of the job is synchronized into the MESOS cluster, so that the available resource information in the MESOS cluster is updated. A high-performance job scheduling framework such as Slurm/PBS is thereby integrated in the MESOS cluster, so that high-performance jobs can run in the MESOS cluster and synchronize their resource occupation into the MESOS cluster. A hyper-converged scheduling framework is thus realized, in which high-performance jobs and other jobs run in the same cluster without affecting each other.

In order to better describe the technical solution of the present invention, the following detailed description is made by specific examples.

Heterogeneous computing resource management and scheduling are the basic support for the organization and management of the system; they are indispensable components and are even more important for a very-large-scale system. In addition, in order to realize a hyper-converged, self-adaptive cluster bottom architecture in which various types of jobs are scheduled in one cluster, a Mesos cluster is adopted as the kernel of a DCOS (data center operating system). The Mesos cluster centrally manages all resources of the cluster, such as memory, CPUs and disks, so that the distributed cluster can be operated like a single machine.

In addition, in order to establish an efficient service operation environment and improve resource utilization in a multi-application heterogeneous environment, a multi-policy distributed scheduling algorithm suited to that environment is studied. It realizes high-level scheduling, such as performance-balanced resource allocation with automatic environment identification, taking both resource utilization and application performance into account. It adopts a capacity-complementary allocation policy for intensive consumption demands based on resource capacity indexes, and structured allocation based on system topology characteristics, to achieve efficient and fine-grained scheduling of various applications. Scheduling algorithms are supported as plug-ins, so that the required resources are allocated according to each user's application resource requirements and the specified policy.

In addition, fig. 2 shows a scenario of unified resource allocation in a Mesos cluster. In fig. 2, HPC Portal, Caffe Portal, Hadoop Portal and Docker Portal denote the portals of HPC, of the Caffe convolutional neural network framework, of Hadoop and of the Docker application container engine, respectively; Yarn denotes the resource manager of Hadoop; Marathon denotes a container orchestration framework; Zookeeper denotes a distributed application coordination service, and standby Zookeeper denotes its standby instance; HOST denotes a virtual machine; HPC JOB denotes a work unit of HPC; Caffe denotes the convolutional neural network framework; TensorFlow is the second-generation artificial intelligence learning system developed by Google on the basis of DistBelief; Docker denotes the application container engine; and Spark is a fast, general-purpose computing engine designed for large-scale data processing. To construct a hyper-converged cluster for big data, high-performance computing and containers, a high-performance job scheduling framework such as Slurm/PBS, the Hadoop big data framework and the Docker management framework Marathon must run and accept job submissions on the MESOS cluster at the same time. For the Hadoop and Marathon frameworks, companies have already implemented plug-ins and open-sourced them on the official MESOS website. In the prior art, however, there is no method for integrating Slurm/PBS into the MESOS cluster. The invention therefore provides a method for integrating a high-performance job scheduling framework in the MESOS cluster, so that the resources occupied by jobs submitted through Slurm/PBS are synchronized with the resources in the MESOS resource pool and the resources requested by other computing frameworks are not affected.

In addition, the invention provides a method for integrating a high-performance job scheduling framework in a MESOS cluster, which synchronizes the Slurm/PBS computing resources into the MESOS resource pool (or the MESOS cluster) and builds a hyper-converged bottom architecture, so that the resources of the various types of jobs do not affect each other.

I. Method for integrating Slurm/PBS in the MESOS cluster

First, since Slurm and PBS are both HPC job scheduling frameworks and the methods for integrating either of them in the MESOS cluster are basically the same, the following mainly explains the method for integrating Slurm.

Secondly, a Slurm job can only be scheduled and started through the daemon slurmctld, and cannot be started and run by an Executor in the MESOS cluster. The invention therefore designs a plug-in, or middleware Framework, that integrates MESOS and Slurm. High-performance jobs are still scheduled and run by Slurm, and the middleware Framework, which communicates with both MESOS and the Slurm computing framework, checks the running status of the jobs. If a job is running, the Scheduler in the Slurm plug-in Framework has MESOS run a task that monitors the job and occupies exactly the same resources as the job until the job finishes. In this way the resources occupied by jobs started by Slurm are synchronized to the MESOS resource pool, and the submission and running of other jobs in the hyper-converged cluster are not affected.

In addition, the method for integrating the high-performance job scheduling framework in the MESOS cluster is implemented as follows:

1. Jobs are still submitted to Slurm in the original way of the Slurm framework, scheduled with Slurm's own scheduling policy, and started and run in the original way; that is, after Slurm and MESOS are integrated, jobs are scheduled and run according to the scheduling policy of the Slurm framework.

2. After integration, a plug-in Framework between MESOS and Slurm is added. The scheduler of the plug-in extracts, at the management node, the detailed information of all jobs whose state is R (Running), obtaining for each job its ID, its running node, the resources it occupies, and so on.

3. The scheduler of the plug-in matches the available resources offered by MESOS with the collected Slurm job information and, after a successful match, submits a task according to the resources occupied by the job. The task is started by an Executor, which runs the job monitoring script on the computing node, passes the job number to it, and monitors the job state. The resource usage of each node managed by the MESOS cluster is thereby synchronized with the resources occupied by Slurm jobs, and the available resource offer information of MESOS is updated.

4. When the job state changes, Slurm on the management node updates the job state in real time, and the MESOS cluster likewise updates the Task state accordingly and releases the resources.
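The extraction of running-job information in step 2 can be sketched in Python as below. The patent does not disclose the output format of its job query script, so the pipe-separated layout `job_id|state|node|cpus|mem_mb` used here is purely an assumption, as are the names `JobInfo` and `parse_running_jobs`.

```python
from typing import List, NamedTuple


class JobInfo(NamedTuple):
    job_id: str
    node: str
    cpus: int
    mem_mb: int


def parse_running_jobs(listing: str) -> List[JobInfo]:
    """Keep only jobs in state R and return ID, node and occupied resources.

    Assumes one job per line in the hypothetical format
    job_id|state|node|cpus|mem_mb.
    """
    jobs = []
    for line in listing.strip().splitlines():
        job_id, state, node, cpus, mem_mb = line.split("|")
        if state == "R":
            jobs.append(JobInfo(job_id, node, int(cpus), int(mem_mb)))
    return jobs


sample = """\
101|R|node1|8|16384
102|PD||4|8192
103|R|node2|2|2048
"""
running = parse_running_jobs(sample)
print(running)
```

Pending job 102 has no running node yet and is filtered out, so only jobs 101 and 103 would be handed to the matching step.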
Therefore, with this method a hyper-converged cluster is built and the basic bottom architecture of Hadoop + Marathon + Slurm is completed: under the unified resource management of the MESOS cluster, big data jobs, container jobs and high-performance jobs run in the same cluster, each following its own computing flow. Building the hyper-converged cluster mainly comprises the following steps: building the MESOS cluster basic environment; building the Hadoop cluster basic environment and integrating the Hadoop Framework plug-in; building the Marathon basic environment, building the Docker environment and setting up a private registry; and building the Slurm cluster basic environment and integrating the Slurm Framework plug-in.

In addition, as shown in fig. 3, the implementation process of the method for integrating the high-performance job scheduling framework in the MESOS cluster is as follows:

1. Slurm/PBS submits the job, selects the job execution nodes according to the scheduling policy of the high-performance framework, and starts and runs the job.

2. The master node management process of MESOS (mesos-master) periodically sends the available resource information that the node agents report to the master to the SchedulerDriver through the Allocator module (the master component responsible for resource allocation), and this node information is distributed to the Schedulers of the different Frameworks according to the Hierarchical DRF algorithm.

3. After receiving the resource information, the scheduling process of the Framework executes the jobs.sh script to obtain the information of the jobs currently running on Slurm/PBS, including the job ID, the running node and the resources occupied by each job.

4. After the Scheduler of the Framework obtains the job information, it matches the information against the computing node resources offered by Mesos. If a job is running on a node, a Mesos task is submitted to that node and occupies the same resources as the high-performance job.

5. After the task is submitted, it is passed through the SchedulerDriver and the mesos-master to the mesos-agent node that actually executes it; the task information is sent to the ExecutorDriver, which calls the task-launch method to start the Executor. The Executor of the Framework starts the script jobmonitor.sh, which checks the running state of the job every 3 seconds. If the job is in a finished state, or no such job exists, the job is considered finished, the state of the corresponding Mesos task is updated, and the resources are released.
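The polling behaviour of step 5 can be sketched in Python as follows. The contents of jobmonitor.sh are not given in the source, so this is only an illustration of the described logic: `monitor_job`, `get_state` and `FINISHED_STATES` are hypothetical names, the set of state codes treated as finished is an assumption, and the state lookup and the sleep function are passed in as parameters so that the loop can be exercised without a real scheduler.

```python
import time
from typing import Callable, Optional

# Assumption: state codes that count as "finished" for this sketch.
FINISHED_STATES = {"CD", "F", "CA"}


def monitor_job(job_id: str,
                get_state: Callable[[str], Optional[str]],
                interval_s: float = 3.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the job state every interval_s seconds until the job is done.

    get_state returns the current state code, or None when the scheduler
    no longer knows the job. Both cases end the monitoring task, after
    which the corresponding Mesos task can be reported as finished and
    its resources released.
    """
    while True:
        state = get_state(job_id)
        if state is None or state in FINISHED_STATES:
            return "TASK_FINISHED"
        sleep(interval_s)


# Simulated run: the job is seen running twice, then completed.
states = iter(["R", "R", "CD"])
naps = []
result = monitor_job("101", lambda _job_id: next(states), sleep=naps.append)
print(result, naps)
```

Because the monitoring task holds the same resources as the job for as long as this loop runs, ending the loop is what returns those resources to the MESOS resource pool.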

II. Plug-in implementation

The plug-in Framework for Slurm/PBS has to be developed specifically; it is implemented in Java and mainly overrides the resource offer method. Fig. 4 is a flow chart of the design and implementation of the plug-in Framework, which proceeds as follows. After the start, the plug-in Framework is registered and it is determined whether the registration succeeded. If the registration failed, the process ends. If it succeeded, available resource offers are provided, and the information of the jobs currently running is obtained by executing a script. It is then determined from the obtained job information whether there is a job in the R state. If there is none, the process ends. If there is, the resource occupation information of the jobs in the R state is extracted; the job resource information is then iterated over and compared with the available resource offers to judge whether a job runs on the offered node, and the jobs running on the same node are assembled into a job offer list (a list of the resources occupied by the jobs). Next, it is determined whether the job number (job ID) already exists in the job MAP object. If it does, the process ends. If it does not, the job offer list is iterated over, a monitoring task is started whose resources match those of the job, the task is issued, the task number (job number) is stored in the job MAP, and the process ends.
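The core of the flow in fig. 4 (grouping running jobs by offered node, skipping jobs already recorded in the job MAP, and launching one monitoring task per new job) can be sketched as follows. The source states that the real plug-in is written in Java; this Python sketch only illustrates the control flow, and `handle_offers`, the dictionary layouts and the `monitor-<job id>` task naming are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def handle_offers(offers: Dict[str, Dict[str, int]],
                  running_jobs: List[dict],
                  job_map: Dict[str, str]) -> List[dict]:
    """Return the monitoring tasks to launch for one round of offers.

    offers       : {node: {"cpus": ..., "mem_mb": ...}} currently offered
    running_jobs : dicts with job_id, node, cpus, mem_mb (jobs in state R)
    job_map      : {job_id: task_id} for jobs that already have a task
    """
    # Build the per-node "job offer list": jobs running on an offered node.
    by_node = defaultdict(list)
    for job in running_jobs:
        if job["node"] in offers:
            by_node[job["node"]].append(job)

    launched = []
    for node, jobs in by_node.items():
        for job in jobs:
            if job["job_id"] in job_map:
                continue  # a monitoring task already exists for this job
            task_id = f"monitor-{job['job_id']}"
            # The task requests exactly the resources the job occupies.
            launched.append({"task_id": task_id, "node": node,
                             "cpus": job["cpus"], "mem_mb": job["mem_mb"]})
            job_map[job["job_id"]] = task_id
    return launched


offers = {"node1": {"cpus": 16, "mem_mb": 32768}}
running_jobs = [
    {"job_id": "101", "node": "node1", "cpus": 8, "mem_mb": 16384},
    {"job_id": "102", "node": "node3", "cpus": 4, "mem_mb": 8192},
]
job_map: Dict[str, str] = {}
first = handle_offers(offers, running_jobs, job_map)
second = handle_offers(offers, running_jobs, job_map)
print(first, second, job_map)
```

The job MAP is what keeps a repeated offer round from starting a second monitoring task for the same job, which would otherwise count the job's resources twice.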

According to the embodiment of the invention, the device for integrating the high-performance job scheduling framework in the MESOS cluster is also provided.

As shown in fig. 5, an apparatus for integrating a high-performance job scheduling framework in a MESOS cluster according to an embodiment of the present invention includes: an obtaining module 51, configured to obtain job information of a job scheduling framework, where the job information includes resource occupation information of a job on the job scheduling framework; a matching module 52, configured to match the job information with available resource information in the MESOS cluster; and an updating module 53, configured to synchronize the resource occupation information of the job to the MESOS cluster after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster.

According to one embodiment of the present invention, the matching module 52 includes: a matching submodule (not shown), configured to match, through a plug-in, the collected job information of all the jobs on the job scheduling framework with the available resource information in the MESOS cluster.

According to one embodiment of the invention, the updating module 53 includes: an update submodule (not shown), configured to have the plug-in submit a task to the MESOS cluster according to the resource occupation information after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster; and a monitoring module, configured to monitor the running state of the job through the ID number of the job.

According to an embodiment of the present invention, the apparatus further includes: an update release module (not shown), configured to have the job scheduling framework update the state of the task and release the resources according to the running state of the job.

In summary, according to the technical solution of the present invention, the job information of the job scheduling framework is obtained and matched with the available resource information in the MESOS cluster, and the resource occupation information of the job is synchronized to the MESOS cluster after the job information is successfully matched with the available resource information in the MESOS cluster, so that the available resource information in the MESOS cluster is updated. A high-performance job scheduling framework such as Slurm/PBS is thereby integrated in the MESOS cluster, so that high-performance jobs can run in the MESOS cluster and synchronize their resource occupation to the MESOS cluster. A hyper-converged scheduling framework is thus implemented, in which high-performance jobs and other jobs run in the same cluster without affecting each other.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for integrating a high-performance job scheduling framework in a MESOS cluster is characterized by comprising the following steps:

acquiring job information of the job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework;

matching the job information with the available resource information in the MESOS cluster; and

after the job information is successfully matched with the available resource information in the MESOS cluster, synchronizing the resource occupation information of the job to the MESOS cluster, so as to update the available resource information in the MESOS cluster;

wherein, the integrated framework of the MESOS cluster comprises: Slurm/PBS;

after the job information is successfully matched with the available resource information in the MESOS cluster, synchronizing the resource occupation information of the job to the MESOS cluster, including:

if a job is running, a scheduler in a Slurm/PBS plug-in corresponding to the MESOS cluster runs a task that monitors the job and occupies exactly the same resources as the job until the job finishes running, so as to synchronize the resources occupied by the job started by Slurm/PBS to a resource pool of the MESOS cluster.

2. The method of claim 1, wherein the scheduler in the Slurm/PBS plug-in corresponding to the MESOS cluster running a task to monitor the job if a job is running comprises:

if a job is running, the scheduler in the plug-in developed for Slurm/PBS and corresponding to the MESOS cluster extracts, at a management node, detailed information of all jobs in a running state, so as to obtain the ID, the running node and the occupied-resource information of each job; the scheduler of the plug-in then matches the available resources provided by the MESOS with the collected Slurm/PBS job information, and submits a task according to the resources occupied by the job after the matching is successful, wherein the task is started by an Executor, which simultaneously runs a job monitoring script on the computing node, passes in the job number, and monitors the job state.

3. The method of claim 1, wherein matching the job information to available resource information in the MESOS cluster comprises:

matching, through a plug-in, the collected job information of all the jobs on the job scheduling framework with the available resource information in the MESOS cluster.

4. The method of claim 3, wherein after the job information and the available resource information in the MESOS cluster are successfully matched, synchronizing the resource occupancy information of the job to the MESOS cluster, thereby updating the available resource information in the MESOS cluster comprises:

after the job information is successfully matched with the available resource information in the MESOS cluster, the plug-in submits a task to the MESOS cluster according to the resource occupation information, so that the available resource information in the MESOS cluster is updated; and

monitoring the running state of the job through the ID number of the job.

5. The method of claim 4, further comprising:

the job scheduling framework updating the state of the task and releasing the resources according to the running state of the job.

6. An apparatus for integrating a high performance job scheduling framework in a MESOS cluster, comprising:

an acquisition module, used for acquiring the job information of the job scheduling framework, wherein the job information comprises resource occupation information of jobs on the job scheduling framework;

a matching module, used for matching the job information with the available resource information in the MESOS cluster; and

an updating module, configured to synchronize the resource occupation information of the job to the MESOS cluster after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster;

wherein, the MESOS cluster integration framework comprises: Slurm/PBS;

if a job is running, the updating module is specifically used for controlling a scheduler in the Slurm/PBS plug-in to run a task that monitors the job and occupies exactly the same resources as the job until the job finishes running, so as to synchronize the resources occupied by the job started by Slurm/PBS to the resource pool of the MESOS cluster.

7. The apparatus according to claim 6, wherein, if a job is running, the updating module is specifically configured to control the scheduler in the plug-in developed for Slurm/PBS in the MESOS cluster to extract, at a management node, detailed information of all jobs in a running state, so as to obtain the ID, the running node and the occupied-resource information of each job; the scheduler of the plug-in then matches the available resources provided by the MESOS with the collected Slurm/PBS job information, and submits a task according to the resources occupied by the job after the matching succeeds, wherein the task is started by an Executor, which simultaneously runs a job monitoring script on the computing node, passes in the job number, and monitors the job state.

8. The apparatus of claim 6, wherein the matching module comprises:

a matching submodule, used for matching, through a plug-in, the collected job information of all the jobs on the job scheduling framework with the available resource information in the MESOS cluster.

9. The apparatus of claim 8, wherein the update module comprises:

an update submodule, used by the plug-in to submit a task to the MESOS cluster according to the resource occupation information after the job information is successfully matched with the available resource information in the MESOS cluster, so as to update the available resource information in the MESOS cluster; and

a monitoring module, used for monitoring the running state of the job through the ID number of the job.

10. The apparatus of claim 9, further comprising:

an update release module, used by the job scheduling framework to update the state of the task and release the resources according to the running state of the job.