CN104243617A - Task scheduling method and system facing mixed load in heterogeneous cluster - Google Patents


Info

Publication number
CN104243617A
Authority
CN
China
Prior art keywords
task
machine
constraint
soft
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410543294.2A
Other languages
Chinese (zh)
Other versions
CN104243617B (en
Inventor
王旻
张章
汤学海
韩冀中
孟丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410543294.2A
Publication of CN104243617A
Application granted
Publication of CN104243617B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a mixed-load-oriented task scheduling method and system for heterogeneous clusters, comprising the following steps: a resource scheduler receives machine heartbeats and maintains each machine's attribute cluster; a job manager receives and parses a job to obtain several tasks; the job manager sets an attribute cluster and constraint requirements for each task and sends the task information to the resource scheduler; the resource scheduler matches each task with the optimal machine that satisfies its constraints and returns the task-to-machine matching to the job manager; the job manager dispatches the task to the executor on the matched machine, which executes it. The invention uses an easily extensible constraint description method to express heterogeneous machine attributes and task requirements. On this basis, hard constraints serve as the filtering criterion and soft constraints as the selection criterion for assigning the optimal machine to each task, which significantly improves task execution efficiency and overall system performance.

Description

A mixed-load-oriented task scheduling method and system in a heterogeneous cluster

Technical field

The invention relates to a mixed-load-oriented task scheduling method and system in a heterogeneous cluster, and belongs to the field of parallel computing.

Background

In recent years, cluster machines have become increasingly heterogeneous. Modern clusters tend to be large, run for long periods, and may even be distributed across different geographic locations. Over its lifetime, a cluster often needs to have machines replaced or upgraded. In addition, in cluster consolidation scenarios, an administrator may merge several small clusters purchased in different batches into one large cluster. For these reasons, the hardware and software within a cluster are likely to differ from machine to machine.

On the other hand, with the continuing development of cloud computing, running mixed loads on the same cluster has become a trend. This brings many benefits, such as higher resource utilization, shared data, and lower operation and maintenance costs. Specifically, a cluster may run many different types of tasks, including scientific computing, large-scale data analysis, long-running Internet services, and software development and testing.

Task scheduling means allocating resources to tasks, that is, placing tasks on machines. Tasks and machines are therefore the two key roles in task scheduling. In traditional application scenarios, tasks and machines are uniform and homogeneous, so scheduling only needs to consider basic resource requirements, for example slot-based resource allocation (A survey of virtual resource management in cloud data centers [J]. Application Research of Computers, 2012, 29(7): 2411-2415.). However, as clusters and loads become more heterogeneous, machine attributes and task requirements have changed considerably. Not every machine can satisfy a task's constraint requirements, and task scheduling must take various constraints into account: for example, image processing tasks must run on machines with a GPU, some tasks can only run on machines with a specific kernel version, and data analysis tasks should preferably run on the machines that store the relevant data. Constraint-aware task scheduling is currently an important challenge in this field. How to describe a wide variety of constraints in an extensible way, and how to schedule tasks onto the optimal machine according to those constraints, have become key problems of task scheduling in heterogeneous environments.

Some research work and open-source software has begun to address constraint-aware task scheduling. Hadoop YARN considers the data locality constraint when scheduling tasks, that is, it preferentially schedules a task onto a machine that stores the corresponding data (see Vavilapalli V K, Murthy A C, Douglas C, et al. Apache Hadoop YARN: Yet another resource negotiator [C]//Proceedings of the 4th Annual Symposium on Cloud Computing. ACM, 2013: 5.). The stream processing framework Storm preferentially schedules frequently communicating tasks onto the same or nearby machines (see Aniello L, Baldoni R, Querzoni L. Adaptive online scheduling in Storm [C]//Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems. ACM, 2013: 207-218.). In DAG computing scenarios, Spark tries to schedule a successor task onto the machine that holds the output data of its predecessor tasks (see The Apache Software Foundation. Spark: Lightning-fast cluster computing [EB/OL]. (2012-1-10) [2014-9-5]. https://spark.apache.org/.). Where data locality and fairness conflict, delay scheduling can be used to trade one off against the other, and this approach has achieved good results (see Zaharia M, Borthakur D, Sen Sarma J, et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling [C]//Proceedings of the 5th European Conference on Computer Systems. ACM, 2010: 265-278.). The work above addresses specific special cases of constraints. Constraint types are diverse, however, and a cluster scheduler in production usually has to schedule different kinds of tasks and handle many kinds of constraints. The work above does not fully consider the heterogeneity of tasks and machines and does not propose an extensible constraint-based scheduling mechanism.

Current general-purpose task scheduling methods still assume that tasks and machines are homogeneous. They ignore constraints during scheduling and consider only basic resource matching. Because tasks and machines are heterogeneous, however, task scheduling must take various constraints into account. Existing methods can neither describe a wide variety of constraints nor schedule tasks onto the optimal machine according to those constraints. As a result, tasks may fail to execute correctly or may run markedly longer, which seriously degrades task execution efficiency and the overall performance of the task scheduling system.

Summary of the invention

Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing a mixed-load-oriented task scheduling method and system for heterogeneous clusters. The scheduling is constraint-aware and can place each task on the optimal machine according to its constraints, improving task execution efficiency and overall system performance.

Technical solution of the invention: a mixed-load-oriented task scheduling method in a heterogeneous cluster, comprising the following steps:

Step 1: the resource scheduler receives machine heartbeats and maintains the attribute cluster of each machine. A machine heartbeat is sent to the resource scheduler by the executor at regular intervals, and its content is the machine's attribute cluster;

Step 2: the job manager receives and parses a job to obtain several tasks;

Step 3: the job manager sets an attribute cluster and constraint requirements for each task, then sends the task information to the resource scheduler;

Step 4: after receiving the task information, the resource scheduler matches the task with the optimal machine that satisfies its constraints and returns the task-to-machine matching to the job manager;

Step 5: after receiving the task-to-machine matching, the job manager dispatches the task to the executor on the matched machine, which executes the task.

Further, the machine attribute cluster mentioned in step 1 comprises multiple key-value pairs, in which the key names a machine attribute and the value gives that attribute's concrete value. The attributes include machine hostname, IP address, machine type, machine architecture, operating system, total number of CPUs, total memory, available CPU, available memory, constraint valuation, and so on.

Further, the constraint requirements mentioned in step 3 include hard constraints and soft constraints. A hard constraint is a necessary condition for task execution and must be satisfied during scheduling; handling hard constraints is a qualitative analysis. A soft constraint is a preference for task execution and should be satisfied where possible to improve execution efficiency. If it cannot be satisfied, it can be ignored, so as to avoid wasting resources and delaying task execution; handling soft constraints is a quantitative analysis.

Further, step 3 specifically comprises the following steps:

Step 3.1: set an easily extensible attribute cluster for the task. The attribute cluster comprises multiple key-value pairs, in which the key names a task attribute and the value gives that attribute's concrete value. The task attribute cluster includes the task identifier, execution command, required CPU resources, required memory resources, and so on;

Step 3.2: set the hard constraint requirement for the task, expressed as a single Boolean expression. If there are multiple hard constraints, they are combined with a logical AND, so that multiple hard constraint requirements can still be expressed as a single Boolean expression;

Step 3.3: set the soft constraint requirements for the task. The task's soft constraint requirements are represented by a soft constraint linked list. The list contains several elements, each comprising a Boolean expression and a valuation. The Boolean expression states the specific soft constraint requirement, and the valuation quantifies the improvement in execution efficiency gained by satisfying it;

Step 3.4: the job manager sends the task's attribute cluster and its soft and hard constraint requirements to the resource scheduler, requesting a machine allocation.

Further, step 4 specifically comprises the following steps:

Step 4.1: mark the received task as the "task to be scheduled", initialize the machine set M by placing all machines in it, and initialize the candidate machine list as empty;

Step 4.2: take one machine out of the machine set M and mark it as the "candidate machine". Using the machine and task information, evaluate the hard constraint requirement of the task to be scheduled;

Step 4.3: check whether the hard constraint requirement of the task to be scheduled is true. If it is, add the candidate machine to the candidate machine list and compute the candidate machine's constraint valuation from the machine information and the task's soft constraint linked list;

Step 4.4: remove the candidate machine from the machine set M and check whether M is empty. If it is not empty, go to step 4.2;

Step 4.5: using the constraint valuation as the criterion, select the machine with the largest constraint valuation from the candidate machine list and mark it as the matched machine for the task to be scheduled.

Further, in step 4.3, computing the constraint valuation of the candidate machine comprises:

initializing the candidate machine's constraint valuation to 0;

traversing the soft constraint linked list of the task to be scheduled and evaluating the soft constraint requirement of each element. If the requirement is true, the element's valuation is added to the current constraint valuation. The final sum is the candidate machine's constraint valuation.
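Steps 4.1 to 4.5 and the valuation computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: constraints are modeled as Python callables taking `(machine, task)`, the soft constraint linked list is a plain list of `(expression, valuation)` pairs, and the attribute `HAS_IMAGE`, the hostnames `m1` to `m3`, and the behavior of returning `None` when no machine passes the hard constraint are all hypothetical choices that the source does not specify.

```python
def constraint_valuation(machine, task, soft_constraints):
    """Sum the valuations of all soft constraints the machine satisfies."""
    valuation = 0  # the valuation starts at 0
    for expression, value in soft_constraints:
        if expression(machine, task):
            valuation += value
    return valuation


def match_machine(machines, task, hard_constraint, soft_constraints):
    """Filter by the hard constraint, then pick the highest valuation."""
    candidates = []  # step 4.1: empty candidate list
    for machine in machines:  # steps 4.2 and 4.4: visit every machine once
        if hard_constraint(machine, task):  # step 4.3: hard constraint is a filter
            score = constraint_valuation(machine, task, soft_constraints)
            candidates.append((score, machine))
    if not candidates:
        return None  # assumption: the source does not define this case
    # step 4.5: largest valuation wins (first one wins on ties)
    return max(candidates, key=lambda pair: pair[0])[1]


# Hypothetical data for illustration only.
task = {"ATTR_NEED_CPU": 2.0}
machines = [
    {"ATTR_MACHINE": "m1", "ATTR_AVAIL_CPU": 1.0, "HAS_IMAGE": True, "ATTR_AVG_LOAD": 0.1},
    {"ATTR_MACHINE": "m2", "ATTR_AVAIL_CPU": 4.0, "HAS_IMAGE": False, "ATTR_AVG_LOAD": 0.2},
    {"ATTR_MACHINE": "m3", "ATTR_AVAIL_CPU": 8.0, "HAS_IMAGE": True, "ATTR_AVG_LOAD": 3.0},
]
hard = lambda m, t: m["ATTR_AVAIL_CPU"] >= t["ATTR_NEED_CPU"]
soft = [
    (lambda m, t: m["HAS_IMAGE"], 10),           # prefer machines that cache the image
    (lambda m, t: m["ATTR_AVG_LOAD"] < 1.0, 3),  # prefer lightly loaded machines
]
best = match_machine(machines, task, hard, soft)
```

Here `m1` is filtered out by the hard constraint even though it satisfies both soft constraints, which illustrates why filtering must come before scoring.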

To solve the above technical problem, the invention also provides a mixed-load-oriented task scheduling system in a heterogeneous cluster, comprising a job manager, a resource scheduler, and executors;

The job manager and the resource scheduler are deployed on the master node. The job manager manages jobs and tasks, sets an attribute cluster and soft and hard constraint requirements for each task, and sends the task information to the resource scheduler to request the machine the task needs;

The resource scheduler receives the machine heartbeats that the executors send at regular intervals. On the basis of maintaining the heartbeats of all machines in the cluster, the resource scheduler receives the task information sent by the job manager and matches each task with the optimal machine that satisfies its constraints;

The executors are deployed on all machines other than the master node. Each executor reports machine heartbeats to the resource scheduler at regular intervals, receives task instructions issued by the job manager, and is responsible for actually executing tasks.

Compared with the prior art, the invention has the following advantages:

(1) The proposed task scheduling method and system use an easily extensible constraint description method to express heterogeneous machine attributes and task requirements. On this basis, hard and soft constraints are treated differently: hard constraints serve as the filtering criterion and soft constraints as the selection criterion, so that each task is matched with the optimal machine that satisfies its hard constraints. The invention considers all of these constraints together during task scheduling and significantly improves task execution efficiency and overall system performance.

(2) Task execution efficiency was measured under constraint-satisfying scheduling and under constraint-ignoring scheduling in order to verify the effectiveness of the proposed constraint-aware scheduling method. Figure 11 records task startup times in a virtual machine scenario. The data show that tasks start markedly faster when constraints are satisfied than when they are ignored; the speedup depends on the image size and ranges from 6.91 to 24.18 in this set of experiments. Figure 12 records task completion times in a scenario where tasks communicate with each other. The data show that tasks likewise complete markedly faster when constraints are satisfied than when they are ignored; the speedup depends on the data size and network conditions and is about 2.25 in this set of experiments. Overall, the proposed task scheduling method and system can handle a variety of constraints and significantly improve task execution efficiency.

Description of drawings

Figure 1 is a schematic diagram of the principle of the task scheduling method and system in an embodiment of the invention;

Figure 2 is a flowchart of the task scheduling method in an embodiment of the invention;

Figure 3 is a schematic diagram of a machine attribute cluster in an embodiment of the invention;

Figure 4 is a flowchart of setting the task attribute cluster and constraint requirements in an embodiment of the invention;

Figure 5 is a schematic diagram of a task attribute cluster in an embodiment of the invention;

Figure 6 is a schematic diagram of task hard constraints in an embodiment of the invention;

Figure 7 is a schematic diagram of a task soft constraint linked list in an embodiment of the invention;

Figure 8 is a flowchart of assigning the optimal machine to a task in an embodiment of the invention;

Figure 9 is a schematic diagram of evaluating a task's hard constraints in an embodiment of the invention;

Figure 10 is a schematic diagram of computing a machine's constraint valuation in an embodiment of the invention;

Figure 11 shows task startup times in the virtual machine scenario in an embodiment of the invention;

Figure 12 shows task completion times in the inter-task communication scenario in an embodiment of the invention.

Detailed description

The principles and features of the invention are described below with reference to the drawings and embodiments. The examples given serve only to explain the invention and are not intended to limit its scope.

As shown in Figure 1, an embodiment of the invention implements a mixed-load-oriented task scheduling system running on a heterogeneous cluster. The system uses a typical master-slave architecture. The master part comprises two core processes, the job manager (Jobs Manager) and the resource scheduler (Resource Scheduler), both deployed on the master physical node. The slave part comprises one core process, the executor (Executor), deployed on every machine other than the master physical node.

The job manager is responsible for managing jobs and tasks. A job comprises several tasks, and a (job ID, task ID) pair uniquely identifies a task. The job manager receives and parses the jobs submitted by users, sets the attribute cluster and the soft and hard constraints for each task according to the job parameters, and sends the task information to the resource scheduler to request the required machine. After obtaining the matched machine, the job manager dispatches the task to that machine, monitors the task's execution status, and handles fault tolerance.

The resource scheduler is responsible for receiving the machine heartbeat messages that the executors send at regular intervals; each heartbeat carries the machine's attribute cluster. On the basis of maintaining the attribute clusters of all machines in the cluster, the resource scheduler receives the task information sent by the job manager, assigns each task the optimal machine that satisfies its constraints, and returns the task-to-machine matching to the job manager as the result.

The executor is responsible for receiving instructions from the job manager, starting and executing tasks, and reporting task status to the job manager. In addition, the executor periodically reports the machine's attribute cluster to the resource scheduler in the form of a heartbeat. For each task, the executor first creates a virtualized environment and then executes the task inside it.

As shown in Figure 2, in this embodiment the task scheduling method may comprise the following steps:

Step 201: the resource scheduler receives machine heartbeats and maintains the attribute cluster of each machine;

Step 202: the job manager receives and parses a job to obtain several tasks;

Step 203: the job manager sets an attribute cluster and constraint requirements for each task, then sends the task information to the resource scheduler;

Step 204: after receiving the task information, the resource scheduler matches the task with the optimal machine that satisfies its constraints and returns the task-to-machine matching to the job manager;

Step 205: after receiving the task-to-machine matching, the job manager dispatches the task to the executor on the matched machine, which executes the task.
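The message flow of steps 201 to 205 can be sketched as three cooperating Python objects. This is only an in-process illustration under assumptions: the job format (`job_id`, `commands`, `cpu_per_task`), the hostname `Blade11`, and the first-fit placeholder inside `match` are hypothetical, and a real deployment would exchange these messages over the network and would also update available resources after each placement.

```python
class Executor:
    """Runs on a worker machine: reports heartbeats and executes tasks."""

    def __init__(self, hostname, avail_cpu):
        self.hostname = hostname
        self.avail_cpu = avail_cpu
        self.executed = []

    def heartbeat(self):
        # The heartbeat content is the machine's attribute cluster.
        return {"ATTR_MACHINE": self.hostname, "ATTR_AVAIL_CPU": self.avail_cpu}

    def run(self, task):
        self.executed.append((task["ATTR_JOB_ID"], task["ATTR_TASK_ID"]))


class ResourceScheduler:
    def __init__(self):
        self.machines = {}  # hostname -> latest attribute cluster

    def on_heartbeat(self, attrs):  # step 201
        self.machines[attrs["ATTR_MACHINE"]] = attrs

    def match(self, task):  # step 204, placeholder first-fit policy
        for host, attrs in self.machines.items():
            if attrs["ATTR_AVAIL_CPU"] >= task["ATTR_NEED_CPU"]:
                return host
        return None


class JobManager:
    def __init__(self, scheduler, executors):
        self.scheduler = scheduler
        self.executors = {e.hostname: e for e in executors}

    def submit(self, job):
        placements = {}
        for task_id, command in enumerate(job["commands"]):  # step 202
            task = {  # step 203: set the task attribute cluster
                "ATTR_JOB_ID": job["job_id"],
                "ATTR_TASK_ID": task_id,
                "ATTR_EXE_PATH": command,
                "ATTR_NEED_CPU": job["cpu_per_task"],
            }
            host = self.scheduler.match(task)  # step 204
            if host is not None:
                self.executors[host].run(task)  # step 205
            placements[task_id] = host
        return placements


executors = [Executor("Blade10", 1.0), Executor("Blade11", 4.0)]
scheduler = ResourceScheduler()
for executor in executors:
    scheduler.on_heartbeat(executor.heartbeat())
manager = JobManager(scheduler, executors)
placements = manager.submit(
    {"job_id": 1, "commands": ["run.sh", "report.sh"], "cpu_per_task": 2.0}
)
```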

Figure 3 is a schematic diagram of a machine attribute cluster in an embodiment of the invention. It shows the attribute cluster of one machine together with the corresponding values. The machine attribute cluster is the actual content of the machine heartbeat, which the executor sends to the resource scheduler at regular intervals. The attribute cluster comprises multiple key-value pairs, in which the key names a machine attribute and the value gives that attribute's concrete value. The machine attributes include machine hostname, IP address, machine type, machine architecture, operating system, total number of CPUs, total memory, available CPU, available memory, constraint valuation, and so on; Table 1 lists the machine attribute cluster. In the example shown in Figure 3, the machine's hostname is "Blade10", its IP address is "192.168.1.160", its machine type is "A", and its architecture is "X86_64". Figure 3 also shows the machine's other attributes.

Table 1 Machine attribute cluster

Attribute name    Description             Data type
ATTR_MACHINE      Machine hostname        string
ATTR_IP           IP address              string
ATTR_TYPE         Machine type            string
ATTR_ARCH         Machine architecture    string
ATTR_OS           Operating system        string
ATTR_TOTAL_CPU    Total number of CPUs    double
ATTR_TOTAL_MEM    Total memory            int
ATTR_AVAIL_CPU    Available CPU           double
ATTR_AVAIL_MEM    Available memory        int
ATTR_AVG_LOAD     Average load            double
CON_VALUE         Constraint valuation    int
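As an illustration, a machine attribute cluster can be held as a plain Python dictionary and checked against the data types in Table 1. The sketch below is an assumption about representation, not the format used by the invention: `double` is mapped to Python `float`, the helper `validate_machine_cluster` is hypothetical, and only the first four values of `blade10` come from the Figure 3 example; the remaining values are made up for the demonstration.

```python
# Data types from Table 1 ("double" is mapped to float here).
TABLE1_TYPES = {
    "ATTR_MACHINE": str,
    "ATTR_IP": str,
    "ATTR_TYPE": str,
    "ATTR_ARCH": str,
    "ATTR_OS": str,
    "ATTR_TOTAL_CPU": float,
    "ATTR_TOTAL_MEM": int,
    "ATTR_AVAIL_CPU": float,
    "ATTR_AVAIL_MEM": int,
    "ATTR_AVG_LOAD": float,
    "CON_VALUE": int,
}


def validate_machine_cluster(cluster):
    """Return the keys whose values do not match the Table 1 data type."""
    problems = []
    for key, value in cluster.items():
        expected = TABLE1_TYPES.get(key)
        if expected is None:
            continue  # unknown keys are allowed: the cluster is extensible
        if not isinstance(value, expected):
            problems.append(key)
    return problems


blade10 = {
    "ATTR_MACHINE": "Blade10",   # from the Figure 3 example
    "ATTR_IP": "192.168.1.160",  # from the Figure 3 example
    "ATTR_TYPE": "A",            # from the Figure 3 example
    "ATTR_ARCH": "X86_64",       # from the Figure 3 example
    "ATTR_OS": "Centos6.3",      # hypothetical value
    "ATTR_TOTAL_CPU": 8.0,       # hypothetical value
    "ATTR_TOTAL_MEM": 16384,     # hypothetical value
    "ATTR_AVAIL_CPU": 4.0,       # hypothetical value
    "ATTR_AVAIL_MEM": 8192,      # hypothetical value
    "ATTR_AVG_LOAD": 0.5,        # hypothetical value
    "CON_VALUE": 0,
}
problems = validate_machine_cluster(blade10)
```

Because unknown keys pass through unchecked, a new attribute can be added to a heartbeat without changing the validation code, which is one way to read the "easily extensible" property of the attribute cluster.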

Figure 4 is a flowchart of setting the task attribute cluster and constraint requirements in an embodiment of the invention. Note that constraint requirements include hard constraints and soft constraints. A hard constraint is a necessary condition for task execution and must be satisfied during scheduling; handling hard constraints is a qualitative analysis. A soft constraint is a preference for task execution and should be satisfied where possible to improve execution efficiency. If it cannot be satisfied, it can be ignored, so as to avoid wasting resources and delaying task execution; handling soft constraints is a quantitative analysis. As shown in Figure 4, the job manager sets the task attribute cluster and constraint requirements as follows:

Step 401: set an easily extensible attribute cluster for the task. The attribute cluster comprises multiple key-value pairs, in which the key names a task attribute and the value gives that attribute's concrete value, including the task identifier, execution command, required resources, and so on;

Step 402: set the hard constraint requirement for the task, expressed as a single Boolean expression;

Further, a task may have multiple hard constraints. These can be combined directly with a logical AND, so that multiple hard constraint requirements can still be expressed as a single Boolean expression;

Step 403: set the soft constraint requirements for the task. The task's soft constraint requirements are represented by a soft constraint linked list. The list contains several elements, each comprising a Boolean expression and a valuation. The Boolean expression states the specific soft constraint requirement, and the valuation quantifies the improvement in execution efficiency gained by satisfying it.

Figure 5 is a schematic diagram of a task attribute cluster in an embodiment of the invention. It shows the attribute cluster of one task together with the corresponding values. The task attribute cluster is set by the job manager. Like the machine attribute cluster, it comprises multiple key-value pairs, in which the key names a task attribute and the value gives that attribute's concrete value. The task attributes include job ID, task ID, virtualization type, execution command, required CPU, required memory, and so on; Table 2 lists the task attribute cluster and constraint requirements. In the example shown in Figure 5, the task's job ID is 1, its task ID is 2, its virtualization type is "KVM", its execution command is "run.sh", its required CPU is 2, and its required memory is 2048 (MB).

Table 2 Attribute cluster and constraint requirements of a task

| Attribute/constraint name | Description | Data type |
|---|---|---|
| ATTR_JOB_ID | Job ID | int |
| ATTR_TASK_ID | Task ID | int |
| ATTR_VMTYPE | Virtualization type | string |
| ATTR_EXE_PATH | Execution command | string |
| ATTR_NEED_CPU | Required CPU | double |
| ATTR_NEED_MEM | Required memory | int |
| HARD_CONSTRAINT | Hard constraint requirement | Boolean expression |
| SOFT_CON_LIST | Soft constraint linked list | linked list |
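As an illustration, the task attribute cluster of Fig. 5 can be sketched as a plain key-value mapping in Python. The dictionary layout, the `TASK_SCHEMA` name, the `validate` helper and the extra `ATTR_PRIORITY` attribute are assumptions made for this sketch and are not part of the disclosed system; only the attribute names, data types and example values come from Table 2 and Fig. 5.

```python
# Hypothetical sketch: a task attribute cluster as a key-value mapping.
# Attribute names and types follow Table 2; the values are those of Fig. 5.

# Expected Python type for each attribute (assumed mapping of the Table 2 types).
TASK_SCHEMA = {
    "ATTR_JOB_ID": int,
    "ATTR_TASK_ID": int,
    "ATTR_VMTYPE": str,
    "ATTR_EXE_PATH": str,
    "ATTR_NEED_CPU": float,  # "double" in Table 2
    "ATTR_NEED_MEM": int,    # in MB
}

task = {
    "ATTR_JOB_ID": 1,
    "ATTR_TASK_ID": 2,
    "ATTR_VMTYPE": "KVM",
    "ATTR_EXE_PATH": "run.sh",
    "ATTR_NEED_CPU": 2.0,
    "ATTR_NEED_MEM": 2048,
}


def validate(attrs, schema=TASK_SCHEMA):
    """Return the names of attributes that are missing or have the wrong type."""
    return [
        key for key, expected in schema.items()
        if key not in attrs or not isinstance(attrs[key], expected)
    ]


# Because the cluster is an open mapping, a new attribute can be added
# without touching the checks for the existing ones.
task["ATTR_PRIORITY"] = 5  # hypothetical extra attribute, not in Table 2
```

Keeping the cluster as an open mapping is one way to obtain the extensibility the description asks for: unknown keys are simply carried along until some constraint expression refers to them.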

Fig. 6 is a schematic diagram of task hard constraints in an embodiment of the present invention. A hard constraint is a necessary condition for executing a task, and evaluating it has only two possible outcomes: satisfied or not satisfied. Handling hard constraints is therefore a qualitative analysis. A task may have multiple hard constraints, which can be combined directly with a logical AND.

In the example of Fig. 6, the task has four hard constraints, each expressed as a Boolean expression. "ATTR_AVAIL_CPU>=ATTR_NEED_CPU" requires that the CPU currently available on the machine be greater than or equal to the CPU the task needs; "ATTR_AVAIL_MEM>=ATTR_NEED_MEM" requires that the memory currently available on the machine be greater than or equal to the memory the task needs; "ATTR_ARCH==X86_64" requires the machine architecture to be "X86_64"; and "ATTR_OS==Centos6.3" requires the machine's operating system to be "Centos6.3". The first two are resource-level constraint requirements, which ensure that the machine holds the resources the task needs; the last two are non-resource-level constraint requirements. Finally, the four hard constraints can be ANDed together directly to give HARD_CONSTRAINT=(ATTR_AVAIL_CPU>=ATTR_NEED_CPU)&&(ATTR_AVAIL_MEM>=ATTR_NEED_MEM)&&(ATTR_ARCH==X86_64)&&(ATTR_OS==Centos6.3), so that a single Boolean expression still represents all of the task's hard constraint requirements.

Fig. 7 is a schematic diagram of a task's soft constraint linked list in an embodiment of the present invention. A soft constraint is a preference condition for executing a task: it should be satisfied where possible, but it is not mandatory. Evaluating soft constraints is not a matter of satisfied versus not satisfied alone; it must take into account how many of the soft constraints are satisfied and how much each satisfied constraint improves task execution. Handling soft constraints is therefore a quantitative analysis. The invention represents a task's multiple soft constraint requirements with a soft constraint requirement linked list. The list contains several elements, each comprising a Boolean expression and a valuation: the Boolean expression states the specific soft constraint requirement, and the valuation quantifies the improvement in execution efficiency gained by satisfying it.

In the example of Fig. 7, the task has three soft constraints. The first is "ATTR_IP in(192.168.1.160,192.168.170,192.168.1.180)", meaning that the machine's IP address should preferably be one of the listed addresses; its valuation is 50, meaning that satisfying it yields a performance gain of 50. The second is "ATTR_TYPE==A", meaning that the machine should preferably be of type A; its valuation is 30, meaning that satisfying it yields a performance gain of 30. The third is "ATTR_AVG_LOAD<=0.5", meaning that the machine's average load should preferably be at most 0.5; its valuation is 20, meaning that satisfying it yields a performance gain of 20.

Fig. 8 is a flow chart of assigning the optimal machine to a task in an embodiment of the present invention. As shown in Fig. 8, in this embodiment, assigning to a task a machine that satisfies its constraints and is optimal comprises the following steps:

Step 801: Mark the received task as the "task to be scheduled", initialize the machine set M by placing all machines in it, and initialize the candidate machine list as empty;

Step 802: Select one machine from the machine set M and mark it as the "candidate machine"; using the machine and task information, compute the value of the hard constraint requirement of the task to be scheduled;

Step 803: Determine whether the task's hard constraint is true; if it is, go to step 804; otherwise, go to step 805;

Step 804: Add the candidate machine to the candidate machine list and, using the machine information and the task's soft constraint linked list, compute the candidate machine's constraint valuation;

Step 805: Remove the candidate machine from the machine set M;

Step 806: Determine whether the machine set M is empty; if it is, go to step 807; otherwise, go to step 802;

Step 807: Using the constraint valuation as the criterion, select the machine with the largest constraint valuation from the candidate machine list and mark it as the matching machine for the task to be scheduled.
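Steps 801 to 807 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: constraints are modelled as Python callables rather than parsed Boolean expressions, the soft constraint linked list is a plain list of pairs, and the names `Task` and `match_machine` as well as the `None` result when no machine passes the hard constraint are choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Attrs = Dict[str, object]
# Assumed model: a constraint is a predicate over (machine attrs, task attrs).
Predicate = Callable[[Attrs, Attrs], bool]


@dataclass
class Task:
    attrs: Attrs
    hard_constraint: Predicate                      # one combined Boolean expression
    soft_constraints: List[Tuple[Predicate, int]]   # (expression, valuation) pairs


def match_machine(task: Task, machines: List[Attrs]) -> Optional[Attrs]:
    """Return the machine that passes the hard constraint and has the largest
    constraint valuation, or None if no machine qualifies."""
    pending = list(machines)      # step 801: machine set M (copied, input untouched)
    candidates = []               # step 801: empty candidate machine list
    while pending:                # step 806: repeat until M is empty
        machine = pending.pop()   # steps 802 and 805: take a machine out of M
        if task.hard_constraint(machine, task.attrs):           # step 803: filter
            value = sum(v for pred, v in task.soft_constraints
                        if pred(machine, task.attrs))            # step 804: score
            candidates.append((value, machine))
    if not candidates:
        return None               # assumed behaviour; the source does not specify it
    return max(candidates, key=lambda c: c[0])[1]                # step 807: select
```

The hard constraint acts purely as a filter and the soft constraints purely as a ranking, so a machine with a high valuation is never chosen if it fails any hard requirement.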

Fig. 9 is a schematic diagram of evaluating a task's hard constraint in an embodiment of the present invention. In the example of Fig. 9, the machine attribute cluster records the machine's attributes, including host name "Blade10", architecture "x86_64", operating system "Centos6.3", 13 available CPU cores and 23552 MB of available memory. The task information records a resource requirement of 2 CPU cores and 2 GB of memory, and hard constraint requirements that the machine's available resources be no less than the resources the task needs, that the architecture be "X86_64" and that the operating system be "Centos6.3". The hard constraint requirement can be written as HARD_CONSTRAINT=(ATTR_AVAIL_CPU>=ATTR_NEED_CPU)&&(ATTR_AVAIL_MEM>=ATTR_NEED_MEM)&&(ATTR_ARCH==X86_64)&&(ATTR_OS==Centos6.3). Evaluating HARD_CONSTRAINT against the machine and task attribute clusters returns true, which shows that this machine satisfies the task's hard constraint requirement.
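The evaluation in Fig. 9 can be reproduced with a short Python sketch. The dictionary representation, the `ATTR_HOSTNAME` key and the `hard_constraint` function are assumptions for illustration. The case-insensitive architecture comparison is also an assumption, introduced because the text writes "x86_64" for the machine and "X86_64" in the constraint.

```python
# Machine attribute cluster from Fig. 9 (only the attributes used here).
machine = {
    "ATTR_HOSTNAME": "Blade10",  # key name assumed; the source gives only the value
    "ATTR_ARCH": "x86_64",
    "ATTR_OS": "Centos6.3",
    "ATTR_AVAIL_CPU": 13,        # cores
    "ATTR_AVAIL_MEM": 23552,     # MB
}

# Task requirements from Fig. 9: 2 CPU cores and 2 GB (2048 MB) of memory.
task = {"ATTR_NEED_CPU": 2, "ATTR_NEED_MEM": 2048}


def hard_constraint(m, t):
    """AND of the four hard constraints; the result is a single Boolean."""
    return (
        m["ATTR_AVAIL_CPU"] >= t["ATTR_NEED_CPU"]
        and m["ATTR_AVAIL_MEM"] >= t["ATTR_NEED_MEM"]
        # Assumption: architecture names are compared case-insensitively.
        and m["ATTR_ARCH"].lower() == "x86_64"
        and m["ATTR_OS"] == "Centos6.3"
    )


print(hard_constraint(machine, task))  # True
```

Changing any single attribute so that one conjunct fails, for example a different operating system, makes the whole expression false and excludes the machine from the candidate list.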

Fig. 10 is a schematic diagram of computing a machine's constraint valuation in an embodiment of the present invention. The invention maintains a constraint valuation (CON_VALUE) for each machine, which quantifies how well the machine matches the task's soft constraints overall. It is computed roughly as follows: first, initialize the machine's constraint valuation to 0; then traverse the task's soft constraint linked list and, for each element, evaluate its soft constraint requirement and, if it is true, add that element's valuation to the constraint valuation; the value obtained at the end is the result.
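The computation can also be written as a formula. The notation is introduced here for illustration only and does not appear in the source: $m$ is the machine, $n$ the number of elements in the soft constraint linked list, $c_i$ the Boolean expression of the $i$-th element and $v_i$ its valuation.

```latex
\mathrm{CON\_VALUE}(m) \;=\; \sum_{i=1}^{n} v_i \,\mathbf{1}\!\left[\, c_i(m) \text{ is true} \,\right]
```

Here $\mathbf{1}[\cdot]$ equals 1 when the condition holds and 0 otherwise, so only the valuations of satisfied soft constraints contribute to the sum.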

In the example of Fig. 10, the machine attribute cluster records IP address "192.168.1.160", machine type "B" and average load 0.3, and the constraint valuation starts at 0. The task's soft constraint linked list is then traversed. For soft constraint 1, the requirement ATTR_IP in(192.168.1.160,192.168.170,192.168.1.180) is true, so its valuation of 50 is added to the constraint valuation. For soft constraint 2, the requirement (ATTR_TYPE==A) is not true, so nothing is added. For soft constraint 3, the requirement (ATTR_AVG_LOAD<=0.5) is true, so its valuation of 20 is added. The final constraint valuation is 70.
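The Fig. 10 calculation can be checked with the following Python sketch. It is illustrative only: the `SoftConstraint` class and `constraint_valuation` function are hypothetical names, a Python list stands in for the linked list, and the second IP address, which the text gives as "192.168.170", is assumed to be 192.168.1.170.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SoftConstraint:
    expression: Callable[[Dict[str, object]], bool]  # Boolean expression over machine attributes
    valuation: int                                   # gain credited when the expression is true


# Assumption: the middle address, written "192.168.170" in the source,
# is taken to be 192.168.1.170.
PREFERRED_IPS = {"192.168.1.160", "192.168.1.170", "192.168.1.180"}

soft_con_list: List[SoftConstraint] = [
    SoftConstraint(lambda m: m["ATTR_IP"] in PREFERRED_IPS, 50),
    SoftConstraint(lambda m: m["ATTR_TYPE"] == "A", 30),
    SoftConstraint(lambda m: m["ATTR_AVG_LOAD"] <= 0.5, 20),
]

# Machine attribute cluster from Fig. 10.
machine = {"ATTR_IP": "192.168.1.160", "ATTR_TYPE": "B", "ATTR_AVG_LOAD": 0.3}


def constraint_valuation(m, constraints):
    con_value = 0                     # initialize the constraint valuation to 0
    for c in constraints:             # traverse the soft constraint list
        if c.expression(m):
            con_value += c.valuation  # add the valuation of each satisfied element
    return con_value


print(constraint_valuation(machine, soft_con_list))  # 70
```

A machine that satisfied all three soft constraints would score 100, and one that satisfied none would score 0 while still remaining eligible, since soft constraints never exclude a machine.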

Fig. 11 shows task startup time in a virtual machine application scenario in an embodiment of the present invention. The example of Fig. 11 records the startup time of tasks scheduled with constraints satisfied and with constraints ignored, for different virtual machine image sizes. The solid line is the startup time when constraints are satisfied, and the dashed line is the startup time when constraints are ignored. The data show that tasks start markedly faster when constraints are satisfied. The speedup depends on the image size and ranges from 6.91 to 24.18 in this set of embodiments.

Fig. 12 shows task completion time in an inter-task communication application scenario in an embodiment of the present invention. The example of Fig. 12 records the completion time of tasks scheduled with constraints satisfied and with constraints ignored, for different data sizes. The solid line is the completion time when constraints are satisfied, and the dashed line is the completion time when constraints are ignored. The data show that tasks complete markedly faster when constraints are satisfied. The speedup depends on the data size and the network state and is about 2.25 in this embodiment. Overall, the task scheduling method proposed by the present invention can handle a variety of constraint situations and significantly improves task execution efficiency.

The above embodiments are provided only to describe the present invention and are not intended to limit its scope. The scope of the invention is defined by the appended claims. All equivalent replacements and modifications made without departing from the spirit and principle of the present invention shall fall within its scope.

Claims (7)

1. A task scheduling method for mixed load in a heterogeneous cluster, characterized by comprising the following steps:
Step 1: a resource scheduler receives machine heartbeats and maintains the attribute clusters of the machines; the machine heartbeat is sent periodically by an executor to the resource scheduler, and its content is the attribute cluster of the machine;
Step 2: a job manager receives and parses a job to obtain several tasks;
Step 3: the job manager sets an attribute cluster and constraint requirements for each task, and then sends the task information to the resource scheduler;
Step 4: after receiving the task information, the resource scheduler matches the task with a machine that satisfies the constraints and is optimal, and returns the matching relationship between task and machine to the job manager;
Step 5: after receiving the matching relationship between task and machine, the job manager dispatches the task to the executor on the matched machine, which executes the task.
2. The task scheduling method for mixed load in a heterogeneous cluster according to claim 1, characterized in that: in step 1, the attribute cluster of the machine comprises multiple key-value pairs, where the key denotes a machine attribute and the value denotes the specific value of that attribute; the attributes include the machine host name, IP address, machine type, machine architecture, operating system, total CPU, total memory, available CPU, available memory and constraint valuation.
3. The task scheduling method for mixed load in a heterogeneous cluster according to claim 1, characterized in that: the constraint requirements in step 3 comprise hard constraints and soft constraints; a hard constraint is a necessary condition for task execution and must be satisfied during scheduling; a soft constraint is a preference condition for task execution and should be satisfied as far as possible to improve task execution efficiency, but may be ignored if it cannot be satisfied, so as to avoid wasting resources and delaying task execution.
4. The task scheduling method for mixed load in a heterogeneous cluster according to claim 1, characterized in that step 3 is implemented as follows:
Step 3.1: set an easily extensible attribute cluster for the task; the attribute cluster comprises multiple key-value pairs, where the key denotes a task attribute and the value denotes the specific value of that attribute; the attribute cluster of the task includes the task identifier, execution command, required CPU resources and required memory resources;
Step 3.2: set the hard constraint requirement for the task, representing it as a Boolean expression; if there are multiple hard constraints, combine them with a logical AND, so that a single Boolean expression still represents the multiple hard constraint requirements;
Step 3.3: set the soft constraint requirements for the task, representing the task's multiple soft constraint requirements with a soft constraint requirement linked list; the list contains several elements, each comprising a Boolean expression and a valuation, where the Boolean expression states the specific soft constraint requirement and the valuation quantifies the improvement in execution efficiency gained by satisfying it;
Step 3.4: the job manager sends the task's attribute cluster and its hard and soft constraint requirements to the resource scheduler and requests that a machine be assigned.
5. The task scheduling method for mixed load in a heterogeneous cluster according to claim 1, characterized in that step 4 is implemented as follows:
Step 4.1: mark the received task as the "task to be scheduled", initialize the machine set M by placing all machines in it, and initialize the candidate machine list as empty;
Step 4.2: select one machine from the machine set M and mark it as the "candidate machine"; using the machine and task information, compute the value of the hard constraint requirement of the task to be scheduled;
Step 4.3: determine whether the hard constraint requirement of the task to be scheduled is true; if it is, add the candidate machine to the candidate machine list and, using the machine information and the task's soft constraint linked list, compute the candidate machine's constraint valuation;
Step 4.4: remove the candidate machine from the machine set M and determine whether M is empty; if it is not empty, go to step 4.2;
Step 4.5: using the constraint valuation as the criterion, select the machine with the largest constraint valuation from the candidate machine list and mark it as the matching machine for the task to be scheduled.
6. The task scheduling method for mixed load in a heterogeneous cluster according to claim 1, characterized in that: in step 4.3, computing the constraint valuation of the candidate machine comprises:
initializing the constraint valuation of the candidate machine to 0;
traversing the soft constraint linked list of the task to be scheduled and, for each element, evaluating its soft constraint requirement; if the soft constraint requirement is true, adding the valuation of that soft constraint element to the current constraint valuation; the value obtained at the end is the constraint valuation of the candidate machine.
7. A task scheduling system for mixed load in a heterogeneous cluster, characterized by comprising a job manager, a resource scheduler and executors; the job manager and the resource scheduler are deployed on a master node, wherein:
the job manager manages jobs and tasks, sets an attribute cluster and hard and soft constraint requirements for each task, and sends the task information to the resource scheduler to request the machine the task needs;
the resource scheduler receives the machine heartbeats sent periodically by the executors; on the basis of maintaining the machine heartbeats of the whole cluster, the resource scheduler receives the task information sent by the job manager and matches the task with a machine that satisfies the constraints and is optimal;
the executors are deployed on all machines other than the master node, periodically report machine heartbeats to the resource scheduler, receive the task instructions dispatched by the job manager, and are responsible for actually executing the tasks.
CN201410543294.2A 2014-10-14 2014-10-14 Task scheduling method and system facing mixed load in heterogeneous cluster Active CN104243617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410543294.2A CN104243617B (en) 2014-10-14 2014-10-14 Task scheduling method and system facing mixed load in heterogeneous cluster

Publications (2)

Publication Number Publication Date
CN104243617A true CN104243617A (en) 2014-12-24
CN104243617B CN104243617B (en) 2017-10-27

Family

ID=52230945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410543294.2A Active CN104243617B (en) 2014-10-14 2014-10-14 Task scheduling method and system facing mixed load in heterogeneous cluster

Country Status (1)

Country Link
CN (1) CN104243617B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271405A (en) * 2008-05-13 2008-09-24 武汉理工大学 Bidirectional Hierarchical Grid Resource Scheduling Method Based on QoS Constraints
CN102495758A (en) * 2011-12-05 2012-06-13 中南大学 Scheduling method of real-time tasks in distributing type high performance calculation environment
US20140068049A1 (en) * 2012-09-03 2014-03-06 Bull Sas Method and device for processing commands in a set of components of a computer system
CN103631870A (en) * 2013-11-06 2014-03-12 广东电子工业研究院有限公司 System and method used for large-scale distributed data processing

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022670B (en) * 2015-07-17 2018-03-13 中国海洋大学 Heterogeneous distributed task processing system and its processing method in a kind of cloud computing platform
CN105022670A (en) * 2015-07-17 2015-11-04 中国海洋大学 Heterogeneous distributed task processing system and processing method in cloud computing platform
CN105302643A (en) * 2015-10-14 2016-02-03 浪潮集团有限公司 Job scheduling method and self-learning scheduling machine
CN105302643B (en) * 2015-10-14 2018-08-24 浪潮集团有限公司 A kind of method and self study scheduler of job scheduling
CN105589745A (en) * 2015-12-18 2016-05-18 中国科学院软件研究所 Unbalanced task allocation supported dynamic vulnerability discovery system and method
CN107515784A (en) * 2016-06-16 2017-12-26 阿里巴巴集团控股有限公司 A kind of method and apparatus of computing resource in a distributed system
CN107515784B (en) * 2016-06-16 2021-07-06 阿里巴巴集团控股有限公司 Method and equipment for calculating resources in distributed system
CN107025141A (en) * 2017-05-18 2017-08-08 成都海天数联科技有限公司 A kind of dispatching method based on big data mixture operation model
CN107025141B (en) * 2017-05-18 2020-09-01 成都海天数联科技有限公司 Scheduling method based on big data mixed operation model
CN107357661B (en) * 2017-07-12 2020-07-10 北京航空航天大学 A Fine-Grained GPU Resource Management Method for Mixed Loads
CN107357661A (en) * 2017-07-12 2017-11-17 北京航空航天大学 A kind of fine granularity GPU resource management method for mixed load
US10977076B2 (en) 2017-08-31 2021-04-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing a heterogeneous cluster-oriented task
CN107678752A (en) * 2017-08-31 2018-02-09 北京百度网讯科技有限公司 A kind of task processing method and device towards isomeric group
CN109101339B (en) * 2018-08-15 2019-05-31 北京邮电大学 Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group
CN109101339A (en) * 2018-08-15 2018-12-28 北京邮电大学 Video task parallel method, device and Heterogeneous Cluster Environment in isomeric group
CN110012062A (en) * 2019-02-22 2019-07-12 北京奇艺世纪科技有限公司 A kind of multimachine room method for scheduling task, device and storage medium
CN110012062B (en) * 2019-02-22 2022-02-08 北京奇艺世纪科技有限公司 Multi-computer-room task scheduling method and device and storage medium
CN111147546A (en) * 2019-11-29 2020-05-12 中科院计算技术研究所大数据研究院 Method and system for processing edge cluster resources
CN114787830A (en) * 2019-12-20 2022-07-22 惠普发展公司,有限责任合伙企业 Machine learning workload orchestration in heterogeneous clusters
CN114168283A (en) * 2021-12-02 2022-03-11 北京千帆阅文科技有限公司 Distributed timed task scheduling method and system

Also Published As

Publication number Publication date
CN104243617B (en) 2017-10-27

Similar Documents

Publication Publication Date Title
CN104243617B (en) Towards the method for scheduling task and system of mixed load in a kind of isomeric group
CN104951372B (en) A kind of Map/Reduce data processing platform (DPP) memory source dynamic allocation methods based on prediction
CN107888669B (en) A large-scale resource scheduling system and method based on deep learning neural network
CN105956021B (en) A kind of automation task suitable for distributed machines study parallel method and its system
CN104021040B (en) Based on the cloud computing associated task dispatching method and device under time constraint condition
TWI786564B (en) Task scheduling method and apparatus, storage media and computer equipment
CN104050042B (en) The resource allocation methods and device of ETL operations
CN107222531B (en) Container cloud resource scheduling method
CN104915407A (en) Resource scheduling method under Hadoop-based multi-job environment
CN107291550B (en) A Spark platform resource dynamic allocation method and system for iterative applications
CN103761146B (en) A kind of method that MapReduce dynamically sets slots quantity
CN103530189A (en) Automatic scaling and migrating method and device oriented to stream data
CN110187960A (en) A distributed resource scheduling method and device
CN111190691A (en) Automatic migration method, system, device and storage medium suitable for virtual machine
CN108021435A (en) A kind of cloud computing task stream scheduling method with fault-tolerant ability based on deadline
CN102999317B (en) Towards the elasticity multi-process service processing method of many tenants
CN112882828A (en) Upgrade processor management and scheduling method based on SLURM job scheduling system
Zhang et al. A Spark Scheduling Strategy for Heterogeneous Cluster.
WO2020108337A1 (en) Cpu resource scheduling method and electronic equipment
Elshater et al. A study of data locality in YARN
Wang et al. Dependency-aware network adaptive scheduling of data-intensive parallel jobs
CN106201681A (en) Task scheduling algorithm based on pre-release the Resources list under Hadoop platform
CN114356714B (en) Resource integrated monitoring and scheduling device based on Kubernetes intelligent board cluster
CN110084507B (en) A hierarchical-aware scientific workflow scheduling optimization method in cloud computing environment
US20220405135A1 (en) Scheduling in a container orchestration system utilizing hardware topology hints

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant