CN112965800A - Distributed computing task scheduling system - Google Patents

Distributed computing task scheduling system

Info

Publication number
CN112965800A
CN112965800A
Authority
CN
China
Prior art keywords
scheduling
task
service
tasks
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110256742.0A
Other languages
Chinese (zh)
Inventor
王麟
王钞
余庆
李昕
石涛声
李涛
李君龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kunyao Network Technology Co ltd
Original Assignee
Shanghai Kunyao Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kunyao Network Technology Co ltd filed Critical Shanghai Kunyao Network Technology Co ltd
Priority to CN202110256742.0A priority Critical patent/CN112965800A/en
Publication of CN112965800A publication Critical patent/CN112965800A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5061Partitioning or combining of resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a distributed computing task scheduling system. The system comprises a scheduling service process and a scheduling local service. The scheduling service process comprises components such as a scheduling information database and an affinity feature configuration service; the scheduling local service comprises a task generation process and a task pickup process. The scheduling information database records the background data of the scheduling service process. The affinity feature configuration service receives a resource usage model input by a user and writes it, in the form of rules, into the scheduling information database for use by the scheduling service process. Compared with the prior art, the system combines static scheduling with dynamic scheduling, adapts to both compute-heterogeneous and storage-heterogeneous environments, and achieves efficient scheduling in heterogeneous environments.

Description

Distributed computing task scheduling system
Technical Field
The application relates to the technical field of information, in particular to a distributed computing task scheduling technology.
Background
Along its technological evolution, task scheduling has passed through three generations: macro (monolithic) scheduling, static scheduling, and two-level scheduling. Broadly, task scheduling falls into two categories, static scheduling and dynamic scheduling, and heterogeneous resource environments fall into compute-heterogeneous and storage-heterogeneous. Current solutions are generally built for homogeneous systems and support heterogeneous systems poorly. Existing macro schedulers scale badly: cluster size is limited, and new scheduling policies are hard to integrate into the existing code. Existing static schedulers suit large-scale cloud computing but not other heterogeneous environments. Existing two-level schedulers have overly complex scheduling frameworks, high development cost, and modest scheduling efficiency.
Disclosure of Invention
It is an object of the present application to provide a distributed computing task scheduling system.
According to one aspect of the application, a distributed computing task scheduling system is provided, wherein the system comprises a scheduling service process and a scheduling local service;
the scheduling service process comprises components such as a scheduling information database and an affinity feature configuration service;
the scheduling local service comprises a task generation process and a task pickup process;
the scheduling information database is used for recording background data of the scheduling service process;
the affinity feature configuration service is used for receiving a resource usage model input by a user and writing it, in the form of rules, into the scheduling information database for use by the scheduling service process.
In the technical scheme provided by the application, the distributed computing task scheduling system comprises the scheduling service process and the scheduling local service described above. Compared with the prior art, the system combines static scheduling with dynamic scheduling, adapts to both compute-heterogeneous and storage-heterogeneous environments, and achieves efficient scheduling in heterogeneous environments.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic structural diagram of a distributed computing task scheduling system according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the working principle of a distributed computing task scheduling system according to an embodiment of the present application;
FIG. 3 is a schematic illustration of a task split into multiple jobs according to an embodiment of the present application;
FIG. 4 is a state transition diagram of a task according to an embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device.
The embodiment of the application provides a distributed computing task scheduling system that combines the advantages of static and dynamic scheduling, adapts to both compute-heterogeneous and storage-heterogeneous environments, and achieves efficient scheduling in heterogeneous environments.
In a practical scenario, the device implementing the system may be a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user equipment includes, but is not limited to, a terminal device such as a smartphone, a tablet computer, a Personal Computer (PC), and the like, and the network device includes, but is not limited to, a network host, a single network server, multiple network server sets, or a cloud computing-based computer set. Here, the Cloud is made up of a large number of hosts or web servers based on Cloud Computing (Cloud Computing), which is a type of distributed Computing, one virtual computer consisting of a collection of loosely coupled computers.
FIG. 1 is a schematic structural diagram of a distributed computing task scheduling system according to an embodiment of the present application. The system includes a scheduling service process and a scheduling local service; the scheduling service process comprises components such as a scheduling information database and an affinity feature configuration service; the scheduling local service comprises a task generation process and a task pickup process; the scheduling information database records background data of the scheduling service process; the affinity feature configuration service receives a resource usage model input by a user and writes it, in the form of rules, into the scheduling information database for use by the scheduling service process.
For example, the scheduling service process is a companion process of the scheduling information database and functions like a global resource manager serving the scheduling local service. Relative to each other, the scheduling service process is the server side and the scheduling local service is the client side. The scheduling service process comprises the scheduling information database, the affinity feature configuration service, and other related components, and the scheduling local service comprises the task generation process and the task pickup process.
In some embodiments, the scheduling information database is used to record background data of the scheduling service process. The recorded data include: the usage and allocation of heterogeneous resources, the distribution of compute and storage, the mapping between nodes and tasks, the mapping between nodes and each subtask, how memory and storage are partitioned on each node, node IP addresses, task types, task counts, resource affinity configuration information, and the like.
In some embodiments, the affinity feature configuration service is configured to receive a resource usage model input by a user (e.g., an administrator) and write it, in the form of rules, into the scheduling information database for direct use by the scheduling service process. Such administrator-directed rules are often the most efficient or cost-effective way to schedule. Here, the affinity feature configuration service mainly relies on static scheduling, which works well when task types are fixed, while the other components of the distributed computing task scheduling system rely on dynamic scheduling. Task scheduling for compute-heterogeneous and storage-heterogeneous environments requires comprehensive scheduling that combines task characteristics with resource characteristics.
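The patent does not specify how the rule-form resource usage model is stored, so the following is only an illustrative sketch: an administrator's model is flattened into per-(task type, resource) affinity rules before being written to the scheduling information database. All field names and affinity labels here are assumptions.

```python
# Illustrative sketch (field names assumed): flatten an administrator's
# resource usage model into affinity rules for the scheduling database.

def model_to_rules(model):
    """Flatten {task_type: {resource: affinity}} into rule rows."""
    rules = []
    for task_type, prefs in model.items():
        for resource, affinity in prefs.items():
            if affinity not in ("affine", "anti-affine", "neutral"):
                raise ValueError(f"unknown affinity: {affinity}")
            rules.append({"task_type": task_type,
                          "resource": resource,
                          "affinity": affinity})
    return rules

# Example model entered by an administrator.
model = {
    "video-transcode": {"gpu": "affine", "disk": "neutral"},
    "log-compaction":  {"disk": "affine", "gpu": "anti-affine"},
}
rules = model_to_rules(model)
```

Each rule row could then be inserted into the scheduling information database for the scheduling service process to read.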
The scheduling local service comprises the task generation process and the task pickup process. In the distributed computing task scheduling system, resources are pooled through per-type resource abstraction and resource isolation. Pooling and isolation of computing resources are achieved with cgroup technology, while the abstraction of memory and external storage can reuse mature operating-system facilities. The pooling and isolation of all resources are visible to the task generation process and the task pickup process.
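As a minimal sketch of the cgroup-based isolation mentioned above (assuming a cgroup v2 hierarchy; the group path is hypothetical and the actual file writes, which require privileges, are only shown in a comment), a task slot's CPU quota and memory limit can be expressed as cgroup file contents:

```python
# Sketch of cgroup-v2-style resource isolation: compute each task slot's
# cpu.max and memory.max contents. Applying them requires a mounted
# cgroup v2 hierarchy and sufficient privileges, so the write is not run.

def cgroup_settings(cpu_cores, mem_bytes, period_us=100_000):
    """Return cgroup v2 file contents for a CPU/memory quota."""
    quota_us = int(cpu_cores * period_us)        # CPU time per period
    return {
        "cpu.max": f"{quota_us} {period_us}",    # "quota period" in microseconds
        "memory.max": str(mem_bytes),
    }

settings = cgroup_settings(cpu_cores=1.5, mem_bytes=2 * 1024**3)
# To apply (hypothetical group path, needs root):
# for name, value in settings.items():
#     open(f"/sys/fs/cgroup/sched_slot_0/{name}", "w").write(value)
```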
In some embodiments, the task generation process is configured to generate, using a greedy method, tasks that use heterogeneous resources, or subtasks of a given task, according to the goals of the distributed computing task scheduling system. A generated task is mainly specified by its task type and task count; the number of tasks generated tends to exceed the number already picked up.
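The patent names a greedy method without detailing it; one plausible reading, sketched below with illustrative names and numbers, is to keep emitting tasks of whichever type has the largest unmet resource target until every target is covered:

```python
# Illustrative greedy generator: repeatedly emit a task of the type with
# the largest remaining resource deficit until all targets are met.
# Task costs must be positive for this to terminate.

def generate_tasks(targets, task_cost):
    """targets: {task_type: resource units wanted}; task_cost: units per task."""
    remaining = dict(targets)
    generated = []
    while any(v > 0 for v in remaining.values()):
        t = max(remaining, key=remaining.get)   # greedy choice
        generated.append(t)
        remaining[t] -= task_cost[t]
    return generated

tasks = generate_tasks({"cpu-job": 5, "io-job": 2}, {"cpu-job": 2, "io-job": 1})
```

Note that this naturally over-generates, matching the observation that generated tasks tend to outnumber picked-up ones.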
In some embodiments, the task pickup process is configured to claim tasks from the scheduling service process based on its perception and pricing of the system's local resources, where pricing is an abstract way of measuring resource consumption. In other words, which tasks the task pickup process should pick up is decided by claiming them from the global scheduling service process according to local resource perception and pricing.
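The pricing function itself is not given in the patent; the sketch below assumes one simple choice (price rises with local resource scarcity) to make the claim decision concrete. All names and the budget are illustrative.

```python
# Illustrative "perception and pricing": a resource costs more as its free
# capacity shrinks; a task is claimed only if it fits and is affordable.

def price(free, total):
    """Scarcer resources cost more: 1.0 when idle, up to 10.0 when full."""
    used_frac = 1 - free / total
    return 1 + used_frac * 9

def can_claim(task_demand, free, total, budget):
    cost = sum(task_demand[r] * price(free[r], total[r]) for r in task_demand)
    fits = all(task_demand[r] <= free[r] for r in task_demand)
    return fits and cost <= budget

free = {"cpu": 4, "mem_gb": 8}
total = {"cpu": 8, "mem_gb": 16}
ok = can_claim({"cpu": 2, "mem_gb": 4}, free, total, budget=40)
```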
In some embodiments, the scheduling service process is configured to respond with a specific task to the task pickup process according to the affinity feature configuration and the pipelining requirements of the task. For example, the global scheduling service process responds with a specific task to the scheduling local service according to the affinity feature configuration and the task's pipelining requirements, thereby completing the scheduling. After the scheduling local service obtains the task, it invokes local resources (e.g., compute or storage resources) to complete it.
In some embodiments, as shown in fig. 2, the main components of the distributed computing task scheduling system are the scheduling service process (schedule service), the scheduling information database, the affinity feature configuration service, and the scheduling local service. The scheduling service process and the scheduling information database both have redundancy and load-balancing mechanisms, and the scheduling local service comprises two agent daemons, one for task generation and one for task pickup. A task here may be at the job level after task splitting. The affinity of a task for a resource is obtained by analyzing the task's resource requirements. As shown in fig. 3, a task may be split into multiple jobs, with dependencies or parallelism between jobs. Prior analysis can largely determine a task's affinity or anti-affinity for a given resource; the affinity identifiers are read by the scheduling service process from the configuration database. The task generation process perceives resource conditions according to the corresponding policies (e.g., static and dynamic policies), quantifies resource quotas, and decides how many resources are allocated to each application. Memory, external storage, and compute are abstract representations of resources, on the basis of which the task pickup process can instantiate the assigned task types (jobs) and task counts in its cluster. The scheduling information database records the allocation of tasks and resources, the expected running time of tasks, the expected resource occupation, task fault conditions, and the like. Task generation decides how many resources can be allocated to the cluster according to an algorithmic policy (e.g., fair scheduling), and task pickup decides which resources to use (accept) and which tasks to run.
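Fair scheduling is only named in passing; one common realization is max-min fair sharing, sketched here purely for illustration: a capacity is divided evenly across applications, each capped at its demand, with the surplus redistributed.

```python
# Illustrative max-min fair share: divide capacity evenly, cap each
# application at its demand, and redistribute any surplus.

def fair_share(capacity, demands):
    """demands: {app: units wanted}; returns {app: units granted}."""
    grants = {app: 0 for app in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 0:
        share = remaining // len(active)
        if share == 0:
            break
        done = set()
        for app in sorted(active):
            give = min(share, demands[app] - grants[app])
            grants[app] += give
            remaining -= give
            if grants[app] == demands[app]:
                done.add(app)
        active -= done
    return grants

alloc = fair_share(10, {"app1": 2, "app2": 6, "app3": 6})
```

Here app1's small demand is fully met and its unused share flows to the two larger applications.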
A task that is restarting after a fault, or a predecessor task in a dependency chain, is scheduled with higher priority. When idle, the task pickup process registers with the scheduling service process. After scheduling completes, the task pickup process sends the tasks and resources it received to the scheduling service process, thereby reporting task execution and resource occupation.
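The priority ordering described above can be sketched with a heap: fault-restart tasks come first, then predecessor tasks, then ordinary ones. The numeric priority values and task names are illustrative.

```python
import heapq

# Sketch of the stated ordering: fault-restart tasks and predecessor
# tasks are served before ordinary ones; lower value = served first.
FAULT_RESTART, PREDECESSOR, NORMAL = 0, 1, 2

def make_queue(tasks):
    """tasks: iterable of (name, kind); returns a heap ordered by kind."""
    heap = []
    for seq, (name, kind) in enumerate(tasks):
        heapq.heappush(heap, (kind, seq, name))  # seq keeps FIFO within a kind
    return heap

q = make_queue([("t1", NORMAL), ("t2", FAULT_RESTART), ("t3", PREDECESSOR)])
order = [heapq.heappop(q)[2] for _ in range(3)]
```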
In some embodiments, the scheduling local service is configured to track task state and to discover and report abnormal conditions occurring in the system. For example, a task in the scheduling service process has the states and transitions shown in FIG. 4. After a task is executed, its result is reported by the task pickup process and collected by the scheduling service process.
Here, exception handling must cover the following cases: (1) Completion: the collection or instance finishes normally (also called "success"). (2) Eviction: because of (rare) hardware failures, forced operating-system upgrades (roughly one per machine per month), preemption by higher-priority instances, or expiry of the machines backing a scheduling commitment to a collection or instance, the system has to kill one or more instances to free enough resources; most evictions target instances rather than whole collections, and in almost all cases an evicted instance is rescheduled elsewhere in the same cluster. (3) Kill: the user cancels a collection or instance, directly or via an RPC to a management node, or the instance is killed as a child of an exiting or killed parent job. (4) Failure: a collection or instance terminates unexpectedly due to its own problems (e.g., segmentation faults) or because it tries to use more resources than it requested (e.g., due to memory leaks or configuration errors). (5) Task drift: the current context is saved and the task drifts; when the required resources are not available nearby, staged tasks drift.
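These outcomes imply a small task state machine. The transition table below is one illustrative reading of the lifecycle (the state names are assumptions, since FIG. 4 itself is not reproduced here): a running instance ends in completion, kill, or failure, while eviction and drift return it to pending for rescheduling.

```python
# Illustrative task state machine for the exception cases (1)-(5).
# State and event names are assumptions, not taken from FIG. 4.
TRANSITIONS = {
    ("pending", "schedule"): "running",
    ("running", "finish"):   "completed",
    ("running", "evict"):    "pending",   # rescheduled in the same cluster
    ("running", "kill"):     "dead",
    ("running", "fail"):     "failed",
    ("running", "drift"):    "pending",   # context saved, task drifts elsewhere
}

def step(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state}")

s = step("pending", "schedule")
s2 = step(s, "evict")    # eviction sends the task back for rescheduling
```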
In some embodiments, the scheduling local service can discover and report most exceptions, including completion, eviction, kill, failure, task drift, and the like. The scheduling service process is needed for discovery and handling only in a few scenarios, such as failure of the server or network hosting the scheduling local service. For example, if the scheduling service process hears nothing from a computing node for a long time, then after a timeout all tasks on that node are marked as failed, the node is evicted, and the tasks are rescheduled.
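The timeout-based eviction rule can be sketched as a heartbeat check (timestamps and the timeout are illustrative numbers; the patent does not specify the mechanism):

```python
# Illustrative node-eviction rule: if no heartbeat within the timeout,
# evict the node and mark all of its tasks as failed for rescheduling.

def check_nodes(last_heartbeat, node_tasks, now, timeout):
    """Return (evicted nodes, tasks to mark failed and reschedule)."""
    evicted, failed = [], []
    for node, ts in last_heartbeat.items():
        if now - ts > timeout:
            evicted.append(node)
            failed.extend(node_tasks.get(node, []))
    return evicted, failed

evicted, failed = check_nodes(
    last_heartbeat={"n1": 100, "n2": 40},
    node_tasks={"n1": ["a"], "n2": ["b", "c"]},
    now=130, timeout=60)
```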
In some embodiments, the process by which the system schedules tasks comprises a task analysis phase, a task splitting phase, a resource mapping phase, and a result collection phase.
For example, tasks can be classified as compute-intensive or IO-intensive according to their characteristics and resource usage. Compute-intensive tasks are further divided into CPU-affine and GPU-affine; IO-intensive tasks are divided into memory-affine and disk-affine. Splitting tasks, determining their priorities and dependencies, quantifying resources, and determining resource specifications are the basic work of a scheduling system.
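The two-level classification can be sketched as follows; the profile fields and the 0.5 thresholds are illustrative assumptions, not values from the patent:

```python
# Illustrative two-level classification: compute-intensive tasks are
# CPU- or GPU-affine; IO-intensive tasks are memory- or disk-affine.

def classify(profile):
    """profile: measured resource usage fractions in [0, 1]."""
    if profile["compute"] >= profile["io"]:
        return "gpu-affine" if profile.get("gpu", 0) > 0.5 else "cpu-affine"
    return "memory-affine" if profile.get("mem", 0) > 0.5 else "disk-affine"

kind = classify({"compute": 0.9, "io": 0.2, "gpu": 0.8})
kind2 = classify({"compute": 0.1, "io": 0.8, "mem": 0.1})
```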
Because it informs task splitting and resource planning, prior analysis plays a crucial role in scheduling effectiveness. In general, analysis divides into offline analysis and online analysis; prior analysis belongs to the offline category. With a deep understanding of the service model, service developers can better define the granularity and method of task splitting. Operations staff, who know the resources well during the procurement and deployment of the system, can analyze, quantify and plan in advance, making effective use of physical resources. In this phase, determining the affinity of tasks for resources and pooling the resources are the two important jobs. The affinity of a task for a resource is obtained by analyzing the task's resource requirements. As shown in fig. 3, a task may be split into multiple jobs, with dependencies or parallelism between jobs. Prior analysis can largely determine a task's affinity or anti-affinity for a given resource.
In some embodiments, the task analysis phase includes offline analysis and online analysis. The scheduling local service completes the online analysis by collecting the online running conditions of tasks and of each of their stages, thereby obtaining the resource affinity of each subtask.
For example, the offline analysis includes bench-testing a task and each of its stages in advance to obtain the resource affinity of each subtask (e.g., running time and resource occupation). The resulting data features serve directly as the configuration of the affinity feature database. In particular, this can be done with an offline auxiliary tool such as a bench-test tool.
For example, the online analysis includes collecting the running conditions of tasks and of each of their stages to obtain the resource affinity of each subtask (e.g., running time and resource occupation), producing a cost-performance report for the whole cluster, and then adjusting the distribution of tasks according to the system's resource distribution, i.e., dynamic scheduling. In particular, this is done by the scheduling local service process.
In some embodiments, in the task splitting phase, the scheduling service process and the scheduling local service cooperate to split tasks according to static and dynamic states, respectively.
For example, the individual subtasks of a task are not necessarily on a single machine and may be distributed across the cluster. A task may be split according to static and dynamic states; specifically, this is completed by the scheduling service process and the scheduling local service in coordination.
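The split into jobs with dependencies or parallelism (fig. 3) can be made concrete with a wave-by-wave (Kahn-style) ordering, sketched below with hypothetical job names: jobs whose predecessors are all done form a wave that can run in parallel.

```python
# Illustrative dependency-aware split: group jobs into waves, where each
# wave's jobs have no unmet dependencies and may run in parallel.

def waves(deps):
    """deps: {job: set of jobs it depends on}; returns list of parallel waves."""
    remaining = {j: set(d) for j, d in deps.items()}
    result = []
    while remaining:
        ready = sorted(j for j, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        result.append(ready)
        for j in ready:
            del remaining[j]
        for d in remaining.values():
            d.difference_update(ready)
    return result

plan = waves({"job1": set(), "job2": set(), "job3": {"job1", "job2"}})
```

Here job1 and job2 run in parallel and job3 waits on both, matching the dependency/parallelism structure the figure describes.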
In some embodiments, in the resource mapping phase, the scheduling local service cooperates to split resources according to how the tasks were split.
For example, resource mapping splits resources to match the splitting of tasks; container technology such as docker may be used here. The splitting and release of tasks are automatic, and splitting a task also starts its execution. Specifically, this is completed in coordination by the scheduling local service.
In some embodiments, the result collection phase is completed by the scheduling service process in cooperation with the scheduling local service. For example, when a task has multiple subtasks, some run in parallel with one another and some are pipelined; only when all subtasks complete is the whole task complete. Task scheduling takes this pipelining information into account, and among tasks with time requirements, tasks starved by long waits on the pipeline are scheduled preferentially.
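The starvation rule reduces to a one-liner, sketched here with illustrative stage names and wait times: among pipeline tasks with time requirements, schedule the one that has waited longest.

```python
# Illustrative starvation rule: the task with the longest pipeline wait
# is the most starved and is scheduled next.

def pick_next(waiting):
    """waiting: {task: time spent waiting on the pipeline}."""
    return max(waiting, key=waiting.get)

nxt = pick_next({"stage-a": 3, "stage-b": 42, "stage-c": 7})
```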
In summary, the distributed computing task scheduling system provided by the embodiments of the present application adapts to both homogeneous and heterogeneous systems, combines the advantages of static scheduling with the effects of dynamic scheduling, and can solve the problem of optimally matching tasks to resources in a heterogeneous environment. With equivalent hardware resources and algorithms, the system can improve overall efficiency by a factor of 4 to 5.
In addition, part of the present application may be implemented as a computer program product, such as computer program instructions which, when executed by a computer, invoke or provide methods and/or technical solutions according to the present application. Program instructions that invoke the methods of the present application may be stored on a fixed or removable recording medium, transmitted via a data stream on a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device operating in accordance with them. Accordingly, some embodiments of the present application provide a computing device comprising a memory for storing computer program instructions and a processor for executing them, wherein the instructions, when executed by the processor, trigger the device to perform the methods and/or technical solutions of the foregoing embodiments.
Furthermore, some embodiments of the present application also provide a computer readable medium, on which computer program instructions are stored, the computer readable instructions being executable by a processor to implement the methods and/or aspects of the foregoing embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (10)

1. A distributed computing task scheduling system, wherein the system comprises a scheduling service process and a scheduling local service;
the scheduling service process comprises components such as a scheduling information database and an affinity feature configuration service;
the scheduling local service comprises a task generation process and a task pickup process;
the scheduling information database is used for recording background data of the scheduling service process;
the affinity feature configuration service is used for receiving a resource usage model input by a user and writing it, in the form of rules, into the scheduling information database for use by the scheduling service process.
2. The system of claim 1, wherein the task generation process is configured to generate, according to a goal of the system, a task that uses heterogeneous resources, or subtasks of that task.
3. The system of claim 1, wherein the task fetching process is configured to apply for tasks from the scheduling service process based on its perception and pricing of the system's local resources.
4. The system of claim 3, wherein the scheduling service process is configured to respond to the task fetching process with a specific task based on the affinity characteristic configuration and the streamlining requirements of the task.
5. The system of claim 1, wherein the data recorded by the scheduling information database comprises:
usage and allocation of heterogeneous resources, allocation of computing and storage, mapping relations between nodes and tasks, mapping relations between nodes and each subtask, the manner in which memory and storage are partitioned on the nodes, node IP addresses, task types, task quantities, and resource affinity configuration information.
6. The system of claim 1, wherein the scheduling local service is configured to track the status of tasks, and to discover and report abnormal situations occurring in the system.
7. The system of claim 1, wherein the scheduling of tasks by the system comprises a task analysis phase, a task splitting phase, a resource mapping phase, and a result collection phase.
8. The system of claim 7, wherein the task analysis phase comprises offline analysis and online analysis; and the scheduling local service completes the online analysis by collecting the online running conditions of tasks and of each of their stages to obtain the resource affinity of each subtask.
9. The system of claim 7, wherein, during the task splitting phase, the scheduling service process and the scheduling local service work in concert to split tasks statically and dynamically, respectively.
10. The system of claim 7, wherein, in the resource mapping phase, the scheduling local service cooperates to split resources according to how the task has been split.
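Taken together, claims 1 through 4 describe a pull-based flow: a local service applies for work based on its view of local resources, and the scheduling service hands out tasks whose configured resource affinity matches the requesting node. The following is a minimal sketch of that flow, under assumed names; none of the class, field, or method names below appear in the patent, and the affinity check is reduced to a single preferred-resource lookup for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    task_type: str                      # e.g. "train", "index"
    subtasks: list = field(default_factory=list)

class SchedulingInfoDB:
    """Background data of the scheduling service process (cf. claim 5)."""
    def __init__(self):
        self.node_task_map = {}         # node IP -> list of assigned task ids
        self.affinity_config = {}       # task type -> preferred resource

class AffinityConfigService:
    """Accepts a user-supplied resource usage model and writes it into the
    scheduling information database (cf. claim 1). A real system would do
    this periodically; here the write is immediate."""
    def __init__(self, db):
        self.db = db
    def submit_model(self, model):
        self.db.affinity_config.update(model)

class SchedulingService:
    """Responds to a fetch request with a task whose affinity matches the
    requesting node's local resources (cf. claims 3-4)."""
    def __init__(self, db):
        self.db = db
        self.pending = []
    def submit(self, task):
        self.pending.append(task)
    def fetch(self, node_ip, local_resources):
        for task in list(self.pending):
            wanted = self.db.affinity_config.get(task.task_type)
            if wanted is None or wanted in local_resources:
                self.pending.remove(task)
                self.db.node_task_map.setdefault(node_ip, []).append(task.task_id)
                return task
        return None                     # nothing suitable for this node

# Usage: a GPU-affine task is skipped by a CPU-only node, then
# handed to a node that reports a GPU among its local resources.
db = SchedulingInfoDB()
AffinityConfigService(db).submit_model({"train": "gpu"})
svc = SchedulingService(db)
svc.submit(Task("t1", "train"))
cpu_only = svc.fetch("10.0.0.2", {"cpu"})           # no match
gpu_node = svc.fetch("10.0.0.3", {"cpu", "gpu"})    # matched
```

The pull model keeps the pricing and perception of local resources on the node side, as claim 3 requires, while the database-backed affinity configuration stays centralized with the scheduling service.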
CN202110256742.0A 2021-03-09 2021-03-09 Distributed computing task scheduling system Pending CN112965800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110256742.0A CN112965800A (en) 2021-03-09 2021-03-09 Distributed computing task scheduling system


Publications (1)

Publication Number Publication Date
CN112965800A (en) 2021-06-15

Family

ID=76277103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110256742.0A Pending CN112965800A (en) 2021-03-09 2021-03-09 Distributed computing task scheduling system

Country Status (1)

Country Link
CN (1) CN112965800A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153573A (en) * 2016-03-02 2017-09-12 阿里巴巴集团控股有限公司 Distributed task scheduling treating method and apparatus
CN107436806A (en) * 2016-05-27 2017-12-05 苏宁云商集团股份有限公司 A kind of resource regulating method and system
US20180321979A1 (en) * 2017-05-04 2018-11-08 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing a scheduler with preemptive termination of existing workloads to free resources for high priority items
CN110753107A (en) * 2019-10-21 2020-02-04 中国科学院空间应用工程与技术中心 Resource scheduling system, method and storage medium under space-based cloud computing architecture
CN112162865A (en) * 2020-11-03 2021-01-01 中国工商银行股份有限公司 Server scheduling method and device and server


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Shouxiang et al., "Automated Monitoring Network for Fully Mechanized Mining Faces" (《综采工作面自动化监控网络》), vol. 1, Coal Industry Press, p. 71 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210615