CN114168283A

CN114168283A - Distributed timed task scheduling method and system

Info

Publication number: CN114168283A
Application number: CN202111462653.8A
Authority: CN
Inventors: 金凯峰; 孔颖
Original assignee: Beijing Qianfan Yuewen Technology Co ltd
Current assignee: Beijing Qianfan Yuewen Technology Co ltd
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2022-03-11

Abstract

The application discloses a distributed timed task scheduling method and system. First, a target task set submitted by a user is acquired, a corresponding target task set identifier is generated, the target task set is parsed and written into a database, and the target task set identifier is sent to a task scheduler cluster through a message queue. The task scheduler cluster acquires the addresses of the currently available task executors, issues the target task set to the task executor cluster, and reports its load state to the listener cluster. The task executor cluster acquires and executes each task in the target task set and reports its load state to the listener cluster. The listener cluster expands and shrinks the task executor cluster according to the load states of the task scheduler cluster and the task executor cluster. Elastic scaling of the task scheduler cluster and the task executor cluster ensures high availability and high performance of task execution.

Description

Distributed timed task scheduling method and system

Technical Field

The invention relates to the field of timed task scheduling, and in particular to a distributed timed task scheduling method and system.

Background

With the development of internet technology and the growing variety of service scenarios, timed tasks have become an indispensable foundation for the normal operation of services. The execution logic of timed tasks differs greatly from service to service, but task registration, task issuing and task scheduling are independent of the services, so they can be designed and deployed entirely as an independent service.

The company's business has many kinds of timed or delayed tasks to execute, such as timed email reports, periodic checks of program execution states, and delayed pushing of various APP messages. The number of these tasks is enormous, and because the hardware and software resources of a single machine are limited, a single machine can hardly guarantee that they are executed accurately and on time. A traditional timed task scheduling system based on single-machine mode cannot keep up with the changes and requirements of the business.

Disclosure of Invention

Based on this, the embodiments of the application provide a distributed timed task scheduling method and system, which implement service modules for timed tasks such as management and control, trigger execution, unified configuration management, and task load balancing.

In a first aspect, a distributed timed task scheduling method is provided, where the method includes:

acquiring a target task set submitted by a user, and generating a corresponding target task set identifier, wherein the target task set comprises at least one task;

analyzing and writing the target task set into a database, and sending a target task set identifier to a task scheduler cluster through a message queue;

the task scheduler cluster acquires a currently available task executor address registered in the Zookeeper, issues a target task set in the database to the task executor cluster through the target task set identifier, and reports the load state of the task scheduler cluster to the listener cluster;

the task executor cluster acquires and executes each task in the target task set and reports the load state of the task executor cluster to the listener cluster;

the listener cluster expands and shrinks the task scheduler cluster according to the load state of the task scheduler cluster and a preset scheduling threshold, and expands and shrinks the task executor cluster according to the load state of the task executor cluster and a preset execution threshold.

Optionally, parsing and writing the target task set into a database includes:

parsing the configuration file of the target task set through the parse engine, and storing the parsed target task set into a database.

Optionally, before the task scheduler cluster acquires the currently available task executor address registered in the Zookeeper, the method further includes:

the listener cluster registers a currently available task executor list in the Zookeeper through an API (application program interface) of the Zookeeper, wherein the task executor list comprises a task executor address.

Optionally, the Zookeeper monitors changes to the data in the task executor list through a Watcher mechanism and always issues the latest task executor list to the task scheduler cluster, and the task scheduler cluster assigns tasks according to the available task executors in the latest task executor list.

Optionally, the target task set includes a set state, the tasks in the target task set include task states, and a task execution progress is reported through the set state and the task states;

the set state and the task state at least comprise an initial state, an accepted state, an executing state, a failure state, a canceled state and a success state.

Optionally, the listener cluster expanding and shrinking the task executor cluster according to the load state of the task executor cluster and a preset execution threshold includes:

expanding and shrinking the task executor cluster through a task executor standby pool.

Optionally, the database includes a Job table and a Task table, where the Job table is used to store Task set data, and the Task table is used to store Task data.

In a second aspect, a distributed timed task scheduling system is provided, the system comprising:

an acquisition module, used for acquiring a target task set submitted by a user and generating a corresponding target task set identifier, wherein the target task set comprises at least one task;

the analysis module is used for analyzing and writing the target task set into a database, and sending the target task set identification to the task scheduler cluster through a message queue;

the task scheduler cluster is used for acquiring a currently available task executor address registered in the Zookeeper, issuing a target task set in the database to the task executor cluster through the target task set identifier, and reporting the load state of the task scheduler cluster to the listener cluster;

the task executor cluster is used for acquiring and executing each task in the target task set and reporting the load state of the task executor cluster to the listener cluster;

the listener cluster is used for carrying out capacity expansion and capacity reduction on the task scheduler cluster according to the load state of the task scheduler cluster and a preset scheduling threshold; and carrying out expansion and contraction on the task executor cluster according to the load state of the task executor cluster and a preset execution threshold.

Optionally, the listener cluster is further configured to:

and registering a currently available task executor list in the Zookeeper through an API (application programming interface) of the Zookeeper, wherein the task executor list comprises task executor addresses.

According to the technical scheme provided by the embodiments of the application, a target task set submitted by a user is acquired, a corresponding target task set identifier is generated, the target task set is parsed and written into a database, and the target task set identifier is sent to a task scheduler cluster through a message queue. The task scheduler cluster acquires the addresses of the currently available task executors, issues the target task set to the task executor cluster, and reports the load state of the task scheduler cluster to the listener cluster. The task executor cluster acquires and executes each task in the target task set and reports the load state of the task executor cluster to the listener cluster. The listener cluster expands and shrinks the task scheduler cluster according to the load state of the task scheduler cluster and a preset scheduling threshold, and expands and shrinks the task executor cluster according to the load state of the task executor cluster and a preset execution threshold.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

(1) Elastic scaling of the task scheduler cluster ensures that tasks are scheduled in time and achieves high availability of task scheduling.

(2) Elastic scaling of the task executor cluster ensures high availability and high performance of task execution.

(3) The redundancy of the listener cluster ensures the elastic scaling of the task scheduler cluster and the task executor cluster, so that resources are used reasonably.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.

FIG. 1 is a flowchart illustrating the steps of a distributed timed task scheduling method according to an embodiment of the present application;

FIG. 2 is a distributed timed task scheduling framework according to an alternative embodiment of the present application;

FIG. 3 is a task execution flow in a target task set according to an alternative embodiment of the present application.

Detailed Description

The present invention is described below by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from this disclosure. The described embodiments are merely some of the embodiments of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

With the development of internet technology and the growing variety of service scenarios, timed tasks have become an indispensable foundation for the normal operation of services. The execution logic of timed tasks differs greatly from service to service, but task registration, task issuing and task scheduling are independent of the services, so they can be designed and deployed entirely as an independent service. A traditional timed task scheduling system based on single-machine mode cannot keep up with the changes and requirements of the business. The invention designs a high-performance, high-availability architecture for a timed task scheduling system based on Docker, and implements service modules for timed tasks such as management and control, trigger execution, unified configuration management, and task load balancing. In testing and online operation the system has met the various requirements and achieved the design target.

The company's business has many kinds of timed or delayed tasks to execute, such as timed email reports, periodic checks of program execution states, and delayed pushing of various APP messages. The number of these tasks is enormous, and because the hardware and software resources of a single machine are limited, a single machine can hardly guarantee that they are executed accurately and on time. Meanwhile, hardware costs keep falling, and multi-machine clusters based on horizontal scaling are increasingly popular, so distributed task scheduling is bound to become a major trend.

To facilitate understanding of the present embodiment, first, a detailed description is given to a distributed timing task scheduling method disclosed in the embodiment of the present application.

Referring to fig. 1, a flowchart of a distributed timed task scheduling method provided in an embodiment of the present application is shown, where the method may include the following steps:

Step 101, acquiring a target task set submitted by a user, and generating a corresponding target task set identifier.

The target task set comprises at least one task.

In the embodiment of the application, the target task set comprises a set state, the tasks in the target task set comprise task states, and the task execution progress is reported through the set state and the task states.

Specifically, a set of tasks that need timed or delayed execution is called a Job. A Job can contain a single task or several tasks with dependency relations, and each task is called a Task. A Job and each Task under it have a series of states for tracking and reporting task execution progress: PREP (initial state), ACCEPTED (accepted state), RUNNING (executing state), FAILED (failure state), KILLED (canceled state), and SUCCESSFUL (success state).
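The state set above can be sketched as a small state machine. This is a minimal illustration, not the patent's implementation: the state names follow the list above as far as it is legible (KILLED is taken from the framework walkthrough later in the description, and SUCCESSFUL is an assumed spelling of the success state), while the `ALLOWED` transition table and the `advance` helper are hypothetical.

```python
from enum import Enum


class State(Enum):
    PREP = "initial"
    ACCEPTED = "accepted"
    RUNNING = "executing"
    FAILED = "failure"
    KILLED = "canceled"
    SUCCESSFUL = "success"


# Assumed transition table: the description only shows PREP -> ACCEPTED -> RUNNING
# followed by one of the three final states; cancellation before RUNNING is a guess.
ALLOWED = {
    State.PREP: {State.ACCEPTED, State.KILLED},
    State.ACCEPTED: {State.RUNNING, State.KILLED},
    State.RUNNING: {State.SUCCESSFUL, State.FAILED, State.KILLED},
    State.FAILED: set(),
    State.KILLED: set(),
    State.SUCCESSFUL: set(),
}


def advance(current, target):
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

The same table can serve both the Job and its Tasks, since the description gives them the same set of states.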

Step 102, parsing and writing the target task set into a database, and sending the target task set identifier to the task scheduler cluster through a message queue.

In the embodiment of the application, the configuration file of the target task set is parsed by a parse engine, and the parsed target task set is stored in a database. The parse engine is a configuration file parsing program deployed on the server side. The database comprises a Job table and a Task table, where the Job table stores task set data and the Task table stores task data.

In the embodiment of the present application, the specific flow (pseudo code) executed by the parse engine is as follows:

(1) the user configures the task on a page by drag and drop and clicks submit, after which the task configuration information is sent to the parse engine through a POST request;

(2) an example of a task configuration request parameter body is as follows:

(3) the parse engine performs the following processing on the received request body:

import json

# request and response stand for the web framework's request object and reply helper
paras = request.get('paras', '{}')
jpara = json.loads(paras)
job_name = jpara.get('job_name', '')
job_create_time = jpara.get('job_create_time', '')
job_schedule_time = jpara.get('job_schedule_time', '')
job_status = jpara.get('job_status', '')
if job_schedule_time == '':
    response("execution time cannot be null")
job_task_num = jpara.get('job_task_num', 1)
if job_task_num < 1:
    response("task number must be >= 1")
job_container = jpara.get('job_container', '')
if job_container == '':
    response("executor cannot be empty")
job_priority = jpara.get('job_priority', 1)
job_retry = jpara.get('job_retry', 1)

The information is then written into the MySQL Job table with the following fields:

insert into job_t(job_id,job_name,job_status,job_create_time,job_schedule_time,job_task_num,job_container,job_priority,job_retry);

job_task_schedule = jpara.get('job_task_schedule', {})
if len(job_task_schedule) == 0:
    response("task link cannot be null")
for stage, stage_task in job_task_schedule.items():
    # extract the task list for this stage and write it into the MySQL Task table with the following fields:
    insert into task_t(task_id,job_id,job_status,task_stage_id,job_schedule_time,job_retry,job_container);
response("OK")
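The parse engine flow above can be turned into a runnable sketch. The version below is only an illustration under several assumptions: an in-memory `sqlite3` database stands in for the database named in the text, the helper names `open_db` and `parse_and_store` are hypothetical, the job identifier is generated with `uuid`, `job_task_schedule` is assumed to map a stage name to a list of task names, and an empty `job_status` defaults to PREP. Unlike the pseudo code, it validates the whole request before writing anything.

```python
import json
import sqlite3
import uuid


def open_db():
    """Create in-memory stand-ins for the Job table and the Task table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job_t (job_id TEXT PRIMARY KEY, job_name TEXT, job_status TEXT,"
        " job_create_time TEXT, job_schedule_time TEXT, job_task_num INTEGER,"
        " job_container TEXT, job_priority INTEGER, job_retry INTEGER)"
    )
    db.execute(
        "CREATE TABLE task_t (task_id TEXT PRIMARY KEY, job_id TEXT, job_status TEXT,"
        " task_stage_id TEXT, job_schedule_time TEXT, job_retry INTEGER,"
        " job_container TEXT)"
    )
    return db


def parse_and_store(db, body):
    """Validate a JSON request body, then write one job row and its task rows.

    Returns (job_id, None) on success or (None, error_message) on failure.
    """
    jpara = json.loads(body)
    if not jpara.get("job_schedule_time"):
        return None, "execution time cannot be null"
    if jpara.get("job_task_num", 1) < 1:
        return None, "task number must be >= 1"
    if not jpara.get("job_container"):
        return None, "executor cannot be empty"
    schedule = jpara.get("job_task_schedule", {})
    if not schedule:
        return None, "task link cannot be null"

    job_id = uuid.uuid4().hex
    status = jpara.get("job_status") or "PREP"  # assumed default state
    with db:  # commit all inserts together, roll back if any of them fails
        db.execute(
            "INSERT INTO job_t VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (job_id, jpara.get("job_name", ""), status,
             jpara.get("job_create_time", ""), jpara["job_schedule_time"],
             jpara.get("job_task_num", 1), jpara["job_container"],
             jpara.get("job_priority", 1), jpara.get("job_retry", 1)),
        )
        for stage, stage_tasks in schedule.items():
            for task_name in stage_tasks:
                db.execute(
                    "INSERT INTO task_t VALUES (?, ?, ?, ?, ?, ?, ?)",
                    (f"{job_id}:{task_name}", job_id, status, stage,
                     jpara["job_schedule_time"], jpara.get("job_retry", 1),
                     jpara["job_container"]),
                )
    return job_id, None
```

In the described system the returned job identifier would next be written to the message queue for the task scheduler cluster to consume.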

Step 103, the task scheduler cluster acquires the currently available task executor address registered in the Zookeeper, issues the target task set in the database to the task executor cluster through the target task set identifier, and reports the load state of the task scheduler cluster to the listener cluster.

In this embodiment of the present application, before the task scheduler cluster acquires the currently available task executor address registered in the Zookeeper, the listener cluster registers a list of currently available task executors in the Zookeeper through the Zookeeper API, where the task executor list includes the task executor addresses. The Zookeeper monitors changes to the data in the task executor list through a Watcher mechanism and always issues the latest task executor list to the task scheduler cluster, and the task scheduler cluster assigns tasks according to the available task executors in the latest task executor list.
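The register-and-watch pattern described here can be illustrated without a running Zookeeper. The sketch below is an in-memory simulation under stated assumptions: `ExecutorRegistry` stands in for the worker list node, `Scheduler` keeps a local copy of the list, and round-robin assignment is an assumed policy, since the text does not say how a worker is chosen. In a real deployment, classic Zookeeper watches fire once and have to be re-registered, which client libraries usually handle.

```python
class ExecutorRegistry:
    """In-memory stand-in for the task executor list kept in Zookeeper."""

    def __init__(self):
        self._workers = set()
        self._watchers = []

    def watch(self, callback):
        """Register a callback; it gets the current list now and after every change."""
        self._watchers.append(callback)
        callback(sorted(self._workers))

    def _notify(self):
        snapshot = sorted(self._workers)
        for callback in self._watchers:
            callback(snapshot)

    def register(self, address):
        if address not in self._workers:
            self._workers.add(address)
            self._notify()

    def unregister(self, address):
        if address in self._workers:
            self._workers.discard(address)
            self._notify()


class Scheduler:
    """Keeps the latest worker list and assigns tasks round-robin (assumed policy)."""

    def __init__(self, registry):
        self.worker_list = []
        self._next = 0
        registry.watch(self._on_change)

    def _on_change(self, workers):
        self.worker_list = workers

    def assign(self, task_id):
        if not self.worker_list:
            raise RuntimeError("no task executor available")
        worker = self.worker_list[self._next % len(self.worker_list)]
        self._next += 1
        return worker
```

Because the scheduler only reads its local copy, a worker that is removed from the registry stops receiving tasks as soon as the change notification arrives.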

Step 104, the task executor cluster acquires and executes each task in the target task set, and reports the load state of the task executor cluster to the listener cluster.

Step 105, the listener cluster expands and shrinks the task scheduler cluster according to the load state of the task scheduler cluster and a preset scheduling threshold, and expands and shrinks the task executor cluster according to the load state of the task executor cluster and a preset execution threshold.

In the embodiment of the present application, a task executor cluster is scaled by a task executor spare pool (docker pool), and the task executor spare pool may also scale a task scheduler cluster.
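The threshold-based scaling with a standby pool can be sketched as follows. This is a simplified illustration, not the patent's algorithm: the `Cluster` type, the `rescale` function, the use of a single average load value in [0, 1], the separate `high` and `low` thresholds, and the one-node-at-a-time step are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    active: list = field(default_factory=list)


def rescale(cluster, pool, load, high, low, min_size=1):
    """Move one node between the standby pool and the cluster based on load.

    load: average load of the cluster in [0, 1] as reported to the listener.
    high / low: preset thresholds (hypothetical values chosen by the operator).
    Returns "expand", "shrink" or "hold".
    """
    if load > high and pool:
        # Short of capacity: take a node from the standby pool.
        cluster.active.append(pool.pop())
        return "expand"
    if load < low and len(cluster.active) > min_size:
        # Surplus capacity: release a node back to the standby pool.
        pool.append(cluster.active.pop())
        return "shrink"
    return "hold"
```

Using two thresholds rather than one leaves a band in which the cluster size is held, which avoids repeatedly adding and removing the same node when the load hovers near a single cut-off.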

Referring to fig. 2, a distributed timed task scheduling framework of a distributed timed task scheduling method according to an alternative embodiment of the present application is shown, specifically including:

1. The client is the task initiator and submits a Job by drag and drop on a web interface through process 1;

Suppose user uid_1 submits, through the web interface at 2021-09-10 12:30:56, a job job_1 named "ten thousand read APP lingering the next day", to be executed at 15:00 every day, with subtask dependencies as shown in the job_1 task execution flow of FIG. 3.

2. The parse engine receives and parses the Job submitted by the user, writes the parsed Job into the database (DB) for persistent storage through process 2, and writes the job_id into the message queue (MQ) through process 3. The DB table design includes a Job table and a Task table, stored in the MongoDB engine.

3. The task scheduler (scheduler), shown as cluster I, acts as a consumer: it obtains the newly submitted job_id from the MQ through process 4 and updates the state of the corresponding Job in DB1 through process 5. For job_1, the state changes from PREP to ACCEPTED.

4. The task scheduler (scheduler) acquires the currently available task executors (workers) registered in the zk module (Zookeeper) through process 6 and issues the task to a specified worker through process 7. At this stage the state of task_1 changes from PREP to ACCEPTED.

5. Zookeeper serves as configuration center and service discovery through its file system and watch mechanism. Specifically, the listener (monitor), shown as cluster II, creates a list of currently available task executors (workers), worker_list, in zk through the zk API. zk then stores worker_list persistently, uses the Watcher mechanism to monitor changes to the data in worker_list, and always issues the latest worker_list to the task scheduler (scheduler), which assigns tasks according to the available workers in worker_list.

6. A task executor (worker) executes a specific task and reports a task execution state to a task scheduler (scheduler) through a process 8;

7. The task scheduler (scheduler) writes the task execution state reported by the task executor (worker) into the database through process 9, updating the state of the Task in the Task table (DB2) and the state of the corresponding Job in the Job table (DB1). The state of job_1 changes from ACCEPTED to RUNNING, and the state of task_1 changes from ACCEPTED to RUNNING. Assuming task_1 is allocated to worker_1 for execution, job_provider in the Job table (DB1) is updated to worker_1.

8. A task scheduler (scheduler) reports the load state of the task scheduler (scheduler) to a monitor (monitor) through a process 10, and the monitor determines whether to perform capacity expansion or capacity reduction operation on the task scheduler (scheduler) through a process 11 according to a reporting result;

9. The task executor (worker) reports its load state to the monitor through process 12; the monitor decides, according to the report, whether to expand or reduce the capacity of the task executors (workers) through process 11;

10. The monitor decides whether to expand or reduce the capacity of cluster I or cluster III according to the information reported by the task schedulers of cluster I and the task executors of cluster III and the thresholds preset by the business. It also writes the scaling result to the Zookeeper configuration center described in Step-5, and Zookeeper issues the latest worker_list to the local caches of the task schedulers (schedulers) through process 6, as described in Step-5.

11. The data results produced by the task executor (worker) cluster III are written uniformly into a Network File System (NFS) through process 14 for use by other tasks.

12. The task executor (worker) cluster III reports the final execution state of each task it received to the task scheduler (scheduler) through process 8, and the task scheduler (scheduler) updates the database through process 9. Possible states at this stage are FAILED, SUCCESSFUL, and KILLED.

13. The worker standby pool (docker pool) builds resource-isolated containers according to CPU and memory, actively pulls code from git according to the task type, packages it into a Docker image, and pushes it to the task executor (worker) cluster. The elastic scaling resources in Step-10 come from this resource pool: resources are taken from the pool when the cluster is short of capacity and released back to the pool when capacity is surplus.

14. System administrator (admin)

a. Controlling a task scheduler (scheduler) to stop scheduling tasks, reschedule the tasks and change the priority of the scheduling tasks through a process 15;

b. controlling to add zk nodes, delete zk nodes and update configuration files through a process 16;

c. Controlling the monitor to adjust the system load threshold and the utilization rate of the worker standby pool (docker pool) through process 17;

Step-1 to Step-14 are the execution flow of the high-performance timed task scheduling method and system provided by the invention.
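The core of this flow (a scheduler consuming job identifiers from the message queue, dispatching tasks to workers, and updating states) can be simulated in a single process. The sketch below is only an illustration: `run_scheduler` and the `execute` callback are hypothetical, Python's `queue.Queue` stands in for the MQ, a dictionary stands in for the Job table, stages are assumed to run strictly in order with later stages skipped after a failure, and SUCCESSFUL is an assumed spelling of the success state.

```python
import queue


def run_scheduler(mq, jobs, worker_list, execute):
    """Drain job ids from the queue and run each job's stages in order.

    jobs: job_id -> {"status": str, "stages": list of lists of task ids}
    execute: callable(worker, task_id) -> bool, stands in for a worker run.
    Returns a log of (job_id, status) transitions.
    """
    log = []
    turn = 0
    while True:
        try:
            job_id = mq.get_nowait()
        except queue.Empty:
            return log
        job = jobs[job_id]
        # Consuming the id marks the job ACCEPTED; dispatching marks it RUNNING.
        for status in ("ACCEPTED", "RUNNING"):
            job["status"] = status
            log.append((job_id, status))
        ok = True
        for stage in job["stages"]:
            for task_id in stage:
                worker = worker_list[turn % len(worker_list)]  # round-robin
                turn += 1
                ok = execute(worker, task_id) and ok
            if not ok:
                break  # later stages depend on this one, so stop here
        job["status"] = "SUCCESSFUL" if ok else "FAILED"
        log.append((job_id, job["status"]))
```

In the described system these steps are spread across clusters and the state changes are persisted to the Job and Task tables, so this sketch shows only the ordering of events.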

Elastic scaling of the task scheduler (scheduler) cluster I ensures that tasks are scheduled in time and achieves high availability of task scheduling.

Elastic scaling of the task executor (worker) cluster III ensures high availability and high performance of task execution.

The redundancy of the monitor cluster II ensures the elastic scaling of cluster I and cluster III, so that resources are used reasonably.

An embodiment of the present application further provides a distributed timed task scheduling system, where the system 200 may include:

the analysis module is used for analyzing and writing the target task set into a database and sending the target task set identification to the task scheduler cluster through the message queue;

the task scheduler cluster is used for acquiring a currently available task executor address registered in the Zookeeper, issuing a target task set in the database to the task executor cluster through a target task set identifier, and reporting the load state of the task scheduler cluster to the listener cluster;

the listener cluster is used for carrying out capacity expansion on the task scheduler cluster according to the load state of the task scheduler cluster and a preset scheduling threshold; and carrying out expansion and contraction on the task executor cluster according to the load state of the task executor cluster and a preset execution threshold.

In an optional embodiment of the present application, the listener cluster is further configured to:

and registering a currently available task executor list in the Zookeeper through an API (application programming interface) of the Zookeeper, wherein the task executor list comprises a task executor address.

In an optional embodiment of the present application, the Zookeeper monitors changes to the data in the task executor list through a Watcher mechanism and always issues the latest task executor list to the task scheduler cluster, and the task scheduler cluster assigns tasks according to the available task executors in the latest task executor list.

For specific limitations of the distributed timed task scheduling system, reference may be made to the above limitations of the distributed timed task scheduling method, and details are not repeated here. The modules in the distributed timed task scheduling system can be implemented wholly or partially by software, hardware, or a combination thereof.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the claims. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A distributed timed task scheduling method is characterized by comprising the following steps:

2. The method of claim 1, wherein parsing the set of target tasks into a database comprises:

3. The method of claim 1, wherein prior to the task scheduler cluster obtaining a currently available task executor address registered in a Zookeeper, the method further comprises:

4. The method according to claim 3, wherein the Zookeeper monitors the change of data in the task executor list through a Watcher mechanism, and always issues the latest task executor list to the task scheduler cluster, and the task scheduler cluster performs task assignment according to the available task executors in the latest task executor list.

5. The method according to claim 1, wherein the target task set comprises a set state, tasks in the target task set comprise task states, and task execution progress is reported through the set state and the task states;

6. The method of claim 1, wherein the listener cluster performing scale-up on the task executor cluster according to the load status of the task executor cluster and a preset execution threshold comprises:

7. The method of claim 1 wherein the database includes a Job table and a Task table, the Job table storing Task collection data and the Task table storing Task data.

8. A distributed timed task scheduling system, the system comprising:

9. The system of claim 8, wherein the listener cluster is further configured to:

10. The system according to claim 9, wherein the Zookeeper monitors the change of data in the task executor list through a Watcher mechanism, and always issues the latest task executor list to the task scheduler cluster, and the task scheduler cluster performs task assignment according to the available task executors in the latest task executor list.