CN111459639A

CN111459639A - Distributed task management platform and method supporting global multi-machine-room deployment

Info

Publication number: CN111459639A
Application number: CN202010257724.XA
Authority: CN
Inventors: 李进; 顾湘余; 杨建斌; 张凯文
Original assignee: Hangzhou Quwei Science & Technology Co ltd
Current assignee: Hangzhou Quwei Science & Technology Co ltd
Priority date: 2020-04-03
Filing date: 2020-04-03
Publication date: 2020-07-28
Anticipated expiration: 2040-04-03
Also published as: CN111459639B

Abstract

The invention discloses a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment. The platform comprises a task executor, a task scheduler, a registration center, a task management server and a task control foreground. The task executor executes batch processing tasks, runs inside the application service process, has a plurality of nodes, and reports its task execution state to the task management server through messages while executing the batch processing tasks. The task scheduler performs the fragment scheduling of tasks. The registration center is responsible for recording and notifying the online and offline states of the task executor nodes, triggering fragment scheduling and storing various information. The task management server is responsible for task information management and manual triggering of task execution. The task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management with a single console across multiple machine rooms. The beneficial effect of the invention is that the load on the client nodes is reduced.

Description

Distributed task management platform and method supporting global multi-machine-room deployment

Technical Field

The invention relates to the field of Internet-related technology, and in particular to a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment.

Background

Business systems often require timed batch processing, similar to the function of Crontab on Linux. These batch processing tasks depend heavily on the data of the business system. In addition, for scenarios requiring strong data consistency, the execution state of the tasks must be fed back and displayed in real time, and high availability and failover of the tasks must be guaranteed.

Among open source software there are some frameworks of this kind, the best known being elastic-job-lite, open-sourced by Dangdang. The existing problems are as follows. Current open-source middleware for distributed task management can hardly meet the requirement of managing tasks deployed in machine rooms around the world from a single platform; to support multiple machine rooms, a separate task management system has to be deployed independently in each cluster, which complicates global multi-machine-room distributed task management, monitoring, task rule distribution and the like. Moreover, the mainstream open source solution elastic-job-lite implements job time scheduling, fragment scheduling, job execution, log recording and so on in the form of a rich client, so the client is overloaded and the stability and scalability of the business system are greatly affected.

Disclosure of Invention

The invention provides a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment, which are used for overcoming the defects in the prior art and reducing the load of client nodes.

In order to achieve the purpose, the invention adopts the following technical scheme:

a distributed task management platform supporting global multi-machine-room deployment comprises a task executor, a task scheduler, a registration center, a task management server and a task control foreground;

the task executor executes batch processing tasks, runs in an application service process, is provided with a plurality of nodes, and reports the task execution state of the task executor to a task management server through messages in the process of executing the batch processing tasks;

the task scheduler performs the fragment scheduling of tasks, namely determining which executor executes each fragment;

the registration center is responsible for recording and notifying the online and offline states of the task executor nodes of each task, triggering fragment scheduling and storing various information;

the task management server is responsible for task information management and manual triggering of task execution; meanwhile, the task management server consumes, through the message middleware Kafka, the messages sent by the task executor nodes, records the task execution states and events to the database, and raises alarms through the alarm center for tasks that fail to execute;

the task control foreground is deployed in the central machine room, and the task management server of each machine room is called to manage the task of each machine room, so that the distributed task management of a single console and multiple machine rooms is realized.

The platform manages distributed tasks deployed in multiple machine rooms and separates task fragment scheduling from task execution, so that the task client concentrates on task execution and execution state reporting. Fragment scheduling, alarming, log recording and the like are decoupled through messages and the server, and message handling and log recording are handed to the server asynchronously, which greatly reduces the load on the client nodes.

Preferably, the task scheduler is deployed together with the task management server and monitors the online and offline states of the task executor nodes on the registration center, and the fragment scheduling policy is a fragment policy based on an average allocation algorithm.

The invention also provides an implementation method of the distributed task management platform supporting global multi-machine room deployment, which specifically comprises the following steps:

(1) in each machine room worldwide where the platform is to be deployed, deploying the basic components on which the platform depends;

(2) in each machine room worldwide where the platform is to be deployed, deploying a highly available registration center;

(3) in each machine room worldwide where the platform is to be deployed, deploying a task scheduler and a task management server;

(4) deploying a task control foreground in the central machine room, wherein the task control foreground is a UI (user interface) for managing and operating tasks, is configured with the addresses of the task management servers of the different machine rooms, and calls the service of each machine room to manage and control tasks;

(5) through the provided Java client SDK, accessing a task in a Java application by means of Spring Boot annotations, the developer completing his or her own batch processing business logic through extension.

Preferably, in step (1), the basic components on which the whole platform depends are the high-performance message middleware Kafka and a database for recording task execution states and events.

Preferably, in step (2), the registration center is implemented with etcd, ZooKeeper, or a self-developed implementation as required; if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.

Preferably, in step (3), the two components may optionally be deployed in the same JVM process on the same server, and a suitable alarm center is selected according to the company's circumstances; when a job executes abnormally, an alarm is raised immediately by telephone, short message, DingTalk or other means.

Preferably, in step (5), after the business service has accessed the task, the task control foreground can be used to view the job information and the fragment scheduling status, modify the job time in real time, and query the execution information and state.
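The preference in step (2) for an odd number of ZooKeeper nodes follows from majority-quorum arithmetic: an ensemble remains available only while more than half of its nodes are reachable, so an even-sized ensemble tolerates no more failures than the odd size just below it. The following Java sketch illustrates that arithmetic only; the class and method names are hypothetical and are not part of the platform.

```java
/** Hypothetical helper; illustrates majority-quorum arithmetic only. */
final class QuorumMath {

    /** Smallest number of nodes that forms a strict majority of n nodes. */
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    /** Number of nodes that may fail while a majority is still reachable. */
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " nodes: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Under this arithmetic, 3 and 4 nodes both tolerate one failure, and 5 and 6 nodes both tolerate two, so an even-numbered extra node adds cost without adding fault tolerance.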

The beneficial effects of the invention are as follows: the task client concentrates on task execution and execution state reporting; fragment scheduling, alarming, log recording and the like are decoupled through messages and the server; and message handling and log recording are handed to the server asynchronously, so that the load on the client nodes is greatly reduced.

Drawings

FIG. 1 is a system framework diagram of the present invention;

FIG. 2 is a schematic diagram of the fragment scheduling policy.

Detailed Description

The invention is further described with reference to the following figures and detailed description.

In the embodiment shown in FIG. 1, a distributed task management platform supporting global multi-machine-room deployment includes a task executor, a task scheduler, a registration center, a task management server, and a task control foreground.

The task executor executes the batch processing tasks. Business services access it through the Java client SDK provided by the invention; it runs inside the application service process and has a plurality of nodes, each task executor node being a node on which the application service is deployed. While executing batch processing tasks, the task executor reports its task execution state to the task management server through messages, and the message middleware used in this process is Kafka.

The task scheduler performs the fragment scheduling of tasks, namely determining which executor executes each fragment. It can be deployed together with the task management server, and it monitors the online and offline states of the task executor nodes on the registration center. The fragment scheduling policy is a fragment policy based on an average allocation algorithm. As shown in FIG. 2, suppose there are three execution nodes and the whole task is divided into 7 fragments. Fragment scheduling divides the 7 fragments by the 3 nodes: the quotient is 2, so each node is allocated 2 fragments in sequence; the remainder is 1, and the remaining fragment is allocated to node 1. The final scheduling result is: node 1 (1, 2, 7), node 2 (3, 4), node 3 (5, 6).
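The average allocation example can be reproduced with a short Java sketch. The class and method names are hypothetical. The source only shows the case where the remainder is 1; handing out any larger remainder one fragment per node, starting at node 1, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the average-allocation fragment policy. */
final class AverageFragmentAllocator {

    /**
     * Assigns fragments 1..fragmentCount to nodeCount nodes.
     * Each node first receives floor(fragmentCount / nodeCount) consecutive
     * fragments; leftover fragments are then handed out one per node,
     * starting at node 1 (assumption when the remainder exceeds 1).
     */
    static List<List<Integer>> allocate(int nodeCount, int fragmentCount) {
        if (nodeCount <= 0 || fragmentCount < 0) {
            throw new IllegalArgumentException("nodeCount must be > 0 and fragmentCount >= 0");
        }
        int base = fragmentCount / nodeCount;
        int remainder = fragmentCount % nodeCount;
        List<List<Integer>> plan = new ArrayList<>();
        int next = 1;
        // First pass: an equal block of consecutive fragments for every node.
        for (int node = 0; node < nodeCount; node++) {
            List<Integer> fragments = new ArrayList<>();
            for (int i = 0; i < base; i++) {
                fragments.add(next++);
            }
            plan.add(fragments);
        }
        // Second pass: distribute the remainder, beginning with the first node.
        for (int node = 0; node < remainder; node++) {
            plan.get(node).add(next++);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Three nodes, seven fragments, as in FIG. 2.
        System.out.println(allocate(3, 7)); // prints [[1, 2, 7], [3, 4], [5, 6]]
    }
}
```

The index in the outer list is the node position (node 1 first), so the printed plan matches the scheduling result stated for FIG. 2.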

The registration center is responsible for recording and notifying the online and offline states of the task executor nodes of each task, triggering fragment scheduling and storing various information. Almost all other modules need to interact with the registration center, which in this scheme is implemented with ZooKeeper.
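The way an online or offline event can trigger fragment scheduling may be sketched as a listener pattern. The Java below is an in-memory stand-in written only for illustration: it does not use ZooKeeper, and the class name, method names and node identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for the registration center. */
final class InMemoryRegistry {
    private final TreeSet<String> onlineNodes = new TreeSet<>();
    private final List<Consumer<List<String>>> listeners = new ArrayList<>();

    /** Registers a callback, for example a scheduler that recomputes fragments. */
    void onMembershipChange(Consumer<List<String>> listener) {
        listeners.add(listener);
    }

    /** Records a node as online; notifies listeners only if membership changed. */
    void nodeOnline(String node) {
        if (onlineNodes.add(node)) {
            notifyListeners();
        }
    }

    /** Records a node as offline; notifies listeners only if membership changed. */
    void nodeOffline(String node) {
        if (onlineNodes.remove(node)) {
            notifyListeners();
        }
    }

    private void notifyListeners() {
        // Listeners receive a sorted snapshot of the nodes currently online.
        List<String> snapshot = new ArrayList<>(onlineNodes);
        for (Consumer<List<String>> listener : listeners) {
            listener.accept(snapshot);
        }
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.onMembershipChange(nodes ->
                System.out.println("reschedule fragments across " + nodes));
        registry.nodeOnline("executor-1");
        registry.nodeOnline("executor-2");
        registry.nodeOffline("executor-1");
    }
}
```

In a real deployment the membership events would come from the registration center's own watch or notification mechanism rather than from direct method calls.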

The task management server is responsible for task information management and manual triggering of task execution. Meanwhile, it consumes, through the message middleware Kafka, the messages sent by the task executor nodes, records the task execution states and events to the database, and raises alarms through the alarm center for tasks that fail to execute. Because the tasks of every machine room need to be managed, this module is deployed in each machine room.
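The consume, record and alarm flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: an in-memory queue stands in for the Kafka topic, lists stand in for the database and the alarm center, and the message fields and state values are hypothetical, since the source does not specify the message format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical status message; all field names are illustrative assumptions. */
final class TaskStatusMessage {
    final String taskName;
    final int fragment;
    final String node;
    final String state; // assumed values: "SUCCEEDED" or "FAILED"

    TaskStatusMessage(String taskName, int fragment, String node, String state) {
        this.taskName = taskName;
        this.fragment = fragment;
        this.node = node;
        this.state = state;
    }
}

/** Sketch of the server-side consumer: record every event, alarm on failures. */
final class StatusConsumerSketch {
    // Stand-in for the Kafka topic that executors publish to.
    final BlockingQueue<TaskStatusMessage> topic = new LinkedBlockingQueue<>();
    // Stand-in for the database of execution states and events.
    final List<TaskStatusMessage> eventLog = new ArrayList<>();
    // Stand-in for the alarm center.
    final List<String> alarms = new ArrayList<>();

    /** Processes all currently queued messages and returns how many were handled. */
    int drain() {
        int processed = 0;
        TaskStatusMessage msg;
        while ((msg = topic.poll()) != null) {
            eventLog.add(msg);
            if ("FAILED".equals(msg.state)) {
                alarms.add("task " + msg.taskName + " fragment " + msg.fragment
                        + " failed on " + msg.node);
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        StatusConsumerSketch server = new StatusConsumerSketch();
        server.topic.add(new TaskStatusMessage("dailyReport", 1, "node-1", "SUCCEEDED"));
        server.topic.add(new TaskStatusMessage("dailyReport", 2, "node-2", "FAILED"));
        server.drain();
        System.out.println(server.alarms);
    }
}
```

Because the executor only enqueues a message and continues, recording and alarming cost it nothing beyond the send, which is the decoupling the platform relies on to keep client load low.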

The task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management with a single console across multiple machine rooms.

(1) In each machine room worldwide where the platform is to be deployed, the basic components on which the platform depends are deployed. The basic components on which the whole platform depends are the high-performance message middleware Kafka and a database for recording task execution states and events.

(2) In each machine room worldwide where the platform is to be deployed, a highly available registration center is deployed. The registration center is implemented with etcd, ZooKeeper, or a self-developed implementation as required; if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.

(3) In each machine room worldwide where the platform is to be deployed, a task scheduler and a task management server are deployed. The two components may optionally be deployed in the same JVM process on the same server. A suitable alarm center is selected according to the company's circumstances; when a job executes abnormally, an alarm is raised immediately by telephone, short message, DingTalk or other means.

(5) Through the provided Java client SDK, a task is accessed in a Java application by means of Spring Boot annotations, and the developer completes his or her own batch processing business logic through extension. After the business service has accessed the task, the task control foreground can be used to view the job information and the fragment scheduling status, modify the job time in real time, and query the execution information and state.
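Annotation-based task access of the kind described in step (5) might look like the Java sketch below. The source does not publish the SDK, so the annotation `@DistributedTask`, its attributes, the `FragmentHandler` interface, the example task and the cron expression format are all hypothetical. The sketch uses plain reflection rather than Spring Boot so that it runs on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical annotation; the real SDK's name and attributes are not given. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DistributedTask {
    String name();
    String cron();          // assumed cron-style schedule string
    int fragmentCount();
}

/** Hypothetical extension point that a developer fills with batch logic. */
interface FragmentHandler {
    void execute(int fragment, int fragmentCount);
}

/** Example task written by a developer; the business logic is a placeholder. */
@DistributedTask(name = "orderCleanup", cron = "0 0 2 * * ?", fragmentCount = 7)
final class OrderCleanupTask implements FragmentHandler {
    @Override
    public void execute(int fragment, int fragmentCount) {
        System.out.println("processing fragment " + fragment + " of " + fragmentCount);
    }
}

final class TaskAccessSketch {

    /** Reads the task metadata the way an SDK might when registering a task. */
    static String describe(Class<?> taskClass) {
        DistributedTask meta = taskClass.getAnnotation(DistributedTask.class);
        if (meta == null) {
            throw new IllegalArgumentException(taskClass.getName() + " is not annotated");
        }
        return meta.name() + " [" + meta.cron() + "] fragments=" + meta.fragmentCount();
    }

    public static void main(String[] args) {
        System.out.println(describe(OrderCleanupTask.class));
        // An executor node would call execute() only for the fragments assigned to it.
        new OrderCleanupTask().execute(1, 7);
    }
}
```

Keeping the schedule and fragment count in annotation metadata leaves the developer's class with nothing but the per-fragment business logic, which matches the thin-client goal of the platform.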

Claims

1. A distributed task management platform supporting global multi-machine-room deployment is characterized by comprising a task executor, a task scheduler, a registration center, a task management server and a task control foreground;

2. The distributed task management platform and method for supporting global multi-machine-room deployment according to claim 1, wherein the task scheduler is deployed together with the task management server and monitors the online and offline states of the task executor nodes on the registration center, and the fragment scheduling policy is a fragment policy based on an average allocation algorithm.

3. An implementation method of a distributed task management platform supporting global multi-machine room deployment is characterized by specifically comprising the following steps:

4. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (1), the basic components on which the whole platform depends are the high-performance message middleware Kafka and a database for recording task execution states and events.

5. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (2), the registration center is implemented with etcd, ZooKeeper, or a self-developed implementation as required, and if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.

6. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (3), the two components may optionally be deployed in the same JVM process on the same server, a suitable alarm center is selected according to the company's circumstances, and when a job executes abnormally, an alarm is raised immediately by telephone, short message, DingTalk or other means.

7. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (5), after the business service has accessed the task, the task control foreground is used to view the job information and the fragment scheduling status, modify the job time in real time, and query the execution information and state.