CN111459639A - Distributed task management platform and method supporting global multi-machine-room deployment


Publication number
CN111459639A
Authority
CN
China
Prior art keywords
task
task management
machine
machine room
tasks
Prior art date
Legal status
Granted
Application number
CN202010257724.XA
Other languages
Chinese (zh)
Other versions
CN111459639B (en)
Inventor
李进
顾湘余
杨建斌
张凯文
Current Assignee
Hangzhou Quwei Science & Technology Co ltd
Original Assignee
Hangzhou Quwei Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Quwei Science & Technology Co ltd
Priority to CN202010257724.XA
Publication of CN111459639A
Application granted
Publication of CN111459639B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment. The platform comprises a task executor, a task scheduler, a registration center, a task management server and a task control foreground. The task executor executes batch processing tasks, runs inside the application service process, has a plurality of nodes, and reports its task execution state to the task management server through messages while executing the batch processing tasks. The task scheduler performs shard scheduling of the tasks. The registration center is responsible for recording and notifying the online and offline states of the task executor nodes, triggering shard scheduling, and storing various information. The task management server is responsible for information management of tasks and manual triggering of task execution. The task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management of multiple machine rooms from a single console. The beneficial effect of the invention is that the load on the client nodes is reduced.

Description

Distributed task management platform and method supporting global multi-machine-room deployment
Technical Field
The invention relates to the field of internet-related technology, and in particular to a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment.
Background
Business systems often need scheduled batch processing, similar to the Crontab function of Linux. These batch processing tasks depend heavily on the data of the business system. In addition, for scenarios that require strong data consistency, the execution state of the tasks needs to be fed back and presented in real time, and high availability and failover of the tasks need to be guaranteed.
Several open-source frameworks of this kind exist, the most notable being elastic-job-lite, open-sourced by Dangdang. The existing problems are as follows. Current open-source middleware for distributed task management has difficulty meeting the requirement of managing and scheduling, from a single platform, tasks deployed in machine rooms around the world; for multiple machine rooms, a separate task management system can only be deployed independently in each cluster, which complicates global multi-machine-room distributed task management, monitoring, task rule issuing and the like. In addition, the mainstream open-source solution elastic-job-lite implements job time scheduling, shard scheduling, job execution, log recording and the like in a rich-client form, so the load on the client is excessive, which greatly affects the stability and scalability of the business system.
Disclosure of Invention
The invention provides a distributed task management platform and a distributed task management method supporting global multi-machine-room deployment, which are used for overcoming the defects in the prior art and reducing the load of client nodes.
In order to achieve the purpose, the invention adopts the following technical scheme:
a distributed task management platform supporting global multi-machine-room deployment comprises a task executor, a task scheduler, a registration center, a task management server and a task control foreground;
the task executor executes batch processing tasks, runs in an application service process, is provided with a plurality of nodes, and reports the task execution state of the task executor to a task management server through messages in the process of executing the batch processing tasks;
the task scheduler performs shard scheduling of the task, namely determining which executor executes each shard;
the registration center is responsible for recording and notifying the online and offline states of the task executor nodes, triggering shard scheduling and storing various information;
the task management server is responsible for information management of tasks and manual triggering of task execution; meanwhile, the task management server consumes the messages sent by the task executor nodes through the message middleware Kafka, records the task execution states and events in the database, and raises alarms for failed tasks through the alarm center;
the task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management of multiple machine rooms from a single console.
The platform manages distributed tasks deployed in multiple machine rooms and separates task shard scheduling from task execution, so that the task client concentrates on task execution and execution state reporting. Shard scheduling, alarming, log recording and the like are decoupled through messages and the server, and message handling and log recording are handed to the server asynchronously, which greatly reduces the load on the client nodes.
Preferably, the task scheduler is deployed together with the task management server and also monitors the online and offline states of the task executor nodes in the registration center, and the shard scheduling policy is based on an average allocation algorithm.
The invention also provides an implementation method of the distributed task management platform supporting global multi-machine room deployment, which specifically comprises the following steps:
(1) in global machine rooms needing to be deployed, each machine room needs to deploy a dependent basic component;
(2) in global machine rooms needing to be deployed, each machine room needs to be deployed with a highly available registration center;
(3) deploying a task scheduler and a task management server side in a global machine room needing to be deployed;
(4) deploying a task control foreground in the central machine room, wherein the task control foreground is a UI (user interface) for managing and operating tasks, configuring the addresses of task management servers of different machine rooms, and calling each machine room service to manage and control the tasks;
(5) through the provided Java client SDK, a task is integrated into a Java application by means of Spring Boot annotations, and the developer implements their own batch processing business logic by extension.
Preferably, in step (1), the basic components on which the whole platform depends are the high-performance message middleware Kafka and a database for recording task execution states and events.
Preferably, in step (2), the registration center is implemented with etcd or ZooKeeper, or with a self-developed implementation as required; if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.
Preferably, in step (3), the two components may optionally be deployed in the same JVM process on the same server, and a corresponding alarm center is selected according to the company's situation, so that when a job executes abnormally an alarm is raised immediately by telephone, short message, DingTalk or other means.
Preferably, in step (5), after the business application has integrated the task, the job information and the shard scheduling status can be viewed, the job schedule can be modified in real time, and the execution information and state can be queried in the task control foreground.
The invention has the following beneficial effects: the task client concentrates on task execution and execution state reporting; shard scheduling, alarming, log recording and the like are decoupled through messages and the server; and message handling and log recording are handed to the server asynchronously, which greatly reduces the load on the client nodes.
Drawings
FIG. 1 is a system framework diagram of the present invention;
FIG. 2 is a schematic diagram of the shard scheduling policy.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
In the embodiment shown in FIG. 1, a distributed task management platform supporting global multi-machine-room deployment includes a task executor, a task scheduler, a registration center, a task management server, and a task control foreground.
The task executor executes the batch processing tasks. The business application integrates it through the Java client SDK provided by the invention, and it runs inside the application service process. There are a plurality of task executor nodes, which are the nodes on which the application service is deployed. While executing batch processing tasks, the task executor reports its task execution state to the task management server through messages, using Kafka as the message middleware.
The task scheduler performs shard scheduling of the task, namely determining which executor executes each shard. It can be deployed together with the task management server, and it also monitors the online and offline states of the task executor nodes in the registration center. The shard scheduling policy is based on an average allocation algorithm. As shown in FIG. 2, if there are three executor nodes and the whole task is divided into 7 shards, shard scheduling divides the 7 shards by the number of nodes, 3: each node is allocated 2 shards in sequence, and the remaining 1 shard is allocated to node 1. The final scheduling result is: node 1 (1, 2, 7), node 2 (3, 4), node 3 (5, 6).
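The average allocation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `average_allocation` is hypothetical, and handing leftover shards out one at a time starting from the first node is an assumption for remainders larger than 1 (the text only covers a remainder of 1).

```python
from typing import Dict, List


def average_allocation(nodes: List[str], shard_count: int) -> Dict[str, List[int]]:
    """Assign shards 1..shard_count to nodes by even division.

    Each node first receives a contiguous block of shard_count // len(nodes)
    shards. Leftover shards are then appended one per node, starting from the
    first node (assumption for remainders greater than 1).
    """
    if not nodes:
        return {}
    per_node, remainder = divmod(shard_count, len(nodes))
    result: Dict[str, List[int]] = {}
    for i, node in enumerate(nodes):
        start = i * per_node + 1
        result[node] = list(range(start, start + per_node))
    # Leftover shards are numbered after the evenly divided blocks.
    first_leftover = per_node * len(nodes) + 1
    for offset in range(remainder):
        result[nodes[offset]].append(first_leftover + offset)
    return result


if __name__ == "__main__":
    # Reproduces the FIG. 2 example: 7 shards over 3 nodes.
    print(average_allocation(["node1", "node2", "node3"], 7))
    # {'node1': [1, 2, 7], 'node2': [3, 4], 'node3': [5, 6]}
```

Because the result depends only on the ordered node list and the shard count, any scheduler instance that sees the same membership computes the same assignment.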
The registration center is responsible for recording and notifying the online and offline states of the task executor nodes, triggering shard scheduling and storing various information. Almost all other modules need to interact with the registration center, which is implemented with ZooKeeper in this scheme.
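The way an online or offline notification can trigger shard scheduling is sketched below. Everything here is an assumption made for illustration: `InMemoryRegistry` is a toy stand-in, not a ZooKeeper client, and the round-robin split inside `ReshardingScheduler` is only a placeholder policy, not the average allocation policy of the scheme.

```python
from typing import Callable, Dict, List


class InMemoryRegistry:
    """Toy stand-in for the registration center (hypothetical, not ZooKeeper)."""

    def __init__(self) -> None:
        self._online: List[str] = []
        self._watchers: List[Callable[[List[str]], None]] = []

    def watch(self, callback: Callable[[List[str]], None]) -> None:
        """Register a callback invoked with the node list on every change."""
        self._watchers.append(callback)

    def node_online(self, node: str) -> None:
        if node not in self._online:
            self._online.append(node)
            self._notify()

    def node_offline(self, node: str) -> None:
        if node in self._online:
            self._online.remove(node)
            self._notify()

    def _notify(self) -> None:
        snapshot = sorted(self._online)
        for callback in self._watchers:
            callback(snapshot)


class ReshardingScheduler:
    """Recomputes the shard assignment whenever membership changes."""

    def __init__(self, shard_count: int) -> None:
        self.shard_count = shard_count
        self.assignment: Dict[str, List[int]] = {}

    def on_membership_change(self, nodes: List[str]) -> None:
        shards = list(range(1, self.shard_count + 1))
        # Placeholder policy: round-robin, to keep the sketch short.
        self.assignment = {
            node: shards[i::len(nodes)] for i, node in enumerate(nodes)
        }


if __name__ == "__main__":
    registry = InMemoryRegistry()
    scheduler = ReshardingScheduler(shard_count=4)
    registry.watch(scheduler.on_membership_change)
    registry.node_online("a")
    registry.node_online("b")
    print(scheduler.assignment)
    registry.node_offline("a")
    print(scheduler.assignment)
```

The scheduler never polls the executors; it reacts only to registry notifications, which matches the text's description of the scheduler monitoring node states in the registration center.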
The task management server is responsible for information management of tasks and manual triggering of task execution. Meanwhile, it consumes the messages sent by the task executor nodes through the message middleware Kafka, records the task execution states and events in the database, and raises alarms for failed tasks through the alarm center. Because the tasks of each machine room need to be managed, this module is deployed in each machine room.
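The consume, record and alarm flow on the server side might look like the sketch below. It is written under loud assumptions: a standard-library `queue.Queue` stands in for the Kafka topic, an in-memory SQLite database stands in for the platform database, and the event fields (`task`, `shard`, `node`, `status`) and the `FAILED` status value are hypothetical names, since the text does not define a message format.

```python
import json
import queue
import sqlite3
from typing import Callable


def consume_status_events(
    events: "queue.Queue[str]",
    db: sqlite3.Connection,
    alert: Callable[[dict], None],
) -> int:
    """Drain pending status messages, persist them, and alert on failures.

    Returns the number of messages handled in this pass.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS task_event "
        "(task TEXT, shard INTEGER, node TEXT, status TEXT)"
    )
    handled = 0
    while True:
        try:
            raw = events.get_nowait()
        except queue.Empty:
            break
        event = json.loads(raw)
        db.execute(
            "INSERT INTO task_event VALUES (?, ?, ?, ?)",
            (event["task"], event["shard"], event["node"], event["status"]),
        )
        if event["status"] == "FAILED":
            # Hand the failed event to the alarm center (any callable here).
            alert(event)
        handled += 1
    db.commit()
    return handled


if __name__ == "__main__":
    topic: "queue.Queue[str]" = queue.Queue()
    topic.put(json.dumps({"task": "settle", "shard": 1, "node": "n1", "status": "SUCCESS"}))
    topic.put(json.dumps({"task": "settle", "shard": 2, "node": "n2", "status": "FAILED"}))
    database = sqlite3.connect(":memory:")
    consume_status_events(topic, database, alert=lambda e: print("ALERT", e))
```

The executor only has to enqueue a small message; persistence and alerting happen on the server, which is the decoupling the scheme relies on to keep client load low.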
The task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management of multiple machine rooms from a single console.
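A single console fanning out to per-room task management servers can be sketched as a routing table plus an injected transport. The room names, the `example.invalid` addresses, the `/tasks/<name>/trigger` path and the `TaskConsole` class are all hypothetical; the text only states that the console is configured with the server address of each machine room.

```python
from typing import Callable, Dict
from urllib.parse import quote

# Hypothetical configuration: machine room name -> task management server address.
ROOM_SERVERS: Dict[str, str] = {
    "room-a": "http://task-mgmt.room-a.example.invalid:8080",
    "room-b": "http://task-mgmt.room-b.example.invalid:8080",
}

Transport = Callable[[str, str], str]


class TaskConsole:
    """Routes console actions to the task management server of a machine room."""

    def __init__(self, servers: Dict[str, str], transport: Transport) -> None:
        self._servers = dict(servers)
        self._transport = transport

    def trigger(self, room: str, task: str) -> str:
        """Manually trigger a task in one machine room."""
        if room not in self._servers:
            raise KeyError(f"unknown machine room: {room}")
        url = f"{self._servers[room]}/tasks/{quote(task, safe='')}/trigger"
        return self._transport("POST", url)

    def trigger_everywhere(self, task: str) -> Dict[str, str]:
        """Trigger the same task in every configured machine room."""
        return {room: self.trigger(room, task) for room in self._servers}


if __name__ == "__main__":
    # A fake transport that only echoes the request; a real one would do HTTP.
    console = TaskConsole(ROOM_SERVERS, transport=lambda m, u: f"{m} {u}")
    print(console.trigger("room-a", "daily-report"))
```

Injecting the transport keeps the routing logic testable without a network and leaves the choice of HTTP client open.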
The invention also provides an implementation method of the distributed task management platform supporting global multi-machine room deployment, which specifically comprises the following steps:
(1) In each of the global machine rooms to be deployed, the basic components that the platform depends on are deployed. These basic components are the high-performance message middleware Kafka and a database for recording task execution states and events.
(2) In each of the global machine rooms to be deployed, a highly available registration center is deployed. The registration center is implemented with etcd or ZooKeeper, or with a self-developed implementation as required; if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.
(3) A task scheduler and a task management server are deployed in each of the global machine rooms to be deployed. The two components may optionally be deployed in the same JVM process on the same server. A corresponding alarm center is selected according to the company's situation, so that when a job executes abnormally an alarm is raised immediately by telephone, short message, DingTalk or other means.
(4) Deploying a task control foreground in the central machine room, wherein the task control foreground is a UI (user interface) for managing and operating tasks, configuring the addresses of task management servers of different machine rooms, and calling each machine room service to manage and control the tasks;
(5) Through the provided Java client SDK, a task is integrated into a Java application by means of Spring Boot annotations, and the developer implements their own batch processing business logic by extension. After the business application has integrated the task, the job information and the shard scheduling status can be viewed, the job schedule can be modified in real time, and the execution information and state can be queried in the task control foreground.
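The recommendation in step (2) to deploy an odd number of ZooKeeper nodes follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its nodes are up, so adding one node to an odd-sized ensemble does not increase the number of failures it can tolerate. The helper names below are hypothetical and the sketch only performs the arithmetic.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, quorum_size(n), tolerated_failures(n))
    # 3 nodes tolerate 1 failure and so do 4; 5 tolerate 2 and so do 6.
```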
The platform manages distributed tasks deployed in multiple machine rooms and separates task shard scheduling from task execution, so that the task client concentrates on task execution and execution state reporting. Shard scheduling, alarming, log recording and the like are decoupled through messages and the server, and message handling and log recording are handed to the server asynchronously, which greatly reduces the load on the client nodes.

Claims (7)

1. A distributed task management platform supporting global multi-machine-room deployment is characterized by comprising a task executor, a task scheduler, a registration center, a task management server and a task control foreground;
the task executor executes batch processing tasks, runs in an application service process, is provided with a plurality of nodes, and reports the task execution state of the task executor to a task management server through messages in the process of executing the batch processing tasks;
the task scheduler performs shard scheduling of the task, namely determining which executor executes each shard;
the registration center is responsible for recording and notifying the online and offline states of the task executor nodes, triggering shard scheduling and storing various information;
the task management server is responsible for information management of tasks and manual triggering of task execution; meanwhile, the task management server consumes the messages sent by the task executor nodes through the message middleware Kafka, records the task execution states and events in the database, and raises alarms for failed tasks through the alarm center;
the task control foreground is deployed in the central machine room and calls the task management server of each machine room to manage the tasks of that machine room, thereby realizing distributed task management of multiple machine rooms from a single console.
2. The distributed task management platform supporting global multi-machine-room deployment according to claim 1, wherein the task scheduler is deployed together with the task management server and also monitors the online and offline states of the task executor nodes in the registration center, and the shard scheduling policy is based on an average allocation algorithm.
3. An implementation method of a distributed task management platform supporting global multi-machine room deployment is characterized by specifically comprising the following steps:
(1) in global machine rooms needing to be deployed, each machine room needs to deploy a dependent basic component;
(2) in global machine rooms needing to be deployed, each machine room needs to be deployed with a highly available registration center;
(3) deploying a task scheduler and a task management server side in a global machine room needing to be deployed;
(4) deploying a task control foreground in the central machine room, wherein the task control foreground is a UI (user interface) for managing and operating tasks, configuring the addresses of task management servers of different machine rooms, and calling each machine room service to manage and control the tasks;
(5) through the provided Java client SDK, a task is integrated into a Java application by means of Spring Boot annotations, and the developer implements their own batch processing business logic by extension.
4. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (1), the basic components on which the whole platform depends are the high-performance message middleware Kafka and a database for recording task execution states and events.
5. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (2), the registration center is implemented with etcd or ZooKeeper, or with a self-developed implementation as required, and if ZooKeeper is used, at least 3 nodes are used and an odd number of nodes is deployed.
6. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (3), the two components may optionally be deployed in the same JVM process on the same server, and a corresponding alarm center is selected according to the company's situation, so that when a job executes abnormally an alarm is raised immediately by telephone, short message, DingTalk or other means.
7. The implementation method of the distributed task management platform supporting global multi-machine-room deployment according to claim 3, wherein in step (5), after the business application has integrated the task, the job information and the shard scheduling status can be viewed, the job schedule can be modified in real time, and the execution information and state can be queried in the task control foreground.
CN202010257724.XA 2020-04-03 2020-04-03 Distributed task management platform and method supporting global multi-machine room deployment Active CN111459639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010257724.XA CN111459639B (en) 2020-04-03 2020-04-03 Distributed task management platform and method supporting global multi-machine room deployment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010257724.XA CN111459639B (en) 2020-04-03 2020-04-03 Distributed task management platform and method supporting global multi-machine room deployment

Publications (2)

Publication Number Publication Date
CN111459639A CN111459639A (en) 2020-07-28
CN111459639B CN111459639B (en) 2023-10-20

Family

ID=71681077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257724.XA Active CN111459639B (en) 2020-04-03 2020-04-03 Distributed task management platform and method supporting global multi-machine room deployment

Country Status (1)

Country Link
CN (1) CN111459639B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102710554A (en) * 2012-06-25 2012-10-03 深圳中兴网信科技有限公司 Distributed message system and service status detection method thereof
CN103324539A (en) * 2013-06-24 2013-09-25 浪潮电子信息产业股份有限公司 Job scheduling management system and method
US20140298350A1 (en) * 2013-03-27 2014-10-02 Nec Corporation Distributed processing system
CN104536809A (en) * 2014-11-26 2015-04-22 上海瀚之友信息技术服务有限公司 Distributed timing task scheduling system based on client and server system
CN106201694A (en) * 2016-07-13 2016-12-07 北京农信互联科技有限公司 Configuration method and system for executing timing task under distributed system
CN106874090A (en) * 2017-01-23 2017-06-20 北京思特奇信息技术股份有限公司 Job scheduling method and system based on cloud system
CN107766136A (en) * 2017-09-30 2018-03-06 南威软件股份有限公司 A kind of method of task cluster management and running
CN108958920A (en) * 2018-07-13 2018-12-07 众安在线财产保险股份有限公司 A kind of distributed task dispatching method and system
CN110569113A (en) * 2018-06-06 2019-12-13 海通证券股份有限公司 Method and system for scheduling distributed tasks and computer readable storage medium
CN110780869A (en) * 2019-10-31 2020-02-11 辽宁振兴银行股份有限公司 Distributed batch scheduling
CN110928662A (en) * 2019-11-28 2020-03-27 国网信息通信产业集团有限公司 Distributed timing task scheduler facing micro-service architecture

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858007A (en) * 2020-07-29 2020-10-30 广州海鹚网络科技有限公司 Task scheduling method and device based on message middleware
CN112527478A (en) * 2020-11-30 2021-03-19 成都中科大旗软件股份有限公司 Method and system for realizing automatic task registration and asynchronous scheduling based on distribution
CN112527478B (en) * 2020-11-30 2023-11-07 成都中科大旗软件股份有限公司 Method and system for realizing automatic registration and asynchronous scheduling of tasks based on distribution
CN114968504A (en) * 2021-02-26 2022-08-30 中国联合网络通信集团有限公司 Distributed task scheduling method and device and storage medium
CN113485812A (en) * 2021-07-23 2021-10-08 重庆富民银行股份有限公司 Partition parallel processing method and system based on large data volume task
CN113485812B (en) * 2021-07-23 2023-12-12 重庆富民银行股份有限公司 Partition parallel processing method and system based on large-data-volume task

Also Published As

Publication number Publication date
CN111459639B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 22nd floor, block a, Huaxing Times Square, 478 Wensan Road, Xihu District, Hangzhou, Zhejiang 310000

Applicant after: Hangzhou Xiaoying Innovation Technology Co.,Ltd.

Address before: 16 / F, HANGGANG Metallurgical Science and technology building, 294 Tianmushan Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant before: HANGZHOU QUWEI SCIENCE & TECHNOLOGY Co.,Ltd.

GR01 Patent grant