CN110798339A - Task disaster tolerance method based on distributed task scheduling framework

Info

Publication number
CN110798339A
Authority
CN
China
Prior art keywords
task
scheduling
actuator
executor
heartbeat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910954331.1A
Other languages
Chinese (zh)
Inventor
陈佳佳
赵京虎
孙云枫
季学纯
马德超
李�昊
赵宇
闫妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Original Assignee
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, NARI Nanjing Control System Co Ltd
Priority to CN201910954331.1A
Publication of CN110798339A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663: Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The invention discloses a task disaster tolerance method based on a distributed task scheduling framework, comprising the following steps: first, a task scheduling center is initialized, and a daemon thread is started during initialization to monitor the heartbeat state of the executors; second, a user registers task information through the task scheduling center; third, the scheduling center submits scheduling requests on time according to the Cron configuration of each task; fourth, an executor receives and runs the scheduling request submitted by the scheduling center; fifth, if the daemon thread detects that an executor has failed while executing a task, it determines whether the executor has a task in the running state and, if so, updates the running state of the task and triggers the task to be rescheduled to run on an online executor; sixth, the task execution completes and the scheduling result is returned. The invention solves the problem that existing distributed task scheduling frameworks cannot automatically recover tasks in disaster tolerance scenarios.

Description

Task disaster tolerance method based on distributed task scheduling framework
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a task disaster tolerance method based on a distributed task scheduling framework.
Background
In enterprise-level big data platform systems, business-related tasks that must be scheduled to run periodically are ubiquitous. Such tasks are scheduled, run, and terminated automatically according to a time rule. Examples include periodically updating sampled data, running a spreadsheet task at a fixed time early each morning, and generating a database report every month. For these business scenarios, a number of open-source distributed task scheduling frameworks currently exist in the industry, such as LTS, XXL-JOB, and Elastic-JOB. These frameworks offer good scalability and extensibility, provide a user-friendly operation and maintenance management interface, and support dynamic CRUD operations on tasks, which makes them a good choice for task scheduling on an enterprise-level big data platform.
XXL-JOB is a lightweight, easily extensible distributed task scheduling framework that is simple to operate and convenient to use, and it is currently a popular open-source choice. In terms of task disaster tolerance, XXL-JOB has the following characteristics: task scheduling can be adjusted dynamically according to which executors are online, so that tasks are not scheduled to a faulty executor; and when an executor that is running a scheduled task fails, the task management interface provides an 'end task' button, which can be clicked manually to trigger the task to be rescheduled and executed. XXL-JOB thus provides task disaster tolerance to a certain extent, but only in combination with manual operation by operation and maintenance personnel.
Although some good distributed task scheduling frameworks exist, the following problem commonly arises in actual production environments: when a distributed task executor node goes offline because of a fault, or is restarted, a task that the scheduling center has dispatched to that node and that is in the running state hangs, and its execution cannot be resumed automatically. Existing distributed task scheduling frameworks do not solve automatic task recovery in disaster tolerance scenarios well, yet the reliability of task execution is an important criterion when selecting a distributed task scheduling system in industries such as power grids, banking, and insurance. No effective solution to this problem has been proposed so far.
Disclosure of Invention
The invention aims to provide a task disaster tolerance method based on a distributed task scheduling framework that addresses the problems in the prior art, namely that in a disaster tolerance scenario a running task hangs and cannot be recovered automatically.
To achieve this purpose, the invention adopts the following technical scheme:
A task disaster tolerance method based on a distributed task scheduling framework comprises the following steps:
S1, deploying a plurality of executors, each of which is communicatively connected to a scheduling center;
S2, registering task information through the scheduling center, and submitting scheduling requests to the executors based on the Cron configuration of each task;
S3, an executor receiving and running the scheduling request submitted by the scheduling center;
S4, monitoring the heartbeat states of the plurality of executors through the scheduling center;
S5, when a fault of an executor is detected while it is executing a task, confirming whether the executor has a task in the running state; if so, updating the running state of the task and triggering the task to be rescheduled to run on an online executor; if not, refreshing the online state of the executor;
S6, the executor completing the scheduled task and returning the scheduling result.
Specifically, in step S4, the heartbeat state of the executors is monitored by a daemon thread, which is started while the task scheduling center is being initialized. The daemon thread monitors the heartbeat state as follows: once every heartbeat cycle, it queries the executor information registry in the database and judges from the update_time field of the registry whether an executor has gone offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
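The offline rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the in-memory dictionary stands in for the executor information registry, and the function name and executor addresses are hypothetical. Only the 30 s heartbeat cycle and the 3-cycle threshold come from the text.

```python
from datetime import datetime, timedelta

HEARTBEAT_CYCLE = timedelta(seconds=30)  # heartbeat cycle given in the text
OFFLINE_AFTER_CYCLES = 3                 # threshold given in the text


def find_offline_executors(registry, now):
    """Return the addresses whose update_time is older than 3 heartbeat cycles.

    `registry` is a hypothetical stand-in for the executor information
    registry: it maps an executor address to its last update_time.
    """
    limit = HEARTBEAT_CYCLE * OFFLINE_AFTER_CYCLES
    return sorted(
        address
        for address, update_time in registry.items()
        if now - update_time > limit
    )


now = datetime(2019, 10, 9, 12, 0, 0)
registry = {
    "10.0.0.1:9999": now - timedelta(seconds=20),   # recent heartbeat
    "10.0.0.2:9999": now - timedelta(seconds=90),   # exactly 3 cycles, not yet offline
    "10.0.0.3:9999": now - timedelta(seconds=120),  # more than 3 cycles, offline
}
offline = find_offline_executors(registry, now)
print(offline)
```

The strict comparison mirrors the wording "more than 3 heartbeat cycles", so an executor that is exactly 90 s stale is still treated as online in this sketch.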
Specifically, in step S5, the faults that may occur in an executor include a disconnection fault and a restart caused by a fault.
Further, when an executor has a disconnection fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
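The branch described above can be illustrated as follows. The sketch assumes a list of dictionaries in place of the scheduling log information table and a dictionary in place of the executor information registry; the key names, status strings, and the reading of "refresh the registry" as "drop the stale entry" are all assumptions made for illustration.

```python
def handle_disconnection(address, schedule_log, registry):
    """Handle an executor that has been judged offline.

    schedule_log: list of dicts with hypothetical keys
                  'task_id', 'executor', 'status'.
    registry:     dict mapping executor address -> last update_time.
    Returns the ids of the tasks that were marked as failed.
    """
    running = [
        entry for entry in schedule_log
        if entry["executor"] == address and entry["status"] == "RUNNING"
    ]
    if running:
        # Running tasks are marked failed so retry scheduling can pick them up.
        for entry in running:
            entry["status"] = "FAILED"
    else:
        # Assumption: refreshing the registry means removing the stale entry.
        registry.pop(address, None)
    return [entry["task_id"] for entry in running]


schedule_log = [
    {"task_id": 1, "executor": "exec-a", "status": "RUNNING"},
    {"task_id": 2, "executor": "exec-a", "status": "SUCCESS"},
    {"task_id": 3, "executor": "exec-b", "status": "RUNNING"},
]
registry = {"exec-a": 100, "exec-b": 100, "exec-c": 10}

failed_a = handle_disconnection("exec-a", schedule_log, registry)
failed_c = handle_disconnection("exec-c", schedule_log, registry)
print(failed_a, failed_c, sorted(registry))
```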
Furthermore, when an executor restarts because of a fault, the rescheduling service interface of the scheduling center is called. The interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as a disconnection fault. If the restart time does not exceed 3 heartbeat cycles, the interface queries the scheduling log in the database to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
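A minimal sketch of the decision made by the rescheduling service interface is given below. The function name, the return values, and the in-memory scheduling log are hypothetical; the text specifies only the 3-cycle comparison and the two outcomes.

```python
HEARTBEAT_CYCLE_S = 30  # heartbeat cycle given in the text, in seconds
MAX_CYCLES = 3          # threshold given in the text


def on_executor_restart(address, restart_seconds, schedule_log):
    """Decide how a restarted executor is handled.

    Returns a (route, failed_task_ids) pair. 'disconnection' means the
    restart took longer than 3 heartbeat cycles and is left to the
    disconnection handling; 'restart' means it is handled here.
    """
    if restart_seconds > HEARTBEAT_CYCLE_S * MAX_CYCLES:
        return ("disconnection", [])

    failed = []
    for entry in schedule_log:
        if entry["executor"] == address and entry["status"] == "RUNNING":
            entry["status"] = "FAILED"  # later picked up by retry scheduling
            failed.append(entry["task_id"])
    return ("restart", failed)


log = [
    {"task_id": 7, "executor": "exec-a", "status": "RUNNING"},
    {"task_id": 8, "executor": "exec-b", "status": "RUNNING"},
]
quick = on_executor_restart("exec-a", 45, log)
slow = on_executor_restart("exec-b", 120, log)
print(quick, slow)
```

In this sketch a slow restart leaves the scheduling log untouched, on the assumption that the heartbeat monitor will already have treated the executor as offline.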
Further, retry scheduling according to the retry count configured for the task specifically includes: a daemon thread running in the background of the scheduling center periodically polls the scheduling log information table for each task in the task monitoring queue; if a monitored task is in the failed state and its retry count is greater than 0, the retry count of the task is reduced by one and the task is resubmitted to the task scheduling center to be scheduled and run.
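One polling pass of this retry logic might look like the sketch below. The queue layout, the status strings, and the `submit` callback are assumptions introduced for illustration; the text specifies only the condition (failed and retry count greater than 0) and the action (decrement, then resubmit).

```python
def poll_failed_tasks(monitor_queue, submit):
    """Run one polling pass over the task monitoring queue.

    monitor_queue: list of dicts with hypothetical keys
                   'task_id', 'status', 'retries_left'.
    submit:        callable that hands a task id back to the scheduler.
    Returns the ids that were resubmitted in this pass.
    """
    resubmitted = []
    for task in monitor_queue:
        if task["status"] == "FAILED" and task["retries_left"] > 0:
            task["retries_left"] -= 1   # consume one retry
            task["status"] = "PENDING"  # assumption: waiting to be rescheduled
            submit(task["task_id"])
            resubmitted.append(task["task_id"])
    return resubmitted


queue = [
    {"task_id": 1, "status": "FAILED", "retries_left": 2},
    {"task_id": 2, "status": "FAILED", "retries_left": 0},   # retries exhausted
    {"task_id": 3, "status": "RUNNING", "retries_left": 1},  # not failed
]
submitted = []
first_pass = poll_failed_tasks(queue, submitted.append)
second_pass = poll_failed_tasks(queue, submitted.append)
print(first_pass, second_pass, submitted)
```

Resetting the status after resubmission keeps a second pass from consuming another retry for the same failure.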
Specifically, the heartbeat cycle is 30 s.
Specifically, the scheduling center and the executors perform information registration and discovery (i.e., service registration and discovery) through a database (DB).
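Database-based registration and discovery can be sketched with an in-memory SQLite table. The table name, column layout, integer timestamps, and function names are assumptions; the source names only an executor information registry with an update_time field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE executor_registry (
        address     TEXT PRIMARY KEY,  -- hypothetical executor address
        update_time INTEGER NOT NULL   -- last heartbeat, in seconds
    )
    """
)


def heartbeat(conn, address, now):
    """Executor side: register, or refresh update_time if already registered."""
    conn.execute(
        "INSERT OR REPLACE INTO executor_registry (address, update_time) "
        "VALUES (?, ?)",
        (address, now),
    )


def discover(conn, now, max_age=90):
    """Scheduling center side: list executors seen within max_age seconds."""
    rows = conn.execute(
        "SELECT address FROM executor_registry "
        "WHERE ? - update_time <= ? ORDER BY address",
        (now, max_age),
    )
    return [row[0] for row in rows]


heartbeat(conn, "exec-a", 0)
heartbeat(conn, "exec-b", 100)
heartbeat(conn, "exec-b", 110)  # second heartbeat updates the same row
online = discover(conn, now=120)
print(online)
```

Neither side talks to the other directly in this sketch: executors only write rows and the scheduling center only reads them, which is the point of using the database as the registry.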
Corresponding to the task disaster tolerance method, the invention also provides a task disaster tolerance system based on the distributed task scheduling framework, comprising a scheduling center and a plurality of executors, wherein the executors and the scheduling center perform information registration and discovery through a database (DB). The scheduling center is used for registering task information and submitting scheduling requests to the executors based on the Cron configuration of each task; the executors are used for receiving and running the scheduling requests. The scheduling center judges whether an executor has failed by monitoring its heartbeat state. When it detects that an executor has failed while executing a task, it confirms whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor.
Specifically, the scheduling center monitors the heartbeat state of the executors through a daemon thread, which is started while the scheduling center is being initialized. The daemon thread monitors the heartbeat state as follows: once every heartbeat cycle, it queries the executor information registry in the database and judges from the update_time field of the registry whether an executor has gone offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
Specifically, the faults that may occur in an executor include a disconnection fault and a restart caused by a fault.
When an executor has a disconnection fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
When an executor restarts because of a fault, the rescheduling service interface of the scheduling center is called. The interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as a disconnection fault. If the restart time does not exceed 3 heartbeat cycles, the interface queries the scheduling log in the database to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
Specifically, the heartbeat cycle is 30 s.
Specifically, the plurality of executors and the scheduling center perform information registration and discovery through a database (DB).
Compared with the prior art, the invention has the following beneficial effects: it reacts to and handles the various types of faults in a distributed task scheduling system in a timely manner. When an executor has a disconnection fault, or restarts because of a fault, the tasks being scheduled and executed on that executor are recovered and executed again automatically, without manual operation. The task disaster tolerance method ensures high availability and high reliability of the distributed task scheduling system, and solves the problem that existing distributed task scheduling frameworks cannot automatically recover tasks in disaster tolerance scenarios.
Drawings
FIG. 1 is a schematic flow chart of a task disaster recovery method based on a distributed task scheduling framework according to the present invention;
FIG. 2 is a detailed flowchart of a task disaster recovery method based on a distributed task scheduling framework according to an embodiment of the present invention;
FIG. 3 is a system architecture diagram of a task disaster tolerance method based on a distributed task scheduling framework according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1 and FIG. 2, the present embodiment provides a task disaster tolerance method based on a distributed task scheduling framework, including the following steps:
S1, initializing a task scheduling center, and starting a daemon thread during initialization to monitor the heartbeat state of the executors;
S2, a user registering task information through the task scheduling center;
S3, the scheduling center submitting scheduling requests according to the Cron configuration of each task;
S4, an executor receiving and running the scheduling request submitted by the scheduling center;
S5, if the daemon thread detects that an executor has failed while executing a task, determining whether the executor has a task in the running state; if so, updating the running state of the task and triggering the task to be rescheduled to run on an online executor; if not, refreshing the online state of the executor;
S6, the executor completing the scheduled task and returning the scheduling result.
Specifically, in step S1, the daemon thread monitors the heartbeat state of the executors as follows: once every heartbeat cycle, it queries the executor information registry in the database and judges from the update_time field of the registry whether an executor has gone offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles (i.e., a heartbeat timeout) is considered to be offline.
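Starting the monitor as a daemon thread during initialization, as in step S1, can be sketched as follows. The class name `SchedulingCenter`, the `check` callback, and the short interval used in the demonstration are hypothetical; in the scheme described above the interval would be one heartbeat cycle (30 s) and the callback would query the executor information registry.

```python
import threading


class SchedulingCenter:
    """Hypothetical scheduling center that starts its monitor on initialization."""

    def __init__(self, check, interval_s):
        self._check = check
        self._interval_s = interval_s
        self._stop = threading.Event()
        # daemon=True: the monitor never keeps the process alive on its own.
        self.monitor = threading.Thread(
            target=self._run, name="heartbeat-monitor", daemon=True
        )
        self.monitor.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop runs one check per interval until it is stopped.
        while not self._stop.wait(self._interval_s):
            self._check()

    def stop(self):
        self._stop.set()
        self.monitor.join()


ticks = []
seen_two = threading.Event()


def check():
    ticks.append(1)
    if len(ticks) >= 2:
        seen_two.set()


center = SchedulingCenter(check, interval_s=0.01)
seen_two.wait(timeout=5)  # let the monitor run at least two checks
center.stop()
print(len(ticks) >= 2)
```

Waiting on an event rather than sleeping lets the monitor stop promptly instead of finishing a full interval first.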
Specifically, in step S5, the faults that may occur in an executor include a disconnection fault and a restart caused by a fault.
Further, when an executor has a disconnection fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
Furthermore, when an executor restarts because of a fault, the rescheduling service interface of the scheduling center is called. The interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as a disconnection fault. If the restart time does not exceed 3 heartbeat cycles, the interface queries the scheduling log in the database to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
Further, retry scheduling according to the retry count configured for the task specifically includes: a daemon thread running in the background of the scheduling center periodically polls the scheduling log information table for each task in the task monitoring queue; if a monitored task is in the failed state and its retry count is greater than 0, the retry count of the task is reduced by one and the task is resubmitted to the task scheduling center to be scheduled and run.
Specifically, the heartbeat cycle is 30 s.
Specifically, the scheduling center and the executors perform information registration and discovery through a database (DB).
In this embodiment, the user registers the task information to be executed either by operating a Web interface or by using a built-in database-refresh script; the specific task execution logic must be a JobHandler implementation class for which the business logic has already been developed.
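The constraint that a task can only be registered against an already developed JobHandler can be illustrated with a small registry. This is a Python stand-in, not the API of XXL-JOB or of the patented system: the decorator, the handler name, the task fields, and the example Cron string are all hypothetical.

```python
HANDLERS = {}  # handler name -> callable holding the business logic


def job_handler(name):
    """Register a function under a handler name (stand-in for a JobHandler class)."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@job_handler("demoReportHandler")
def build_report(param):
    return f"report for {param}"


def register_task(tasks, task_id, cron, handler_name):
    """Register task information; reject handlers that have not been developed."""
    if handler_name not in HANDLERS:
        raise ValueError(f"unknown JobHandler: {handler_name}")
    tasks[task_id] = {"cron": cron, "handler": handler_name}


tasks = {}
register_task(tasks, 1, "0 0 2 * * ?", "demoReportHandler")  # illustrative Cron string
result = HANDLERS[tasks[1]["handler"]]("2019-10")
print(result)
```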
As shown in FIG. 3, the present embodiment further provides a task disaster tolerance system based on a distributed task scheduling framework. The system includes a scheduling center and a plurality of executors, and the executors and the scheduling center perform information registration and discovery through a database (DB). The scheduling center is used for registering task information and submitting scheduling requests to the executors based on the Cron configuration of each task; the executors are used for receiving and running the scheduling requests. The scheduling center judges whether an executor has failed by monitoring its heartbeat state. When it detects that an executor has failed while executing a task, it confirms whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor.
The scheduling center is responsible for managing scheduling information and issuing scheduling requests according to the scheduling configuration; it does not carry business code. The executors are responsible for receiving scheduling requests and executing the task logic. The task executors are deployed as a cluster, which ensures high availability of task execution.
The scheduling center includes functional modules such as an executor management module, a task management module, and a log management module. The executor management module provides the registration service, the task management module provides the task scheduling service, and the log management module provides the log query service; the scheduling center also provides a task callback service.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A task disaster tolerance method based on a distributed task scheduling framework is characterized by comprising the following steps:
s1, deploying a plurality of actuators, wherein the actuators are respectively in communication connection with a dispatching center;
s2, registering task information and submitting a scheduling request to an actuator based on Cron configuration of the task;
s3, the executor receives and runs the dispatching request submitted by the dispatching center;
s4, monitoring the heartbeat states of a plurality of actuators;
s5, detecting the fault of the actuator in the process of executing the task, confirming whether the actuator has the task in the running state, and if yes, updating the running state of the task; triggering the task to be rescheduled to be operated on an online executor; if not, refreshing the online state of the actuator;
and S6, the executor completes the scheduling request task and returns the scheduling result.
2. The task disaster recovery method based on the distributed task scheduling framework according to claim 1, wherein in step S4, the heartbeat state of the actuator is monitored by a daemon thread, and the daemon thread is started in the initialization process of the scheduling center; the method for monitoring the heartbeat state of the actuator by the daemon thread comprises the following steps: the daemon thread inquires an actuator information registry of the database once every 1 heartbeat cycle, and whether an actuator is disconnected or not is judged according to an update _ time field of the registry; and if the update _ time field in the executor information registry has an executor which is not updated in more than 3 heartbeat cycles, the executor is considered to be in a disconnection state.
3. The task disaster recovery method based on the distributed task scheduling framework as claimed in claim 1, wherein in step S5, the failures occurred in the actuator include a disconnection failure and a failure due to restart;
when the executor has a disconnection fault, the daemon thread can inquire a scheduling log information table of a database to determine whether a task in a running state exists on the executor, and if so, the running state of the task is updated to be failed; then, the scheduling center performs retry scheduling according to the retry times of the task configuration, and triggers the task to be rescheduled to an on-line executor to run; if not, refreshing the actuator information registry;
when the actuator has a restart fault due to a fault, calling a rescheduling service interface of a dispatching center, judging whether the restart time of the actuator exceeds 3 heartbeat cycles or not through the rescheduling service interface, and if so, classifying the fault of the actuator into offline fault processing; if the restart time does not exceed 3 heartbeat cycles, inquiring a database scheduling log through a rescheduling service interface, confirming whether a task in a running state exists on the actuator, and if so, updating the running state of the task to be failure; then, the scheduling center performs retry scheduling according to the retry times of the task configuration, and triggers the task to be rescheduled to an on-line executor to run; and if not, refreshing the actuator information registry.
4. The task disaster recovery method based on the distributed task scheduling framework as claimed in claim 2 or 3, wherein the heartbeat period is 30 s.
5. The task disaster recovery method based on the distributed task scheduling framework as claimed in claim 1, wherein the information registration and discovery between the scheduling center and the executor are performed in a DB manner.
6. A task disaster recovery system based on a distributed task scheduling framework is based on the task disaster recovery method of any one of claims 1 to 5, and is characterized by comprising a scheduling center and a plurality of actuators, wherein the plurality of actuators are in communication connection with the scheduling center; the scheduling center is used for registering task information and submitting a scheduling request to an actuator based on Cron configuration of a task; the executor is used for receiving and operating a scheduling request; the dispatching center judges whether the actuator has a fault or not by monitoring the heartbeat state of the actuator; when the fact that the actuator breaks down in the task execution process is monitored, whether a task in a running state exists on the actuator is confirmed, and if the task exists, the running state of the task is updated; triggering the task to be rescheduled to be operated on an online executor; if not, the actuator's online status is refreshed.
7. The task disaster recovery system based on the distributed task scheduling framework according to claim 6, wherein the scheduling center monitors the heartbeat state of the actuator through a daemon thread, and the daemon thread is started in the initialization process of the scheduling center; the method for monitoring the heartbeat state of the actuator by the daemon thread comprises the following steps: the daemon thread inquires an actuator information registry of the database once every 1 heartbeat cycle, and whether an actuator is disconnected or not is judged according to an update _ time field of the registry; and if the update _ time field in the executor information registry has an executor which is not updated in more than 3 heartbeat cycles, the executor is considered to be in a disconnection state.
8. The task disaster recovery system based on the distributed task scheduling framework as claimed in claim 6, wherein the failures occurred in the actuator include a disconnection failure and a restart failure due to a failure;
when the executor has a disconnection fault, the daemon thread can inquire a scheduling log information table of a database to determine whether a task in a running state exists on the executor, and if so, the running state of the task is updated to be failed; then, the scheduling center performs retry scheduling according to the retry times of the task configuration, and triggers the task to be rescheduled to an on-line executor to run; if not, refreshing the actuator information registry;
when the actuator has a restart fault due to a fault, calling a rescheduling service interface of a dispatching center, judging whether the restart time of the actuator exceeds 3 heartbeat cycles or not through the rescheduling service interface, and if so, classifying the fault of the actuator into offline fault processing; if the restart time does not exceed 3 heartbeat cycles, inquiring a database scheduling log through a rescheduling service interface, confirming whether a task in a running state exists on the actuator, and if so, updating the running state of the task to be failure; then, the scheduling center performs retry scheduling according to the retry times of the task configuration, and triggers the task to be rescheduled to an on-line executor to run; and if not, refreshing the actuator information registry.
9. The task disaster recovery system based on the distributed task scheduling framework as claimed in claim 6, wherein the heartbeat period is 30 s.
10. The task disaster recovery system based on the distributed task scheduling framework as claimed in claim 7 or 8, wherein the plurality of executors and the scheduling center perform information registration and discovery through a database (DB).
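The heartbeat check of claim 7 and the two fault branches of claim 8 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the database tables are replaced by an in-memory dict (`registry`, mapping executor name to its last `update_time` in seconds) and a list of `Task` objects, and all function and field names other than `update_time` are hypothetical. "Refreshing the registry" is interpreted here as dropping a stale entry in the offline branch and recording a new heartbeat in the restart branch, and rescheduling simply picks the first other online executor; the claims do not specify either detail.

```python
from dataclasses import dataclass

HEARTBEAT_PERIOD_S = 30        # heartbeat period from claim 9
OFFLINE_AFTER_PERIODS = 3      # threshold from claims 7 and 8


@dataclass
class Task:
    task_id: str
    executor: str
    status: str = "running"    # "running" or "failed"
    retries_left: int = 1      # configured retry count (assumed field)


def is_offline(update_time, now):
    """Claim 7: offline when update_time is older than 3 heartbeat periods."""
    return now - update_time > OFFLINE_AFTER_PERIODS * HEARTBEAT_PERIOD_S


def fail_and_reschedule(executor, registry, tasks, now):
    """Fail running tasks on `executor`, then retry them on an online executor.

    Returns the ids of the tasks that were running on `executor`.
    """
    online = [name for name, seen in registry.items()
              if name != executor and not is_offline(seen, now)]
    affected = []
    for task in tasks:
        if task.executor == executor and task.status == "running":
            task.status = "failed"
            affected.append(task.task_id)
            if task.retries_left > 0 and online:
                task.retries_left -= 1
                task.executor = online[0]   # assumed policy: first online executor
                task.status = "running"
    return affected


def handle_offline(executor, registry, tasks, now):
    """Claim 8, offline branch."""
    if not fail_and_reschedule(executor, registry, tasks, now):
        registry.pop(executor, None)   # assumed meaning of "refresh the registry"


def monitor_once(registry, tasks, now):
    """One daemon-thread pass; the claim runs this once per heartbeat period."""
    for executor in [n for n, seen in registry.items() if is_offline(seen, now)]:
        handle_offline(executor, registry, tasks, now)


def on_restart(executor, down_since, registry, tasks, now):
    """Claim 8, restart branch (the rescheduling service interface)."""
    if now - down_since > OFFLINE_AFTER_PERIODS * HEARTBEAT_PERIOD_S:
        handle_offline(executor, registry, tasks, now)
    elif not fail_and_reschedule(executor, registry, tasks, now):
        registry[executor] = now       # assumed: refresh = record a new heartbeat
```

In a real deployment `monitor_once` would be driven by a daemon thread that sleeps for one heartbeat period between passes, and the dict and list would be replaced by queries against the executor information registry and the scheduling log table.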
CN201910954331.1A 2019-10-09 2019-10-09 Task disaster tolerance method based on distributed task scheduling framework Pending CN110798339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910954331.1A CN110798339A (en) 2019-10-09 2019-10-09 Task disaster tolerance method based on distributed task scheduling framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910954331.1A CN110798339A (en) 2019-10-09 2019-10-09 Task disaster tolerance method based on distributed task scheduling framework

Publications (1)

Publication Number Publication Date
CN110798339A true CN110798339A (en) 2020-02-14

Family

ID=69438876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910954331.1A Pending CN110798339A (en) 2019-10-09 2019-10-09 Task disaster tolerance method based on distributed task scheduling framework

Country Status (1)

Country Link
CN (1) CN110798339A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268300A1 (en) * 2004-05-14 2005-12-01 Microsoft Corporation Distributed task scheduler for computing environments
CN103259832A (en) * 2012-12-24 2013-08-21 中国科学院沈阳自动化研究所 Cluster resource control method for achieving dynamic load balance, fault diagnosis and failover
CN103716182A (en) * 2013-12-12 2014-04-09 中国科学院信息工程研究所 Failure detection and fault tolerance method and failure detection and fault tolerance system for real-time cloud platform
CN104077181A (en) * 2014-06-26 2014-10-01 国电南瑞科技股份有限公司 Status consistent maintaining method applicable to distributed task management system
US20150242275A1 (en) * 2014-02-21 2015-08-27 Unisys Corporation Power efficient distribution and execution of tasks upon hardware fault with multiple processors
CN105095008A (en) * 2015-08-25 2015-11-25 国电南瑞科技股份有限公司 Distributed task fault redundancy method suitable for cluster system
CN108304255A (en) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing
CN108958920A (en) * 2018-07-13 2018-12-07 众安在线财产保险股份有限公司 A kind of distributed task dispatching method and system
CN109408210A (en) * 2018-09-27 2019-03-01 北京车和家信息技术有限公司 Distributed timing task management method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
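Liu Chen: "Design and Application of a Crane Safety Monitoring System Based on STM32", Journal of Hubei University of Science and Technology *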

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346837A (en) * 2020-10-28 2021-02-09 常州微亿智造科技有限公司 Distributed timer system under industrial Internet of things
CN112527488A (en) * 2020-12-21 2021-03-19 浙江百应科技有限公司 Distributed high-availability task scheduling method and system
CN116974730A (en) * 2023-09-22 2023-10-31 深圳联友科技有限公司 Large-batch task processing method
CN116974730B (en) * 2023-09-22 2024-01-30 深圳联友科技有限公司 Large-batch task processing method

Similar Documents

Publication Publication Date Title
US7779298B2 (en) Distributed job manager recovery
US7747717B2 (en) Fast application notification in a clustered computing system
US7457236B2 (en) Method for providing fault-tolerant application cluster service
US8938421B2 (en) Method and a system for synchronizing data
EP1650653B1 (en) Remote enterprise management of high availability systems
EP1623325B1 (en) Managing tasks in a data processing environment
CN108710544B (en) Process monitoring method of database system and rail transit comprehensive monitoring system
US20120151272A1 (en) Adding scalability and fault tolerance to generic finite state machine frameworks for use in automated incident management of cloud computing infrastructures
CN110798339A (en) Task disaster tolerance method based on distributed task scheduling framework
CN111506412A (en) Distributed asynchronous task construction and scheduling system and method based on Airflow
US20110131448A1 (en) Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
CN106406993A (en) Timed task management method and system
CN110795503A (en) Multi-cluster data synchronization method and related device of distributed storage system
CN110895488B (en) Task scheduling method and device
CN101777020A (en) Fault tolerance method and system used for distributed program
CN117130730A (en) Metadata management method for federal Kubernetes cluster
CN113194096B (en) Task scheduling real-time tracking method and system based on distributed architecture
CN112149975B (en) APM monitoring system and method based on artificial intelligence
CN112350862A (en) Monitoring alarm and fault self-healing system
CN111309456B (en) Task execution method and system
CN112764912A (en) Lightweight distributed scheduling method and system for data integration
CN112328445B (en) Multi-node management system based on condul
CN113806051B (en) Task management method and device of computing equipment, storage medium and computing equipment
EP4006807A1 (en) Event monitoring with support system integration
CN112214323B (en) Resource recovery method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-02-14