CN110798339A - Task disaster tolerance method based on distributed task scheduling framework - Google Patents
- Publication number
- CN110798339A (application CN201910954331.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- scheduling
- actuator
- executor
- heartbeat
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Retry When Errors Occur (AREA)
Abstract
The invention discloses a task disaster tolerance method based on a distributed task scheduling framework, comprising the following steps: first, a task scheduling center is initialized, and a daemon thread is started during initialization to monitor the heartbeat state of the executors; second, a user registers task information through the task scheduling center; third, the scheduling center submits scheduling requests on schedule according to the Cron configuration of each task; fourth, an executor receives and runs the scheduling request submitted by the scheduling center; fifth, if the daemon thread detects that an executor has failed while executing a task, it determines whether the executor has a task in the running state and, if so, updates the running state of the task and triggers the task to be rescheduled to run on an online executor; and sixth, the task execution completes and the scheduling result is returned. The invention solves the problem that existing distributed task scheduling frameworks cannot automatically recover tasks in disaster tolerance scenarios.
Description
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a task disaster tolerance method based on a distributed task scheduling framework.
Background
In enterprise-level big data platforms, business-related tasks that must be scheduled to run periodically are ubiquitous. Such tasks are scheduled, run, and terminated automatically according to a time rule: for example, periodically updating sampled data, running a spreadsheet task at a fixed time early each morning, or generating a database report every month. For these business scenarios, a number of open-source distributed task scheduling frameworks currently exist in the industry, such as LTS, XXL-JOB, and Elastic-JOB. These frameworks scale and extend well, provide a user-friendly operation and maintenance management interface, support dynamic CRUD operations on tasks, and are a good choice for task scheduling on an enterprise-level big data platform.
XXL-JOB is a lightweight, easily extensible distributed task scheduling framework that is simple to operate and convenient to use, and it is currently a popular open-source choice. Its task disaster tolerance features are as follows: task scheduling is adjusted dynamically according to which executors are online, so that tasks are not scheduled to a failed executor; and when the executor running a scheduled task fails, the task management interface provides an "end task" button that can be clicked manually to trigger the task to be rescheduled and executed. XXL-JOB thus provides task disaster tolerance to a certain extent, but only in combination with manual operation by operation and maintenance personnel.
Although some good distributed task scheduling frameworks exist at present, the following problem commonly arises in actual production environments: when a distributed task executor node goes offline due to a fault or is restarted, any task that the scheduling center has dispatched to that node and that is in the running state hangs and cannot automatically resume execution. Existing distributed task scheduling frameworks do not adequately solve the problem of automatic task recovery in disaster tolerance scenarios, yet the reliability of task execution is an important criterion when selecting a distributed task scheduling system in industries such as power grids, banking, and insurance. No effective solution to this problem has yet been proposed.
Disclosure of Invention
In view of the above problems in the prior art, the invention aims to provide a task disaster tolerance method based on a distributed task scheduling framework, so as to solve the problem that a running task hangs and cannot be automatically recovered in a disaster tolerance scenario.
In order to achieve the purpose, the invention adopts the technical scheme that:
A task disaster tolerance method based on a distributed task scheduling framework comprises the following steps:
S1, deploying a plurality of executors, each in communication connection with a scheduling center;
S2, registering task information through the scheduling center, and submitting a scheduling request to an executor based on the Cron configuration of the task;
S3, the executor receives and runs the scheduling request submitted by the scheduling center;
S4, monitoring the heartbeat states of the plurality of executors through the scheduling center;
S5, detecting that an executor has failed while executing a task, and confirming whether a task in the running state exists on the executor; if so, updating the running state of the task and triggering the task to be rescheduled to run on an online executor; if not, refreshing the online state of the executor;
S6, the executor completes the scheduled task and returns the scheduling result.
Specifically, in step S4, the heartbeat state of the executors is monitored by a daemon thread, which is started while the task scheduling center is initialized. The daemon thread monitors the heartbeat state as follows: once every heartbeat cycle it queries the executor information registry in the database and judges from the update_time field whether any executor is offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
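By way of non-limiting illustration, the offline check described above can be sketched as follows. The in-memory `registry` mapping, the executor addresses, and the function name `find_offline_executors` are hypothetical stand-ins for the executor information registry in the database; only the 30 s cycle and the 3-cycle threshold come from the text.

```python
from datetime import datetime, timedelta

HEARTBEAT_CYCLE = timedelta(seconds=30)   # heartbeat cycle given in the text
OFFLINE_AFTER_CYCLES = 3                  # threshold given in the text


def find_offline_executors(registry, now):
    """Return addresses whose update_time is older than 3 heartbeat cycles.

    `registry` maps an executor address to its last update_time
    (a hypothetical in-memory view of the registry table).
    """
    deadline = now - OFFLINE_AFTER_CYCLES * HEARTBEAT_CYCLE
    return sorted(addr for addr, update_time in registry.items()
                  if update_time < deadline)


now = datetime(2019, 10, 9, 12, 0, 0)
registry = {
    "10.0.0.1:9999": now - timedelta(seconds=20),  # fresh heartbeat
    "10.0.0.2:9999": now - timedelta(seconds=90),  # exactly 3 cycles: still online
    "10.0.0.3:9999": now - timedelta(seconds=91),  # more than 3 cycles: offline
}
offline = find_offline_executors(registry, now)
```

In this sketch the daemon thread would call `find_offline_executors` once per heartbeat cycle; an executor whose last update is exactly 3 cycles old is not yet treated as offline, matching the "more than 3 heartbeat cycles" wording.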
Specifically, in step S5, the faults that may occur in an executor include an offline fault and a restart caused by a fault.
Further, when an executor has an offline fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor. If so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
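The offline-fault branch above may be sketched as follows, purely as an illustration. The `LogEntry` record, the status names, and the interpretation of "refreshing the registry" as removing the offline executor from a set of online executors are all assumptions not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One row of a hypothetical scheduling log information table."""
    task_id: int
    executor: str
    status: str  # "RUNNING", "SUCCESS" or "FAILED" (status names are assumptions)


def handle_offline_executor(executor, schedule_log, online_registry):
    """Mark the offline executor's running tasks as failed, else refresh the registry.

    Returns the ids of the tasks that were marked FAILED.
    """
    running = [e for e in schedule_log
               if e.executor == executor and e.status == "RUNNING"]
    if running:
        for entry in running:
            entry.status = "FAILED"          # later picked up by retry scheduling
    else:
        online_registry.discard(executor)    # assumption: refresh = drop stale entry
    return [e.task_id for e in running]


log = [
    LogEntry(1, "exec-a", "RUNNING"),
    LogEntry(2, "exec-a", "SUCCESS"),
    LogEntry(3, "exec-b", "RUNNING"),
]
online = {"exec-a", "exec-b", "exec-c"}
failed_ids = handle_offline_executor("exec-a", log, online)
```

Only the running task on the offline executor changes state; finished tasks and tasks on other executors are left untouched.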
Further, when an executor restarts because of a fault, a rescheduling service interface of the scheduling center is called, and the interface judges whether the restart time of the executor exceeds 3 heartbeat cycles. If so, the fault is handled as an offline fault. If the restart time does not exceed 3 heartbeat cycles, the scheduling log in the database is queried through the rescheduling service interface to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
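As a minimal sketch of the first decision made by the rescheduling service interface, the restart time can be compared against 3 heartbeat cycles. The function name `classify_restart` and the returned labels are hypothetical; the text does not specify how the restart time is measured or reported.

```python
HEARTBEAT_CYCLE_SECONDS = 30   # heartbeat cycle given in the text
OFFLINE_AFTER_CYCLES = 3       # threshold given in the text


def classify_restart(restart_seconds):
    """Decide how a reported executor restart should be handled.

    Returns "OFFLINE_FAULT" when the restart took longer than 3 heartbeat
    cycles (handled by the offline path), otherwise "RESTART_FAULT"
    (handled directly through the rescheduling service interface).
    """
    if restart_seconds > OFFLINE_AFTER_CYCLES * HEARTBEAT_CYCLE_SECONDS:
        return "OFFLINE_FAULT"
    return "RESTART_FAULT"


quick = classify_restart(45)    # restarted within 3 cycles
slow = classify_restart(200)    # restarted after more than 3 cycles
```

With a 30 s cycle, the boundary sits at 90 s: a restart that takes exactly 90 s is still handled on the restart path.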
Further, retry scheduling according to the retry count configured for the task specifically includes: a daemon thread running in the background of the scheduling center periodically polls the scheduling log information table for each task in the task monitoring queue; if a monitored task is in the failed state and its retry count is greater than 0, the retry count is decremented by one and the task is resubmitted to the task scheduling center to be scheduled and run.
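One polling pass of the retry logic described above might look like the following sketch. The `MonitoredTask` record, the status names, the `submit` callback, and the choice to reset the state to running on resubmission are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MonitoredTask:
    """Hypothetical entry of the task monitoring queue."""
    task_id: int
    status: str        # "RUNNING", "SUCCESS" or "FAILED" (names are assumptions)
    retries_left: int  # remaining retry count configured for the task


def poll_once(monitor_queue, submit):
    """Resubmit each failed task that still has retries left.

    Returns the ids of the tasks resubmitted during this pass.
    """
    resubmitted = []
    for task in monitor_queue:
        if task.status == "FAILED" and task.retries_left > 0:
            task.retries_left -= 1     # decrement before resubmitting
            task.status = "RUNNING"    # assumption: resubmission resets the state
            submit(task)
            resubmitted.append(task.task_id)
    return resubmitted


queue = [
    MonitoredTask(1, "FAILED", 2),   # retried
    MonitoredTask(2, "FAILED", 0),   # retries exhausted: left as failed
    MonitoredTask(3, "RUNNING", 1),  # not failed: ignored
]
submitted = []
retried_ids = poll_once(queue, submitted.append)
```

Because the count is decremented on every resubmission, a task that keeps failing is retried at most as many times as its configured retry count.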
In particular, the heartbeat cycle is 30 s.
Specifically, the scheduling center and the executors perform information registration and discovery (i.e., service registration and discovery) through a database (DB).
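As an illustration of DB-based registration and discovery, the sketch below uses an in-memory SQLite database. The table name `executor_registry`, its columns other than update_time, and the helper names are hypothetical; the text only states that a registry with an update_time field is kept in a database.

```python
import sqlite3


def open_registry():
    """Create a hypothetical executor registry table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE executor_registry ("
        " address TEXT PRIMARY KEY,"
        " update_time INTEGER NOT NULL)"   # seconds since an arbitrary epoch
    )
    return conn


def heartbeat(conn, address, now):
    """Executor side: register on the first call, refresh update_time afterwards."""
    conn.execute(
        "INSERT OR REPLACE INTO executor_registry (address, update_time)"
        " VALUES (?, ?)",
        (address, now),
    )


def online_executors(conn, now, timeout=90):
    """Scheduling-center side: discover executors with a recent enough heartbeat."""
    rows = conn.execute(
        "SELECT address FROM executor_registry"
        " WHERE ? - update_time <= ? ORDER BY address",
        (now, timeout),
    )
    return [address for (address,) in rows]


conn = open_registry()
heartbeat(conn, "exec-a", 1000)
heartbeat(conn, "exec-b", 1000)
heartbeat(conn, "exec-a", 1080)          # exec-a keeps beating, exec-b stops
visible = online_executors(conn, 1100)   # exec-b is 100 s stale, beyond 90 s
```

Here neither side talks to the other directly: the executor only writes its row, and the scheduling center only reads the table, which is the essence of registration and discovery through a shared database.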
Corresponding to the task disaster tolerance method, the invention also provides a task disaster tolerance system based on the distributed task scheduling framework, which comprises a scheduling center and a plurality of executors, wherein the executors and the scheduling center perform information registration and discovery through a database (DB); the scheduling center is used for registering task information and submitting a scheduling request to an executor based on the Cron configuration of a task; the executor is used for receiving and running the scheduling request; the scheduling center judges whether an executor has failed by monitoring its heartbeat state; when it detects that an executor has failed while executing a task, it confirms whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor.
Specifically, the scheduling center monitors the heartbeat state of the executors through a daemon thread, which is started while the scheduling center is initialized. The daemon thread monitors the heartbeat state as follows: once every heartbeat cycle it queries the executor information registry in the database and judges from the update_time field whether any executor is offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
Specifically, the faults that may occur in an executor include an offline fault and a restart caused by a fault;
when an executor has an offline fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed;
when an executor restarts because of a fault, a rescheduling service interface of the scheduling center is called, and the interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as an offline fault; if the restart time does not exceed 3 heartbeat cycles, the scheduling log in the database is queried through the rescheduling service interface to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
In particular, the heartbeat cycle is 30 s.
Specifically, the plurality of executors and the scheduling center perform information registration and discovery through a database (DB).
Compared with the prior art, the invention has the following beneficial effects: the invention responds to and handles the various types of faults in a distributed task scheduling system in a timely manner; when an executor goes offline or restarts because of a fault, the task being scheduled and executed on that executor is automatically recovered and executed again without manual operation; the task disaster tolerance method ensures high availability and high reliability of the distributed task scheduling system, and solves the problem that existing distributed task scheduling frameworks cannot automatically recover tasks in disaster tolerance scenarios.
Drawings
FIG. 1 is a schematic flow chart of a task disaster tolerance method based on a distributed task scheduling framework according to the present invention;
FIG. 2 is a detailed flowchart of a task disaster tolerance method based on a distributed task scheduling framework according to an embodiment of the present invention;
FIG. 3 is a system architecture diagram of a task disaster tolerance system based on a distributed task scheduling framework according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1 and FIG. 2, the present embodiment provides a task disaster tolerance method based on a distributed task scheduling framework, including the following steps:
S1, initializing a task scheduling center, and starting a daemon thread during initialization to monitor the heartbeat state of the executors;
S2, the user registers task information through the task scheduling center;
S3, the scheduling center submits scheduling requests according to the Cron configuration of the tasks;
S4, an executor receives and runs the scheduling request submitted by the scheduling center;
S5, if the daemon thread detects that an executor has failed while executing a task, it determines whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor;
S6, the executor completes the scheduled task and returns the scheduling result.
Specifically, in step S1, the daemon thread monitors the heartbeat state of the executors as follows: once every heartbeat cycle it queries the executor information registry in the database and judges from the update_time field whether any executor is offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles (i.e., a heartbeat timeout) is considered to be offline.
Specifically, in step S5, the faults that may occur in an executor include an offline fault and a restart caused by a fault.
Further, when an executor has an offline fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor. If so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
Further, when an executor restarts because of a fault, a rescheduling service interface of the scheduling center is called, and the interface judges whether the restart time of the executor exceeds 3 heartbeat cycles. If so, the fault is handled as an offline fault. If the restart time does not exceed 3 heartbeat cycles, the scheduling log in the database is queried through the rescheduling service interface to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
Further, retry scheduling according to the retry count configured for the task specifically includes: a daemon thread running in the background of the scheduling center periodically polls the scheduling log information table for each task in the task monitoring queue; if a monitored task is in the failed state and its retry count is greater than 0, the retry count is decremented by one and the task is resubmitted to the task scheduling center to be scheduled and run.
In particular, the heartbeat cycle is 30 s.
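As a worked illustration of these values, let $T_{\text{hb}}$ be the heartbeat cycle and $T_{\text{detect}}$ the time between an executor's last update of update_time and the poll at which the daemon thread marks it offline. The bound below assumes, hypothetically, that the daemon thread polls exactly once per heartbeat cycle and that database access time is negligible; neither assumption is stated in the embodiment.

```latex
T_{\text{timeout}} = 3\,T_{\text{hb}} = 3 \times 30\,\text{s} = 90\,\text{s},
\qquad
90\,\text{s} < T_{\text{detect}} \le T_{\text{timeout}} + T_{\text{hb}} = 120\,\text{s}
```

Under these assumptions a failed executor is therefore detected between one and a half and two minutes after its last successful heartbeat.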
Specifically, information registration and discovery between the scheduling center and the executors are performed through a database (DB).
In this embodiment, the user registers the information of the task to be executed either by operating a Web interface or by using a built-in database update script, and the specific task execution logic must be a JobHandler implementation class that has already been developed for the business.
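The constraint that a registered task must refer to an already-developed handler can be illustrated with the sketch below. It is written in Python for brevity and is not the JobHandler API of any particular framework; the decorator `job_handler`, the handler name `dailyReportHandler`, the function `register_task`, and the Quartz-style Cron expression are all hypothetical.

```python
HANDLERS = {}  # hypothetical lookup table: handler name -> callable


def job_handler(name):
    """Register a function as the execution logic for handler `name`."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator


@job_handler("dailyReportHandler")
def daily_report(params):
    # Placeholder business logic for the illustration.
    return f"report generated for {params}"


def register_task(tasks, task_name, cron, handler_name):
    """Register task information; reject tasks whose handler does not exist."""
    if handler_name not in HANDLERS:
        raise ValueError(f"unknown handler: {handler_name}")
    tasks[task_name] = {"cron": cron, "handler": handler_name}


tasks = {}
# Illustrative six-field Cron expression: 02:00:00 every day.
register_task(tasks, "daily-report", "0 0 2 * * ?", "dailyReportHandler")
result = HANDLERS[tasks["daily-report"]["handler"]]("2019-10-09")
```

Validating the handler name at registration time means a scheduling request is never submitted for a task that no executor could run.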
As shown in FIG. 3, the present embodiment further provides a task disaster tolerance system based on a distributed task scheduling framework. The system includes a scheduling center and a plurality of executors, and the executors and the scheduling center perform information registration and discovery through a database (DB); the scheduling center is used for registering task information and submitting a scheduling request to an executor based on the Cron configuration of a task; the executor is used for receiving and running the scheduling request; the scheduling center judges whether an executor has failed by monitoring its heartbeat state; when it detects that an executor has failed while executing a task, it confirms whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor.
The scheduling center is responsible for managing scheduling information and issuing scheduling requests according to the scheduling configuration, and does not carry business code; the executor is responsible for receiving scheduling requests and executing the task logic; the executors are deployed as a cluster, which ensures high availability of task execution.
The scheduling center includes functional modules such as an executor management module, a task management module, and a log management module, wherein the executor management module provides the registration service, the task management module provides the task scheduling service, and the log management module provides the log query service; the scheduling center also provides a task callback service.
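Because the executors form a cluster, rescheduling amounts to choosing another online executor. The sketch below uses a deliberately simple policy (first online address in sorted order, excluding the failed executor); this policy and the function name `pick_online_executor` are assumptions, since the embodiment does not prescribe a routing strategy.

```python
def pick_online_executor(online_executors, failed_executor):
    """Return the first online executor other than the failed one, or None."""
    for address in sorted(online_executors):
        if address != failed_executor:
            return address
    return None


cluster = {"exec-b", "exec-a", "exec-c"}
target = pick_online_executor(cluster, failed_executor="exec-a")
nobody = pick_online_executor({"exec-a"}, failed_executor="exec-a")
```

When no other executor is online the function returns `None`, in which case the task would remain in the failed state until a later retry pass finds an available executor.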
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A task disaster tolerance method based on a distributed task scheduling framework is characterized by comprising the following steps:
S1, deploying a plurality of executors, each in communication connection with a scheduling center;
S2, registering task information and submitting a scheduling request to an executor based on the Cron configuration of the task;
S3, the executor receives and runs the scheduling request submitted by the scheduling center;
S4, monitoring the heartbeat states of the plurality of executors;
S5, detecting that an executor has failed while executing a task, and confirming whether a task in the running state exists on the executor; if so, updating the running state of the task and triggering the task to be rescheduled to run on an online executor; if not, refreshing the online state of the executor;
S6, the executor completes the scheduled task and returns the scheduling result.
2. The task disaster tolerance method based on the distributed task scheduling framework according to claim 1, wherein in step S4, the heartbeat state of the executors is monitored by a daemon thread, and the daemon thread is started while the scheduling center is initialized; the daemon thread monitors the heartbeat state as follows: once every heartbeat cycle the daemon thread queries the executor information registry in the database and judges from the update_time field whether any executor is offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
3. The task disaster tolerance method based on the distributed task scheduling framework according to claim 1, wherein in step S5, the faults occurring in an executor include an offline fault and a restart caused by a fault;
when an executor has an offline fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed;
when an executor restarts because of a fault, a rescheduling service interface of the scheduling center is called, and the interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as an offline fault; if the restart time does not exceed 3 heartbeat cycles, the scheduling log in the database is queried through the rescheduling service interface to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
4. The task disaster tolerance method based on the distributed task scheduling framework according to claim 2 or 3, wherein the heartbeat cycle is 30 s.
5. The task disaster tolerance method based on the distributed task scheduling framework according to claim 1, wherein information registration and discovery between the scheduling center and the executors are performed through a database (DB).
6. A task disaster tolerance system based on a distributed task scheduling framework, based on the task disaster tolerance method of any one of claims 1 to 5, characterized by comprising a scheduling center and a plurality of executors, wherein the plurality of executors are in communication connection with the scheduling center; the scheduling center is used for registering task information and submitting a scheduling request to an executor based on the Cron configuration of a task; the executor is used for receiving and running the scheduling request; the scheduling center judges whether an executor has failed by monitoring its heartbeat state; when it detects that an executor has failed while executing a task, it confirms whether a task in the running state exists on the executor; if so, it updates the running state of the task and triggers the task to be rescheduled to run on an online executor; if not, it refreshes the online state of the executor.
7. The task disaster tolerance system based on the distributed task scheduling framework according to claim 6, wherein the scheduling center monitors the heartbeat state of the executors through a daemon thread, and the daemon thread is started while the scheduling center is initialized; the daemon thread monitors the heartbeat state as follows: once every heartbeat cycle the daemon thread queries the executor information registry in the database and judges from the update_time field whether any executor is offline; an executor whose update_time field has not been updated for more than 3 heartbeat cycles is considered to be offline.
8. The task disaster tolerance system based on the distributed task scheduling framework according to claim 6, wherein the faults occurring in an executor include an offline fault and a restart caused by a fault;
when an executor has an offline fault, the daemon thread queries the scheduling log information table in the database to determine whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed;
when an executor restarts because of a fault, a rescheduling service interface of the scheduling center is called, and the interface judges whether the restart time of the executor exceeds 3 heartbeat cycles; if so, the fault is handled as an offline fault; if the restart time does not exceed 3 heartbeat cycles, the scheduling log in the database is queried through the rescheduling service interface to confirm whether a task in the running state exists on the executor; if so, the running state of the task is updated to failed, and the scheduling center then performs retry scheduling according to the retry count configured for the task, triggering the task to be rescheduled to run on an online executor; if not, the executor information registry is refreshed.
9. The task disaster tolerance system based on the distributed task scheduling framework according to claim 6, wherein the heartbeat cycle is 30 s.
10. The task disaster tolerance system based on the distributed task scheduling framework according to claim 7 or 8, wherein the plurality of executors and the scheduling center perform information registration and discovery through a database (DB).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910954331.1A CN110798339A (en) | 2019-10-09 | 2019-10-09 | Task disaster tolerance method based on distributed task scheduling framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910954331.1A CN110798339A (en) | 2019-10-09 | 2019-10-09 | Task disaster tolerance method based on distributed task scheduling framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110798339A (en) | 2020-02-14 |
Family
ID=69438876
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910954331.1A Pending CN110798339A (en) | 2019-10-09 | 2019-10-09 | Task disaster tolerance method based on distributed task scheduling framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110798339A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346837A (en) * | 2020-10-28 | 2021-02-09 | 常州微亿智造科技有限公司 | Distributed timer system under industrial Internet of things |
CN112527488A (en) * | 2020-12-21 | 2021-03-19 | 浙江百应科技有限公司 | Distributed high-availability task scheduling method and system |
CN116974730A (en) * | 2023-09-22 | 2023-10-31 | 深圳联友科技有限公司 | Large-batch task processing method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050268300A1 (en) * | 2004-05-14 | 2005-12-01 | Microsoft Corporation | Distributed task scheduler for computing environments |
CN103259832A (en) * | 2012-12-24 | 2013-08-21 | 中国科学院沈阳自动化研究所 | Cluster resource control method for achieving dynamic load balance, fault diagnosis and failover |
CN103716182A (en) * | 2013-12-12 | 2014-04-09 | 中国科学院信息工程研究所 | Failure detection and fault tolerance method and failure detection and fault tolerance system for real-time cloud platform |
CN104077181A (en) * | 2014-06-26 | 2014-10-01 | 国电南瑞科技股份有限公司 | Status consistent maintaining method applicable to distributed task management system |
US20150242275A1 (en) * | 2014-02-21 | 2015-08-27 | Unisys Corporation | Power efficient distribution and execution of tasks upon hardware fault with multiple processors |
CN105095008A (en) * | 2015-08-25 | 2015-11-25 | 国电南瑞科技股份有限公司 | Distributed task fault redundancy method suitable for cluster system |
CN108304255A (en) * | 2017-12-29 | 2018-07-20 | 北京城市网邻信息技术有限公司 | Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing |
CN108958920A (en) * | 2018-07-13 | 2018-12-07 | 众安在线财产保险股份有限公司 | A kind of distributed task dispatching method and system |
CN109408210A (en) * | 2018-09-27 | 2019-03-01 | 北京车和家信息技术有限公司 | Distributed timing task management method and system |
- 2019-10-09: CN application CN201910954331.1A, publication CN110798339A (en), status: Pending
Non-Patent Citations (1)
Title |
---|
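Liu Chen (刘晨): "Design and Application of a Crane Safety Monitoring System Based on STM32" (基于STM32的起重机安全监控系统设计与应用), Journal of Hubei University of Science and Technology (《湖北科技学院学报》) * |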
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346837A (en) * | 2020-10-28 | 2021-02-09 | 常州微亿智造科技有限公司 | Distributed timer system under industrial Internet of things |
CN112527488A (en) * | 2020-12-21 | 2021-03-19 | 浙江百应科技有限公司 | Distributed high-availability task scheduling method and system |
CN116974730A (en) * | 2023-09-22 | 2023-10-31 | 深圳联友科技有限公司 | Large-batch task processing method |
CN116974730B (en) * | 2023-09-22 | 2024-01-30 | 深圳联友科技有限公司 | Large-batch task processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7779298B2 (en) | | Distributed job manager recovery |
US7747717B2 (en) | | Fast application notification in a clustered computing system |
US7457236B2 (en) | | Method for providing fault-tolerant application cluster service |
US8938421B2 (en) | | Method and a system for synchronizing data |
EP1650653B1 (en) | | Remote enterprise management of high availability systems |
EP1623325B1 (en) | | Managing tasks in a data processing environment |
CN108710544B (en) | | Process monitoring method of database system and rail transit comprehensive monitoring system |
US20120151272A1 (en) | | Adding scalability and fault tolerance to generic finite state machine frameworks for use in automated incident management of cloud computing infrastructures |
CN110798339A (en) | | Task disaster tolerance method based on distributed task scheduling framework |
CN111506412A (en) | | Distributed asynchronous task construction and scheduling system and method based on Airflow |
US20110131448A1 (en) | | Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers |
CN106406993A (en) | | Timed task management method and system |
CN110795503A (en) | | Multi-cluster data synchronization method and related device of distributed storage system |
CN110895488B (en) | | Task scheduling method and device |
CN101777020A (en) | | Fault tolerance method and system used for distributed program |
CN117130730A (en) | | Metadata management method for federal Kubernetes cluster |
CN113194096B (en) | | Task scheduling real-time tracking method and system based on distributed architecture |
CN112149975B (en) | | APM monitoring system and method based on artificial intelligence |
CN112350862A (en) | | Monitoring alarm and fault self-healing system |
CN111309456B (en) | | Task execution method and system |
CN112764912A (en) | | Lightweight distributed scheduling method and system for data integration |
CN112328445B (en) | | Multi-node management system based on condul |
CN113806051B (en) | | Task management method and device of computing equipment, storage medium and computing equipment |
EP4006807A1 (en) | | Event monitoring with support system integration |
CN112214323B (en) | | Resource recovery method and device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200214 |