CN116436768A - Automatic backup method, system, equipment and medium based on cross heartbeat monitoring - Google Patents

Automatic backup method, system, equipment and medium based on cross heartbeat monitoring

Info

Publication number
CN116436768A
Authority
CN
China
Prior art keywords
scheduling server
state
registry
service
over
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310699417.0A
Other languages
Chinese (zh)
Other versions
CN116436768B (en)
Inventor
姜全尧
张静波
邢翠霞
宫喜斌
刘英莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ideal Information Technology Co ltd
Original Assignee
Beijing Ideal Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ideal Information Technology Co ltd filed Critical Beijing Ideal Information Technology Co ltd
Priority to CN202310699417.0A priority Critical patent/CN116436768B/en
Publication of CN116436768A publication Critical patent/CN116436768A/en
Application granted granted Critical
Publication of CN116436768B publication Critical patent/CN116436768B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses an automatic backup method, system, equipment and medium based on cross heartbeat monitoring. The method comprises the following steps: monitoring the application service state of each scheduling server through a heartbeat monitoring function; transmitting the application service state information between the scheduling servers through messages at a preset frequency; when a message sent by a first scheduling server is not received within a preset time, judging whether the database state update and the registry service of the first scheduling server are abnormal, and deciding accordingly whether to trigger the fail-over; and, when the first scheduling server is failed over, having a second scheduling server take over the application service running on the first scheduling server. This processing scheme ensures high availability of the platform and continuity of service.

Description

Automatic backup method, system, equipment and medium based on cross heartbeat monitoring
Technical Field
The invention relates to the technical field of computer security, in particular to an automatic backup method, system, equipment and medium based on cross heartbeat monitoring.
Background
As Linux is increasingly used in critical industries, it must provide the kinds of services originally offered by large commercial vendors such as IBM and SUN. These services share one key feature: the high-availability cluster.
A high-availability cluster is a group of independent computers, connected by hardware and software, that appears to users as a single system. When one or more nodes in the group go out of service, services are switched from the failed node to a normally operating node and keep running without interruption. It follows from this definition that the cluster must detect when nodes and services fail and when they become available again. This task is typically performed by a piece of code called a "heartbeat". In Linux-HA, this function is performed by a program called Heartbeat.
In the prior art, automatic fail-over based on heartbeat monitoring is implemented by adding a fail-over method for step instances to the Redundant method, the fail-over call interface of Heartbeat Monitor.
The disadvantage of this technique is as follows. In a dual-machine hot-standby high-availability system, when the heartbeat line connecting the 2 nodes is disconnected, the HA system, which previously acted as a coordinated whole, splits into 2 independent nodes. Having lost contact with each other, the HA software on the 2 nodes competes for the application service and the shared resources, much like a split-brain patient, and serious consequences can follow: either the shared resources are carved up and the service cannot start on either side, or the service starts on both sides and both read and write the shared storage at the same time, resulting in data corruption.
When split-brain occurs, the impact on the business is extremely severe, sometimes even fatal. If it happens in a critical, highly available database or storage service, the data submitted by users is written intermittently to two servers, and in the end the data is very difficult or impossible to recover.
Therefore, the existing automatic backup method described above still has evident drawbacks and needs further improvement. Creating a new automatic backup method has become an urgent need in the industry.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide an automatic backup method based on cross-heartbeat monitoring, which at least partially solves the problems in the prior art.
In a first aspect, embodiments of the present disclosure provide an automatic backup method based on cross-heartbeat monitoring, the method comprising the steps of:
monitoring the application service state of each scheduling server through a heartbeat monitoring function;
transmitting the application service state information between the scheduling servers through messages at a preset frequency;
when a message sent by a first scheduling server is not received within a preset time, judging whether the database state update and the registry service in the first scheduling server are abnormal, wherein:
when the database state update is abnormal and the service of each executor of the registry is normal, removing the current task from the registry and triggering the fail-over;
when the service of each executor of the registry is abnormal and the database state update is normal, removing the current task from the registry without triggering the fail-over; and
when the service of each executor of the registry is abnormal and the database state update is also abnormal, removing the current task from the registry and triggering the fail-over;
and when the first scheduling server is failed over, taking over, by a second scheduling server, the application service running on the first scheduling server.
According to a specific implementation manner of the embodiment of the present disclosure, determining whether a database state update and a registry service in the first scheduling server are abnormal includes:
comparing the update time of the database in the first scheduling server with the current system time, judging that the database state update is abnormal when the update time lags the current system time by more than 6 minutes, and judging that the database state update is normal when the lag is less than or equal to 6 minutes; and judging whether the executor service in the application service registry is abnormal.
According to a specific implementation manner of the embodiment of the present disclosure, the taking over, by the second scheduling server, the application service running on the first scheduling server includes:
setting a task to be executed in the application service of the first scheduling server to be in a state to be recovered;
the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
According to a specific implementation of an embodiment of the disclosure, the method further includes:
when the first scheduling server is abnormal, the other scheduling servers compete, in a preemption mode, to fail over the tasks on the first scheduling server.
According to a specific implementation manner of the embodiment of the disclosure, the preemption mode is realized through a Redis distributed lock mechanism, and a scheduling server that finds the lock already occupied abandons the fail-over.
According to a specific implementation of an embodiment of the present disclosure, the method is applied to a single-center high-availability deployment architecture, a two-place double-center deployment architecture or a two-place three-center deployment architecture.
In a second aspect, embodiments of the present disclosure provide an automatic backup method system based on cross-heartbeat monitoring, the system comprising:
the heartbeat monitoring module is configured to monitor the application service state of each scheduling server through a heartbeat monitoring function;
the judging module is configured to transmit the application service state information between the scheduling servers through messages at a preset frequency; when a message sent by a first scheduling server is not received within a preset time, judge whether the database state update and the registry service in the first scheduling server are abnormal; when the database state update is abnormal and the service of each executor of the registry is normal, remove the current task from the registry and trigger the fail-over; when the service of each executor of the registry is abnormal and the database state update is normal, remove the current task from the registry without triggering the fail-over; and when the service of each executor of the registry is abnormal and the database state update is also abnormal, remove the current task from the registry and trigger the fail-over;
and the fail-over module is configured to put the first scheduling server into an unavailable state when the fail-over is carried out, and take over the application service running on the first scheduling server by the second scheduling server.
According to a specific implementation manner of the embodiment of the present disclosure, the fail-over module sets a task to be executed in an application service of the first scheduling server to a state to be recovered; the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
In a third aspect, embodiments of the present disclosure further provide an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to implement the cross-heartbeat monitoring-based automatic backup method of any one of the foregoing first aspect or any implementation of the first aspect.
In a fourth aspect, the presently disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions that, when executed by at least one processor, cause the at least one processor to perform the cross-heartbeat monitoring-based auto-backup method of the first aspect or any implementation of the first aspect.
In a fifth aspect, embodiments of the present disclosure also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the automatic backup method based on cross-heartbeat monitoring of the first aspect or any implementation of the first aspect.
According to the automatic backup method based on cross heartbeat monitoring, the fail-over is triggered through heartbeat monitoring; during the fail-over, the state of the task is modified and, together with engine polling and engine driving, the recovery of the task is completed. This ensures high availability of the platform and the continuity, accuracy and reliability of the service, and allows a large number of tasks to be processed rapidly in a short time.
Drawings
The foregoing is merely an overview of the present invention, and the present invention is further described in detail below with reference to the accompanying drawings and detailed description.
Fig. 1 is a schematic flow chart of an automatic backup method based on cross heartbeat monitoring according to an embodiment of the disclosure;
Fig. 2 is a schematic diagram of a main service flow of a heartbeat monitoring module according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a fail-over business process according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a fail-over logic architecture according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of a high performance cluster composition provided by an embodiment of the disclosure;
FIG. 6 is a schematic diagram of a single-center high-availability deployment architecture provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a two-place dual-center deployment architecture provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a two-place three-center deployment architecture provided by an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of a system structure of an automatic backup method based on cross heartbeat monitoring according to an embodiment of the disclosure; and
Fig. 10 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present disclosure will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the present disclosure by way of specific examples. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the invention provides an automatic backup method based on cross heartbeat monitoring, which triggers the failure backup through heartbeat monitoring, and completes the recovery of tasks by modifying the state of the tasks and matching with engine polling and engine driving when the failure backup occurs, thereby ensuring the high availability of a platform and the continuity of services.
HA (high availability), also called dual-machine hot standby, is used for critical services. In the typical scenario there are two service node servers, A and B; when one server fails, the other takes over its service tasks, so that the system automatically continues to provide service to the outside without manual intervention.
Heartbeat is open-source software that provides high-availability services. With Heartbeat, resources (IP addresses, program services and the like) can be quickly transferred from a failed computer to another machine that is operating properly, which then continues to provide service.
Working principle of Heartbeat (Linux-HA): the core of Heartbeat consists of two parts, a heartbeat monitoring part and a resource takeover part. Heartbeat monitoring can be carried out over a network link or a serial port, and redundant links are supported. The nodes send messages to each other to report their current state; if no message from the peer is received within a specified time, the peer is considered to have failed, and the resource takeover module is started to take over the resources or services running on the peer host.
Fig. 1 is a schematic diagram of a flow of an automatic backup method based on cross-heartbeat monitoring according to an embodiment of the disclosure.
As shown in fig. 1, at step S110, the application service status of each scheduling server is monitored by a heartbeat monitoring function.
In the embodiment of the invention, there are two or more scheduling servers.
The method then proceeds to step S120.
At step S120, the application service status information is transmitted between the scheduling servers by messages at a preset frequency.
For example, the preset frequency may be once every 3 minutes, that is, all scheduling servers send application service status information to one another by message every 3 minutes.
The method then proceeds to step S130.
At step S130, when the message sent by the first scheduling server is not received within a preset time, it is determined whether the database state update and the registry service in the first scheduling server are abnormal.
More specifically, when no message from a certain scheduling server (i.e., the first scheduling server) has been received for more than 9 minutes, it is determined whether the database state update and the registry service of that server are abnormal.
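By way of non-limiting illustration, the timeout check of steps S120 and S130 can be sketched as follows. The 3-minute interval and 9-minute threshold are the example values given above; the class `HeartbeatTable`, its methods and the server names are hypothetical and are not taken from the disclosure.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=3)  # example sending interval from the text
HEARTBEAT_TIMEOUT = timedelta(minutes=9)   # example silence threshold from the text


class HeartbeatTable:
    """Hypothetical helper: remembers when each peer was last heard from."""

    def __init__(self):
        self._last_seen = {}

    def record(self, server_id, received_at):
        """Store the arrival time of a status message from server_id."""
        self._last_seen[server_id] = received_at

    def silent_servers(self, now):
        """Return the servers whose last message is older than the timeout."""
        return sorted(
            sid for sid, seen in self._last_seen.items()
            if now - seen > HEARTBEAT_TIMEOUT
        )


table = HeartbeatTable()
start = datetime(2023, 6, 1, 12, 0)
table.record("scheduler-1", start)
table.record("scheduler-2", start)
# scheduler-2 keeps reporting every 3 minutes; scheduler-1 goes quiet.
for k in range(1, 4):
    table.record("scheduler-2", start + k * HEARTBEAT_INTERVAL)

suspects = table.silent_servers(start + timedelta(minutes=10))
print(suspects)
```

A server returned by `silent_servers` is only a candidate for fail-over; under the method, the database state update and the registry service are examined before any fail-over is triggered.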
In the embodiment of the invention, judging whether the database state update and the registry service in the first scheduling server are abnormal comprises the following steps:
comparing the update time of the database in the first scheduling server with the current system time, judging that the database state update is abnormal when the update time lags the current system time by more than 6 minutes, and judging that the database state update is normal when the lag is less than or equal to 6 minutes; and determining whether the executor service in the application service registry is abnormal (i.e., whether the service engine can process tasks normally or is out of service).
In the embodiment of the invention, the threshold by which the database update time may lag the current system time can be customized to another value, but the preferred fail-over threshold is 6 minutes.
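As a minimal sketch of the comparison described above, the following assumes that the last update time has already been read from the database as a timestamp. The function name `db_update_is_abnormal` and the constant `STALE_AFTER` are illustrative assumptions; only the 6-minute value comes from the text.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=6)  # configurable; 6 minutes is the value preferred in the text


def db_update_is_abnormal(last_update, now, threshold=STALE_AFTER):
    """True when the stored update time lags the current time by more than threshold."""
    return now - last_update > threshold


now = datetime(2023, 6, 1, 12, 0)
# A lag of exactly 6 minutes still counts as normal ("less than or equal to").
fresh = db_update_is_abnormal(now - timedelta(minutes=6), now)
# One second more than 6 minutes counts as abnormal.
stale = db_update_is_abnormal(now - timedelta(minutes=6, seconds=1), now)
print(fresh, stale)
```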
More specifically, as shown in fig. 2, heartbeat monitoring mainly realizes three functions of scheduling server self-state update, application service inspection and triggering fail-over:
Self-state update: each executor service in the scheduling server updates its state in the database at a certain frequency; for example, with an initial frequency of once every 3 minutes, it updates its own state and modification time every 3 minutes.
Application service inspection: the inspection is divided into three steps:
First step: querying the update time of each service in the database, comparing it with the current system time, and judging whether the difference exceeds 6 minutes; when it does not exceed 6 minutes, the database state update is considered normal; when it exceeds 6 minutes, the database state update is considered abnormal;
Second step: querying whether the service of each executor in the registry is normal;
Third step: judging, according to the query results, whether to trigger the fail-over function. The processing rules fall into three types. First, when the database state update is abnormal and the registry service is normal, the current task is removed from the registry and the fail-over function is triggered. Second, when the registry service is abnormal and the database state update is normal, the current task is removed from the registry and no fail-over is performed. Third, when the registry service is abnormal and the database state update is also abnormal, the current task is removed from the registry and the fail-over is triggered.
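The three processing rules can be read as a small decision table over two inspection results. The sketch below is illustrative only: the names `Decision` and `decide` are hypothetical, and the fourth combination (both checks normal) is not addressed by the rules above, so treating it as "take no action" is an assumption.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    remove_task_from_registry: bool
    trigger_failover: bool


def decide(db_update_abnormal: bool, registry_abnormal: bool) -> Decision:
    """Map the two inspection results to the actions named in the rules."""
    if db_update_abnormal:
        # Rules one and three: an abnormal database state update triggers
        # fail-over whether or not the registry service is normal.
        return Decision(remove_task_from_registry=True, trigger_failover=True)
    if registry_abnormal:
        # Rule two: registry abnormal, database update normal.
        return Decision(remove_task_from_registry=True, trigger_failover=False)
    # Assumption: both checks normal is not covered by the text; do nothing.
    return Decision(remove_task_from_registry=False, trigger_failover=False)


for db_bad in (True, False):
    for reg_bad in (True, False):
        print(db_bad, reg_bad, decide(db_bad, reg_bad))
```

Written this way, the rules show that the database state update alone determines whether fail-over is triggered, while an abnormality in either check leads to removal of the current task from the registry.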
The method then proceeds to step S140.
At step S140, when the database state update is abnormal and the service of each executor of the registry is normal, the current task is removed from the registry and a fail-over is triggered.
The method then proceeds to step S150.
At step S150, when the service of each executor of the registry is abnormal and the database state update is normal, the current task is removed from the registry without fail-over.
The method then proceeds to step S160.
At step S160, when the service of each executor of the registry is abnormal and the database state update is also abnormal, the current task is removed from the registry, and a fail-over is triggered.
The method then proceeds to step S170.
At step S170, when a fail-over is made, the first scheduling server is put into an unavailable state, and an application service running on the first scheduling server is taken over by a second scheduling server.
In an embodiment of the present invention, the taking over, by a second scheduling server, an application service running on the first scheduling server includes: setting a task to be executed in the application service of the first scheduling server to be in a state to be recovered; the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
More specifically, in the fail-over, the tasks running on the failed service are set to a to-be-recovered state; after the engine obtains the to-be-recovered tasks through polling calculation, the tasks are sent to other scheduling servers to run, and the other scheduling servers recover the tasks by calling a recovery executor.
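The two phases described above (marking tasks as to be recovered, then having the engine poll for them and hand them to another scheduling server) can be sketched as follows. All names here (`Task`, the state constants, `mark_for_recovery`, `poll_and_recover`) are hypothetical, the task store is a plain in-memory list rather than a database, and the recovery executor is modelled as an ordinary callable.

```python
from dataclasses import dataclass

RUNNING, TO_RECOVER, RECOVERED = "RUNNING", "TO_RECOVER", "RECOVERED"


@dataclass
class Task:
    task_id: str
    server: str
    state: str = RUNNING


def mark_for_recovery(tasks, failed_server):
    """Phase 1: flag every running task on the failed server."""
    for task in tasks:
        if task.server == failed_server and task.state == RUNNING:
            task.state = TO_RECOVER


def poll_and_recover(tasks, target_server, recovery_executor):
    """Phase 2: one polling pass that moves flagged tasks and recovers them."""
    recovered = []
    for task in tasks:
        if task.state == TO_RECOVER:
            task.server = target_server   # task is sent to the takeover server
            recovery_executor(task)       # stand-in for the recovery executor call
            task.state = RECOVERED
            recovered.append(task.task_id)
    return recovered


tasks = [Task("t1", "scheduler-1"), Task("t2", "scheduler-1"), Task("t3", "scheduler-2")]
mark_for_recovery(tasks, "scheduler-1")
resumed = []
done = poll_and_recover(tasks, "scheduler-2", lambda t: resumed.append(t.task_id))
print(done)
```

Separating the state change from the polling pass means the server that detects the failure only has to modify task state; whichever engine polls next picks up the work.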
In an embodiment of the present invention, the method further includes: when the first scheduling server is abnormal, the other scheduling servers compete, in a preemption mode, to fail over the tasks on the first scheduling server. The preemption is implemented by a distributed lock mechanism of Redis, and a scheduling server that finds the lock already occupied abandons the fail-over (as shown in fig. 3).
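The preemption rule (the first server to obtain the lock performs the fail-over, and any server that finds the lock occupied gives up) can be illustrated with an in-memory stand-in for the lock store. This sketch does not talk to Redis: `InMemoryLockStore`, `set_if_absent`, `try_failover` and the key format are hypothetical. With a real Redis server the acquisition step is commonly done with the `SET` command and its `NX` option, usually together with an expiry; expiry and lock release are not modelled here.

```python
import threading


class InMemoryLockStore:
    """Stand-in for a Redis-like store; only models 'set if absent'."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def set_if_absent(self, key, owner):
        """Atomically claim key for owner; return False if it is already held."""
        with self._guard:
            if key in self._owners:
                return False
            self._owners[key] = owner
            return True


def try_failover(store, failed_server, me, do_failover):
    """Run do_failover only if this server wins the lock; otherwise abandon."""
    if not store.set_if_absent(f"failover:{failed_server}", me):
        return False  # lock already occupied: abandon the fail-over
    do_failover(failed_server)
    return True


store = InMemoryLockStore()
handled_by = []
results = [
    try_failover(store, "scheduler-1", candidate,
                 lambda failed, c=candidate: handled_by.append(c))
    for candidate in ("scheduler-2", "scheduler-3")
]
print(results, handled_by)
```

Keying the lock on the failed server means that each failure is handled by exactly one surviving scheduling server, while different failures can still be handled by different servers.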
In an embodiment of the present invention, as shown in fig. 6-8, the method is applied to a single-center high-availability deployment architecture, a two-place double-center deployment architecture or a two-place three-center deployment architecture.
The fail-over function is used for ensuring high availability of the platform and continuity of service, triggering the fail-over through heartbeat monitoring, and completing recovery of the task by modifying the state of the task and matching with engine polling and engine driving during the fail-over.
FIG. 4 is a schematic diagram of a fail-over logic architecture according to an embodiment of the disclosure; the system consists of an application change server, a data storage, a client application, an Agent remote Agent, an integrated function and the like, and is a typical four-layer architecture.
The user layer provides 3 important tools, built on the scheduling functions provided by the system such as flow design, version management, change management and authority control, which allow developers and administrators to carry out change flow design, change scheduling and change process monitoring.
The system core layer is a scheduling engine, and the scheduling engine is provided with a scheduling cluster, heartbeat detection, load balancing and fail-over high availability mechanism for ensuring high availability.
FIG. 5 is a schematic diagram of a high performance cluster composition provided by an embodiment of the disclosure;
The data center and the development center are each provided with an application change automation system, and the two platforms are independent of each other. Application change versions are managed uniformly through version management. The platform is composed of a scheduling server, a database, a client management program, a remote agent and a third-party interface template.
FIG. 6 is a schematic diagram of a single-center high-availability deployment architecture provided by an embodiment of the present disclosure; the application service adopts a cluster mode, and service safety is ensured through a series of strategies such as fail-over and disaster recovery.
FIG. 7 is a schematic diagram of a two-place dual-center deployment architecture provided by an embodiment of the present disclosure; the working mechanism of the two-place double-center remote disaster recovery mode is as follows:
(1) The Entegor server of the production center adopts a cluster mode, and the operation and maintenance safety of the production center is ensured through mechanisms such as fail-over, disaster recovery and the like.
(2) The production center and the disaster recovery center realize synchronous sharing of the execution information of the automatic operation and maintenance platform through a database backup technology. When the disaster recovery is switched, the disaster recovery center application server completely recovers all tasks according to the database information, and continuously executes the tasks of the application system in the disaster recovery center, so as to maintain the system integration function and realize seamless switching.
(3) The task programs of the business system strengthen the robustness of their transaction management according to the specification, so that when a disaster recovery switch occurs a task can be submitted again and the transaction handling is determined automatically.
(4) A management mode is provided for the IP list of the business system servers; Entegor is driven by the main IP by default and provides a change-over switch.
FIG. 8 is a schematic diagram of a two-place three-center deployment architecture provided by an embodiment of the present disclosure;
The disaster recovery center in the same city can be built in the manner described in the two-place double-center scheme. A suitably equipped disaster recovery center not only provides disaster recovery service when an accident occurs, but can also supplement the production center by providing service to the outside in normal times.
The automatic backup method based on cross heartbeat monitoring triggers fail-over through heartbeat monitoring; during fail-over, task recovery is completed by modifying the task state in coordination with engine polling and engine driving. Its advantages are:
high availability: when one node in the cluster fails, the tasks of that node can be transferred to other nodes, which effectively prevents a single point of failure;
high performance: load-balanced clusters allow the system to serve more users simultaneously;
high cost-effectiveness: high-performance systems can be built from inexpensive hardware that complies with industry standards.
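The heartbeat-triggered detection described above can be illustrated with a minimal Python sketch. It only tracks when each peer scheduling server was last heard from and reports the peers that have been silent for longer than a preset time. The class name `HeartbeatMonitor`, the `timeout_s` parameter, and the server identifiers are hypothetical names chosen for illustration; the disclosure does not specify the message format or the timeout value.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat message received from each peer scheduling server."""

    timeout_s: float  # assumed stand-in for the "preset time" without a message
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, server_id: str, now: Optional[float] = None) -> None:
        """Store the arrival time of a heartbeat message from server_id."""
        self.last_seen[server_id] = time.monotonic() if now is None else now

    def silent_peers(self, now: Optional[float] = None) -> List[str]:
        """Return the peers whose last message is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            server_id
            for server_id, seen in self.last_seen.items()
            if current - seen > self.timeout_s
        )
```

A silent peer is only a candidate for fail-over; whether fail-over is actually triggered depends on the database-state and registry checks made by the judging step.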
FIG. 9 shows an automatic backup system 900 based on cross-heartbeat monitoring, which includes a heartbeat monitoring module 910, a judging module 920, and a fail-over module 930.
The heartbeat monitoring module 910 is configured to monitor an application service state of each scheduling server through a heartbeat monitoring function;
the judging module 920 is configured to transmit the application service state information between the scheduling servers through messages at a preset frequency; when no message from a first scheduling server is received within a preset time, to judge whether the database state update and the registry service of the first scheduling server are abnormal; when the database state update is abnormal and the service of each executor in the registry is normal, to remove the current task from the registry and trigger fail-over; when the service of each executor in the registry is abnormal and the database state update is normal, to remove the current task from the registry without performing fail-over; and when the service of each executor in the registry is abnormal and the database state update is also abnormal, to remove the current task from the registry and trigger fail-over;
the fail-over module 930 is configured to put the first scheduling server into an unavailable state when performing fail-over, and take over, by the second scheduling server, an application service running on the first scheduling server.
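The three cases handled by the judging module can be written as a small decision function. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the 6-minute threshold is taken from claim 2, the names `Decision`, `db_update_abnormal`, and `decide` are hypothetical, and the behaviour when both checks are normal is assumed to be "take no action" because the text does not cover that case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold stated in claim 2; everything else in this sketch is illustrative.
STALE_AFTER = timedelta(minutes=6)


@dataclass(frozen=True)
class Decision:
    remove_task_from_registry: bool
    trigger_failover: bool


def db_update_abnormal(last_update: datetime, now: datetime) -> bool:
    """The database state update is abnormal if it lags the system time by more than 6 minutes."""
    return now - last_update > STALE_AFTER


def decide(db_abnormal: bool, executors_abnormal: bool) -> Decision:
    """Map the two health checks to the actions listed for the judging module."""
    if not db_abnormal and not executors_abnormal:
        # Assumption: this case is not described in the text, so nothing is done.
        return Decision(remove_task_from_registry=False, trigger_failover=False)
    # In all three described cases the current task is removed from the registry;
    # fail-over is triggered exactly when the database state update is abnormal.
    return Decision(remove_task_from_registry=True, trigger_failover=db_abnormal)
```

Written this way, the rule reduces to one observation: an abnormal executor service alone removes the task from the registry, while an abnormal database state update is what triggers fail-over.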
In the embodiment of the invention, the fail-over module sets a task to be executed in the application service of the first scheduling server to be in a state to be recovered; the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
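The take-over flow above, combined with the lock-based preemption of claims 4 and 5, might look roughly like the following Python sketch. It is a simplification under stated assumptions: `InMemoryLock` is a hypothetical in-process stand-in for the Redis distributed lock, the task states `"pending"`, `"to_recover"`, and `"recovered"` are invented labels, and the marking step and the polling step are run inside one function, whereas in the described system they are performed by the fail-over module and by the engine polling of the second scheduling server respectively.

```python
from dataclasses import dataclass
from typing import Dict, List


class InMemoryLock:
    """Stand-in for a distributed lock: only the first caller for a key succeeds."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def acquire(self, key: str, owner: str) -> bool:
        if key in self._owners:
            return False  # lock already held by another scheduling server
        self._owners[key] = owner
        return True


@dataclass
class Task:
    task_id: str
    server: str
    state: str  # illustrative labels: "pending", "to_recover", "recovered"


def fail_over(failed: str, taker: str, tasks: List[Task], lock: InMemoryLock) -> List[str]:
    """Return the ids of the tasks that `taker` recovered from `failed`."""
    if not lock.acquire(f"failover:{failed}", taker):
        return []  # the lock is occupied, so this server abandons fail-over
    # Step 1: set the failed server's pending tasks to the to-be-recovered state.
    for task in tasks:
        if task.server == failed and task.state == "pending":
            task.state = "to_recover"
    # Step 2: a polling pass picks up to-be-recovered tasks and runs them on the taker.
    recovered: List[str] = []
    for task in tasks:
        if task.state == "to_recover":
            task.server = taker
            task.state = "recovered"
            recovered.append(task.task_id)
    return recovered
```

Because the lock is acquired before any task state is changed, at most one surviving scheduling server recovers the tasks of a given failed server; the others see the occupied lock and do nothing.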
Referring to fig. 10, the embodiment of the present disclosure further provides an electronic device 100, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the automatic backup method based on cross-heartbeat monitoring of the foregoing method embodiments.
The disclosed embodiments also provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the cross-heartbeat monitoring-based automatic backup method of the foregoing method embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the cross-heartbeat monitoring based automatic backup method in the foregoing method embodiments.
Referring now to fig. 10, a schematic diagram of an electronic device 100 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 10 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 100 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 100 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
In general, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, and the like; an output device 1007 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, etc.; storage 1008 including, for example, magnetic tape, hard disk, etc.; and communication means 1009. The communication means 1009 may allow the electronic device 100 to communicate with other devices wirelessly or by wire to exchange data. While an electronic device 100 having various means is shown in the figures, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1008, or installed from the ROM 1002. The above-described functions defined in the method of the embodiment of the present disclosure are performed when the computer program is executed by the processing device 1001.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects an internet protocol address from the at least two internet protocol addresses and returns the internet protocol address; receiving an Internet protocol address returned by the node evaluation equipment; wherein the acquired internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the disclosure are intended to be covered by the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. An automatic backup method based on cross-heartbeat monitoring, the method comprising the steps of:
monitoring the application service state of each scheduling server through a heartbeat monitoring function;
transmitting the application service state information between the scheduling servers through messages at a preset frequency; wherein,
when a message sent by a first scheduling server is not received within a preset time, judging whether the database state update and the registry service in the first scheduling server are abnormal; wherein,
when the database state update is abnormal and the service of each executor of the registry is normal, removing the current task from the registry and triggering the fail-over;
when the service of each executor of the registry is abnormal and the database state update is normal, removing the current task from the registry and not performing the fail-over; and
when the service of each executor of the registry is abnormal and the database state update is abnormal, removing the current task from the registry and triggering the fail-over;
and when the fail-over is performed, placing the first scheduling server in an unavailable state, and taking over, by a second scheduling server, the application service running on the first scheduling server.
2. The automatic backup method based on cross-heartbeat monitoring of claim 1, wherein judging whether the database state update and the registry service in the first scheduling server are abnormal comprises:
comparing the update time of the database in the first scheduling server with the current system time, judging that the database state update is abnormal when the difference between the current system time and the update time of the database is greater than 6 minutes, and judging that the database state update is normal when the difference is less than or equal to 6 minutes; and judging whether the executor service in the application service registration center is abnormal.
3. The cross-heartbeat monitoring based automatic backup method of claim 1 wherein the taking over by a second scheduling server of an application service running on the first scheduling server includes:
setting a task to be executed in the application service of the first scheduling server to be in a state to be recovered;
the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
4. The automatic backup method based on cross-heartbeat monitoring of claim 3 further comprising:
when first scheduling servers are abnormal, the scheduling servers perform fail-over of the tasks on the first scheduling servers in a preemption mode.
5. The automatic backup method based on cross-heartbeat monitoring of claim 4, wherein the preemption is achieved by a distributed lock mechanism of Redis, and the fail-over is abandoned when the lock is found to be already occupied.
6. The automatic backup method based on cross-heartbeat monitoring of any one of claims 1-5, wherein the method is applied in a single-center high-availability deployment architecture, a two-place dual-center deployment architecture, or a two-place three-center deployment architecture.
7. An automatic backup system based on cross-heartbeat monitoring, the system comprising:
the heartbeat monitoring module is configured to monitor the application service state of each scheduling server through a heartbeat monitoring function;
the judging module is configured to transmit the application service state information between the scheduling servers through messages at a preset frequency; when no message from a first scheduling server is received within a preset time, to judge whether the database state update and the registry service of the first scheduling server are abnormal; when the database state update is abnormal and the service of each executor in the registry is normal, to remove the current task from the registry and trigger fail-over; when the service of each executor in the registry is abnormal and the database state update is normal, to remove the current task from the registry without performing fail-over; and when the service of each executor in the registry is abnormal and the database state update is also abnormal, to remove the current task from the registry and trigger fail-over;
and the fail-over module is configured to put the first scheduling server into an unavailable state when the fail-over is carried out, and take over the application service running on the first scheduling server by the second scheduling server.
8. The system of claim 7, wherein the fail-over module is configured to place a task to be executed in an application service of the first scheduling server in a state to be restored; the second scheduling server obtains the task to be executed in the state to be recovered through engine polling calculation, the task to be executed is sent to the application service in the second scheduling server for operation, and the application service in the second scheduling server carries out recovery operation on the task to be executed in the state to be recovered through calling a recovery executor.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to perform the automatic backup method based on cross-heartbeat monitoring of any one of claims 1 to 6.
10. A non-transitory computer-readable storage medium storing computer instructions that, when executed by at least one processor, cause the at least one processor to perform the automatic backup method based on cross-heartbeat monitoring of any one of claims 1 to 6.
CN202310699417.0A 2023-06-14 2023-06-14 Automatic backup method, system, equipment and medium based on cross heartbeat monitoring Active CN116436768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310699417.0A CN116436768B (en) 2023-06-14 2023-06-14 Automatic backup method, system, equipment and medium based on cross heartbeat monitoring

Publications (2)

Publication Number Publication Date
CN116436768A true CN116436768A (en) 2023-07-14
CN116436768B CN116436768B (en) 2023-08-15

Family

ID=87091152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310699417.0A Active CN116436768B (en) 2023-06-14 2023-06-14 Automatic backup method, system, equipment and medium based on cross heartbeat monitoring

Country Status (1)

Country Link
CN (1) CN116436768B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309167A (en) * 2008-06-27 2008-11-19 华中科技大学 Disaster allowable system and method based on cluster backup
US20200057686A1 (en) * 2018-08-14 2020-02-20 Industrial Technology Research Institute Compute node, failure detection method thereof and cloud data processing system
CN110825544A (en) * 2018-08-14 2020-02-21 财团法人工业技术研究院 Computing node, failure detection method thereof and cloud data processing system
CN115963897A (en) * 2021-10-12 2023-04-14 昆达电脑科技(昆山)有限公司 Server data backup control method
CN116055499A (en) * 2023-04-03 2023-05-02 成都四方伟业软件股份有限公司 Method, equipment and medium for intelligently scheduling cluster tasks based on redis

Also Published As

Publication number Publication date
CN116436768B (en) 2023-08-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant