CN116346582A - Method, device, equipment and storage medium for realizing redundancy of main network and standby network - Google Patents

Method, device, equipment and storage medium for realizing redundancy of main network and standby network

Info

Publication number
CN116346582A
CN116346582A (application CN202211709217.0A)
Authority
CN
China
Prior art keywords
node
sentinel
fault
master
response
Prior art date
Legal status
Pending
Application number
CN202211709217.0A
Other languages
Chinese (zh)
Inventor
赵可
宋智翔
李晓刚
刘昊
吕文龙
路远
Current Assignee
Beijing Infrastructure Investment Co ltd
Traffic Control Technology TCT Co Ltd
Beijing MTR Construction Administration Corp
Original Assignee
Beijing Infrastructure Investment Co ltd
Traffic Control Technology TCT Co Ltd
Beijing MTR Construction Administration Corp
Priority date
Filing date
Publication date
Application filed by Beijing Infrastructure Investment Co ltd, Traffic Control Technology TCT Co Ltd, Beijing MTR Construction Administration Corp
Priority to CN202211709217.0A
Publication of CN116346582A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Abstract

The embodiments of the present application provide a method, device, apparatus and storage medium for implementing active/standby dual-network redundancy. The method includes: when a first sentinel node in a sentinel cluster detects a suspected failure of the master node, sending a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receiving the master-node failure voting responses returned by those sentinel nodes; the first sentinel node determining, from the voting responses returned by the other sentinel nodes, that the master node has failed, and at the same time starting the automatic failover function; and, after starting the automatic failover function, the first sentinel node treating the master node as a failed node, selecting one slave node as the new master node, and switching the client from the failed node to the new master node, thereby ensuring that the client continues to operate normally.

Description

Method, device, equipment and storage medium for realizing redundancy of main network and standby network
Technical Field
The application relates to the technical field of urban rail transit operation management, in particular to a method, a device, equipment and a storage medium for realizing redundancy of a main network and a standby network.
Background
In a signal system, to ensure system safety, services are usually deployed as active/standby dual-system services: when the active service's network fails or the active service goes down, the standby service is automatically promoted to replace it; conversely, when the standby service's network fails or the standby service goes down, the active service does not even perceive the failure, so in neither case is normal system operation affected. Because dual-system services are involved, problems of data consistency, timeliness and integrity are inevitable, and middleware is required to solve them; this middleware must fit the architecture of the signal system while also guaranteeing the high availability of its own service.
Disclosure of Invention
The embodiments of the present application provide a method, device, apparatus and storage medium for implementing active/standby dual-network redundancy, so as to solve the problem of how to guarantee high availability of the service itself under an active/standby dual-network architecture.
According to a first aspect of the embodiments of the present application, there is provided a method for implementing active/standby dual-network redundancy, including:
when a first sentinel node in a sentinel cluster detects a suspected failure of the master node, sending a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receiving the master-node failure voting responses returned by the other sentinel nodes;
the first sentinel node determining, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and at the same time starting the automatic failover function;
after starting the automatic failover function, the first sentinel node treating the master node as a failed node, selecting one slave node as the new master node, and switching the client from the failed node to the new master node, thereby ensuring normal operation of the client.
Preferably, the detecting, by the first sentinel node in the sentinel cluster, of a suspected failure of the master node includes:
each sentinel node in the sentinel cluster periodically sending a ping command to the master node and judging whether a response to the ping command returned by the master node is received within a preset time period;
when the first sentinel node in the sentinel cluster does not receive the response returned by the master node within the preset time period, judging that the master node is suspected to have failed.
Preferably, the master-node failure voting response includes a master-node failure voting approval response and a master-node failure voting objection response, and the determining, by the first sentinel node according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed includes:
the first sentinel node counting the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes;
when the number of approval responses is larger than the number of objection responses, the first sentinel node determining that the master node has failed.
Preferably, after the first sentinel node starts the automatic failover function, the method further includes:
the first sentinel node periodically sending a ping command to the failed node and judging whether a response to the ping command returned by the failed node is received within a preset time period;
when the first sentinel node receives the response returned by the failed node within the preset time period, judging that the failed node has recovered, and using the failed node as a slave node.
Preferably, before the first sentinel node in the sentinel cluster detects the suspected failure of the master node, the method further includes:
the master node synchronizing the data it exchanges with the client to each slave node.
According to a second aspect of the embodiments of the present application, there is provided a device for implementing active/standby dual-network redundancy, applied to a first sentinel node in a sentinel cluster, including:
a sending and receiving module, configured to, when a suspected failure of the master node is detected, send a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receive the master-node failure voting responses returned by the other sentinel nodes;
a determining module, configured to determine, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and to start the automatic failover function;
a processing module, configured to, after the automatic failover function is started, treat the master node as a failed node, select one slave node as the new master node, and switch the client from the failed node to the new master node, thereby ensuring normal operation of the client.
Preferably, the device further comprises a detection module, configured to periodically send a ping command to the master node and judge whether a response to the ping command returned by the master node is received within a preset time period; and, when no response from the master node is received within the preset time period, to judge that the master node is suspected to have failed.
Preferably, the master-node failure voting response includes a master-node failure voting approval response and a master-node failure voting objection response, and the determining module is specifically configured to count the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes, and to determine that the master node has failed when the number of approval responses is larger than the number of objection responses.
By adopting the above technical solution, the embodiments of the present application have the following technical effect: when a client tries to connect to the failed master node, the cluster returns the address of the new master node to the client, so that the cluster replaces the failed master node with the current master node, thereby ensuring that services can still be provided externally as normal.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is a flowchart of a method for implementing active/standby dual-network redundancy according to an embodiment of the present application;
fig. 2 is a schematic diagram of connection establishment between a sentinel and a slave according to an embodiment of the present application;
FIG. 3 is a flowchart of the handling of a master-node (A) server failure according to an embodiment of the present application;
FIG. 4 is a flowchart of the handling of a slave-node (B) server failure according to an embodiment of the present application;
fig. 5 is a flowchart of a process for a failure of an entire first network segment according to an embodiment of the present application;
fig. 6 is a flowchart of a process for a failure of an entire second network segment according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is given with reference to the accompanying drawings, and it is apparent that the described embodiments are only some of the embodiments of the present application and not exhaustive of all the embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
Example I
As shown in fig. 1, a method for implementing active/standby dual-network redundancy according to an embodiment of the present application includes: step S101: when a first sentinel node in a sentinel cluster detects a suspected failure of the master node, sending a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receiving the master-node failure voting responses returned by the other sentinel nodes; step S102: the first sentinel node determining, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and at the same time starting the automatic failover function; step S103: after starting the automatic failover function, the first sentinel node treating the master node as a failed node, selecting one slave node as the new master node, and switching the client from the failed node to the new master node, thereby ensuring normal operation of the client. In this application, once most of the sentinels in the sentinel cluster have exchanged their judgments and agreed that the master node is down, the sentinel process automatically performs failover: one of the slave nodes is promoted to be the new master node, the other slave nodes of the failed master node are switched to replicate the new master node, and when a client tries to connect to the failed master node, the cluster returns the address of the new master node to the client, so that the cluster replaces the failed master node with the current master node and can continue to provide services externally as normal.
Further, the detecting, by the first sentinel node in the sentinel cluster, of a suspected failure of the master node includes: each sentinel node in the sentinel cluster periodically sends a ping command to the master node and judges whether a response to the ping command returned by the master node is received within a preset time period; when the first sentinel node in the sentinel cluster does not receive the response returned by the master node within the preset time period, it judges that the master node is suspected to have failed. In this application, whether the master node of the dual system has a suspected failure is judged by periodically sending a ping command to it, and whether the master node has actually failed is then judged collectively by the other sentinel nodes in the cluster, thereby ensuring data consistency and high availability of the application.
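For illustration only, the periodic ping check described above can be sketched with the redis-py client; the master address, the preset time period and the one-second check interval below are assumed values rather than part of the embodiment:

```python
# Minimal sketch of the subjective-failure check: ping the master periodically and
# treat a missing reply within the preset time period as a suspected failure.
# The address, timeout and interval are assumed values for illustration.
import time
import redis

MASTER_HOST, MASTER_PORT = "192.168.1.10", 6379   # assumed master address
DOWN_AFTER_SECONDS = 5                            # assumed preset time period

def master_suspected_down() -> bool:
    client = redis.Redis(host=MASTER_HOST, port=MASTER_PORT,
                         socket_timeout=DOWN_AFTER_SECONDS)
    try:
        return not client.ping()      # a PING reply within the period means no suspected fault
    except redis.RedisError:
        return True                   # no reply within the preset period: suspected fault

while True:
    if master_suspected_down():
        print("master suspected down; request a failure vote from the other sentinels")
        break
    time.sleep(1)                     # sentinels ping roughly once per second by default
```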
Further, the master-node failure voting response includes a master-node failure voting approval response and a master-node failure voting objection response, and the determining, by the first sentinel node according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed includes: the first sentinel node counting the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes; when the number of approval responses is larger than the number of objection responses, the first sentinel node determines that the master node has failed. In this application, once most of the sentinels in the sentinel cluster have exchanged their judgments and agreed that the master node is down, the sentinel process automatically performs failover and promotes one of the slave nodes to be the new master node, thereby ensuring data consistency and high availability of the application.
Specifically, after the first sentinel node starts the automatic failover function, the method further includes: the first sentinel node periodically sends a ping command to the failed node and judges whether a response to the ping command returned by the failed node is received within a preset time period; when the first sentinel node receives the response returned by the failed node within the preset time period, it judges that the failed node has recovered and uses the failed node as a slave node. In this method, after the node fails, whether it has recovered is continuously monitored; once recovery is detected, the node resumes normal work as a slave node and completes data synchronization with the new master node.
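In a real deployment the sentinels perform this recovery handling automatically; purely as a hedged sketch of the logic, with assumed addresses, the failed node can be re-attached to the new master with the standard SLAVEOF command once it answers ping again:

```python
# Sketch: keep pinging the failed node; once it replies within the preset period,
# treat it as recovered and re-attach it to the new master as a slave node.
# All addresses are assumed values for illustration.
import time
import redis

FAILED_NODE = ("192.168.1.10", 6379)   # the old master that was marked as failed
NEW_MASTER = ("192.168.1.11", 6379)    # the slave that was promoted

failed = redis.Redis(host=FAILED_NODE[0], port=FAILED_NODE[1], socket_timeout=5)

while True:
    try:
        if failed.ping():                                  # the node answers again
            failed.slaveof(NEW_MASTER[0], NEW_MASTER[1])   # demote it to a slave of the new master
            print("failed node recovered and rejoined as a slave node")
            break
    except redis.RedisError:
        pass                                               # still unreachable, keep monitoring
    time.sleep(1)
```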
Before the first sentinel node in the sentinel cluster detects the suspected failure of the master node, the embodiment further includes the following step: the master node synchronizes the data it exchanges with the client to each slave node. Because the data on the master node and the slave nodes is kept fully synchronized, a slave node can continue to work normally with the client when the master node fails.
Example II
The embodiment of the present application provides a device for implementing active/standby dual-network redundancy, applied to a first sentinel node in a sentinel cluster, including: a sending and receiving module, configured to, when a suspected failure of the master node is detected, send a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receive the master-node failure voting responses returned by the other sentinel nodes; a determining module, configured to determine, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and to start the automatic failover function; and a processing module, configured to, after the automatic failover function is started, treat the master node as a failed node, select one slave node as the new master node, and switch the client from the failed node to the new master node, thereby ensuring normal operation of the client.
The embodiment further includes a detection module, configured to periodically send a ping command to the master node and judge whether a response to the ping command returned by the master node is received within a preset time period, and, when no response from the master node is received within the preset time period, to judge that the master node is suspected to have failed.
Further, the master-node failure voting response includes a master-node failure voting approval response and a master-node failure voting objection response, and the determining module is specifically configured to count the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes, and to determine that the master node has failed when the number of approval responses is larger than the number of objection responses.
An electronic device provided in an embodiment of the present application includes: a memory; a processor; and a computer program, wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method for implementing active/standby dual-network redundancy.
A computer-readable storage medium provided in an embodiment of the present application has a computer program stored thereon; the computer program is executed by a processor to implement the method for implementing active/standby dual-network redundancy.
Example III
A Redis service contains a master node (Master), at least one slave node (Slave) and a plurality of sentinel processes (Sentinel). Each sentinel periodically checks the other nodes; if a node does not respond within the configured time, that sentinel temporarily regards it as offline, which is called "subjectively down". When most of the sentinel processes in the sentinel group have made the "subjectively down" judgment about the master and have communicated with each other, the resulting judgment that the master node is offline is called "objectively down". At this point the sentinel processes can automatically perform failover: first the sentinel group elects a leader through an election algorithm, then one slave node of the failed master is promoted to be the new master node and the other slave nodes of the failed master are switched to replicate the new master; when a client tries to connect to the failed master node, the cluster returns the address of the new master node to the client, so that the cluster replaces the failed master node with the current master node and normal external service is ensured.
The sentinel is an operating mode of Redis that focuses on monitoring the running state of the Redis instances (master node and slave nodes); when the master node fails, it can select a new master and perform the master-slave switch through a series of mechanisms, thus achieving failover and ensuring the availability of the whole system.
The technical solution of the invention is described in detail below with reference to the accompanying drawings.
The sentinel is an operating mode focused on monitoring the running state of the instances (master node and slave nodes); when the master node fails, it can select a new master and perform the master-slave switch through a series of mechanisms, achieving failover and ensuring the availability of the whole system. Its principle is as follows:
and (one) monitoring: whether the master and slave are in the expected working state is continuously monitored.
By default the sentinel sends a ping command to all nodes (including the master, the slaves and the other sentinels) once per second. If a slave does not respond to the sentinel's ping command within the specified time, the sentinel marks it as offline; if the master does not respond to the sentinel's ping command within the specified time, the sentinel judges that the master is down and starts to execute the "automatic master switching" flow.
A single observer is prone to misjudgment, so several observers vote together. The sentinel mechanism is similar: it is deployed as a cluster formed by multiple sentinel instances, i.e. the sentinel cluster. Having multiple sentinel instances judge together avoids the situation where a single sentinel, because of its own problems, wrongly concludes that the master is offline. Whether the master is offline cannot be decided by one sentinel alone; only when a majority (quorum) of the sentinels judge the master as "subjectively down" can the master be marked as "objectively down". Only when the master is judged "objectively down" is the sentinel further triggered to start the master-slave switching flow.
Communication within the sentinel cluster is based on the pub/sub mechanism: the sentinels can communicate with each other mainly thanks to the publish/subscribe mechanism. The master provides a dedicated channel, __sentinel__:hello, for publishing and subscribing to messages between the sentinels. It works like a group chat named __sentinel__:hello: each sentinel uses the channel to publish its own messages while also following the messages published by the other sentinels.
When multiple sentinel instances all publish and subscribe on the master's channel, they learn each other's IP addresses and ports and can therefore discover one another and establish connections. Connections between the sentinels alone are not enough to form the cluster; connections with the slaves are also needed, otherwise the sentinels cannot monitor them or perform heartbeat checks on the master and slave libraries.
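As a minimal sketch of this discovery channel (the master address is assumed, and in practice the sentinels themselves publish and parse these messages), the hello traffic can be observed with redis-py:

```python
# Sketch: sentinels announce themselves by publishing on the master's __sentinel__:hello
# channel; subscribing to that channel shows the ip, port and run id they advertise.
# The master address is an assumed value.
import redis

master = redis.Redis(host="192.168.1.10", port=6379)

pubsub = master.pubsub()
pubsub.subscribe("__sentinel__:hello")

for message in pubsub.listen():
    if message["type"] == "message":
        # each hello message carries the announcing sentinel's address and its view of the master
        print(message["data"])
```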
Acquiring the slave list based on the INFO command helps the sentinel establish connections with the slaves.
The key is to use the master: the sentinel sends an INFO command to the master. Because every slave registers its own IP and port with the master when it initializes, the master naturally knows the information of all slaves. So after receiving the command, the master tells the sentinel the slave list. The sentinel then establishes a connection with each slave according to the slave-list information returned by the master and continuously monitors the slaves over these connections.
As shown in fig. 2, sentinel 2 sends an INFO command to the master, the master returns the slave list to sentinel 2, and sentinel 2 establishes a connection with each slave according to the connection information in that list; continuous monitoring is then performed over these connections.
The remaining sentinels establish their monitoring in the same way.
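A hedged sketch of this step with redis-py (the master address is assumed; a real sentinel issues the raw INFO command itself and parses the reply):

```python
# Sketch: the slave list is learned from the master's INFO replication section,
# which is the same information a sentinel obtains with the INFO command.
# The master address is an assumed value.
import redis

master = redis.Redis(host="192.168.1.10", port=6379)

info = master.info("replication")
print(info["role"], "connected_slaves:", info["connected_slaves"])

# redis-py exposes each slave entry (slave0, slave1, ...) as a dict with ip/port/state
slaves = [v for k, v in info.items() if k.startswith("slave") and isinstance(v, dict)]
for s in slaves:
    print("establish monitoring connection to", s["ip"], s["port"], "state:", s["state"])
```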
(II) Automatic switching of the master library: when the master fails, the sentinels start the automatic failure-recovery flow: one slave is selected as the new master.
After any sentinel judges the master as "subjectively down", it sends an is-master-down-by-addr command to the other sentinels, and the other sentinels respond with Y or N according to their own connection state with the master, Y indicating an approval vote and N indicating disapproval. For example, if a total of 3 sentinels form the cluster, the quorum can be configured as 2; when one sentinel obtains 2 approval votes, it can mark the master as "objectively down", and this count includes its own vote.
The sentinel that obtains the most approval votes sends a command to the other sentinels declaring that it wants to perform the master-slave switch. The other sentinels then vote; this voting process is called leader election, and finally the elected leader performs the master-slave switch.
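The objective-down vote described above can be sketched as follows; the peer sentinel addresses, the master address and the quorum value of 2 (for the 3-sentinel example) are assumed, and the is-master-down-by-addr exchange is shown only conceptually through execute_command:

```python
# Conceptual sketch of the objective-down decision: after its own subjective-down judgment,
# a sentinel asks the other sentinels whether they also see the master as down, counts the
# approval answers (including its own) and compares the total with the configured quorum.
# Addresses and quorum are assumed values for illustration.
import redis

MASTER_IP, MASTER_PORT = "192.168.1.10", 6379
OTHER_SENTINELS = [("192.168.1.11", 26379), ("192.168.1.11", 26380)]  # the other two sentinels
QUORUM = 2                                                            # quorum for a 3-sentinel cluster

def count_down_votes() -> int:
    votes = 1                                    # this sentinel's own subjective-down vote
    for host, port in OTHER_SENTINELS:
        peer = redis.Redis(host=host, port=port, socket_timeout=1)
        try:
            # SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>;
            # with runid "*" the peer only reports its own down-state (1 = agrees the master is down)
            reply = peer.execute_command("SENTINEL", "IS-MASTER-DOWN-BY-ADDR",
                                         MASTER_IP, MASTER_PORT, 0, "*")
            votes += int(reply[0])
        except redis.RedisError:
            pass                                 # an unreachable peer contributes no vote
    return votes

if count_down_votes() >= QUORUM:
    print("master objectively down: start leader election and failover")
```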
(III) Notification: let the slaves execute replicaof to synchronize with the new master, and notify the client to establish a connection with the new master.
The master-slave data synchronization flow is as follows:
1) When a slave is configured, whether it is connecting to the master for the first time or reconnecting, it sends a SYNC command;
2) After the master receives the SYNC command, it does two things:
a) the master executes BGSAVE, i.e. saves the data to disk in the background (an RDB snapshot file);
b) at the same time, the master stores the newly received commands that write or modify the data set in a buffer;
3) After the master has saved the data to the snapshot file in the background, it transmits the snapshot file to the slave; the slave empties its memory and loads the file into memory;
4) The master then forwards the write commands collected in the buffer to the slave, and the slave executes these commands to synchronize with the master;
5) Afterwards the master and slave keep synchronizing subsequent commands asynchronously, achieving eventual consistency of the data.
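The synchronization flow above can be summarized in a purely conceptual sketch; it does not use real RDB files or the Redis protocol, and all class and key names are illustrative stand-ins for the behavior of steps 1) to 5):

```python
# Conceptual model of the SYNC flow: background snapshot, buffering of new writes,
# snapshot load on the slave, replay of the buffer, then asynchronous propagation.
import copy

class Master:
    def __init__(self):
        self.data = {}            # current data set
        self.repl_buffer = []     # step 2b: writes received while the snapshot is produced

    def write(self, key, value):
        self.data[key] = value
        self.repl_buffer.append((key, value))

    def bgsave(self):
        return copy.deepcopy(self.data)          # step 2a: background snapshot (RDB stand-in)

class Slave:
    def __init__(self):
        self.data = {}

    def full_sync(self, master):
        snapshot = master.bgsave()               # steps 1-3: SYNC, snapshot transferred
        self.data = snapshot                     # the slave clears its memory and loads the snapshot
        for key, value in master.repl_buffer:    # step 4: replay the buffered write commands
            self.data[key] = value
        master.repl_buffer.clear()               # step 5: later writes are propagated asynchronously

m, s = Master(), Slave()
m.write("train:101", "running")
s.full_sync(m)
assert s.data == m.data                          # the slave is now consistent with the master
```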
The present invention is further described below by taking as examples a high-availability deployment scheme under dual-machine hot standby and a high-availability deployment scheme under a dual network.
In the dual-machine hot-standby mode, server A is planned with 1 master node and 1 sentinel node, and server B with 1 slave node and 2 sentinel nodes. The principle and flow of this implementation are explained in detail below in combination with two different fault scenarios.
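From the point of view of the business software, this planned deployment (master and sentinel-1 on server A, slave and sentinel-2/sentinel-3 on server B) can be consumed through a sentinel-aware client; the following redis-py sketch uses assumed host names and an assumed master-group name "mymaster":

```python
# Sketch: the business software asks the sentinel cluster for the current master instead of
# hard-coding server A, so a failover to server B is transparent to the software.
# Host names and the master-group name "mymaster" are assumed values.
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [("server-a", 26379),    # sentinel-1 on server A
     ("server-b", 26379),    # sentinel-2 on server B
     ("server-b", 26380)],   # sentinel-3 on server B
    socket_timeout=0.5,
)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # resolves to the current master
slave = sentinel.slave_for("mymaster", socket_timeout=0.5)     # resolves to a readable slave

master.set("train:101:position", "K12+300")    # writes always go to whichever node is master now
print(slave.get("train:101:position"))         # reads may be served by the slave
```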
S2.1, scene one: the primary node (a) server fails as shown in fig. 3.
S2.1.1 When the system is initialized and working normally, the business software reads and writes data from the master node on server A, and the data on the master node is automatically synchronized to the slave node on server B in real time;
S2.1.2 When server A fails, the master node and sentinel-1 on server A become unavailable; when sentinel-2 and sentinel-3 on server B receive no response from the master node, they judge that the master node is offline, and after the master is confirmed offline, the only slave node on server B is selected as the new master node;
S2.1.3 The business software reads data from the original slave (now the new master) on server B; since the data on the original slave node is fully synchronized with the original master's data, normal operation of the business software is ensured;
S2.1.4 When server A recovers, the original master and sentinel-1 rejoin the cluster; the original master becomes a slave node and fetches the latest data from the new master node in real time, ensuring data consistency; throughout this step the business software is unaware of the change and still works normally;
S2.1.5 After the failed server is repaired, the cluster is reset to the initialized state at a time that does not affect normal operation of the signal system, and the process returns to step S2.1.1 and repeats.
S2.2, scene two: the slave node (B) server fails as shown in fig. 4.
S2.2.1 When the system is initialized and working normally, the business software reads and writes data from the master node on server A, and the data on the master node is automatically synchronized to the slave node on server B in real time;
S2.2.2 When server B fails, the slave node, sentinel-2 and sentinel-3 on server B lose their connections, and the master no longer synchronizes data to the slave node;
S2.2.3 Sentinel-1 detects that the slave node is offline, but the master node works normally; for the business software the cache cluster is unchanged, and the business software still obtains data from the master node through the sentinel;
S2.2.4 When server B recovers from the fault, the slave node, sentinel-2 and sentinel-3 come online again and rejoin the cache cluster; the state is consistent with that before the fault, and the master node continues to synchronize data to the slave node in real time;
S2.2.5 The cluster state is now consistent with the initialized state, and the process repeats in this way.
In the dual-machine dual-network mode, server A is planned with 1 master node and 1 sentinel node bound to the first network, and server B with 1 slave node and 2 sentinel nodes bound to the second network. The first network and the second network are two networks on different network segments, and the different network segments of the two machines communicate with each other through a router or a switch.
A dual-network signal system ensures that the system is unaffected when either network fails. In the traditional cluster scheme, all nodes work on the same network segment; if that network fails, the cluster becomes directly unavailable, and so does the application software. To better fit the dual-network architecture and achieve truly high availability, another scheme is proposed here; the principle and flow of this embodiment are explained in detail below in combination with several different fault scenarios.
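Before walking through the scenarios, the client side of this dual-network scheme can be sketched in the same hedged way; the segment addresses below are assumed, the point being that the client lists sentinels from both network segments so that the master can still be resolved when one whole segment fails:

```python
# Sketch for the dual-machine dual-network scheme: sentinel-1 is bound to the first network
# segment (server A), sentinel-2/sentinel-3 to the second segment (server B). Listing
# sentinels from both segments lets the client keep resolving the master even when an
# entire segment fails. All addresses and the group name are assumed values.
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [("10.10.1.10", 26379),   # sentinel-1, first network segment (server A)
     ("10.10.2.20", 26379),   # sentinel-2, second network segment (server B)
     ("10.10.2.20", 26380)],  # sentinel-3, second network segment (server B)
    socket_timeout=0.5,
)

# If the first segment fails, sentinel-2 and sentinel-3 promote the slave on the second
# segment, and master_for() returns the new master without any change to the client code.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.ping()
```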
S3.1, scene one: the entire first network segment fails as shown in fig. 5.
S3.1.1 When the system is initialized and working normally, the business software reads and writes data from the master node on the first network segment, and the data on the master node is automatically synchronized in real time to the slave node on the second network segment on server B;
S3.1.2 When the first network segment of server A fails, the master node and sentinel-1 become unavailable; when sentinel-2 and sentinel-3 on the second network segment receive no response from the master node, they judge that the master node is offline, and after the master is confirmed offline, the normally working slave on the second network segment is selected as the new master node;
S3.1.3 The business software reads data from the original slave (now the new master) on the second network segment; since the data on the original slave node is fully synchronized with the original master's data, normal operation of the business software is ensured;
S3.1.4 When the first network segment recovers, the original master and sentinel-1 rejoin the cluster and bind to the first-network IP; the original master becomes a slave node and fetches the latest data from the new master node in real time, ensuring data consistency; throughout this step the business software is unaware of the change and still works normally;
S3.1.5 After the failed network segment is repaired, the cluster is reset to the initialized state at a time that does not affect normal operation of the signal system, and the process returns to step S3.1.1 and repeats.
S3.2, scene two: the entire second network segment fails as shown in fig. 6.
S3.2.1 When the system is initialized and working normally, the business software reads and writes data from the master node of the first network segment on server A, and the data on the master node is automatically synchronized in real time to the slave node of the second network segment on server B;
S3.2.2 When the second network segment fails, the slave node, sentinel-2 and sentinel-3 on the second network segment lose their connections, and the master no longer synchronizes data to the slave node;
S3.2.3 Sentinel-1 detects that the slave node on the second network segment is offline, but the master node on the first network segment works normally; for the business software the cache cluster is unchanged, and the business software still obtains data from the master node through the sentinel;
S3.2.4 When the second network segment recovers, the slave node, sentinel-2 and sentinel-3 come online again and rejoin the cache cluster; the state is consistent with that before the fault, and the master node continues to synchronize data to the slave node in real time;
S3.2.5 The cluster state is now consistent with the initialized state, and the process repeats in this way.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for implementing active/standby dual-network redundancy, characterized by comprising the following steps:
when a first sentinel node in a sentinel cluster detects a suspected failure of the master node, sending a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receiving the master-node failure voting responses returned by the other sentinel nodes;
the first sentinel node determining, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and at the same time starting the automatic failover function;
after starting the automatic failover function, the first sentinel node treating the master node as a failed node, selecting one slave node as the new master node, and switching the client from the failed node to the new master node, thereby ensuring normal operation of the client.
2. The method of claim 1, wherein the detecting, by the first sentinel node in the sentinel cluster, of a suspected failure of the master node comprises:
each sentinel node in the sentinel cluster periodically sending a ping command to the master node and judging whether a response to the ping command returned by the master node is received within a preset time period;
when the first sentinel node in the sentinel cluster does not receive the response returned by the master node within the preset time period, judging that the master node is suspected to have failed.
3. The method of claim 2, wherein the master-node failure voting response comprises a master-node failure voting approval response and a master-node failure voting objection response, and the determining, by the first sentinel node according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed comprises:
the first sentinel node counting the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes;
when the number of approval responses is larger than the number of objection responses, the first sentinel node determining that the master node has failed.
4. The method of claim 3, wherein after the first sentinel node starts the automatic failover function, the method further comprises:
the first sentinel node periodically sending a ping command to the failed node and judging whether a response to the ping command returned by the failed node is received within a preset time period;
when the first sentinel node receives the response returned by the failed node within the preset time period, judging that the failed node has recovered, and using the failed node as a slave node.
5. The method of any one of claims 1-4, further comprising, before the first sentinel node in the sentinel cluster detects the suspected failure of the master node:
the master node synchronizing the data it exchanges with the client to each slave node.
6. A device for implementing active/standby dual-network redundancy, applied to a first sentinel node in a sentinel cluster, comprising:
a sending and receiving module, configured to, when a suspected failure of the master node is detected, send a master-node failure voting request to the other sentinel nodes in the sentinel cluster and receive the master-node failure voting responses returned by the other sentinel nodes;
a determining module, configured to determine, according to the master-node failure voting responses returned by the other sentinel nodes, that the master node has failed, and to start the automatic failover function;
a processing module, configured to, after the automatic failover function is started, treat the master node as a failed node, select one slave node as the new master node, and switch the client from the failed node to the new master node, thereby ensuring normal operation of the client.
7. The device of claim 6, further comprising a detection module configured to periodically send a ping command to the master node and judge whether a response to the ping command returned by the master node is received within a preset time period, and, when no response from the master node is received within the preset time period, to judge that the master node is suspected to have failed.
8. The device of claim 7, wherein the master-node failure voting response comprises a master-node failure voting approval response and a master-node failure voting objection response, and the determining module is specifically configured to count the number of master-node failure voting approval responses and the number of master-node failure voting objection responses returned by the other sentinel nodes, and to determine that the master node has failed when the number of approval responses is larger than the number of objection responses.
9. An electronic device, comprising: a memory; a processor; a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-5.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon; the computer program being executed by a processor to implement the method of any of claims 1-5.
CN202211709217.0A 2022-12-29 2022-12-29 Method, device, equipment and storage medium for realizing redundancy of main network and standby network Pending CN116346582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211709217.0A CN116346582A (en) 2022-12-29 2022-12-29 Method, device, equipment and storage medium for realizing redundancy of main network and standby network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211709217.0A CN116346582A (en) 2022-12-29 2022-12-29 Method, device, equipment and storage medium for realizing redundancy of main network and standby network

Publications (1)

Publication Number Publication Date
CN116346582A 2023-06-27

Family

ID=86875394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211709217.0A Pending CN116346582A (en) 2022-12-29 2022-12-29 Method, device, equipment and storage medium for realizing redundancy of main network and standby network

Country Status (1)

Country Link
CN (1) CN116346582A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination