CN113625946A - Method, system and computer equipment for implementing arbitration of storage cluster - Google Patents


Info

Publication number
CN113625946A
Authority
CN
China
Prior art keywords
storage cluster
network communication
communication connection
storage
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110720193.8A
Other languages
Chinese (zh)
Inventor
张一罡
张璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110720193.8A
Publication of CN113625946A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention discloses a method, a system and computer equipment for implementing storage cluster arbitration. The method comprises the following steps, executed at an arbitration control end: establishing a network communication connection with the storage cluster; monitoring the network communication connection state with the storage cluster according to heartbeat messages; if the network communication connection state with the storage cluster is a disconnected state, sending a reconnection request to the storage cluster to reestablish the network communication connection; if the network communication connection is not reestablished, recording the number of reconnections, determining a time period based on that number, and sending subsequent reconnection requests to the storage cluster according to the time period; and, in response to the view of the storage cluster changing, receiving Allegiance requests sent by the storage nodes and arbitrating. The scheme addresses the split-brain phenomenon that an unstable network connection state causes in a storage cluster, and improves the stability of the storage cluster.

Description

Method, system and computer equipment for implementing arbitration of storage cluster
Technical Field
The invention relates to the technical field of cloud computing, in particular to a method, a system and computer equipment for realizing storage cluster arbitration.
Background
As society develops, enterprises place ever higher performance demands on storage devices, and a single storage device can rarely meet them. In this case, a plurality of storage devices are combined to form a storage cluster, and the storage cluster provides services to the outside as a single unit. Storage clusters can greatly improve the processing power and availability of storage devices. However, when a storage cluster encounters a link failure, storage devices in the cluster become invisible to each other, and the cluster splits into two or more storage sub-clusters. Storage devices in the same sub-cluster can still communicate with each other, but storage devices in different sub-clusters cannot, so two or more storage clusters provide services to the outside at the same time. If storage devices in different sub-clusters access the same storage resource simultaneously, an access error occurs. This phenomenon is called split brain. After split brain occurs, an arbitration device is generally required to provide arbitration services for the storage cluster: disconnected storage devices are removed, the storage device that first preempts arbitration becomes the new leader, and the new leader forms a new storage cluster view and manages the storage cluster. Storage clusters therefore generally have an arbitration function. Existing storage clusters generally adopt a majority (more than half) mechanism to prevent split brain; that is, split brain can only occur after more than half of the storage devices in the storage cluster are disconnected.
Some split-brain phenomena may be caused by an unstable network connection state of the storage cluster, which temporarily disconnects some storage devices from the storage cluster. To keep temporarily disconnected storage devices in the storage cluster as far as possible, the prior art usually repeats detection several times and disconnects a node only if heartbeat information from it is not received several consecutive times. A method is therefore needed that improves the stability of the storage cluster's network connection, effectively prevents split brain, and provides arbitration service promptly after split brain occurs.
Disclosure of Invention
The invention provides a method, a system and computer equipment for realizing storage cluster arbitration, which solve the problem of split brain of a storage cluster caused by unstable network connection state and improve the stability of the storage cluster.
Based on the above object, an aspect of the embodiments of the present invention provides a method for implementing storage cluster arbitration, which specifically includes the following steps:
establishing network communication connection with the storage cluster;
monitoring the network communication connection state with the storage cluster according to the heartbeat message;
if the network communication connection state with the storage cluster is monitored to be a disconnection state, sending a reconnection request to the storage cluster so as to reestablish network communication connection with the storage cluster;
if the network communication connection with the storage cluster is not reestablished, recording the reconnection times, determining a time period based on the reconnection times, and sending a subsequent reconnection request to the storage cluster according to the time period so as to reestablish the network communication connection with the storage cluster;
and, in response to the view of the storage cluster changing, receiving an Allegiance request sent by the storage nodes in the storage cluster, and arbitrating.
In some embodiments, sending a reconnect request to the storage cluster to reestablish a network communication connection with the storage cluster comprises:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
In some embodiments, determining the time period based on the number of reconnections comprises:
if the reconnection times are less than or equal to preset times, the time period is X;
and if the reconnection times are greater than the preset times, the time period is Y, wherein Y is greater than X.
In some embodiments, establishing a network communication connection with a storage cluster comprises:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
In some embodiments, monitoring a network communication connection status with the storage cluster according to heartbeat messages includes:
sending a heartbeat message request to the storage cluster at intervals of first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state with the storage cluster according to the heartbeat message response.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response includes:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response further includes:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
In some embodiments, receiving and arbitrating an Allegiance request sent by a storage node in the storage cluster includes:
receiving an Allegiance request sent by each storage node in the storage cluster in an arbitration period, and storing the identification of the storage node in the Allegiance request;
and after the arbitration period is finished, determining, according to the identification, the storage node that was first to preempt arbitration as a Boss node, and forming a new cluster view by using the Boss node.
In another aspect of the embodiments of the present invention, a system for implementing storage cluster arbitration is further provided, where the system includes:
a connection establishing module configured to establish a network communication connection with the storage cluster;
a monitoring module configured to monitor a network communication connection state with the storage cluster according to a heartbeat message;
a first reconnection module configured to send a reconnection request to the storage cluster to reestablish a network communication connection with the storage cluster if it is monitored that the network communication connection state with the storage cluster is a disconnected state;
a second reconnection module configured to record reconnection times if network communication connection with the storage cluster is not reestablished, determine a time period based on the reconnection times, and send a subsequent reconnection request to the storage cluster according to the time period to reestablish network communication connection with the storage cluster;
an arbitration module configured to receive and arbitrate an Allegiance request sent by a storage node in the storage cluster in response to a view of the storage cluster changing.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the program implementing the steps of the method as above when executed by the processor.
The invention has the following beneficial technical effects: the network connection state of the storage cluster is monitored through heartbeat messages, and the network connection of each storage node in the storage cluster is guaranteed to the maximum extent through a reconnection mechanism, so that split brains are effectively prevented, and the stability of the storage cluster is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a block diagram of an embodiment of a method for implementing storage cluster arbitration provided by the present invention;
FIG. 2 is a diagram illustrating an embodiment of a system for implementing storage cluster arbitration according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a computer device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not the same. "First" and "second" are used merely for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.
In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a method for implementing storage cluster arbitration. As shown in FIG. 1, it includes the following steps executed on the arbitration control side:
S1, establishing network communication connection with the storage cluster;
S2, monitoring the network communication connection state with the storage cluster according to the heartbeat message;
S3, if the network communication connection state with the storage cluster is monitored to be a disconnection state, sending a reconnection request to the storage cluster so as to reestablish network communication connection with the storage cluster;
S4, if the network communication connection with the storage cluster is not reestablished, recording the reconnection times, determining a time period based on the reconnection times, and sending a subsequent reconnection request to the storage cluster according to the time period so as to reestablish the network communication connection with the storage cluster;
S5, in response to the view of the storage cluster changing, receiving an Allegiance request sent by the storage nodes in the storage cluster and performing arbitration.
Specifically, a unique identifier is assigned to each storage node at the arbitration control end, an arbitration program is created, and network communication connections are established among the storage nodes in the storage cluster through the arbitration program. The arbitration program monitors the network communication connection state between storage nodes in the storage cluster through heartbeat messages. If the network communication connection state of a storage node in the storage cluster is a disconnected state, the arbitration program sleeps for a preset time period and then sends a reconnection request to the storage cluster to reestablish the network communication connection, so that the disconnected storage node is connected into the storage cluster again. If the reconnection between the arbitration program and the storage cluster fails, the number of reconnections is recorded, a time period is determined based on that number, and a reconnection request is sent to the storage cluster according to the time period to reestablish the network communication connection. The arbitration program performs arbitration in response to a change in the view formed by the storage nodes in the storage cluster, that is, when split brain occurs.
In this embodiment, through the continuous reconnection of the arbiter and the storage cluster, that is, the continuous reconnection of the disconnected storage node and the storage cluster, the network communication connection between each storage node in the storage cluster is maintained.
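The control flow of steps S2 to S4 can be pictured as a small state machine. The following is only an illustrative sketch: the names `State`, `Event` and `step` are hypothetical and do not appear in the patent, and the heartbeat and reconnection outcomes are passed in as events rather than produced by real network I/O.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()
    RECONNECTING = auto()


class Event(Enum):
    HEARTBEAT_OK = auto()
    HEARTBEAT_LOST = auto()
    RECONNECT_OK = auto()
    RECONNECT_FAILED = auto()


def step(state, event, failed_reconnects):
    """Return (next_state, failed_reconnects) after one event.

    Hypothetical model of S2-S4: a lost heartbeat starts reconnection,
    each failed reconnection is counted, and a successful one resets the count.
    """
    if state is State.CONNECTED and event is Event.HEARTBEAT_LOST:
        return State.RECONNECTING, 0                      # S3: start reconnecting
    if state is State.RECONNECTING and event is Event.RECONNECT_FAILED:
        return State.RECONNECTING, failed_reconnects + 1  # S4: record the failure
    if state is State.RECONNECTING and event is Event.RECONNECT_OK:
        return State.CONNECTED, 0                         # connection reestablished
    return state, failed_reconnects                       # all other events: no change
```

The recorded count is the input that the time-period rule of step S4 would consume.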
In some embodiments, sending a reconnect request to the storage cluster to reestablish a network communication connection with the storage cluster comprises:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
Specifically, after the arbiter initiates a heartbeat message request, it may not receive a heartbeat message response returned by the storage node because the network is unstable, which may disconnect the network communication connection between the arbiter and each storage node. After the connection is disconnected, the arbitration program sleeps according to a preset time period, then sends a reconnection request to the storage cluster, and if a reconnection response sent by the storage cluster is received, the reconnection is determined to be successful.
In this embodiment, reconnecting the arbitration program to the storage cluster keeps each storage node in the storage cluster connected as far as possible, which addresses the problem of storage nodes being disconnected from the storage cluster by occasional network instability.
In some embodiments, determining the time period based on the number of reconnections comprises:
if the reconnection times are less than or equal to preset times, the time period is X;
and if the reconnection times are greater than the preset times, the time period is Y, wherein Y is greater than X.
Specifically, after the connection is first disconnected, the arbitration program sleeps for a preset time period, for example 10 s, and then initiates a reconnection request. If the reconnection is unsuccessful, the reconnection count M is recorded and compared with the preset count N; if N consecutive reconnections have failed, the sleep time of the arbitration program is increased. For example, N may be set to 60: when M is less than or equal to 60, the time period of the arbitration program is X, and when M is greater than 60, the time period is Y, where X may be set to 10 s and Y to 60 s.
In this embodiment, the reconnection time period is adjusted as the number of reconnection times increases, thereby reducing meaningless reconnection operations.
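The two-level time period rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the injected `try_reconnect` and `sleep` callables, and the `max_attempts` bound are hypothetical additions so that the sketch terminates and can be exercised without a network; the defaults N = 60, X = 10 s and Y = 60 s are the example values from the description.

```python
def reconnect_period(failed_reconnects, preset_times=60,
                     x_seconds=10.0, y_seconds=60.0):
    """Return the sleep period before the next reconnection request.

    Period is X while the recorded count M <= N, and Y (with Y > X) afterwards.
    """
    if y_seconds <= x_seconds:
        raise ValueError("Y must be greater than X")
    if failed_reconnects <= preset_times:
        return x_seconds
    return y_seconds


def reconnect_with_backoff(try_reconnect, sleep, max_attempts=1000,
                           preset_times=60, x_seconds=10.0, y_seconds=60.0):
    """Sleep, then try to reconnect, until try_reconnect() returns True.

    Returns the number of failed reconnections recorded before success.
    max_attempts is an assumption of this sketch, not part of the described method.
    """
    failed = 0
    for _ in range(max_attempts):
        sleep(reconnect_period(failed, preset_times, x_seconds, y_seconds))
        if try_reconnect():
            return failed
        failed += 1  # record the reconnection count M
    raise ConnectionError("storage cluster not reachable within max_attempts")
```

In real use `sleep` would typically be `time.sleep` and `try_reconnect` would send the reconnection request and wait for the reconnection response; injecting them keeps the rule itself easy to test.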
In some embodiments, establishing a network communication connection with a storage cluster comprises:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
Specifically, a socket connection is first established with a storage node, and a UID request is then sent to the storage cluster. After receiving the UID request from the arbitration program, the storage cluster allocates a UID for the arbitration program and sends a UID response to it. After receiving the UID response, the arbitration program sends a connection request. After receiving the connection request, the storage cluster records the UID of the arbitration program and then sends a connection response as a reply. After receiving the connection response, the arbitration program determines that the connection between the storage node corresponding to the arbitration program and the storage cluster is normally established. The UID request asks the storage cluster to allocate a UID to the arbitration program, and the UID is the unique identification of each IP arbitrator.
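The handshake order described above (UID request, UID response, connection request, connection response) can be sketched with an in-memory stand-in for the cluster. Everything named here is hypothetical: the patent does not specify a wire format, so the dictionary messages, the `FakeCluster` class and the `establish_connection` function are assumptions, and the `send` callable stands in for an already established socket connection.

```python
import itertools


class FakeCluster:
    """In-memory stand-in for the storage cluster side (assumption of this sketch)."""

    def __init__(self):
        self._next_uid = itertools.count(1)
        self.registered = set()

    def handle(self, message):
        kind = message.get("type")
        if kind == "UID_REQUEST":
            # Allocate a UID for the arbitration program.
            return {"type": "UID_RESPONSE", "uid": next(self._next_uid)}
        if kind == "CONNECT_REQUEST":
            # Record the UID of the arbitration program, then reply.
            self.registered.add(message["uid"])
            return {"type": "CONNECT_RESPONSE", "uid": message["uid"]}
        return {"type": "ERROR"}


def establish_connection(send):
    """Run the UID and connection exchange over send(); return the allocated UID."""
    uid_response = send({"type": "UID_REQUEST"})
    if uid_response.get("type") != "UID_RESPONSE":
        raise ConnectionError("no UID response from storage cluster")
    uid = uid_response["uid"]
    connect_response = send({"type": "CONNECT_REQUEST", "uid": uid})
    if (connect_response.get("type") != "CONNECT_RESPONSE"
            or connect_response.get("uid") != uid):
        raise ConnectionError("connection request was not acknowledged")
    return uid
```

The connection is treated as established only after the connection response arrives, which matches the order of the steps in the text.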
In some embodiments, monitoring a network communication connection status with the storage cluster from heartbeat messages includes:
sending a heartbeat message request to the storage cluster at intervals of first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state of the storage cluster according to the heartbeat message response.
Specifically, the arbitration program sends a heartbeat message to the storage cluster at regular intervals, for example every 10 s, to monitor the network communication connection state with the storage cluster. After receiving the heartbeat message, the storage cluster sends a heartbeat message response to the arbitration program. The heartbeat message response received by the arbitration program indicates the network communication connection state of the storage cluster.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response includes:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
Specifically, after receiving the heartbeat message, the storage cluster checks whether the heartbeat interval is within the normal range. If it is, the storage cluster sends a heartbeat message response to the arbitration program, and the heartbeat message response received by the arbitration program indicates the network communication connection state of the storage cluster.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response further includes:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
Specifically, after receiving the heartbeat message, the storage cluster checks whether the heartbeat interval is within the normal range. If the interval is not within the normal time range, the network communication connection state of the storage cluster is unstable, and the storage cluster reports an alarm indicating the unstable network connection state to remind the user. If the interval is within the normal time range but the arbitration program does not receive a heartbeat message response for a long time, the arbitration program actively disconnects from the storage cluster.
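The decision made on the arbitration side for each heartbeat can be sketched as a pure function. This is an illustration under assumptions: the "second preset time range" is modelled here as the interval from 0 to `max_delay` seconds, a missing response is represented by `None`, and the names `LinkState` and `classify_heartbeat` are hypothetical.

```python
from enum import Enum
from typing import Optional


class LinkState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


def classify_heartbeat(request_time: float,
                       response_time: Optional[float],
                       max_delay: float) -> LinkState:
    """Classify the link from one heartbeat request/response pair.

    The link counts as connected only if a response arrived and the interval
    between request and response lies within [0, max_delay] (assumed range).
    """
    if response_time is None:
        return LinkState.DISCONNECTED  # no heartbeat message response received
    delay = response_time - request_time
    if 0.0 <= delay <= max_delay:
        return LinkState.CONNECTED
    return LinkState.DISCONNECTED
```

A caller would invoke this once per heartbeat period (the "first preset time") and hand a `DISCONNECTED` result to the reconnection logic.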
In some embodiments, receiving and arbitrating an Allegiance request sent by a storage node in the storage cluster includes:
receiving an Allegiance request sent by the storage node in an arbitration period, and storing the identification of the storage node in the Allegiance request;
and after the arbitration period is finished, determining, according to the identification, the storage node that was first to preempt arbitration as a Boss node, and forming a new cluster view by using the Boss node.
Specifically, after more than half of the nodes in the storage cluster are disconnected from the storage cluster, the storage nodes in the storage cluster send Allegiance requests. The arbitration program receives the Allegiance requests within a preset arbitration period and stores the identifier of the storage node corresponding to each request. After the arbitration period ends, the arbitration program determines the storage node whose identifier was first to preempt arbitration as the Boss node, and a new cluster view is formed with the Boss node.
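The arbitration rule (collect Allegiance requests during the arbitration period, then pick the node that arrived first) can be sketched as follows. This is a simplified illustration: the function name `arbitrate`, the representation of requests as `(arrival_time, node_id)` tuples, and the choice to ignore requests outside the period are assumptions, not details given in the patent.

```python
def arbitrate(requests, period_start, period_length):
    """Pick the Boss node from Allegiance requests received in the arbitration period.

    requests: iterable of (arrival_time, node_id) tuples (hypothetical format).
    Returns (boss_node_id, stored_ids); boss_node_id is None if no request arrived.
    """
    period_end = period_start + period_length
    in_period = [(t, node) for t, node in requests
                 if period_start <= t <= period_end]
    stored_ids = []
    # Store identifiers in arrival order; sorted() is stable, so ties keep input order.
    for _, node in sorted(in_period, key=lambda item: item[0]):
        if node not in stored_ids:
            stored_ids.append(node)
    boss = stored_ids[0] if stored_ids else None
    return boss, stored_ids


requests = [(5.2, "node-b"), (5.0, "node-c"), (9.9, "node-a"), (20.0, "node-d")]
boss, stored = arbitrate(requests, period_start=5.0, period_length=10.0)
print(boss, stored)  # node-c ['node-c', 'node-b', 'node-a']
```

In this example `node-d` arrives after the period has ended and is therefore not considered; the new cluster view would be formed around `node-c`.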
Based on the same inventive concept, according to another aspect of the present invention, as shown in FIG. 2, the present invention further provides an embodiment of a system for implementing storage cluster arbitration, where the system includes:
a connection establishing module 110, where the connection establishing module 110 is configured to establish a network communication connection with the storage cluster;
a monitoring module 120, the monitoring module 120 configured to monitor a network communication connection status with the storage cluster according to a heartbeat message;
a first reconnection module 130, where if it is monitored that the network communication connection state with the storage cluster is a disconnected state, the first reconnection module 130 is configured to send a reconnection request to the storage cluster to reestablish network communication connection with the storage cluster;
a second reconnection module 140, where the second reconnection module 140 is configured to record reconnection times if network communication connection with the storage cluster is not reestablished, determine a time period based on the reconnection times, and send a subsequent reconnection request to the storage cluster according to the time period to reestablish network communication connection with the storage cluster;
an arbitration module 150, where the arbitration module 150 is configured to receive and arbitrate an Allegiance request sent by a storage node in the storage cluster in response to a view of the storage cluster changing.
In some embodiments, the first reconnect module 130 is further configured to:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
In some embodiments, the second reconnecting module 140 is further configured to:
if the reconnection times are less than or equal to preset times, the time period is X;
and if the reconnection times are greater than the preset times, the time period is Y, wherein Y is greater than X.
In some embodiments, the establish connection module 110 is further configured to:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
In some embodiments, the monitoring module 120 is further configured to:
sending a heartbeat message request to the storage cluster at intervals of first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state with the storage cluster according to the heartbeat message response.
In some embodiments, the monitoring module 120 is further configured to:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
In some embodiments, the monitoring module 120 is further configured to:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
In some embodiments, the arbitration module 150 is further configured to:
receiving an Allegiance request sent by each storage node in the storage cluster in an arbitration period, and storing the identification of the storage node in the Allegiance request;
and after the arbitration period is finished, determining, according to the identification, the storage node that was first to preempt arbitration as a Boss node, and forming a new cluster view by using the Boss node.
Based on the same inventive concept, according to another aspect of the present invention, as shown in FIG. 3, the embodiment of the present invention further provides a computer device 20. The computer device 20 includes a processor 210 and a memory 220; the memory 220 stores a computer program 221 capable of running on the processor, and the processor 210 executes the program to perform the following steps:
S1, establishing network communication connection with the storage cluster;
S2, monitoring the network communication connection state with the storage cluster according to the heartbeat message;
S3, if the network communication connection state with the storage cluster is monitored to be a disconnection state, sending a reconnection request to the storage cluster so as to reestablish network communication connection with the storage cluster;
S4, if the network communication connection with the storage cluster is not reestablished, recording the reconnection times, determining a time period based on the reconnection times, and sending a subsequent reconnection request to the storage cluster according to the time period so as to reestablish the network communication connection with the storage cluster;
S5, in response to the view of the storage cluster changing, performing arbitration.
In some embodiments, sending a reconnect request to the storage cluster to reestablish a network communication connection with the storage cluster comprises:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
In some embodiments, determining the time period based on the number of reconnections comprises:
if the number of reconnections is less than or equal to a preset number, the time period is X;
and if the number of reconnections is greater than the preset number, the time period is Y, wherein Y is greater than X.
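The two-tier rule above can be sketched as a small function. This is only an illustration: the name `reconnect_period` and the default values for the preset number, X and Y are assumptions, since the text fixes only the relation Y > X.

```python
def reconnect_period(reconnect_count: int,
                     preset_count: int = 5,
                     x_seconds: float = 1.0,
                     y_seconds: float = 30.0) -> float:
    """Return the wait before the next reconnection request.

    preset_count, x_seconds and y_seconds are illustrative defaults;
    the source only requires that Y be greater than X.
    """
    if y_seconds <= x_seconds:
        raise ValueError("Y must be greater than X")
    # Retry quickly at first (period X), then back off to the longer period Y.
    if reconnect_count <= preset_count:
        return x_seconds
    return y_seconds
```

With these assumed defaults, the first five failed reconnections are retried every second, and later ones every thirty seconds.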
In some embodiments, establishing a network communication connection with a storage cluster comprises:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
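The message ordering in this handshake can be sketched as follows. The message tags, the `send`/`recv` callables standing in for an already-open socket, and the choice to carry the UID inside the connection request are all assumptions made for illustration; the text specifies only the order of the messages.

```python
from typing import Callable, Optional, Tuple

# Hypothetical message tags; the source names the messages, not their encoding.
UID_REQUEST = "UID_REQUEST"
UID_RESPONSE = "UID_RESPONSE"
CONNECT_REQUEST = "CONNECT_REQUEST"


def establish_connection(
    send: Callable[[str, Optional[str]], None],
    recv: Callable[[], Tuple[str, Optional[str]]],
) -> Optional[str]:
    """Run the UID exchange, then send the connection request.

    Returns the UID reported by the cluster, or None if no UID response
    arrived (in which case no connection request is sent).
    """
    send(UID_REQUEST, None)
    kind, uid = recv()
    if kind != UID_RESPONSE:
        return None
    # Assumption: the connection request echoes the UID it just learned.
    send(CONNECT_REQUEST, uid)
    return uid
```

Passing the transport in as callables keeps the ordering logic testable without a real socket.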
In some embodiments, monitoring a network communication connection status with the storage cluster according to heartbeat messages includes:
sending a heartbeat message request to the storage cluster at intervals of a first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state of the storage cluster according to the heartbeat message response.
In some embodiments, monitoring a network communication connection status with a storage cluster according to the heartbeat message response includes:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response further includes:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
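The normal/disconnected decision described above can be sketched as one function. Modelling the "second preset time range" as the closed interval from 0 to `max_interval`, and treating a missing response as a disconnection, are assumptions; the names are hypothetical.

```python
from enum import Enum
from typing import Optional


class LinkState(Enum):
    NORMAL = "normal"
    DISCONNECTED = "disconnected"


def classify_heartbeat(request_time: float,
                       response_time: Optional[float],
                       max_interval: float = 3.0) -> LinkState:
    """Classify the link from one heartbeat request/response pair.

    max_interval is an assumed upper bound of the second preset time range.
    """
    if response_time is None:
        # Assumption: no response at all counts as a disconnection.
        return LinkState.DISCONNECTED
    interval = response_time - request_time
    if 0.0 <= interval <= max_interval:
        return LinkState.NORMAL
    return LinkState.DISCONNECTED
```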
In some embodiments, receiving and arbitrating an Allegiance request sent by a storage node in the storage cluster includes:
receiving, within an arbitration period, the Allegiance request sent by the storage node, and storing the identifier of the storage node carried in the Allegiance request;
and after the arbitration period ends, determining, according to the stored identifiers, the storage node that was first to preempt arbitration as the Boss node, and forming a new cluster view with the Boss node.
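A minimal sketch of one arbitration round follows, assuming that "first to preempt arbitration" means the node whose Allegiance request arrived first within the period. The class and method names are hypothetical, and how the new cluster view is built from the Boss node is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArbitrationRound:
    """Collects Allegiance requests during one arbitration period."""
    node_ids: List[str] = field(default_factory=list)
    closed: bool = False

    def on_allegiance(self, node_id: str) -> bool:
        """Store the requester's identifier; False once the period is over."""
        if self.closed:
            return False
        if node_id not in self.node_ids:
            # Arrival order is preserved, so index 0 is the first requester.
            self.node_ids.append(node_id)
        return True

    def finish(self) -> Optional[str]:
        """End the period and return the Boss node, or None if nobody asked."""
        self.closed = True
        return self.node_ids[0] if self.node_ids else None
```

Deferring the decision to `finish()` mirrors the text, which selects the Boss node only after the arbitration period has ended.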
In some embodiments, the invention also provides a computer readable storage medium, storing a computer program which, when executed by a processor, performs the steps of:
S1, establishing a network communication connection with the storage cluster;
S2, monitoring the network communication connection state with the storage cluster according to heartbeat messages;
S3, if the network communication connection state with the storage cluster is monitored to be a disconnection state, sending a reconnection request to the storage cluster so as to reestablish the network communication connection with the storage cluster;
S4, if the network communication connection with the storage cluster is not reestablished, recording the number of reconnections, determining a time period based on the number of reconnections, and sending subsequent reconnection requests to the storage cluster according to the time period so as to reestablish the network communication connection with the storage cluster;
S5, in response to a change in the view of the storage cluster, performing arbitration.
In some embodiments, sending a reconnect request to the storage cluster to reestablish a network communication connection with the storage cluster comprises:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
In some embodiments, determining the time period based on the number of reconnections comprises:
if the number of reconnections is less than or equal to a preset number, the time period is X;
and if the number of reconnections is greater than the preset number, the time period is Y, wherein Y is greater than X.
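Steps S3 and S4 together amount to a retry loop whose wait depends on how many reconnections have failed so far. The sketch below is one possible reading: `try_reconnect` and `sleep` are injected callables, and `max_attempts` is a safety cap added here that the source does not mention.

```python
from typing import Callable


def reconnect_until_connected(try_reconnect: Callable[[], bool],
                              sleep: Callable[[float], None],
                              preset_count: int = 3,
                              x_seconds: float = 1.0,
                              y_seconds: float = 10.0,
                              max_attempts: int = 100) -> int:
    """Send reconnection requests until one succeeds.

    Returns the number of failed reconnections recorded before success.
    max_attempts is an assumed cap, not part of the described method.
    """
    failures = 0
    while not try_reconnect():
        failures += 1  # record the number of reconnections
        if failures >= max_attempts:
            raise ConnectionError("storage cluster unreachable")
        # Period X while failures <= preset number, period Y afterwards.
        period = x_seconds if failures <= preset_count else y_seconds
        sleep(period)
    return failures
```

In real use `sleep` would typically be `time.sleep`; passing it in lets the schedule be checked without actually waiting.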
In some embodiments, establishing a network communication connection with a storage cluster comprises:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
In some embodiments, monitoring a network communication connection status with the storage cluster according to heartbeat messages includes:
sending a heartbeat message request to the storage cluster at intervals of a first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state of the storage cluster according to the heartbeat message response.
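The bookkeeping behind periodic heartbeats can be sketched with explicit timestamps. Sequence numbers for matching responses to requests are an assumption made here; the text says only that a response is sent "based on" a request. `interval` stands for the first preset time and `timeout` for the upper bound of the second preset time range.

```python
from typing import Dict, Optional


class HeartbeatMonitor:
    """Tracks outstanding heartbeat requests by sequence number."""

    def __init__(self, interval: float = 1.0, timeout: float = 3.0) -> None:
        self.interval = interval
        self.timeout = timeout
        self._pending: Dict[int, float] = {}
        self._next_seq = 0
        self._last_sent: Optional[float] = None

    def due(self, now: float) -> bool:
        """True when the next heartbeat request should be sent."""
        return self._last_sent is None or now - self._last_sent >= self.interval

    def on_send(self, now: float) -> int:
        """Record a sent request and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = now
        self._last_sent = now
        return seq

    def on_response(self, seq: int, now: float) -> bool:
        """True if the response matches a request and arrived in time."""
        sent = self._pending.pop(seq, None)
        return sent is not None and now - sent <= self.timeout

    def is_disconnected(self, now: float) -> bool:
        """True if any request has gone unanswered for longer than timeout."""
        return any(now - sent > self.timeout for sent in self._pending.values())
```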
In some embodiments, monitoring a network communication connection status with a storage cluster according to the heartbeat message response includes:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
In some embodiments, monitoring a network communication connection status with the storage cluster according to the heartbeat message response further includes:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
In some embodiments, receiving and arbitrating an Allegiance request sent by a storage node in the storage cluster includes:
receiving, within an arbitration period, the Allegiance request sent by the storage node, and storing the identifier of the storage node carried in the Allegiance request;
and after the arbitration period ends, determining, according to the stored identifiers, the storage node that was first to preempt arbitration as the Boss node, and forming a new cluster view with the Boss node.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the invention exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for implementing storage cluster arbitration is characterized by comprising the following steps executed at an arbitration control end:
establishing network communication connection with the storage cluster;
monitoring the network communication connection state with the storage cluster according to the heartbeat message;
if the network communication connection state with the storage cluster is monitored to be a disconnection state, sending a reconnection request to the storage cluster so as to reestablish network communication connection with the storage cluster;
if the network communication connection with the storage cluster is not reestablished, recording the reconnection times, determining a time period based on the reconnection times, and sending a subsequent reconnection request to the storage cluster according to the time period so as to reestablish the network communication connection with the storage cluster;
and in response to a change in the view of the storage cluster, receiving an Allegiance request sent by the storage nodes in the storage cluster, and arbitrating.
2. The method of claim 1, wherein sending a reconnect request to the storage cluster to reestablish a network communication connection with the storage cluster comprises:
sending a reconnect request to the storage cluster;
and if a reconnection response of the storage cluster based on the reconnection request is received, determining that network communication connection is established with the storage cluster.
3. The method of claim 1, wherein determining the time period based on the number of reconnections comprises:
if the number of reconnections is less than or equal to a preset number, the time period is X;
and if the number of reconnections is greater than the preset number, the time period is Y, wherein Y is greater than X.
4. The method of claim 1, wherein establishing a network communication connection with a storage cluster comprises:
establishing socket connection with storage nodes in the storage cluster;
sending a UID request to the storage cluster;
receiving a UID response sent by the storage cluster based on the UID request;
and after receiving the UID response, sending a connection request to the storage cluster so as to establish network communication connection with the storage cluster.
5. The method of claim 1, wherein monitoring a network communication connection status with the storage cluster based on heartbeat messages comprises:
sending a heartbeat message request to the storage cluster at intervals of a first preset time;
receiving a heartbeat message response sent by the storage cluster based on the heartbeat message request;
and monitoring the network communication connection state with the storage cluster according to the heartbeat message response.
6. The method of claim 5, wherein monitoring a network communication connection status with the storage cluster according to the heartbeat message response comprises:
and if the time interval between the heartbeat message response and the heartbeat message request is within a second preset time range, determining that the network communication connection state of the storage cluster is a normal connection state.
7. The method of claim 5, wherein monitoring a network communication connection status with the storage cluster according to the heartbeat message response further comprises:
and if the time interval between the heartbeat message response and the heartbeat message request is not within a second preset time range, determining that the network communication connection state of the storage cluster is a disconnection state.
8. The method of claim 1, wherein receiving and arbitrating an Allegiance request sent by a storage node in the storage cluster comprises:
receiving, within an arbitration period, the Allegiance request sent by each storage node in the storage cluster, and storing the identifier of the storage node carried in the Allegiance request;
and after the arbitration period ends, determining, according to the stored identifiers, the storage node that was first to preempt arbitration as the Boss node, and forming a new cluster view with the Boss node.
9. A system for implementing storage cluster arbitration, comprising:
a connection establishing module configured to establish a network communication connection with the storage cluster;
a monitoring module configured to monitor a network communication connection state with the storage cluster according to a heartbeat message;
a first reconnection module configured to send a reconnection request to the storage cluster to reestablish a network communication connection with the storage cluster if it is monitored that the network communication connection state with the storage cluster is a disconnected state;
a second reconnection module configured to record reconnection times if network communication connection with the storage cluster is not reestablished, determine a time period based on the reconnection times, and send a subsequent reconnection request to the storage cluster according to the time period to reestablish network communication connection with the storage cluster;
and an arbitration module configured to, in response to a change in the view of the storage cluster, receive and arbitrate an Allegiance request sent by a storage node in the storage cluster.
10. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any one of claims 1-8.
CN202110720193.8A 2021-06-28 2021-06-28 Method, system and computer equipment for implementing arbitration of storage cluster Pending CN113625946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110720193.8A CN113625946A (en) 2021-06-28 2021-06-28 Method, system and computer equipment for implementing arbitration of storage cluster

Publications (1)

Publication Number Publication Date
CN113625946A true CN113625946A (en) 2021-11-09

Family

ID=78378554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110720193.8A Pending CN113625946A (en) 2021-06-28 2021-06-28 Method, system and computer equipment for implementing arbitration of storage cluster

Country Status (1)

Country Link
CN (1) CN113625946A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743551A (en) * 2022-09-30 2023-09-12 腾讯云计算(北京)有限责任公司 Main-standby switching method and device of equipment gateway and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458436A (en) * 2012-05-31 2013-12-18 中兴通讯股份有限公司 Method and device for detecting alive-keeping of link between AC and AP
CN104991850A (en) * 2015-06-27 2015-10-21 广州华多网络科技有限公司 Heartbeat package control method and apparatus for application program
US9292371B1 (en) * 2013-12-11 2016-03-22 Symantec Corporation Systems and methods for preventing failures of nodes in clusters
WO2016106682A1 (en) * 2014-12-31 2016-07-07 华为技术有限公司 Post-cluster brain split quorum processing method and quorum storage device and system
US20170116084A1 (en) * 2015-10-26 2017-04-27 Beijing Baidu Netcom Science And Technology, Ltd. Method and System for Monitoring Virtual Machine Cluster
CN111510492A (en) * 2020-04-15 2020-08-07 矩阵元技术(深圳)有限公司 Data processing method, device, equipment and system for realizing disconnection reconnection
CN112468361A (en) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 Network connection state monitoring method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination