CN113259188A - Method for constructing large-scale redis cluster

Info

Publication number
CN113259188A
Authority
CN
China
Prior art keywords
node, nodes, management, slave, cluster
Prior art date
2021-07-15
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110798396.9A
Other languages
Chinese (zh)
Inventor
于光杰
刘启铨
曾力耕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Whale Cloud Technology Co Ltd
Original Assignee
Whale Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-07-15
Filing date
2021-07-15
Publication date
2021-08-13
Application filed by Whale Cloud Technology Co Ltd
Priority to CN202110798396.9A
Publication of CN113259188A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06 Management of faults, events, alarms or notifications
    • H04L 41/0677 Localisation of faults
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/14 Session management
    • H04L 67/141 Setup of application sessions

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for constructing a large-scale redis cluster, comprising the following steps: establishing communication connections with other remote dictionary service nodes according to the different attributes of the remote dictionary service nodes; performing information synchronization through the established communication connections and the communication connection module based on the cluster nodes; if a node in the cluster fails, each management node performs a negotiated fault judgment on that node; and if a slave node initiates an election vote, each management node votes to elect a new master node, which then replaces the failed master node. Advantages: the invention greatly reduces the number of network connections that must be established for communication among cluster nodes; it also effectively reduces the network bandwidth occupied by inter-node communication and thus the host resources consumed, so that the cluster can scale to thousands of nodes.

Description

Method for constructing large-scale redis cluster
Technical Field
The invention relates to the field of distributed cache, in particular to a method for constructing a large-scale redis cluster.
Background
The open-source Redis (remote dictionary service) product provides a Redis Cluster mode that adopts a cluster without a central architecture. To keep the cluster state information uniform, the nodes in the cluster must exchange information: every node in the cluster connects to every other node, and the message-exchange mode adopted is called Gossip. Each Redis node periodically exchanges information with other nodes and must complete an information exchange with all nodes within a certain time, so that all nodes finally reach a consistent view of the information.
As shown in fig. 5, because the cluster nodes in this construction are connected pairwise, the number of cluster network connections grows quadratically as the number of nodes increases; and as the node count grows, the network bandwidth occupied by cluster communication also rises sharply, consuming more host resources. This limits the scale at which a cluster can be built, and more than 200 nodes is not recommended.
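For a sense of scale, the following short sketch (illustrative only, not part of the patent) computes the pairwise link count for a few cluster sizes:

```python
# A sketch (not from the patent) of full-mesh link growth: with pairwise
# connections, the link count is n * (n - 1) / 2, i.e. quadratic in n.

def full_mesh_connections(n: int) -> int:
    """Links needed when every node connects to every other node."""
    return n * (n - 1) // 2

for n in (100, 200, 1000):
    print(f"{n:5d} nodes -> {full_mesh_connections(n):7d} connections")
# 100 nodes  ->    4950 connections
# 200 nodes  ->   19900 connections
# 1000 nodes ->  499500 connections
```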
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a method for constructing a large-scale redis cluster, so as to overcome the technical problems in the prior related art.
Therefore, the invention adopts the following specific technical scheme:
a method of constructing a large-scale redis cluster, the method comprising the steps of:
S1, establishing communication connections with other remote dictionary service nodes according to the different attributes of the remote dictionary service nodes;
S2, performing information synchronization through the established communication connections and the communication connection module based on the cluster nodes;
S3, if a node in the cluster fails, each management node performs a negotiated fault judgment on that node;
S4, if a slave node initiates an election vote, each management node votes to elect a new master node, and the new master node replaces the failed master node;
The cluster nodes are divided into management nodes and common nodes; the management nodes connect to and communicate with all other nodes and perform node fault judgment and master-slave failover election, while the common nodes connect to and communicate only with the management nodes and the nodes in a master-slave relationship with them.
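As an illustration of this division of roles, a minimal sketch follows (in Python; the Node type and its role/shard fields are hypothetical names introduced for illustration, not part of the invention):

```python
# A minimal sketch, assuming a hypothetical Node type, of the connection rule:
# management nodes connect to everyone; common nodes connect only to the
# management nodes and to the master/slave peers of their own data shard.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    role: str    # "management" or "common"
    shard: int   # id of the data shard this node serves

def connection_targets(node: Node, cluster: list[Node]) -> list[Node]:
    others = [p for p in cluster if p.node_id != node.node_id]
    if node.role == "management":
        return others  # management node: full connectivity
    return [p for p in others
            if p.role == "management" or p.shard == node.shard]
```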
Further, if a node in the cluster fails in S3, the negotiated fault judgment performed by each management node on that node further includes the following steps:
S31, the management node starts and establishes long-term communication connections with all other nodes;
S32, if the management node fails to establish a connection with a node, it retries periodically until the connection is established;
S33, the common node starts and establishes long-term communication connections with all management nodes and with its own master/slave nodes;
S34, if the common node fails to establish a connection with another node, it retries periodically until the connection is established;
S35, each node performs one heartbeat communication with all connected nodes within a timeout period t1 and records the initiation time of the heartbeat request packet of that communication;
S36, after receiving the heartbeat request packet of another node in step S35, each node checks the state of each node it contains and replies with a heartbeat response packet;
S37, after receiving the heartbeat response packet of another node in step S36, each node resets the initiation time of the heartbeat request packet to 0 and processes the state of each node carried in the packet.
Further, the total number of connections finally established by the management nodes is expressed as:
E(m, n) = (m - 1 + n) × m, where m is the number of management nodes and n is the number of common nodes.
Further, the total number of connections finally established by the common nodes is expressed as:
m × n ≤ F(m, n, r) ≤ (m + r - 1) × n, where m is the number of management nodes, n is the number of common nodes, and r is the number of data copies.
Further, the contents of the heartbeat request packet and the heartbeat response packet each include the node's shard information and the state information of 10% of the nodes.
Further, in S35, each node performs one heartbeat communication with all connected nodes within the timeout period t1 and records the initiation time of the heartbeat request packet; this further includes the following steps:
S351, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within half the timeout period (t1/2), the node disconnects the peer and reestablishes the connection as in steps S31-S34;
S352, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within the timeout period t1, the peer node is marked as being in a suspected fault state.
Further, after receiving the heartbeat request packet of another node in step S35, each node in S36 checks the state of each node contained in the packet and replies with a heartbeat response packet; this further includes the following steps:
S361, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a suspected fault state, adding that node to the suspected-fault report list recorded for it on the receiving node;
S362, if the receiving node is a management node and the number of reports for a node in the suspected fault state exceeds half of the management nodes, changing that node's state from suspected fault to fault, broadcasting a notification to all other surviving nodes, and marking the failed node's state as fault;
S363, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a fault state, marking the corresponding node state recorded on the receiving node as fault;
S364, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a normal state, deleting that node from the suspected-fault report list recorded on the receiving node;
S365, if the receiving node is a common node and its own state is recorded as fault, its state needs to be restored to normal;
after receiving the heartbeat response packet of another node in step S36, each node in S37 resets the initiation time of the heartbeat request packet to 0 and processes the state of each node carried in the packet, with the same processing steps as S361-S365.
Further, if the number of reports of a suspected failed node recorded by a management node is represented as x, the condition for judging the node as failed is:
x > m/2
where m represents the number of management nodes.
Further, in S4, if a slave node initiates an election vote, each management node votes to elect a new master node, and the new master node replacing the failed master node further includes the following steps:
S41, when the state of a slave node's master node is fault, the slave node sends a master-slave failover voting request to all management nodes;
S42, after receiving the failover voting request of step S41, the management node replies with a response packet to the slave node that initiated the vote, provided it has not already voted for that failed master within the voting period 2 × t1;
S43, after receiving a voting response returned by a management node in step S42, the slave node adds 1 to its vote count;
S44, if the vote count exceeds half of the management nodes, the slave node switches itself to master, notifies all other connected nodes by broadcast to complete the master-slave switch of the data shard, and selects one management node to perform a secondary broadcast notification;
S45, after receiving the master-slave switch notification of step S44, each node updates the new master node of the corresponding data shard and updates the original master node to be a slave node; if the node is the management node designated for the secondary broadcast, it resends the master-slave switch notification of the data shard to all other connected nodes;
S46, after the failed node restarts, it obtains the cluster node shard information from other nodes, switches itself to a slave node according to that information, and rejoins the cluster.
Further, if the number of votes in S44 is represented as y, the condition under which the slave node may perform the master-slave switch is:
y > m/2
where m represents the number of management nodes.
The invention has the following beneficial effects:
The method for constructing a large-scale redis cluster divides the cluster nodes into management nodes and common nodes. The management nodes connect to and communicate with all other nodes and hold the rights of node fault judgment and failover voting; the common nodes connect to and communicate only with the management nodes and the nodes in a master-slave relationship with them. Because a small number of management nodes link all cluster nodes together, the number of network connections that must be established for inter-node communication is greatly reduced; and because the large number of common nodes exchange cluster traffic only with the management nodes, the network bandwidth occupied by inter-node communication, and hence the host resources consumed, are effectively reduced, so that a Redis Cluster of thousands of nodes can be supported.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a flow chart of a method of building a large-scale redis cluster according to an embodiment of the invention;
FIG. 2 is a network connection topology diagram of cluster nodes of a method for constructing a large-scale redis cluster according to an embodiment of the present invention;
FIG. 3 is a timing diagram of node failure determination of a method for constructing a large-scale redis cluster according to an embodiment of the present invention;
FIG. 4 is a timing diagram illustrating master-slave failure elections in a method for constructing a large-scale redis cluster according to an embodiment of the present invention;
fig. 5 is a diagram of an existing cluster construction method.
Detailed Description
To further explain the various embodiments, the accompanying drawings form a part of the disclosure and are incorporated in this specification. They illustrate the embodiments and, together with the description, serve to explain their principles of operation and to enable others of ordinary skill in the art to understand the various embodiments and the advantages of the invention. The figures are not to scale, and like reference numerals generally refer to like elements.
According to an embodiment of the invention, a method for constructing a large-scale redis cluster is provided; based on the gossip protocol, a cluster constructed by this method can reach a scale of thousands of nodes.
The invention will now be described with reference to the drawings and the detailed description. As shown in figs. 1-2, a method for constructing a large-scale redis cluster (redis being an open-source key-value caching system in very wide use) according to an embodiment of the present invention includes the following steps:
S1, selectively establishing communication connections with other remote dictionary service (redis) nodes according to the different attributes of those nodes; (cluster communication connection)
S2, synchronizing information among the cluster nodes through the established communication connections and the communication connection module based on the cluster nodes, so that the shard information held by every node in the cluster finally becomes consistent; (cluster shard information synchronization)
S3, if a node in the cluster fails, each management node performs a negotiated fault judgment on that node;
As shown in fig. 3, if a node in the cluster fails in S3, the negotiated fault judgment performed by each management node on that node further includes the following steps:
S31, the management node starts and establishes long-term communication connections with all other nodes;
S32, if the management node fails to establish a connection with a node, it retries periodically until the connection is established; (a management node needs to connect to all other nodes so that it can detect the survival status of each node)
The total number of connections finally established by the management nodes can be expressed as:
E(m, n) = (m - 1 + n) × m, where m is the number of management nodes and n is the number of common nodes.
S33, the common node starts and establishes long-term communication connections with all management nodes and with its own master/slave nodes;
S34, if the common node fails to establish a connection with another node, it retries periodically until the connection is established; (a common node only needs to establish connections with the management nodes and its master/slave nodes, which greatly reduces the cluster's connection count and network bandwidth usage)
The total number of connections finally established by the common nodes can be expressed as:
m × n ≤ F(m, n, r) ≤ (m + r - 1) × n, where m is the number of management nodes, n is the number of common nodes, and r is the number of data copies (i.e. the number of master and slave nodes contained in each data shard).
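As a rough illustration, the following sketch evaluates these expressions; the function names and the example figures (m = 10 management nodes, n = 990 common nodes, r = 2 copies) are assumptions chosen for illustration, not values from the patent:

```python
# A sketch evaluating the patent's connection-count expressions with
# illustrative (assumed) parameters.

def mgmt_connections(m: int, n: int) -> int:
    """E(m, n) = (m - 1 + n) * m: each management node links to the other
    m - 1 management nodes and to all n common nodes."""
    return (m - 1 + n) * m

def common_connection_bounds(m: int, n: int, r: int) -> tuple[int, int]:
    """m*n <= F(m, n, r) <= (m + r - 1)*n: each common node links to all m
    management nodes plus up to r - 1 master/slave peers of its shard."""
    return m * n, (m + r - 1) * n

m, n, r = 10, 990, 2
print(mgmt_connections(m, n))             # (10 - 1 + 990) * 10 = 9990
print(common_connection_bounds(m, n, r))  # (9900, 10890)
# A 1000-node full mesh would instead need 1000 * 999 // 2 = 499500 links.
```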
S35, each node performs one heartbeat communication with all connected nodes within the timeout period t1; the heartbeat request packet contains the node's shard information and the state information of 10% of the nodes, and the initiation time of the heartbeat request packet is recorded; (this applies the idea of the Gossip protocol: a node that wants to share information with the rest of the cluster periodically selects nodes at random and passes the information on)
In S35, each node performs one heartbeat communication with all connected nodes within the timeout period t1 and records the initiation time of the heartbeat request packet; this further includes the following steps:
S351, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within half the timeout period (t1/2), the node disconnects the peer and reestablishes the connection as in steps S31-S34; (reestablishing the connection when no response arrives within t1/2 avoids misjudging a node as abnormal because of poor network quality)
S352, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within the timeout period t1, the peer node is marked as being in a suspected fault (PFAIL) state.
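A minimal sketch of these two timeout rules might look as follows (the Peer class, the reconnect stub, and the value of t1 are assumptions introduced for illustration):

```python
# A minimal sketch of the S35/S351/S352 timeout rules; Peer, reconnect()
# and the value of T1 are illustrative assumptions.
import time

T1 = 15.0  # timeout period t1 in seconds (illustrative value)

class Peer:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.state = "NORMAL"         # NORMAL / PFAIL / FAIL
        self.heartbeat_sent_at = 0.0  # 0 means no heartbeat in flight (S37)

def reconnect(peer: Peer) -> None:
    """Stub: tear down and re-establish the connection (steps S31-S34)."""

def check_heartbeat_timeouts(peers: list[Peer]) -> None:
    now = time.monotonic()
    for peer in peers:
        if peer.heartbeat_sent_at == 0.0:
            continue  # response already received; nothing pending
        elapsed = now - peer.heartbeat_sent_at
        if elapsed > T1:
            peer.state = "PFAIL"  # S352: mark as suspected failure
        elif elapsed > T1 / 2:
            reconnect(peer)       # S351: reconnect at half the timeout
```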
S36, after receiving the heartbeat request packet of another node in step S35, each node checks the state of each node it contains and replies with a heartbeat response packet, whose content likewise includes the node's shard information and the state information of 10% of the nodes; (as in step S35, applying the idea of the Gossip protocol)
After receiving the heartbeat request packet of another node in step S35, each node in S36 checks the state of each node contained in the packet and replies with a heartbeat response packet; this further includes the following steps:
S361, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a suspected fault (PFAIL) state, adding that node to the suspected-fault report list recorded for it on the receiving node;
S362, if the receiving node is a management node and the number of reports for a node in the suspected fault state exceeds half of the management nodes, changing that node's state from suspected fault (PFAIL) to fault (FAIL), broadcasting a notification to all other surviving nodes, and marking the failed node's state as fault (FAIL);
If the number of reports of a suspected failed node recorded by a management node is represented as x, the condition for judging the node as failed (FAIL) is:
x > m/2
where m represents the number of management nodes.
S363, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a fault (FAIL) state, marking the corresponding node state recorded on the receiving node as fault (FAIL);
S364, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a normal state, deleting that node from the suspected-fault report list recorded on the receiving node;
S365, if the receiving node is a common node and its own state is recorded as fault (FAIL), its state needs to be restored to normal;
S37, after receiving the heartbeat response packet of another node in step S36, each node resets the initiation time of the heartbeat request packet to 0 and processes the state of each node carried in the packet, with the same processing steps as S361-S365. (resetting the request-packet initiation time to 0 indicates that detection is complete and no timeout judgment is pending, so heartbeat detection can run again in the next cycle)
S4, if a slave node initiates an election vote, each management node votes to elect a new master node, and the new master node replaces the failed master node; (steps S3 and S4 together implement failure election for the cluster nodes)
As shown in fig. 4, if a slave node initiates an election vote in S4, each management node votes to elect a new master node, and replacing the failed master node with the new one further includes the following steps:
S41, when the state of a slave node's master node is fault (FAIL), the slave node sends a master-slave failover voting request to all management nodes;
S42, after receiving the failover voting request of step S41, the management node replies with a response packet to the slave node that initiated the vote, provided it has not already voted for that failed master within the voting period 2 × t1; (within each voting period, a management node may respond only once to the failover votes for a given master node, which ensures that at most one slave node can obtain enough votes)
S43, after receiving a voting response returned by a management node in step S42, the slave node adds 1 to its vote count;
S44, if the vote count exceeds half of the management nodes, the slave node switches itself to master, notifies all other connected nodes by broadcast to complete the master-slave switch of the data shard, and selects one management node to perform a secondary broadcast notification; (the master-slave switch may only proceed when more than half of the management nodes have voted for it, so only one slave node will become the new master)
If the number of votes is represented as y, the condition under which the slave node may perform the master-slave switch is:
y > m/2
where m represents the number of management nodes.
S45, after receiving the master-slave switch notification of step S44, each node updates the new master node of the corresponding data shard and updates the original master node to be a slave node; if the node is the management node designated for the secondary broadcast, it resends the master-slave switch notification of the data shard to all other connected nodes; (since a management node is connected to every other node in the cluster, performing the secondary broadcast through a management node synchronizes the master-slave switch information to all nodes faster than Gossip propagation would)
S46, after the failed node restarts, it obtains the cluster node shard information from other nodes, switches itself to a slave node according to that information, and rejoins the cluster.
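Both sides of the voting exchange in S41-S44 might be sketched as follows (all names and the vote bookkeeping structure are illustrative assumptions; only the 2 × t1 voting period and the y > m/2 majority rule come from the patent):

```python
# A minimal sketch of the failover vote (S41-S44) with assumed names.
import time

M = 10     # management node count (illustrative)
T1 = 15.0  # heartbeat timeout t1 (illustrative)

# --- management-node side (S42): one vote per failed master per 2*t1 ---
last_vote_at: dict[str, float] = {}  # failed master id -> time of last vote

def grant_vote(failed_master_id: str) -> bool:
    now = time.monotonic()
    last = last_vote_at.get(failed_master_id)
    if last is not None and now - last < 2 * T1:
        return False  # already voted for this master in the current period
    last_vote_at[failed_master_id] = now
    return True

# --- slave side (S43/S44): count responses, switch on a strict majority ---
def may_switch_to_master(votes_granted: int) -> bool:
    """True when the slave may promote itself (condition y > m/2)."""
    return votes_granted > M / 2
```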
In summary, in the method for constructing a large-scale redis cluster of the present invention, the cluster nodes are divided into management nodes and common nodes. The management nodes connect to and communicate with all other nodes and hold the rights of node fault judgment and failover voting; the common nodes connect to and communicate only with the management nodes and the nodes in a master-slave relationship with them. Because a small number of management nodes link all cluster nodes together, the number of network connections that must be established for inter-node communication is greatly reduced; and because the large number of common nodes exchange cluster traffic only with the management nodes, the network bandwidth occupied by inter-node communication, and hence the host resources consumed, are effectively reduced, so that a Redis Cluster of thousands of nodes can be supported.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method of constructing a large-scale redis cluster, the method comprising the steps of:
S1, establishing communication connections with other remote dictionary service nodes according to the different attributes of the remote dictionary service nodes;
S2, performing information synchronization through the established communication connections and the communication connection module based on the cluster nodes;
S3, if a node in the cluster fails, each management node performs a negotiated fault judgment on that node;
S4, if a slave node initiates an election vote, each management node votes to elect a new master node, and the new master node replaces the failed master node;
wherein the cluster nodes are divided into management nodes and common nodes; the management nodes connect to and communicate with all other nodes and perform node fault judgment and master-slave failover election, while the common nodes connect to and communicate only with the management nodes and the nodes in a master-slave relationship with them.
2. The method for constructing a large-scale redis cluster according to claim 1, wherein if a node in the cluster fails in S3, the negotiated fault judgment performed by each management node on that node further comprises the following steps:
S31, the management node starts and establishes long-term communication connections with all other nodes;
S32, if the management node fails to establish a connection with a node, it retries periodically until the connection is established;
S33, the common node starts and establishes long-term communication connections with all management nodes and with its own master/slave nodes;
S34, if the common node fails to establish a connection with another node, it retries periodically until the connection is established;
S35, each node performs one heartbeat communication with all connected nodes within a timeout period t1 and records the initiation time of the heartbeat request packet of that communication;
S36, after receiving the heartbeat request packet of another node in step S35, each node checks the state of each node it contains and replies with a heartbeat response packet;
S37, after receiving the heartbeat response packet of another node in step S36, each node resets the initiation time of the heartbeat request packet to 0 and processes the state of each node carried in the packet.
3. The method for constructing a large-scale redis cluster according to claim 2, wherein the total number of connections finally established by the management nodes is expressed as:
E(m, n) = (m - 1 + n) × m, where m is the number of management nodes and n is the number of common nodes.
4. The method for constructing a large-scale redis cluster according to claim 3, wherein the total number of connections finally established by the common nodes is expressed as:
m × n ≤ F(m, n, r) ≤ (m + r - 1) × n,
where m is the number of management nodes, n is the number of common nodes, and r is the number of data copies.
5. The method for constructing a large-scale redis cluster according to claim 4, wherein the contents of the heartbeat request packet and the heartbeat response packet each include the node's shard information and the state information of 10% of the nodes.
6. The method for constructing a large-scale redis cluster according to claim 5, wherein in S35 each node performs one heartbeat communication with all connected nodes within the timeout period t1 and records the initiation time of the heartbeat request packet, further comprising the following steps:
S351, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within half the timeout period (t1/2), the node disconnects the peer and reestablishes the connection as in steps S31 to S34;
S352, if a node's heartbeat request packet from step S35 receives no heartbeat response packet of step S36 within the timeout period t1, the peer node is marked as being in a suspected fault state.
7. The method for constructing a large-scale redis cluster according to claim 6, wherein after receiving the heartbeat request packet of another node in step S35, each node in S36 checks the state of each node contained in the packet and replies with a heartbeat response packet, further comprising the following steps:
S361, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a suspected fault state, adding that node to the suspected-fault report list recorded for it on the receiving node;
S362, if the receiving node is a management node and the number of reports for a node in the suspected fault state exceeds half of the management nodes, changing that node's state from suspected fault to fault, broadcasting a notification to all other surviving nodes, and marking the failed node's state as fault;
S363, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a fault state, marking the corresponding node state recorded on the receiving node as fault;
S364, if the sending node is a management node and the heartbeat request packet contains a node marked as being in a normal state, deleting that node from the suspected-fault report list recorded on the receiving node;
S365, if the receiving node is a common node and its own state is recorded as fault, its state needs to be restored to normal;
wherein after receiving the heartbeat response packet of another node in step S36, each node in S37 resets the initiation time of the heartbeat request packet to 0 and processes the state of each node carried in the packet, with the same processing steps as S361-S365.
8. The method for constructing a large-scale redis cluster according to claim 7, wherein if the number of reports of a suspected failed node recorded by the management node is represented as x, the condition for judging the node as failed is:
x > m/2
where m represents the number of management nodes.
9. The method for constructing a large-scale redis cluster according to claim 1, wherein in S4, if a slave node initiates an election vote, each management node votes to elect a new master node, and the new master node replacing the failed master node further comprises:
S41, when the state of a slave node's master node is fault, the slave node sends a master-slave failover voting request to all management nodes;
S42, after receiving the failover voting request of step S41, the management node replies with a response packet to the slave node that initiated the vote, provided it has not already voted for that failed master within the voting period 2 × t1;
S43, after receiving a voting response returned by a management node in step S42, the slave node adds 1 to its vote count;
S44, if the vote count exceeds half of the management nodes, the slave node switches itself to master, notifies all other connected nodes by broadcast to complete the master-slave switch of the data shard, and selects one management node to perform a secondary broadcast notification;
S45, after receiving the master-slave switch notification of step S44, each node updates the new master node of the corresponding data shard and updates the original master node to be a slave node; if the node is the management node designated for the secondary broadcast, it resends the master-slave switch notification of the data shard to all other connected nodes;
S46, after the failed node restarts, it obtains the cluster node shard information from other nodes, switches itself to a slave node according to that information, and rejoins the cluster.
10. The method for constructing a large-scale redis cluster according to claim 9, wherein if the number of votes in S44 is represented as y, the condition under which the slave node may perform the master-slave switch is:
y > m/2
where m represents the number of management nodes.
CN202110798396.9A 2021-07-15 2021-07-15 Method for constructing large-scale redis cluster Pending CN113259188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110798396.9A CN113259188A (en) 2021-07-15 2021-07-15 Method for constructing large-scale redis cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110798396.9A CN113259188A (en) 2021-07-15 2021-07-15 Method for constructing large-scale redis cluster

Publications (1)

Publication Number Publication Date
CN113259188A (en) 2021-08-13

Family

ID=77180380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110798396.9A Pending CN113259188A (en) 2021-07-15 2021-07-15 Method for constructing large-scale redis cluster

Country Status (1)

Country Link
CN (1) CN113259188A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3422668A4 (en) * 2016-05-16 2019-09-25 Bai, Yang Bai yang messaging port switch service
CN106210151A (en) * 2016-09-27 2016-12-07 深圳市彬讯科技有限公司 A kind of zedis distributed caching and server cluster monitoring method
CN106656624A (en) * 2017-01-04 2017-05-10 合肥康捷信息科技有限公司 Optimization method based on Gossip communication protocol and Raft election algorithm
CN111506421A (en) * 2020-04-02 2020-08-07 浙江工业大学 Availability method for realizing Redis cluster

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113783735A (en) * 2021-09-24 2021-12-10 小红书科技有限公司 Method, device, equipment and medium for identifying fault node in Redis cluster
CN114363357A (en) * 2021-12-28 2022-04-15 山东浪潮科学研究院有限公司 Distributed database network connection management method based on Gossip
CN114363357B (en) * 2021-12-28 2024-01-19 上海沄熹科技有限公司 Distributed database network connection management method based on Gossip
CN114666202A (en) * 2022-03-18 2022-06-24 中国建设银行股份有限公司 Monitoring method and device for master-slave switching based on cloud database
CN114666202B (en) * 2022-03-18 2024-04-26 中国建设银行股份有限公司 Monitoring method and device for master-slave switching based on cloud database


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813