CN110333986B - Method for guaranteeing availability of redis cluster - Google Patents
- Publication number
- CN110333986B (application number CN201910530849.2A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- redis
- node
- master
- redis cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/563—Data redirection of data network streams
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
Abstract
The invention relates to the technical field of redis clusters, and in particular to a method for guaranteeing the availability of a redis cluster. The method comprises: saving the node information of the redis cluster and deleting the downed master-slave nodes from the redis cluster; re-adding the dump.rdb files and redis instances of the downed master node to the redis cluster, configuring the weight of the re-added redis instances to 0, and configuring the other nodes according to their set weights; and deleting the re-added redis instances and cleaning up their rdb files. Compared with the prior art, the invention has the following advantages. It provides a cluster migration method, so that when a master node and its slave nodes in a redis cluster go down, the whole cluster can continue to provide storage service externally, which resolves the unavailability of the redis cluster when such a fault occurs. Preferably, it also provides a cluster monitoring method, which can monitor the availability and the node information of the redis nodes, and a cluster returning method, which restores the redis cluster once the master and slave nodes return from the down state to a normal state.
Description
Technical Field
The invention relates to the technical field of redis clusters, in particular to a method for guaranteeing the availability of a redis cluster.
Background
Redis introduced cluster functionality in version 3.0, and by version 3.2 the cluster functionality had stabilized. A Redis cluster is a set of programs that shares data among multiple Redis nodes. Its main features are data partitioning and a master-slave mode. Data partitioning introduces hash slots: the slots are divided among different nodes and each piece of data is assigned to a slot, so that the data is spread across different nodes, the data pressure on any single node is reduced, and the cluster scales better. In master-slave mode, when a master node goes down, one of its slave nodes is elected as the new master node and continues to provide service, which improves the availability of the cluster.
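The slot assignment described above can be sketched in Python. The constants are not stated in this document; they come from the Redis Cluster specification, which defines 16384 slots and maps a key to a slot as CRC16 (XMODEM variant) modulo 16384. The function name `key_slot` is illustrative, and hash tags (`{...}`) are deliberately ignored to keep the sketch short.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in the Redis Cluster specification


def key_slot(key: str) -> int:
    """Return the hash slot for a key (hash tags are not handled in this sketch)."""
    # binascii.crc_hqx with an initial value of 0 computes CRC16-CCITT (XMODEM),
    # which is the CRC16 variant Redis Cluster uses.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


if __name__ == "__main__":
    # Illustrative keys only; each one lands in exactly one slot,
    # and each slot belongs to exactly one master node.
    for key in ("user:1", "user:2", "order:42"):
        print(key, "->", key_slot(key))
```

Because the slot depends only on the key, every client and node computes the same placement without any central lookup table.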
However, although a redis cluster contains many groups of master-slave nodes, if all the nodes of any one group go down, the whole cluster goes down and cannot provide service externally, which is unacceptable for a cluster. The cause lies in data partitioning: if a group of master-slave nodes goes down, the slots held by that group are no longer served, and the data stored in those slots can be neither written nor read. Therefore, there is a need to devise a method for guaranteeing the availability of redis clusters.
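The failure mode described above can be illustrated with a short Python sketch. It assumes the 16384 slots of the Redis Cluster specification and a hypothetical layout in which three master-slave groups each own one contiguous slot range; the group names and ranges are invented for illustration. With the default `cluster-require-full-coverage yes` setting, a cluster with any uncovered slot stops accepting queries.

```python
SLOT_COUNT = 16384  # from the Redis Cluster specification, not from this document


def uncovered_slots(groups, down_groups):
    """Count the slots that no surviving master-slave group serves.

    groups: dict mapping group name -> (first_slot, last_slot), inclusive.
    down_groups: set of group names whose master and all slaves are down.
    """
    covered = set()
    for name, (first, last) in groups.items():
        if name not in down_groups:
            covered.update(range(first, last + 1))
    return SLOT_COUNT - len(covered)


# Hypothetical three-group layout.
layout = {"A": (0, 5460), "B": (5461, 10922), "C": (10923, 16383)}

if __name__ == "__main__":
    print(uncovered_slots(layout, set()))   # every group is up
    print(uncovered_slots(layout, {"B"}))   # group B is entirely down
```

Any non-zero result means part of the key space has no owner, which is the condition the migration method is meant to repair.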
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for guaranteeing the availability of redis clusters in which the service provided by the cluster is not interrupted during cluster migration.
To achieve this object, a method for guaranteeing the availability of redis clusters is designed. The method comprises a cluster migration method, which specifically comprises the following steps: step a, saving the node information of the redis cluster and deleting the downed master-slave nodes from the redis cluster; step b, re-adding the dump.rdb files and redis instances of the downed master node to the redis cluster and configuring the weight of the re-added redis instances to 0, while the other nodes are still configured according to their set weights; step c, deleting the re-added redis instances and cleaning up the rdb files of the deleted redis instances.
The method uses SNMP and CTDB to form a cluster monitoring system that monitors the cluster, as follows: an SNMP client is deployed on each node of the redis cluster to collect the important parameter information and the cluster node information (cluster nodes) of the redis cluster and send them to the SNMP server; the SNMP server synchronizes this information into the redis cluster; and the SNMP server is deployed in the redis cluster with high availability through CTDB.
When the cluster state is unavailable, if more than half of the nodes of the redis cluster are down, or fewer than 3 master nodes remain in the redis cluster, or the redis cluster has only three master nodes, the cluster state does not satisfy the start-up condition of a redis cluster. The cluster has failed and cannot provide service, so only alarm information is sent and steps a to c are not performed. Otherwise, steps a to c are performed.
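The precondition above can be written as a small predicate. This is a minimal sketch, assuming the "more than half" test is applied to master nodes, as the detailed description does; the function and parameter names are hypothetical.

```python
def can_migrate(total_masters: int, down_masters: int) -> bool:
    """Return True when steps a to c should run, False when only an alarm is sent."""
    remaining = total_masters - down_masters
    if down_masters * 2 > total_masters:  # more than half of the masters are down
        return False
    if remaining < 3:                     # fewer than 3 masters remain
        return False
    if total_masters == 3:                # the cluster has only three masters
        return False
    return True


if __name__ == "__main__":
    print(can_migrate(6, 2))  # 6 masters, 2 down
    print(can_migrate(3, 1))  # a three-master cluster
```

Keeping the three conditions as separate checks makes it easy to report which one triggered the alarm.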
The method also comprises a cluster returning method, which specifically comprises the following steps: step d, adding the master-slave nodes back into the redis cluster and configuring the master-slave mode; step e, calling the redis-trib.rb script provided with the redis cluster, executing the rebalance command with the --use-empty-masters parameter, and configuring weights to balance the slot allocation in the redis cluster.
The method continuously sends test packets to the downed master node, and when the downed master node has restarted and at least one of its slave nodes has started, the cluster returning method is carried out.
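One way to picture the test-packet check is a plain TCP probe that sends the Redis `PING` command. The sketch below is an assumption about how such a probe might look, not the monitoring system of this document: it assumes the instance requires no authentication, and the names `redis_responds` and `ready_to_return` are hypothetical.

```python
import socket


def redis_responds(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if the instance at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\r\n")  # Redis accepts this inline command form
            return conn.recv(64).startswith(b"+PONG")
    except OSError:  # refused, unreachable, or timed out
        return False


def ready_to_return(master, slaves, probe=redis_responds) -> bool:
    """master and slaves are (host, port) pairs supplied by the caller.

    The returning stage starts only when the master answers and
    at least one of its slaves answers as well.
    """
    return probe(*master) and any(probe(*slave) for slave in slaves)
```

A monitoring loop would call `ready_to_return` at a fixed interval; the `probe` parameter exists only so the decision logic can be exercised without a live instance.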
Compared with the prior art, the invention has the following advantages. It provides a cluster migration method, so that when a master node and its slave nodes in a redis cluster go down, the whole cluster can continue to provide storage service externally, which resolves the unavailability of the redis cluster when such a fault occurs. Preferably, it also provides a cluster monitoring method, which can monitor the availability and the node information of the redis nodes, and a cluster returning method, which restores the redis cluster once the master and slave nodes return from the down state to a normal state.
Drawings
FIG. 1 is an overall schematic diagram of the method of the present invention in one embodiment.
FIG. 2 is a flow chart of the cluster migration method according to an embodiment of the present invention.
FIG. 3 is a flow chart of the cluster returning method according to an embodiment of the present invention.
Detailed Description
The principles of this method will be apparent to those skilled in the art from the following description of the invention. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the present embodiment includes three major parts: cluster monitoring, cluster migration, and cluster back migration (returning). First, the cluster monitoring method of the present embodiment is described. A distributed monitoring system is built from SNMP (Simple Network Management Protocol) and CTDB (the cluster database component of clustered Samba, which can provide a highly available, load-sharing CIFS server cluster). The specific steps of the monitoring method are as follows:
1. The SNMP client Agent is deployed on each node and collects information, obtaining the redis cluster information with commands provided by the redis cluster itself. In this embodiment, the following 2 commands are run at a set time interval to obtain two types of redis cluster information, which is kept as a backup so that data can later be restored to its pre-downtime state. The first type is the important parameter information of the cluster, and the second type is the cluster node information. Both types are used for monitoring: the information of every node is monitored to determine whether the state of the whole cluster is normal. The cluster node information is also used for migration, and the information of each node is saved to prevent loss.
1> The important parameter information of the cluster is obtained through cluster info, including: the state of the current cluster (cluster_state; ok is normal, fail is abnormal), the number of master nodes of the current cluster, and the size of the cluster (cluster_size and cluster_known_nodes).
2> The cluster node information is obtained through cluster nodes, including the id, ip, port, master-slave role, connection state, slots, and other information of each node of the redis cluster.
The SNMP Server program receives the information collected by each Agent, analyzes it to determine the actual state of the cluster, and then synchronizes the information into the redis cluster.
2. Meanwhile, CTDB is used to deploy the SNMP servers with high availability in the distributed redis cluster. CTDB is a TDB database that spans multiple nodes with consistent data and consistent locks; when the server on one node goes down, service is automatically switched to another node managed by CTDB, which ensures the availability of the monitoring system. Preferably, the data can also be presented through a graphical interface.
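The two kinds of information collected by the Agent could be parsed as sketched below. The parsers follow the documented text formats of `CLUSTER INFO` (one `key:value` per line) and `CLUSTER NODES` (space-separated fields, with slots from the ninth field on). The sample strings, node ids, and addresses are hypothetical and shortened for readability, and the function names are illustrative.

```python
def parse_cluster_info(text: str) -> dict:
    """Parse the 'key:value' lines returned by CLUSTER INFO."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.strip().split(":", 1)
            info[key] = value
    return info


def parse_cluster_nodes(text: str) -> list:
    """Parse CLUSTER NODES output into one dict per node."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        address = fields[1].split("@")[0]  # drop the cluster bus port, if present
        host, _, port = address.rpartition(":")
        flags = fields[2].split(",")
        nodes.append({
            "id": fields[0],
            "host": host,
            "port": int(port),
            "role": "master" if "master" in flags else "slave",
            "failed": "fail" in flags,
            "master_id": None if fields[3] == "-" else fields[3],
            "link_state": fields[7],
            "slots": fields[8:],
        })
    return nodes


# Hypothetical sample output, not taken from a real cluster.
SAMPLE_INFO = "cluster_state:fail\ncluster_known_nodes:6\ncluster_size:3\n"
SAMPLE_NODES = (
    "aaa1 10.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460\n"
    "bbb2 10.0.0.2:7000@17000 master,fail - 1560000000000 1560000000000 2 disconnected 5461-10922\n"
    "ccc3 10.0.0.3:7000@17000 slave aaa1 0 1560000000000 1 connected\n"
)

if __name__ == "__main__":
    print(parse_cluster_info(SAMPLE_INFO)["cluster_state"])
    for node in parse_cluster_nodes(SAMPLE_NODES):
        print(node["id"], node["role"], node["failed"], node["slots"])
```

Storing the parsed node list at each interval gives the migration stage the ids, roles, and slot ranges it needs after a group has gone down.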
Next, referring to fig. 2, the cluster migration method of the present embodiment is described. When the cluster monitoring system finds that the cluster state is fail (unavailable) and that a group of master-slave nodes is down, it enters the fault-state cluster migration stage.
1> First, it is determined whether more than half of the master nodes of the cluster are down, whether the cluster has only three master nodes, or whether fewer than three master nodes remain. If any of these holds, the procedure returns directly: only an alarm mail is sent and no subsequent processing is carried out.
2> The information of the redis cluster before migration is saved, and the redis-trib.rb script is called to delete the downed master-slave nodes from the redis cluster.
3> The redis instances are started on each node, and they automatically restore the data to its pre-downtime state from the dump.rdb files (the redis database files that the redis cluster persists to disk). Finally, the redis-trib.rb script is called to add these instances to the redis cluster.
4> The redis-trib.rb script provided with the redis cluster is called to execute the rebalance command, with the weight of the redis instances started in 3> configured to 0; the other nodes balance the slot allocation in the redis cluster according to the weights in the configuration file.
5> The redis-trib.rb script provided with the redis cluster is called to delete the redis instances started in 3>; these instances are then shut down and their rdb files cleaned up, which completes the cluster migration.
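Steps 2> to 5> above map onto the `del-node`, `add-node`, and `rebalance` subcommands of redis-trib.rb. The sketch below only assembles the argument lists and does not run them. The addresses and node ids are hypothetical; the node id of a restored instance is known only after it has joined, so a real driver would have to execute the commands one at a time and query the ids in between. Whether `del-node` succeeds against an unreachable node is not addressed here.

```python
def migration_commands(entry, downed_ids, restored, trib="redis-trib.rb"):
    """Build the redis-trib.rb invocations for steps 2> to 5> as argument lists.

    entry: "host:port" of a healthy node used to reach the cluster.
    downed_ids: node ids of the downed master-slave nodes.
    restored: dict mapping the "host:port" of each instance restarted from
              dump.rdb to the node id it reports after joining (assumed known).
    """
    cmds = []
    # 2> delete the downed master-slave nodes
    cmds += [[trib, "del-node", entry, node_id] for node_id in downed_ids]
    # 3> add the instances restored from dump.rdb
    cmds += [[trib, "add-node", addr, entry] for addr in restored]
    # 4> rebalance with weight 0 for the restored instances so their slots drain;
    #    options are placed before host:port
    rebalance = [trib, "rebalance"]
    for node_id in restored.values():
        rebalance += ["--weight", f"{node_id}=0"]
    cmds.append(rebalance + [entry])
    # 5> delete the restored instances, which no longer hold slots
    cmds += [[trib, "del-node", entry, node_id] for node_id in restored.values()]
    return cmds


if __name__ == "__main__":
    for cmd in migration_commands("10.0.0.1:7000", ["bbb2"], {"10.0.0.9:7100": "ddd4"}):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues with node ids and addresses.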
Finally, referring to fig. 3, the cluster returning (back migration) method of the present embodiment is described. The monitoring system continuously sends heartbeats (test packets) to the downed master node. When it finds that the redis master node has restarted normally and that at least one of its slave nodes has started, the cluster returning stage is entered. The specific steps are as follows.
1> First, the master and slave redis nodes are added to the cluster, and the master-slave mode is configured as recorded in the saved configuration.
2> The redis-trib.rb script provided with the redis cluster is called to execute the rebalance command with the --use-empty-masters parameter (so that the re-added masters, which hold no slots yet, take part in the rebalance) and the configured weights, balancing the slot allocation in the redis cluster, which completes the cluster back migration.
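The two returning steps can likewise be expressed as redis-trib.rb argument lists. This is a sketch under assumptions: the addresses, node ids, and weights are hypothetical, the master's node id is assumed to be known once it has joined, and nothing is executed.

```python
def return_commands(entry, master_addr, master_id, slave_addrs, weights,
                    trib="redis-trib.rb"):
    """Build the redis-trib.rb invocations for the cluster returning stage.

    entry: "host:port" of a node already in the cluster.
    master_addr: "host:port" of the recovered master.
    master_id: node id the recovered master reports after joining.
    slave_addrs: list of "host:port" for its recovered slaves.
    weights: dict mapping node id -> rebalance weight.
    """
    # 1> add the master, then attach each slave to it
    cmds = [[trib, "add-node", master_addr, entry]]
    for addr in slave_addrs:
        cmds.append([trib, "add-node", "--slave", "--master-id", master_id,
                     addr, entry])
    # 2> rebalance, letting the slot-less master receive its share
    rebalance = [trib, "rebalance", "--use-empty-masters"]
    for node_id, weight in weights.items():
        rebalance += ["--weight", f"{node_id}={weight}"]
    cmds.append(rebalance + [entry])
    return cmds


if __name__ == "__main__":
    for cmd in return_commands("10.0.0.1:7000", "10.0.0.2:7000", "bbb2",
                               ["10.0.0.5:7000"], {"aaa1": 1, "bbb2": 1}):
        print(" ".join(cmd))
```

Without `--use-empty-masters`, the rebalance would skip the newly added master because it owns no slots, so the cluster would stay in its migrated layout.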
Example 1
The specific steps of this embodiment are as follows. The redis cluster of this embodiment has 6 master nodes; the embodiment guarantees the availability of the redis cluster and preserves the data the cluster held before the downtime.
1> It is determined whether more than three master nodes of the cluster are down, whether the cluster has only three master nodes, or whether fewer than three master nodes remain. If so, the cluster is unavailable: the procedure returns directly, only an alarm mail is sent, and no subsequent processing is carried out.
2> In this embodiment, two master nodes of the redis cluster are down. The redis cluster information before migration is saved, and the redis-trib.rb script is called to delete the downed master-slave nodes from the redis cluster.
2.1> In the first step, redis instances are started from the dump.rdb files of the 2 downed master nodes, and the add-node instruction of the redis-trib.rb script is called to add these instances to the redis cluster again.
2.2> In the second step, the rebalance instruction of the redis-trib.rb script is called, with the weight of the 2 redis instances started in 2.1> configured to 0; the other nodes balance the slot allocation in the redis cluster according to the weights in the configuration file.
2.3> In the third step, the del-node instruction of the redis-trib.rb script is called to delete the redis instances started in 2.1>; these instances are then shut down and their rdb files cleaned up, which completes the cluster migration.
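The effect of the weights in this example can be checked with simple proportional arithmetic. The sketch below is not the script's actual rebalancing algorithm; it only shows the target slot counts implied by integer weights, assuming the 16384 slots of the Redis Cluster specification and equal weights of 1 for the four surviving masters. The node names are hypothetical.

```python
SLOT_COUNT = 16384  # from the Redis Cluster specification, not from this document


def slot_targets(weights):
    """Split SLOT_COUNT slots among nodes in proportion to integer weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one node needs a positive weight")
    targets = {node: SLOT_COUNT * w // total for node, w in weights.items()}
    # Hand out the slots lost to integer division, one each, to weighted nodes.
    leftover = SLOT_COUNT - sum(targets.values())
    for node in sorted(n for n, w in weights.items() if w > 0)[:leftover]:
        targets[node] += 1
    return targets


# 6 masters originally, 2 down: four survivors at weight 1,
# two restored instances at weight 0.
example = {"m1": 1, "m2": 1, "m3": 1, "m4": 1, "r1": 0, "r2": 0}

if __name__ == "__main__":
    print(slot_targets(example))
```

With these weights each surviving master ends up with 4096 slots and the two restored instances with none, which is why they can be deleted afterwards without leaving slots uncovered.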
Claims (4)
1. A method for guaranteeing the availability of redis clusters, characterized by comprising a cluster migration method and a cluster returning method, wherein the cluster migration method comprises the following steps:
step a, saving the node information of the redis cluster and deleting the downed master-slave nodes from the redis cluster;
step b, re-adding the dump.rdb files and redis instances of the downed master node to the redis cluster and configuring the weight of the re-added redis instances to 0, while the other nodes are still configured according to their set weights;
step c, deleting the re-added redis instances and cleaning up the rdb files of the deleted redis instances;
the cluster returning method specifically comprises the following steps:
step d, adding the master-slave nodes into the redis cluster and configuring the master-slave mode;
step e, calling the redis-trib.rb script provided with the redis cluster, executing the rebalance command with the --use-empty-masters parameter, and configuring weights to balance the slot allocation in the redis cluster.
2. The method for guaranteeing the availability of redis clusters according to claim 1, wherein the method uses SNMP and CTDB to form a cluster monitoring system that monitors the cluster, as follows:
an SNMP client is deployed on each node of the redis cluster to collect the important parameter information and the cluster node information of the redis cluster and send them to the SNMP server; the SNMP server synchronizes this information into the redis cluster; and the SNMP server is deployed in the redis cluster with high availability through CTDB.
3. The method for guaranteeing the availability of redis clusters according to claim 1 or 2, wherein when the cluster state is unavailable, if more than half of the nodes of the redis cluster are down, or fewer than 3 master nodes remain in the redis cluster, or the redis cluster has only three master nodes, an alarm message is sent and steps a to c are not performed; otherwise, steps a to c are performed.
4. The method for guaranteeing the availability of redis clusters according to claim 1, wherein test packets are continuously sent to the downed master node, and the cluster returning method is carried out when the downed master node has restarted and at least one slave node of that master node has started.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910530849.2A CN110333986B (en) | 2019-06-19 | 2019-06-19 | Method for guaranteeing availability of redis cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910530849.2A CN110333986B (en) | 2019-06-19 | 2019-06-19 | Method for guaranteeing availability of redis cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110333986A CN110333986A (en) | 2019-10-15 |
CN110333986B CN110333986B (en) | 2023-12-29 |
Family
ID=68142501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910530849.2A Active CN110333986B (en) | 2019-06-19 | 2019-06-19 | Method for guaranteeing availability of redis cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110333986B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111212145A (en) * | 2020-01-09 | 2020-05-29 | 国网福建省电力有限公司 | Redis cluster for power supply service command system |
CN111565229B (en) * | 2020-04-29 | 2020-11-27 | 创盛视联数码科技(北京)有限公司 | Communication system distributed method based on Redis |
CN112000515A (en) * | 2020-08-07 | 2020-11-27 | 北京浪潮数据技术有限公司 | Method and assembly for recovering instance data in redis cluster |
CN112035064B (en) * | 2020-08-28 | 2024-09-20 | 浪潮云信息技术股份公司 | Distributed migration method for object storage |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6986076B1 (en) * | 2002-05-28 | 2006-01-10 | Unisys Corporation | Proactive method for ensuring availability in a clustered system |
KR20090061522A (en) * | 2007-12-11 | 2009-06-16 | 한국전자통신연구원 | Large scale cluster monitoring system, and automatic building and restoration method thereof |
CN102231681A (en) * | 2011-06-27 | 2011-11-02 | 中国建设银行股份有限公司 | High availability cluster computer system and fault treatment method thereof |
CN105141456A (en) * | 2015-08-25 | 2015-12-09 | 山东超越数控电子有限公司 | Method for monitoring high-availability cluster resource |
CN106301938A (en) * | 2016-08-25 | 2017-01-04 | 成都索贝数码科技股份有限公司 | A kind of high availability and the data base cluster system of strong consistency and node administration method thereof |
CN108833503A (en) * | 2018-05-29 | 2018-11-16 | 华南理工大学 | A kind of Redis cluster method based on ZooKeeper |
CN109783564A (en) * | 2019-01-28 | 2019-05-21 | 上海雷腾软件股份有限公司 | Support the distributed caching method and equipment of multinode |
CN109815049A (en) * | 2017-11-21 | 2019-05-28 | 北京金山云网络技术有限公司 | Node delay machine restoration methods, device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9063939B2 (en) * | 2011-11-03 | 2015-06-23 | Zettaset, Inc. | Distributed storage medium management for heterogeneous storage media in high availability clusters |
US9338254B2 (en) * | 2013-01-09 | 2016-05-10 | Microsoft Corporation | Service migration across cluster boundaries |
Non-Patent Citations (3)
Title |
---|
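A new formalism for dynamic reconfiguration of data servers in a cluster; María S. Pérez, Alberto Sánchez, José M. Peña, Víctor Robles; Journal of Parallel and Distributed Computing; Vol. 65 (No. 10); full text * |
Research and optimization of Redis cluster reliability; 李燚; 顾乃杰; 黄增士; 任开新; Computer Engineering (No. 05); full text * |
An improved master-slave node election algorithm for cluster load balancing; 任乐乐; 何灵敏; Journal of China Jiliang University (No. 03); full text * |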
Also Published As
Publication number | Publication date |
---|---|
CN110333986A (en) | 2019-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110333986B (en) | Method for guaranteeing availability of redis cluster | |
CN111290834B (en) | Method, device and equipment for realizing high service availability based on cloud management platform | |
CN109286529A (en) | A kind of method and system for restoring RabbitMQ network partition | |
CN107404509B (en) | Distributed service configuration system and information management method | |
EP3526931B1 (en) | Computer system and method for dynamically adapting a software-defined network | |
CN102394914A (en) | Cluster brain-split processing method and device | |
CN108173971A (en) | A kind of MooseFS high availability methods and system based on active-standby switch | |
CN110971662A (en) | Two-node high-availability implementation method and device based on Ceph | |
CN111935244B (en) | Service request processing system and super-integration all-in-one machine | |
CN106021070A (en) | Method and device for server cluster monitoring | |
CN106464516B (en) | Event handling in a network management system | |
CN114116912A (en) | Method for realizing high availability of database based on Keepalived | |
CN113872997A (en) | Container group POD reconstruction method based on container cluster service and related equipment | |
CN114338670B (en) | Edge cloud platform and network-connected traffic three-level cloud control platform with same | |
CN115373799A (en) | Cluster management method and device and electronic equipment | |
CN113489149B (en) | Power grid monitoring system service master node selection method based on real-time state sensing | |
CN105490847A (en) | Real-time detecting and processing method of node failure in private cloud storage system | |
CN113835834A (en) | K8S container cluster-based computing node capacity expansion method and system | |
CN113765690A (en) | Cluster switching method, system, device, terminal, server and storage medium | |
CN110290163A (en) | A kind of data processing method and device | |
CN110569303B (en) | MySQL application layer high-availability system and method suitable for various cloud environments | |
CN116185697B (en) | Container cluster management method, device and system, electronic equipment and storage medium | |
CN114301763B (en) | Distributed cluster fault processing method and system, electronic equipment and storage medium | |
CN111092754A (en) | Real-time access service system and implementation method thereof | |
CN116723077A (en) | Distributed IT automatic operation and maintenance system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||