CN110740064A - Distributed cluster node fault processing method, device, equipment and storage medium - Google Patents
Distributed cluster node fault processing method, device, equipment and storage medium
- Publication number
- CN110740064A (application number CN201911025111.7A)
- Authority
- CN
- China
- Prior art keywords
- node
- cluster
- fault
- distributed
- respond
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a distributed cluster node fault processing method, which comprises: sending multicast requests to the agent services pre-deployed on the nodes in a distributed storage cluster; when it is determined that a node does not respond to the multicast request, determining that node as a failed node; and clearing the relevant authentication information of the failed node in the distributed storage cluster.
Description
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for processing a fault in distributed cluster nodes.
Background
A distributed storage cluster system generally includes a plurality of storage servers that together form a cluster system providing services externally; each server is also referred to as a "node". Each distributed storage cluster generally has a main monitoring node (master node for short) that monitors the state of the storage cluster.
In summary, how to effectively solve the problem that data reconstruction caused by a node going down degrades the overall performance of the cluster and affects the normal service operation of clients is an urgent problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a distributed cluster node fault processing method that avoids data reconstruction caused by node downtime, greatly reduces the impact on the overall performance of the cluster, and keeps client services running normally. The invention also aims to provide a distributed cluster node fault processing apparatus, a device, and a computer-readable storage medium.
In order to solve the technical problems, the invention provides the following technical scheme:
A distributed cluster node fault handling method, comprising:
respectively sending multicast requests to agent services pre-deployed by each node in the distributed storage cluster;
determining a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request;
and clearing the relevant authentication information of the fault node in the distributed storage cluster.
In one embodiment of the present invention, after clearing the relevant authentication information of the failed node in the distributed storage cluster, the method further includes:
adding fault identification information after the sn serial number corresponding to the failed node.
In one embodiment of the present invention, after adding the fault identification information after the sn serial number corresponding to the failed node, the method further includes:
when a cluster joining request is received, detecting whether the fault identification information exists after the sn serial number of the node to be joined;
and if so, clearing the original cluster service information on the node to be joined, and adding the node to be joined, with its original cluster service information cleared, to the distributed storage cluster.
In one embodiment of the present invention, determining a node that does not respond to the multicast request as a failed node when it is determined that such a node exists includes:
when it is determined that there is a node that has not responded to a preset number of consecutive multicast requests, determining that node as a failed node.
A distributed cluster node fault handling apparatus, comprising:
a request sending module, configured to send multicast requests to the agent services pre-deployed on each node in the distributed storage cluster;
a failed node determination module, configured to determine a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request;
and an authentication information clearing module, configured to clear the relevant authentication information of the failed node in the distributed storage cluster.
In one embodiment of the present invention, the apparatus further comprises:
an identification information adding module, configured to add fault identification information after the sn serial number corresponding to the failed node, after the relevant authentication information of the failed node in the distributed storage cluster has been cleared.
In one embodiment of the present invention, the apparatus further comprises:
an identification information detection module, configured to detect, when a cluster joining request is received after the fault identification information has been added after the sn serial number corresponding to the failed node, whether the fault identification information exists after the sn serial number of the node to be joined;
and a node adding module, configured to clear the original cluster service information on the node to be joined when the fault identification information is detected after its sn serial number, and to add the node to be joined, with its original cluster service information cleared, to the distributed storage cluster.
In one specific embodiment of the present invention, the failed node determination module is specifically a module that, when it is determined that there is a node that has not responded to a preset number of consecutive multicast requests, determines that node as a failed node.
A distributed cluster node fault handling device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the distributed cluster node fault handling method described above when executing the computer program.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the distributed cluster node fault handling method described above.
With the method provided by the embodiment of the present invention, multicast requests are sent to the agent service pre-deployed on each node in the distributed storage cluster; when a node is found not to respond to the multicast request, that node is determined to be a failed node; and the relevant authentication information of the failed node in the distributed storage cluster is cleared. Because an agent service is pre-deployed on each node, a failed node can be detected promptly from each node's response to the multicast request received by its agent service, and its relevant authentication information can be cleared promptly, so that the failed node is removed from the distributed storage cluster in time. This avoids data reconstruction caused by node downtime, greatly reduces the impact on the overall performance of the cluster, and keeps client services running normally.
Correspondingly, embodiments of the present invention further provide a distributed cluster node fault processing apparatus, a device, and a computer-readable storage medium corresponding to the distributed cluster node fault processing method, which have the above technical effects and are not described herein again.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of one implementation of the distributed cluster node fault handling method in an embodiment of the present invention;
Fig. 2 is a flow chart of another implementation of the distributed cluster node fault handling method in an embodiment of the present invention;
Fig. 3 is a structural block diagram of a distributed cluster node fault handling apparatus according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a distributed cluster node fault handling device according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, it is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the described embodiments are only some of the embodiments of the present invention, not all of them.
Embodiment 1:
Referring to Fig. 1, which is a flow chart of one implementation of the distributed cluster node fault handling method in an embodiment of the present invention, the method may include the following steps:
S101: Send multicast requests to the agent service pre-deployed on each node in the distributed storage cluster.
A detection service (master) may be pre-deployed on the master node of the distributed storage cluster, and an agent service (agent) may be pre-deployed on each node of the distributed storage cluster. The detection service may send multicast requests to the agent services in real time or at a preset time interval.
The multicast request may be a handshake request.
It should be noted that, when the detection service sends multicast requests to the agent services at a preset time interval, the interval may be set and adjusted according to the actual situation, which is not limited in the embodiment of the present invention; for example, it may be set to 15 s.
S102: when it is determined that there is a node that does not respond to the multicast request, the node that does not respond to the multicast request is determined as a failed node.
After the multicast request has been sent to the agent service pre-deployed on each node in the distributed storage cluster, whether each node responds to the multicast request can be detected. For example, when a node returns an "OK" reply through its agent service, the node is normal; when a node fails to respond to the multicast request, the node has a problem. When it is determined that there is a node that does not respond to the multicast request, that node may be determined as a failed node.
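One detection round of step S102 can be sketched as a set difference between the cluster's nodes and the nodes whose agent replied. This is only an illustration: `detect_unresponsive` and the `send_multicast` transport hook are hypothetical names that do not appear in the source, and the real request/response transport is left out.

```python
from typing import Callable, Iterable, Set


def detect_unresponsive(nodes: Iterable[str],
                        send_multicast: Callable[[], Set[str]]) -> Set[str]:
    """Run one detection round and return the nodes that did not reply.

    send_multicast is a hypothetical hook (an assumption, not from the
    source): it sends one multicast request to every agent service and
    returns the names of the nodes whose agent answered "OK" in time.
    """
    responders = send_multicast()
    return set(nodes) - responders


# Usage with a fake transport in which node3 stays silent.
cluster = ["node1", "node2", "node3"]
failed = detect_unresponsive(cluster, lambda: {"node1", "node2"})
print(sorted(failed))  # prints ['node3']
```

Injecting the transport as a callable keeps the decision logic testable without any network access.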
S103: Clear the relevant authentication information of the failed node in the distributed storage cluster.
After the failed node is determined, the node is down and cannot be reached, so the cluster services on it, such as the service that monitors the state of the storage cluster (MON) and the object storage device data storage service (OSD), cannot be cleared. Therefore, the relevant authentication information of the failed node in the distributed storage cluster, such as that of MON and OSD, is cleared first. If the failed node is named noden, the relevant authentication information can be cleared through the command cluster auth del. Detecting failed nodes automatically also reduces operation and maintenance cost.
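The effect of step S103 can be illustrated with an in-memory stand-in for the cluster's authentication records. The source names only the command cluster auth del; the `AuthStore` layout, the `clear_auth_for_node` helper, and the key values below are assumptions made for the sake of a runnable sketch, not the cluster's actual interface.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (node name, service name) -> authentication key.
AuthStore = Dict[Tuple[str, str], str]


def clear_auth_for_node(auth_store: AuthStore, node: str) -> List[Tuple[str, str]]:
    """Delete every authentication entry that belongs to the given node.

    Returns the removed (node, service) pairs so the caller can log them.
    """
    removed = [key for key in auth_store if key[0] == node]
    for key in removed:
        del auth_store[key]
    return removed


# Usage: noden has failed, so its MON and OSD entries are dropped.
store: AuthStore = {
    ("node1", "MON"): "key-a",
    ("noden", "MON"): "key-b",
    ("noden", "OSD"): "key-c",
}
removed = clear_auth_for_node(store, "noden")
```

Only the cluster-side records are touched here, which matches the point that the failed node itself cannot be reached.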
With the method provided by the embodiment of the present invention, multicast requests are sent to the agent service pre-deployed on each node in the distributed storage cluster; when a node is found not to respond to the multicast request, that node is determined to be a failed node; and the relevant authentication information of the failed node in the distributed storage cluster is cleared. Because an agent service is pre-deployed on each node, a failed node can be detected promptly from each node's response to the multicast request received by its agent service, and its relevant authentication information can be cleared promptly, so that the failed node is removed from the distributed storage cluster in time. This avoids data reconstruction caused by node downtime, greatly reduces the impact on the overall performance of the cluster, and keeps client services running normally.
It should be noted that, based on the above embodiment, the embodiment of the present invention further provides a corresponding improved scheme. Steps in the following embodiment that are the same as or correspond to those in the above embodiment, and the corresponding beneficial effects, may be understood by cross-reference and are not described again in detail.
Referring to Fig. 2, which is a flow chart of another implementation of the distributed cluster node fault handling method in an embodiment of the present invention, the method may include the following steps:
S201: Send multicast requests to the agent service pre-deployed on each node in the distributed storage cluster.
S202: When it is determined that there is a node that has not responded to a preset number of consecutive multicast requests, determine that node as a failed node.
The number of consecutive multicast requests that a node must fail to respond to before it is determined to be a failed node may be preset. When it is determined that there is a node that has not responded to that preset number of consecutive multicast requests, the node is determined as a failed node. Verifying multiple times avoids misjudgment caused by network jitter and similar transient conditions.
It should be noted that the preset number of times may be set and adjusted according to the actual situation, which is not limited in the embodiment of the present invention; for example, it may be set to 3.
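The consecutive-miss rule of step S202 can be sketched as a per-node counter that resets on any reply. The class name `MissTracker` and its method are hypothetical; the default threshold of 3 is the example value given above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class MissTracker:
    """Counts consecutive unanswered multicast requests per node."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.misses: Dict[str, int] = defaultdict(int)

    def record_round(self, nodes: Iterable[str], responders: Set[str]) -> Set[str]:
        """Update counters for one round; return nodes at or over the threshold."""
        failed: Set[str] = set()
        for node in nodes:
            if node in responders:
                self.misses[node] = 0  # any reply resets the streak
            else:
                self.misses[node] += 1
                if self.misses[node] >= self.threshold:
                    failed.add(node)
        return failed


# Usage: node2 misses once, recovers, then misses three rounds in a row.
tracker = MissTracker(threshold=3)
nodes = ["node1", "node2"]
rounds = [{"node1"}, {"node1", "node2"}, {"node1"}, {"node1"}, {"node1"}]
results = [tracker.record_round(nodes, responders) for responders in rounds]
```

Resetting the counter on a reply is what distinguishes consecutive misses from a total miss count, so a single dropped request caused by jitter never accumulates toward the threshold.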
S203: Clear the relevant authentication information of the failed node in the distributed storage cluster.
S204: Add fault identification information after the sn serial number corresponding to the failed node.
After the relevant authentication information of the failed node in the distributed storage cluster is cleared, fault identification information, such as a fault/clear identifier, may be added after the sn serial number (i.e., the product serial number) corresponding to the failed node. This indicates that the node was removed from the distributed storage cluster because of a fault, and that the storage services and configuration information on the node, such as MON and OSD, have not yet been fully cleared.
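Appending fault identification information to a serial number, as in step S204, can be sketched with a plain string suffix. The source does not specify the format of the identifier, so the `-FAULT` marker and both helper names below are assumptions chosen only for illustration.

```python
FAULT_MARK = "-FAULT"  # hypothetical marker; the source does not define a format


def mark_failed(sn: str) -> str:
    """Append the fault marker after the serial number (idempotent)."""
    return sn if sn.endswith(FAULT_MARK) else sn + FAULT_MARK


def is_marked(sn: str) -> bool:
    """Report whether fault identification follows the serial number."""
    return sn.endswith(FAULT_MARK)


# Usage: tag the record of a node that was removed because of a fault.
record = mark_failed("SN12345")
```

Making `mark_failed` idempotent means a node that is detected as failed twice does not end up with a doubled marker.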
S205: When a cluster joining request is received, detect whether fault identification information exists after the sn serial number of the node to be joined; if so, execute step S206; if not, add the node to be joined to the distributed storage cluster directly.
When a new node, or a node whose fault has been repaired, needs to join the distributed storage cluster, it can send a cluster joining request to the detection service. After receiving the request, the detection service can detect whether fault identification information exists after the sn serial number of the node to be joined, and thereby determine whether the node is one that is applying to rejoin the distributed storage cluster after fault repair. If the fault identification information exists, the node is rejoining after fault repair, and step S206 is executed. If it does not exist, the node is a new node applying to join, and it can be added to the distributed storage cluster directly.
S206: Clear the original cluster service information on the node to be joined, and add the node to be joined, with its original cluster service information cleared, to the distributed storage cluster.
When fault identification information exists after the sn serial number of the node to be joined, the original cluster service information on the node can be cleared, and the node, with that information cleared, is then added to the distributed storage cluster. Clearing the original cluster service information on the node prevents isolated services left over from before the last fault from affecting client service access after the node joins the distributed storage cluster.
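The join decision of steps S205 and S206 can be sketched as follows. The `Node` record, the `flagged_sns` set standing in for "fault identification information exists after the sn serial number", and the `handle_join` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    sn: str
    # Leftover cluster service information on the node, e.g. MON or OSD.
    services: List[str] = field(default_factory=list)


def handle_join(node: Node, flagged_sns: Set[str], cluster: Dict[str, Node]) -> bool:
    """Admit a node; clear stale service info first if its sn is flagged.

    Returns True when leftover service information had to be cleared.
    """
    was_flagged = node.sn in flagged_sns
    if was_flagged:
        node.services.clear()  # S206: drop services left from before the fault
    cluster[node.sn] = node    # join either way
    return was_flagged


# Usage: one repaired node rejoins, one brand-new node joins.
cluster: Dict[str, Node] = {}
flagged = {"SN1"}
repaired = Node("SN1", ["MON", "OSD"])
fresh = Node("SN2")
cleared_repaired = handle_join(repaired, flagged, cluster)
cleared_fresh = handle_join(fresh, flagged, cluster)
```

Both paths end with the node in the cluster; the flag only decides whether the cleanup step runs first.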
Corresponding to the above method embodiment, the embodiment of the present invention further provides a distributed cluster node fault handling apparatus. The apparatus described below and the method described above may be understood by cross-reference.
Referring to Fig. 3, which is a structural block diagram of a distributed cluster node fault handling apparatus according to an embodiment of the present invention, the apparatus may include:
a request sending module 31, configured to send a multicast request to a proxy service pre-deployed by each node in the distributed storage cluster;
a failed node determination module 32, configured to determine a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request;
and the authentication information clearing module 33 is configured to clear the relevant authentication information of the failed node in the distributed storage cluster.
With the apparatus provided by the embodiment of the present invention, multicast requests are sent to the agent service pre-deployed on each node in the distributed storage cluster; when a node is found not to respond to the multicast request, that node is determined to be a failed node; and the relevant authentication information of the failed node in the distributed storage cluster is cleared. Because an agent service is pre-deployed on each node, a failed node can be detected promptly from each node's response to the multicast request received by its agent service, and its relevant authentication information can be cleared promptly, so that the failed node is removed from the distributed storage cluster in time. This avoids data reconstruction caused by node downtime, greatly reduces the impact on the overall performance of the cluster, and keeps client services running normally.
In one embodiment of the present invention, the apparatus may further comprise:
an identification information adding module, configured to add fault identification information after the sn serial number corresponding to the failed node, after the relevant authentication information of the failed node in the distributed storage cluster has been cleared.
In one embodiment of the present invention, the apparatus may further comprise:
an identification information detection module, configured to detect, when a cluster joining request is received after the fault identification information has been added after the sn serial number corresponding to the failed node, whether the fault identification information exists after the sn serial number of the node to be joined;
and a node adding module, configured to clear the original cluster service information on the node to be joined when the fault identification information is detected after its sn serial number, and to add the node to be joined, with its original cluster service information cleared, to the distributed storage cluster.
In one embodiment of the present invention, the failed node determination module 32 is specifically a module that, when it is determined that there is a node that has not responded to a preset number of consecutive multicast requests, determines that node as a failed node.
Corresponding to the above method embodiment, referring to Fig. 4, which is a schematic diagram of a distributed cluster node fault handling device provided by the present invention, the device may include:
a memory 41 for storing a computer program;
a processor 42 which, when executing the computer program stored in the memory 41, may implement the following steps:
sending multicast requests to the agent service pre-deployed on each node in the distributed storage cluster; when it is determined that there is a node that does not respond to the multicast request, determining that node as a failed node; and clearing the relevant authentication information of the failed node in the distributed storage cluster.
For the introduction of the device provided by the present invention, please refer to the above method embodiment, which is not described herein again.
Corresponding to the above method embodiment, the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program may implement the following steps:
sending multicast requests to the agent service pre-deployed on each node in the distributed storage cluster; when it is determined that there is a node that does not respond to the multicast request, determining that node as a failed node; and clearing the relevant authentication information of the failed node in the distributed storage cluster.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided by the present invention, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts may be understood by cross-reference. The apparatus, the device, and the computer-readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, so their description is brief; for the relevant points, refer to the description of the method.
The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (10)
1. A distributed cluster node fault handling method, comprising:
respectively sending multicast requests to agent services pre-deployed by each node in the distributed storage cluster;
determining a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request;
and clearing the relevant authentication information of the fault node in the distributed storage cluster.
2. The method according to claim 1, wherein after clearing the relevant authentication information of the failed node in the distributed storage cluster, the method further comprises:
and adding fault identification information after the sn serial number corresponding to the fault node.
3. The method according to claim 2, wherein after adding the fault identification information after the sn serial number corresponding to the failed node, the method further comprises:
when a cluster joining request is received, detecting whether the fault identification information exists after the sn serial number of the node to be joined;
and if so, clearing the original cluster service information on the node to be joined, and adding the node to be joined, with its original cluster service information cleared, to the distributed storage cluster.
4. The distributed cluster node fault handling method of any one of claims 1 to 3, wherein determining a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request includes:
when it is determined that there is a node that has not responded to a preset number of consecutive multicast requests, determining that node as a failed node.
5. A distributed cluster node fault handling apparatus, comprising:
the request sending module is used for respectively sending multicast requests to the agent services pre-deployed by each node in the distributed storage cluster;
a failed node determination module, configured to determine a node that does not respond to the multicast request as a failed node when it is determined that there is a node that does not respond to the multicast request;
and the authentication information clearing module is used for clearing the relevant authentication information of the fault node in the distributed storage cluster.
6. The distributed cluster node failure handling apparatus of claim 5, further comprising:
an identification information adding module, configured to add fault identification information after the sn serial number corresponding to the failed node, after the relevant authentication information of the failed node in the distributed storage cluster has been cleared.
7. The distributed cluster node failure handling apparatus of claim 6, further comprising:
the identification information detection module is used for detecting whether the fault identification information exists after the sn serial number of the node to be added when a cluster adding request is received after the fault identification information is added after the sn serial number corresponding to the fault node;
and the node adding module is used for clearing the original cluster service information in the node to be added when the fault identification information exists after the sn serial number of the node to be added is detected, and adding the node to be added with the cleared original cluster service information to the distributed storage cluster.
8. The distributed cluster node failure processing apparatus of as claimed in any of claims 5 to 7, wherein the failed node determining module is specifically a module that determines a node that does not respond to a preset number of consecutive multicast requests as a failed node when it is determined that there are nodes that do not respond to a preset number of consecutive multicast requests.
9, distributed cluster node fault handling device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the distributed cluster node failure handling method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the steps of the distributed cluster node failure handling method according to any one of claims 1 to 4.
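The flow described by the apparatus claims above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `Cluster`, the `#FAULT` marker, the default threshold of 3, the `cluster_service_info` key, and the dictionary-based node state are all hypothetical, and the multicast transport is omitted (each round is represented only by the set of SNs that responded). Removing the fault marker once the node has rejoined is likewise an assumption that the claims do not state.

```python
from dataclasses import dataclass, field

FAULT_TAG = "#FAULT"  # hypothetical marker appended after a failed node's SN


@dataclass
class Cluster:
    threshold: int = 3                            # preset number of consecutive misses (assumed value)
    auth: dict = field(default_factory=dict)      # SN -> authentication information
    missed: dict = field(default_factory=dict)    # SN -> consecutive unanswered requests
    fault_tags: set = field(default_factory=set)  # entries of the form "<SN>#FAULT"

    def record_round(self, responders):
        """Process one multicast round; return the SNs newly declared failed."""
        failed = []
        for sn in list(self.auth):  # copy the keys, since entries are deleted below
            if sn in responders:
                self.missed[sn] = 0
                continue
            self.missed[sn] = self.missed.get(sn, 0) + 1
            if self.missed[sn] >= self.threshold:
                del self.auth[sn]                    # clear authentication information
                del self.missed[sn]
                self.fault_tags.add(sn + FAULT_TAG)  # add fault identification after the SN
                failed.append(sn)
        return failed

    def join(self, sn, node_state, auth_info):
        """Handle a cluster join request for the node with serial number `sn`."""
        if sn + FAULT_TAG in self.fault_tags:
            # Previously failed node: drop its stale cluster service information.
            node_state.pop("cluster_service_info", None)
            self.fault_tags.discard(sn + FAULT_TAG)  # assumption: marker removed on rejoin
        self.auth[sn] = auth_info
        self.missed[sn] = 0
        return node_state
```

With `threshold=2`, a node that misses two rounds in a row is removed from `auth` and tagged; a later `join` for the same SN strips `cluster_service_info` from its state before it is re-admitted.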
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911025111.7A CN110740064A (en) | 2019-10-25 | 2019-10-25 | Distributed cluster node fault processing method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911025111.7A CN110740064A (en) | 2019-10-25 | 2019-10-25 | Distributed cluster node fault processing method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110740064A (en) | 2020-01-31 |
Family
ID=69271485
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911025111.7A (Pending) CN110740064A (en) | 2019-10-25 | 2019-10-25 | Distributed cluster node fault processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110740064A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040059805A1 (en) * | 2002-09-23 | 2004-03-25 | Darpan Dinker | System and method for reforming a distributed data system cluster after temporary node failures or restarts |
US20120042030A1 (en) * | 2010-08-12 | 2012-02-16 | International Business Machines Corporation | High availability management system for stateless components in a distributed master-slave component topology |
US20170373926A1 (en) * | 2016-06-22 | 2017-12-28 | Vmware, Inc. | Dynamic heartbeating mechanism |
CN108847982A (en) * | 2018-06-26 | 2018-11-20 | 郑州云海信息技术有限公司 | A kind of distributed storage cluster and its node failure switching method and apparatus |
CN109218100A (en) * | 2018-09-21 | 2019-01-15 | 郑州云海信息技术有限公司 | Distributed objects storage cluster and its request responding method, system and storage medium |
US10275326B1 (en) * | 2014-10-31 | 2019-04-30 | Amazon Technologies, Inc. | Distributed computing system failure detection |
2019-10-25: CN application CN201911025111.7A (patent/CN110740064A/en), status: active, Pending
Non-Patent Citations (1)
Title |
---|
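Li Xueyong (李学勇) et al.: "Broadcast-based distributed system-level fault diagnosis algorithm" (基于广播的分布式系统级故障诊断算法), Computer Engineering (《计算机工程》) * |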
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111756571A (en) * | 2020-05-28 | 2020-10-09 | 苏州浪潮智能科技有限公司 | Cluster node fault processing method, device, equipment and readable medium |
CN111756571B (en) * | 2020-05-28 | 2022-02-18 | 苏州浪潮智能科技有限公司 | Cluster node fault processing method, device, equipment and readable medium |
US11750437B2 (en) | 2020-05-28 | 2023-09-05 | Inspur Suzhou Intelligent Technology Co., Ltd. | Cluster node fault processing method and apparatus, and device and readable medium |
CN113783735A (en) * | 2021-09-24 | 2021-12-10 | 小红书科技有限公司 | Method, device, equipment and medium for identifying fault node in Redis cluster |
CN115426247A (en) * | 2022-08-22 | 2022-12-02 | 中国工商银行股份有限公司 | Processing method and device of fault node, storage medium and electronic equipment |
CN115426247B (en) * | 2022-08-22 | 2024-04-26 | 中国工商银行股份有限公司 | Fault node processing method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109274544B (en) | Fault detection method and device for distributed storage system | |
CN106933843B (en) | Database heartbeat detection method and device | |
CN110740064A (en) | Distributed cluster node fault processing method, device, equipment and storage medium | |
CN110830283B (en) | Fault detection method, device, equipment and system | |
CN108737132B (en) | Alarm information processing method and device | |
CN108924202B (en) | Distributed cluster data disaster tolerance method and related device | |
EP3258653A1 (en) | Message pushing method and device | |
CN109921942B (en) | Cloud platform switching control method, device and system and electronic equipment | |
CN105959078B (en) | A kind of cluster method for synchronizing time, cluster and clock synchronization system | |
CN111355600B (en) | Main node determining method and device | |
CN111142801B (en) | Distributed storage system network sub-health detection method and device | |
CN109391691A (en) | The restoration methods and relevant apparatus that NAS is serviced under a kind of single node failure | |
CN109302435B (en) | Message publishing method, device, system, server and computer readable storage medium | |
CN110113187B (en) | Configuration updating method and device, configuration server and configuration system | |
CN111338858A (en) | Disaster recovery method and device for double machine rooms | |
CN108509296B (en) | Method and system for processing equipment fault | |
CN112436962B (en) | Block chain consensus network dynamic expansion method, electronic device, system and medium | |
CN105490837A (en) | Network monitoring processing method and device | |
CN110224872B (en) | Communication method, device and storage medium | |
CN104243473A (en) | Data transmission method and device | |
CN113254245A (en) | Fault detection method and system for storage cluster | |
CN114301763B (en) | Distributed cluster fault processing method and system, electronic equipment and storage medium | |
CN107493308B (en) | Method and device for sending message and distributed equipment cluster system | |
CN113190347A (en) | Edge cloud system and task management method | |
JP7143609B2 (en) | COMMUNICATION DEVICE, COMMUNICATION METHOD, AND PROGRAM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200131 |