CN109032830A - Fault recovery method, system and related components for a distributed storage system

Info

Publication number: CN109032830A
Authority: CN (China)
Prior art keywords: node, address, client, virtual, cluster
Legal status: Pending
Application number: CN201810826771.4A
Other languages: Chinese (zh)
Inventor: 丁瑞锋 (Ding Ruifeng)
Current assignee: Guangdong Inspur Smart Computing Technology Co Ltd
Original assignee: Guangdong Inspur Big Data Research Co Ltd
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2018-12-18
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority to CN201810826771.4A
Publication of CN109032830A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0793 Remedial or corrective actions
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

This application discloses a fault recovery method for a distributed storage system. When node failure information is detected, the method uses the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node; queries the client information table in the cluster and, according to the query result, sets each healthy node that holds a virtual IP address with connected clients as a target node; and controls each target node to send a TCP reconnection signal to its corresponding clients to restore the service connections. The method achieves fast fault recovery after a service node fails and improves the stability of the distributed storage system. Also disclosed are a fault recovery system for a distributed storage system, a computer-readable storage medium and an electronic device, which have the same beneficial effects.

Description

Fault recovery method, system and related components for a distributed storage system
Technical field
The present invention relates to the technical field of data storage, and in particular to a fault recovery method and system for a distributed storage system, a computer-readable storage medium and an electronic device.
Background
A distributed storage system stores data dispersed across multiple independent devices. A traditional network storage system keeps all data on a centralized storage server, which becomes both the performance bottleneck of the system and the focal point of its reliability and security, and cannot meet the needs of large-scale storage applications. A distributed network storage system uses a scalable architecture in which multiple storage servers share the storage load and location servers locate the stored information. This not only improves the reliability, availability and access efficiency of the system but also makes it easy to scale out.
CTDB is a clustered TDB database that Samba and other applications can use to store data. CTDB provides a virtual IP mechanism: after a node in the cluster fails, the service IP floats from that node to another node, so service can be restored automatically.
In the prior art, once a client is disconnected from the cluster, reconnecting takes a long time. The reason is that TCP reconnection uses an exponential backoff timeout: if the peer IP cannot be reached, the client retries, but the interval between retries keeps growing, successively 1 s, 3 s, 6 s, 12 s, 24 s, 48 s, 64 s, 64 s. Suppose the client sends a reconnection attempt at time A while the cluster's virtual IP has not yet finished drifting, and its next attempt is due at A+24 s. If the cluster completes the IP drift at A+5 s, the client still has to wait 24 s - 5 s = 19 s, which prolongs the overall service interruption.
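The arithmetic in this example can be reproduced with a short Python sketch. It reads the listed intervals as the gaps between consecutive client attempts (the text does not name a TCP stack, so the schedule is only the patent's illustration) and measures how long a client stays disconnected after the cluster has already finished the IP drift. The function names are hypothetical.

```python
from itertools import accumulate

# Retry intervals in seconds exactly as listed in the background section;
# this is the patent's illustration, not a schedule taken from a real TCP stack.
PATENT_INTERVALS = [1, 3, 6, 12, 24, 48, 64, 64]


def attempt_times(start, intervals):
    """Absolute times of a first attempt at `start` plus one retry per interval."""
    return [start] + [start + t for t in accumulate(intervals)]


def idle_after_recovery(attempts, recovered_at):
    """Seconds between cluster recovery and the client's next attempt.

    Returns None if every attempt happened before the recovery time.
    """
    for t in attempts:
        if t >= recovered_at:
            return t - recovered_at
    return None


# Scenario from the text: attempt at A (= 0), next attempt due at A + 24 s,
# IP drift finished at A + 5 s.
print(idle_after_recovery([0, 24], 5))  # 19
```

The wasted time depends on where recovery falls in the schedule: with the full schedule starting at 0 (attempts at 0, 1, 4, 10, 22, 46, 94, ... s), a recovery at 50 s is only noticed at 94 s, 44 s later.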
Therefore, how to achieve fast fault recovery after a service node fails, and thereby improve the stability of a distributed storage system, is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of this application is to provide a fault recovery method and system for a distributed storage system, a computer-readable storage medium and an electronic device that achieve fast fault recovery after a service node fails and improve the stability of the distributed storage system.
To solve the above technical problem, this application provides a fault recovery method for a distributed storage system, comprising:
When node failure information is detected, using the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
Querying the client information table in the cluster and, according to the query result, setting each healthy node that holds a virtual IP address with connected clients as a target node;
Controlling each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
Optionally, before using the master node to perform the cluster IP address reassignment operation on all healthy nodes in the cluster, the method further comprises:
Determining the failed node according to the node failure information, and judging whether the failed node is the master node;
If so, re-electing the master node from among all healthy nodes.
Optionally, after setting the healthy nodes that hold virtual IP addresses with connected clients as target nodes, the method further comprises:
Controlling every target node to send an ARP broadcast to all healthy nodes in the cluster, so that all healthy nodes update their ARP tables, where an ARP table stores the mapping between virtual IP addresses and MAC addresses.
Optionally, the method further comprises:
When an information-sending instruction is received, determining the destination virtual IP address according to the instruction;
Looking up the MAC address corresponding to the destination virtual IP address in the ARP table, and sending the information corresponding to the instruction to that MAC address.
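The ARP-table behaviour in the last two optional steps can be sketched as follows. This is an illustration only: `ArpTable`, `send_to_virtual_ip`, the instruction's `dest_ip`/`payload` keys and the `transmit` callback are hypothetical names, and a real node would rely on the kernel's neighbour cache rather than a Python dict.

```python
class ArpTable:
    """Per-node cache mapping virtual IP -> MAC address (hypothetical helper)."""

    def __init__(self):
        self._entries = {}

    def on_arp_broadcast(self, virtual_ip, mac):
        # A broadcast from the new owner of the IP overwrites the stale entry.
        self._entries[virtual_ip] = mac.lower()

    def lookup(self, virtual_ip):
        return self._entries.get(virtual_ip)


def send_to_virtual_ip(table, instruction, transmit):
    """Resolve the instruction's destination virtual IP and pass the payload to transmit(mac, payload).

    instruction: dict with the hypothetical keys 'dest_ip' and 'payload'.
    Returns the MAC address the payload was sent to.
    """
    mac = table.lookup(instruction["dest_ip"])
    if mac is None:
        raise LookupError(f"no ARP entry for {instruction['dest_ip']}")
    transmit(mac, instruction["payload"])
    return mac


table = ArpTable()
table.on_arp_broadcast("192.168.0.11", "02:00:00:00:00:0A")  # before failover (node A)
table.on_arp_broadcast("192.168.0.11", "02:00:00:00:00:0B")  # after failover (node B)
sent = []
send_to_virtual_ip(table, {"dest_ip": "192.168.0.11", "payload": b"hello"},
                   lambda mac, payload: sent.append((mac, payload)))
print(sent)  # [('02:00:00:00:00:0b', b'hello')]
```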
Optionally, setting the healthy nodes that hold virtual IP addresses with connected clients as target nodes comprises:
Querying, according to the client information table, whether each virtual IP address has connected clients;
If so, setting the healthy node corresponding to that virtual IP address as a target node.
Optionally, the method further comprises:
Receiving, at a predetermined period, the client information sent by all nodes in the cluster, and updating the client information table according to that client information.
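One way to keep the cluster-wide table current is for every node to report a full snapshot of its clients each period, with a new report replacing that node's previous one so that closed connections drop out without explicit delete messages. The snapshot-replace policy, the `ClientInfoTable` class and the `collect_reports` callback are assumptions made for this sketch; the text only says that reports are received periodically.

```python
import threading


class ClientInfoTable:
    """In-memory cluster-wide client table (hypothetical; lost on restart, like the table in the text)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_node = {}  # node -> {virtual_ip: set of client addresses}

    def apply_report(self, node, snapshot):
        # A node's latest snapshot replaces everything it reported before.
        with self._lock:
            self._by_node[node] = {ip: set(c) for ip, c in snapshot.items()}

    def clients_of(self, virtual_ip):
        with self._lock:
            result = set()
            for snapshot in self._by_node.values():
                result |= snapshot.get(virtual_ip, set())
            return result


def refresh_loop(table, collect_reports, period_s, stop):
    """Every period_s seconds, apply the (node, snapshot) pairs from collect_reports() until stop is set."""
    while not stop.wait(period_s):  # Event.wait returns False on timeout, True once set
        for node, snapshot in collect_reports():
            table.apply_report(node, snapshot)
```

A usage pattern would be to run `refresh_loop` in a daemon thread and set the `threading.Event` on shutdown.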
Optionally, the service node is a node running the CTDB service.
This application also provides a fault recovery system for a distributed storage system, comprising:
An IP reassignment module, configured to, when node failure information is detected, use the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
A target node determination module, configured to query the client information table in the cluster and, according to the query result, set each healthy node that holds a virtual IP address with connected clients as a target node;
A reconnection module, configured to control each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
This application also provides a computer-readable storage medium storing a computer program which, when executed, implements the steps of the above fault recovery method for a distributed storage system.
This application also provides an electronic device comprising a memory and a processor, the memory storing a computer program; when the processor calls the computer program in the memory, it implements the steps of the above fault recovery method for a distributed storage system.
This application provides a fault recovery method for a distributed storage system, comprising: when node failure information is detected, using the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node; querying the client information table in the cluster and, according to the query result, setting each healthy node that holds a virtual IP address with connected clients as a target node; and controlling each target node to send a TCP reconnection signal to its corresponding clients to restore the service connections.
After a node fails, this application assigns virtual IP addresses to all healthy nodes. Because the client information table is stored in the cluster, the application can query whether each virtual IP address has connected clients and then proactively send a TCP reconnection signal to those clients. Since the operation that restores the service connection is performed proactively by the target node right after the virtual IP addresses are assigned, there is no need to wait passively for the client's own reconnection attempt. This application can therefore achieve fast fault recovery after a service node fails and improve the stability of the distributed storage system. This application also provides a fault recovery system for a distributed storage system, a computer-readable storage medium and an electronic device, which have the above beneficial effects and are not described again here.
Brief description of the drawings
To illustrate the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a fault recovery method for a distributed storage system provided by an embodiment of this application;
Fig. 2 is a flowchart of another fault recovery method for a distributed storage system provided by an embodiment of this application;
Fig. 3 is a schematic structural diagram of a fault recovery system for a distributed storage system provided by an embodiment of this application.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of this application.
Referring first to Fig. 1, which is a flowchart of a fault recovery method for a distributed storage system provided by an embodiment of this application.
The specific steps may include:
S101: When node failure information is detected, use the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
This embodiment assumes that it runs in a distributed storage system containing multiple nodes. Detecting node failure information means that some node in the distributed storage system has failed. The failed node can no longer maintain normal service connections with clients, so to keep the service running, the service connections between the clients and the distributed storage system must be restored.
Note that in a distributed storage system, virtual IP addresses are assigned to the nodes through the master node. If the failed node is the master node, a new master node must first be determined in the cluster, and the new master node then assigns the virtual IP addresses. There are many ways to perform cluster IP reassignment; as a preferred implementation, the cluster IP reassignment function built into the CTDB service can be used to redistribute the virtual IPs. To illustrate this function: in a distributed storage system where every node runs the CTDB service, the cluster exposes virtual IPs to the outside, and a client connects to one of them. Suppose the cluster has three nodes A, B and C, holding the virtual IPs A (192.168.0.11), B (192.168.0.12) and C (192.168.0.13), and client D is connected to 192.168.0.11, which therefore serves as a service IP. If node A fails, then after the CTDB service completes the IP drift the virtual IPs are distributed as B (192.168.0.11, 192.168.0.12) and C (192.168.0.13), and the client can re-establish its connection to the cluster through node B on its next retry. Note also that when the CTDB service redistributes virtual IP addresses, an IP that already has connected clients is not moved to another node; only the virtual IP addresses of the failed node are assigned to healthy nodes.
The virtual IP redistribution in this step is performed only for the healthy nodes in the cluster; a failed node is not assigned any virtual IP address. This embodiment divides all nodes in the cluster into two classes: failed nodes and healthy nodes.
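A minimal sketch of the reassignment rule in the A/B/C example, assuming the current assignment is a plain dict from node name to the virtual IPs it holds. IPs already on healthy nodes stay put, failed nodes receive nothing, and each orphaned IP goes to the healthy node with the fewest IPs. That least-loaded rule is an assumption chosen for illustration; CTDB uses its own allocation algorithms (such as LCP2), which may place IPs differently.

```python
import ipaddress


def reassign_virtual_ips(assignment, failed):
    """Return a new node -> [virtual IPs] map containing healthy nodes only.

    assignment: dict mapping each node to the virtual IPs it held before the failure.
    failed: set of failed node names.
    """
    healthy = sorted(n for n in assignment if n not in failed)
    if not healthy:
        raise RuntimeError("no healthy node left to host virtual IPs")
    # Healthy nodes keep the IPs they already hold, so their clients are not disturbed.
    result = {n: list(assignment[n]) for n in healthy}
    # Only the IPs of failed nodes move.
    orphaned = sorted(
        (ip for n in assignment if n in failed for ip in assignment[n]),
        key=ipaddress.ip_address,
    )
    for ip in orphaned:
        # Least-loaded healthy node; ties broken by node name (illustrative choice).
        target = min(healthy, key=lambda n: (len(result[n]), n))
        result[target].append(ip)
    return {n: sorted(ips, key=ipaddress.ip_address) for n, ips in result.items()}


before = {"A": ["192.168.0.11"], "B": ["192.168.0.12"], "C": ["192.168.0.13"]}
print(reassign_virtual_ips(before, failed={"A"}))
# {'B': ['192.168.0.11', '192.168.0.12'], 'C': ['192.168.0.13']}
```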
S102: Query the client information table in the cluster and, according to the query result, set each healthy node that holds a virtual IP address with connected clients as a target node;
In this embodiment the client information table is stored in the cluster, so that after the virtual IP addresses are redistributed the cluster can proactively query which virtual IPs have connected clients. The client information table stores the mapping between virtual IP addresses and clients, from which it can be determined whether a given virtual IP address has established a service connection with a given client.
The client information table is built as follows: each node monitors its local client connections in real time, broadcasts its client information to the other nodes in the cluster, and receives the client information sent by the other nodes. The ss command shipped with Linux can be used to retrieve the TCP (Transmission Control Protocol) connections on a designated port; each connection matching this criterion is one client. Because the client information table is held in memory, it is cleared when the system restarts and is then repopulated. Entries are added to or removed from the table in only two ways: local real-time detection, and client information received from other nodes.
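The local half of this process can be sketched by parsing `ss -tn` output. The column layout in `SAMPLE` and the use of port 445 (the usual SMB port for Samba) are assumptions; check both against the deployed ss version and service before relying on this parser.

```python
import subprocess
from collections import defaultdict

# Assumed layout of `ss -tn` output: State, Recv-Q, Send-Q, Local, Peer.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      192.168.0.11:445    10.0.0.5:51234
ESTAB  0      0      192.168.0.12:445    10.0.0.7:40100
ESTAB  0      0      192.168.0.11:22     10.0.0.9:60000
"""


def read_ss():
    """Run `ss -tn` on this node (requires iproute2) and return its text output."""
    return subprocess.run(["ss", "-tn"], capture_output=True, text=True, check=True).stdout


def local_clients(ss_output, service_port):
    """Map each local (virtual) IP to the set of clients connected on service_port."""
    table = defaultdict(set)
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local_ip, local_port = fields[3].rsplit(":", 1)
        peer_ip, peer_port = fields[4].rsplit(":", 1)
        if int(local_port) == service_port:
            table[local_ip].add(f"{peer_ip}:{peer_port}")
    return dict(table)


print(local_clients(SAMPLE, 445))
# {'192.168.0.11': {'10.0.0.5:51234'}, '192.168.0.12': {'10.0.0.7:40100'}}
```

On a live node, `local_clients(read_ss(), 445)` would produce the snapshot that the node broadcasts to its peers.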
When the virtual IP address assigned to a healthy node has connected clients, a connection between that healthy node and those clients can be established to restore the clients' service connections. Note that this application assigns virtual IP addresses to all healthy nodes, and after redistribution it cannot otherwise be determined which node should connect to which client. This embodiment therefore sets the healthy nodes holding virtual IP addresses with connected clients as target nodes; a target node is a node that connects to clients and performs the related services.
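Selecting target nodes then reduces to a join between the new IP assignment and the client table. In this sketch both are plain dicts and `select_target_nodes` is a hypothetical helper, not part of CTDB; the example reuses the A/B/C scenario with client D on 192.168.0.11.

```python
def select_target_nodes(ip_assignment, client_table):
    """Return {node: {virtual_ip: clients}} for nodes whose virtual IPs have clients.

    ip_assignment: node -> list of virtual IPs after reassignment (healthy nodes only).
    client_table: virtual IP -> set of client addresses (the cluster-wide table).
    """
    targets = {}
    for node, ips in ip_assignment.items():
        served = {ip: set(client_table[ip]) for ip in ips if client_table.get(ip)}
        if served:  # only nodes with at least one connected client become targets
            targets[node] = served
    return targets


assignment = {"B": ["192.168.0.11", "192.168.0.12"], "C": ["192.168.0.13"]}
clients = {"192.168.0.11": {"10.0.0.5:51234"}}  # client D
print(select_target_nodes(assignment, clients))
# {'B': {'192.168.0.11': {'10.0.0.5:51234'}}}
```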
S103: Control each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
This step builds on the target nodes determined in S102: each target node is controlled to send a TCP (Transmission Control Protocol) reconnection signal to the clients of its virtual IP addresses. This re-establishes normal service connections between the target node and the clients and resumes service, so that the service interruption caused by the node failure is repaired.
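The patent does not specify the format of the TCP reconnection signal, so the sketch below leaves it to a caller-supplied hook (`send_reconnect`, hypothetical). For comparison, CTDB's own `ctdb tickle` command sends a "tickle" ACK that prompts a client to notice a broken connection after IP failover. What the sketch shows is the control flow of S103: every target node signals each of its clients right away and in parallel, instead of waiting for the clients' next backoff retry.

```python
from concurrent.futures import ThreadPoolExecutor


def reconnect_all(targets, send_reconnect, max_workers=8):
    """Ask every target node to signal each of its clients, in parallel.

    targets: node -> {virtual_ip: set of client addresses}.
    send_reconnect: hypothetical hook (node, virtual_ip, client) -> bool that sends
    the TCP reconnection signal; its wire format is not specified by the text.
    Returns the (node, virtual_ip, client) jobs whose hook reported failure.
    """
    jobs = [(node, ip, client)
            for node, per_ip in targets.items()
            for ip, clients in per_ip.items()
            for client in sorted(clients)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: send_reconnect(*job), jobs))
    return [job for job, ok in zip(jobs, results) if not ok]
```

Returning the failed jobs lets a caller retry them or fall back to the client's own reconnection.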
In this embodiment, after a node fails, virtual IP addresses are assigned to all healthy nodes. Because the client information table is stored in the cluster, the embodiment can query whether each virtual IP address has connected clients and proactively send a TCP reconnection signal to those clients. Since the restoration of the service connection is performed proactively by the target node right after the virtual IP addresses are assigned, there is no need to wait passively for the client's reconnection attempt. This embodiment therefore achieves fast fault recovery after a service node fails and improves the stability of the distributed storage system.
Referring next to Fig. 2, which is a flowchart of another fault recovery method for a distributed storage system provided by an embodiment of this application.
The specific steps may include:
S201: When node failure information is detected, determine the failed node according to the node failure information, and judge whether the failed node is the master node; if so, go to S202; if not, go to S203.
Because virtual IP address reassignment depends on the master node, if the failed node is the master node of the cluster, a new master node must be elected from among all healthy nodes.
S202: Re-elect the master node from among all healthy nodes.
Here the service nodes are nodes running the CTDB service, and the master node can be re-elected by the CTDB service.
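Steps S201 and S202 can be sketched as below. The rule used here (keep the master if it is healthy, otherwise promote the healthy node with the smallest id) is an assumption for illustration only; it is not CTDB's election protocol, to which the text delegates this step.

```python
def ensure_master(nodes, master, failed):
    """Return the master node to use after a failure (S201/S202).

    nodes: iterable of node ids; master: current master id; failed: set of failed ids.
    Keeps the current master if it is healthy; otherwise re-elects the smallest healthy id.
    """
    healthy = sorted(n for n in nodes if n not in failed)
    if not healthy:
        raise RuntimeError("no healthy node can become master")
    if master in healthy:  # S201: the failed node is not the master, go straight to S203
        return master
    return healthy[0]      # S202: re-elect from the healthy nodes


print(ensure_master([0, 1, 2], master=0, failed={0}))  # 1
```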
S203: Use the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
S204: Query, according to the client information table, whether each virtual IP address has connected clients; if so, go to S205; if not, end the process.
Before this embodiment runs, the client information sent by all nodes in the cluster may be received at a predetermined period and used to update the client information table.
S205: Set the healthy node corresponding to that virtual IP address as a target node.
S206: Control every target node to send an ARP broadcast to all healthy nodes in the cluster, so that all healthy nodes update their ARP tables, where an ARP table stores the mapping between virtual IP addresses and MAC addresses.
An ARP (Address Resolution Protocol) broadcast packet notifies all nodes to update their ARP tables. The ARP table stores the mapping from virtual IP addresses to MAC (Media Access Control) addresses, i.e. physical addresses, so the broadcast tells all nodes that the MAC address behind the service IP has changed; any information destined for that IP is then sent to the new MAC address.
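For reference, a gratuitous ARP announcement of the kind described in S206 can be assembled with the Python standard library. This builds the frame only; sending it needs a raw packet socket (on Linux, an `AF_PACKET` socket bound to the interface, usually requiring root) or a tool such as `arping`. The text does not say whether the broadcast is an ARP request or reply; the sketch assumes a request whose sender and target IP are both the virtual IP, and the MAC address used below is made up.

```python
import socket
import struct

BROADCAST = b"\xff" * 6


def mac_bytes(mac):
    """'aa:bb:cc:dd:ee:ff' -> 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    return raw


def gratuitous_arp_frame(virtual_ip, new_mac):
    """Ethernet frame carrying a gratuitous ARP request announcing virtual_ip -> new_mac."""
    mac = mac_bytes(new_mac)
    ip = socket.inet_aton(virtual_ip)
    ethernet = struct.pack("!6s6sH", BROADCAST, mac, 0x0806)  # dst, src, EtherType ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6, 4,             # hardware / protocol address lengths
        1,                # opcode: request
        mac, ip,          # sender MAC / sender IP
        b"\x00" * 6, ip,  # target MAC (ignored) / target IP = sender IP
    )
    return ethernet + arp


frame = gratuitous_arp_frame("192.168.0.11", "02:00:00:00:00:0b")
print(len(frame))  # 42 (14-byte Ethernet header + 28-byte ARP payload, before padding)
```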
S207: Control each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
Referring to Fig. 3, which is a schematic structural diagram of a fault recovery system for a distributed storage system provided by an embodiment of this application.
The system may include:
IP reassignment module 100, configured to, when node failure information is detected, use the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
Target node determination module 200, configured to query the client information table in the cluster and, according to the query result, set each healthy node that holds a virtual IP address with connected clients as a target node;
Reconnection module 300, configured to control each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
Further, the fault recovery system also includes:
A node judgment module, configured to determine the failed node according to the node failure information and judge whether the failed node is the master node;
A master node election module, configured to re-elect the master node from among all healthy nodes when the failed node is the master node.
Further, the fault recovery system also includes:
An ARP broadcast module, configured to control every target node to send an ARP broadcast to all healthy nodes in the cluster, so that all healthy nodes update their ARP tables, where an ARP table stores the mapping between virtual IP addresses and MAC addresses.
Further, the fault recovery system also includes:
An address determination module, configured to determine the destination virtual IP address according to an information-sending instruction when such an instruction is received;
An information sending module, configured to look up the MAC address corresponding to the destination virtual IP address in the ARP table and send the information corresponding to the instruction to that MAC address.
Further, the target node determination module 200 is specifically a module that queries, according to the client information table, whether each virtual IP address has connected clients and, if so, sets the healthy node corresponding to that virtual IP address as a target node.
Further, the fault recovery system also includes:
A module configured to receive, at a predetermined period, the client information sent by all nodes in the cluster and update the client information table according to that client information.
Further, the service node is a node running the CTDB service.
Because the system embodiments correspond to the method embodiments, refer to the description of the method embodiments for the system embodiments; the details are not repeated here.
This application also provides a computer-readable storage medium storing a computer program which, when executed, can implement the steps provided in the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
This application also provides an electronic device, which may include a memory and a processor. The memory stores a computer program, and when the processor calls the computer program in the memory, the steps provided in the above embodiments can be implemented. The electronic device may of course also include components such as network interfaces and a power supply.
The embodiments in this specification are described progressively; each focuses on its differences from the others, and for their common or similar parts the embodiments may refer to one another. Because the system disclosed in the embodiments corresponds to the disclosed method, it is described relatively briefly; for related details see the description of the method. It should be noted that those of ordinary skill in the art may make several improvements and modifications to this application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of this application.
It should also be noted that in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.

Claims (10)

1. A fault recovery method for a distributed storage system, characterized by comprising:
When node failure information is detected, using the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
Querying the client information table in the cluster and, according to the query result, setting each healthy node that holds a virtual IP address with connected clients as a target node;
Controlling each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
2. The fault recovery method according to claim 1, characterized in that, before using the master node to perform the cluster IP address reassignment operation on all healthy nodes in the cluster, the method further comprises:
Determining the failed node according to the node failure information, and judging whether the failed node is the master node;
If so, re-electing the master node from among all healthy nodes.
3. The fault recovery method according to claim 1, characterized in that, after setting the healthy nodes that hold virtual IP addresses with connected clients as target nodes, the method further comprises:
Controlling every target node to send an ARP broadcast to all healthy nodes in the cluster, so that all healthy nodes update their ARP tables, where an ARP table stores the mapping between virtual IP addresses and MAC addresses.
4. The fault recovery method according to claim 3, characterized by further comprising:
When an information-sending instruction is received, determining the destination virtual IP address according to the instruction;
Looking up the MAC address corresponding to the destination virtual IP address in the ARP table, and sending the information corresponding to the instruction to that MAC address.
5. The fault recovery method according to claim 1, characterized in that setting the healthy nodes that hold virtual IP addresses with connected clients as target nodes comprises:
Querying, according to the client information table, whether each virtual IP address has connected clients;
If so, setting the healthy node corresponding to that virtual IP address as a target node.
6. The fault recovery method according to claim 1, characterized by further comprising:
Receiving, at a predetermined period, the client information sent by all nodes in the cluster, and updating the client information table according to that client information.
7. The fault recovery method according to claim 1, characterized in that the service node is a node running the CTDB service.
8. A fault recovery system for a distributed storage system, characterized by comprising:
An IP reassignment module, configured to, when node failure information is detected, use the master node to perform a cluster IP address reassignment operation on all healthy nodes in the cluster, so that virtual IP addresses are assigned to the healthy nodes with each address mapped to exactly one node;
A target node determination module, configured to query the client information table in the cluster and, according to the query result, set each healthy node that holds a virtual IP address with connected clients as a target node;
A reconnection module, configured to control each target node to send a TCP reconnection signal to its corresponding clients, so as to restore the service connections.
9. An electronic device, comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the fault recovery method for a distributed storage system according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the steps of the fault recovery method for a distributed storage system according to any one of claims 1 to 7.
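Claims 3 and 4 describe two halves of one mechanism: after failover, each destination node announces its virtual IP addresses by ARP broadcast so that peers overwrite stale VIP-to-MAC entries, and later traffic to a VIP is addressed to whatever MAC the table then holds. The following Python sketch models only that bookkeeping in memory. The `Node` class, `broadcast_arp`, and `send_information` are hypothetical names, no real ARP frames are sent, and the patent does not specify any data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with its own ARP table (VIP -> MAC)."""

    name: str
    mac: str
    arp_table: dict[str, str] = field(default_factory=dict)
    # Frames this node has addressed, recorded as (destination MAC, payload).
    sent: list[tuple[str, bytes]] = field(default_factory=list)

    def on_arp_broadcast(self, vip: str, mac: str) -> None:
        # Claim 3: a received broadcast overwrites any stale mapping for the VIP.
        self.arp_table[vip] = mac

    def send_information(self, destination_vip: str, payload: bytes) -> str:
        # Claim 4: resolve the destination VIP through the ARP table, then
        # address the payload to the MAC found there.
        mac = self.arp_table.get(destination_vip)
        if mac is None:
            raise LookupError(f"no ARP entry for {destination_vip}")
        self.sent.append((mac, payload))
        return mac


def broadcast_arp(destination: Node, vips: list[str], normal_nodes: list[Node]) -> None:
    """Claim 3: a destination node announces each VIP it now owns to every normal node."""
    for vip in vips:
        for node in normal_nodes:
            node.on_arp_broadcast(vip, destination.mac)
```

In a real deployment the announcement would typically be a gratuitous ARP sent by the node taking over the address, and the table would be each host's kernel neighbour cache rather than a Python dict.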
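Claim 5 reduces destination-node selection to a table lookup. A minimal sketch, assuming (hypothetically) that the reassignment result is a `vip_owner` map from VIP to node name and that the client information table maps each VIP to the set of client addresses connected through it:

```python
def select_destination_nodes(
    vip_owner: dict[str, str],
    client_table: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each destination node to the clients it must signal (claim 5).

    vip_owner:    VIP -> normal node that owns it after reassignment (hypothetical).
    client_table: VIP -> client addresses connected through it (hypothetical).
    """
    destinations: dict[str, set[str]] = {}
    for vip, owner in vip_owner.items():
        clients = client_table.get(vip, set())
        if clients:  # "If so": this VIP has at least one connected client
            destinations.setdefault(owner, set()).update(clients)
    return destinations
```

Each entry of the result pairs one destination node with the clients to which, per claim 8's reconnection module, it would send a TCP reconnection signal.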
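Claim 6 keeps the client information table current by having every node report its clients once per period. The sketch below is one possible reading, not the patent's implementation: `ClientInfoTable`, `poll`, and the report format are hypothetical. It deliberately keeps a node's last report when that node goes silent, on the assumption that the table must still list the clients of a node that has just failed, since those are exactly the clients claim 5 needs to find.

```python
import threading
from typing import Callable

Report = dict[str, set[str]]  # VIP -> client addresses, as reported by one node


class ClientInfoTable:
    """Hypothetical master-side client information table (claim 6)."""

    def __init__(self) -> None:
        self._by_node: dict[str, Report] = {}
        self._lock = threading.Lock()

    def apply_reports(self, reports: dict[str, Report]) -> None:
        # Each node's latest report replaces only that node's entries; nodes
        # that did not report this period keep their last known entries.
        with self._lock:
            for node, report in reports.items():
                self._by_node[node] = {vip: set(c) for vip, c in report.items()}

    def clients_by_vip(self) -> dict[str, set[str]]:
        # Flatten per-node reports into the VIP -> clients view used by claim 5.
        with self._lock:
            merged: dict[str, set[str]] = {}
            for report in self._by_node.values():
                for vip, clients in report.items():
                    merged.setdefault(vip, set()).update(clients)
            return merged


def poll(table: ClientInfoTable, collect: Callable[[], dict[str, Report]],
         period_s: float, stop: threading.Event) -> None:
    """Apply one round of collected reports per period until stop is set."""
    while not stop.is_set():
        table.apply_reports(collect())
        stop.wait(period_s)  # returns early if stop is set during the wait
```

Running `poll` in a background thread on the master node gives the "predetermined period" of claim 6; the period value itself is not stated in the claims.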
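Claim 8's IP reallocation module must leave every virtual IP address with exactly one owner among the surviving nodes; that is how this sketch reads "one-to-one". The policy used here (keep VIPs on nodes that are still healthy, move orphaned VIPs to the least-loaded normal node) is an illustrative assumption, and `reassign_vips` is a hypothetical name. The patent does not state an allocation algorithm, and CTDB ships its own allocation logic, which this is not.

```python
def reassign_vips(vip_owner: dict[str, str], normal_nodes: list[str]) -> dict[str, str]:
    """Give every VIP exactly one owner among the normal nodes (hypothetical policy)."""
    if not normal_nodes:
        raise ValueError("no normal node left to host the virtual IPs")
    alive = set(normal_nodes)
    # Keep VIPs whose owner is still healthy, so their connections stay put.
    new_owner = {vip: n for vip, n in vip_owner.items() if n in alive}
    load = {n: 0 for n in normal_nodes}
    for n in new_owner.values():
        load[n] += 1
    # Hand each orphaned VIP to the currently least-loaded normal node;
    # min() returns the first minimum, so ties follow normal_nodes order.
    for vip in sorted(v for v in vip_owner if v not in new_owner):
        target = min(normal_nodes, key=lambda n: load[n])
        new_owner[vip] = target
        load[target] += 1
    return new_owner
```

Leaving healthy owners in place limits how many addresses move; a plain round-robin reassignment would satisfy the same exactly-one-owner reading at the cost of moving more addresses.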
CN201810826771.4A 2018-07-25 2018-07-25 A kind of fault recovery method of distributed memory system, system and associated component Pending CN109032830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826771.4A CN109032830A (en) 2018-07-25 2018-07-25 A kind of fault recovery method of distributed memory system, system and associated component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810826771.4A CN109032830A (en) 2018-07-25 2018-07-25 A kind of fault recovery method of distributed memory system, system and associated component

Publications (1)

Publication Number Publication Date
CN109032830A (en) 2018-12-18

Family

ID=64645229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826771.4A Pending CN109032830A (en) 2018-07-25 2018-07-25 A kind of fault recovery method of distributed memory system, system and associated component

Country Status (1)

Country Link
CN (1) CN109032830A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750379A (en) * 2019-10-28 2020-02-04 无锡华云数据技术服务有限公司 ETCD cluster recovery method, system, equipment and computer medium
CN111258795A (en) * 2019-11-29 2020-06-09 浪潮电子信息产业股份有限公司 Samba cluster fault reconnection method, device, equipment and medium
CN111314117A (en) * 2020-01-20 2020-06-19 苏州浪潮智能科技有限公司 Fault transfer method, device, equipment and readable storage medium
CN111949452A (en) * 2020-09-18 2020-11-17 苏州浪潮智能科技有限公司 Method and device for rapidly recovering IO (input/output) in single-node fault of storage system
CN112511317A (en) * 2020-12-31 2021-03-16 河南信大网御科技有限公司 Input distribution method, input agent and mimicry distributed storage system
CN113596068A (en) * 2020-04-30 2021-11-02 北京金山云网络技术有限公司 Method, device and server for establishing TCP connection
CN114116216A (en) * 2021-11-24 2022-03-01 北京大道云行科技有限公司 Method and device for realizing high availability of distributed block storage based on vip
CN114285729A (en) * 2021-11-29 2022-04-05 苏州浪潮智能科技有限公司 Distributed cluster management node deployment method, device, equipment and storage medium
CN114553900A (en) * 2022-02-18 2022-05-27 苏州浪潮智能科技有限公司 Distributed block storage management system and method and electronic equipment
CN115437843A (en) * 2022-08-25 2022-12-06 北京万里开源软件有限公司 Database storage partition recovery method based on multi-level distributed consensus
CN115866018A (en) * 2023-02-28 2023-03-28 浪潮电子信息产业股份有限公司 Service processing method and device, electronic equipment and computer readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102932500A (en) * 2012-11-07 2013-02-13 曙光信息产业股份有限公司 Method and system for taking over fault interface node
CN103475732A (en) * 2013-09-25 2013-12-25 浪潮电子信息产业股份有限公司 Distributed file system data volume deployment method based on virtual address pool
CN104090992A (en) * 2014-08-06 2014-10-08 浪潮电子信息产业股份有限公司 Method for high-availability configuration between conversion nodes in cluster storage system
US9342390B2 (en) * 2013-01-31 2016-05-17 International Business Machines Corporation Cluster management in a shared nothing cluster
US20170220418A1 (en) * 2009-12-29 2017-08-03 International Business Machines Corporation Determining completion of migration in a dispersed storage network

Non-Patent Citations (2)

Title
Thanda Shwe et al.: "A fault tolerant approach in cluster computing system", 2008 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology *
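Li Changlong (李昌隆): "Research and Implementation of Data Access and Storage Interfaces in Cloud Storage Systems" (云存储系统中数据访问和存储接口的研究与实现), China Master's Theses Full-text Database, Information Science and Technology Series *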

Cited By (17)

Publication number Priority date Publication date Assignee Title
CN110750379A (en) * 2019-10-28 2020-02-04 无锡华云数据技术服务有限公司 ETCD cluster recovery method, system, equipment and computer medium
CN110750379B (en) * 2019-10-28 2023-10-31 无锡华云数据技术服务有限公司 ETCD cluster recovery method, system, equipment and computer medium
CN111258795A (en) * 2019-11-29 2020-06-09 浪潮电子信息产业股份有限公司 Samba cluster fault reconnection method, device, equipment and medium
CN111258795B (en) * 2019-11-29 2022-06-17 浪潮电子信息产业股份有限公司 Samba cluster fault reconnection method, device, equipment and medium
CN111314117A (en) * 2020-01-20 2020-06-19 苏州浪潮智能科技有限公司 Fault transfer method, device, equipment and readable storage medium
CN113596068A (en) * 2020-04-30 2021-11-02 北京金山云网络技术有限公司 Method, device and server for establishing TCP connection
CN111949452A (en) * 2020-09-18 2020-11-17 苏州浪潮智能科技有限公司 Method and device for rapidly recovering IO (input/output) in single-node fault of storage system
CN111949452B (en) * 2020-09-18 2022-09-20 苏州浪潮智能科技有限公司 Method and device for rapidly recovering IO (input/output) in single-node fault of storage system
CN112511317A (en) * 2020-12-31 2021-03-16 河南信大网御科技有限公司 Input distribution method, input agent and mimicry distributed storage system
CN114116216A (en) * 2021-11-24 2022-03-01 北京大道云行科技有限公司 Method and device for realizing high availability of distributed block storage based on vip
CN114285729B (en) * 2021-11-29 2023-08-25 苏州浪潮智能科技有限公司 Distributed cluster management node deployment method, device, equipment and storage medium
CN114285729A (en) * 2021-11-29 2022-04-05 苏州浪潮智能科技有限公司 Distributed cluster management node deployment method, device, equipment and storage medium
CN114553900A (en) * 2022-02-18 2022-05-27 苏州浪潮智能科技有限公司 Distributed block storage management system and method and electronic equipment
CN114553900B (en) * 2022-02-18 2023-08-04 苏州浪潮智能科技有限公司 Distributed block storage management system, method and electronic equipment
CN115437843A (en) * 2022-08-25 2022-12-06 北京万里开源软件有限公司 Database storage partition recovery method based on multi-level distributed consensus
CN115866018B (en) * 2023-02-28 2023-05-16 浪潮电子信息产业股份有限公司 Service processing method, device, electronic equipment and computer readable storage medium
CN115866018A (en) * 2023-02-28 2023-03-28 浪潮电子信息产业股份有限公司 Service processing method and device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN109032830A (en) A kind of fault recovery method of distributed memory system, system and associated component
CN113037560B (en) Service flow switching method and device, storage medium and electronic equipment
US7856488B2 (en) Electronic device profile migration
CN1554055B (en) High-availability cluster virtual server system
CN107465721B (en) Global load balancing method and system based on double-active architecture and scheduling server
CN108780386A (en) A kind of methods, devices and systems of data storage
US9419890B2 (en) Streaming service load sharing method, streaming service processing method, and corresponding device and system
US9118595B2 (en) Graceful failover of a principal link in a fiber-channel fabric
CN110069210B (en) Storage system, and method and device for allocating storage resources
US10001945B2 (en) Method of storing data and data storage managing server
EP2418824A1 (en) Method for resource information backup operation based on peer to peer network and peer to peer network thereof
CN106059791A (en) Business link switching method and storage device in storage system
CN103546315A (en) System, method and equipment for backing up DHCP (dynamic host configuration protocol) server
KR101586354B1 (en) Communication failure recover method of parallel-connecte server system
CN104967691A (en) Distributed storage control method and system
CN114500523A (en) Fixed IP application release method based on container cloud platform
US11153173B1 (en) Dynamically updating compute node location information in a distributed computing environment
CN108089934A (en) Cluster management method and cluster server
US20210326224A1 (en) Method and system for processing device failure
JP2005011331A (en) Load distribution system and computer management program
US11977450B2 (en) Backup system, method therefor, and program
CN114138475A (en) Data transmission load balancing method, device, equipment and storage medium
CN109788007B (en) Cloud platform based on two places and three centers and communication method thereof
US20060168108A1 (en) Methods and systems for defragmenting subnet space within an adaptive infrastructure
CN110855495B (en) Task dynamic balancing method, device, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181218