CN108847982A - A kind of distributed storage cluster and its node failure switching method and apparatus - Google Patents

A kind of distributed storage cluster and its node failure switching method and apparatus

Info

Publication number
CN108847982A
Authority
CN
China
Prior art keywords
node
business
distributed storage
business information
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810668234.1A
Other languages
Chinese (zh)
Other versions
CN108847982B (en)
Inventor
孙业宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810668234.1A priority Critical patent/CN108847982B/en
Publication of CN108847982A publication Critical patent/CN108847982A/en
Application granted granted Critical
Publication of CN108847982B publication Critical patent/CN108847982B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a power-off switching method and device for nodes of a distributed storage cluster, applied to the host node of the distributed storage cluster. The method includes: detecting the state of each node in the cluster using the heartbeat detection mechanism of CTDB; after a powered-off node is detected, obtaining the business information of the powered-off node; and sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node that receives the business information performs business drift and business recovery according to it. The invention shortens the detection and recovery time for a powered-off node from the original minute level to the second level, speeds up both the cluster's return to normal and the restoration of access to the powered-off node's business, and improves the reliability of the cluster. The invention also discloses a distributed storage cluster based on the above method.

Description

A kind of distributed storage cluster and its node failure switching method and apparatus
Technical field
The present invention relates to the technical field of high availability for distributed clusters, and in particular to a power-off switching method and device for nodes of a distributed storage cluster. The invention also relates to a distributed storage cluster.
Background
A distributed storage cluster is a cluster made up of multiple storage node servers. It supports storing one piece of data on multiple nodes, and each node can obtain the complete data through inter-node communication. When a node goes down, part of the data can be recovered according to the configured policy. A distributed storage cluster includes service modules such as a monitoring module, a storage pool module, and a metadata management module.
While a distributed storage cluster is running, some nodes may lose power because of faults such as a loose or unplugged power cable. If the number of powered-off nodes is within the number the cluster tolerates (i.e., the cluster's node redundancy), the cluster currently needs minutes to return to normal and resume normal business access. The reason is that, in current distributed storage clusters, each service module confirms whether a node has powered off through its own heartbeat detection, and the heartbeat detection precision of the service modules is at the minute level, i.e., 60 s or more (a value below 60 s would cause the cluster to jitter). At present, therefore, it takes 60 s or more to confirm that a node has powered off before cluster recovery and recovery of the powered-off node's business can begin.
It can be seen that, in the current node power-off detection and recovery process, the cluster cannot detect a power-off fault quickly and therefore cannot quickly recover the cluster and restore business access on the powered-off node, which results in long business interruption and poor cluster reliability.
Therefore, how to provide a highly reliable power-off switching method and device for nodes of a distributed storage cluster, as well as a distributed storage cluster, is a problem that those skilled in the art currently need to solve.
Summary of the invention
An object of the present invention is to provide a power-off switching method and device for nodes of a distributed storage cluster that shorten the detection and recovery time for a powered-off node from the original minute level to the second level, speed up both the cluster's return to normal and the restoration of access to the powered-off node's business, and improve the reliability of the cluster. Another object of the present invention is to provide a distributed storage cluster based on the above method.
To solve the above technical problem, the present invention provides a power-off switching method for nodes of a distributed storage cluster, applied to the host node of the distributed storage cluster, the method including:
detecting the state of each node in the cluster using the heartbeat detection mechanism of CTDB (clustered trivial database, a lightweight cluster database);
after detecting that a node has powered off, obtaining the business information of the powered-off node;
sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node that receives the business information performs business drift and business recovery according to the business information.
Preferably, after detecting that a node has powered off and before obtaining the business information of the powered-off node, the method further includes:
judging whether the powered-off node was identified through heartbeat detection and, if so, obtaining the business information of the powered-off node.
Preferably, the business information includes a virtual IP address.
Preferably, the business information further includes business cache data.
Preferably, the process of sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions specifically includes:
calling the failover program in the distributed storage cluster;
selecting normal nodes that provide each service function;
sending the business information to the selected nodes.
Preferably, the service functions include a monitoring function, a storage pool function, and a metadata management function.
Preferably, the process of detecting node state using the CTDB heartbeat detection mechanism specifically includes:
sending several heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period;
judging whether responses from all nodes are received within a preset time; if any node has not returned a response, that node is a powered-off node.
To solve the above technical problem, the present invention also provides a power-off switching device for nodes of a distributed storage cluster, applied to the host node of the distributed storage cluster, the device including:
a state monitoring module, configured to detect the state of each node in the cluster using the heartbeat detection mechanism of CTDB;
an information obtaining module, configured to obtain the business information of a powered-off node after a node power-off is detected;
a sending module, configured to send the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node that receives the business information performs business drift and business recovery according to the business information.
To solve the above technical problem, the present invention also provides a distributed storage cluster including multiple nodes provided with the CTDB function, one of which is elected as the host node; the host node includes:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the power-off switching method for nodes of a distributed storage cluster according to any of the above.
Preferably, the nodes other than the host node are specifically configured to:
perform their own business recovery operation and the business drift operation according to the business information in parallel.
The present invention provides a power-off switching method and device for nodes of a distributed storage cluster. The heartbeat detection of CTDB is used to detect whether the cluster contains a powered-off node. If one is detected, its business information is obtained and sent to normal nodes that have the corresponding service functions (in other words, the corresponding service modules), so that the normal nodes that receive the business information perform business drift and restore business access for the powered-off node. Because the time precision of CTDB heartbeat detection is at the second level, usually a few seconds, CTDB heartbeat detection can quickly detect whether a node has powered off, and the powered-off node's business information can be sent to normal nodes so that they carry out data recovery and business drift in time. The present invention thus shortens the detection and recovery time for a powered-off node from the original minute level to the second level and speeds up both the cluster's return to normal and the restoration of access to the powered-off node's business, which shortens business interruption as far as possible and improves the reliability of the cluster. The present invention also provides a distributed storage cluster based on the above method, which has the same advantages.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a power-off switching method for nodes of a distributed storage cluster provided by the present invention;
Fig. 2 is a structural schematic diagram of a power-off switching device for nodes of a distributed storage cluster provided by the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a power-off switching method and device for nodes of a distributed storage cluster that shorten the detection and recovery time for a powered-off node from the original minute level to the second level, speed up both the cluster's return to normal and the restoration of access to the powered-off node's business, and improve the reliability of the cluster. Another core of the present invention is to provide a distributed storage cluster based on the above method.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides a power-off switching method for nodes of a distributed storage cluster, applied to the host node of the distributed storage cluster. Referring to Fig. 1, which is a flowchart of the method, the method includes:
Step s1: detecting the state of each node in the cluster using the heartbeat detection mechanism of CTDB;
It can be understood that CTDB (clustered trivial database, a lightweight cluster database) is cluster high-availability management software used to monitor the state of cluster nodes and to distribute business. In a distributed storage cluster with the CTDB function, the CTDB software is usually installed on every cluster node so that each node can perform heartbeat detection through CTDB, and the nodes exchange their detection results. All nodes in such a cluster elect one host node (the master), and failure recovery operations, such as allocating virtual IP addresses, are then carried out only by the host node.
In current distributed storage clusters, CTDB already provides heartbeat detection with second-level time precision, but its detection result is not used for cluster and business recovery after a node powers off. In the prior art, whether to recover the cluster and the powered-off node's business is decided by the detection results of the service modules in the nodes, whose time precision is at the minute level, so recovery is slow and inefficient. In other words, in the prior art the CTDB heartbeat detection and the business recovery of a powered-off node are two unrelated processes. The present invention links them: the CTDB heartbeat detection result controls the subsequent recovery of the cluster and of the powered-off node's business, which shortens the recovery time to the second level and improves the efficiency and reliability of cluster recovery.
Step s2: after a powered-off node is detected, obtaining the business information of the powered-off node;
After a node powers off, the business on it is necessarily interrupted. To restore normal access as soon as possible, the business must drift (in other words, be switched) to other normal nodes, so the information about the business that was running on the powered-off node must be determined in order to select suitable nodes and carry out the drift. Because the node has already lost power, its business information is usually obtained from the host node: the host node is responsible for distributing business and therefore holds the relevant information about the business running on every node.
Step s3: sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node that receives the business information performs business drift and business recovery according to the business information.
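As an illustration only, steps s1 to s3 might be sketched on the host node as follows. The `Node` class, the `business_table` mapping, and the `handle_power_off` function are hypothetical names introduced for this sketch; they are not part of CTDB or of the claimed method, and the `alive` field simply stands in for the result of heartbeat detection.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node used only for this sketch."""
    name: str
    services: set                 # service functions hosted, e.g. {"monitor"}
    alive: bool = True            # stands in for the heartbeat result (s1)
    received: list = field(default_factory=list)  # business info delivered here


def handle_power_off(nodes, business_table, required_services):
    """Run steps s1-s3 on the host node and report where each business was sent."""
    dispatched = {}
    for failed in [n for n in nodes if not n.alive]:       # s1: powered-off nodes
        info = business_table[failed.name]                 # s2: read the host node's own record
        targets = [n for n in nodes
                   if n.alive and n.services & required_services]  # s3: healthy nodes with the service
        for target in targets:
            target.received.append(info)                   # receiver would then drift and recover
        dispatched[failed.name] = [t.name for t in targets]
    return dispatched
```

The business information is looked up in a table held by the host node rather than requested from the failed node, which mirrors the point that the powered-off node can no longer answer.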
It can be understood that a distributed storage cluster contains multiple nodes that are interrelated and process business jointly. Different nodes may have different functions, i.e., they may contain the same service modules or different ones. To ensure that the business can run normally after drifting, the service functions that the business requires must first be determined, and the business information is then distributed to normal nodes that provide those service functions, so that the powered-off node's business drifts to them. Once these normal nodes have returned to normal, they can jointly serve access to the business according to the business information.
In addition, business recovery can also be understood as node recovery. Because the nodes of a distributed storage cluster are interrelated, the power-off of one node also affects the other nodes and prevents them from working normally. To run the powered-off node's business on other nodes, it is therefore not enough to drift the business to them; their configuration must also be adjusted so that they return to a normal working state. If the business to be drifted has specific requirements, the configuration data of these nodes must also be adjusted during recovery so that they can support the business that drifts to them.
Because the business can run only after both business drift and business recovery have completed, the time at which the business has successfully switched to other nodes is the time at which both are complete. If business drift and business recovery are carried out serially, the switching time equals the sum of the two; if they are carried out in parallel, the switching time is determined by the longer of the two. Parallel operation is therefore faster than serial operation. Both serial and parallel operation fall within the protection scope of the present invention.
Experiments show that, with the above operations, power-off fault detection can usually be kept within 10 seconds, and business drift and node business recovery each take about 10 seconds, so the whole business switch can be kept within 30 seconds. Compared with the current minute-level switching, business recovery is faster, which improves the reliability and stability of the cluster.
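The timing relationship described above can be checked with a small calculation. The function name `switch_time` is hypothetical, and the 10-second values are the approximate figures quoted in the text, not measured results.

```python
def switch_time(detect_s, drift_s, recover_s, parallel):
    """Total switching time: detection, then drift and recovery (serial or parallel)."""
    if parallel:
        work = max(drift_s, recover_s)   # the longer of the two determines completion
    else:
        work = drift_s + recover_s       # serial: the two durations add up
    return detect_s + work


serial_total = switch_time(10, 10, 10, parallel=False)    # 30 seconds
parallel_total = switch_time(10, 10, 10, parallel=True)   # 20 seconds
```

With the quoted figures, the serial case reaches the 30-second bound stated above, and the parallel case finishes sooner.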
The service functions include a monitoring function, a storage pool function, and a metadata management function.
It can be understood that storage and metadata management functions are essential for a cluster to run normally, and a monitoring function is also needed to monitor how the business is running so that problems are found in time. A distributed storage cluster may of course contain other service functions as well; the present invention does not limit this.
In a preferred embodiment, after a powered-off node is detected and before its business information is obtained, the method further includes:
judging whether the powered-off node was identified through heartbeat detection and, if so, obtaining the business information of the powered-off node.
It can be understood that, although the present invention uses CTDB heartbeat detection to detect whether a node has powered off, a node that is ultimately reported as powered off has not necessarily been identified by heartbeat detection: when CTDB executes a stop or restart command, it also marks its own node as a powered-off node, which is clearly wrong. To distinguish this case, before the business information of a powered-off node is obtained, it must first be determined whether the node was identified by heartbeat detection. Only a faulty node detected by the heartbeat detection function is treated as a powered-off node; otherwise no handling is performed. One specific implementation is to add a flag bit to the identifier of each powered-off node found by heartbeat detection and then check whether a reported node carries the flag bit. This is only one possible implementation; whether a node is a powered-off node may also be judged in other ways, and the present invention does not limit this.
In a specific embodiment, the business information includes a virtual IP address.
It can be understood that, in a distributed storage cluster, virtual IP addresses correspond one-to-one with businesses, and the CTDB host node is responsible for allocating them. When a cluster node fails, the virtual IP addresses on that node are drifted (reallocated) to other nodes so that its business remains accessible, and the node's business then drifts to those nodes along with the virtual IP addresses, which guarantees the high availability of the cluster.
Of course, for most distributed storage clusters business drift requires only the virtual IP address, but in some cases it may also rely on other parameters; the present invention does not limit this.
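A minimal sketch of virtual IP drift is given below. The round-robin redistribution and the function name `drift_virtual_ips` are assumptions made for illustration; the text does not state which allocation policy the host node uses, and this is not CTDB's actual algorithm.

```python
def drift_virtual_ips(assignments, failed_node, healthy_nodes):
    """Return a new VIP -> node mapping with the failed node's VIPs moved elsewhere.

    Round-robin over healthy_nodes is an assumed policy for this sketch.
    """
    if not healthy_nodes:
        raise ValueError("no healthy node available to take over")
    result = dict(assignments)                       # leave the input untouched
    orphaned = sorted(vip for vip, node in assignments.items()
                      if node == failed_node)
    for i, vip in enumerate(orphaned):
        result[vip] = healthy_nodes[i % len(healthy_nodes)]
    return result


before = {"10.0.0.1": "n1", "10.0.0.2": "n2", "10.0.0.3": "n2"}
after = drift_virtual_ips(before, failed_node="n2", healthy_nodes=["n1", "n3"])
```

Because each business is tied to one virtual IP address, moving the address is what lets clients reach the business on its new node without reconfiguration.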
In addition, the business information may further include business cache data. In some cases, continuing to access a business requires earlier data, and the virtual IP address alone is not enough; the business information then needs to include business cache data. The business information may also include, for example, the host number of the powered-off node. The present invention does not limit the specific content of the business information.
In a specific embodiment, the process of sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions specifically includes:
calling the failover program in the distributed storage cluster;
selecting normal nodes that provide each service function;
sending the business information to the selected nodes.
It can be understood that, in current distributed storage clusters, each node is usually provided with a failover program. Because the nodes in the cluster exchange data, once one node calls its own failover program the program on the other nodes also runs and prepares for business switching. Therefore, after the host node obtains the business information, it directly calls its own failover program, which also activates the program on the other nodes. The program has a node selection function: after the host node calls it, the program itself selects suitable nodes, and the host node then sends the business information to the selected nodes. Since the failover program on the selected nodes has already been started, each node can begin business switching (business drift, business recovery, etc.) as soon as it receives the business information.
Of course, if the cluster has no failover program with the above functions, the host node can analyze the configuration of each node in the cluster by itself and select the corresponding nodes. The present invention does not limit how the nodes that receive the business information are selected.
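For the case where the host node selects nodes by itself, the selection step might be sketched as follows. The function name `select_nodes`, the service labels, and the rule of taking the first candidate in name order are all assumptions for illustration; the text leaves the selection policy open.

```python
def select_nodes(healthy, required):
    """Pick one healthy node for each required service function.

    healthy:  dict mapping node name -> set of service functions it provides
    required: iterable of service function names the business needs
    Returns a dict mapping service function -> chosen node name.
    """
    chosen = {}
    for service in sorted(required):
        candidates = sorted(name for name, services in healthy.items()
                            if service in services)
        if not candidates:
            raise LookupError(f"no healthy node provides {service!r}")
        chosen[service] = candidates[0]   # assumed tie-break: first by name
    return chosen


healthy = {
    "n1": {"monitor", "storage_pool"},
    "n3": {"metadata", "storage_pool"},
}
plan = select_nodes(healthy, {"monitor", "storage_pool", "metadata"})
```

Raising an error when a service function has no healthy provider makes the failure visible instead of silently dropping part of the business.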
The process of detecting node state using the CTDB heartbeat detection mechanism specifically includes:
sending several heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period;
judging whether responses from all nodes are received within a preset time; if any node has not returned a response, that node is a powered-off node.
The preset time here generally corresponds to the heartbeat detection period; because sending and receiving signals takes time, the preset time is preferably slightly longer than the heartbeat detection period. The present invention does not limit the specific value of the preset time.
For example, assume that each heartbeat detection period is 4 seconds (the present invention does not limit the interval between two heartbeat periods) and that a heartbeat packet is sent every 2 seconds, 2 packets in total; if the peer node does not answer a heartbeat packet within those 4 seconds, the node is considered faulty. Alternatively, the heartbeat detection period may be 8 seconds, with a heartbeat packet sent every 2 seconds, 4 packets in total, which avoids misjudging a fault because the heartbeat period is too short. The present invention does not limit the length of the heartbeat detection period or the sending frequency of heartbeat packets.
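The detection rule in the example above can be sketched without real network traffic or timers. Here `probe` is a hypothetical callable standing in for "send one heartbeat packet and report whether a response arrived in time"; it is not a CTDB interface, and `detect_power_off` is likewise an illustrative name.

```python
def detect_power_off(nodes, probe, packets_per_period=2):
    """Return the nodes that answered none of the heartbeat packets in one period.

    probe(node) -> bool is an assumed stand-in for sending one heartbeat
    packet and waiting for its response within the preset time.
    """
    failed = []
    for node in nodes:
        # A node counts as powered off only if every packet in the period goes unanswered.
        if not any(probe(node) for _ in range(packets_per_period)):
            failed.append(node)
    return failed


# Simulated responses for one 4-second period with one packet every 2 seconds.
responses = {"n1": iter([False, True]), "n2": iter([False, False])}
failed_nodes = detect_power_off(["n1", "n2"], lambda n: next(responses[n]))
```

Sending more than one packet per period means a single lost packet, as for `n1` here, does not by itself mark a node as powered off.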
The present invention provides a power-off switching method for nodes of a distributed storage cluster. The heartbeat detection of CTDB is used to detect whether the cluster contains a powered-off node. If one is detected, its business information is obtained and sent to normal nodes that have the corresponding service functions (in other words, the corresponding service modules), so that the normal nodes that receive the business information perform business drift and restore business access for the powered-off node. Because the time precision of CTDB heartbeat detection is at the second level, usually a few seconds, CTDB heartbeat detection can quickly detect whether a node has powered off, and the powered-off node's business information can be sent to normal nodes so that they carry out data recovery and business drift in time. The present invention thus shortens the detection and recovery time for a powered-off node from the original minute level to the second level and speeds up both the cluster's return to normal and the restoration of access to the powered-off node's business, which shortens business interruption as far as possible and improves the reliability of the cluster.
The present invention also provides a power-off switching device for nodes of a distributed storage cluster, applied to the host node of the distributed storage cluster. Referring to Fig. 2, which is a structural schematic diagram of the device.
The device includes:
a state monitoring module 1, configured to detect the state of each node in the cluster using the heartbeat detection mechanism of CTDB;
an information obtaining module 2, configured to obtain the business information of a powered-off node after the powered-off node is detected;
a sending module 3, configured to send the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node that receives the business information performs business drift and business recovery according to the business information.
The present invention also provides a distributed storage cluster including multiple nodes provided with the CTDB function, one of which is elected as the host node. The host node includes:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the power-off switching method for nodes of a distributed storage cluster according to any of the above.
In a preferred embodiment, the nodes other than the host node are specifically configured to:
perform their own business recovery operation and the business drift operation according to the business information in parallel.
It can be understood that, because the business recovery operation and the business drift operation do not interfere with each other, running them in parallel shortens the business switching time as far as possible compared with running them serially.
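On a receiving node, running the two operations in parallel might be sketched with Python's standard `concurrent.futures` module. The `recover` and `drift` callables and the `switch_business` wrapper are hypothetical placeholders for the node's own recovery and drift routines, which the text does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def switch_business(recover, drift, info):
    """Run the node's own recovery and the business drift concurrently.

    recover() and drift(info) are assumed placeholders for the real routines.
    Returns both results once the slower of the two has finished.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        recover_future = pool.submit(recover)
        drift_future = pool.submit(drift, info)
        return recover_future.result(), drift_future.result()


result = switch_business(
    recover=lambda: "recovered",
    drift=lambda info: f"drifted {info['vip']}",
    info={"vip": "10.0.0.9"},
)
```

The call returns only when both operations are done, which matches the statement that the business can run again only after drift and recovery have both completed.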
Above several specific embodiments are only the preferred embodiment of the present invention, and above several specific embodiments can be with Any combination, the embodiment obtained after combination is also within protection scope of the present invention.It should be pointed out that for the art For those of ordinary skill, relevant speciality technical staff deduced out in the case where not departing from spirit of that invention and concept thereof other change Into and variation, should all be included in the protection scope of the present invention.
Each embodiment in this specification is described in a progressive manner, the highlights of each of the examples are with other The difference of embodiment, the same or similar parts in each embodiment may refer to each other.For device disclosed in embodiment For, since it is corresponded to the methods disclosed in the examples, so being described relatively simple, related place is said referring to method part It is bright.
It should also be noted that, in the present specification, relational terms such as first and second and the like be used merely to by One entity or operation are distinguished with another entity or operation, without necessarily requiring or implying these entities or operation Between there are any actual relationship or orders.Moreover, the terms "include", "comprise" or its any other variant meaning Covering non-exclusive inclusion, so that the process, method, article or equipment for including a series of elements not only includes that A little elements, but also including other elements that are not explicitly listed, or further include for this process, method, article or The intrinsic element of equipment.In the absence of more restrictions, the element limited by sentence "including a ...", is not arranged Except there is also other identical elements in the process, method, article or apparatus that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A node power-off switching method for a distributed storage cluster, characterized in that the method is applied to a host node of the distributed storage cluster and comprises:
detecting the state of each node in the cluster according to the heartbeat detection mode of CTDB, a lightweight cluster database;
after a node power-off is detected, obtaining the business information of the power-off node;
sending the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the business information performs business drift and business recovery according to the business information.
2. The node failure switching method according to claim 1, characterized in that, after the node power-off is detected and before the business information of the power-off node is obtained, the method further comprises:
determining whether the power-off node was identified by heartbeat detection, and if so, obtaining the business information of the power-off node.
3. The node failure switching method according to claim 2, characterized in that the business information includes a virtual IP address.
4. The node failure switching method according to claim 3, characterized in that the business information further includes business cache data.
5. The node failure switching method according to claim 1, characterized in that the process of sending the business information to the normal nodes in the distributed storage cluster that have the corresponding service functions is specifically:
calling the failover program in the distributed storage cluster;
selecting normal nodes that include each service function;
sending the business information to the selected nodes.
6. The node failure switching method according to claim 1, characterized in that the service functions include a monitoring function, a storage pool function, and a metadata management function.
7. The node failure switching method according to claim 2, characterized in that the process of detecting node state according to the CTDB heartbeat detection mode is specifically:
sending several heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period;
determining whether responses from all nodes are received within a preset time, and if there is a node that has not returned a response, that node is a power-off node.
8. A node power-off switching device for a distributed storage cluster, characterized in that the device is applied to a host node of the distributed storage cluster and comprises:
a state monitoring module, configured to detect the state of each node in the cluster according to the heartbeat detection mode of CTDB;
an information obtaining module, configured to obtain the business information of the power-off node after a node power-off is detected;
a sending module, configured to send the business information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the business information performs business drift and business recovery according to the business information.
9. A distributed storage cluster, characterized by comprising multiple nodes provided with the CTDB function, one of which is elected as the host node; the host node includes:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the node power-off switching method for a distributed storage cluster according to any one of claims 1 to 7.
10. The distributed storage cluster according to claim 9, characterized in that each node other than the host node is specifically configured to:
perform, in parallel, its own business recovery operation and the business drift operation according to the business information.
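The host-node flow recited in claims 1, 5, and 7 can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the patented implementation and not the CTDB API: the `Node` class, its fields, the function names, and the service function labels are all hypothetical, heartbeat responses are modeled by a boolean flag instead of network traffic, and the business information is assumed to be a record the host node can read.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory model of a cluster node."""
    name: str
    functions: set                 # service functions this node provides
    powered_on: bool = True
    business_info: dict = field(default_factory=dict)  # record held for this node
    received: list = field(default_factory=list)       # business info sent to it

    def respond_to_heartbeat(self) -> bool:
        # A powered-off node never answers, which models
        # "no response within the preset time".
        return self.powered_on


def detect_power_off_nodes(nodes, packets_per_period=3):
    """Send several heartbeat packets per period; a node that answers
    none of them is treated as a power-off node."""
    failed = []
    for node in nodes:
        responses = [node.respond_to_heartbeat() for _ in range(packets_per_period)]
        if not any(responses):
            failed.append(node)
    return failed


def select_targets(failed_node, nodes):
    """For each service function of the failed node, pick the first
    normal node that provides the same function."""
    targets = {}
    for function in sorted(failed_node.functions):
        for node in nodes:
            if node.powered_on and function in node.functions:
                targets[function] = node
                break
    return targets


def fail_over(nodes):
    """Detect power-off nodes and send their business information to
    normal nodes with the corresponding service functions."""
    dispatched = {}
    for failed in detect_power_off_nodes(nodes):
        info = failed.business_info
        for function, target in select_targets(failed, nodes).items():
            target.received.append((function, info))
            dispatched.setdefault(failed.name, []).append((function, target.name))
    return dispatched
```

In this sketch, each receiving node would then carry out business drift and business recovery using the entries in its `received` list; how a real cluster selects among several eligible normal nodes is not specified here, so taking the first match is simply a placeholder choice.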
CN201810668234.1A 2018-06-26 2018-06-26 Distributed storage cluster and node fault switching method and device thereof Active CN108847982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810668234.1A CN108847982B (en) 2018-06-26 2018-06-26 Distributed storage cluster and node fault switching method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810668234.1A CN108847982B (en) 2018-06-26 2018-06-26 Distributed storage cluster and node fault switching method and device thereof

Publications (2)

Publication Number Publication Date
CN108847982A true CN108847982A (en) 2018-11-20
CN108847982B CN108847982B (en) 2021-11-19

Family

ID=64203566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810668234.1A Active CN108847982B (en) 2018-06-26 2018-06-26 Distributed storage cluster and node fault switching method and device thereof

Country Status (1)

Country Link
CN (1) CN108847982B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110314339A1 (en) * 2010-06-22 2011-12-22 International Business Machines Corporation Systems for agile error determination and reporting and methods thereof
CN102394791A (en) * 2011-10-26 2012-03-28 浪潮(北京)电子信息产业有限公司 Downtime recovery method and system
CN103607297A (en) * 2013-11-07 2014-02-26 上海爱数软件有限公司 Fault processing method of computer cluster system
CN108009045A (en) * 2016-10-31 2018-05-08 杭州海康威视数字技术股份有限公司 A kind of master/slave data storehouse fault handling method and device
CN106933693A (en) * 2017-03-15 2017-07-07 郑州云海信息技术有限公司 A kind of data-base cluster node failure self-repairing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘爱贵: "Building a high-performance cluster NAS system based on open-source software", 《CSDN:HTTPS://BLOG.CSDN.NET/LIUAIGUI/ARTICLE/DETAILS/7163482》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783264A (en) * 2018-12-29 2019-05-21 南京富士通南大软件技术有限公司 A kind of High Availability solution of database
CN111865632A (en) * 2019-04-28 2020-10-30 阿里巴巴集团控股有限公司 Switching method of distributed data storage cluster and switching instruction sending method and device
CN110286732A (en) * 2019-06-27 2019-09-27 无锡华云数据技术服务有限公司 High-availability cluster power down automatic recovery method, device, equipment and storage medium
CN110286732B (en) * 2019-06-27 2021-01-12 华云数据控股集团有限公司 Method, device and equipment for automatically recovering power failure of high-availability cluster and storage medium
CN110611603A (en) * 2019-09-09 2019-12-24 苏州浪潮智能科技有限公司 Cluster network card monitoring method and device
CN110740064A (en) * 2019-10-25 2020-01-31 北京浪潮数据技术有限公司 Distributed cluster node fault processing method, device, equipment and storage medium
CN110855504A (en) * 2019-11-22 2020-02-28 苏州浪潮智能科技有限公司 Method, system and related device for recovering faults of cloud platform management nodes
CN112711632A (en) * 2019-12-27 2021-04-27 山东鲁能软件技术有限公司 Asynchronous data stream replication method and system for high-availability cluster
CN111212127A (en) * 2019-12-29 2020-05-29 浪潮电子信息产业股份有限公司 Storage cluster, service data maintenance method, device and storage medium
CN111756573A (en) * 2020-05-28 2020-10-09 浪潮电子信息产业股份有限公司 CTDB double-network-card fault monitoring method in distributed cluster and related equipment
CN112035326A (en) * 2020-09-03 2020-12-04 中国银行股份有限公司 Abnormal node task processing method and device based on cluster node mutual detection
CN112866408A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Service switching method, device, equipment and storage medium in cluster
CN113162797A (en) * 2021-03-03 2021-07-23 山东英信计算机技术有限公司 Method, system and medium for switching master node fault of distributed cluster
CN113794595A (en) * 2021-09-15 2021-12-14 领云悠逸(北京)科技有限公司 IoT (Internet of things) equipment high-availability method based on industrial Internet
CN114584489A (en) * 2022-03-08 2022-06-03 浪潮云信息技术股份公司 Ssh channel-based remote environment information and configuration detection method and system

Also Published As

Publication number Publication date
CN108847982B (en) 2021-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant