CN108847982A - Distributed storage cluster and node failure switching method and apparatus thereof - Google Patents
- Publication number: CN108847982A
- Application number: CN201810668234.1A
- Authority
- CN
- China
- Prior art keywords
- node
- business
- distributed storage
- business information
- power
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1034—Reaction to server failures by a load balancer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Cardiology (AREA)
- General Health & Medical Sciences (AREA)
- Hardware Redundancy (AREA)
Abstract
The invention discloses a node power-off switching method for a distributed storage cluster and a corresponding apparatus, applied to the host node of the distributed storage cluster. The method includes: detecting the state of each node in the cluster using CTDB heartbeat detection; after a powered-off node is detected, obtaining the service information of the powered-off node; and sending the service information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to it. The invention shortens the detection and recovery time for a powered-off node from the minute level to the second level, speeds up the return of the cluster to normal operation and the restoration of access to the services of the powered-off node, and improves the reliability of the cluster. The invention also discloses a distributed storage cluster based on the above method.
Description
Technical field
The present invention relates to the technical field of high availability for distributed clusters, and more particularly to a node power-off switching method and apparatus for a distributed storage cluster. The invention further relates to a distributed storage cluster.
Background art
A distributed storage cluster is a cluster made up of multiple storage-node servers. It supports storing one piece of data on multiple nodes, and each node can obtain the complete data through inter-node communication. When a node goes down, part of the data can be recovered according to the configured policy. A distributed storage cluster contains service modules such as a monitoring module, a storage pool module, and a metadata management module.
While a distributed storage cluster is running, some nodes may lose power because of faults such as a loose or unplugged power cable. If the number of powered-off nodes is within the number the cluster tolerates (i.e., within the cluster's node redundancy), it currently takes on the order of minutes for the cluster to return to normal and continue serving requests. The reason is that, in current distributed storage clusters, each service module determines whether a node has lost power through its own heartbeat detection, and the heartbeat precision of these service modules is at the minute level, i.e., 60 s or more (a threshold below 60 s causes the cluster to flap). Consequently, at least 60 s must elapse before a node power-off is confirmed and cluster recovery and recovery of the powered-off node's services can begin.
As a result, in the current detection and recovery process for a powered-off node, the cluster cannot quickly detect the power-off fault, and therefore cannot quickly recover or restore access to the services on the powered-off node. This leads to long service interruptions and poor cluster reliability.
Therefore, how to provide a highly reliable node power-off switching method and apparatus for a distributed storage cluster, and such a distributed storage cluster, is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide a node power-off switching method and apparatus for a distributed storage cluster that shorten the detection and recovery time for a powered-off node from the minute level to the second level, speed up the return of the cluster to normal operation and the restoration of access to the powered-off node's services, and improve the reliability of the cluster. A further object of the present invention is to provide a distributed storage cluster based on the above method.
To solve the above technical problem, the present invention provides a node power-off switching method for a distributed storage cluster, applied to the host node of the distributed storage cluster. The method includes:
Detecting the state of each node in the cluster using the heartbeat detection of CTDB (a lightweight cluster database);
After a node power-off is detected, obtaining the service information of the powered-off node;
Sending the service information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to the service information.
Preferably, after a node power-off is detected and before the service information of the powered-off node is obtained, the method further includes:
Determining whether the powered-off node was identified by heartbeat detection, and if so, obtaining the service information of the powered-off node.
Preferably, the service information includes a virtual IP address.
Preferably, the service information further includes service cache data.
Preferably, sending the service information to normal nodes in the distributed storage cluster that have the corresponding service functions specifically includes:
Calling the failover program in the distributed storage cluster;
Selecting normal nodes that provide each required service function;
Sending the service information to the selected nodes.
Preferably, the service functions include a monitoring function, a storage pool function, and a metadata management function.
Preferably, detecting node states using CTDB heartbeat detection specifically includes:
In each heartbeat detection period, sending several heartbeat packets to each node in the distributed storage cluster;
Determining whether responses from all nodes are received within a preset time; if any node has not returned a response, that node is a powered-off node.
To solve the above technical problem, the present invention further provides a node power-off switching apparatus for a distributed storage cluster, applied to the host node of the distributed storage cluster. The apparatus includes:
A state monitoring module, configured to detect the state of each node in the cluster using CTDB heartbeat detection;
An information obtaining module, configured to obtain the service information of a powered-off node after a node power-off is detected;
A sending module, configured to send the service information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to the service information.
To solve the above technical problem, the present invention further provides a distributed storage cluster, including multiple nodes provided with CTDB functionality, among which one node is elected as the host node. The host node includes:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the node power-off switching method for a distributed storage cluster according to any of the above when executing the computer program.
Preferably, the nodes other than the host node are specifically configured to:
Perform recovery of their own services and service drift according to the service information in parallel.
The present invention provides a node power-off switching method and apparatus for a distributed storage cluster. CTDB heartbeat detection is used to detect whether any node in the cluster has lost power. Once a powered-off node is detected, its service information is obtained and sent to normal nodes that have the corresponding services (i.e., the corresponding service modules), and the normal nodes receiving this information perform service drift to restore access to the services on the powered-off node. Because CTDB heartbeat detection has second-level time precision (typically a few seconds), it can quickly detect whether a node has lost power, and sending the powered-off node's service information to normal nodes lets them promptly carry out data recovery and service drift. The invention thus shortens the detection and recovery time for a powered-off node from the minute level to the second level, speeds up the return of the cluster to normal operation and the restoration of access to the powered-off node's services, minimizes service interruption time, and improves cluster reliability. The present invention further provides a distributed storage cluster based on the above method, which has the same advantages.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a node power-off switching method for a distributed storage cluster provided by the present invention;
Fig. 2 is a schematic structural diagram of a node power-off switching apparatus for a distributed storage cluster provided by the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a node power-off switching method and apparatus for a distributed storage cluster that shorten the detection and recovery time for a powered-off node from the minute level to the second level, speed up the return of the cluster to normal operation and the restoration of access to the powered-off node's services, and improve cluster reliability. Another core of the present invention is to provide a distributed storage cluster based on the above method.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides a node power-off switching method for a distributed storage cluster, applied to the host node of the distributed storage cluster. Referring to Fig. 1, which is a flowchart of the method, the method includes:
Step s1: Detect the state of each node in the cluster using CTDB heartbeat detection;
CTDB (clustered trivial database, a lightweight cluster database) is cluster high-availability management software used to monitor cluster node state and assign services. In a distributed storage cluster with CTDB functionality, CTDB software is typically installed on every cluster node so that each node can perform heartbeat detection through CTDB, and the detection results of different nodes are exchanged among them. All nodes in such a cluster elect one host node, and failure recovery operations (such as the assignment of virtual IP addresses) are performed only by the host node.
In current distributed storage clusters, CTDB already provides heartbeat detection with second-level time precision, but its results are not used for cluster and service recovery after a node powers off. In the prior art, whether recovery of the cluster and of the powered-off node's services is performed depends on the detection results of the service modules in the nodes, whose time precision is at the minute level, which is slow and inefficient. In other words, in the prior art, CTDB heartbeat detection and service recovery for a powered-off node are two unrelated processes. The present invention links them together: the CTDB heartbeat detection result controls the subsequent recovery of the cluster and of the powered-off node's services, shortening that recovery to the second level and improving the efficiency and reliability of cluster recovery.
Step s2: After a powered-off node is detected, obtain the service information of the powered-off node;
Once a node loses power, the services on it are necessarily interrupted. To restore normal access to these services as quickly as possible, they must drift (in other words, be switched over) to other normal nodes, so the information about the services that were running on the powered-off node must be determined in order to select suitable nodes and perform the drift. Because the powered-off node is already down, its service information is usually obtained from the host node: the host node is responsible for assigning services, so it keeps the relevant information about the services running on each node.
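Because the powered-off node can no longer be queried, the lookup in Step s2 reads from the host node's own records. The Python sketch below models that under stated assumptions: `ServiceInfo` and `HostNodeRegistry` are hypothetical names, and the record layout (a virtual IP plus optional cached data per service) is only an illustration, not CTDB's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServiceInfo:
    """Hypothetical record of one service as tracked by the host node."""
    name: str
    virtual_ip: str
    cache_data: Optional[bytes] = None  # buffered service data, if any


class HostNodeRegistry:
    """Hypothetical host-node table mapping each node to the services it runs."""

    def __init__(self) -> None:
        self._services: Dict[str, List[ServiceInfo]] = {}

    def assign(self, node: str, service: ServiceInfo) -> None:
        # Recorded when the host node hands the service out, while the node is alive.
        self._services.setdefault(node, []).append(service)

    def business_info_for(self, failed_node: str) -> List[ServiceInfo]:
        # Answered from the host node's own records; the failed node is never contacted.
        return list(self._services.get(failed_node, []))


registry = HostNodeRegistry()
registry.assign("node2", ServiceInfo("nfs", "192.168.10.102"))
print([s.virtual_ip for s in registry.business_info_for("node2")])  # ['192.168.10.102']
```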
Step s3: Send the service information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to it.
A distributed storage cluster contains multiple interrelated nodes that jointly process services, so different nodes may have different functions: different nodes may contain the same service modules or different ones. To ensure a service can run normally after drifting, the service functions it needs must first be determined, and the service information is then distributed to normal nodes that have those functions, so that the powered-off node's services drift to them. Once these normal nodes have recovered, they jointly serve access to the service according to the service information.
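The selection step can be sketched as a simple covering problem: for each required service function, pick one healthy node that provides it. In this Python sketch the function labels `monitor`, `storage_pool`, and `metadata` are hypothetical stand-ins for the three service functions named in this document, and the tie-breaking rule (first candidate by node name) is an assumption; the document does not specify a selection policy.

```python
from typing import Dict, Set


def select_target_nodes(
    required: Set[str],
    node_functions: Dict[str, Set[str]],
    failed: Set[str],
) -> Dict[str, str]:
    """Pick, for each required service function, one healthy node providing it.

    Returns a mapping function -> chosen node. Raises ValueError if some
    required function is not available on any healthy node.
    """
    chosen: Dict[str, str] = {}
    for func in sorted(required):  # sorted for deterministic output
        candidates = sorted(
            node
            for node, funcs in node_functions.items()
            if node not in failed and func in funcs
        )
        if not candidates:
            raise ValueError(f"no healthy node provides {func!r}")
        chosen[func] = candidates[0]  # assumed policy: first candidate by name
    return chosen


nodes = {
    "node1": {"monitor", "storage_pool"},
    "node2": {"metadata"},
    "node3": {"monitor", "storage_pool", "metadata"},
}
print(select_target_nodes({"monitor", "storage_pool", "metadata"}, nodes, failed={"node3"}))
# {'metadata': 'node2', 'monitor': 'node1', 'storage_pool': 'node1'}
```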
Service recovery can also be understood as node recovery. Because the nodes in a distributed storage cluster are interrelated, when one node fails by losing power, the other nodes are also affected and may not work normally. Therefore, to run the powered-off node's services on other nodes, it is not enough to drift the services to them; the configuration of these nodes must also be adjusted to restore them to a normal working state. Moreover, if a service being drifted has specific requirements, the configuration data of these nodes must be adjusted during recovery so that they can support running the drifted service.
A service can only run after both service drift and service recovery are complete, so the moment a service has successfully switched to another node is the moment both have finished. If service drift and service recovery are performed serially, the switching time equals the sum of the two; if they are performed in parallel, it equals the longer of the two. Parallel operation therefore takes less time than serial operation, although both fall within the scope of the present invention.
Experiments show that, with the above operations, power-off fault detection can usually be kept within 10 seconds, and service drift and node service recovery each take roughly 10 seconds, so the whole service switchover is kept within 30 seconds. Compared with current minute-level switchover, the service recovery time is shorter, improving the reliability and stability of the cluster.
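The serial-versus-parallel reasoning reduces to a sum versus a maximum. A small sketch (the function name `switchover_time` is hypothetical) using the roughly-10-second figures quoted above:

```python
def switchover_time(detect_s: float, drift_s: float, recover_s: float, parallel: bool) -> float:
    """Estimated time from power loss until the service runs on another node.

    Serial: drift and recovery run one after the other, so their times add.
    Parallel: they overlap, so only the longer of the two counts.
    """
    handover = max(drift_s, recover_s) if parallel else drift_s + recover_s
    return detect_s + handover


# Roughly 10 s each for detection, drift, and recovery.
print(switchover_time(10, 10, 10, parallel=False))  # 30
print(switchover_time(10, 10, 10, parallel=True))   # 20
```

With these inputs the serial path gives 30 s, matching the stated upper bound, while overlapping drift and recovery brings the estimate down to 20 s.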
The service functions include a monitoring function, a storage pool function, and a metadata management function.
For the cluster to operate normally, storage and metadata management functions are indispensable, and a monitoring function is also needed to watch the service's operating state so that problems are found in time. The distributed storage cluster may of course include other service functions as well; the present invention does not limit this.
In a preferred embodiment, after a powered-off node is detected and before its service information is obtained, the method further includes:
Determining whether the powered-off node was identified by heartbeat detection, and if so, obtaining its service information.
Although the present invention uses CTDB heartbeat detection to detect node power-off, a node reported as powered off was not necessarily identified by heartbeat detection: when CTDB executes a stop or restart command, it also marks the node it runs on as powered off, which is clearly wrong. To distinguish this case, before obtaining the service information of the powered-off node, it is first necessary to check whether that node was identified by heartbeat detection, because only a faulty node detected by the heartbeat detection function is treated as a powered-off node; otherwise no action is taken. In one specific implementation, a flag bit is added to the identifier of each powered-off node detected by heartbeat detection, and a detected node is later distinguished by checking whether it carries the flag bit. This is only one possible implementation; whether a node is a powered-off node may also be determined in other ways, and the present invention does not limit this.
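A minimal sketch of the flag-bit check, assuming the heartbeat path and the local stop/restart path both emit a node-down event. `NodeDownEvent` and its `detected_by_heartbeat` field are hypothetical; this does not reproduce CTDB's real event or node-flag format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeDownEvent:
    node: str
    detected_by_heartbeat: bool  # the "flag bit", set only by the heartbeat path


def heartbeat_failure(node: str) -> NodeDownEvent:
    # The heartbeat detector marks its events with the flag.
    return NodeDownEvent(node, detected_by_heartbeat=True)


def admin_stop(node: str) -> NodeDownEvent:
    # A local CTDB stop/restart also reports the node as down, but without the flag.
    return NodeDownEvent(node, detected_by_heartbeat=False)


def should_fail_over(event: NodeDownEvent) -> bool:
    """Only heartbeat-detected failures are treated as powered-off nodes."""
    return event.detected_by_heartbeat


print(should_fail_over(heartbeat_failure("node2")))  # True
print(should_fail_over(admin_stop("node1")))         # False
```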
In a specific embodiment, the service information includes a virtual IP address.
Virtual IP addresses correspond one-to-one with services in the distributed storage cluster, and the CTDB host node is responsible for assigning them. When a cluster node fails, to keep the services on that node accessible, the virtual IP addresses assigned to it float to other nodes, and the node's services then drift to those nodes along with the virtual IP addresses, ensuring the high availability of the cluster.
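The virtual IP drift can be illustrated as a remapping of VIP ownership. The Python sketch below is an assumption-laden illustration: the least-loaded placement rule is chosen only for the example (the document does not say how the host node picks targets, and CTDB has its own IP allocation logic), and `reassign_virtual_ips` is a hypothetical name.

```python
from typing import Dict, List


def reassign_virtual_ips(
    vip_owner: Dict[str, str], failed_node: str, healthy: List[str]
) -> Dict[str, str]:
    """Return a new VIP -> node map with the failed node's VIPs moved.

    Each orphaned VIP goes to the healthy node that currently holds the
    fewest VIPs (ties broken by node name), keeping load roughly even.
    """
    if not healthy:
        raise ValueError("no healthy node to take over virtual IPs")
    new_map = {vip: node for vip, node in vip_owner.items() if node != failed_node}
    load = {node: 0 for node in healthy}
    for node in new_map.values():
        if node in load:
            load[node] += 1
    for vip in sorted(v for v, n in vip_owner.items() if n == failed_node):
        target = min(healthy, key=lambda n: (load[n], n))
        new_map[vip] = target
        load[target] += 1
    return new_map


owners = {"10.0.0.1": "n1", "10.0.0.2": "n2", "10.0.0.3": "n2", "10.0.0.4": "n3"}
new = reassign_virtual_ips(owners, failed_node="n2", healthy=["n1", "n3"])
print(sorted(new.items()))
# [('10.0.0.1', 'n1'), ('10.0.0.2', 'n1'), ('10.0.0.3', 'n3'), ('10.0.0.4', 'n3')]
```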
For most distributed storage clusters, service drift needs only the virtual IP address, but in some cases service drift may also depend on other parameters; the present invention does not limit this.
In addition, the service information further includes service cache data. In some cases, continued access to a service may require earlier data, and the virtual IP address alone may not be enough, so the service information needs to include the service cache data. The service information may also include, for example, the host number of the powered-off node; the present invention does not limit its specific content.
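Since the service information may carry more than a virtual IP, it needs a wire format when the host node sends it. This sketch assumes a JSON payload with base64-encoded cache data; `BusinessInfo`, its fields, and the encoding are hypothetical choices, not part of the patent or of CTDB.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusinessInfo:
    """Hypothetical payload the host node sends to a takeover node."""
    virtual_ip: str
    failed_host: str                    # host number/name of the powered-off node
    cache_data: Optional[bytes] = None  # service cache data, only when needed

    def to_json(self) -> str:
        payload = {
            "virtual_ip": self.virtual_ip,
            "failed_host": self.failed_host,
            # JSON cannot carry raw bytes, so the cache travels as base64 text.
            "cache_data": (
                base64.b64encode(self.cache_data).decode("ascii")
                if self.cache_data is not None
                else None
            ),
        }
        return json.dumps(payload)

    @classmethod
    def from_json(cls, text: str) -> "BusinessInfo":
        d = json.loads(text)
        cache = d.get("cache_data")
        return cls(
            virtual_ip=d["virtual_ip"],
            failed_host=d["failed_host"],
            cache_data=base64.b64decode(cache) if cache is not None else None,
        )


info = BusinessInfo("10.0.0.2", "node2", b"pending-writes")
print(BusinessInfo.from_json(info.to_json()) == info)  # True
```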
In a specific embodiment, sending the service information to normal nodes in the distributed storage cluster that have the corresponding service functions specifically includes:
Calling the failover program in the distributed storage cluster;
Selecting normal nodes that provide each required service function;
Sending the service information to the selected nodes.
In current distributed storage clusters, each node is usually provided with a failover program. Because the nodes exchange data with one another, when one node calls its own failover program, the program on the other nodes also starts running and service switching begins. Therefore, after obtaining the service information, the host node simply calls its own failover program, which also activates the program on the other nodes so they prepare for service switching. The program includes a node selection function: once the host node calls it, the program selects suitable nodes, and the host node then sends the service information to the selected nodes. Because the failover program on the selected nodes has already started, each node can begin service switching (service drift, service recovery, and so on) as soon as it receives the service information.
If the cluster does not provide a failover program with the above functions, the host node can instead analyze the facilities of each node in the cluster itself and select the corresponding nodes. The present invention does not limit the specific way in which the nodes that receive the service information are selected.
Detecting node states using CTDB heartbeat detection specifically includes:
In each heartbeat detection period, sending several heartbeat packets to each node in the distributed storage cluster;
Determining whether responses from all nodes are received within a preset time; if any node has not returned a response, that node is a powered-off node.
The preset time here generally corresponds to the heartbeat detection period, but because sending and receiving signals takes time, the preset time is preferably slightly longer than the heartbeat detection period. The present invention does not limit the specific value of the preset time.
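One heartbeat detection period on the host node can be simulated as follows. The `probe` callable is a hypothetical stand-in for sending a heartbeat and waiting for the reply (it returns the round-trip time in seconds, or `None` if nothing came back); a real implementation would probe nodes concurrently rather than one after another.

```python
from typing import Callable, List, Optional

# A probe returns the reply's round-trip time in seconds, or None if no reply came.
Probe = Callable[[str], Optional[float]]


def heartbeat_round(nodes: List[str], probe: Probe, timeout_s: float) -> List[str]:
    """One heartbeat detection period on the host node.

    A heartbeat goes to every node; any node whose reply is missing or arrives
    after ``timeout_s`` (the preset time) is reported as powered off.
    """
    powered_off = []
    for node in nodes:
        rtt = probe(node)
        if rtt is None or rtt > timeout_s:
            powered_off.append(node)
    return powered_off


replies = {"n1": 0.01, "n3": 5.0}  # n2 never answers; n3 answers too late
print(heartbeat_round(["n1", "n2", "n3"], replies.get, timeout_s=4.5))  # ['n2', 'n3']
```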
For example, assume each heartbeat detection period is 4 seconds (the present invention does not limit the interval between two heartbeat periods), with one heartbeat packet sent every 2 seconds, i.e., 2 packets per period; if the peer receives no heartbeat packet within 4 seconds, the node is considered faulty. Alternatively, the heartbeat detection period may be 8 seconds, still with one heartbeat packet every 2 seconds, i.e., 4 packets per period; the longer window helps avoid misjudging a fault when the heartbeat window is too short. The present invention does not limit the length of the heartbeat detection period or the sending frequency of heartbeat packets.
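The 2-second heartbeat with a 4-second window can be expressed as a last-seen timeout detector. In this hypothetical `HeartbeatMonitor`, timestamps are passed in explicitly so the behaviour is deterministic; the class is an illustration, not CTDB's internal implementation.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Declare a node faulty if no heartbeat arrived within ``window_s`` seconds.

    With heartbeats every 2 s and a 4 s window, a node is flagged only after
    the heartbeats at +2 s and +4 s are both missed, so one lost packet is tolerated.
    """

    def __init__(self, nodes: List[str], window_s: float, now: float = 0.0) -> None:
        self.window_s = window_s
        # Treat every node as freshly seen when monitoring starts.
        self.last_seen: Dict[str, float] = {n: now for n in nodes}

    def on_heartbeat(self, node: str, t: float) -> None:
        self.last_seen[node] = t

    def faulty(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.window_s)


mon = HeartbeatMonitor(["n1", "n2"], window_s=4.0)
for t in (2.0, 4.0, 6.0):
    mon.on_heartbeat("n1", t)  # n1 keeps sending a heartbeat every 2 s
mon.on_heartbeat("n2", 2.0)    # n2 loses power after its t=2 s heartbeat
print(mon.faulty(now=6.5))     # ['n2']
```

With an 8-second window the same 2-second heartbeat tolerates three consecutive losses before flagging a node, which is the misjudgment trade-off described above.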
The present invention provides a node power-off switching method for a distributed storage cluster that uses CTDB heartbeat detection to detect whether any node in the cluster has lost power. Once a powered-off node is detected, its service information is obtained and sent to normal nodes that have the corresponding service functions (i.e., the corresponding service modules), and those nodes perform service drift to restore access to the powered-off node's services. Because CTDB heartbeat detection has second-level time precision (typically a few seconds), node power-off is detected quickly, and sending the powered-off node's service information to normal nodes lets them promptly perform data recovery and service drift. The detection and recovery time for a powered-off node is thus shortened from the minute level to the second level, the return of the cluster to normal operation and the restoration of access to the powered-off node's services are accelerated, service interruption time is minimized, and cluster reliability is improved.
The present invention further provides a node power-off switching apparatus for a distributed storage cluster, applied to the host node of the distributed storage cluster. Refer to Fig. 2, which is a schematic structural diagram of the apparatus.
The apparatus includes:
A state monitoring module 1, configured to detect the state of each node in the cluster using CTDB heartbeat detection;
An information obtaining module 2, configured to obtain the service information of a powered-off node after it is detected;
A sending module 3, configured to send the service information to normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to it.
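The three modules of the apparatus can be wired together as in this sketch. Detection, lookup, target selection, and sending are injected as callables, all hypothetical, so the example only shows the order of operations (detect, obtain, send), not any real transport or CTDB interface.

```python
from typing import Callable, Dict, List


class PowerOffSwitchingDevice:
    """Hypothetical host-node apparatus wiring the three modules together.

    - state monitoring module 1: ``detect`` returns nodes that failed heartbeat checks
    - information obtaining module 2: looks up the failed node's service information
    - sending module 3: ``send`` delivers that information to each selected normal node
    """

    def __init__(
        self,
        detect: Callable[[], List[str]],
        business_info: Dict[str, List[str]],
        pick_targets: Callable[[str], List[str]],
        send: Callable[[str, List[str]], None],
    ) -> None:
        self.detect = detect
        self.business_info = business_info
        self.pick_targets = pick_targets
        self.send = send

    def run_once(self) -> Dict[str, List[str]]:
        """One pass: detect, obtain, send. Returns failed node -> target nodes."""
        handled: Dict[str, List[str]] = {}
        for failed in self.detect():                   # module 1
            info = self.business_info.get(failed, [])  # module 2
            targets = self.pick_targets(failed)
            for target in targets:                     # module 3
                self.send(target, info)
            handled[failed] = targets
        return handled


sent = []
device = PowerOffSwitchingDevice(
    detect=lambda: ["node2"],
    business_info={"node2": ["10.0.0.2"]},
    pick_targets=lambda failed: ["node1", "node3"],
    send=lambda target, info: sent.append((target, info)),
)
print(device.run_once())  # {'node2': ['node1', 'node3']}
print(sent)               # [('node1', ['10.0.0.2']), ('node3', ['10.0.0.2'])]
```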
The present invention further provides a distributed storage cluster, including multiple nodes provided with CTDB functionality, among which one node is elected as the host node. The host node includes:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the node power-off switching method for a distributed storage cluster according to any of the above when executing the computer program.
In a preferred embodiment, the nodes other than the host node are specifically configured to:
Perform recovery of their own services and service drift according to the service information in parallel.
Because the service recovery operation and the service drift operation do not interfere with each other, performing them in parallel rather than serially shortens the service switching time as much as possible.
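A minimal sketch of the parallel takeover on a non-host node, using Python's `concurrent.futures.ThreadPoolExecutor`. The two task functions are hypothetical placeholders that only sleep; the point is that total elapsed time tracks the longer task rather than the sum of both.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def recover_own_services() -> str:
    time.sleep(0.2)  # placeholder for restoring this node's own configuration/services
    return "recovered"


def drift_services(business_info: list) -> str:
    time.sleep(0.2)  # placeholder for bringing up the drifted virtual IPs/services
    return f"drifted {len(business_info)} item(s)"


def take_over(business_info: list) -> tuple:
    """Run own-service recovery and service drift concurrently.

    Elapsed time is roughly max(recovery, drift) instead of their sum.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        rec = pool.submit(recover_own_services)
        drift = pool.submit(drift_services, business_info)
        return rec.result(), drift.result()


start = time.perf_counter()
print(take_over(["10.0.0.2"]))  # ('recovered', 'drifted 1 item(s)')
print(f"elapsed ~{time.perf_counter() - start:.1f} s")  # roughly 0.2 s rather than 0.4 s
```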
The specific embodiments above are only preferred embodiments of the present invention and may be combined in any way; embodiments obtained by such combination also fall within the protection scope of the present invention. It should be noted that other improvements and variations derived by those of ordinary skill in the art without departing from the spirit and concept of the present invention shall also fall within the protection scope of the present invention.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for switching over a powered-off node in a distributed storage cluster, characterized in that it is applied to the master node of the distributed storage cluster, and the method comprises:
detecting the state of each node in the cluster by means of the heartbeat detection of the CTDB lightweight cluster database;
after a powered-off node is detected, obtaining the business information of the powered-off node;
sending the business information to the normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the business information performs business drift and business recovery according to the business information.
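The three steps of claim 1 can be illustrated with a minimal Python sketch of one pass on the master node. It assumes the heartbeat result is already available as a node-to-boolean map; `BusinessInfo`, `business_db` and `send` are hypothetical stand-ins for the cluster's own records and transport, and for brevity the sketch forwards the information to every normal node instead of filtering by service function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BusinessInfo:
    """Hypothetical record of the business a powered-off node was serving."""
    node: str
    payload: Dict[str, str] = field(default_factory=dict)


def master_failover_pass(
    heartbeat_ok: Dict[str, bool],
    business_db: Dict[str, BusinessInfo],
    send: Callable[[str, BusinessInfo], None],
) -> List[str]:
    """One pass of the claim-1 flow on the master node (illustrative only).

    heartbeat_ok: outcome of the latest heartbeat round, node -> answered?
    business_db:  hypothetical store holding each node's business information
    send:         hypothetical transport delivering the information to a peer
    Returns the nodes judged powered off in this pass.
    """
    # Step 1: a node that did not answer the heartbeat is treated as powered off.
    powered_off = sorted(n for n, ok in heartbeat_ok.items() if not ok)
    normal = sorted(n for n, ok in heartbeat_ok.items() if ok)
    for dead in powered_off:
        # Step 2: obtain the business information of the powered-off node.
        info = business_db[dead]
        # Step 3: hand it to the normal nodes; each receiver is then expected
        # to perform business drift and business recovery on its own side.
        for peer in normal:
            send(peer, info)
    return powered_off


if __name__ == "__main__":
    delivered = []
    dead = master_failover_pass(
        {"n1": True, "n2": False, "n3": True},
        {"n2": BusinessInfo("n2", {"vip": "192.168.1.12"})},
        lambda peer, info: delivered.append((peer, info.node)),
    )
    print(dead)       # ['n2']
    print(delivered)  # [('n1', 'n2'), ('n3', 'n2')]
```

Keeping detection, lookup and transport as injected arguments makes each step replaceable, which mirrors how the claims separate the three steps without fixing their implementation.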
2. The node failure switching method according to claim 1, characterized in that, after a node power-off is detected and before the business information of the powered-off node is obtained, the method further comprises:
judging whether the powered-off node was found by the heartbeat detection, and if so, obtaining the business information of the powered-off node.
3. node failure switching method according to claim 2, which is characterized in that the business information includes virtual IP address.
4. The node failure switching method according to claim 3, characterized in that the business information further comprises business cache data.
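Claims 3 and 4 state only that the business information includes a virtual IP address and business cache data. The Python sketch below shows one possible way to package such a record for sending to peers; the class name `BusinessInfoMessage`, its field names and the JSON-with-base64 wire format are assumptions, not something the claims specify.

```python
import base64
import ipaddress
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfoMessage:
    """Hypothetical wire record for a powered-off node's business information."""
    node: str
    virtual_ip: str          # claim 3: the VIP that must drift to a healthy node
    cache_data: bytes = b""  # claim 4: buffered business data to be recovered

    def __post_init__(self) -> None:
        # Reject malformed addresses early; ip_address() raises ValueError.
        ipaddress.ip_address(self.virtual_ip)

    def to_json(self) -> str:
        # Cache data is binary, so it is base64-encoded for a text payload.
        return json.dumps({
            "node": self.node,
            "virtual_ip": self.virtual_ip,
            "cache_data": base64.b64encode(self.cache_data).decode("ascii"),
        })

    @classmethod
    def from_json(cls, text: str) -> "BusinessInfoMessage":
        raw = json.loads(text)
        return cls(raw["node"], raw["virtual_ip"], base64.b64decode(raw["cache_data"]))


if __name__ == "__main__":
    msg = BusinessInfoMessage("n2", "192.168.10.12", b"\x00pending-writes")
    print(BusinessInfoMessage.from_json(msg.to_json()) == msg)  # True
```

Validating the address when the record is built means a receiving node never tries to take over a malformed VIP.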
5. The node failure switching method according to claim 1, characterized in that the process of sending the business information to the normal nodes in the distributed storage cluster that have the corresponding service functions specifically comprises:
calling the failover program in the distributed storage cluster;
selecting normal nodes that include each service function;
sending the business information to the selected nodes.
6. The node failure switching method according to claim 1, characterized in that the service functions include a monitoring function, a storage pool function and a metadata management function.
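Claim 5 has the failover program select normal nodes that together include each service function, and claim 6 names those functions. A minimal Python sketch of that selection follows, assuming each normal node advertises the set of functions it provides. The function name `select_targets` and the tie-breaking rule (fewest functions already assigned in this pass, then node name) are illustrative assumptions; the claims do not say how a node is chosen among several candidates.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class ServiceFunction(Enum):
    # The three service functions listed in claim 6.
    MONITOR = "monitoring"
    STORAGE_POOL = "storage pool"
    METADATA = "metadata management"


def select_targets(
    needed: Iterable[ServiceFunction],
    normal_nodes: Dict[str, Set[ServiceFunction]],
) -> Dict[ServiceFunction, str]:
    """Pick one normal node per needed service function (hypothetical rule)."""
    assigned = {n: 0 for n in normal_nodes}  # functions handed out this pass
    plan: Dict[ServiceFunction, str] = {}
    for func in needed:
        candidates = [n for n, funcs in normal_nodes.items() if func in funcs]
        if not candidates:
            raise RuntimeError(f"no normal node provides {func.value}")
        # Spread the work: least-loaded candidate first, node name as tie-break.
        target = min(candidates, key=lambda n: (assigned[n], n))
        assigned[target] += 1
        plan[func] = target
    return plan


if __name__ == "__main__":
    normal_nodes = {
        "n1": {ServiceFunction.MONITOR, ServiceFunction.METADATA},
        "n3": {ServiceFunction.MONITOR, ServiceFunction.STORAGE_POOL},
    }
    plan = select_targets(list(ServiceFunction), normal_nodes)
    print({f.value: n for f, n in plan.items()})
    # {'monitoring': 'n1', 'storage pool': 'n3', 'metadata management': 'n1'}
    # The business information would then be sent to these selected nodes.
    print(sorted(set(plan.values())))  # ['n1', 'n3']
```

Raising an error when no normal node offers a function makes an uncovered service visible instead of silently dropping it.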
7. The node failure switching method according to claim 2, characterized in that the process of detecting node states by means of CTDB heartbeat detection specifically comprises:
in each heartbeat detection period, sending several heartbeat packets to each node in the distributed storage cluster;
judging whether responses from all nodes are received within a preset time; if any node has not returned a response, that node is a powered-off node.
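The heartbeat period of claim 7, together with the check of claim 2, can be modelled by a small Python sketch. It is not CTDB's own implementation: `send_heartbeat` is a hypothetical transport hook returning the round-trip time of a reply or `None`, the packets are sent sequentially for simplicity, and treating a node as alive if at least one of its packets is answered within the preset time is an interpretive assumption.

```python
from typing import Callable, Iterable, Optional, Set


def heartbeat_period(
    nodes: Iterable[str],
    send_heartbeat: Callable[[str, int], Optional[float]],
    packets_per_period: int = 3,
    preset_time: float = 1.0,
) -> Set[str]:
    """Run one heartbeat detection period and return the powered-off nodes."""
    powered_off: Set[str] = set()
    for node in nodes:
        answered = False
        for seq in range(packets_per_period):
            rtt = send_heartbeat(node, seq)  # seconds, or None if no reply
            if rtt is not None and rtt <= preset_time:
                answered = True
                break  # one timely reply is enough for this period
        if not answered:
            powered_off.add(node)
    return powered_off


def detected_by_heartbeat(node: str, heartbeat_result: Set[str]) -> bool:
    """Claim 2 guard: fetch business info only for nodes the heartbeat flagged."""
    return node in heartbeat_result


if __name__ == "__main__":
    # Canned round-trip times per node and packet; None means no reply.
    replies = {
        "n1": [0.01, 0.01, 0.01],
        "n2": [None, None, None],  # silent for the whole period
        "n3": [None, None, 0.20],  # two packets lost, third answered in time
        "n4": [5.0, 5.0, 5.0],     # answers, but always after the preset time
    }
    down = heartbeat_period(replies, lambda node, seq: replies[node][seq])
    print(sorted(down))  # ['n2', 'n4']
```

Sending several packets per period tolerates isolated packet loss (node `n3`), while a late reply is still counted as missing (node `n4`).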
8. A device for switching over a powered-off node in a distributed storage cluster, characterized in that it is applied to the master node of the distributed storage cluster, and the device comprises:
a state monitoring module, configured to detect the state of each node in the cluster by means of CTDB heartbeat detection;
an information obtaining module, configured to obtain the business information of a powered-off node after the node power-off is detected;
a sending module, configured to send the business information to the normal nodes in the distributed storage cluster that have the corresponding service functions, so that each normal node receiving the business information performs business drift and business recovery according to the business information.
9. A distributed storage cluster, characterized in that it comprises a plurality of nodes provided with the CTDB function, one of which is elected as the master node; the master node comprises:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the method for switching over a powered-off node in a distributed storage cluster according to any one of claims 1 to 7.
10. The distributed storage cluster according to claim 9, characterized in that each node other than the master node is specifically configured to:
perform its own service recovery operation in parallel with the business drift operation carried out according to the business information.
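Claim 10 has each non-master node run its own service recovery in parallel with business drift. A minimal Python sketch using a two-worker `concurrent.futures.ThreadPoolExecutor` illustrates the idea; `recover_own_services` and `drift_from_info` are hypothetical hooks, since the claims do not say what either operation does internally, and a real node might use processes or asynchronous I/O instead of threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def handle_failover(
    recover_own_services: Callable[[], str],
    drift_from_info: Callable[[dict], str],
    business_info: dict,
) -> Tuple[str, str]:
    """Run local recovery and business drift concurrently (illustrative only)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        own = pool.submit(recover_own_services)
        drift = pool.submit(drift_from_info, business_info)
        # result() waits for each task and re-raises any exception it raised.
        return own.result(), drift.result()


if __name__ == "__main__":
    # Both hooks wait on a two-party barrier, which can only be passed if they
    # really run at the same time; a sequential run would hit the timeout.
    barrier = threading.Barrier(2, timeout=5)

    def own() -> str:
        barrier.wait()
        return "own services restored"

    def drift(info: dict) -> str:
        barrier.wait()
        return "took over " + info["vip"]

    print(handle_failover(own, drift, {"vip": "10.0.0.5"}))
    # ('own services restored', 'took over 10.0.0.5')
```

Running the two operations side by side means the drifted business does not have to wait for the node's own recovery to finish, which is the point of the parallelism in claim 10.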
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668234.1A CN108847982B (en) | 2018-06-26 | 2018-06-26 | Distributed storage cluster and node fault switching method and device thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108847982A (en) | 2018-11-20 |
CN108847982B (en) | 2021-11-19 |
Family ID: 64203566
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668234.1A Active CN108847982B (en) | 2018-06-26 | 2018-06-26 | Distributed storage cluster and node fault switching method and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108847982B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110314339A1 (en) * | 2010-06-22 | 2011-12-22 | International Business Machines Corporation | Systems for agile error determination and reporting and methods thereof |
CN102394791A (en) * | 2011-10-26 | 2012-03-28 | 浪潮(北京)电子信息产业有限公司 | Downtime recovery method and system |
CN103607297A (en) * | 2013-11-07 | 2014-02-26 | 上海爱数软件有限公司 | Fault processing method of computer cluster system |
CN106933693A (en) * | 2017-03-15 | 2017-07-07 | 郑州云海信息技术有限公司 | A database cluster node failure self-repairing method and system |
CN108009045A (en) * | 2016-10-31 | 2018-05-08 | 杭州海康威视数字技术股份有限公司 | A master/slave database fault handling method and device |
Non-Patent Citations (1)
Title |
---|
刘爱贵 (Liu Aigui): "基于开源软件构建高性能集群NAS系统" (Building a High-Performance Clustered NAS System with Open-Source Software), CSDN: HTTPS://BLOG.CSDN.NET/LIUAIGUI/ARTICLE/DETAILS/7163482 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783264A (en) * | 2018-12-29 | 2019-05-21 | 南京富士通南大软件技术有限公司 | A high-availability solution for databases |
CN111865632A (en) * | 2019-04-28 | 2020-10-30 | 阿里巴巴集团控股有限公司 | Switching method of distributed data storage cluster and switching instruction sending method and device |
CN110286732A (en) * | 2019-06-27 | 2019-09-27 | 无锡华云数据技术服务有限公司 | High-availability cluster power down automatic recovery method, device, equipment and storage medium |
CN110286732B (en) * | 2019-06-27 | 2021-01-12 | 华云数据控股集团有限公司 | Method, device and equipment for automatically recovering power failure of high-availability cluster and storage medium |
CN110611603A (en) * | 2019-09-09 | 2019-12-24 | 苏州浪潮智能科技有限公司 | Cluster network card monitoring method and device |
CN110740064A (en) * | 2019-10-25 | 2020-01-31 | 北京浪潮数据技术有限公司 | Distributed cluster node fault processing method, device, equipment and storage medium |
CN110855504A (en) * | 2019-11-22 | 2020-02-28 | 苏州浪潮智能科技有限公司 | Method, system and related device for recovering faults of cloud platform management nodes |
CN112711632A (en) * | 2019-12-27 | 2021-04-27 | 山东鲁能软件技术有限公司 | Asynchronous data stream replication method and system for high-availability cluster |
CN111212127A (en) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | Storage cluster, service data maintenance method, device and storage medium |
CN111756573A (en) * | 2020-05-28 | 2020-10-09 | 浪潮电子信息产业股份有限公司 | CTDB double-network-card fault monitoring method in distributed cluster and related equipment |
CN112035326A (en) * | 2020-09-03 | 2020-12-04 | 中国银行股份有限公司 | Abnormal node task processing method and device based on cluster node mutual detection |
CN112866408A (en) * | 2021-02-09 | 2021-05-28 | 山东英信计算机技术有限公司 | Service switching method, device, equipment and storage medium in cluster |
CN113162797A (en) * | 2021-03-03 | 2021-07-23 | 山东英信计算机技术有限公司 | Method, system and medium for switching master node fault of distributed cluster |
CN113794595A (en) * | 2021-09-15 | 2021-12-14 | 领云悠逸(北京)科技有限公司 | IoT (Internet of things) equipment high-availability method based on industrial Internet |
CN114584489A (en) * | 2022-03-08 | 2022-06-03 | 浪潮云信息技术股份公司 | Ssh channel-based remote environment information and configuration detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108847982B (en) | 2021-11-19 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108847982A (en) | A kind of distributed storage cluster and its node failure switching method and apparatus | |
US9703608B2 (en) | Variable configurations for workload distribution across multiple sites | |
EP1451687B1 (en) | Real composite objects for providing high availability of resources on networked systems | |
CN105406980B (en) | A multi-node backup method and device | |
CN109474465A (en) | A method and system for dynamically transferable high availability based on a server cluster | |
JPH10214199A (en) | Process restarting method, and system for realizing process restart | |
CN110417600B (en) | Node switching method and device of distributed system and computer storage medium | |
CN105335256B (en) | Method, device and system for switching backup disks in a whole-rack server | |
CN102308559A (en) | Voting arbitration method and apparatus for cluster computer system | |
CN106789306A (en) | Method and system for software fault detection, collection and recovery in communication equipment | |
CN111988347B (en) | Data processing method for a jump server system, and jump server system | |
CN107453888B (en) | High-availability virtual machine cluster management method and device | |
CN109088830B (en) | Port state synchronization method and device | |
CN105025179A (en) | Method and system for monitoring service agents of call center | |
CN114301763B (en) | Distributed cluster fault processing method and system, electronic equipment and storage medium | |
US8438261B2 (en) | Failover scheme with service-based segregation | |
CN115549751A (en) | Remote sensing satellite ground station monitoring system and method | |
CN111324513B (en) | Monitoring management method and system for artificial intelligence development platform | |
CN110830281B (en) | Hot standby method and system based on mesh network structure | |
CN107547257B (en) | Server cluster implementation method and device | |
CN111817892A (en) | Network management method, system, electronic equipment and storage medium | |
CN103684829B (en) | Network service system and management method thereof | |
CN108616397B (en) | It disposes and determines method and device | |
JPH11232194A (en) | Supervisory system in static connection type network | |
CN115952237B (en) | Multi-terminal data fusion system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||