CN1512375A - Fault-tolerance method using mutual backup among cluster nodes - Google Patents

Fault-tolerance method using mutual backup among cluster nodes Download PDF

Info

Publication number
CN1512375A
CN1512375A (application CNA021594791A; granted as CN1317658C)
Authority
CN
China
Legal status: Granted
Application number
CNA021594791A
Other languages
Chinese (zh)
Other versions
CN1317658C (en)
Inventor
李电森
冯锐
姜晓东
肖利民
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CNB021594791A
Publication of CN1512375A
Application granted
Publication of CN1317658C
Legal status: Expired - Fee Related

Landscapes

  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The fault-tolerance method with mutual backup among cluster nodes comprises: interconnecting the nodes through heartbeat rings for mutual communication and mutual backup; the master node assigning a newly joined node its position in the cluster and returning the services it is to carry; the newly joined node starting the dependent processes of each service and setting the corresponding service IP; the master node selecting another node to start a service whose startup failed; the nodes confirming an abnormal adjacent node; and the master node arranging takeover of the services of the faulty node. Through mutual backup, the present invention raises the availability of the cluster system, makes full use of the node resources in the cluster, and lowers the cost of redundant equipment.

Description

A fault-tolerance method using mutual backup among cluster nodes
Technical field:
The present invention relates to redundancy backup, namely a method for improving the high availability of a cluster system, and in particular to a fault-tolerance method in which cluster nodes back one another up. It belongs to the field of computer networking technology.
Background technology:
With the development of Internet technology, demand for commercial server clusters keeps growing. A commercial server cluster is characterized by: a small workload per task but a very large total volume of concurrent tasks; fluctuating, bursty load; and a degree of independence between different tasks. The main goal of a commercial cluster is to improve system throughput, i.e. to complete as many tasks as possible in a given time. In addition, commercial applications place very high demands on availability; critical commercial applications generally require continuous, uninterrupted 7 x 24 operation, and even a brief outage may cause immeasurable loss. A commercial cluster therefore adopts a high-availability backup scheme matched to the requirements of its applications.
Servers in a highly available cluster are logically divided into working machines and backup machines. The working and backup machines may run in parallel, or one side may work while the other stands by. Nodes are connected by multiple heartbeat lines; when a node is found to be faulty, the fault is first confirmed via the other nodes, and the applications running on that node are then taken over elsewhere without affecting the users.
Redundant heartbeat lines between the working and backup nodes guarantee the diagnosis of abnormal states between nodes. All application data resides on shared disk media (independent of the node servers); to keep the data consistent, access to the shared storage is controlled by a locking mechanism.
At present, the services in a large commercial cluster can all be divided into the following kinds:
1. A service provided by only one node; that node is a single point of failure. Such a service can be taken over by any other node, i.e. it places no special requirement on the node that runs it. We call such a service a single service.
2. A service provided by several nodes: the nodes providing the service run the same program and share the same data, and each node can be regarded as an identical replica. Usually a task-distribution node applies a load-balancing policy to dispatch user requests among these nodes. Such a service is a multiple service.
3. A service that must be directly connected to other equipment; for example, a network file system server (NFS Server) needs storage nodes connected to a disk array. When such a service fails, it can only be taken over by a node connected to the disk array. This kind of service is therefore a dedicated service.
When any of these services fails, a corresponding takeover measure is needed. To guarantee the high availability of the services and applications, cluster systems today generally provide hot-standby mechanisms such as multidirectional backup, N+1 backup, N+M backup, cascaded backup, and dual-machine hot standby.
Referring to Fig. 1, multidirectional failover backup among four nodes: nodes N1-N4 are connected in a ring and back one another up. Referring to Fig. 2, one backup node S1 performs fault recovery for several working nodes, the others being ordinary nodes, i.e. N+1 backup; there may also be more than one backup machine, i.e. N+M backup. Referring to Fig. 3, cascaded backup: several nodes N1-N5 back up one application, and when a working node (e.g. one of N1-N3) fails, the next node (e.g. N4) takes it over, until all servers have failed. Referring to Fig. 4, a dual-machine hot-standby system: nodes N1 and N2 back each other up.
The disadvantage of the above backup modes is that they need a large amount of redundant equipment; the cost is high, and the resources of the cluster system cannot be fully utilized.
Summary of the invention:
The main purpose of the present invention is to provide a fault-tolerance method in which cluster nodes back one another up, improving the availability of the cluster system through mutual backup among the nodes, making full use of node resources, and reducing the cost of redundancy backup.
The object of the present invention is achieved as follows:
A fault-tolerance method using mutual backup among cluster nodes, characterized in that the cluster nodes are connected, communicate, and back one another up through heartbeat rings, and that the method comprises at least the handling of a node joining the cluster and the handling of a node failure.
Each node comprises at least a service application (Services Applications) process, a node service monitor (Lifeguard) process, a node executor (Executor) process, a failover module (Standby) process, and a kernel control module (Commander) process.
The handling of a node joining the cluster comprises the following steps:
Step 10: the newly added node sends a request to the master node to join the cluster;
Step 11: after receiving the request, the master node assigns the new node a position in the cluster and returns the services the new node is to carry;
Step 12: the new node starts the dependent processes of each service one by one and sets the corresponding service IP;
Step 13: the new node notifies the master node of the startup result of each service; if startup failed, the master node selects another node to start that service;
Step 14: if the service started successfully, the master node marks the service as running and records which node provides it, so that if that node later fails, all services running on it can be migrated in time to their backup nodes.
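The join handshake of steps 10-14 can be sketched as follows. This is a minimal illustration only; the class and method names (`MasterNode`, `handle_join`, `report_start`) and the table layout are assumptions, not taken from the patent.

```python
# Hypothetical sketch of the node-join handshake (steps 10-14).
# MasterNode, handle_join, report_start are illustrative names only.

class MasterNode:
    def __init__(self, service_table):
        # service name -> record: default node, state, current provider
        self.service_table = service_table
        self.members = []

    def handle_join(self, node_id):
        """Step 11: assign the new node a position and its services."""
        position = len(self.members)
        self.members.append(node_id)
        assigned = [name for name, rec in self.service_table.items()
                    if rec["default_node"] == node_id]
        return position, assigned

    def report_start(self, node_id, service, ok):
        """Steps 13-14: record a successful start, or reassign on failure."""
        if ok:
            rec = self.service_table[service]
            rec["state"] = "running"
            rec["node"] = node_id      # remembered for later failover
        else:
            self.reassign(service, exclude=node_id)

    def reassign(self, service, exclude):
        """Step 13: pick another member to start the failed service."""
        for other in self.members:
            if other != exclude:
                self.service_table[service]["node"] = other
                return other
        return None
```

The recorded provider in step 14 is what later lets the master migrate every service of a failed node to its backups.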
The handling of a node failure comprises:
Step 20: when the survival-detection (Heartbeat) process of a node in the cluster finds an adjacent node abnormal, it sends a confirmation packet to the Executor process of that adjacent node;
Step 21: if the adjacent node receives the fault-confirmation packet, it sends a message to its own Lifeguard to check the running state and resource usage of all dependent processes on the node; if the node is normal, it replies to the Heartbeat with a confirmation packet;
Step 22: if the Heartbeat of the node that sent the confirmation packet cannot communicate with the adjacent node, or receives no reply within the specified time, it sends a message to the master node's Commander process to report the fault;
Step 23: the master node receives the message and analyzes the type of the affected service; if it is a multiple service, the master node notifies the front-end task-distribution node to stop sending tasks to the faulty node;
Step 24: if the affected service is a single service or a dedicated service, the master node selects, among the nodes able to provide the service and according to the load information of each node, the most lightly loaded node for the failed service, and sends a service-takeover command to the Executor of that node;
Step 25: the node selected to take over the failed service receives the takeover message, calls the Standby module to recover the failed service on itself, and notifies the node service monitor to reload its monitoring configuration so that the service processes newly added on the node are monitored.
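Steps 20-22, the neighbor-fault confirmation and escalation, can be modeled roughly as below. `Node`, `probe_neighbor`, and the fault-log representation are assumptions for this sketch, not the patent's own interfaces.

```python
# Illustrative model of neighbor-fault confirmation (steps 20-22).
# Node.confirm stands in for the Lifeguard self-check; names are assumed.

class Node:
    def __init__(self, node_id, healthy=True):
        self.node_id = node_id
        self.healthy = healthy

    def confirm(self):
        """Step 21: the neighbor checks its own processes via Lifeguard
        and replies with a confirmation packet only if all is normal."""
        return self.healthy

def probe_neighbor(neighbor, fault_log):
    """Steps 20 and 22: send a confirmation packet to the suspect
    neighbor; if it cannot answer, report the fault upward."""
    try:
        if neighbor.confirm():
            return "ok"
    except ConnectionError:       # stands in for an unreachable node
        pass
    fault_log.append(("fault", neighbor.node_id))   # message to Commander
    return "reported"
```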
The handling of a node failure further comprises:
Step 26: the master node issues a command requiring a check of whether all the nodes inside every heartbeat ring are normal, which facilitates fast handling of large-scale node failures.
The concrete operations by which the failed service is taken over are:
Step 241: the master node selects the backup node that will take over the failed service according to the backup nodes' priorities and their current actual workloads;
Step 242: the master node sends the takeover command to the Executor process of the selected takeover node;
Step 243: the Executor of the takeover node calls the Lifeguard module, starts all the dependent processes of the service, sets the corresponding service IP, and then notifies the Lifeguard to monitor the service processes newly added on the node.
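Step 241's selection rule, highest backup priority first with lightest current load as tie-breaker, can be written as a single sort key. The dictionary field names are assumptions for this sketch.

```python
# Sketch of step 241: choose the takeover node by highest priority,
# breaking ties with the lightest current workload. Fields are assumed.

def pick_takeover_node(backups):
    """backups: list of {'node', 'priority', 'load'} dicts."""
    best = min(backups, key=lambda b: (-b["priority"], b["load"]))
    return best["node"]
```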
A service has at least: the IP address through which the cluster provides the service; the protocol type the service uses; the listening port number, used to check whether the service is listening normally for user requests; the service type; the default working node of the service; the alternative (backup) nodes of the service; and the locations of the service's dependent processes and their start/stop scripts.
The protocol types comprise at least the two classes TCP and UDP. The service types comprise at least: single service, multiple service, and dedicated service. The default working node is the node that should provide the service when all nodes of the cluster are running normally; when that node is unavailable, the service has backup nodes; once the default working node has been repaired and rejoins the cluster, the service is migrated back onto it. There is at least one alternative node.
In summary, through mutual backup among the nodes in the cluster, the present invention improves the availability of the cluster system, makes full use of the resources of all nodes in the cluster, and reduces the cost of redundant equipment.
Description of drawings:
Fig. 1 is a schematic diagram of the system structure of multidirectional backup in the prior art;
Fig. 2 is a schematic diagram of the system structure of N+1 backup in the prior art;
Fig. 3 is a schematic diagram of the system structure of cascaded backup in the prior art;
Fig. 4 is a schematic diagram of the structure of a dual-machine hot-standby system in the prior art;
Fig. 5 is a schematic diagram of the system structure of the present invention and its fault-handling flow;
Fig. 6 is a schematic diagram of the structure of an embodiment of the invention.
Embodiment:
The technical scheme of the present invention is elaborated below through specific embodiments and the accompanying drawings:
In the present invention, a service is defined by at least the following attributes:
1. The service IP address: the IP address through which the cluster provides the service;
2. The protocol type used by the service, mainly divided into the two classes TCP and UDP;
3. The port number on which the service listens, which can be used to check whether the service is listening normally for user requests;
4. The service type: whether the service is a single service, a multiple service, or a dedicated service;
5. The default working node of the service: the node that should provide the service when all nodes of the cluster are running normally. When that node is unavailable, other backup nodes provide the service, but once the default working node has been repaired and rejoins the cluster, the service is migrated back onto it.
6. The alternative nodes of the service: a service can have several alternative nodes, and for some services every other node in the cluster can serve as an alternative. For the master-node role, which coordinates the management of the cluster and handles cluster failover, every node on which the master-node program is installed can provide that service and become the master node. The alternative nodes of a service can therefore be regarded as the nodes that have the required programs installed and are capable of providing the service. When the node providing a service fails, the master node selects a lightly loaded node among all backup nodes of the service, according to their load, to take over the work of the faulty node.
7. The dependent processes of the service: the processes a node must run to provide the service, together with the locations of their start/stop scripts. When a node is selected to provide a service, it queries all the processes the service depends on and runs each of them through its startup script and program.
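The seven attributes above can be collected into a single record. This is a hypothetical sketch of that configuration entry; the patent stores this information in its cluster configuration, not in this form.

```python
# The seven service attributes, modeled as a record. Field names are
# illustrative assumptions; values shown in tests are examples only.
from dataclasses import dataclass, field

@dataclass
class Service:
    ip: str                      # 1. service IP exposed by the cluster
    protocol: str                # 2. "tcp" or "udp"
    port: int                    # 3. listening port, used for health checks
    kind: str                    # 4. "single", "multiple", or "dedicated"
    default_node: str            # 5. node that runs it when all nodes are up
    backup_nodes: list = field(default_factory=list)   # 6. at least one
    start_scripts: list = field(default_factory=list)  # 7. dependent-process scripts
```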
In the present invention, each node comprises a service application (Services Applications) process, a node service monitor (Lifeguard) process, a node executor (Executor) process, a failover module (Standby) process, and a kernel control module (Commander) process. The nodes are interconnected by Ethernet; the kernel control modules (Commander) organize the nodes into logical heartbeat rings, and the nodes communicate with one another through the heartbeat detection (Heartbeat) process. When a node is started, the concrete steps are:
Step 10: the node sends a request to the kernel control module (Commander) of the master node to join the cluster;
Step 11: after receiving the request, the master node assigns the node a position in the cluster's heartbeat ring and returns the service information telling the node which services it should carry;
Step 12: the node starts, one by one, the dependent processes of the services the master node assigned, and sets the corresponding service IPs;
Step 13: the node notifies the master node whether each service started successfully; on failure, the master node selects another node to start that service.
Step 14: if the service started successfully, the master node marks the service as running and records which node provides it, so that if that node later fails, all services running on it can be migrated in time to other available nodes, i.e. the backup nodes of those services.
Referring to Fig. 5, when the Heartbeat of a node finds an adjacent node abnormal, the handling proceeds as follows:
Step 20: the Heartbeat sends a confirmation packet to the Executor process of the adjacent node to confirm whether that node has failed;
Step 21: if the adjacent node receives the fault-confirmation packet, it sends a message to its Lifeguard to check the running state and resource usage of all dependent processes on the node; if the node is normal, it replies to the Heartbeat with a confirmation packet;
Step 22: if the Heartbeat cannot communicate with the adjacent node, or receives no reply within the specified time, it regards the adjacent node as failed, and the node sends a message to the master node's Commander process to report the fault;
Step 23: the master node analyzes the type of the affected service; if it is a multiple service there is no need to take it over, and the master node only needs to notify the front-end task-distribution node to stop sending tasks to the faulty node;
Step 24: if the affected service is a single service or a dedicated service, it must be taken over. The master node first selects a backup node to take over the failed service, i.e. a node that can run the service program and provide the service in place of the faulty node. Since there may be several backup nodes, priorities are defined for all of them: the backup node with the highest priority is selected first as the takeover node, and when candidate nodes have equal priority, the node with the lightest current actual workload is selected. Having chosen the takeover node, the master node sends the takeover command to the Executor process on that node. After receiving the takeover message, the selected node calls the Standby module to start the service. For each service, the user configures the processes that must be started for the service to run normally; we call these the dependent processes of the service. Starting one process may in turn require other processes to be started first, and those processes can in turn be regarded as its dependent processes. When starting a service, the Standby module therefore uses recursion to bring up all of the service's processes, sets the corresponding service IP, and thereby recovers the failed service on the node; it then notifies the node service monitor (Lifeguard) to reload its monitoring configuration so that the service processes newly added on the node are monitored.
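The recursive start described for the Standby module, where a dependent process may itself depend on other processes that must come up first, can be sketched as a depth-first walk over an assumed dependency map `{process: [dependencies]}`. The function names are illustrative, and a real implementation would also need cycle detection.

```python
# Minimal sketch of the Standby module's recursive dependent-process
# startup: start all dependencies before the process that needs them.

def start_recursive(proc, deps, started):
    """Start `proc` depth-first, after all of its dependencies."""
    if proc in started:
        return
    for dep in deps.get(proc, []):
        start_recursive(dep, deps, started)
    started.append(proc)   # stands in for running the start script

def recover_service(entry_proc, deps):
    """Bring up a failed service; returns the dependency-first order."""
    started = []
    start_recursive(entry_proc, deps, started)
    return started
```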
In the cluster, the master node organizes all running nodes into several heartbeat rings, a structure that is easy to extend; the first node in each heartbeat ring becomes the group leader of that ring. After a failed service has been recovered, the master node asks all group-leader nodes to check whether the nodes inside their respective heartbeat rings are normal, which helps handle large-scale node failures quickly.
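The multi-ring arrangement above can be sketched as follows: the master partitions the running nodes into rings, the first node of each ring acts as group leader, and the master can fan a health check out through the leaders. The function names and the fixed ring size are assumptions for this sketch.

```python
# Sketch of the multi-ring heartbeat structure and the ring-wide
# health check of step 26. Names and ring size are illustrative.

def build_rings(nodes, ring_size):
    """Partition nodes into heartbeat rings; first node leads each ring."""
    rings = [nodes[i:i + ring_size] for i in range(0, len(nodes), ring_size)]
    leaders = [ring[0] for ring in rings]
    return rings, leaders

def check_cluster(rings, is_healthy):
    """Each leader reports its ring's status; collect the faulty nodes."""
    return [n for ring in rings for n in ring if not is_healthy(n)]
```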
Referring to Fig. 6, in the cluster the input/output (I/O) nodes are directly connected to the disk array; I/O nodes IO1, IO2, and IO3 therefore back one another up as Network File System (NFS) servers, and all other service nodes access the storage device, the disk array, through IO1, IO2, and IO3. The present embodiment includes a File Transfer Protocol (FTP) service, provided by three nodes; a Web service, provided by nodes WebServer1, WebServer2, WebServer3, WebServer4, and WebServer5; a Domain Name Server (DNS) service; a mail service, provided by the two nodes Mail Server1 and Mail Server2; and a database service, provided by two database server nodes. Request dispatcher 1 (Dispatcher1) and request dispatcher 2 (Dispatcher2) are the front-end task-distribution nodes for load balancing. A front-end task-distribution node runs on the same node as the FTP service; if the node carrying the FTP service fails, another node takes over the work of Dispatcher1 and dispatches user requests, and at the same time the task-distribution node is notified to stop sending tasks to the failed FTP service.
The DNS server (DNS Server) in Fig. 6 runs on only one node; when that node fails, the DNS service is taken over by another node on which the DNS program has been installed.
I/O node IO1 is the default working node of the NFS server (NFS Server); only I/O nodes IO2 and IO3 and the database servers can act as backup nodes for the NFS Server service, because these are the nodes connected to the disk array. If IO1 is running as the NFS Server and then fails, the master node selects one of these backup nodes to take over the work of IO1. In the present embodiment, the program required for the master-node role can be installed on every node, so any node in the cluster can become the master node.
In this system the Web, FTP, and mail services are each provided by several nodes simultaneously, which gives good scalability. Besides taking over the NFS Server service, the backup nodes of the I/O nodes can also act as backup nodes for other services. The resources of the cluster are thus fully utilized, and multidirectional backup of the system is achieved without deploying dedicated redundancy-backup equipment, improving the availability of the cluster.
Finally, it should be noted that the above embodiments merely illustrate, and do not restrict, the present invention. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the invention may be modified or equivalently substituted without departing from its spirit and scope, and all such changes are intended to fall within the scope of the claims of the invention.

Claims (11)

1. A fault-tolerance method using mutual backup among cluster nodes, characterized in that the cluster nodes are connected, communicate, and back one another up through heartbeat rings, and that the method comprises at least the handling of a node joining the cluster and the handling of a node failure.
2. The fault-tolerance method using mutual backup among cluster nodes according to claim 1, characterized in that each node comprises at least a service application (Services Applications) process, a node service monitor (Lifeguard) process, a node executor (Executor) process, a failover module (Standby) process, and a kernel control module (Commander) process.
3. The fault-tolerance method using mutual backup among cluster nodes according to claim 1, characterized in that the handling of a node joining the cluster comprises the following steps:
Step 10: the newly added node sends a request to the master node to join the cluster;
Step 11: after receiving the request, the master node assigns the new node a position in the cluster and returns the services the new node is to carry;
Step 12: the new node starts the dependent processes of each service one by one and sets the corresponding service IP;
Step 13: the new node notifies the master node of the startup result of each service; if startup failed, the master node selects another node to start that service;
Step 14: if the service started successfully, the master node marks the service as running and records which node provides it, so that if that node later fails, all services running on it can be migrated in time to their backup nodes.
4. The fault-tolerance method using mutual backup among cluster nodes according to claim 1, characterized in that the handling of a node failure comprises:
Step 20: when the survival-detection (Heartbeat) process of a node in the cluster finds an adjacent node abnormal, it sends a confirmation packet to the Executor process of that adjacent node;
Step 21: if the adjacent node receives the fault-confirmation packet, it sends a message to its own Lifeguard to check the running state and resource usage of all dependent processes on the node; if the node is normal, it replies to the Heartbeat with a confirmation packet;
Step 22: if the Heartbeat of the node that sent the confirmation packet cannot communicate with the adjacent node, or receives no reply within the specified time, it sends a message to the master node's Commander process to report the fault;
Step 23: the master node receives the message and analyzes the type of the affected service; if it is a multiple service, the master node notifies the front-end task-distribution node to stop sending tasks to the faulty node;
Step 24: if the affected service is a single service or a dedicated service, the master node selects, among the nodes able to provide the service and according to the load information of each node, the most lightly loaded node for the failed service, and sends a service-takeover command to the Executor of that node;
Step 25: the node selected to take over the failed service receives the takeover message, calls the Standby module to recover the failed service on itself, and notifies the node service monitor to reload its monitoring configuration so that the service processes newly added on the node are monitored.
5. The fault-tolerance method using mutual backup among cluster nodes according to claim 4, characterized in that the handling of a node failure further comprises:
Step 26: the master node issues a command requiring a check of whether all the nodes inside every heartbeat ring are normal, which facilitates fast handling of large-scale node failures.
6. The fault-tolerance method using mutual backup among cluster nodes according to claim 4, characterized in that the concrete operations by which the failed service is taken over are:
Step 241: the master node selects the backup node that will take over the failed service according to the backup nodes' priorities and their current actual workloads;
Step 242: the master node sends the takeover command to the Executor process of the selected takeover node;
Step 243: the Executor of the takeover node calls the Lifeguard module, starts all the dependent processes of the service, sets the corresponding service IP, and then notifies the Lifeguard to monitor the service processes newly added on the node.
7. The fault-tolerance method using mutual backup among cluster nodes according to claim 1, 3, 4, 5, or 6, characterized in that a service has at least: the IP address through which the cluster provides the service; the protocol type the service uses; the listening port number, used to check whether the service is listening normally for user requests; the service type; the default working node of the service; the alternative nodes of the service; and the locations of the service's dependent processes and their start/stop scripts.
8. The fault-tolerance method using mutual backup among cluster nodes according to claim 7, characterized in that the protocol types comprise at least the two classes TCP and UDP.
9. The fault-tolerance method using mutual backup among cluster nodes according to claim 7, characterized in that the service types comprise at least: single service, multiple service, and dedicated service.
10. The fault-tolerance method using mutual backup among cluster nodes according to claim 7, characterized in that the default working node is the node that should provide the service when all nodes of the cluster run normally; when that node is unavailable, the service has backup nodes; and once the default working node has been repaired and rejoined the cluster, the service is migrated back onto it.
11. The fault-tolerance method using mutual backup among cluster nodes according to claim 7, characterized in that there is at least one alternative node.
CNB021594791A 2002-12-31 2002-12-31 Fault-tolerance method using mutual backup among cluster nodes Expired - Fee Related CN1317658C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021594791A CN1317658C (en) 2002-12-31 2002-12-31 Fault-tolerance method using mutual backup among cluster nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB021594791A CN1317658C (en) 2002-12-31 2002-12-31 Fault-tolerance method using mutual backup among cluster nodes

Publications (2)

Publication Number Publication Date
CN1512375A true CN1512375A (en) 2004-07-14
CN1317658C CN1317658C (en) 2007-05-23

Family

ID=34237491

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021594791A Expired - Fee Related CN1317658C (en) 2002-12-31 2002-12-31 Fault-tolerance method using mutual backup among cluster nodes

Country Status (1)

Country Link
CN (1) CN1317658C (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7010617B2 (en) * 2000-05-02 2006-03-07 Sun Microsystems, Inc. Cluster configuration repository
US6990606B2 (en) * 2000-07-28 2006-01-24 International Business Machines Corporation Cascading failover of a data management application for shared disk file systems in loosely coupled node clusters
EP1323040A4 (en) * 2000-09-08 2005-08-03 Goahead Software Inc A system and method for managing clusters containing multiple nodes
CN1308278A (en) * 2001-02-15 2001-08-15 华中科技大学 IP fault-tolerant method for colony server

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101212519B (en) * 2006-12-29 2010-09-15 耐通信息科技(上海)有限公司 Method for judging whether a server node is an isolated node in a redundant IP-PBX system
CN102346698A (en) * 2010-07-30 2012-02-08 阿里巴巴集团控股有限公司 Time program management method, server and system
CN102346698B (en) * 2010-07-30 2014-12-24 阿里巴巴集团控股有限公司 Time program management method, server and system
CN103838635B (en) * 2012-11-23 2016-09-28 中国银联股份有限公司 Host health detection method
CN103838635A (en) * 2012-11-23 2014-06-04 中国银联股份有限公司 Host health detection method
CN104185841A (en) * 2013-03-12 2014-12-03 株式会社东芝 Database system, program, and data processing method
CN103312541A (en) * 2013-05-28 2013-09-18 浪潮电子信息产业股份有限公司 Management method of high-availability mutual backup cluster
CN103259687A (en) * 2013-06-04 2013-08-21 沈阳空管技术开发有限公司 Air traffic control data access platform in civil aviation
CN103793296A (en) * 2014-01-07 2014-05-14 浪潮电子信息产业股份有限公司 Method for assisting in backing-up and copying computer system in cluster
CN104038366A (en) * 2014-05-05 2014-09-10 深圳市中博科创信息技术有限公司 Cluster node failure detection method and system
CN104199747A (en) * 2014-08-15 2014-12-10 中国船舶重工集团公司第七0九研究所 High-availability system obtaining method and system based on health management
CN104199747B (en) * 2014-08-15 2017-05-03 中国船舶重工集团公司第七0九研究所 High-availability system obtaining method and system based on health management
CN107003920A (en) * 2014-09-22 2017-08-01 NetApp, Inc. System and method for handling multi-node failures in a disaster recovery cluster
CN104503325A (en) * 2014-12-19 2015-04-08 北京国电软通江苏科技有限公司 Distributed intelligent main control unit
CN106330523A (en) * 2015-07-03 2017-01-11 中国移动通信集团广西有限公司 Cluster server disaster recovery system and method, and server node
CN105141445A (en) * 2015-07-24 2015-12-09 广州尚融网络科技有限公司 Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system
CN105141456A (en) * 2015-08-25 2015-12-09 山东超越数控电子有限公司 Method for monitoring high-availability cluster resource
CN105978734A (en) * 2016-06-30 2016-09-28 北京海鑫智圣技术有限公司 Identity verification system, and hot replacement method and system of the identity verification system
CN110012054A (en) * 2018-12-28 2019-07-12 阿里巴巴集团控股有限公司 Service processing method and system based on a consortium blockchain network
US11102284B2 (en) 2018-12-28 2021-08-24 Advanced New Technologies Co., Ltd. Service processing methods and systems based on a consortium blockchain network
CN111031126A (en) * 2019-12-10 2020-04-17 江苏满运软件科技有限公司 Cluster cache sharing method, system, equipment and storage medium
CN111031126B (en) * 2019-12-10 2022-08-12 江苏满运软件科技有限公司 Cluster cache sharing method, system, equipment and storage medium
CN112235136A (en) * 2020-10-10 2021-01-15 苏州浪潮智能科技有限公司 Network file system backup method, system, terminal and storage medium
CN112235136B (en) * 2020-10-10 2022-12-23 苏州浪潮智能科技有限公司 Network file system backup method, system, terminal and storage medium
CN113472566A (en) * 2021-06-11 2021-10-01 北京市大数据中心 Consortium blockchain status monitoring method and master-node status monitoring system
CN114125585A (en) * 2021-11-17 2022-03-01 山东山大电力技术股份有限公司 Monitoring and analyzing system and method for charging process of electric automobile charging pile

Also Published As

Publication number Publication date
CN1317658C (en) 2007-05-23

Similar Documents

Publication Publication Date Title
CN1317658C (en) Fault-tolerance approach using machine group node interacting backup
US6760859B1 (en) Fault tolerant local area network connectivity
US6839752B1 (en) Group data sharing during membership change in clustered computer system
EP2643771B1 (en) Real time database system
EP2224341B1 (en) Node system, server switching method, server device, and data transfer method
US20050108593A1 (en) Cluster failover from physical node to virtual node
CN1892612A (en) Cluster availability management method and system
CN102404390A (en) Intelligent dynamic load balancing method for high-speed real-time database
KR20050009665A (en) Automatic configuration of network for monitoring
JP2004062535A (en) Method of dealing with failure for multiprocessor system, multiprocessor system and node
KR20110044858A (en) Maintaining data integrity in data servers across data centers
US7246261B2 (en) Join protocol for a primary-backup group with backup resources in clustered computer system
CN112003716A (en) Data center dual-activity implementation method
CN102325196A (en) Distributed cluster storage system
CN101079896A (en) A multi-availability mechanism coexistence framework of concurrent storage system
US20050044226A1 (en) Method and apparatus for validating and ranking resources for geographic mirroring
JP5078347B2 (en) Method for failing over (repairing) a failed node of a computer system having a plurality of nodes
CN1259786C (en) Method for detecting faults in the routing protocol module of a distributed router
CN110971662A (en) Two-node high-availability implementation method and device based on Ceph
CN1908911A (en) Automatic startup of a cluster system after occurrence of a recoverable error
CN106294031A (en) Service management method and storage controller
US20220100619A1 (en) Computer cluster using expiring recovery rules
JP3621634B2 (en) Redundant configuration switching system
CN115694748A (en) Redundancy framework design method based on real-time data synchronization of hierarchical system
CN114785849A (en) Method for achieving application high availability based on a multi-level node network
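The mutual-backup pattern named in this family of documents — cluster nodes interconnected via a heartbeat ring, each node monitoring an adjacent node and a surviving node taking over the service IP of a node whose heartbeats stop — can be sketched as follows. This is an illustrative simulation only, not the patented implementation; the class and function names, the timeout value, and the single-neighbour monitoring policy are all assumptions.

```python
HEARTBEAT_TIMEOUT = 2.0  # seconds of silence before a neighbour is presumed failed (assumed value)

class Node:
    """One cluster node on the heartbeat ring (illustrative sketch)."""
    def __init__(self, name, service_ip):
        self.name = name
        self.service_ips = {service_ip}  # service IPs this node currently answers on
        self.last_heartbeat = 0.0        # timestamp of the last heartbeat received
        self.alive = True

def check_ring(nodes, now):
    """Each live node watches its ring neighbour; when the neighbour's
    heartbeat times out, it marks the neighbour failed and adopts its
    service IPs (the mutual-backup takeover)."""
    for i, node in enumerate(nodes):
        if not node.alive:
            continue
        neighbour = nodes[(i + 1) % len(nodes)]  # adjacent node on the ring
        if (neighbour is not node and neighbour.alive
                and now - neighbour.last_heartbeat > HEARTBEAT_TIMEOUT):
            neighbour.alive = False
            node.service_ips |= neighbour.service_ips  # take over the failed node's service IP
            neighbour.service_ips = set()

# Example: three nodes; B stops heartbeating, so its ring predecessor A
# absorbs B's service IP while C is unaffected.
a, b, c = Node("A", "10.0.0.1"), Node("B", "10.0.0.2"), Node("C", "10.0.0.3")
ring = [a, b, c]
for n in ring:
    n.last_heartbeat = 100.0
b.last_heartbeat = 95.0          # B fell silent 5 s ago
check_ring(ring, now=100.0)
```

In a real cluster the takeover step would also start the failed node's service process and bind its service IP on the surviving node; the sketch only models the detection and IP reassignment.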

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070523

Termination date: 20201231