CN103647668A - Host group decision system in high availability cluster and switching method for host group decision system - Google Patents

Publication number
CN103647668A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310689137.8A
Other languages
Chinese (zh)
Inventor
郭鹏光
武剑锋
王泊
张佳岭
朱星垠
黄寅飞
白硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Stock Exchange
Original Assignee
Shanghai Stock Exchange
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Stock Exchange
Priority to CN201310689137.8A
Publication of CN103647668A

Abstract

The invention relates to the field of data processing, and in particular to a host group decision system in a high availability cluster and a switching method for that system. The host group decision system resides in the system back end and comprises a plurality of trading hosts. The overall architecture has three layers: sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts. The hosts that form the cluster communicate over a network, read and write a file system through shared storage devices, and synchronize data among themselves in real time to keep it consistent. A high availability module, which handles host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. Compared with the prior art, the system improves overall server performance. Its advantage is that a faulty node is physically isolated and switched out by cross collective voting over a two-dimensional table, which achieves host switching and fault isolation and effectively solves the cluster split-brain problem.

Description

Host group decision system in a high availability cluster and switching method
[Technical Field]
The present invention relates to the field of data processing, and specifically to a host group decision system in a high availability cluster and a switching method.
[Background Art]
A stock exchange provides a public trading platform for the securities market. The core securities trading system is the platform on which securities products are bid and matched in real time, and is also called the auction matching platform. It is a critical service system whose performance, security and reliability bear directly on the prosperity and stability of the domestic financial market, so the matching platform must be stable and highly available.
According to Gartner statistics, unplanned system downtime is caused mainly by application problems (40%), operational problems (40%), operating system failures (10%) and hardware failures (10%). Most of these problems can be resolved within the running system; only a few require switching the business to a cold standby system. For host failures, the system needs hot standby in order to switch quickly, ensure business continuity and keep providing service.
At present, securities systems are moving from the industry's common hot standby mode of one primary and one standby to one primary with two or more standbys, so as to cope with single points of failure in host hardware. As the number of hosts in a cluster grows, host state monitoring and switching become pressing problems, and split-brain is frequently encountered during cluster decision-making and switching. Among existing solutions to split-brain, Paxos and Fast Paxos are the best known, with corresponding projects such as ZooKeeper. However, the Paxos algorithm is itself complex and difficult to implement, and it cannot effectively solve the livelock problem. When split-brain occurs, the cluster must elect a LEADER host by voting, and the LEADER host then makes the decision. The election process is equally complex, and if the LEADER host itself hangs or becomes abnormal, the decision process incurs a long delay.
[Summary of the Invention]
The object of the invention is to solve two problems in the prior art: split-brain during cluster decision-making and switching in host state monitoring, and the complexity of the Paxos algorithm, which is difficult to implement and cannot effectively solve livelock. To this end, the invention provides a host group decision system in a high availability cluster, and a switching method, that improve overall server performance and are suited to fault detection, diagnostic decision-making, fault isolation and switching, recovery and expansion under the multi-host hot standby requirements of high availability cluster computer systems. By collective voting, a faulty node is physically isolated and switched out and the right to access and dispose of shared resources is taken over, which achieves host switching and fault isolation and supports multiple trading hosts running in parallel.
To achieve the above object, a host group decision system in a high availability cluster is provided. The group decision system resides in the system back end and is made up of a number of trading hosts. The overall architecture has three layers: sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts. The hosts that form the cluster communicate over a network, read and write a file system through shared storage devices, and synchronize data among themselves in real time to keep it consistent. Each node in the cluster maintains information about all member nodes of the cluster; a node that newly joins the cluster informs all hosts of its own information, and the routing tables are updated dynamically. A high availability module, responsible for host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. The probe module periodically checks the applications and resource availability of the local host, judges the health of the local host, and broadcasts the detected health status in the cluster over the TCP/IP network. The decision module receives the health heartbeat messages sent by the probe modules and, using both active and passive modes, judges whether each host that sends health heartbeat messages is in a normal state. If the decision module finds no abnormal host in the current round, it proceeds to the next round. If it finds an abnormal host, it notifies the routing module, which changes the state of the abnormal host and modifies the routing rules for transaction data, thereby isolating the fault and switching over.
The sequencing-layer server hosts are responsible for load balancing and order sequencing. A communication-layer server host receives orders from the sequencing-layer server hosts, selects a back-end trade-processing host according to the static and dynamic routing tables, and forwards each order to that processing-layer server host. The processing-layer server host receives the order from the communication host and performs conversion and matching.
The trading hosts are partitioned by product set. At any given time, products belonging to the same product set are processed on only one application host in the cluster. Each product set has a primary trading host and standby trading hosts. The primary host is responsible for processing securities transactions for the product set; the standby hosts do not process securities transactions for the product set but keep their data consistent with the primary host by replication. When the primary host fails, a standby host is dynamically promoted to primary host for the product set.
The routing table manages the primary and standby host information for each product set; this information is called order routing information. There can be only one primary host, while there may be several standby hosts. A takeover order is defined among the standby hosts, which are called the first standby host, the second standby host, and so on. The static routing table is the primary and standby host information for each product set that is predefined in a file before the trading system starts. The dynamic routing table is the primary and standby host information for each product set obtained in real time from host states.
A switching method for the host group decision system in a high availability cluster is as follows:
a. The probe module actively checks the health of the local host, detecting faults in processes, shared memory and message queues. When a host suffers a resource-exhaustion or process-hang type of anomaly, the probe module's active detection and reporting function is triggered: the probe module broadcasts health information to all hosts in the system and requests that the local host be isolated. All other hosts in the system receive the health heartbeat report sent by the probe module and isolate the abnormal host in real time. Isolation takes two steps. First, all other hosts in the system update their routing tables for the abnormal host so that new transaction data is routed to the first standby host of the abnormal host. Second, the abnormal host is physically isolated, that is, isolated on the network, and fault investigation and recovery of the abnormal host begin;
b. If a host crashes or the network is abnormally interrupted, the local health heartbeat report generated by the probe module cannot reach the other hosts in the cluster. This triggers the group decision mechanism within the cluster, and a decision is made without any master: each healthy host initiates a vote independently, and the final decision combines the reports of all healthy hosts. This avoids decision errors caused by a single point of failure and ensures that the basis and the result of the decision are uniform across the system; even if the group decision is wrong, the healthy operation of the whole system and the consistency of the data are still guaranteed. If a host crashes, every other host detects that the faulty host has repeatedly failed to deliver its health heartbeat message on time and records this. All hosts send their diagnosis of the abnormal host to one another; that is, every host in the system maintains a two-dimensional table locally in real time that stores each host's judgment of the fault state. Every host in the system decides on the basis of the diagnostic information it has collected locally, modifies its locally stored routing table, and isolates the abnormal host. After isolation, fault investigation and recovery are carried out on the abnormal host;
c. When a host recovers from a fault, it rejoins the cluster and loads data. Without interrupting normal business processing on the current primary host, the faulty host synchronizes its data with a normal hot standby host; automatic recovery and switching can be completed within ten seconds. First, once the probe module detects that the local fault has been resolved, it broadcasts in the cluster a request to rejoin the cluster. When every other host in the system has received the recovery request from the faulty host several times in succession, it judges that the faulty host has indeed recovered. All other hosts in the system then carry out the recovery operation for the faulty host and recalculate the dynamic routing table entries that involve it.
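The masterless vote in step b can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the source does not give a vote threshold, so a strict majority of the reporting hosts is assumed here, and the names `VoteTable` and `decide_faulty` are hypothetical.

```python
from typing import Dict, List, Set

# Two-dimensional table: table[observer][target] is True when `observer`
# has diagnosed `target` as faulty. A crashed host sends no diagnoses and
# therefore has no row in the table.
VoteTable = Dict[str, Dict[str, bool]]


def decide_faulty(table: VoteTable, hosts: List[str]) -> Set[str]:
    """Return the hosts judged faulty by a strict majority of reporting hosts.

    Assumption: the majority rule is illustrative; the source only states
    that the final decision combines the reports of all healthy hosts.
    """
    faulty: Set[str] = set()
    for target in hosts:
        # Only hosts that actually reported (have a row) count as voters.
        observers = [h for h in hosts if h != target and h in table]
        votes = sum(1 for h in observers if table[h].get(target, False))
        if observers and 2 * votes > len(observers):
            faulty.add(target)
    return faulty
```

Every host would run the same function on its own local copy of the table. Because all hosts exchange the same diagnoses, they reach the same result without electing a leader.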
The switching method further comprises the following. When the state of a cluster member host changes, the dynamic routing table is recalculated. When a host fails, every host in the system aggregates and processes the information and recalculates its dynamic routing table. For each product set of which the failed host was the primary host, the first standby host becomes the primary host, the second standby host moves up to become the first standby host, and so on. If a standby host fails, only the order of the standby hosts after it is adjusted: those standby hosts each move up one place, and primary or standby hosts of higher priority are not affected. This ensures that when a primary host fails, each product set is taken over first by the best-prepared standby host.
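The promotion rule above amounts to removing failed hosts from each takeover order while keeping the relative order of the survivors. A minimal Python sketch follows; the function name `recompute_dynamic_routes` and the list-based representation are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Routing table: product set -> takeover order
# [primary, first standby, second standby, ...].
Routes = Dict[str, List[str]]


def recompute_dynamic_routes(static_routes: Routes, failed: Set[str]) -> Routes:
    """Derive the dynamic routing table from the static one.

    Failed hosts are dropped from every takeover order. Hosts behind a
    failed host move up one place; hosts ahead of it are unaffected.
    """
    return {
        product_set: [host for host in order if host not in failed]
        for product_set, order in static_routes.items()
    }
```

With a takeover order of host 1, host 2, host 3, host 5, a failure of host 1 yields host 2 as the primary and host 3 as the first standby, while a failure of host 3 alone leaves host 1 and host 2 in place and only moves host 5 up.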
Compared with the prior art, the invention improves overall server performance. Its specific advantages are:
1. By cross collective voting over a two-dimensional table, a faulty node is physically isolated and switched out and the right to access and dispose of shared resources is taken over, which achieves host switching and fault isolation and supports multiple trading hosts running in parallel. The approach is better suited to layered cluster processing systems, is simpler than algorithms such as Paxos, and effectively solves the cluster split-brain problem;
2. It is suited to the financial industry and meets the high availability, stability and scalability requirements of critical business. It can be used for fault detection, diagnostic decision-making, fault isolation and switching, recovery and expansion under the multi-host hot standby requirements of high availability cluster computer systems;
3. Hot standby hosts are deployed within the cluster system, which reduces redundancy, makes full use of host computing capacity and lowers cost;
4. Fault detection and switching use a masterless mode, which removes the risk brought by Master failure and Master switching and makes the whole cluster system simpler, clearer and more fault tolerant.
[Brief Description of the Drawings]
Fig. 1 is a schematic diagram of the high availability cluster deployment in the invention;
Fig. 2 is a schematic diagram of trading host state transitions in the invention;
Fig. 3 is a schematic diagram of the high availability module deployment on a trading host in the invention;
Fig. 4 is a flow chart of active fault detection in the invention;
Fig. 5 is a flow chart of host state recovery in the invention;
Fig. 2 is designated as the abstract drawing of the invention.
[Detailed Description]
The invention is further described below with reference to the drawings. The structure and principle of the system will be clear to those skilled in the art. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
The host group decision system in the high availability cluster of the invention is divided into three layers. The group decision and switching method applies only to the communication server layer and the application server layer. Within these two layers, all communication servers and application servers are peers, that is, they follow the same decision strategy.
The invention resides in the back end of the business system and is made up of a number of trading hosts that process trading business, with no master node host. The overall architecture has three layers: a sequencing layer, a communication layer and a processing layer. The sequencing layer is responsible for load balancing and order sequencing. A communication host receives orders from the sequencing layer, selects a back-end trade-processing host according to the static and dynamic routing tables, and forwards each order to that host. The back-end trading host receives the order from the communication host and performs conversion and matching. Each node in the cluster maintains information about all member nodes of the cluster; a node that newly joins the cluster informs all hosts of its own information, and the routing tables are updated dynamically. The hosts that form the cluster communicate over a network, read and write a file system through shared storage devices, and synchronize data among themselves in real time to keep it consistent.
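How a communication host might pick the processing host for an incoming order can be sketched as follows. This is a hedged illustration only: the mapping from product to product set, the product identifiers and the function name `select_processing_host` are all hypothetical, since the source does not describe the lookup in detail.

```python
from typing import Dict, List


def select_processing_host(
    product: str,
    product_to_set: Dict[str, str],
    dynamic_routes: Dict[str, List[str]],
) -> str:
    """Return the current primary host for the product's product set.

    `dynamic_routes` maps a product set to its takeover order, with the
    current primary host first (assumed representation).
    """
    product_set = product_to_set[product]
    takeover_order = dynamic_routes[product_set]
    if not takeover_order:
        raise LookupError(f"no available host for {product_set}")
    return takeover_order[0]
```

The empty-list check covers the case where every host configured for a product set has been isolated, which the communication host would have to report rather than forward the order.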
Embodiment 1
Fig. 1 shows a typical deployment in which multiple machines act as hot standbys for one another, based on high availability multi-machine backup technology. In the figure, nine hosts form a cluster made up of sequencing server hosts, communication server hosts and application server hosts. The hosts are interconnected by a TCP/IP network to exchange data and control messages, and they share access to disks through a storage area network.
Embodiment 2
Fig. 2 is a schematic diagram of host states in the invention; a host can be in any of several states. The high availability module, responsible for host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. The probe module periodically checks the applications and resource availability of the local host and broadcasts the local health status in the high availability cluster over the TCP/IP network. The decision module receives the health heartbeat messages sent by the probe modules and, using both active and passive modes, judges whether each host that sends health heartbeat messages is in a normal state. The routing module loads the static routing table and keeps the dynamic routing table up to date. If the decision module finds no abnormal host in the current round, it proceeds to the next round. If it finds an abnormal host, it notifies the routing module, which changes the state of the abnormal host and modifies the routing rules for transaction data, thereby isolating the fault and switching over.
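The round-by-round judgment of the decision module can be sketched in Python as a missed-heartbeat counter. The class name `HeartbeatMonitor` and the default of three missed rounds are assumptions; the source only says that a host is judged abnormal after its heartbeat has been missing several times.

```python
from typing import Dict, Iterable, List


class HeartbeatMonitor:
    """Counts consecutive rounds in which a host sent no health heartbeat."""

    def __init__(self, hosts: Iterable[str], max_missed: int = 3) -> None:
        self.max_missed = max_missed  # assumed threshold, not from the source
        self.missed: Dict[str, int] = {host: 0 for host in hosts}

    def end_of_round(self, received_from: Iterable[str]) -> List[str]:
        """Close one check round and return the hosts now judged abnormal."""
        seen = set(received_from)
        abnormal: List[str] = []
        for host in self.missed:
            if host in seen:
                self.missed[host] = 0  # a heartbeat resets the counter
            else:
                self.missed[host] += 1
            if self.missed[host] >= self.max_missed:
                abnormal.append(host)
        return abnormal
```

An empty return value corresponds to "no abnormal host in the current round"; a non-empty one is what the decision module would pass to the routing module.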
When the high availability module of a host starts working, it initializes the host state to "state 1: start initialization". When initialization completes, the probe module starts a timer. When the timer fires, the probe module begins checking the health of the local host and the host state changes to "state 2: status check". When the host status check completes successfully, the state changes to "state N: check complete, status normal". If the probe module finds that the local host is abnormal for some reason, it sets the local state to "status abnormal". Meanwhile, the decision module checks health heartbeat messages at regular intervals; after several checks in which no health heartbeat report from a host's probe module has been received, the decision module likewise sets the state of that faulty host to "status abnormal". When the high availability module finds that the host status is abnormal, the host stops processing normal transactions and enters "state 0: fault recovery", in which it is recovered manually or automatically. If the host or application is restarted, it enters "state 1: start initialization"; otherwise it enters "states 2 ... N-1: status check to status normal".
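The transitions described above can be sketched as a small table-driven state machine. This is a simplified illustration: the intermediate states 2 to N-1 are collapsed into a single checking state, and the event names are hypothetical labels that do not appear in the source.

```python
from enum import Enum


class HostState(Enum):
    FAULT_RECOVERY = "state 0: fault recovery"
    INITIALIZING = "state 1: start initialization"
    CHECKING = "state 2: status check"
    NORMAL = "state N: check complete, status normal"
    ABNORMAL = "status abnormal"


# (current state, event) -> next state. Event names are assumed labels.
TRANSITIONS = {
    (HostState.INITIALIZING, "timer_fired"): HostState.CHECKING,
    (HostState.CHECKING, "check_ok"): HostState.NORMAL,
    (HostState.CHECKING, "check_failed"): HostState.ABNORMAL,
    (HostState.NORMAL, "timer_fired"): HostState.CHECKING,
    (HostState.NORMAL, "heartbeats_missed"): HostState.ABNORMAL,
    (HostState.ABNORMAL, "stop_trading"): HostState.FAULT_RECOVERY,
    (HostState.FAULT_RECOVERY, "restarted"): HostState.INITIALIZING,
    (HostState.FAULT_RECOVERY, "recovered"): HostState.CHECKING,
}


def next_state(state: HostState, event: str) -> HostState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None
```

Keeping the transitions in one dictionary makes it easy to compare the sketch against the state diagram and to reject events that are not valid in the current state.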
"State 1: start initialization" is the initial state; "state 0: fault recovery" is the terminal state; all other states are intermediate states.
Embodiment 3
Fig. 3 shows the main functional modules of a trading host. Decision module: receives the health heartbeat messages of the probe modules and modifies host states and routing rules according to host health. Probe module: checks the applications and resource availability of the local host, judges the health of the local host, and actively sends health heartbeat messages by broadcast. Routing module: loads the static routing information at startup and, while the cluster is running, dynamically updates the routing information according to instructions from the decision module to complete host switching and recovery.
Modules on the same host exchange messages by inter-process communication; different hosts communicate with one another over TCP/IP.
Because of the nature of securities, the products handled by the trading hosts are divided into different product sets. For each product set, one host acts as the primary processor and several hosts may act as standby processors, called the first standby host, the second standby host, the third standby host, and so on. A single host may be the primary processor for several product sets at once and may also be a standby processor for several product sets at once.
The method includes one data entity, the static configuration routing table, which records the primary host, first standby host, second standby host and so on for each product set. The static configuration routing table in the routing module is shown in Table 1:
Table 1. Static configuration routing table in the routing module
Product set   | Primary processor | First standby host | Second standby host | Third standby host
Product set 1 | Host 1            | Host 2             | Host 3              | Host 5
Product set 2 | Host 2            | Host 1             | Host 4              | Host 6
Product set 3 | Host 3            | Host 4             | Host 1              | Host 5
Product set 4 | Host 1            | Host 3             | Host 2              | Host 6
In the static routing table configuration above, every product set has one primary processor and several standby machines. A host may be the primary processor for several product sets and a standby processor for several product sets; it may also act only as a primary processor or only as a standby processor. The assignment of host roles is therefore quite flexible. For example, for product set 1 the primary processor is host 1, the first standby host is host 2, the second standby host is host 3 and the third standby host is host 5.
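The flexibility of host roles can be checked against the static configuration routing table with a few lines of Python. The dictionary below holds the table's rows; the names `STATIC_ROUTES` and `roles_of` are hypothetical and the representation is only one possible choice.

```python
from typing import Dict, List, Tuple

# Static configuration routing table: product set -> takeover order
# [primary processor, first standby, second standby, third standby].
STATIC_ROUTES: Dict[str, List[str]] = {
    "product set 1": ["host 1", "host 2", "host 3", "host 5"],
    "product set 2": ["host 2", "host 1", "host 4", "host 6"],
    "product set 3": ["host 3", "host 4", "host 1", "host 5"],
    "product set 4": ["host 1", "host 3", "host 2", "host 6"],
}


def roles_of(host: str, routes: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Return (product sets where `host` is primary,
    {product set: standby rank}), where rank 1 is the first standby."""
    primary_for = [ps for ps, order in routes.items() if order[0] == host]
    standby_for = {
        ps: order.index(host) for ps, order in routes.items() if host in order[1:]
    }
    return primary_for, standby_for
```

For host 1 this returns product sets 1 and 4 as primary roles, plus first standby for product set 2 and second standby for product set 3. Host 5 has no primary role at all and is the third standby for product sets 1 and 3.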
Embodiment 4
Fig. 4 shows the active fault detection flow. The specific steps are:
1. On host 1, the probe module actively checks the health of the local host, detecting whether resources such as processes, shared memory and message queues are normal, and finds that the number of blocked messages in a message queue on the local host exceeds the threshold;
2. On host 1, the probe module actively sends the local health information to all hosts in the system, reporting that the number of blocked messages in a message queue on the local host exceeds the threshold;
3. All other hosts in the system isolate abnormal host 1;
4. All other hosts in the system modify their routing tables for host 1 so that new transaction data is routed to the first standby host of host 1;
5. Abnormal host 1 is physically isolated;
6. Fault investigation and recovery begin on host 1.
On every host in the cluster, the probe module maintains a checklist of the local host's applications and resources and runs the checks when a timer fires. The results are recorded in the health heartbeat message and broadcast in the cluster system, as shown in Table 2:
Table 2. Health check results for host 1
Host   | Process check | Message queue check      | Shared memory check | Storage and database check
Host 1 | Normal        | Blocked, above threshold | Normal              | Normal
As shown above, because the number of blocked messages in the message queue exceeds the threshold, host 1 cannot process transaction data normally. Its probe module notifies the decision modules of the other hosts in the system in real time that host 1 is abnormal and actively requests isolation.
On every host in the cluster, the decision module receives this active exception report and at the same time notifies the routing module that host 1 is abnormal, which ends this round of system status checking.
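The probe module's checklist and the resulting heartbeat message can be sketched as follows. The field names, the `Heartbeat` class and the numeric queue threshold are assumptions chosen for illustration; the source gives only the check categories and the outcome for host 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict

NORMAL = "normal"


@dataclass
class Heartbeat:
    """Health heartbeat message broadcast by a probe module (assumed shape)."""
    host: str
    results: Dict[str, str]

    @property
    def healthy(self) -> bool:
        return all(result == NORMAL for result in self.results.values())


def queue_check(blocked_messages: int, threshold: int) -> str:
    """Check one message queue against a blocked-message threshold."""
    if blocked_messages > threshold:
        return "blocked, above threshold"
    return NORMAL


def run_checks(host: str, checks: Dict[str, Callable[[], str]]) -> Heartbeat:
    """Run every check on the list and record the results in a heartbeat."""
    return Heartbeat(host, {name: check() for name, check in checks.items()})
```

A probe that finds `healthy` to be false would broadcast the heartbeat and request isolation, as host 1 does in this embodiment.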
Embodiment 5
Fig. 5 shows the host state recovery flow. The specific steps are:
1. Abnormal host 1 detects that the local fault has been resolved and that its health status is normal, then updates its transaction data and completes data synchronization;
2. Abnormal host 1 notifies all other hosts 1...N in the system that it needs to be readmitted and can process transaction data normally;
3. All other hosts 1...N in the system receive the recovery request from host 1 several times in succession and decide that host 1 has indeed recovered;
4. All other hosts in the system carry out the recovery operation for abnormal host 1 and modify their routing tables for host 1 so that new transaction data can be routed to host 1;
5. New transaction data is sent to host 1.
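Step 3 of the recovery flow, confirming recovery only after several consecutive requests, can be sketched as a streak counter. The class name `RecoveryTracker` and the default of three consecutive requests are assumptions; the source does not state the number.

```python
from typing import Dict


class RecoveryTracker:
    """Confirms a host as recovered after consecutive recovery requests."""

    def __init__(self, required: int = 3) -> None:
        self.required = required  # assumed count, not given in the source
        self.streak: Dict[str, int] = {}

    def on_round(self, host: str, requested: bool) -> bool:
        """Record one round for `host`; return True once it is confirmed.

        A round without a recovery request resets the streak, so only
        uninterrupted runs of requests count.
        """
        if requested:
            self.streak[host] = self.streak.get(host, 0) + 1
        else:
            self.streak[host] = 0
        return self.streak[host] >= self.required
```

Once `on_round` returns true, a host running this check would restore the recovered host in its routing table so that new transaction data can be routed to it again.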

Claims (6)

1. A host group decision system in a high availability cluster, characterized in that: the group decision system resides in the system back end and is made up of a number of trading hosts; the overall architecture has three layers, namely sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts; the hosts that form the cluster communicate over a network, read and write a file system through shared storage devices, and synchronize data among themselves in real time to keep it consistent; each node in the cluster maintains information about all member nodes of the cluster, and a node that newly joins the cluster informs all hosts of its own information and the routing tables are updated dynamically; a high availability module, responsible for host state transitions, is implemented by three cooperating modules, namely a probe module, a decision module and a routing module; the probe module periodically checks the applications and resource availability of the local host, judges the health of the local host, and broadcasts the detected health status in the cluster over the TCP/IP network; the decision module receives the health heartbeat messages sent by the probe modules and, using both active and passive modes, judges whether each host that sends health heartbeat messages is in a normal state; if the decision module finds no abnormal host in the current round, it proceeds to the next round; if it finds an abnormal host, it notifies the routing module, which changes the state of the abnormal host and modifies the routing rules for transaction data, thereby isolating the fault and switching over.
2. The host group decision system in a high availability cluster according to claim 1, characterized in that the sequencing-layer server hosts are responsible for load balancing and order sequencing; a communication-layer server host receives orders from the sequencing-layer server hosts, selects a back-end trade-processing host according to the static and dynamic routing tables, and forwards each order to that processing-layer server host; and the processing-layer server host receives the order from the communication host and performs conversion and matching.
3. The host group decision system in a high availability cluster according to claim 1, characterized in that the trading hosts are partitioned by product set; at any given time, products belonging to the same product set are processed on only one application host in the cluster; each product set has a primary trading host and standby trading hosts; the primary host is responsible for processing securities transactions for the product set; the standby hosts do not process securities transactions for the product set but keep their data consistent with the primary host by replication; and when the primary host fails, a standby host is dynamically promoted to primary host for the product set.
4. The host group decision system in a high availability cluster according to claim 1, characterized in that the routing table manages the primary and standby host information for each product set, this information being called order routing information; there can be only one primary host, while there may be several standby hosts; a takeover order is defined among the standby hosts, which are called the first standby host, the second standby host, and so on; the static routing table is the primary and standby host information for each product set that is predefined in a file before the trading system starts; and the dynamic routing table is the primary and standby host information for each product set obtained in real time from host states.
5. a changing method for main frame group decision-making system in high availability cluster as claimed in claim 1, is characterized in that described changing method is as follows:
A. probe module is initiatively detected the machine health status, detecting process, shared drive, message queue break down, at certain main frame, there is the abnormal of resource exhaustion or process hang-up class, can trigger probe module initiatively detects and function of reporting, probe module initiatively, to All hosts broadcast health and fitness information in system, initiatively requires this main frame to isolate; In system, every other main frame is received the healthy heartbeat report that probe module sends, in real time abnormal host is implemented to isolated operation, isolated operation is divided into two steps, the first step is the routing table that in system, every other main frame is updated to abnormal host, and new transaction data is routed to the first standby host of abnormal host; Second step is to abnormal host implementation physical isolation, isolates abnormal host on network, and starts to carry out to the malfunction elimination of abnormal host and recovery;
b. If a host crashes or its network connection is interrupted abnormally, the health heartbeat report generated by its probe module cannot be sent to the other hosts in the cluster. This triggers the group decision mechanism of the cluster, which decides without a master: each healthy host independently initiates a vote, and the final decision combines the reports of all healthy hosts. This avoids decision errors caused by a single point of failure and keeps the basis and the result of the decision uniform across the system; even if the group decision is wrong, the healthy operation of the whole system and the consistency of the data are still guaranteed. When a host crashes, every other host detects that the faulty host has failed several consecutive times to send its health heartbeat message on time and records this in the system. All hosts send one another their diagnosis of the abnormal host; that is, every host in the system maintains a two-dimensional table locally in real time, storing each host's judgment of the fault state. Every host in the system then makes its decision from the diagnostic information it has collected locally, revises its locally stored routing table, and isolates the abnormal host. After isolation, fault diagnosis and recovery are carried out on the abnormal host;
c. If a host recovers from a fault, it rejoins the cluster and loads data. Without interrupting normal business processing on the current primary host, the recovered host synchronizes data with a healthy hot-standby host; automatic recovery and switchover can complete within ten seconds. First, once the probe module detects that the local fault has been cleared, it broadcasts within the cluster a request to rejoin the cluster. After every other host in the system has received the recovery request from the faulty host several consecutive times, it judges that the host has genuinely recovered. Every other host in the system then performs the recovery operation for the faulty host and recalculates the dynamic routing table with respect to that host.
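The group decision in step b can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `LocalView`, its method names, the threshold of three missed heartbeats, and the strict-majority rule are all assumptions, since the claim only says that heartbeats are missed "several consecutive times" and that the decision "combines the reports of all healthy hosts".

```python
MISS_THRESHOLD = 3  # assumed value; the claim does not give a number


class LocalView:
    """One host's local copy of the two-dimensional diagnosis table (hypothetical)."""

    def __init__(self, me, hosts):
        self.me = me
        self.hosts = list(hosts)
        # Consecutive missed heartbeats per peer, as seen by this host.
        self.missed = {h: 0 for h in self.hosts if h != me}
        # table[observer][target] is True when observer judges target faulty.
        self.table = {o: {t: False for t in self.hosts if t != o}
                      for o in self.hosts}

    def on_heartbeat_interval(self, target, arrived):
        """Update this host's own row after one heartbeat interval."""
        self.missed[target] = 0 if arrived else self.missed[target] + 1
        self.table[self.me][target] = self.missed[target] >= MISS_THRESHOLD

    def on_peer_report(self, observer, row):
        """Store the diagnosis row broadcast by another host."""
        self.table[observer].update(row)

    def decide(self):
        """Return the hosts that a strict majority of the other hosts judge faulty.

        The majority rule is an assumption made for this sketch.
        """
        faulty = set()
        for target in self.hosts:
            voters = [o for o in self.hosts if o != target]
            votes = sum(self.table[o][target] for o in voters)
            if 2 * votes > len(voters):
                faulty.add(target)
        return faulty
```

Because every host runs the same `decide` over the same exchanged rows, no single host acts as the arbiter, which is the sense in which the decision is made without a master. In a three-host cluster, one host's own suspicion is not enough to isolate a peer; a second host must report the same diagnosis first.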
6. The switching method for the host group decision system in a high availability cluster as claimed in claim 5, characterized in that the switching method further comprises: the dynamic routing table is recalculated whenever the state of a cluster member host changes. When a host fails, every host in the system aggregates and processes the information and recalculates its dynamic routing table. For each product set of which the failed host was the primary, the first standby becomes the primary, the second standby advances to first standby, and so on. If a standby fails, only the order of the standbys ranked after it is adjusted, each moving up one position; the order of the higher-priority primary and standby hosts is not affected. This ensures that when a primary fails, each product set is taken over first by the best-prepared standby.
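The recalculation described in claim 6 can be sketched as follows. This is an illustrative sketch under stated assumptions: the dynamic routing table is modelled as a dictionary from product set to a list of hosts in takeover order (index 0 is the primary, index 1 the first standby, and so on), and the function names `on_host_failed` and `primary_of` are hypothetical.

```python
def on_host_failed(routes, failed):
    """Return a new dynamic routing table with the failed host removed.

    Removing the host from each ordered list keeps the relative order of the
    survivors, so hosts ranked behind the failed one each move up one position
    and hosts ranked ahead of it keep their place.
    """
    return {product_set: [h for h in order if h != failed]
            for product_set, order in routes.items()}


def primary_of(routes, product_set):
    """Return the current primary of a product set, or None if no host is left."""
    order = routes[product_set]
    return order[0] if order else None
```

With routes `{"P1": ["A", "B", "C"], "P2": ["B", "C", "A"]}`, a failure of host A promotes B to primary of P1 and only shortens the standby list of P2, where A was the last standby. Where a recovered host re-enters the order is not specified in the claims, so the sketch leaves recovery out.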
CN201310689137.8A 2013-12-16 2013-12-16 Host group decision system in high availability cluster and switching method for host group decision system Pending CN103647668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310689137.8A CN103647668A (en) 2013-12-16 2013-12-16 Host group decision system in high availability cluster and switching method for host group decision system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310689137.8A CN103647668A (en) 2013-12-16 2013-12-16 Host group decision system in high availability cluster and switching method for host group decision system

Publications (1)

Publication Number Publication Date
CN103647668A true CN103647668A (en) 2014-03-19

Family

ID=50252829

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201310689137.8A Pending CN103647668A (en) 2013-12-16 2013-12-16 Host group decision system in high availability cluster and switching method for host group decision system

Country Status (1)

Country Link
CN (1) CN103647668A (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573117A (en) * 2015-02-05 2015-04-29 赛特斯信息科技股份有限公司 Method and system for realizing high availability of database server based on shared storage
CN104780613A (en) * 2015-04-23 2015-07-15 河北远东通信系统工程有限公司 Method for sharing and synchronizing resources between digital cluster base station and switching center
CN105391737A (en) * 2015-12-14 2016-03-09 福建六壬网安股份有限公司 Load balancing host group file synchronous processing system and processing method thereof
CN106453656A (en) * 2016-12-06 2017-02-22 东软集团股份有限公司 Cluster host selection method and device
CN106789193A (en) * 2016-12-06 2017-05-31 郑州云海信息技术有限公司 A kind of cluster ballot referee method and system
CN107533486A (en) * 2015-10-13 2018-01-02 甲骨文国际公司 For the high-efficiency network isolation in multi-tenant cluster environment and the system and method for load balance
CN107733684A (en) * 2017-08-31 2018-02-23 北京宇航系统工程研究所 A kind of multi-controller computing redundancy cluster based on Loongson processor
CN107807608A (en) * 2017-11-02 2018-03-16 腾讯科技(深圳)有限公司 Data processing method, data handling system and storage medium
CN107819605A (en) * 2016-09-14 2018-03-20 北京百度网讯科技有限公司 Method and apparatus for the switching server in server cluster
CN108712475A (en) * 2018-04-27 2018-10-26 深圳市元征科技股份有限公司 Message method, device, electronic equipment and computer readable storage medium
CN109274711A (en) * 2018-08-13 2019-01-25 中兴飞流信息科技有限公司 PC cluster method, apparatus and computer readable storage medium
CN109286529A (en) * 2018-10-31 2019-01-29 武汉烽火信息集成技术有限公司 A kind of method and system for restoring RabbitMQ network partition
CN110427429A (en) * 2019-08-06 2019-11-08 上海浦东发展银行股份有限公司信用卡中心 A kind of Session loads balance realizing method based on fabric-sdk-java
CN110519112A (en) * 2018-05-22 2019-11-29 山东数盾信息科技有限公司 A kind of method for realizing the continuous High Availabitity of dynamic in cluster storage system
CN110855737A (en) * 2019-09-24 2020-02-28 中国科学院软件研究所 Consistency level controllable self-adaptive data synchronization method and system
CN111008026A (en) * 2018-10-08 2020-04-14 阿里巴巴集团控股有限公司 Cluster management method, device and system
CN111026586A (en) * 2019-11-21 2020-04-17 通号城市轨道交通技术有限公司 Main/standby state switching method and device for cluster equipment
CN111277632A (en) * 2020-01-13 2020-06-12 中国建设银行股份有限公司 Method and device for managing applications in system cluster
TWI697224B (en) * 2018-10-29 2020-06-21 日商三菱電機股份有限公司 Communication system, communication device and computer program pruduct
CN111431805A (en) * 2020-03-27 2020-07-17 上海天好信息技术股份有限公司 Internet of things multi-channel signal multiplexing synchronization strategy method
CN111696245A (en) * 2020-06-30 2020-09-22 郭平波 Voting method based on P2P network
WO2020211362A1 (en) * 2019-04-16 2020-10-22 平安科技(深圳)有限公司 Method and apparatus for improving availability of trunking system, and computer device
CN111901256A (en) * 2020-08-07 2020-11-06 杭州熙菱信息技术有限公司 Cluster type switching system and method
CN112035262A (en) * 2020-09-22 2020-12-04 中国建设银行股份有限公司 Method and device for multi-host dynamic management adjustment
CN112181660A (en) * 2020-10-12 2021-01-05 北京计算机技术及应用研究所 High-availability method based on server cluster
CN112596991A (en) * 2020-12-27 2021-04-02 卡斯柯信号有限公司 Hot standby reverse cutting method based on machine health state
CN113055203A (en) * 2019-12-26 2021-06-29 中国移动通信集团重庆有限公司 Method and device for recovering abnormity of SDN control plane
CN113220509A (en) * 2021-05-19 2021-08-06 扬州万方电子技术有限责任公司 Double-combination alternating shift system and method
CN113489792A (en) * 2021-07-07 2021-10-08 上交所技术有限责任公司 Method for reducing network transmission times among data centers in cross-data center cluster consensus algorithm
CN114124668A (en) * 2021-11-03 2022-03-01 上证所信息网络有限公司 System and method for ensuring consistency of market quotation slices of multiple hosts
CN114189547A (en) * 2022-02-14 2022-03-15 北京安盟信息技术股份有限公司 SSL tunnel fast switching method under cluster
CN114844909A (en) * 2022-03-31 2022-08-02 顾松林 Consensus mechanism query system based on Internet

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269396B1 (en) * 1997-12-12 2001-07-31 Alcatel Usa Sourcing, L.P. Method and platform for interfacing between application programs performing telecommunications functions and an operating system
CN1741489A (en) * 2005-09-01 2006-03-01 西安交通大学 High usable self-healing Logic box fault detecting and tolerating method for constituting multi-machine system
CN102938705A (en) * 2012-09-25 2013-02-20 上海证券交易所 Method for managing and switching high availability multi-machine backup routing table
CN103384212A (en) * 2013-07-24 2013-11-06 佳都新太科技股份有限公司 Double-machine high availability scheme for communication application system and implementation thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
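赵殿奎 (Zhao Diankui): "Research and Implementation of Dual-Machine Hot Backup Based on the LVS Load Scheduler", China Master's Theses Full-text Database, Information Science and Technology *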

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573117A (en) * 2015-02-05 2015-04-29 赛特斯信息科技股份有限公司 Method and system for realizing high availability of database server based on shared storage
CN104780613A (en) * 2015-04-23 2015-07-15 河北远东通信系统工程有限公司 Method for sharing and synchronizing resources between digital cluster base station and switching center
US11356370B2 (en) 2015-10-13 2022-06-07 Oracle International Corporation System and method for efficient network isolation and load balancing in a multi-tenant cluster environment
CN107533486B (en) * 2015-10-13 2020-11-10 甲骨文国际公司 System and method for efficient network isolation and load balancing in a multi-tenant cluster environment
US11677667B2 (en) 2015-10-13 2023-06-13 Oracle International Corporation System and method for efficient network isolation and load balancing in a multi-tenant cluster environment
CN107533486A (en) * 2015-10-13 2018-01-02 甲骨文国际公司 For the high-efficiency network isolation in multi-tenant cluster environment and the system and method for load balance
CN105391737A (en) * 2015-12-14 2016-03-09 福建六壬网安股份有限公司 Load balancing host group file synchronous processing system and processing method thereof
CN107819605A (en) * 2016-09-14 2018-03-20 北京百度网讯科技有限公司 Method and apparatus for the switching server in server cluster
CN106453656A (en) * 2016-12-06 2017-02-22 东软集团股份有限公司 Cluster host selection method and device
CN106789193A (en) * 2016-12-06 2017-05-31 郑州云海信息技术有限公司 A kind of cluster ballot referee method and system
CN106453656B (en) * 2016-12-06 2019-12-06 东软集团股份有限公司 Cluster host selection method and device
CN107733684B (en) * 2017-08-31 2021-02-09 北京宇航系统工程研究所 Multi-controller computing redundancy cluster based on Loongson processor
CN107733684A (en) * 2017-08-31 2018-02-23 北京宇航系统工程研究所 A kind of multi-controller computing redundancy cluster based on Loongson processor
CN107807608A (en) * 2017-11-02 2018-03-16 腾讯科技(深圳)有限公司 Data processing method, data handling system and storage medium
CN108712475A (en) * 2018-04-27 2018-10-26 深圳市元征科技股份有限公司 Message method, device, electronic equipment and computer readable storage medium
CN110519112A (en) * 2018-05-22 2019-11-29 山东数盾信息科技有限公司 A kind of method for realizing the continuous High Availabitity of dynamic in cluster storage system
CN109274711A (en) * 2018-08-13 2019-01-25 中兴飞流信息科技有限公司 PC cluster method, apparatus and computer readable storage medium
CN109274711B (en) * 2018-08-13 2021-05-25 中兴飞流信息科技有限公司 Cluster computing method and device and computer readable storage medium
CN111008026A (en) * 2018-10-08 2020-04-14 阿里巴巴集团控股有限公司 Cluster management method, device and system
CN111008026B (en) * 2018-10-08 2024-03-26 阿里巴巴集团控股有限公司 Cluster management method, device and system
TWI697224B (en) * 2018-10-29 2020-06-21 日商三菱電機股份有限公司 Communication system, communication device and computer program pruduct
CN109286529A (en) * 2018-10-31 2019-01-29 武汉烽火信息集成技术有限公司 A kind of method and system for restoring RabbitMQ network partition
CN109286529B (en) * 2018-10-31 2021-08-10 武汉烽火信息集成技术有限公司 Method and system for recovering RabbitMQ network partition
WO2020211362A1 (en) * 2019-04-16 2020-10-22 平安科技(深圳)有限公司 Method and apparatus for improving availability of trunking system, and computer device
CN110427429A (en) * 2019-08-06 2019-11-08 上海浦东发展银行股份有限公司信用卡中心 A kind of Session loads balance realizing method based on fabric-sdk-java
CN110427429B (en) * 2019-08-06 2023-03-14 上海浦东发展银行股份有限公司信用卡中心 Transaction load balancing implementation method based on fabric-sdk-java
CN110855737B (en) * 2019-09-24 2020-11-06 中国科学院软件研究所 Consistency level controllable self-adaptive data synchronization method and system
CN110855737A (en) * 2019-09-24 2020-02-28 中国科学院软件研究所 Consistency level controllable self-adaptive data synchronization method and system
CN111026586A (en) * 2019-11-21 2020-04-17 通号城市轨道交通技术有限公司 Main/standby state switching method and device for cluster equipment
CN111026586B (en) * 2019-11-21 2024-01-02 通号城市轨道交通技术有限公司 Main and standby state switching method and device of cluster equipment
CN113055203B (en) * 2019-12-26 2023-04-18 中国移动通信集团重庆有限公司 Method and device for recovering exception of SDN control plane
CN113055203A (en) * 2019-12-26 2021-06-29 中国移动通信集团重庆有限公司 Method and device for recovering abnormity of SDN control plane
CN111277632A (en) * 2020-01-13 2020-06-12 中国建设银行股份有限公司 Method and device for managing applications in system cluster
CN111431805A (en) * 2020-03-27 2020-07-17 上海天好信息技术股份有限公司 Internet of things multi-channel signal multiplexing synchronization strategy method
CN111431805B (en) * 2020-03-27 2021-01-12 上海天好信息技术股份有限公司 Internet of things multi-channel signal multiplexing synchronization strategy method
CN111696245A (en) * 2020-06-30 2020-09-22 郭平波 Voting method based on P2P network
CN111901256A (en) * 2020-08-07 2020-11-06 杭州熙菱信息技术有限公司 Cluster type switching system and method
CN111901256B (en) * 2020-08-07 2022-10-04 杭州熙菱信息技术有限公司 Cluster type switching system and method
CN112035262A (en) * 2020-09-22 2020-12-04 中国建设银行股份有限公司 Method and device for multi-host dynamic management adjustment
CN112181660A (en) * 2020-10-12 2021-01-05 北京计算机技术及应用研究所 High-availability method based on server cluster
CN112596991B (en) * 2020-12-27 2023-09-08 卡斯柯信号有限公司 Hot standby reverse cutting method based on machine health state
CN112596991A (en) * 2020-12-27 2021-04-02 卡斯柯信号有限公司 Hot standby reverse cutting method based on machine health state
CN113220509A (en) * 2021-05-19 2021-08-06 扬州万方电子技术有限责任公司 Double-combination alternating shift system and method
CN113220509B (en) * 2021-05-19 2024-03-05 扬州万方科技股份有限公司 Double-combination alternating shift system and method
CN113489792B (en) * 2021-07-07 2023-02-03 上交所技术有限责任公司 Method for reducing network transmission times among data centers in cross-data center cluster consensus algorithm
CN113489792A (en) * 2021-07-07 2021-10-08 上交所技术有限责任公司 Method for reducing network transmission times among data centers in cross-data center cluster consensus algorithm
CN114124668A (en) * 2021-11-03 2022-03-01 上证所信息网络有限公司 System and method for ensuring consistency of market quotation slices of multiple hosts
CN114189547B (en) * 2022-02-14 2022-05-03 北京安盟信息技术股份有限公司 SSL tunnel fast switching method under cluster
CN114189547A (en) * 2022-02-14 2022-03-15 北京安盟信息技术股份有限公司 SSL tunnel fast switching method under cluster
CN114844909A (en) * 2022-03-31 2022-08-02 顾松林 Consensus mechanism query system based on Internet

Similar Documents

Publication Publication Date Title
CN103647668A (en) Host group decision system in high availability cluster and switching method for host group decision system
US11360854B2 (en) Storage cluster configuration change method, storage cluster, and computer system
US10713135B2 (en) Data disaster recovery method, device and system
US9063787B2 (en) System and method for using cluster level quorum to prevent split brain scenario in a data grid cluster
CN102402395B (en) Quorum disk-based non-interrupted operation method for high availability system
JP5102901B2 (en) Method and system for maintaining data integrity between multiple data servers across a data center
CN110224871A (en) A kind of high availability method and device of Redis cluster
CN106341454A (en) Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method
WO2016058307A1 (en) Fault handling method and apparatus for resource
CN110807064B (en) Data recovery device in RAC distributed database cluster system
CN107918570B (en) Method for sharing arbitration logic disk by double-active system
JP2002517819A (en) Method and apparatus for managing redundant computer-based systems for fault-tolerant computing
JP2005209201A (en) Node management in high-availability cluster
CN110912991A (en) Super-fusion-based high-availability implementation method for double nodes
CN110990200B (en) Flow switching method and device based on multiple active data centers
US10331472B2 (en) Virtual machine service availability
CN108173959A (en) A kind of cluster storage system
CN103457775A (en) High-availability virtual machine pooling management system based on roles
CN102938705A (en) Method for managing and switching high availability multi-machine backup routing table
CN113127270B (en) Cloud computing-based 3-acquisition-2 secure computer platform
CN104573428B (en) A kind of method and system for improving server cluster resource availability
CN111800484B (en) Service anti-destruction replacing method for mobile edge information service system
CN114003350B (en) Data distribution method and system of super-fusion system
CN110348826A (en) Strange land disaster recovery method, system, equipment and readable storage medium storing program for executing mostly living
CN110377487A (en) A kind of method and device handling high-availability cluster fissure

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140319