CN103647668A

CN103647668A - Host group decision system in high availability cluster and switching method for host group decision system

Info

Publication number: CN103647668A
Application number: CN201310689137.8A
Authority: CN
Inventors: 郭鹏光; 武剑锋; 王泊; 张佳岭; 朱星垠; 黄寅飞; 白硕
Original assignee: Shanghai Stock Exchange
Current assignee: Shanghai Stock Exchange
Priority date: 2013-12-16
Filing date: 2013-12-16
Publication date: 2014-03-19

Abstract

The invention relates to the field of data processing, and in particular to a host group decision system in a high availability cluster and a switching method for it. The group decision system sits in the system back end and is composed of a number of trading hosts. The overall architecture has three layers: sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts. The hosts forming the cluster communicate over a network, access the file system for reading and writing through shared storage devices, and synchronize data among themselves in real time to stay consistent. A high availability module, responsible for host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. Compared with the prior art, the invention improves overall server performance. Its advantage is that faulty nodes are physically isolated and switched out through collective voting over cross-checked two-dimensional tables, which achieves host switching and fault isolation and effectively solves the cluster split-brain problem.

Description

Host group decision system in a high availability cluster and switching method therefor

[technical field]

The present invention relates to the field of data processing, and specifically to a host group decision system in a high availability cluster and a switching method for it.

[background technology]

A stock exchange provides a public trading platform for the securities market. Its core trading system is the platform on which securities products are auctioned and matched in real time, also called the auction matching platform. It is a critical service system whose performance, security and reliability bear directly on the prosperity and stability of the domestic financial market, so the matching platform must be stable and highly available.

According to statistics from Gartner, the main causes of unplanned system downtime are application problems (40%), operational problems (40%), operating system failures (10%) and hardware faults (10%). Most of these problems can be resolved within the running system, and only a few require business to be switched to a cold standby system. For host failures, the system needs hot standby hosts in order to switch quickly, maintain business continuity and keep providing service externally.

The securities industry has moved from the common one-primary, one-standby hot standby mode to one primary with two or more standbys, so that a single point of failure in host hardware can now be handled. As the number of hosts in a cluster grows, host state monitoring and switching become pressing problems, and split-brain is frequently encountered during cluster decision making and switching. Among existing methods for handling split-brain, Paxos and Fast Paxos are the best known, with corresponding projects such as ZooKeeper. However, the Paxos algorithm is itself fairly complex, is difficult to implement, and cannot effectively solve the livelock problem. When a split-brain scenario occurs, the cluster must elect a LEADER host by voting and let the LEADER make the decision. Electing a LEADER is equally complex, and if the LEADER host itself hangs abnormally, the decision process suffers a long delay.

[summary of the invention]

The object of the invention is to solve the split-brain problem that arises in the prior art during host state monitoring, cluster decision making and switching, and to avoid the complexity, implementation difficulty and unresolved livelock of the Paxos algorithm. To that end, a host group decision system in a high availability cluster and a switching method are designed that improve overall server performance and are suited to fault detection, diagnosis and decision, fault isolation and switching, recovery and expansion under the multi-host hot standby requirements of high availability cluster computer systems. Faulty nodes are physically isolated and switched out by collective voting, and access to and control of shared resources is taken over, which achieves host switching and fault isolation and supports multiple trading hosts running in parallel.

To achieve this object, a host group decision system in a high availability cluster is provided. The group decision system sits in the system back end and is composed of a number of trading hosts. The overall architecture has three layers: sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts. The hosts forming the cluster communicate over a network, access the file system for reading and writing through shared storage devices, and synchronize data among themselves in real time to stay consistent. Each node in the cluster maintains information about all member nodes of the cluster; a node newly joining the cluster informs all hosts of its own information, and the routing tables are updated dynamically. A high availability module, responsible for host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. The probe module periodically checks the applications and idle resources of its host, judges the health of the local machine, and broadcasts the detected health state within the cluster over the TCP/IP network. The decision module receives the health heartbeat messages sent by probe modules and judges, in both active and passive modes, whether the host that sent a heartbeat is in a normal state. If the decision module finds no abnormal host in the current round, it proceeds to the next round. If it finds an abnormal host, it notifies the routing module, the state of the abnormal host is changed, and the routing rules for trading data are modified, so that the fault is isolated and switched over.

The sequencing-layer server hosts are responsible for load balancing and order sequencing. A communication-layer server host receives orders from the sequencing-layer server hosts, selects a back-end trade processing host according to the static and dynamic routing tables, and forwards the order to that back-end trade-processing-layer server host. The back-end trade-processing-layer server host receives orders from the communication host and carries out conversion and matching.

The trading hosts are partitioned by product set. At any given time, products belonging to the same product set can be processed on only one application host in the cluster. Each product set has a primary trading host and standby trading hosts. The primary host handles securities trading for the product set; the standby hosts do not, but stay consistent with the primary host's data through replication. When the primary host fails, a standby host is dynamically promoted to primary host for the product set.

The routing table manages the primary and standby host information for each product set, which is called order routing information. There can be only one primary host, while there may be multiple standby hosts. A takeover order is defined among the standby hosts, which are called the first standby, the second standby, and so on. The static routing table is the primary and standby host information for each product set, predefined in a file before the trading system starts. The dynamic routing table is the primary and standby host information for each product set, obtained in real time from host states.

A switching method for the host group decision system in a high availability cluster is as follows:

a. The probe module actively checks the health of the local machine, detecting faults in processes, shared memory and message queues. If a host suffers an abnormality such as resource exhaustion or a hung process, the probe module's active detection and reporting function is triggered: the probe module broadcasts health information to all hosts in the system and actively requests that this host be isolated. All other hosts in the system receive the health heartbeat report sent by the probe module and isolate the abnormal host in real time. Isolation has two steps. First, all other hosts in the system update their routing table entries for the abnormal host, so that new trading data is routed to the abnormal host's first standby. Second, the abnormal host is physically isolated on the network, and fault investigation and recovery of the abnormal host begin;

b. If a host crashes or its network is abnormally interrupted, the local health heartbeat report generated by the probe module cannot reach the other hosts in the cluster. This triggers the group decision mechanism within the cluster, and a masterless decision is made. Each healthy host initiates voting independently, and the final decision combines the reports of all healthy hosts. This avoids decision errors caused by a single point of failure and keeps the basis and the result of the decision uniform across the system; even if the group decision is wrong, the healthy operation of the whole system and the consistency of data are still guaranteed. If a host crashes, every other host detects that the faulty host has failed several times in succession to send its health heartbeat message on time and records this in the system. All hosts send each other their diagnosis of the abnormal host; that is, every host in the system maintains a two-dimensional table locally in real time, storing each host's judgment of the fault state. Every host makes its decision from the diagnostic information it has gathered locally, modifies its locally stored routing table, and isolates the abnormal host. After isolation, fault investigation and recovery are carried out on the abnormal host;

c. If a host recovers from a fault, it joins the cluster and loads data. Without interrupting normal business processing on the current primary host, the faulty host completes data synchronization with a normal hot standby host; automatic recovery and switching can complete within ten seconds. First, once the probe module detects that the local fault has been cleared, it broadcasts within the cluster a request to rejoin. After all other hosts in the system have received the faulty host's recovery request several times in succession, they judge that the fault has indeed been cleared. All other hosts then perform the recovery operation for the faulty host and recalculate the dynamic routing table entries for it.
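The masterless vote in step b can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not state how many votes are needed, so the strict-majority rule, the function name `hosts_to_isolate` and the host labels are all assumptions made for the example.

```python
from typing import Dict, List

# Two-dimensional table kept locally by every host:
# verdicts[reporter][target] is True when `reporter` judges `target` abnormal.
VerdictTable = Dict[str, Dict[str, bool]]


def hosts_to_isolate(verdicts: VerdictTable, hosts: List[str]) -> List[str]:
    """Return hosts judged abnormal by a strict majority of reporting peers.

    The majority threshold is an assumption; the source only says the
    decision combines the reports of all healthy hosts.
    """
    isolated = []
    for target in hosts:
        # A host does not vote on itself; crashed hosts contribute no row.
        voters = [r for r in verdicts if r != target]
        if not voters:
            continue
        against = sum(1 for r in voters if verdicts[r].get(target, False))
        if against * 2 > len(voters):
            isolated.append(target)
    return isolated


# h3 has crashed, so it reports nothing; h2 also wrongly suspects h4.
verdicts = {
    "h1": {"h3": True},
    "h2": {"h3": True, "h4": True},
    "h4": {"h3": True},
}
print(hosts_to_isolate(verdicts, ["h1", "h2", "h3", "h4"]))  # ['h3']
```

Because every host evaluates the same table with the same rule, each one reaches the same result locally without electing a leader, and a single mistaken report (h2 about h4) does not trigger isolation.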

The switching method further comprises the following. When the state of a cluster member host changes, the dynamic routing table is recalculated. When a host fails, every host in the system aggregates and processes the information and recalculates its dynamic routing table. For product sets where the faulty host was the primary, its first standby becomes the primary, the second standby moves up to first standby, and so on. If a standby fails, only the order of the standbys after it is adjusted: the later standbys each move forward one place, and the order of higher-priority primary and standby hosts is unaffected. This ensures that when a primary host fails, each product set is taken over first by the best-prepared standby.
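The promotion rule above amounts to deleting the failed host from each product set's ordered host list. A minimal sketch, assuming the dynamic routing table is held as an ordered list per product set (the names `RoutingTable` and `recalculate` and the sample data are hypothetical):

```python
from typing import Dict, List

# Index 0 is the primary, index 1 the first standby, index 2 the second, ...
RoutingTable = Dict[str, List[str]]


def recalculate(table: RoutingTable, failed: str) -> RoutingTable:
    """Return a new dynamic routing table with `failed` removed everywhere.

    Hosts ranked after the failed host move forward one place; hosts ranked
    before it keep their positions.
    """
    return {ps: [h for h in order if h != failed] for ps, order in table.items()}


table = {
    "set A": ["h1", "h2", "h3"],  # h1 is primary
    "set B": ["h2", "h1", "h3"],  # h1 is first standby
}
after = recalculate(table, "h1")
print(after["set A"])  # ['h2', 'h3']  -> h2 promoted to primary
print(after["set B"])  # ['h2', 'h3']  -> primary unchanged, h3 moves up
```

Returning a new table rather than mutating in place keeps the static table available for the later recovery step.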

Compared with the prior art, the invention improves overall server performance. Its specific advantages are:

1. Faulty nodes are physically isolated and switched out through collective voting over cross-checked two-dimensional tables, and access to and control of shared resources is taken over. This achieves host switching and fault isolation and supports multiple trading hosts running in parallel. The approach is better suited to layered cluster processing systems, is more concise than algorithms such as Paxos, and effectively solves the cluster split-brain problem;

2. It is suited to the financial industry and meets the high availability, stability and scalability requirements of critical business. It can be used for fault detection, diagnosis and decision, fault isolation and switching, recovery and expansion under the multi-host hot standby requirements of high availability cluster computer systems;

3. Hot standby hosts are deployed within the cluster system, which reduces redundancy, makes full use of host computing capacity and lowers cost;

4. Fault detection and switching use a masterless mode, which removes the risk brought by Master failure and Master switching and makes the whole cluster system simpler, clearer and more fault tolerant.

[accompanying drawing explanation]

Fig. 1 is a schematic diagram of the high availability cluster deployment in the present invention;

Fig. 2 is a schematic diagram of trading host state transitions in the present invention;

Fig. 3 is a schematic diagram of the high availability module deployment on a trading host in the present invention;

Fig. 4 is a flow chart of active fault detection in the present invention;

Fig. 5 is a flow chart of host state recovery in the present invention;

Fig. 2 is designated as the abstract drawing of the present invention.

[embodiment]

The invention is further described below with reference to the accompanying drawings. The structure and principle of the system will be clear to those skilled in the art. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

The host group decision system in the high availability cluster of the invention has three layers. The group decision and switching method applies only to the communication server layer and the application server layer. Within these two layers, all communication servers and application servers are peers, meaning they follow the same decision strategy.

The invention sits in the back end of the business system and is composed of a number of trading hosts responsible for processing trading business, with no master node host. The overall architecture has three layers: a sequencing layer, a communication layer and a processing layer. The sequencing layer is responsible for load balancing and order sequencing. A communication host receives orders from the sequencing layer, selects a back-end trade processing host according to the static and dynamic routing tables, and forwards the order to it. The back-end trading host receives orders from the communication host and carries out conversion and matching. Each node in the cluster maintains information about all member nodes; a node newly joining the cluster informs all hosts of its own information, and the routing tables are updated dynamically. The hosts forming the cluster communicate over a network, access the file system for reading and writing through shared storage devices, and synchronize data among themselves in real time to stay consistent.

Embodiment 1

Fig. 1 shows a typical deployment in which multiple machines act as hot standbys for one another, based on high availability multi-machine backup technology. In the figure, nine hosts form a cluster consisting of sequencing server hosts, communication server hosts and application server hosts. The hosts are interconnected over a TCP/IP network to exchange data and control messages, and share access to disks through a storage area network.

Embodiment 2

Fig. 2 is a schematic diagram of host states in the invention; a host can be in one of several states. The high availability module, responsible for host state transitions, is implemented by three cooperating modules: a probe module, a decision module and a routing module. The probe module periodically checks the applications and idle resources of its host and broadcasts the host's health state within the high availability cluster over the TCP/IP network. The decision module receives the health heartbeat messages sent by probe modules and judges, in both active and passive modes, whether the host that sent a heartbeat is in a normal state. The routing module loads the static routing table and maintains updates to the dynamic routing table. If the decision module finds no abnormal host in the current round, it proceeds to the next round. If it finds an abnormal host, it notifies the routing module, the state of the abnormal host is changed, and the routing rules for trading data are modified, so that the fault is isolated and switched over.

When the high availability module of a host starts working, it initializes the host state to "state 1: start initialization". Once initialization completes, the probe module starts a timer. When the configured timer fires, the probe module begins checking local health and the host state changes to "state 2: state checking". If the host state check completes successfully, the state changes to "state N: check complete, state normal". If the probe module finds that the local machine is abnormal for some reason, it changes the local state to "state abnormal". Meanwhile, the decision module checks health heartbeat messages at regular intervals; if after several checks it has still not received the health heartbeat report sent by a probe module, it likewise changes the faulty host's state to "state abnormal". When the high availability module finds that a host state is abnormal, the host no longer processes normal trades and enters "state 0: fault recovery", in which the host is recovered manually or automatically. If the host or application is restarted, it enters "state 1: start initialization"; otherwise it enters "state 2 ... N-1: state checking to state normal".
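The state transitions just described can be written down as a small lookup table. This is a sketch under assumptions: the event names (`timer_fired`, `check_passed` and so on) are invented for illustration, and the transition from normal back to checking on the next timer tick is inferred from the probe running periodically rather than stated outright.

```python
from enum import Enum


class HostState(Enum):
    FAULT_RECOVERY = "state 0"
    START_INIT = "state 1"
    CHECKING = "state 2"
    NORMAL = "state N"
    ABNORMAL = "state abnormal"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (HostState.START_INIT, "timer_fired"): HostState.CHECKING,
    (HostState.CHECKING, "check_passed"): HostState.NORMAL,
    (HostState.CHECKING, "check_failed"): HostState.ABNORMAL,
    (HostState.NORMAL, "timer_fired"): HostState.CHECKING,  # assumed periodic re-check
    (HostState.NORMAL, "heartbeats_missed"): HostState.ABNORMAL,
    (HostState.ABNORMAL, "begin_recovery"): HostState.FAULT_RECOVERY,
    (HostState.FAULT_RECOVERY, "restarted"): HostState.START_INIT,
    (HostState.FAULT_RECOVERY, "recovered"): HostState.CHECKING,
}


def step(state: HostState, event: str) -> HostState:
    """Apply one event; reject transitions the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


state = HostState.START_INIT
for event in ["timer_fired", "check_failed", "begin_recovery", "restarted"]:
    state = step(state, event)
print(state.name)  # START_INIT
```

Keeping the transitions in one table makes it easy to see which paths lead into fault recovery and which lead back out.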

"State 1: start initialization" is the initial state; "state 0: fault recovery" is the terminal state; all other states are intermediate states.

Embodiment 3

Fig. 3 shows the main functional modules of a trading host. Decision module: receives the health heartbeat messages of probe modules and modifies host states and routing rules according to host health. Probe module: checks the applications and idle resources of the local machine, judges local health, and actively broadcasts health heartbeat messages. Routing module: loads static routing information at startup and, while the cluster is running, dynamically updates routing information on the decision module's instructions to complete host switching and recovery.

Modules on the same host pass messages by inter-process communication; different hosts communicate with one another over TCP/IP.

Reflecting the nature of securities, the products handled by the trading host processes are divided into product sets. Each product set has one host as its primary processor and may have multiple hosts as standby processors, called the first standby, the second standby, the third standby and so on. A single host may be the primary processor for several product sets at once, and may also be a standby processor for several product sets at once.

The method includes one concrete artifact, the static configuration routing table, which records the primary host, first standby, second standby and so on for each product set. The static configuration routing table in the routing module is shown in Table 1:

Table 1. Static configuration routing table in the routing module

Product set	Primary processor	First standby	Second standby	Third standby
Product set 1	Host 1	Host 2	Host 3	Host 5
Product set 2	Host 2	Host 1	Host 4	Host 6
Product set 3	Host 3	Host 4	Host 1	Host 5
Product set 4	Host 1	Host 3	Host 2	Host 6

With the static routing table configured as above, every product set has one primary processor and multiple backup machines. A host may be the primary processor for several product sets, a standby processor for several product sets, primary only, or standby only, so host roles can be assigned flexibly. For example, for product set 1 the primary processor is host 1, the first standby is host 2, the second standby is host 3 and the third standby is host 5.
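To make the role flexibility concrete, the sketch below loads the Table 1 values into a dictionary and lists the roles one host plays across product sets. The data structure and the helper name `roles_of` are assumptions for illustration; the patent only says the table is predefined in a file.

```python
from typing import Dict, List

# Values copied from Table 1; position 0 is the primary processor,
# position k (k >= 1) is the k-th standby.
STATIC_ROUTES: Dict[str, List[str]] = {
    "product set 1": ["host 1", "host 2", "host 3", "host 5"],
    "product set 2": ["host 2", "host 1", "host 4", "host 6"],
    "product set 3": ["host 3", "host 4", "host 1", "host 5"],
    "product set 4": ["host 1", "host 3", "host 2", "host 6"],
}


def roles_of(host: str, routes: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each product set to the role `host` plays in it, if any."""
    roles = {}
    for product_set, order in routes.items():
        if host in order:
            rank = order.index(host)
            roles[product_set] = "primary" if rank == 0 else f"standby {rank}"
    return roles


print(roles_of("host 1", STATIC_ROUTES))
# {'product set 1': 'primary', 'product set 2': 'standby 1',
#  'product set 3': 'standby 2', 'product set 4': 'primary'}
print(roles_of("host 5", STATIC_ROUTES))
# {'product set 1': 'standby 3', 'product set 3': 'standby 3'}
```

Host 1 is primary for two product sets and standby for two others, while host 5 is standby only, matching the mix of roles described above.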

Embodiment 4

Fig. 4 shows the active fault detection flow. The specific steps are:

1. On host 1, the probe module actively checks local health, verifying whether resources such as processes, shared memory and message queues are normal, and finds that the number of blocked messages in a local message queue exceeds the threshold;

2. On host 1, the probe module actively sends local health information to all hosts in the system, reporting that the number of blocked messages in the local message queue has exceeded the threshold;

3. All other hosts in the system isolate abnormal host 1;

4. All other hosts in the system modify their routing table entries for host 1, and new trading data is routed to host 1's first standby;

5. Abnormal host 1 is physically isolated;

6. Fault investigation and recovery begin on host 1.

On every host in the cluster, the probe module maintains a checklist of the host's applications and resources and runs the checks on a timer. The results are recorded in the health heartbeat message and broadcast within the cluster system, as shown in Table 2 below:

Table 2. Health check results for host 1

Host	Process check	Message queue check	Shared memory check	…	Storage and database check
Host 1	Normal	Blocked messages exceed threshold	Normal	…	Normal

As the table shows, because the number of blocked messages in the message queue has exceeded the threshold, host 1 cannot process trading data normally. Its probe module notifies the decision modules of the other hosts in the system in real time that host 1 is abnormal and actively requests isolation.

On every host in the cluster, the decision module receives this active exception report, synchronously notifies the routing module that host 1 is abnormal, and ends this round of system state checking.
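A probe checklist like the one behind Table 2 might look as follows. This is only a sketch: the `Heartbeat` class, the check names, and the numbers 1200 and 1000 are assumptions chosen to reproduce the blocked-queue case, and the actual broadcast over TCP/IP is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Heartbeat:
    """Health heartbeat message: one result string per check."""
    host: str
    results: Dict[str, str] = field(default_factory=dict)

    @property
    def healthy(self) -> bool:
        # The host is healthy only if every check reports "normal".
        return all(result == "normal" for result in self.results.values())


def queue_check(blocked: int, threshold: int) -> str:
    """Report whether the blocked message count is within the threshold."""
    return "normal" if blocked <= threshold else "blocked messages exceed threshold"


def run_checks(host: str, checks: Dict[str, Callable[[], str]]) -> Heartbeat:
    """Run every check in the checklist and record results in a heartbeat."""
    return Heartbeat(host, {name: check() for name, check in checks.items()})


hb = run_checks(
    "host 1",
    {
        "process": lambda: "normal",
        "message queue": lambda: queue_check(blocked=1200, threshold=1000),
        "shared memory": lambda: "normal",
    },
)
print(hb.healthy)                   # False
print(hb.results["message queue"])  # blocked messages exceed threshold
```

A receiving decision module would treat an unhealthy heartbeat as the sender's own request to be isolated.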

Embodiment 5

Fig. 5 shows the host state recovery flow. The specific steps are:

1. Abnormal host 1 detects that the local fault has been cleared and its health is normal, then updates its trading data and completes data synchronization;

2. Abnormal host 1 notifies all other hosts 1 ... N in the system that it needs to be readmitted and can process trading data normally;

3. All other hosts 1 ... N in the system receive host 1's recovery request several times in succession and decide that host 1 has indeed recovered;

4. All other hosts in the system perform the recovery operation for abnormal host 1, modifying their routing table entries for host 1 so that new trading data can be routed to it;

5. New trading data can be sent to host 1.
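Step 3 of the recovery flow, confirming recovery only after several consecutive requests, can be sketched with a per-host counter. The class name `RecoveryTracker` and the default of three consecutive requests are assumptions; the text says only "several times".

```python
from typing import Dict


class RecoveryTracker:
    """Counts consecutive recovery requests from each isolated host."""

    def __init__(self, required: int = 3):
        self.required = required          # assumed value for "several times"
        self._streak: Dict[str, int] = {}

    def on_heartbeat(self, host: str, requests_rejoin: bool) -> bool:
        """Record one round; return True once readmission is confirmed."""
        if requests_rejoin:
            self._streak[host] = self._streak.get(host, 0) + 1
        else:
            self._streak[host] = 0        # any gap resets the streak
        return self._streak[host] >= self.required


tracker = RecoveryTracker(required=3)
rounds = [True, True, False, True, True, True]
print([tracker.on_heartbeat("host 1", r) for r in rounds])
# [False, False, False, False, False, True]
```

Resetting the streak on a missed round keeps a flapping host out of the routing table until it has been stable for the full window.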

Claims

1. A host group decision system in a high availability cluster, characterized in that the group decision system sits in the system back end and is composed of a number of trading hosts; the overall architecture has three layers: sequencing-layer server hosts, communication-layer server hosts and processing-layer server hosts; the hosts forming the cluster communicate over a network, access the file system for reading and writing through shared storage devices, and synchronize data among themselves in real time to stay consistent; each node in the cluster maintains information about all member nodes of the cluster, and a node newly joining the cluster informs all hosts of its own information and the routing tables are updated dynamically; a high availability module responsible for host state transitions is implemented by three cooperating modules, namely a probe module, a decision module and a routing module; the probe module periodically checks the applications and idle resources of its host, judges the health of the local machine, and broadcasts the detected health state within the cluster over the TCP/IP network; the decision module receives the health heartbeat messages sent by probe modules and judges, in both active and passive modes, whether the host that sent a heartbeat is in a normal state; if the decision module finds no abnormal host in the current round, it proceeds to the next round; if it finds an abnormal host, it notifies the routing module, the state of the abnormal host is changed, and the routing rules for trading data are modified, so that the fault is isolated and switched over.

2. The host group decision system in a high availability cluster as claimed in claim 1, characterized in that the sequencing-layer server hosts are responsible for load balancing and order sequencing; a communication-layer server host receives orders from the sequencing-layer server hosts, selects a back-end trade processing host according to the static and dynamic routing tables, and forwards the order to that back-end trade-processing-layer server host; and the back-end trade-processing-layer server host receives orders from the communication host and carries out conversion and matching.

3. The host group decision system in a high availability cluster as claimed in claim 1, characterized in that the trading hosts are partitioned by product set; at any given time, products belonging to the same product set can be processed on only one application host in the cluster; each product set has a primary trading host and standby trading hosts; the primary host handles securities trading for the product set; the standby hosts do not handle securities trading for the product set but stay consistent with the primary host's data through replication; and when the primary host fails, a standby host is dynamically promoted to primary host for the product set.

4. The host group decision system in a high availability cluster as claimed in claim 1, characterized in that the routing table manages the primary and standby host information for each product set, which is called order routing information; there can be only one primary host, while there may be multiple standby hosts; a takeover order is defined among the standby hosts, which are called the first standby, the second standby, and so on; the static routing table is the primary and standby host information for each product set, predefined in a file before the trading system starts; and the dynamic routing table is the primary and standby host information for each product set, obtained in real time from host states.

5. A switching method for the host group decision system in a high availability cluster as claimed in claim 1, characterized in that the switching method is as follows:

6. The switching method for the host group decision system in a high availability cluster as claimed in claim 5, characterized in that the switching method further comprises: when the state of a cluster member host changes, the dynamic routing table is recalculated; when a host fails, every host in the system aggregates and processes the information and recalculates its dynamic routing table; for product sets where the faulty host was the primary, its first standby becomes the primary, the second standby moves up to first standby, and so on; if a standby fails, only the order of the standbys after it is adjusted, the later standbys each moving forward one place, and the order of higher-priority primary and standby hosts is unaffected; this ensures that when a primary host fails, each product set is taken over first by the best-prepared standby.