Method and apparatus for implementing a self-organizing cluster server with built-in load balancing
Technical field
The present invention relates to a server apparatus for running large-scale Internet applications, and in particular to a cluster server that runs multiple Internet applications simultaneously and automatically performs load balancing and node management.
Background art
Cluster servers are mainly used to run large-scale Internet applications such as large websites, mail systems, and video services. The computational load of these applications is very high and is difficult for a single physical server to bear, so a cluster server must be used. At present, a cluster server consists of two mutually redundant load-balancing devices and multiple physical servers. All client service requests are sent to the load-balancing device, which then dispatches them to the physical servers for processing. This architecture is unsuitable for very large cluster servers, because the load-balancing device becomes the bottleneck of the cluster and limits its scalability. It is also unsuitable for small cluster servers: the load-balancing device does not take part in service processing, the standby load-balancing device is idle most of the time, and for a cluster with only 2-3 physical servers the investment in load-balancing devices is disproportionately large.
Heartbeat software must run on the two mutually redundant load-balancing devices. The standby load-balancing device monitors the running state of the primary device in real time and, once it detects a primary failure, immediately starts its local processes and takes over service scheduling. In addition, the load-balancing device must run a physical-server detection program and, once it detects a fault, immediately stop dispatching service requests to the failed server. A conventional cluster server therefore needs several sets of tool software working together, which makes deployment difficult.
The load-balancing device cannot discover physical servers automatically; the parameters of all physical servers must be configured on it in advance. If a physical server has to be added while services are running, the corresponding configuration of the load-balancing device must be modified and tested, which makes fast online capacity expansion difficult and operation and maintenance complex.
Summary of the invention
To overcome the shortcomings of existing cluster servers, which must integrate load-balancing devices, heartbeat software, and node detection programs and consequently suffer from high investment, poor scalability, and difficult deployment and maintenance, the present invention provides a self-organizing cluster server. The self-organizing cluster server not only discovers the joining and leaving of nodes automatically, without separate heartbeat software or node detection programs, but also implements internal load balancing and redundancy backup without a dedicated load-balancing device.
The technical solution adopted by the present invention is as follows. The self-organizing cluster server automatically discovers the joining, leaving, and state changes of neighbor nodes and dynamically elects a management access point and service access points. The management access point provides a unified management interface to the network management system, and the service access points provide a unified service access interface to users. Service requests are distributed evenly across the nodes, which achieves load balancing and redundancy backup inside the cluster server. This removes the need of existing cluster servers to deploy dedicated load-balancing devices and heartbeat software, and simplifies system installation and maintenance.
As shown in Fig. 1, the self-organizing cluster server manages physical servers as nodes and assigns them different roles: management node, service node, and driven node. The management node is the node responsible for managing the self-organizing cluster server system. The network port on the management node used to send and receive network management information is called the management access point, and the IP address used to send and receive network management information is called the cluster IP, which also corresponds to the domain name/IP address of the self-organizing cluster server. The self-organizing cluster server has exactly one management node and one management access point. A service node is a node responsible for receiving service requests. The network port on a service node used to receive service requests is called a service access point, and the IP address used to receive service requests is called the service IP, which also corresponds to the domain name/IP address of the service. The self-organizing cluster server has one or more service nodes, and each service node has one or more service access points. A driven node is a node that assists the service nodes in processing service requests. Any node that is not elected as the management node or as a service node automatically becomes a driven node.
As shown in Fig. 2, the self-organizing cluster server discovers the joining and leaving of neighbor nodes and changes in node state by actively sending and listening for node state messages. Each node periodically multicasts the state of the local node, including the node name, service load, and active network link parameters, to announce its existence to the other nodes. At the same time, each node listens for the state messages sent by other nodes, and thereby discovers its neighbor nodes and tracks dynamic information such as neighbors joining or leaving, service load changes, and link state changes. To reduce the overhead of neighbor discovery, the self-organizing cluster server does not use a handshake to establish neighbor relationships; instead, continuous listening keeps the state information of all nodes synchronized.
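The handshake-free discovery described above can be illustrated with a minimal Python sketch. The message layout (JSON with `name`, `load`, and `links` fields) and the class names `NodeStatus` and `NeighborTable` are assumptions made for illustration; the description does not specify a wire format, and the multicast transport itself is omitted.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NodeStatus:
    """State a node periodically multicasts (field names are assumptions)."""
    name: str
    load: int                                   # e.g. number of bound service access points
    links: dict = field(default_factory=dict)   # link IP -> bandwidth in Mbit/s

    def encode(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def decode(payload: bytes) -> "NodeStatus":
        return NodeStatus(**json.loads(payload.decode("utf-8")))


class NeighborTable:
    """Neighbor records built purely from overheard state messages (no handshake)."""

    def __init__(self):
        self.neighbors = {}   # node name -> (NodeStatus, time last heard)

    def on_status(self, payload: bytes, now: float) -> bool:
        """Record a received state message; return True if the sender is new."""
        status = NodeStatus.decode(payload)
        is_new = status.name not in self.neighbors
        # Overwriting the record also refreshes the time the node was last heard.
        self.neighbors[status.name] = (status, now)
        return is_new
```

Because a record is created or refreshed by any received message, a node needs no per-neighbor session state; a neighbor that stops sending simply stops refreshing its timestamp.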
As shown in Fig. 3, each node of the self-organizing cluster server uses a management access point election algorithm to determine whether the local node is the management node. If it is, the node actively announces to the other nodes by multicast that the local node is the management node and which network link the management access point corresponds to; if the management node is not the local node, no action is taken. When a new node joins or a failed node leaves, the neighbor state held by each node is not fully synchronized, so different nodes may compute different management nodes, causing several nodes to compete to become the management node. To avoid conflicts and select the optimal management node, every node follows a yielding rule: even if a node believes that the local node is the management node, it actively gives up the management node role as soon as it hears another node announce itself as the management node.
As shown in Fig. 4, the node with the smallest bandwidth among the local node and all neighbor nodes is elected as the management node, and the link with the smallest bandwidth within the management node is elected as the management access point. This election algorithm is mainly intended to keep system management overhead from consuming the resources of high-performance nodes. The management access point election algorithm is as follows:
Step 1: among all nodes, select the node with the smallest effective total bandwidth; if only one node qualifies, skip to Step 3;
Step 2: among the nodes selected in Step 1, select the node with the smallest network link IP address; if a node has multiple active link IP addresses, its smallest link IP address is used for the comparison;
Step 3: take the selected node as the management node;
Step 4: within the management node, select the active link with the smallest bandwidth; if only one link qualifies, skip to Step 6;
Step 5: among the links selected in Step 4, select the link with the smallest IP address; if a link has multiple active IP addresses, its smallest IP address is used for the comparison;
Step 6: take the selected link as the management access point.
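The six steps above amount to two minimum selections with a tie-break. The following Python sketch is one possible reading, not the claimed implementation: the data layout (`nodes` maps a node name to a dictionary of link IP to bandwidth), the function names, and the choice to compare IP addresses numerically are all assumptions.

```python
import ipaddress


def ip_key(ip: str) -> int:
    """Numeric value of an IP address, so 10.0.0.9 sorts before 10.0.0.10."""
    return int(ipaddress.ip_address(ip))


def elect_management(nodes: dict) -> tuple:
    """nodes: node name -> {link IP: bandwidth}. Returns (node name, link IP).

    Assumes every node has at least one active link.
    """
    # Steps 1-3: smallest total bandwidth, ties broken by the smallest link IP.
    def node_key(name):
        links = nodes[name]
        return (sum(links.values()), min(ip_key(ip) for ip in links))

    node = min(nodes, key=node_key)

    # Steps 4-6: within that node, smallest-bandwidth link, ties broken by smallest IP.
    links = nodes[node]
    link = min(links, key=lambda ip: (links[ip], ip_key(ip)))
    return node, link
```

Since the result depends only on the node and link table, every node that holds the same neighbor state computes the same management access point without exchanging votes.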
As shown in Fig. 5, the self-organizing cluster server can run multiple services simultaneously, with each service bound to one service access point. Each node uses a service access point election algorithm to elect the service access point for every service and to determine whether the local node is a service node. If it is, the node actively announces to the other nodes by multicast which services have their service access points on the local node and which link each service access point corresponds to; if the local node has no service access point, no action is taken. When a new node joins or a failed node leaves, the neighbor state held by each node is not fully synchronized, so different nodes may compute different service access points, causing several nodes to compete to become the service node for the same service. To avoid conflicts and select the optimal service node, every node follows a yielding rule: even if a node believes that the local node is the service node for a given service, it actively gives up the service node role for that service as soon as it hears another node announce itself as the service node for that service.
As shown in Fig. 6, for each service, the node with the lightest load and the largest bandwidth among the local node and all neighbor nodes (including the node that has already become the management node) is elected as the service node, and the link with the lightest load and the largest bandwidth within the service node is elected as the service access point. This election algorithm is mainly intended to distribute services evenly across all nodes while making full use of high-performance nodes. The service access point election algorithm is as follows:
Step 1: among all nodes, select the node with the fewest existing service access points; if only one node qualifies, skip to Step 4;
Step 2: among the nodes selected in Step 1, select the node with the largest effective total bandwidth; if only one node qualifies, skip to Step 4;
Step 3: among the nodes selected in Step 2, select the node with the largest network link IP address; if a node has multiple active link IP addresses, its largest link IP address is used for the comparison;
Step 4: take the selected node as the service node;
Step 5: within the service node, select the available link with the fewest bound service access points; if only one link qualifies, skip to Step 8;
Step 6: among the links selected in Step 5, select the link with the largest bandwidth; if only one link qualifies, skip to Step 8;
Step 7: among the links selected in Step 6, select the link with the largest IP address; if a link has multiple active IP addresses, its largest IP address is used for the comparison;
Step 8: take the selected link as the service access point.
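As a rough illustration of the eight steps above, the sketch below elects the access point for a single service. The `bound` dictionary (link IP to the number of service access points already bound to that link), the function names, and numeric IP comparison are assumptions introduced for the example.

```python
import ipaddress


def ip_key(ip: str) -> int:
    """Numeric value of an IP address (assumed comparison rule)."""
    return int(ipaddress.ip_address(ip))


def elect_service_access_point(nodes: dict, bound: dict) -> tuple:
    """nodes: node name -> {link IP: bandwidth}; bound: link IP -> access points bound.

    Returns (service node name, link IP) for one new service.
    """
    def bound_on_node(name):
        return sum(bound.get(ip, 0) for ip in nodes[name])

    # Steps 1-4: fewest existing access points, then largest total bandwidth,
    # then largest link IP.
    def node_key(name):
        links = nodes[name]
        return (-bound_on_node(name), sum(links.values()),
                max(ip_key(ip) for ip in links))

    node = max(nodes, key=node_key)

    # Steps 5-8: fewest bound access points, then largest bandwidth, then largest IP.
    links = nodes[node]
    link = max(links, key=lambda ip: (-bound.get(ip, 0), links[ip], ip_key(ip)))
    return node, link
```

Calling the function repeatedly while incrementing `bound` for each elected link spreads successive services over the least-loaded nodes and links first.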
As shown in Fig. 7, the self-organizing cluster server balances load across its nodes. The load-sharing algorithm uses a stateless hash, which both ensures that service requests from the same user are assigned to the same node and avoids the overhead of looking up a state table, improving the performance of the whole self-organizing cluster server. All service requests for a given service are received by the service access point bound to that service. When a service node receives a service request message through a service access point, it first computes a hash over the destination IP address and source IP address of the message and maps the result onto the list of active nodes. If the mapping result is the local node, the service request is handed directly to the local service layer for processing, and the result is returned directly to the user. If the mapping result is another node, the service node first resolves the MAC address of the selected node and then, using this MAC address as the destination address of the link-layer frame, forwards the service request message to the selected node over the layer-2 link. A non-service node that receives a service request message hands it directly to its local service layer for processing and returns the result directly to the user. The service response messages returned by service nodes and driven nodes all use the service IP as the source IP address.
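A minimal sketch of the stateless mapping follows. The description does not name a hash function, so SHA-256 is used here purely as a stand-in; the function name `pick_node` and the rule of sorting the active node list so that all nodes agree on its order are likewise assumptions.

```python
import hashlib
import ipaddress


def pick_node(src_ip: str, dst_ip: str, active_nodes: list) -> str:
    """Map a (source IP, destination IP) pair onto the active node list."""
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(active_nodes)
    # Sorting gives every node the same view of the list order.
    return sorted(active_nodes)[index]
```

The same address pair always yields the same node while the active node list is unchanged, so no per-connection state table is needed; when a node joins or leaves, some address pairs are remapped.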
As shown in Fig. 8, the management node and the service nodes have redundancy backup capability. If any management/service node fails, the other nodes immediately elect a new management/service node to replace the failed node, which improves the reliability of the whole self-organizing cluster server. Each node continuously listens for the state messages of its neighbor nodes. If no state message is received from the management/service node within a certain period, the management/service node is considered to have failed or left, and a new management/service node is elected from the active nodes. The elected node elects a management/service access point among its local active links, announces to the other nodes through a state message that it has become the new management/service node, and broadcasts an ARP response message for the management/service access point, forcing service requests addressed to the failed node to switch quickly to the new management/service node. If the management/service node has not been completely disconnected from the network and only the link bound to the management/service access point has gone down, the management/service node is not re-elected. Instead, a new management/service access point is elected among the remaining active links of the management/service node, and an ARP response message for the new management/service access point is broadcast, forcing service requests addressed to the former management/service access point to switch quickly to the new one.
As shown in Fig. 9, driven nodes also have redundancy backup capability. Each node continuously listens for the state messages of its neighbor nodes. If no state message is received from a driven node within a certain period, the driven node is considered to have failed or left; the service node stops forwarding service requests to the failed node, and the corresponding service requests are shared among the other active nodes.
Brief description of the drawings
The present invention is further described below with reference to the drawings and embodiments.
Fig. 1 Node roles of the self-organizing cluster server.
Fig. 2 Neighbor node discovery and state synchronization method.
Fig. 3 A node actively announcing, according to the election algorithm, that the local node is the management node.
Fig. 4 Management access point election algorithm.
Fig. 5 A node actively announcing, according to the election algorithm, that the local node is a service node.
Fig. 6 Service access point election algorithm.
Fig. 7 Load balancing operating principle.
Fig. 8 Redundancy backup operating principle for management/service nodes.
Fig. 9 Redundancy backup operating principle for driven nodes.
Fig. 10 Node software system architecture.
Fig. 11 State notification workflow.
Fig. 12 Listening workflow.
Fig. 13 Node timeout workflow.
Fig. 14 Management access point election flow.
Fig. 15 Service access point election flow.
Fig. 16 Service scheduling workflow.
Fig. 17 Link-down workflow.
Fig. 18 ARP message processing workflow.
Fig. 19 Initialization workflow.
Detailed description of the invention
The present disclosure is described more fully below with reference to the accompanying drawings. Note that the following description is merely explanatory and exemplary in nature and is in no way intended to limit the invention, its application, or its uses. Unless otherwise stated, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in the embodiments do not limit the scope of the invention. In addition, techniques, methods, and apparatus known to those skilled in the art may not be discussed in detail, but are intended to form part of the description where appropriate.
As shown in Fig. 10, every node of the self-organizing cluster server has the same software structure. The main purpose is a fully peer-to-peer design: if any node fails, all system functions remain available as long as at least one active node remains, that is, every role in the system has 1:N redundancy backup. The main software modules of a node are the neighbor discovery, management access point election, service access point election, and service scheduling modules at the network layer; the link monitoring and ARP processing modules at the link layer; and the initialization module for system management.
The neighbor discovery module is subdivided into three submodules: state notification, listening, and node timeout. As shown in Fig. 11, the node sets a periodic timer interrupt, checks link state regularly, and announces its local active links, management access point, and service access point information. The state notification workflow is as follows:
Step 1: read the local link state; if there is no newly added active link, skip to Step 4;
Step 2: if the local node is the management node, re-elect the management access point among the local active links to ensure that the management access point is the optimal choice;
Step 3: if the local node is a service node, re-elect its existing service access points among the local active links to ensure that each service access point is the optimal choice;
Step 4: construct a node state message that encapsulates the local active links, management access point, service access points, and related information;
Step 5: send the node state message by multicast, announcing the active link information of the local node and whether the local node is the management node or a service node.
As shown in Fig. 12, each node continuously listens for the state messages of neighbor nodes to detect the joining of new nodes, link changes on existing nodes, and the distribution of the management node and service nodes. The node state listening workflow is as follows:
Step 1: receive a neighbor node state message and read the node name, active links, management/service access points, and other information in the message; if the neighbor node is an existing node, skip to Step 3;
Step 2: add a neighbor node record for the new node and save its active link state; if the new node competes with the local node for the management node role, skip to Step 4; if the new node competes with the local node for a service access point, skip to Step 5; otherwise end the listening process;
Step 3: update the active link information of the existing node, deleting old link records that no longer appear in the state message, creating new link records for newly added links, and resetting the node timeout counter; if the neighbor node competes with the local node for the management node role, go to Step 4; if the neighbor node competes with the local node for a service access point, skip to Step 5; otherwise end the listening process;
Step 4: the local node gives up the management node role, releases the binding of the former management access point, and notifies the ARP module to stop responding to ARP requests for the management access point; if the neighbor node does not compete with the local node for a service access point, end the listening process;
Step 5: the local node gives up the contested service access point, releases the binding of the former service access point, and notifies the ARP module to stop responding to ARP requests for the former service access point.
As shown in Fig. 13, if no state message is received from a neighbor node for a certain period, that neighbor node is considered to have timed out. When a neighbor node times out, the local node deletes the record of the timed-out node and re-elects the management/service access points. The node timeout workflow is as follows:
Step 1: delete the record of the timed-out node, including its link information and the management access point and service access point records bound to it;
Step 2: re-elect the management access point among the active nodes to ensure that the management access point is valid and is the optimal choice;
Step 3: re-elect all service access points among the active nodes to ensure that all service access points are valid and are the optimal choice; if the local node is neither the management node nor a service node, end the node timeout handling;
Step 4: announce the local management/service access point information to all nodes by multicast, and notify the ARP module to respond to ARP requests for the corresponding management/service access points.
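The timeout handling above can be sketched as a periodic sweep over the time each neighbor was last heard. In this illustrative Python fragment, `last_heard`, `timeout`, and the `reelect` callback are hypothetical names; the callback stands in for the management and service access point elections, and the caller is assumed to add the local node to the candidate list.

```python
def expire_neighbors(last_heard: dict, now: float, timeout: float) -> list:
    """Delete neighbors silent for longer than `timeout`; return their names."""
    expired = sorted(name for name, seen in last_heard.items()
                     if now - seen > timeout)
    for name in expired:
        del last_heard[name]          # Step 1: drop the timed-out record
    return expired


def handle_timeouts(last_heard: dict, now: float, timeout: float, reelect) -> list:
    """Run the sweep and trigger a re-election only if some neighbor expired."""
    expired = expire_neighbors(last_heard, now, timeout)
    if expired:
        # Steps 2-3: re-elect access points among the remaining neighbors.
        reelect(sorted(last_heard))
    return expired
```

Triggering the election only when a record was actually removed keeps the steady-state cost of the sweep to a single pass over the neighbor table.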
As shown in Fig. 14, each node independently elects the node with the smallest bandwidth and IP address as the management node, and within the management node selects the link with the smallest bandwidth and IP address as the management access point. The management access point election flow is as follows:
Step 1: sort all active nodes (including the local node and neighbor nodes) in ascending order of total bandwidth and IP address;
Step 2: take the first node in the sorted order as the management node;
Step 3: sort all active links within the management node in ascending order of bandwidth and IP address;
Step 4: take the first link in the sorted order as the management access point.
As shown in Fig. 15, each node independently elects the nodes with the largest bandwidth and IP address as service nodes, selects the links with the largest bandwidth and IP address within each service node as service access points, and ensures that the service access points are distributed evenly over all nodes and links. The service access point election flow is as follows:
Step 1: sort all active nodes in descending order of total bandwidth and IP address, numbering the nodes 0 to (n-1);
Step 2: sort all service IP addresses in descending order, numbering them with index i starting from 0;
Step 3: take the (i mod n)-th node as the service node for the i-th service IP address;
Step 4: for each service node, sort all of its active links in descending order of bandwidth and IP address, numbering the links on each service node 0 to (m-1);
Step 5: sort the service IP addresses assigned to each service node in descending order, numbering them on each service node with index j starting from 0;
Step 6: take the (j mod m)-th link as the service access point bound to the j-th service IP address of that service node.
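The modulo-based flow above can be written compactly as two round-robin passes. This Python sketch is a hedged illustration: "service IP" here means the IP address that receives requests for one service, and the data layout, the function names, and the use of a node's largest link IP as its sort key are assumptions.

```python
import ipaddress


def ip_key(ip: str) -> int:
    """Numeric value of an IP address (assumed comparison rule)."""
    return int(ipaddress.ip_address(ip))


def assign_service_ips(nodes: dict, service_ips: list) -> dict:
    """nodes: node name -> {link IP: bandwidth}.

    Returns service IP -> (service node name, link IP of its access point).
    """
    # Step 1: nodes in descending order of (total bandwidth, largest link IP).
    ranked_nodes = sorted(
        nodes,
        key=lambda n: (sum(nodes[n].values()), max(ip_key(ip) for ip in nodes[n])),
        reverse=True,
    )
    # Step 2: service IPs in descending order, indexed by i.
    ranked_ips = sorted(service_ips, key=ip_key, reverse=True)

    # Step 3: the i-th service IP goes to node (i mod n).
    per_node = {}
    for i, sip in enumerate(ranked_ips):
        per_node.setdefault(ranked_nodes[i % len(ranked_nodes)], []).append(sip)

    result = {}
    for node, sips in per_node.items():
        links = nodes[node]
        # Step 4: links in descending order of (bandwidth, IP).
        ranked_links = sorted(links, key=lambda ip: (links[ip], ip_key(ip)),
                              reverse=True)
        # Steps 5-6: sips is already descending; the j-th goes to link (j mod m).
        for j, sip in enumerate(sips):
            result[sip] = (node, ranked_links[j % len(ranked_links)])
    return result
```

For example, with node `a` holding links `10.0.0.1` (1000) and `10.0.0.2` (100), node `b` holding `10.0.0.3` (1000), and three service IPs, the two largest-ranked service IPs land on different nodes and the third wraps around to `a` on its second link.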
As shown in Fig. 16, when a node receives a service request message, it decides how to schedule the request according to its own role, ensuring that service requests are distributed evenly within the self-organizing cluster server. The service scheduling workflow is as follows:
Step 1: when a service request message is received, look up the corresponding service access point; if the corresponding service access point is not on the local node, skip to Step 3;
Step 2: arrange the active links of all active nodes into a continuous one-dimensional space in which the length of each link's segment is proportional to its bandwidth, then use a hash algorithm to map the source and destination IP addresses of the service request message onto the link space; if the mapped link is not on the local node, skip to Step 4;
Step 3: deliver the service request message to the local application layer for processing, return the result to the user, and end;
Step 4: look up the MAC address of the mapped link, forward the service request message over the layer-2 network to the node that owns the mapped link, and end.
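Step 2 of the scheduling workflow, the bandwidth-proportional link space, can be sketched with cumulative bandwidth boundaries and a binary search. The hash function (SHA-256), the ordering of links within the space, integer bandwidth values, and all names below are assumptions for illustration only.

```python
import bisect
import hashlib
import ipaddress


def build_link_space(links: list) -> tuple:
    """links: list of (node name, link IP, bandwidth) with positive integer bandwidth.

    Returns (cumulative upper bounds, links in space order).
    """
    ordered = sorted(links, key=lambda l: (l[0], int(ipaddress.ip_address(l[1]))))
    bounds, total = [], 0
    for _, _, bandwidth in ordered:
        total += bandwidth            # each link owns a segment as long as its bandwidth
        bounds.append(total)
    return bounds, ordered


def map_to_link(src_ip: str, dst_ip: str, bounds: list, ordered: list) -> tuple:
    """Hash the address pair to a point in the space and return the owning link."""
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    point = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % bounds[-1]
    # bisect_right finds the first segment whose upper bound exceeds the point.
    return ordered[bisect.bisect_right(bounds, point)]
```

A link with ten times the bandwidth owns a segment ten times as long, so under a well-mixing hash it should receive roughly ten times as many address pairs.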
Changes in node link state are of two kinds: link up and link down. To suppress frequent flapping of link state, the node handles link-down events in real time but ignores link-up events; newly available links are picked up by the periodic check in the node state notification submodule. As shown in Fig. 17, when a link goes down, the node must quickly move the management/service access points bound to the failed link and notify the other nodes. The link-down workflow is as follows:
Step 1: delete the record of the failed link; if the node is now completely disconnected from the network, skip to Step 4; if the failed link was not a management/service access point, skip to Step 3;
Step 2: re-elect the management/service access point among the local active links, and notify the ARP module to broadcast an ARP response message for the new management/service access point and to respond to ARP requests for it;
Step 3: announce the local active links and management/service access point information through a node state message, so that the other nodes immediately stop sending service requests to the failed link, then end;
Step 4: delete all neighbor node records and management/service access point records, force the node into the initialization state, and wait for the node to rejoin the network.
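The branching in the link-down workflow can be made explicit with a small sketch that returns the actions a node would take. The action names, the argument layout, and the placeholder re-election (simply the first remaining link in sorted order, rather than the election algorithms described earlier) are assumptions.

```python
def handle_link_down(active_links: set, access_points: dict,
                     neighbors: dict, failed_link: str) -> list:
    """active_links: local link IPs; access_points: role or service -> link IP.

    Mutates its arguments and returns the list of actions to perform.
    """
    active_links.discard(failed_link)                 # Step 1: drop the failed link
    if not active_links:
        # Step 4: fully disconnected, so forget everything and start over.
        neighbors.clear()
        access_points.clear()
        return ["enter_init_state"]

    actions = []
    affected = [name for name, link in access_points.items() if link == failed_link]
    if affected:
        # Step 2: placeholder re-election among the remaining local links.
        replacement = sorted(active_links)[0]
        for name in affected:
            access_points[name] = replacement
        actions.append("broadcast_arp_response")
    actions.append("multicast_status")                # Step 3: tell the other nodes
    return actions
```

Returning actions instead of performing them keeps the decision logic separate from the ARP and multicast modules, which makes the three branches easy to check in isolation.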
As shown in Fig. 18, a node must respond to ARP requests for its local management/service access points. The ARP message processing workflow is as follows:
Step 1: read the request content of the ARP message; if the request is not for a local management/service access point, hand it to the operating system for processing and end;
Step 2: read the MAC address of the link bound to the corresponding management/service access point, and broadcast it in an ARP response message.
The initialization process starts whenever a node powers on or rejoins the network from a disconnected state. As shown in Fig. 19, the node must initialize each software module, discover its neighbors, and elect the management/service access points. The initialization workflow is as follows:
Step 1: start the link monitoring and ARP processing modules to ensure that the local active link information is accurate;
Step 2: start the service scheduling and neighbor discovery modules to ensure that service requests forwarded by neighbor nodes are not dropped;
Step 3: after the local node has joined the network, wait for a period of time so that the local node and its neighbor nodes can fully synchronize neighbor state information through periodically sent node state messages;
Step 4: start the management/service access point election module and elect the management/service access points;
Step 5: announce the local active links and management/service access point information by multicast.
The description of the present invention is given for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles of the invention and its practical application, and to enable those of ordinary skill in the art to understand the invention and thus design various embodiments with various modifications suited to particular uses.