Background
In zero-hop (Zero Hop) distributed systems of the peer-to-peer type, every node needs to know the state of all nodes in the whole cluster. The following steps are generally adopted:
1. When a node joins the cluster, it sends a multicast or broadcast message;
2. All nodes in the cluster receive the message and update their own membership views;
3. A seed node is selected and asked to pass its membership view to the initiating node;
4. Each node maintains a heartbeat with every other node and exchanges membership views with it; if a node is judged to be dead, it is deleted from the membership view.
This simple approach can synchronize state among a small number of nodes. However, when the number of nodes in the cluster reaches hundreds or more, each node must maintain a large number of heartbeat connections and periodically send a large number of heartbeat messages. The number of heartbeat messages in the whole cluster is on the order of O(n^2), so the network overhead is huge and the approach is hardly feasible. If the Gossip protocol is used between nodes instead, each node periodically selects some nodes at random to communicate with, sends them heartbeat messages, and exchanges and updates the node state information that each side maintains. After a certain period of time, all nodes in the cluster end up with the latest, consistent node state information. This method guarantees eventual consistency of data among the nodes and consumes little network bandwidth.
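The difference in message volume can be illustrated with a small calculation. This is only a sketch: the function names and the fan-out of 3 peers per round are assumptions made for illustration and do not come from the method described here.

```python
def all_to_all_heartbeats(n: int) -> int:
    """Messages per round when every node sends a heartbeat to every other node."""
    return n * (n - 1)


def gossip_messages(n: int, fanout: int) -> int:
    """Messages per round when every node contacts `fanout` randomly chosen peers.

    `fanout` is an assumed tuning parameter, not a value given in the text.
    """
    return n * fanout


# Compare the two schemes for growing cluster sizes.
for n in (10, 100, 1000):
    print(n, all_to_all_heartbeats(n), gossip_messages(n, fanout=3))
```

The all-to-all count grows quadratically with the cluster size, while the gossip count grows linearly for a fixed fan-out.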
Gossip is a distributed protocol. Distributing node state messages through the Gossip protocol to synchronize state among the nodes of a distributed system is a method often used when building distributed systems with topologies such as peer-to-peer systems or overlay networks. The Gossip protocol has proven in practice to be particularly suitable for message distribution and state exchange in distributed systems with a large number of nodes, achieving eventual consistency (Eventual Consistency) of node state and data and thereby synchronizing state among the system's nodes.
In a typical Gossip protocol implementation, when a node chooses nodes to send synchronization messages to, it usually picks several nodes at random from the node list it maintains and sends them state synchronization messages to synchronize state.
However, this method of selecting communication peers at random does not take the topology of the cluster into account, such as same rack versus different racks, or same data center versus different data centers. Fig. 2 is a schematic diagram of the topology of a data center. As shown in Fig. 2, each data center comprises several clusters 203, each cluster 203 comprises several server racks 202, and each server rack 202 in turn comprises several server nodes 201. In such a cluster topology, the network is generally built by interconnecting switches. Bandwidth between two nodes in different server racks is generally scarcer than bandwidth between two nodes in the same rack, and bandwidth between two nodes in different data centers is scarcer still. Moreover, as the number of switches crossed and the logical distance between nodes grow, message transmission may take longer, so the system's messages converge to a synchronized state slowly and efficiency is low.
Summary of the invention
To overcome the deficiencies of existing implementations, the object of the present invention is to provide a distance-based method for distributing state synchronization messages among the nodes of a distributed system, so as to improve the convergence speed of node state consistency.
To achieve the above object, the present invention provides a distance-based state synchronization method comprising the following steps:
1) building the seed nodes of a node in a network or cluster, and maintaining a live node list and a dead node list;
2) in the live node list, assigning each node a distance parameter according to its topological position in the network, selecting nodes at different distances with a probability inversely proportional to the square of the distance, and sending them synchronization messages;
3) sending a synchronization message to a random unreachable node;
4) if the selected nodes do not include a seed node, sending a synchronization message to a random seed node;
5) if the number of nodes in the live node list is less than the number of seed nodes, sending a synchronization message to a random seed node.
Wherein building the seed nodes of a node in the network or cluster in step 1) means arranging the seed nodes of all nodes in the network or cluster in the form of a binary tree.
Wherein assigning different distance parameters to nodes according to their topological positions in the network in step 2) further comprises: determining the rack position and data center position of each node; and assigning the node a distance parameter according to that position.
Wherein determining the rack position and data center position of a node comprises determining it from system configuration, from the IP address allocation rules set during network planning, or from dynamic statistics.
Wherein assigning the distance parameter according to position means assigning values according to the specific form of the network topology: the distance parameter is 1 for nodes in the same rack, 2 for nodes in different racks of the same data center, and 3 for nodes in different data centers.
By taking into account the topological position (rack, data center, and so on) of nodes in the distributed system's network, the present invention assigns distance parameters between nodes, selects nodes at different distances with a probability inversely proportional to the square of the distance, and sends them synchronization messages to synchronize state. This distance-based mechanism for distributing state data through Gossip messages accelerates the convergence of system state consistency, improves communication efficiency, and reduces network bandwidth overhead and system load.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the specification or be understood by practicing the present invention.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to describe and explain the present invention and do not limit it.
The core of the present invention is that, when the Gossip protocol is used for synchronization, synchronization nodes are selected according to distance.
The rule for selecting synchronization nodes is distance-based: a node in the system or cluster assigns different distance parameters to other nodes at different network positions, and the probability that a node is selected is inversely proportional to the square of its distance parameter.
Fig. 1 is a flow chart of the distance-based state synchronization method according to the present invention. The method is described in detail below with reference to Fig. 1:
First, in step 101, when a peer-to-peer system or overlay network is built with the Gossip protocol and a node newly joins the cluster, it sends its first synchronization message to one or more nodes, which are called seed nodes (Seed Node). In addition, each node maintains two node lists: the current live node list, that is, the list of reachable nodes, and the current dead node list, that is, the list of unreachable nodes.
Fig. 3 is a schematic diagram of seed node relationships built according to the present invention. As shown in Fig. 3, the seed node relationships can be arranged as a binary tree in which each parent node is the seed node of its child nodes: node A is the seed node of nodes B and C, and node B is the seed node of nodes D and E. When seed nodes are configured for a node, other seed nodes can be specified in addition to the one given by the binary tree.
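The binary-tree arrangement of Fig. 3 can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are assumed to be listed in level order, so that the parent of the node at index i sits at index (i - 1) // 2, and the function name `build_seed_map` and the `extra_seeds` argument are hypothetical.

```python
from typing import Dict, List, Optional


def build_seed_map(
    nodes: List[str],
    extra_seeds: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Map each node to its seed nodes, using a level-order binary tree.

    The parent in the tree is the node's seed. `extra_seeds` lets a node
    be given additional seeds beyond the one implied by the tree.
    """
    extra_seeds = extra_seeds or {}
    seed_map: Dict[str, List[str]] = {}
    for i, node in enumerate(nodes):
        # The root (index 0) has no parent, so the tree gives it no seed.
        seeds = [] if i == 0 else [nodes[(i - 1) // 2]]
        for extra in extra_seeds.get(node, []):
            if extra != node and extra not in seeds:
                seeds.append(extra)
        seed_map[node] = seeds
    return seed_map


print(build_seed_map(["A", "B", "C", "D", "E"]))
```

With the five nodes of the example, A becomes the seed of B and C, and B becomes the seed of D and E. The root receives no seed from the tree alone, so in this sketch it would rely on `extra_seeds`.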
In step 102, a node in the system or cluster assigns different distance parameters to the nodes in its reachable list according to their network positions. For example, the distance parameter can be set to 1 for nodes in the same rack, 2 for nodes in different racks of the same data center, and 3 for nodes in different data centers. There are several ways to determine whether a node is in the same rack, the same data center, or a different data center: for example, from system configuration, from the IP address allocation rules set during network planning, or from dynamic statistics. Distance parameters can also be assigned flexibly according to the specifics of the network topology, the principle being that the greater the distance, the larger the distance parameter.
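The example assignment (1, 2, 3) can be sketched as below. The IP rule used here is purely hypothetical: it assumes that the second octet of an IPv4 address identifies the data center and the third octet identifies the rack, which is one possible planning convention and not something specified in this description. The names `Location`, `location_from_ip`, and `distance` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    datacenter: str
    rack: str


def location_from_ip(ip: str) -> Location:
    """Derive a location from an IPv4 address.

    Assumed (hypothetical) rule: 2nd octet = data center, 3rd octet = rack.
    """
    octets = ip.split(".")
    return Location(datacenter=octets[1], rack=octets[2])


def distance(a: Location, b: Location) -> int:
    """Distance parameter: 1 same rack, 2 same data center, 3 otherwise."""
    if a.datacenter != b.datacenter:
        return 3
    if a.rack != b.rack:
        return 2
    return 1


me = location_from_ip("10.1.5.20")
for peer in ("10.1.5.21", "10.1.6.7", "10.2.5.20"):
    print(peer, distance(me, location_from_ip(peer)))
```

A deployment that determines position from system configuration or dynamic statistics would replace `location_from_ip` and keep `distance` unchanged.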
In step 103, the target nodes for synchronization messages are selected from the current live node list (the reachable node list) under the rule that a node's selection probability is inversely proportional to the square of its distance parameter, and state is synchronized with them. Nodes in the same rack have the highest selection probability and nodes in different data centers the lowest, so synchronization between nodes in the same rack takes priority. In the actual calculation, the number of nodes to select at each distance can be rounded accordingly; for example, a count below 1 is treated as 1.
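The inverse-square rule can be sketched with weighted random sampling. This is an illustration under assumptions: the function names, the `count` parameter, and sampling without replacement are choices made for the sketch, and the rounding of per-distance counts mentioned above is not modeled.

```python
import random
from typing import Dict, List, Optional


def selection_weights(distances: Dict[str, int]) -> Dict[str, float]:
    """Normalized selection probability of each node: proportional to 1 / d^2."""
    raw = {node: 1.0 / (d * d) for node, d in distances.items()}
    total = sum(raw.values())
    return {node: w / total for node, w in raw.items()}


def choose_targets(
    distances: Dict[str, int],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `count` distinct live nodes, favoring small distances."""
    rng = rng or random.Random()
    remaining = dict(distances)
    chosen: List[str] = []
    while remaining and len(chosen) < count:
        nodes = list(remaining)
        weights = [1.0 / (remaining[n] ** 2) for n in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


live = {"same_rack": 1, "same_dc": 2, "other_dc": 3}
print(selection_weights(live))
print(choose_targets(live, count=2))
```

For one node at each of the distances 1, 2, and 3, the weights 1, 1/4, and 1/9 stand in the ratio 36 : 9 : 4, so the same-rack node is four times as likely to be picked as the node in another rack and nine times as likely as the node in another data center.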
In step 104, synchronization messages are sent to some unreachable nodes at random in an attempt to synchronize with them. In a distributed environment the network is unreliable, so nodes may become temporarily unreachable; the purpose of this step is to discover as early as possible that an unreachable node has become reachable again.
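A minimal sketch of this probing step follows. It assumes one probe per round and a caller-supplied `try_sync` callback that returns True when the target answered; both are illustrative assumptions, as is the function name `probe_unreachable`.

```python
import random
from typing import Callable, Optional, Set


def probe_unreachable(
    live: Set[str],
    dead: Set[str],
    try_sync: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Send a sync attempt to one random node from the dead list.

    If the node answers, it is moved from the dead list to the live list.
    Returns the probed node, or None when the dead list is empty.
    """
    if not dead:
        return None
    rng = rng or random.Random()
    target = rng.choice(sorted(dead))  # sorted() gives a stable sequence to pick from
    if try_sync(target):
        dead.discard(target)
        live.add(target)
    return target


live_nodes = {"n1", "n2"}
dead_nodes = {"n3"}
probe_unreachable(live_nodes, dead_nodes, try_sync=lambda node: True)
print(live_nodes, dead_nodes)
```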
In step 105, it is determined whether the nodes selected in step 103 include a seed node. If so, the flow goes to step 107; if not, it proceeds to the next step.
In step 106, a synchronization message is sent to a random seed node and synchronization is performed. The nodes selected in step 103 may not include a seed node; in that case, a synchronization message is sent to a random seed node with a certain probability. Because seed nodes always hold the state information of many nodes, this step speeds up synchronization.
In step 107, it is determined whether the number of nodes in the current live node list is less than the number of seed nodes. If so, the next step is performed; if not, the flow goes to step 109 and this round of state synchronization ends.
In step 108, a synchronization message is sent to a random seed node and state is synchronized. If the number of currently live nodes is less than the number of seed nodes, a synchronization request is sent to a random seed node with a certain probability in order to avoid the appearance of "islands".
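The decisions of steps 105 to 108 can be sketched as a single function that returns the seed nodes to contact after the distance-based selection. The function name, the `self_id` argument (used so that a node does not pick itself), and the `probability` parameter standing in for the "certain probability" are assumptions of this sketch; with the default of 1.0 the seed is always contacted.

```python
import random
from typing import List, Optional, Sequence


def seed_followups(
    self_id: str,
    selected: Sequence[str],
    live: Sequence[str],
    seeds: Sequence[str],
    probability: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the seed nodes to contact in steps 106 and 108."""
    rng = rng or random.Random()
    candidates = [s for s in seeds if s != self_id]
    followups: List[str] = []
    if not candidates:
        return followups
    # Steps 105/106: no seed among the selected targets, so contact one.
    if not any(node in seeds for node in selected):
        if rng.random() < probability:
            followups.append(rng.choice(candidates))
    # Steps 107/108: fewer live nodes than seeds, so contact a seed to
    # avoid being stuck in an isolated group.
    if len(live) < len(seeds):
        if rng.random() < probability:
            followups.append(rng.choice(candidates))
    return followups


# Node B knows only A as live, but four seeds are configured.
print(seed_followups("B", selected=["A"], live=["A"], seeds=["A", "B", "C", "D"]))
```

In the usage line, B's selected target A is itself a seed, so step 106 is skipped, but the live count of 1 is below the seed count of 4, so step 108 contacts one of A, C, or D.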
If steps 107 and 108 were not performed, "islands" could easily appear. For example, suppose there are four machines A, B, C, and D, all configured as seed nodes; that is, the seed nodes of each of A, B, C, and D comprise all four machines. If they start at the same time, the following situation may arise:
a) Node A starts up, finds no live nodes, goes to step 106, and synchronizes with an arbitrary seed node; suppose it selects B;
b) Node B finishes synchronizing with A and therefore considers A alive, so it synchronizes with A; because A is a seed node of B, B no longer synchronizes with other seed nodes;
c) Node C starts up, finds no live nodes, likewise goes to step 106, and synchronizes with an arbitrary seed node; suppose this time it selects D;
d) Node D finishes synchronizing with C and considers C alive, so it synchronizes with C; because C is a seed node of D, D likewise no longer synchronizes with other seed nodes.
Two islands have now formed: A and B synchronize with each other, and C and D synchronize with each other, but {A, B} and {C, D} never synchronize with each other, so neither group knows of the other's existence. With steps 107 and 108 added, once A and B have finished synchronizing, each finds that only one node is alive while there are four seed nodes, so it communicates with another arbitrary seed node, thereby breaking the island.
The present invention allows nodes in the same rack to synchronize quickly and reduces the number of cross-rack and cross-data-center communications, consuming less network bandwidth. As a result, when nodes in different racks or different data centers do synchronize, they already hold the latest and more complete data for their own rack or data center, which accelerates the convergence of state consistency across the whole cluster network.
In addition, a well-constructed set of seed nodes can further accelerate convergence on this basis. Each time a node synchronizes, it also synchronizes messages with a seed node with a certain probability, because seed nodes in theory hold the state information of the most nodes. Moreover, all nodes form a relationship through the seed nodes, which complements random selection to some extent.
The present invention can be applied in any distributed system that uses Gossip or another protocol for state synchronization or information exchange, or that selects neighbor nodes on a P2P basis. The present invention includes but is not limited to the following two use cases.
1. P2P video sharing
In P2P-based live streaming, media data is shared between each node and its nearby nodes, and the nodes need to update their respective states regularly. With the present invention, nodes that dynamic statistics show to be close to a given node are classified as being in the "same rack". When selecting nodes for state synchronization, a node can preferentially update with these nearby nodes and then, according to the data scheduling algorithm, select suitable "neighbors" and request the corresponding data from them. This avoids the situation in which random selection schedules two distant nodes with a poor link to share content, which would greatly degrade the system's quality of service.
2. Message exchange in VoIP communication
Voice over IP (VoIP) is a new type of voice communication service. Compared with traditional PSTN telephone service, it has clear advantages such as good scalability, easy deployment, and low cost. In worldwide VoIP applications, the communicating parties may be under different network conditions. With the present invention, the state information of the servers that relay voice packets is propagated quickly through the whole network; link notification and message forwarding are then performed dynamically and adaptively according to the networks of the two communicating parties, providing higher-quality service.
A person of ordinary skill in the art will appreciate that the above is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions recorded in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.