CN100334557C - Method for selecting intermediate proxy node of cluster network - Google Patents


Publication number
CN100334557C
Authority
CN
China
Prior art keywords
node
agent
module
group
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB021421641A
Other languages
Chinese (zh)
Other versions
CN1466055A (en)
Inventor
程菊生
吴雪丽
胡毅
金正操
顾光导
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CNB021421641A
Publication of CN1466055A
Application granted
Publication of CN100334557C
Anticipated expiration
Expired - Fee Related

Abstract

The present invention relates to a method for cluster network communication in which intermediate proxy nodes are introduced. When the system starts, it automatically selects the intermediate proxy nodes; during operation, when an intermediate proxy node fails and cannot perform its specified functions, the system selects a new intermediate proxy node.

Description

Method for selecting an intermediate proxy node of a cluster network
Technical field
The present invention relates to communication methods for computer cluster networks, and in particular to a method for selecting an intermediate proxy node in cluster network communication.
Background
A computer may encounter various abnormal conditions during operation. The administrator needs to know its running state at all times, learn of abnormal conditions promptly, and handle them accordingly, so as to keep the computer system running safely and stably.
A computer cluster system consists of multiple servers (node computers) joined by a dedicated high-speed network to form a single super server. In practice, safe and stable operation of the cluster is particularly important. It is therefore necessary to monitor the software and hardware of every node computer in the cluster so that problems can be found and faults eliminated at any time. Moreover, users prefer to monitor the whole cluster as a single system image. This requires a monitoring system that can monitor the entire cluster.
A communication scheme commonly used by monitoring systems at present is as follows: the monitoring host communicates directly with each node computer to obtain monitoring information (as shown in Fig. 1). An agent program running on each node computer 2 first obtains the running-state information of that node computer and then sends it directly to the monitoring host 1, so that the monitoring host can monitor every server.
The existing communication method has several obvious defects:
First, the existing scheme is suitable only when the number of nodes is small; once the node count grows to a certain size, direct communication can no longer meet the requirements. For example, suppose the monitored cluster has 256 nodes. If TCP (Transmission Control Protocol) is used as the underlying protocol, the monitoring host must maintain 256 TCP connections, which consumes a large amount of system resources and may not be achievable at all. If UDP (User Datagram Protocol) is used, the monitoring host may receive a large number of UDP packets at the same time; if it fails to process them in time, packets are likely to be dropped, that is, monitoring information is lost. No good solution to this situation currently exists.
Second, the monitoring system is an important component of the cluster system and runs as a background service, so it must not take up so many system resources that it affects other applications on the cluster. Under the existing communication scheme, however, the monitoring system consumes a large amount of system resources and thereby interferes with the normal operation of the cluster. A new implementation is therefore needed that uses as few system resources as possible, minimizing the overhead of the monitoring system within the whole cluster.
Third, the existing communication scheme cannot adequately guarantee that data acquisition is synchronized across nodes; that is, it cannot collect the running state of every node computer at the same instant. As a result, the overall running state of the cluster cannot be understood objectively and accurately, and the single-system-image characteristic of the cluster cannot be reflected.
Given these defects of the existing monitoring communication scheme, a new technical solution is urgently needed: one that is applicable to large clusters with many nodes and that monitors the running state of every node computer synchronously without taking up excessive system resources. The new solution should also ensure that the monitoring system itself runs safely and stably.
Summary of the invention
An object of the present invention is to provide a new communication scheme for a cluster network system.
Another object of the present invention is to provide a method for selecting an intermediate proxy node for cluster monitoring.
A further object of the present invention is to provide a method for automatic replacement when an intermediate proxy node fails.
A still further object of the present invention is to provide a monitoring network that can run safely and stably.
The present invention is a method for selecting an intermediate proxy node in a cluster monitoring network. The method comprises: dividing all node computers of the monitored cluster into several groups; running a node acquisition module on each node computer, responsible for collecting that node's data; running a proxy module on each node computer, where the intermediate proxy module can run in either of two states; performing initial configuration of the intermediate proxy node when the system starts; and, during system operation, dynamically replacing the intermediate proxy module if it fails.
Description of the drawings
Fig. 1 shows the structure of an existing monitoring network.
Fig. 2 shows the communication structure of the hierarchical monitoring network according to the present invention.
Fig. 3 shows the communication structure of the hierarchical monitoring network after the intermediate proxy node selection method of the present invention is introduced.
Fig. 4 shows the NP configuration process at system startup according to the present invention.
Fig. 5 shows the NP replacement process during system operation according to the present invention.
Embodiment
To achieve the objects of the present invention, the following method may be adopted. As shown in Fig. 2, a basic service module (BSP) 11 runs on the monitoring host, and all node computers are divided into several groups 12. Within each group, a node acquisition module (NA) 13 runs on every node computer, and a node proxy module (NP) 14 also runs on one node computer in each group. Both the NA module and the NP module are software programs running on the node computer's operating system.
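The effect of grouping on the monitoring host can be illustrated with a small sketch. It splits a list of nodes into groups and counts how many proxies the BSP would exchange data with. The function name `partition`, the node names, and the group size of 16 are assumptions made for illustration; the text specifies only that nodes are divided into several groups, and the 256-node figure is taken from the background example.

```python
from typing import Dict, List

def partition(nodes: List[str], group_size: int) -> Dict[int, List[str]]:
    """Split the node list into consecutive groups of at most group_size."""
    groups: Dict[int, List[str]] = {}
    for index, node in enumerate(nodes):
        groups.setdefault(index // group_size, []).append(node)
    return groups

# 256 nodes, as in the background example; names are hypothetical.
nodes = [f"node{i:03d}" for i in range(256)]
# The group size is an assumption, not a value from the text.
groups = partition(nodes, group_size=16)

# With one NP per group, the BSP exchanges data with len(groups) proxies
# instead of len(nodes) individual nodes.
print(len(nodes), len(groups))   # 256 16
```

How nodes are assigned to groups in practice (by rack, subnet, or configuration file) is not stated in the text; consecutive slicing is used here only because it is the simplest choice.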
The BSP is responsible for sending a data acquisition command when the running state of the cluster is needed, then waiting for and receiving the data returned by the node computers, and summarizing and analyzing it. The NP is responsible for, on receiving an acquisition command from the BSP, forwarding the command to the NA modules of all node computers in its group, then waiting for and receiving the data returned by the NA modules, aggregating it, and sending it to the BSP in one batch. The NA is responsible for periodically collecting the running-state data of its node computer and, on receiving an acquisition command, immediately returning the most recently collected data.
In one round of information acquisition, the BSP sends the acquisition command to all NPs by UDP broadcast. On receiving the command, each NP forwards it by UDP broadcast to all NAs in its group. The NA running on each node computer periodically collects that node's running-state data; when it receives the acquisition command from the NP in its group, it passes the data to that NP, which then forwards the collected data together to the BSP running on the monitoring host. The BSP receives the running-state data of all node computers from the NPs, summarizes and analyzes it, and thereby monitors the whole cluster.
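The division of labour among BSP, NP, and NA in one acquisition round can be sketched as an in-memory simulation. This is a minimal sketch under stated assumptions: plain method calls stand in for the UDP broadcasts, and the class layout, the method names `sample`, `on_acquire`, and `acquire`, and the `cpu_load` field are all hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class NA:
    """Node acquisition module: keeps the latest sample for its node."""
    node: str
    latest: Dict[str, float] = field(default_factory=dict)

    def sample(self, cpu_load: float) -> None:
        # Stands in for the periodic collection of running-state data.
        self.latest = {"cpu_load": cpu_load}

    def on_acquire(self) -> Dict[str, float]:
        # On an acquisition command, return the most recent sample at once.
        return dict(self.latest)

@dataclass
class NP:
    """Node proxy module: relays the command and aggregates the replies."""
    group: int
    members: List[NA]

    def on_acquire(self) -> Dict[str, Dict[str, float]]:
        # A direct call replaces the UDP broadcast to the group's NAs.
        return {na.node: na.on_acquire() for na in self.members}

@dataclass
class BSP:
    """Basic service module on the monitoring host."""
    proxies: List[NP]

    def acquire(self) -> Dict[int, Dict[str, Dict[str, float]]]:
        # One aggregated reply per group instead of one reply per node.
        return {proxy.group: proxy.on_acquire() for proxy in self.proxies}

nas = [NA(f"n{i}") for i in range(4)]
for i, na in enumerate(nas):
    na.sample(cpu_load=0.1 * i)
bsp = BSP([NP(0, nas[:2]), NP(1, nas[2:])])
report = bsp.acquire()
print(sorted(report))      # [0, 1]
print(sorted(report[1]))   # ['n2', 'n3']
```

Because each NA answers from its most recent periodic sample rather than measuring on demand, the replies gathered in one round describe roughly the same moment, which is the synchronization property the background section asks for.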
Under this hierarchical strategy the node proxy module plays a key role in the monitoring network: if a node proxy module stops working for some unexpected reason, the monitoring host can no longer obtain the running-state data of the nodes in the corresponding group in time.
Two problems therefore remain to be solved: first, how to select the proxy node (the node that runs the node proxy module); second, how to select a new proxy node if the proxy node itself fails and can no longer perform the proxy function.
The present invention allows the intermediate proxy module NP to run in one of two states: the enabled state (NP_Enable) and the disabled state (NP_Disable).
As shown in Fig. 3, a basic service module (BSP) 11 runs on the monitoring host, and all node computers are divided into m groups 12 of n node computers each. Within each group, every node computer runs both a node acquisition module (NA) 13 and a node proxy module NP (either an NP in the enabled state, NP_Enable 21, or an NP in the disabled state, NP_Disable 22). In each group, however, only the NP running on one node computer is in the enabled state, that is, only one NP is NP_Enable.
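The two NP states and the rule that each group has exactly one enabled NP can be written down as a small data model. The sketch below is illustrative only: the names `NPState`, `enabled_nodes`, and `is_consistent` and the node names are assumptions, not identifiers from the patent.

```python
from enum import Enum
from typing import Dict, List

class NPState(Enum):
    ENABLED = "enabled"    # NP_Enable: acts as the group's proxy
    DISABLED = "disabled"  # NP_Disable: running on the node but idle

def enabled_nodes(group: Dict[str, NPState]) -> List[str]:
    """Return the nodes whose NP is currently enabled."""
    return [node for node, state in group.items() if state is NPState.ENABLED]

def is_consistent(group: Dict[str, NPState]) -> bool:
    """A configured group has exactly one enabled NP."""
    return len(enabled_nodes(group)) == 1

# Hypothetical three-node group with n1 acting as the proxy.
group = {"n0": NPState.DISABLED, "n1": NPState.ENABLED, "n2": NPState.DISABLED}
print(is_consistent(group))   # True
print(enabled_nodes(group))   # ['n1']
```

Running an NP on every node, with all but one disabled, means a replacement proxy is already installed wherever it might be needed; only a state change is required to promote it.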
In one round of information acquisition, the BSP sends the acquisition command to all NP_Enable modules by UDP broadcast. On receiving the command, each NP_Enable forwards it by UDP broadcast to all NAs in its group. The NA running on each node computer periodically collects that node's running-state data; when it receives the acquisition command from the NP_Enable of its group, it passes the data to that NP_Enable, which then forwards the collected data together to the BSP running on the monitoring host. The BSP receives the running-state data of all node computers from the NP_Enable modules, summarizes and analyzes it, and thereby monitors the whole cluster.
From the above it can be seen that only an NP in the enabled state (NP_Enable) actually performs the function of the intermediate proxy node, relaying commands and data between the BSP and the NAs of its group. If the node computer hosting the NP_Enable encounters an unexpected condition so that the NP_Enable cannot work normally (referred to here as NP failure), the monitoring system cannot monitor the node computers in that NP_Enable's group.
The present invention realizes automatic selection and replacement of the intermediate proxy node by switching the NP between its two running states under different circumstances.
Automatic selection of the intermediate proxy node must cover two cases: NP_Enable selection when the monitoring system starts, and NP_Enable replacement while the monitoring system is running. The methods for selecting and replacing NP_Enable are described in detail below with reference to the drawings:
1. NP_Enable selection at system startup.
As shown in Fig. 4, when the monitoring system starts, both the NA and NP modules run on every node computer. All NP modules are in the initialization state and send heartbeat messages to the BSP. For each group of nodes, the BSP records the NP whose heartbeat arrives first and takes it as the NP_Enable of that group; it then sends an NP configuration command 31 by broadcast to notify all NPs in the group. The selected NP changes its state to enabled (NP_Enable) and sends an NP configuration response 32 to the BSP; the other NPs change their state to disabled. The NP_Enable then sends an NP configuration announcement by broadcast to all NAs in the group, informing them of where NP_Enable is located.
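The startup rule, in which the first heartbeat received from each group decides that group's proxy, can be sketched as follows. This is a minimal sketch assuming heartbeats are available as an ordered list of `(group_id, node)` pairs; the function names `select_at_startup` and `apply_config`, the string state labels, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def select_at_startup(heartbeats: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Pick, for each group, the node whose heartbeat arrived first.

    heartbeats yields (group_id, node) pairs in arrival order.
    """
    chosen: Dict[int, str] = {}
    for group_id, node in heartbeats:
        chosen.setdefault(group_id, node)   # later heartbeats are ignored
    return chosen

def apply_config(members: List[str], selected: str) -> Dict[str, str]:
    """Effect of the NP configuration command on every NP in one group."""
    return {node: ("enabled" if node == selected else "disabled")
            for node in members}

# Hypothetical arrival order of initial heartbeats from two groups.
arrivals = [(0, "n1"), (1, "n5"), (0, "n0"), (1, "n4"), (0, "n2")]
chosen = select_at_startup(arrivals)
print(chosen)   # {0: 'n1', 1: 'n5'}
print(apply_config(["n0", "n1", "n2"], chosen[0]))
# {'n0': 'disabled', 'n1': 'enabled', 'n2': 'disabled'}
```

Choosing the first responder needs no ranking of candidates and tends to favour a node that is both up and quick to respond, which may be why the text uses it; the patent itself does not state the rationale.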
2. NP_Enable replacement while the monitoring system is running.
While the monitoring system is running, a fault on the node computer hosting the NP_Enable may leave the NP_Enable unable to perform its intended function. The system must therefore detect NP failure promptly and select a new NP_Enable.
An NP in the enabled state (NP_Enable) continuously sends heartbeat messages to the BSP, whereas an NP in the disabled state (NP_Disable) sends none. The BSP can thus stay in contact with the NP_Enable of every group at all times. As soon as the NP_Enable of a group fails, the BSP learns of it quickly and performs NP selection by the following process.
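One way the BSP might notice a failed NP_Enable is to track when each group's heartbeat was last seen and flag groups that have been silent too long. The text does not say how failure is detected beyond the use of heartbeats, so the timeout rule, the class `HeartbeatMonitor`, and the 3-second threshold below are all assumptions.

```python
from typing import Dict, List

class HeartbeatMonitor:
    """Tracks the last heartbeat time of each group's NP_Enable."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout            # hypothetical threshold, in seconds
        self.last_seen: Dict[int, float] = {}

    def heartbeat(self, group_id: int, now: float) -> None:
        """Record a heartbeat from the enabled NP of group_id."""
        self.last_seen[group_id] = now

    def failed_groups(self, now: float) -> List[int]:
        """Groups whose enabled NP has been silent for longer than timeout."""
        return sorted(group for group, seen in self.last_seen.items()
                      if now - seen > self.timeout)

monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat(0, now=0.0)
monitor.heartbeat(1, now=0.0)
monitor.heartbeat(0, now=2.0)           # group 1 stays silent
print(monitor.failed_groups(now=4.0))   # [1]
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock; a deployment would more likely read a monotonic clock.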
As shown in Fig. 5, to select a new NP_Enable in a group, the BSP first sends an NP selection command 35 to all NP modules in the group. Each NP module (whether or not it is in the enabled state) sends an NP selection response 36 to the BSP. The BSP records the first NP to respond and takes it as the NP_Enable of the group, then sends an NP configuration command 31 by broadcast to notify all NPs in the group. The NP module selected as NP_Enable changes its state to enabled and sends an NP configuration response 32 to the BSP; every other NP module changes its state to disabled (if it was previously enabled) or remains disabled. Next, the new NP_Enable sends an NP configuration announcement 33 by broadcast to all NA modules in the group, informing them of where NP_Enable is located.
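The replacement round can be sketched from the BSP's point of view: take the first selection response, mark that NP as enabled, and mark every other NP in the group as disabled. The function name `replace_proxy`, the string state labels, and the sample group are hypothetical, and message passing is reduced to an ordered list of responders.

```python
from typing import Dict, List

def replace_proxy(states: Dict[str, str], responses: List[str]) -> Dict[str, str]:
    """Return the group's NP states after one replacement round.

    states maps node -> "enabled" or "disabled" before the round.
    responses lists nodes in the order their selection responses arrived;
    a failed node simply never appears in it.
    """
    if not responses:
        raise RuntimeError("no NP in the group answered the selection command")
    new_proxy = responses[0]   # the first responder becomes NP_Enable
    # This is the BSP's bookkeeping after the configuration command; a
    # failed node cannot process the command but is recorded as disabled.
    return {node: ("enabled" if node == new_proxy else "disabled")
            for node in states}

before = {"n0": "enabled", "n1": "disabled", "n2": "disabled"}
after = replace_proxy(before, responses=["n2", "n1"])   # n0 has failed
print(after)   # {'n0': 'disabled', 'n1': 'disabled', 'n2': 'enabled'}
```

If the old NP_Enable was only slow rather than dead and answers first, the same rule simply keeps it as the proxy, which is consistent with the text's note that every NP responds regardless of its current state.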
It is readily seen that, with the method of the present invention, the monitoring system can monitor large-scale cluster systems. Moreover, the intermediate proxy node is selected automatically when the monitoring system starts, and a new intermediate proxy node can likewise be selected during operation when the current one fails and cannot perform its intended function, thereby ensuring stable operation of the monitoring system.
Obviously, those skilled in the art can readily implement the internal structure of the various programs on the basis of the present invention, so it is not described further here.
Those skilled in the art can make various changes and modifications to the cluster communication method and system of the present invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (2)

1. A method for selecting an intermediate proxy node in cluster network communication, the method comprising the following steps:
running a node acquisition module on each node computer, responsible for collecting the node's data;
running a proxy module on each node computer;
placing each intermediate proxy module in either an enabled state or a disabled state, wherein the node hosting the intermediate proxy module that is in the enabled state is the current intermediate proxy node and is responsible for relaying commands and data between the monitoring host and each node computer;
at system startup, performing initial configuration of the intermediate proxy node, wherein, during initial configuration, the monitoring host sends a configuration command to each intermediate proxy module, each intermediate proxy module returns a configuration response, and the intermediate proxy module set to the enabled state sends a configuration announcement to each node acquisition module in its group;
during system operation, if the intermediate proxy module fails, performing dynamic replacement, wherein, during dynamic replacement, the monitoring host sends a selection command to each intermediate proxy module, each intermediate proxy module returns a selection response, the monitoring host sends a configuration command to each intermediate proxy module, each intermediate proxy module returns a configuration response, and the intermediate proxy module set to the enabled state sends a configuration announcement to each node acquisition module in its group.
2. The method for selecting an intermediate proxy node according to claim 1, further comprising the step of: dividing the node computers of the cluster into several groups, and establishing said intermediate proxy module in each group.
CNB021421641A 2002-06-10 2002-08-27 Method for selecting intermediate proxy node of cluster network Expired - Fee Related CN100334557C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021421641A CN100334557C (en) 2002-06-10 2002-08-27 Method for selecting intermediate proxy node of cluster network

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
CN02237849 2002-06-10
CN20022378499 2002-06-10
CN02237849.9 2002-06-10
CN20021256268 2002-07-25
CN02125626 2002-07-25
CN02125626.8 2002-07-25
CNB021421641A CN100334557C (en) 2002-06-10 2002-08-27 Method for selecting intermediate proxy node of cluster network

Publications (2)

Publication Number Publication Date
CN1466055A CN1466055A (en) 2004-01-07
CN100334557C true CN100334557C (en) 2007-08-29

Family

ID=34198442

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021421641A Expired - Fee Related CN100334557C (en) 2002-06-10 2002-08-27 Method for selecting intermediate proxy node of cluster network

Country Status (1)

Country Link
CN (1) CN100334557C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102291467B (en) * 2011-09-15 2014-04-09 电子科技大学 Communication platform and method suitable for private cloud environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1304247A (en) * 2000-01-11 2001-07-18 国际商业机器公司 Frame system and method for testing server performance
WO2001090913A1 (en) * 2000-05-22 2001-11-29 New.Net, Inc. Systems and methods of accessing network resources
WO2002033900A1 (en) * 2000-10-19 2002-04-25 Nokia Corporation Method of managing network element settings

Also Published As

Publication number Publication date
CN1466055A (en) 2004-01-07

Similar Documents

Publication Publication Date Title
CN109597723B (en) Dual-machine hot standby redundancy implementation system and method for subway integrated monitoring system
CN1111994C (en) Method for fault-tolerant communication under strictly real-time conditions
CN101907879B (en) Industrial control network redundancy fault-tolerant system
CN102457390B (en) A kind of Fault Locating Method based on QOE and system
CN104320311A (en) Heartbeat detection method of SCADA distribution type platform
CN110677282B (en) Hot backup method of distributed system and distributed system
CN114371912A (en) Virtual network management method of data center and data center system
CN102118274A (en) State monitoring method, device and system
CN1988477A (en) Network managing system with high usability property
CN101296232A (en) Adapting device and method with multi-network management and multi-north interface
CN100334557C (en) Method for selecting intermediate proxy node of cluster network
CN106656584B (en) Distributed system invalid node judgment method
CN109302319B (en) Message pool distributed cluster and management method thereof
CN115729164B (en) Industrial communication system management method and device and industrial communication system
CN100547560C (en) A kind of computers group monitoring and method
CN1440606A (en) Communications system
CN108234154B (en) Airborne switching network equipment fault monitoring method
CN115499300A (en) Embedded equipment clustering operation architecture, method and device
KR20000012756A (en) Method for controlling distributed processing in cluster severs
CN113364659B (en) Data acquisition system based on Modbus protocol
CN110838994A (en) Redundant Ethernet link state monitoring system based on broadcast protocol
KR102517831B1 (en) Method and system for managing software in mission critical system environment
CN115801789B (en) Internet of things data aggregation system and method
CN103746787B (en) Multi-channel real-time full duplex carrier communication equipment
CN113708967B (en) System monitoring disaster recovery early warning device and early warning method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070829

Termination date: 20200827