CN106021070A - Method and device for server cluster monitoring

Info

Publication number: CN106021070A
Application number: CN201610285256.0A
Authority: CN
Inventors: 赵富欣
Original assignee: LeTV Holding Beijing Co Ltd; LeTV Information Technology Beijing Co Ltd
Current assignee: LeTV Holding Beijing Co Ltd; LeTV Information Technology Beijing Co Ltd
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2016-10-12

Abstract

An embodiment of the invention provides a server cluster monitoring method. The method comprises: creating a temporary directory node for a server cluster on a public service platform; monitoring the temporary directory node and, when an abnormality of the temporary directory node is detected, locating the abnormal child node; and querying a preset exception-handling plan according to the abnormal child node and handling the abnormal child node according to that plan. In this way, continuous monitoring of the server cluster and immediate alarming can be achieved.

Description

Server cluster monitoring method and device

Technical field

The embodiments of the present invention relate to the technical field of big data processing, and in particular to a server cluster monitoring method and device.

Background technology

A server cluster is a group of servers brought together to provide the same service; to a client, the cluster appears to be a single server. A cluster can use multiple computers for parallel computation to achieve high computing speed, and can also use multiple computers as backups, so that the whole system continues to run normally if any one machine fails. Once the cluster service is installed and running on a server, that server can join the cluster. Cluster operation reduces the number of single points of failure and makes cluster resources highly available.

Typically, in a distributed server cluster, a large job is split into multiple tasks, and these tasks are distributed to multiple servers in the cluster for parallel processing, so that data can be processed efficiently.

In a server cluster, a real-time alarm must be raised as soon as any node server fails, so that after receiving the alarm the cluster can start a standby server to take over from the failed server and ensure that the job completes normally. However, achieving long-term, continuous, real-time monitoring of every node server together with second-level alarming is difficult. Moreover, when there are multiple server clusters, a monitoring system has to be set up for each cluster separately, which consumes considerable time and resources. An improved server cluster monitoring method is therefore urgently needed.

Summary of the invention

Embodiments of the present invention provide a server cluster monitoring method and device to overcome the defects of the prior art, in which a server cluster cannot be monitored continuously over long periods and an alarm cannot be raised quickly when the cluster fails, and thereby to achieve continuous monitoring of server clusters and second-level alarming of cluster faults.

An embodiment of the present invention provides a server cluster monitoring method, comprising:

creating a temporary directory node for the server cluster on a public service platform;

monitoring the temporary directory node and, when an abnormality of the temporary directory node is detected, locating the abnormal child node; and

querying a preset exception-handling plan according to the abnormal child node, and handling the abnormal child node according to the exception-handling plan.

An embodiment of the present invention provides a server cluster monitoring device, comprising:

a preprocessing module, configured to create a temporary directory node for the server cluster on a public service platform;

a monitoring module, configured to monitor the temporary directory node and, when an abnormality of the temporary directory node is detected, to locate the abnormal child node; and

an exception-handling module, configured to query a preset exception-handling plan according to the abnormal child node and to handle the abnormal child node according to the exception-handling plan.

The server cluster monitoring method and device provided by the embodiments of the present invention use a platform that can provide a public service to different clusters. Temporary directory nodes are created on the public service platform according to the cluster structure, and the server cluster is monitored by monitoring those temporary directory nodes. This overcomes the defects of prior-art server cluster monitoring, which cannot monitor continuously over long periods and cannot raise an alarm quickly when the cluster fails, and achieves continuous monitoring of server clusters and second-level alarming of cluster faults.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a technical flowchart of Embodiment 1 of the present application;

Fig. 2 is a technical flowchart of Embodiment 2 of the present application;

Fig. 3 is a technical flowchart of Embodiment 3 of the present application;

Fig. 4 is a schematic structural diagram of the device of Embodiment 1 of the present application;

Fig. 5 is a schematic structural diagram of the device of Embodiment 2 of the present application.

Detailed description of the invention

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Consider the following application scenario: when there are multiple server clusters, monitoring each cluster separately requires cumbersome operations and occupies considerable monitoring resources, yet multiple server clusters may rely on a single public service for data maintenance and cluster configuration management. The embodiments of the present application therefore use a public service platform, such as ZooKeeper, to monitor the server clusters.

ZooKeeper is an open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services. The following sections describe in detail, with reference to the drawings, how the embodiments of the present application implement server cluster monitoring based on ZooKeeper.

Fig. 1 is a technical flowchart of Embodiment 1 of the present application. With reference to Fig. 1, a server cluster monitoring method according to an embodiment of the present application can be implemented as follows:

Step S110: create a temporary directory node for the server cluster on a public service platform;

Step S120: monitor the temporary directory node and, when an abnormality of the temporary directory node is detected, locate the abnormal child node;

Step S130: query a preset exception-handling plan according to the abnormal child node, and handle the abnormal child node according to the exception-handling plan.

Specifically, in step S110, when the public service platform is ZooKeeper, the temporary directory node of the server cluster can be created under the GroupMembers directory. Temporary directory nodes are created in this directory first, and their number can correspond to the number of server clusters. For example, in an application scenario where three server clusters need to be monitored, three temporary directory nodes can be created under GroupMembers. In this step, to monitor each node server in the server cluster, all child nodes of the temporary directory node also need to be created according to the structure of the server cluster. Each child node represents one server of the server cluster, the number of child nodes can correspond to the number of servers in each server cluster, and each child node is assigned the IP address of its server as its value.

For example, in the GroupMembers directory, a temporary directory node Node1 is created for cluster 1, and child nodes such as Server1, Server2, ..., ServerN are created for that temporary directory node according to the connections of the servers in cluster 1.

In this step, after the child nodes of the temporary directory node are created, the IP address of the server corresponding to each child node is assigned to that child node, so that when the server cluster fails, the failed server can be located quickly by the IP address of the faulty node and a fault-handling plan can be started.

Thus, for temporary directory node Node1, the assignment result can be as follows:

Node1\Server1\192.x.y.1

Node1\Server2\192.x.y.2

Node1\Server3\192.x.y.3
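The assignment described above can be sketched as a plain mapping from node path to server IP address. This is only an in-memory illustration, not a ZooKeeper client call; the function name `build_structure`, the `/GroupMembers/...` path layout, and the `192.0.2.x` placeholder addresses are assumptions made for the example.

```python
from typing import Dict


def build_structure(clusters: Dict[str, Dict[str, str]],
                    root: str = "GroupMembers") -> Dict[str, str]:
    """Flatten {cluster node: {child node: server IP}} into {node path: IP}.

    Each child node stands for one server and carries that server's IP
    address as its value, mirroring the assignment result in the text.
    """
    paths: Dict[str, str] = {}
    for cluster_node, servers in clusters.items():
        for server_name, ip in servers.items():
            paths[f"/{root}/{cluster_node}/{server_name}"] = ip
    return paths


# Hypothetical cluster 1 with three servers (placeholder addresses).
clusters = {
    "Node1": {
        "Server1": "192.0.2.1",
        "Server2": "192.0.2.2",
        "Server3": "192.0.2.3",
    }
}
structure = build_structure(clusters)
for path, ip in sorted(structure.items()):
    print(path, "->", ip)
```

Keeping the IP address as the node's value means a later lookup by child-node name is enough to identify the affected machine.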

Specifically, in step S120, monitoring the temporary directory node can take the form of monitoring whether the number of child nodes of the temporary directory node changes.

ZooKeeper maintains a data structure similar to a file system. Each subdirectory entry in ZooKeeper, such as NameService or GroupMembers, is called a znode. As in a file system, znodes can be freely added and deleted, and child znodes can be added or deleted under a znode. In the embodiments of the present application, the GroupMembers subdirectory entry is selected, and a corresponding number of temporary directory nodes are created in it according to the number of monitored server clusters.

ZooKeeper has four types of znode: PERSISTENT (persistent directory node), PERSISTENT_SEQUENTIAL (persistent sequentially numbered directory node), EPHEMERAL (temporary directory node), and EPHEMERAL_SEQUENTIAL (temporary sequentially numbered directory node).

For a PERSISTENT node, the node registered by the client still exists after the connected device disconnects from ZooKeeper.

For a PERSISTENT_SEQUENTIAL node, the node still exists after the device disconnects from ZooKeeper; in addition, ZooKeeper sequentially numbers the node name.

For an EPHEMERAL_SEQUENTIAL node, the node is deleted after the device disconnects from ZooKeeper; in addition, ZooKeeper sequentially numbers the node name.

For an EPHEMERAL node, the node registered by the client is deleted once the client disconnects from ZooKeeper.

In the embodiments of the present application, the characteristics of the EPHEMERAL temporary directory node are exploited: every server creates a temporary directory node under the parent directory GroupMembers, and the child-node change messages of the parent directory node are then listened for. As soon as a server goes down, its connection to ZooKeeper is broken and the temporary directory node it created is deleted.

The embodiments of the present application can therefore use the characteristics of temporary directory nodes in ZooKeeper to monitor a server cluster. As soon as the number of child nodes under a temporary directory node changes, it is easy to determine whether a server in the cluster has gone down.
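The behaviour relied on here (a node tied to a client session disappears when that session ends, and listeners on the parent are told about the change) can be illustrated with a toy in-memory model. The class `EphemeralRegistry` and its method names are hypothetical and are not part of any ZooKeeper client library; the sketch only mimics the semantics described in the text.

```python
from typing import Callable, Dict, List, Set


class EphemeralRegistry:
    """Toy stand-in for one parent node that holds ephemeral child nodes."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # child name -> owning session id
        self._watchers: List[Callable[[Set[str]], None]] = []

    def watch_children(self, callback: Callable[[Set[str]], None]) -> None:
        """Register a listener that receives the child set after each change."""
        self._watchers.append(callback)

    def create_ephemeral(self, session_id: str, child: str) -> None:
        self._owner[child] = session_id
        self._notify()

    def close_session(self, session_id: str) -> None:
        """Simulate a lost connection: delete every node the session created."""
        removed = [c for c, s in self._owner.items() if s == session_id]
        for child in removed:
            del self._owner[child]
        if removed:
            self._notify()

    def children(self) -> Set[str]:
        return set(self._owner)

    def _notify(self) -> None:
        snapshot = self.children()
        for callback in self._watchers:
            callback(snapshot)


events: List[List[str]] = []
registry = EphemeralRegistry()
registry.watch_children(lambda kids: events.append(sorted(kids)))
registry.create_ephemeral("session-1", "Server1")
registry.create_ephemeral("session-2", "Server2")
registry.close_session("session-2")  # Server2 goes down
print(events[-1])
```

In this toy the listener stays registered across changes; with a real ZooKeeper deployment, standard watches are one-shot and have to be set again after they fire, so a production client would re-register inside the callback.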

Specifically, in this step, locating the abnormal child node when an abnormality of the temporary directory node is detected may further include the following implementation steps:

Step S121: obtain, in real time, the current structure of the temporary directory node at the current moment, and compare the current structure with the historical structure in real time;

Step S122: when an abnormal child node appears in the comparison result, read the IP address of the abnormal child node from the preset structure, thereby locating the abnormal child node.

In step S121, the historical structure is the structure of the temporary directory node at the moment immediately preceding the current moment. Note that in the embodiments of the present application the structure of the temporary directory node at every moment must be saved: the structure at the current moment becomes the historical structure at the next moment and is compared with the structure at that next moment, so that the cluster is monitored in real time and cluster faults are found promptly. The interval between moments is set according to user requirements: the current structure can be saved once per second, once per millisecond, or once every 30 seconds or every minute. The finer the interval, the more timely the monitoring. The embodiments of the present application do not limit the interval between moments; the above figures are for illustration only and do not limit the implementation of the present application. Suppose that at the current moment one server in the server cluster goes down; the child node corresponding to that server then disappears from ZooKeeper. At the moment before the current moment, the child node corresponding to that server still existed. Therefore, at the instant the server fails, the number of child nodes of the temporary directory node changes, but determining exactly which child node changed requires comparing the historical structure with the current structure.

For example, the historical structure of cluster 1 is as follows:

Node1\Server1\192.x.y.1

Node1\Server2\192.x.y.2

Node1\Server3\192.x.y.3

Node1\Server4\192.x.y.4

The current structure of cluster 1 is as follows:

Node1\Server1\192.x.y.1

Node1\Server2\192.x.y.2

Node1\Server4\192.x.y.4

Comparing the historical structure with the current structure shows that Server3, whose IP address is 192.x.y.3, has lost its connection to ZooKeeper and has very likely gone down.
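The comparison in this example amounts to a set difference between two snapshots. A minimal sketch follows, assuming each snapshot is a mapping from child-node name to IP address and that the IP of a vanished node is read from the earlier snapshot; the function name and the `192.0.2.x` placeholder addresses are illustrative only.

```python
from typing import Dict


def find_abnormal_children(history: Dict[str, str],
                           current: Dict[str, str]) -> Dict[str, str]:
    """Return child nodes present in the historical structure but missing
    from the current one, together with the IP recorded in the history."""
    return {name: ip for name, ip in history.items() if name not in current}


history = {
    "Server1": "192.0.2.1",
    "Server2": "192.0.2.2",
    "Server3": "192.0.2.3",
    "Server4": "192.0.2.4",
}
# Current structure: Server3 has disappeared.
current = {name: ip for name, ip in history.items() if name != "Server3"}

abnormal = find_abnormal_children(history, current)
print(abnormal)
```

Only disappearances are reported here; a newly appearing child node (a server joining the cluster) is not treated as abnormal in this sketch.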

Specifically, in step S130, once an abnormal child node is found, ZooKeeper can send an early-warning notification. Note that ZooKeeper's early-warning notifications can reach millisecond-level speed, so server cluster faults can be alarmed rapidly.

The exception-handling plan includes: starting the corresponding spare node according to the IP address of the abnormal child node; or notifying the corresponding server cluster administrator of the abnormality information of the abnormal child node.

Both handling plans are preset. For the first plan, different child-node IP addresses correspond to different spare nodes: different IP addresses may belong to different server clusters, and different server clusters have different performance capacities, so the standby servers that are enabled also differ.

For the second plan, each piece of abnormality information corresponds to the contact details of the responsible administrator; the abnormality information is sent to the administrator, who then handles the server fault. In one feasible implementation, the server administrator can be selected according to the parent node of the abnormal child node. Specifically, a preset emergency-handling table is queried; the table lists the correspondence between server clusters and administrator contact details. For example, server cluster 1 corresponds to administrator group 1 and server cluster 2 corresponds to administrator group 2; as a further example, the core servers in server cluster 1 correspond to administrator No. 1 of team 1, and the non-core servers correspond to the other administrators of team 1. The notification can take the form of abnormality-notice text edited in advance, which is pushed directly to the administrator's mobile device when a fault occurs, so that the administrator is alerted in time.
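The two preset plans can be pictured as two lookup tables, one keyed by the failed server's IP address and one keyed by the parent (cluster) node. The sketch below is hypothetical throughout: the table contents, the `Plan` type, and the choice to try the spare-node table first and fall back to notifying an administrator are assumptions, since the text presents the two plans only as alternatives.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Plan:
    action: str  # "start_spare" or "notify_admin"
    target: str  # spare server IP, or administrator contact


# Hypothetical preset tables (placeholder addresses and contacts).
SPARE_BY_IP: Dict[str, str] = {"192.0.2.3": "192.0.2.103"}
ADMIN_BY_CLUSTER: Dict[str, str] = {
    "Node1": "team-1@example.com",
    "Node2": "team-2@example.com",
}


def lookup_plan(cluster: str, failed_ip: str) -> Optional[Plan]:
    """Pick a preset plan for an abnormal child node.

    Assumed priority: use a spare node if one is configured for the IP,
    otherwise notify the administrator registered for the parent node.
    """
    spare = SPARE_BY_IP.get(failed_ip)
    if spare is not None:
        return Plan("start_spare", spare)
    admin = ADMIN_BY_CLUSTER.get(cluster)
    if admin is not None:
        return Plan("notify_admin", admin)
    return None


print(lookup_plan("Node1", "192.0.2.3"))
print(lookup_plan("Node2", "192.0.2.9"))
```

A finer-grained table, for example one that distinguishes core from non-core servers, would simply use a more specific key than the cluster name.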

In this embodiment, a platform that can provide a public service to different clusters is used. Temporary directory nodes are created on the public service platform according to the cluster structure, and the server cluster is monitored by monitoring those temporary directory nodes. This overcomes the defects of prior-art server cluster monitoring, which cannot monitor continuously over long periods and cannot raise an alarm quickly when the cluster fails, and achieves continuous monitoring of server clusters and second-level alarming of cluster faults.

Fig. 2 is a technical flowchart of Embodiment 2 of the present application. With reference to Fig. 2, a server cluster monitoring method according to an embodiment of the present application includes the following implementation steps:

Step S210: create a corresponding number of monitoring threads according to the number of server clusters to be monitored;

Step S220: use a listener to silently listen to each monitoring thread, thereby obtaining the abnormal status of each server cluster in real time.

Specifically, in step S210, each monitoring thread is used to monitor the temporary directory node of one server cluster and the child nodes of that temporary directory node. Each monitoring thread performs the implementation steps of Embodiment 1.

Specifically, in step S220, a listener is started for each monitoring thread to listen to it silently, so that multiple server clusters can be monitored simultaneously; each monitoring thread runs silently in the background and consumes few resources. The listener can be implemented with the listener pattern; because the listener pattern is mature prior art, it is not described further here.

In this embodiment, when multiple server clusters need to be monitored, a corresponding number of monitoring threads are started and each thread is listened to silently, so that multiple server clusters can be monitored continuously while consuming few network and system resources. In addition, in the embodiments of the present application, when multiple server clusters all rely on one public service for data maintenance and cluster configuration management, using that public service to monitor the faults of those clusters further saves deployment resources and simplifies the deployment process of the monitoring device.
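One way to picture steps S210 and S220 is one background thread per cluster, each reporting to a shared listener callback only when something changes. The sketch below is a simplified assumption, not the patented implementation: snapshots of child-node names are fed through hypothetical per-cluster queues, and a `None` sentinel stops each thread.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional, Set, Tuple

Listener = Callable[[str, Set[str]], None]


def monitor_cluster(name: str, snapshots: "queue.Queue", listener: Listener) -> None:
    """Consume child-node snapshots for one cluster and report to the
    listener whenever a child present in the previous snapshot is missing."""
    previous: Optional[Set[str]] = None
    while True:
        current = snapshots.get()
        if current is None:  # sentinel: stop monitoring
            return
        if previous is not None:
            missing = previous - current
            if missing:
                listener(name, missing)
        previous = current


alerts: List[Tuple[str, List[str]]] = []
alerts_lock = threading.Lock()


def on_abnormal(cluster: str, missing: Set[str]) -> None:
    """Listener callback: stays silent unless a monitoring thread calls it."""
    with alerts_lock:
        alerts.append((cluster, sorted(missing)))


# One monitoring thread per cluster to be monitored (step S210).
feeds: Dict[str, "queue.Queue"] = {"Node1": queue.Queue(), "Node2": queue.Queue()}
threads = [
    threading.Thread(target=monitor_cluster, args=(name, feed, on_abnormal), daemon=True)
    for name, feed in feeds.items()
]
for thread in threads:
    thread.start()

feeds["Node1"].put({"Server1", "Server2"})
feeds["Node1"].put({"Server1"})            # Server2 disappears in cluster 1
feeds["Node2"].put({"Server1"})
feeds["Node2"].put({"Server1"})            # no change in cluster 2
for feed in feeds.values():
    feed.put(None)
for thread in threads:
    thread.join()

print(sorted(alerts))
```

Because each thread blocks on its queue while nothing happens, idle clusters cost almost nothing, which matches the low resource use the embodiment aims for.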

Fig. 3 is a technical flowchart of Embodiment 3 of the present application. With reference to Fig. 3, in a server cluster monitoring method according to an embodiment of the present application, the monitoring process of each monitoring thread can include the following implementation steps:

Step S310: create a temporary directory node for the server cluster on a public service platform;

Step S320: according to the structure of the server cluster, create all child nodes of the temporary directory node, and assign each child node the IP address of its server as its value;

Step S330: obtain, in real time, the current structure of the temporary directory node at the current moment, and compare the current structure with the historical structure in real time;

Step S340: when an abnormal child node appears in the comparison result, read the IP address of the abnormal child node from the preset structure, thereby locating the abnormal child node;

Step S350: query a preset exception-handling plan according to the abnormal child node, and handle the abnormal child node according to the exception-handling plan.

In this embodiment, a temporary directory node is created on the public service platform according to the cluster structure, and the current structure of the temporary directory node is compared with its historical structure, so that an abnormal server in the server cluster is located quickly and cluster faults are alarmed at the second level.
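Steps S330 and S340 repeat at every sampling moment, with each current structure becoming the historical structure for the next comparison. A small stateful sketch of that rotation is given below; the class name `SnapshotMonitor` and the placeholder addresses are assumptions, and the IP of a vanished node is taken from the previous snapshot.

```python
from typing import Dict


class SnapshotMonitor:
    """Keeps the previous structure and compares each new one against it."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self._history = dict(initial)

    def observe(self, current: Dict[str, str]) -> Dict[str, str]:
        """Compare with the stored history, then make `current` the new history.

        Returns the child nodes that vanished since the last observation,
        mapped to the IP addresses recorded for them.
        """
        abnormal = {
            name: ip for name, ip in self._history.items() if name not in current
        }
        self._history = dict(current)
        return abnormal


monitor = SnapshotMonitor({"Server1": "192.0.2.1", "Server2": "192.0.2.2"})
first = monitor.observe({"Server1": "192.0.2.1"})   # Server2 vanished
second = monitor.observe({"Server1": "192.0.2.1"})  # nothing new
print(first, second)
```

Because the history is replaced on every call, a failure is reported once, at the first observation after it occurs, rather than on every later comparison.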

Fig. 4 is a schematic structural diagram of the device of Embodiment 1 of the present application. With reference to Fig. 4, a server cluster monitoring device according to an embodiment of the present application includes the following modules:

a preprocessing module 410, configured to create a temporary directory node for the server cluster on a public service platform;

a monitoring module 420, configured to monitor the temporary directory node and, when an abnormality of the temporary directory node is detected, to locate the abnormal child node;

an exception-handling module 430, configured to query a preset exception-handling plan according to the abnormal child node and to handle the abnormal child node according to the exception-handling plan.

The preprocessing module 410 is specifically configured to: according to the structure of the server cluster, create all child nodes of the temporary directory node, and assign each child node the IP address of its server as its value.

The monitoring module 420 is specifically configured to: monitor whether the number of child nodes of the temporary directory node changes.

The monitoring module 420 is specifically configured to: obtain, in real time, the current structure of the temporary directory node at the current moment, and compare the current structure with the historical structure in real time, where the historical structure is the structure of the temporary directory node at the moment immediately preceding the current moment; and, when an abnormal child node appears in the comparison result, read the IP address of the abnormal child node from the preset structure, thereby locating the abnormal child node.

The monitoring module 420 is further configured to: create a corresponding number of monitoring threads according to the number of server clusters to be monitored, where each monitoring thread is used to monitor the temporary directory node of one server cluster and the child nodes of that temporary directory node; and use a listener to silently listen to each monitoring thread, thereby obtaining the abnormal status of each server cluster in real time.

The device shown in Fig. 4 can perform the methods of the embodiments shown in Fig. 1 and Fig. 3; for the implementation principles and technical effects, refer to the embodiments shown in Fig. 1 and Fig. 3, which are not repeated here.

Fig. 5 is a schematic structural diagram of the device of Embodiment 2 of the present application. With reference to Fig. 5, a server cluster monitoring device according to an embodiment of the present application includes a monitoring module and a listener:

The monitoring module 510 is configured to: create a corresponding number of monitoring threads according to the number of server clusters to be monitored, where each monitoring thread is used to monitor the temporary directory node of one server cluster and the child nodes of that temporary directory node;

The device further includes a listener 520, configured to silently listen to each monitoring thread, thereby obtaining the abnormal status of each server cluster in real time.

The device shown in Fig. 5 can perform the methods of the embodiments shown in Fig. 2 and Fig. 3; for the implementation principles and technical effects, refer to the embodiments shown in Fig. 2 and Fig. 3, which are not repeated here.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A server cluster monitoring method, characterized by comprising:

2. The method according to claim 1, characterized in that creating the temporary directory node on the public service platform specifically comprises:

according to the structure of the server cluster, creating all child nodes of the temporary directory node, and assigning each child node the IP address of its server as its value.

3. The method according to claim 2, characterized in that monitoring the temporary directory node specifically comprises:

monitoring whether the number of child nodes of the temporary directory node changes.

4. The method according to claim 2, characterized in that locating the abnormal child node specifically comprises:

obtaining, in real time, the current structure of the temporary directory node at the current moment, and comparing the current structure with the historical structure in real time, wherein the historical structure is the structure of the temporary directory node at the moment immediately preceding the current moment;

when an abnormal child node appears in the comparison result, reading the IP address of the abnormal child node from the historical structure, thereby locating the abnormal child node.

5. The method according to claim 1, characterized in that the method further comprises:

creating a corresponding number of monitoring threads according to the number of server clusters to be monitored, wherein each monitoring thread is used to monitor the temporary directory node of one server cluster and the child nodes of that temporary directory node;

using a listener to silently listen to each monitoring thread, thereby obtaining the abnormal status of each server cluster in real time.

6. The method according to claim 1, characterized in that the exception-handling plan comprises:

starting the corresponding spare node according to the IP address of the abnormal child node; or,

notifying the corresponding server cluster administrator of the abnormality information of the abnormal child node.

7. A server cluster monitoring device, characterized by comprising the following modules:

8. The device according to claim 1, characterized in that the preprocessing module is specifically configured to:

9. The device according to claim 2, characterized in that the monitoring module is specifically configured to:

when an abnormal child node appears in the comparison result, read the IP address of the abnormal child node from the preset structure, thereby locating the abnormal child node.

11. The device according to claim 1, characterized in that the monitoring module is further configured to:

The device further includes a listener, configured to silently listen to each monitoring thread, thereby obtaining the abnormal status of each server cluster in real time.

12. The device according to claim 1, characterized in that the exception-handling plan comprises: