CN105978721A - Method, device and system for monitoring operation state of services in clustering system

Info

Publication number
CN105978721A
Authority
CN
China
Prior art keywords
server
service
message
monitoring
information
Legal status
Granted
Application number
CN201610311715.8A
Other languages
Chinese (zh)
Other versions
CN105978721B (en)
Inventor
孙振华
丁医
陈铭
罗水华
崔磊
Current Assignee
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN201610311715.8A
Publication of CN105978721A
Application granted
Publication of CN105978721B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for monitoring the operation state of services in a cluster system. In the method, a first server sends a first monitoring message to a second server, where the first monitoring message requests the operation state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally. The first server receives a first response message returned by the second server in reply to the first monitoring message, and the first response message carries operation state information of the first service on the second server. If this information shows that the operation state is abnormal, the first server sends recovery prompt information, which prompts recovery of the first service on the second server. The invention also discloses a device and a system for monitoring the operation state of services in a cluster system.

Description

Method, device and system for monitoring the running state of services in a cluster system
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device and system for monitoring the running state of services in a cluster system.
Background
In a cluster system, multiple servers are grouped together to run the same one or more services concurrently. Because the execution of a service is shared across multiple servers, the cluster system greatly increases the capacity available for carrying that service, so its performance can rival that of a mainframe at a much lower cost. Cluster systems are therefore widely used today.
In a cluster system, each server runs one or more services. If a service on some server runs abnormally, the normal operation of that service in the cluster system is also affected, and the cluster system can no longer guarantee that the service continues to be provided stably to external users. The cluster system therefore needs to monitor the running state of every service provided by every server. By monitoring the running state of the services, the cluster system can recover any service whose running state is abnormal on any server, and thus keep providing services stably to external users.
In the prior art, in addition to the servers that provide services externally, a cluster system also includes a monitoring device that monitors the running state of the services on those servers. The monitoring device is independent of the other servers and does not itself provide any external service. By monitoring the services running on every server in the cluster system, the independent monitoring device can detect an abnormality on any server and recover it, so that the cluster system continues to provide services stably. However, the monitoring device is a separate hardware device deployed in the cluster system, and deploying an additional hardware device requires the cluster system to consume extra software and hardware resources. In other words, the cluster system consumes extra software and hardware resources just to monitor the running state of its services.
Summary of the invention
The technical problem to be solved by this application is to provide a method, device and system for monitoring the running state of services in a cluster system, so that the running state of the services can be monitored continuously and stably without the cluster system consuming extra software and hardware resources for that monitoring. This improves the stability and reliability of monitoring and also saves system resources.
In a first aspect, a method for monitoring the running state of services in a cluster system is provided. The method includes:
a first server sends a first monitoring message to a second server, where the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
the first server receives a first response message returned by the second server for the first monitoring message, where the first response message carries running state information of the first service on the second server;
if the running state information of the first service on the second server indicates that the running state is abnormal, the first server sends recovery prompt information, where the recovery prompt information is used to prompt recovery of the first service on the second server.
Optionally, the recovery prompt information is an operation instruction to be sent to the second server. The operation instruction is used to trigger the second server to perform exception handling for the first service, and the exception handling restores the first service to a normal running state on the second server.
Optionally, the exception handling is restarting the first service on the second server, or the second server reloading data from the database of the cluster system into memory.
Optionally, the recovery prompt information is an SMS alarm notification to be sent to an SMS platform. The SMS alarm notification is used to trigger the SMS platform to send an alarm message to a pre-designated user, and the alarm message indicates that the first service on the second server is in an abnormal running state.
Optionally, the first server sending the first monitoring message to the second server specifically comprises:
the first server sends the first monitoring message to the second server by polling at a preset monitoring interval.
Optionally, a first program and a second program are installed on the first server, and both programs can reuse the first server's own communication function;
the first server sending the first monitoring message to the second server specifically comprises: the first server sends the first monitoring message to the second server through the first program;
the first server receiving the first response message returned by the second server for the first monitoring message specifically comprises: the first server receives, through the second program, the first response message returned by the second server for the first monitoring message.
Optionally, the method further includes:
the first server receives a second monitoring message sent by the second server, where the second monitoring message is used to request the running state of a second service on the first server;
the first server generates a second response message based on the running state information of the second service on the first server, and returns the second response message to the second server in response to the second monitoring message.
Optionally, a first data table is stored on the first server, and the first data table records the current running state information of each service on the first server;
the first server generating the second response message based on the running state information of the second service on the first server specifically comprises: the first server generates the second response message based on the running state information currently recorded in the first data table.
Optionally, if the running state information of the first service indicates that the running state is abnormal, the first server sending the recovery prompt information includes:
if the running state information of the first service on the second server indicates that the running state is abnormal, the first server records exception service information in a second data table in the database of the cluster system. The exception service information includes a monitored node identifier, a service type identifier and a monitoring node identifier: the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
the first server queries the second data table and, when the exception service information has been successfully recorded in the second data table, retrieves the exception service information;
the first server sends the recovery prompt information as indicated by the exception service information.
Optionally, after the first server sends the recovery prompt information, the method further includes:
after the first service has returned to a normal running state on the second server, the first server deletes the exception service information from the second data table.
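The record, query, prompt and delete sequence above can be sketched with an SQLite table standing in for the second data table of the cluster database. This is a minimal sketch under assumptions: the table and column names (`abnormal_service`, `monitored_node`, `service_type`, `monitor_node`), the primary key on (monitored node, service type) and the `send_prompt` callback are hypothetical and not taken from the patent. With that key, a second monitor reporting the same fault fails to record it and therefore sends no prompt; that deduplication is a choice of the sketch, not something the text states.

```python
import sqlite3
from typing import Callable

# Hypothetical schema for the "second data table"; names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS abnormal_service (
    monitored_node TEXT NOT NULL,  -- identifier of the second (monitored) server
    service_type   TEXT NOT NULL,  -- identifier of the abnormal first service
    monitor_node   TEXT NOT NULL,  -- identifier of the first (monitoring) server
    PRIMARY KEY (monitored_node, service_type)
)
"""


def record_and_prompt(conn: sqlite3.Connection, monitored: str, service: str,
                      monitor: str, send_prompt: Callable[[str, str, str], None]) -> bool:
    """Record the exception, read it back, and prompt according to the stored row."""
    try:
        with conn:  # commits on success; rolls back and re-raises on error
            conn.execute("INSERT INTO abnormal_service VALUES (?, ?, ?)",
                         (monitored, service, monitor))
    except sqlite3.IntegrityError:
        return False  # not recorded (already present), so no query and no prompt
    row = conn.execute(
        "SELECT monitored_node, service_type, monitor_node FROM abnormal_service "
        "WHERE monitored_node = ? AND service_type = ?",
        (monitored, service)).fetchone()
    if row is None:
        return False
    send_prompt(*row)  # the prompt follows what the stored record indicates
    return True


def clear_record(conn: sqlite3.Connection, monitored: str, service: str) -> None:
    """Delete the exception record once the service runs normally again."""
    with conn:
        conn.execute("DELETE FROM abnormal_service "
                     "WHERE monitored_node = ? AND service_type = ?",
                     (monitored, service))
```

In a real cluster the table would live in the shared database rather than in `sqlite3.connect(":memory:")`, so every monitor sees the same records.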
In a second aspect, a device for monitoring the running state of services in a cluster system is provided. The device is configured on a first server and includes:
a first sending unit, configured to send a first monitoring message to a second server, where the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, where the first response message carries running state information of the first service on the second server;
a second sending unit, configured to send recovery prompt information if the running state information of the first service on the second server indicates that the running state is abnormal, where the recovery prompt information is used to prompt recovery of the first service on the second server.
Optionally, the device further includes:
a second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message is used to request the running state of a second service on the first server;
a generating unit, configured to generate a second response message based on the running state information of the second service on the first server;
a returning unit, configured to return the second response message to the second server in response to the second monitoring message.
In a third aspect, a system for monitoring the running state of services in a cluster system is provided. The system includes a first server and a second server, and the first server is configured with the device of any of the foregoing implementations.
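The third aspect, in which serving servers carry the device and can monitor one another, can be illustrated with two in-memory nodes. This is a minimal sketch under assumptions: direct method calls replace network messages, a plain dict replaces the first data table, and `"NORMAL"` is a hypothetical state string. `MonitorNode` and its methods are illustrative names; the comments map them to the units of the device described above.

```python
from typing import Dict, List, Tuple


class MonitorNode:
    """A serving server that also carries the monitoring device (hypothetical)."""

    def __init__(self, node_id: str, services: Dict[str, str]) -> None:
        self.node_id = node_id
        self.table: Dict[str, str] = dict(services)  # stands in for the first data table
        self.prompts: List[Tuple[str, str]] = []      # recovery prompts sent by this node

    def handle_monitor_request(self, requester_id: str) -> Dict[str, str]:
        # Second receiving unit, generating unit and returning unit:
        # answer from the current table. requester_id is unused in this sketch.
        return dict(self.table)

    def monitor(self, peer: "MonitorNode") -> None:
        # First sending unit and first receiving unit: one request/response round trip.
        states = peer.handle_monitor_request(self.node_id)
        # Second sending unit: prompt recovery for every abnormal service.
        for service, state in sorted(states.items()):
            if state != "NORMAL":
                self.prompts.append((peer.node_id, service))
```

Because each node both answers and issues monitoring requests, losing one node does not stop the remaining nodes from monitoring each other, which is the reliability argument made for this design over a single dedicated monitoring device.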
According to the embodiments provided in this application, among the multiple servers in a cluster system that provide services externally, one server can monitor the running state of services on another server. Specifically, assume that the first server and the second server are any two servers in the cluster system that provide services externally. The first server sends a first monitoring message to the second server, causing the second server to return a first response message carrying the running state information of the first service on the second server. From this information the first server determines whether the first service is in an abnormal running state on the second server, and if it is, the first server sends recovery prompt information to prompt recovery of the first service on the second server. Because a server that provides services externally monitors the running state of services on another server, the cluster system neither needs to deploy a separate hardware device for monitoring nor consumes extra software and hardware resources for such a device, which saves the resources occupied by the cluster system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some of the embodiments described in this application, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic diagram of the network system framework involved in an application scenario of an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for monitoring the running state of services in a cluster system in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an example network scenario in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for monitoring the running state of services in a cluster system in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a system for monitoring the running state of services in a cluster system in an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Through research, the inventors found that in the prior art, because the cluster system deploys a separate hardware device as a monitoring device to monitor the running state of the services on each server that provides services externally, the cluster system must consume extra software and hardware resources to monitor the running state of its services. Moreover, if that monitoring device itself fails, the running state of the services on the servers in the cluster system can no longer be monitored, and services in an abnormal running state cannot be restored to normal. An independent monitoring device therefore makes it difficult to monitor the running state of services in the cluster system continuously and stably.
To solve these problems of the prior art, in the embodiments of the present invention, among the multiple servers in a cluster system that provide services externally, one server can be used to monitor the running state of services on another server. Because a server that provides services externally monitors the running state of services on other servers, the cluster system neither needs to deploy a separate hardware device for monitoring nor consumes extra software and hardware resources for deploying one, which saves the resources occupied by the cluster system. Furthermore, each of the servers in the cluster system that provide services externally can monitor the running state of services on the other servers. Because these servers monitor one another's service running states, even if one or more of them fail, the other servers in the cluster system can continue to monitor the service running state of all servers, so that services in an abnormal running state can be restored to normal and the running state of services can be monitored continuously and stably.
For example, one scenario of the embodiments of the present invention may be applied to the network system shown in Fig. 1. In this network system, server 101 and server 102 are any two servers in cluster system 103 that provide services externally. First, server 101 may send a monitoring message to server 102, where the monitoring message is used to request the running state of a service on server 102. Then, server 101 receives the response message returned by server 102 for the monitoring message, where the response message carries the running state information of that service on server 102. If the running state information of the service on server 102 indicates that the running state is abnormal, server 101 may send recovery prompt information, which may be used to prompt recovery of the service on server 102.
It can be understood that the above scenario is only an example provided by the embodiments of the present invention, and the embodiments of the present invention are not limited to this scenario.
Specific implementations of ... in the embodiments of the present invention are described in detail below through embodiments with reference to the accompanying drawings.
Fig. 2 shows a schematic flowchart of a method for monitoring the running state of services in a cluster system in an embodiment of the present invention. In this embodiment, the method may specifically include the following steps:
201. A first server sends a first monitoring message to a second server, where the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally.
In a specific implementation, when the first server needs to monitor the service running state of the second server, it may send the second server a first monitoring message requesting the running state of the first service on the second server, so that the second server returns the running state information of the first service.
The first monitoring message may include, for example, the function identifier for collecting monitoring information and the identifier of the first server. In one example, the first monitoring message may use an XML format. The following is a sample monitoring message: "000XXX<Monitor><Command>InformationCollection</Command><Params><MonitorServerID>CathayServer11</MonitorServerID></Params></Monitor>". Here the "Command" field carries the identifier of the monitoring-information collection function, namely "InformationCollection", and the "MonitorServerID" field carries the identifier of the first server, namely "CathayServer11".
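Building and reading the sample monitoring message can be sketched with Python's standard `xml.etree.ElementTree`. The sketch produces only the XML body: the leading `000XXX` of the sample is left out because the text does not say what it encodes. `build_monitor_message` and `parse_monitor_message` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_monitor_message(monitor_server_id: str) -> str:
    """Serialize the XML body of a monitoring request (prefix omitted)."""
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "InformationCollection"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    return ET.tostring(root, encoding="unicode")


def parse_monitor_message(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (command, identifier of the requesting server)."""
    root = ET.fromstring(body)
    return root.findtext("Command"), root.findtext("Params/MonitorServerID")
```

Using an XML library rather than string concatenation means identifiers containing characters such as `&` or `<` are escaped correctly.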
It can be understood that the first server is a device in the cluster system that has a monitoring function and is used to monitor the running state of services on other servers in the cluster system. In this embodiment, the first server denotes any server in the cluster system that provides services externally. Furthermore, every server in the cluster system that provides services externally can be used to monitor the servers other than itself; that is, every such server can be the first server described in this embodiment.
It should be noted that the second server is the device in the cluster system that is monitored by the first server. In this embodiment, the second server may be any server in the cluster system, other than the first server, that provides services externally. Furthermore, the service running state of every such server can be monitored by the first server; that is, every server in the cluster system other than the first server that provides services externally can be the second server described in this embodiment.
In addition, the first service may be any service running on the second server. Furthermore, the running state of any service running on the second server can be monitored by the first server; that is, each service running on the second server can be the first service described in this embodiment.
In some implementations of this embodiment, the first server may, for example, send the first monitoring message to all servers in the cluster system other than itself. Each server that receives the first monitoring message acts as a second server: it obtains the health information of all tasks on itself and feeds it back to the first server. In this implementation, by sending the monitoring message once, the first server can obtain the running state of all services on all other servers in the cluster system.
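The broadcast variant can be sketched as a loop over the cluster membership that skips the sender. Assumptions: the membership list, the `request` callable (standing in for one send/receive round trip) and the dict of service states it returns are hypothetical, and recording an unreachable peer as `None` so the caller can treat it as a failure is a choice of the sketch rather than of the source.

```python
from typing import Callable, Dict, List, Optional

ServiceStates = Dict[str, str]  # service identifier -> state string (hypothetical)


def broadcast_monitor(self_id: str, cluster: List[str],
                      request: Callable[[str], ServiceStates]
                      ) -> Dict[str, Optional[ServiceStates]]:
    """Send one monitoring request to every other server and collect the replies."""
    results: Dict[str, Optional[ServiceStates]] = {}
    for server_id in cluster:
        if server_id == self_id:
            continue  # a server does not monitor itself
        try:
            results[server_id] = request(server_id)
        except OSError:  # includes ConnectionError and socket timeouts
            results[server_id] = None  # unreachable: no state information received
    return results
```

The requests here are sequential for clarity; an asynchronous framework would issue them concurrently so that one slow server does not delay the others.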
In some implementations of this embodiment, to keep monitoring stable and continuous, the first server may send monitoring messages periodically so as to obtain the current service running state of the second server at regular intervals. Specifically, in this embodiment, step 201 may be: the first server sends the first monitoring message to the second server by polling at a preset monitoring interval.
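Polling at a preset monitoring interval can be sketched as a loop that sends to every monitored peer once per round and then waits. Assumptions: the `send_monitor` callback and the fixed delay between rounds are hypothetical; the optional `rounds` limit and the injectable `sleep` exist only to make the sketch testable.

```python
import time
from typing import Callable, Iterable, Optional


def poll_peers(peers: Iterable[str], send_monitor: Callable[[str], None],
               interval_s: float, rounds: Optional[int] = None,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Send a monitoring message to each peer every interval_s seconds.

    Runs forever when rounds is None; returns the number of completed rounds.
    """
    peer_list = list(peers)
    done = 0
    while rounds is None or done < rounds:
        for peer in peer_list:
            send_monitor(peer)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)  # fixed delay between rounds, not a fixed rate
    return done
```

This uses a fixed delay, so a slow round stretches the effective period; a scheduler with a fixed rate would keep the interval constant instead.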
In some implementations of this embodiment, the monitoring-message sending function can be deployed on the first server by reusing its existing communication function: a first program that can reuse the first server's own existing communication function may be installed on the first server, so that the first server can send monitoring messages through the first program. Specifically, step 201 may be: the first server sends the first monitoring message to the second server through the first program, where the first program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the first program may send the first monitoring message by directly reusing the MINA framework on the first server.
202. The first server receives a first response message returned by the second server for the first monitoring message, where the first response message carries running state information of the first service on the second server.
In a specific implementation, in response to receiving the first monitoring message sent by the first server, the second server may generate a first response message based on the running state information of the first service on the second server and return it to the first server, so that the first server obtains the running state information of the first service on the second server by receiving the first response message.
More specifically, in some implementations of this embodiment, the second server may store a data table that records, for example through real-time updates, the current running state information of each service on the second server. In response to receiving the first monitoring message, the second server may generate the first response message based on the running state information currently recorded in this data table. The first response message may, for example, carry all the running state information currently recorded in the data table, which includes the running state information of the first service.
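The second server's data table and the response generated from it can be sketched as a lock-protected dict plus an XML serializer. The patent does not give a response message format, so the `<MonitorResponse>` layout, the attribute names and the state strings below are hypothetical. The lock matters because the table is updated in real time while monitoring requests may be answered concurrently.

```python
import threading
import xml.etree.ElementTree as ET
from typing import Dict


class ServiceStateTable:
    """In-memory stand-in for the per-server table of current service states."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[str, str] = {}

    def update(self, service: str, state: str) -> None:
        """Called whenever a local service changes state (real-time update)."""
        with self._lock:
            self._states[service] = state

    def snapshot(self) -> Dict[str, str]:
        """Return a consistent copy of all currently recorded states."""
        with self._lock:
            return dict(self._states)


def build_response(server_id: str, table: ServiceStateTable) -> str:
    """Serialize every recorded state into a hypothetical response message."""
    root = ET.Element("MonitorResponse")
    ET.SubElement(root, "ServerID").text = server_id
    services = ET.SubElement(root, "Services")
    for name, state in sorted(table.snapshot().items()):
        ET.SubElement(services, "Service", {"type": name, "state": state})
    return ET.tostring(root, encoding="unicode")
```

Answering from a snapshot keeps the reply fast: the monitoring request never has to probe the services directly.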
In some implementations of this embodiment, the response-message receiving function can likewise be deployed on the first server by reusing its existing communication function: a second program that can reuse the first server's own existing communication function may be installed on the first server, with a listening interface set up for it, so that the first server can receive response messages through that listening interface. Specifically, step 202 may be: the first server receives, through the second program, the first response message returned by the second server for the first monitoring message, where the second program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the second program may receive the first response message by directly reusing the MINA framework on the first server.
It can be understood that the first program used to send monitoring messages and the second program used to receive response messages may be the same application program or two different application programs.
In some implementations of this embodiment, in addition to the first server monitoring the service running state of the second server, the second server may also monitor the service running state of the first server; that is, different servers in the cluster system may monitor one another's service running states. Specifically, this embodiment may further include: the first server receives a second monitoring message sent by the second server, where the second monitoring message is used to request the running state of a second service on the first server; the first server generates a second response message based on the running state information of the second service on the first server, and returns the second response message to the second server for the second monitoring message. More specifically, the first server may store a first data table that records the current running state information of each service on the first server, and the first server may generate the second response message based on the running state information currently recorded in the first data table.
203. If the running state information of the first service on the second server indicates that the running state is abnormal, the first server sends recovery prompt information, where the recovery prompt information is used to prompt recovery of the first service on the second server.
In a specific implementation, the first server may identify the running state information of the first service on the second server and decide, based on the result, whether to send recovery prompt information. If the running state information indicates that the running state is abnormal, the first server sends recovery prompt information. If the running state information indicates that the running state is normal, the first server does not need to send recovery prompt information.
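The decision in step 203 reduces to filtering the reported states and prompting only for abnormal services. This is a minimal sketch, assuming states are reported as strings with `"NORMAL"` marking a healthy service and `send_prompt` standing in for whichever recovery prompt is used (operation instruction or SMS alarm). All names are hypothetical.

```python
from typing import Callable, Dict, List


def find_abnormal(states: Dict[str, str], normal: str = "NORMAL") -> List[str]:
    """Return the services whose reported state is not the normal state."""
    return sorted(name for name, state in states.items() if state != normal)


def check_and_prompt(server_id: str, states: Dict[str, str],
                     send_prompt: Callable[[str, str], None]) -> List[str]:
    """Send one recovery prompt per abnormal service; send nothing otherwise."""
    abnormal = find_abnormal(states)
    for service in abnormal:
        send_prompt(server_id, service)
    return abnormal
```

Treating any state other than the normal one as abnormal errs on the side of prompting, so an unknown or missing status value is surfaced rather than silently ignored.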
In this embodiment, the recovery prompt information sent by the first server is used to prompt recovery of the abnormally running first service on the second server.
In some implementations of this embodiment, the first server may deliver the recovery prompt information by sending an operation instruction to the second server. In this case, the recovery prompt information is an operation instruction to be sent to the second server; the operation instruction triggers the second server to perform exception handling for the first service, and the exception handling restores the first service to a normal running state on the second server.
It is understood that the different abnormal operating conditions occurred for different services, can use not Same described abnormality processing operation.
For example, if the abnormal running state of the first service can be eliminated by restarting the service, the exception handling operation may be restarting the first service on the second server. The operation instruction may then be a restart command instructing the second server to restart the first service, and the recovery prompt information may be a restart command message carrying the restart command. The restart command message may include, for example, the function identifier of the restart command and the identifier of the first service. In one example, the restart command message uses XML format. The following is a sample restart command message: "000XXX<Monitor><Command>Restart_Service</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><ServiceType>CacheManager</ServiceType></Params></Monitor>". Here, the "Command" field carries the function identifier of the restart command, i.e. "Restart_Service", and the "ServiceType" field carries the identifier of the first service, i.e. "CacheManager".
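The XML body of the sample restart command message can be produced with Python's standard `xml.etree.ElementTree`. The source elides the leading "000XXX" header, so this sketch builds only the XML part and leaves the header to the caller. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_restart_command(monitor_server_id: str, service_type: str) -> str:
    """Build the XML body of a restart command message (header not included)."""
    root = ET.Element("Monitor")
    # Function identifier of the restart command.
    ET.SubElement(root, "Command").text = "Restart_Service"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    # Identifier of the first service to be restarted.
    ET.SubElement(params, "ServiceType").text = service_type
    return ET.tostring(root, encoding="unicode")
```

With `encoding="unicode"`, `ET.tostring` returns a `str` without an XML declaration. For the sample values "CathayServer11" and "CacheManager", it reproduces the body of the sample message above.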
As another example, if the abnormal running state of the first service is incorrect cache information, the exception handling operation may be the second server reloading the data of the first service from the database of the cluster system into memory. The operation instruction may then be a cache refresh command instructing the second server to reload the data of the first service from the database into memory, and the recovery prompt information may be a cache refresh command message carrying the cache refresh command. The cache refresh command message may include, for example, the function identifier of the cache refresh command and the identifier of the first service. In one example, the cache refresh command message uses XML format. The following is a sample cache refresh command message: "000XXX<Tulip><Command>Refresh</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><CacheType>CTGConnectionPoolConfCache</CacheType></Params></Tulip>". Here, the "Command" field carries the function identifier of the cache refresh command, i.e. "Refresh", and the "CacheType" field carries the identifier of the first service, i.e. "CTGConnectionPoolConfCache".
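On the second server, handling these messages amounts to reading the "Command" field and dispatching to the matching exception handling operation. The sketch below shows one way to do this. It assumes the "000XXX" header has already been stripped. The handler table is hypothetical: real handlers would restart the service or reload the cache from the database, and these only return a description.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical handlers on the second server, keyed by the <Command> value.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "Restart_Service": lambda service: f"restarted {service}",
    "Refresh": lambda cache: f"reloaded {cache} from database",
}


def dispatch_command(xml_body: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Parse a command message body (header already stripped) and run its handler."""
    root = ET.fromstring(xml_body)
    command = root.findtext("Command")
    params = root.find("Params")
    if params is None:
        raise ValueError("command message has no <Params> element")
    # Restart messages identify the service in <ServiceType>;
    # cache refresh messages identify it in <CacheType>.
    target = params.findtext("ServiceType") or params.findtext("CacheType")
    handler = handlers.get(command)
    if handler is None or target is None:
        raise ValueError(f"unsupported or incomplete command: {command}")
    return handler(target)
```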
In other implementations of this embodiment, the first server may send the recovery prompt information by triggering an SMS platform to send an alarm message to a designated user. In this case, the recovery prompt information is specifically a short message alarm notification sent to the SMS platform. The notification triggers the SMS platform to send an alarm message to a pre-designated user, and the alarm message indicates that the first service on the second server is in an abnormal running state, so that the user can eliminate the abnormal state.
In addition, if the first server sends the first monitoring message to the second server but does not receive the first response message returned for it, the first server may also send a short message alarm notification to the SMS platform. This triggers the SMS platform to send an SMS alarm to the pre-designated user, indicating that the second server is abnormal.
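The two alarm cases above (no response at all, or a response reporting an abnormal state) can be sketched together. Both the transport and the SMS platform are passed in as callables because the patent does not specify either interface. `send_and_wait` is assumed to return `None` on timeout, and all names and message texts are hypothetical.

```python
from typing import Callable, Optional


def poll_peer(send_and_wait: Callable[[dict], Optional[dict]],
              notify_sms: Callable[[str], None],
              peer_id: str, service: str) -> Optional[dict]:
    """Send one monitoring message and raise SMS alarms on timeout or abnormal state."""
    response = send_and_wait({"target": peer_id, "service": service})
    if response is None:
        # No response message arrived: the peer server itself may be abnormal.
        notify_sms(f"No response from {peer_id}; server may be abnormal")
        return None
    if response.get("state") == "ABNORMAL":
        notify_sms(f"{service} on {peer_id} is in an abnormal running state")
    return response
```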
It should be noted that in some implementations of this embodiment, each of the multiple servers in the cluster system that provide services externally may monitor the service running states on the other servers. In this case, the same abnormal running state of the same service on the same server may be detected by several other servers. To avoid multiple servers all handling the same abnormal running state, a data table may be stored in the database of the cluster system to record the abnormal running states detected by the servers. Servers then handle abnormal running states by saving exception service information into this table and querying it. Specifically, step 203 may include the following. If the running state information of the first service on the second server indicates an abnormal running state, the first server records exception service information in a second data table of the database of the cluster system. The first server then queries the second data table and finds the exception service information if it was successfully recorded there. Finally, the first server sends the recovery prompt information according to the indication of the exception service information. The exception service information includes a monitored node identifier, a service type identifier and a monitoring node identifier. The monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server. The exception service information may also include the name of the abnormal service, the name of the refresh service, the time at which it was recorded in the second data table, and so on. In the second data table, the primary key of the exception service information may consist of its monitored node identifier and service type identifier.
It can be understood that the monitoring node identifier in the exception service information indicates the server responsible for handling that exception service information. When the first server finds exception service information in the second data table, it can identify the monitoring node identifier in it. If the monitoring node identifier is the identifier of the first server itself, the first server handles the exception service information. If it is not, the first server does not handle it.
Furthermore, after step 203 has been executed, once the first service has returned to a normal running state on the second server, the first server may delete the exception service information from the second data table.
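The record, query and delete cycle above can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database stands in for the cluster database, and the table and column names are hypothetical. The composite primary key (monitored node, service type) means only the first monitor to record a given exception succeeds. Here this is done with SQLite's `INSERT OR IGNORE`; other databases would typically catch a unique-constraint error instead.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the cluster database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE exception_service ("
        " monitored_node TEXT NOT NULL,"
        " service_type TEXT NOT NULL,"
        " monitor_node TEXT NOT NULL,"
        " recorded_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (monitored_node, service_type))"
    )
    return db


def claim_exception(db: sqlite3.Connection, monitored: str, service: str, me: str) -> bool:
    """Record an exception; return True only if this node is the recorded monitor."""
    # If another monitor already recorded this (monitored node, service type)
    # pair, the primary key turns this insert into a no-op.
    db.execute(
        "INSERT OR IGNORE INTO exception_service (monitored_node, service_type, monitor_node)"
        " VALUES (?, ?, ?)", (monitored, service, me))
    db.commit()
    row = db.execute(
        "SELECT monitor_node FROM exception_service"
        " WHERE monitored_node = ? AND service_type = ?", (monitored, service)).fetchone()
    return row is not None and row[0] == me


def clear_exception(db: sqlite3.Connection, monitored: str, service: str) -> None:
    """Delete the record once the service is back in a normal running state."""
    db.execute(
        "DELETE FROM exception_service WHERE monitored_node = ? AND service_type = ?",
        (monitored, service))
    db.commit()
```

For example, if "node1" and "node3" both detect that "CacheManager" on "node2" is abnormal, only the first caller of `claim_exception` gets `True` and sends the recovery prompt. After `clear_exception`, a later exception can be claimed again.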
It should be noted that this embodiment may be applied, for example, in the network scenario shown in Fig. 3. In this scenario, the cluster system includes multiple servers that provide services externally, namely "Node 1", "Node 2", "Node 3" and "Node 4" shown in Fig. 3. Each of these servers can act as the first server in this embodiment to monitor the service running states on the other servers. Each can also act as the second server in this embodiment, whose service running states are monitored by the other servers. "DB" in Fig. 3 is the database described above in this embodiment, and the "SMS alarm platform" in Fig. 3 is the SMS platform described above.
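The mutual-monitoring topology of Fig. 3 can be sketched as a full mesh in which every node monitors every other node. This illustration assumes the node names from Fig. 3. It shows why the failure of one monitor does not leave any other node unmonitored.

```python
from typing import FrozenSet, List, Tuple


def monitoring_pairs(nodes: List[str]) -> List[Tuple[str, str]]:
    """(monitor, monitored) pairs: every node monitors every other node."""
    return [(m, t) for m in nodes for t in nodes if m != t]


def monitors_of(target: str, nodes: List[str],
                down: FrozenSet[str] = frozenset()) -> List[str]:
    """Nodes still able to monitor `target` when the nodes in `down` have failed."""
    return [m for m, t in monitoring_pairs(nodes) if t == target and m not in down]
```

With the four nodes of Fig. 3 there are 12 monitoring pairs. If "node1" fails, "node4" is still monitored by "node2" and "node3".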
With the implementations provided by this embodiment, among the multiple servers in the cluster system that provide services externally, one server can monitor the running states of services on another server. Because service running states are monitored by servers that themselves provide services externally, the cluster system does not need a separate hardware device to monitor service running states. Nor does it consume extra hardware and software resources on newly deployed equipment, which saves the resources occupied by the cluster system. In addition, each of the servers that provide services externally can monitor the running states of services on the other servers. Because these servers monitor each other, even if one or some of them become abnormal, the other servers in the cluster system can continue to monitor the service running states of all servers. Services in an abnormal running state can therefore be restored to normal, and service running states can be monitored continuously and stably.
Referring to Fig. 4, a schematic structural diagram of a device for monitoring the running state of services in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the device 400 may be configured on the first server. The device 400 may specifically include:
a first sending unit 401, configured to send a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, where the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit 402, configured to receive a first response message returned by the second server for the first monitoring message, the first response message carrying the running state information of the first service on the second server;
a second sending unit 403, configured to send recovery prompt information from the first server if the running state information of the first service on the second server indicates an abnormal running state, the recovery prompt information being used to prompt recovery of the first service on the second server.
Optionally, in some implementations of this embodiment, the recovery prompt information may be, for example, an operation instruction sent to the second server. The operation instruction may be used to trigger the second server to perform an exception handling operation for the first service, and the exception handling operation may be used to restore the first service to a normal running state on the second server.
Optionally, in other implementations of this embodiment, the exception handling operation may be, for example, restarting the first service on the second server, or the second server updating data from the database of the cluster system into memory.
Optionally, in yet other implementations of this embodiment, the recovery prompt information may be, for example, a short message alarm notification sent to an SMS platform. The notification may be used to trigger the SMS platform to send an alarm message to a pre-designated user, and the alarm message may be used to indicate that the first service on the second server is in an abnormal running state.
Optionally, in still other implementations of this embodiment, the first sending unit may be specifically configured to send the first monitoring message to the second server by polling, according to a preset monitoring period.
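Polling according to a preset monitoring period can be sketched as a loop that visits each peer in turn and then waits one period. This sketch assumes that the period separates complete polling rounds rather than individual messages; the patent does not say which. The `rounds` bound exists only so the example terminates, and all names are hypothetical.

```python
import time
from typing import Callable, List


def poll_peers(peers: List[str], check: Callable[[str], None],
               period_s: float, rounds: int) -> None:
    """Send a monitoring message to each peer in turn, once per preset period.

    `rounds` bounds the loop for demonstration; a real monitor would run until stopped.
    """
    for _ in range(rounds):
        for peer in peers:
            check(peer)  # e.g. send the first monitoring message to this peer
        time.sleep(period_s)
```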
Optionally, in further implementations of this embodiment, a first program and a second program may be installed on the first server, and both programs may reuse the communication function of the first server itself;
the first sending unit 401 may be specifically configured to send the first monitoring message to the second server through the first program;
the first receiving unit 402 may be specifically configured to receive, through the second program, the first response message returned by the second server for the first monitoring message.
Optionally, in further implementations of this embodiment, the device 400 may further include, for example:
a second receiving unit, configured to receive a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server;
a generating unit, configured to generate a second response message based on the running state information of the second service on the first server;
a returning unit, configured to return the second response message for the second monitoring message to the second server.
Optionally, in further implementations of this embodiment, the first server may, for example, store a first data table, which may be used to record the current running state information of each service on the first server;
the generating unit may be specifically configured to generate the second response message based on the running state information currently recorded in the first data table.
Optionally, in further implementations of this embodiment, the second sending unit 403 may, for example, be specifically configured to:
if the running state information of the first service on the second server indicates an abnormal running state, record exception service information in a second data table of the database of the cluster system, where the exception service information includes a monitored node identifier, a service type identifier and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
query the second data table, and find the exception service information if it was successfully recorded in the second data table;
send the recovery prompt information according to the indication of the exception service information.
Optionally, in further implementations of this embodiment, the device 400 may further include, for example:
a deleting unit, configured to delete the exception service information from the second data table after the second sending unit has sent the recovery prompt information and the first service has returned to a normal running state on the second server.
Referring to Fig. 5, a schematic structural diagram of a system for monitoring the running state of services in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the system may specifically include a first server 501 and a second server 502, where the first server 501 is configured with the device of any implementation of the embodiment shown in Fig. 4.
The word "first" in names such as "first server", "first service", "first monitoring message", "first response message" and "first data table" in the embodiments of the present invention is used only as a naming identifier and does not denote order. The same applies to "second" and so on.
From the description of the above implementations, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by software plus a general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For the same or similar parts among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, the method and device embodiments are substantially similar to the system embodiment, so they are described relatively briefly; for relevant details, refer to the description of the system embodiment. The device and system embodiments described above are merely illustrative. Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are only preferred implementations of the present invention and are not intended to limit its protection scope. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method for monitoring the running state of services in a cluster system, characterized in that the method comprises:
a first server sending a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, wherein the first server and the second server are any two servers in the cluster system that provide services externally;
the first server receiving a first response message returned by the second server for the first monitoring message, the first response message carrying the running state information of the first service on the second server;
if the running state information of the first service on the second server indicates an abnormal running state, the first server sending recovery prompt information, the recovery prompt information being used to prompt recovery of the first service on the second server.
2. The method according to claim 1, characterized in that the recovery prompt information is an operation instruction sent to the second server, the operation instruction is used to trigger the second server to perform an exception handling operation for the first service, and the exception handling operation is used to restore the first service to a normal running state on the second server.
3. The method according to claim 2, characterized in that the exception handling operation is restarting the first service on the second server, or the exception handling operation is the second server updating data from the database of the cluster system into memory.
4. The method according to claim 1, characterized in that the recovery prompt information is a short message alarm notification sent to an SMS platform, the short message alarm notification is used to trigger the SMS platform to send an alarm message to a pre-designated user, and the alarm message is used to indicate that the first service on the second server is in an abnormal running state.
5. The method according to claim 1, characterized in that the first server sending the first monitoring message to the second server is specifically:
the first server sending the first monitoring message to the second server by polling, according to a preset monitoring period.
6. The method according to claim 1, characterized in that a first program and a second program are installed on the first server, and the first program and the second program can reuse the communication function of the first server itself;
the first server sending the first monitoring message to the second server is specifically: the first server sending the first monitoring message to the second server through the first program;
the first server receiving the first response message returned by the second server for the first monitoring message is specifically: the first server receiving, through the second program, the first response message returned by the second server for the first monitoring message.
7. The method according to claim 1, characterized by further comprising:
the first server receiving a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server;
the first server generating a second response message based on the running state information of the second service on the first server, and returning the second response message for the second monitoring message to the second server.
8. The method according to claim 7, characterized in that a first data table is stored on the first server, the first data table being used to record the current running state information of each service on the first server;
the first server generating the second response message based on the running state information of the second service on the first server is specifically: the first server generating the second response message based on the running state information currently recorded in the first data table.
9. The method according to claim 1, characterized in that the step of the first server sending recovery prompt information if the running state information of the first service indicates an abnormal running state comprises:
if the running state information of the first service on the second server indicates an abnormal running state, the first server recording exception service information in a second data table of a database of the cluster system, the exception service information comprising a monitored node identifier, a service type identifier and a monitoring node identifier, wherein the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
the first server querying the second data table, and finding the exception service information if the exception service information was successfully recorded in the second data table;
the first server sending the recovery prompt information according to the indication of the exception service information.
10. The method according to claim 9, characterized by further comprising, after the first server sends the recovery prompt information:
after the first service returns to a normal running state on the second server, the first server deleting the exception service information from the second data table.
11. A device for monitoring the running state of services in a cluster system, characterized in that the device is configured on a first server and comprises:
a first sending unit, configured to send a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, wherein the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, the first response message carrying the running state information of the first service on the second server;
a second sending unit, configured to send recovery prompt information from the first server if the running state information of the first service on the second server indicates an abnormal running state, the recovery prompt information being used to prompt recovery of the first service on the second server.
12. The device according to claim 11, characterized by further comprising:
a second receiving unit, configured to receive a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server;
a generating unit, configured to generate a second response message based on the running state information of the second service on the first server;
a returning unit, configured to return the second response message for the second monitoring message to the second server.
13. A system for monitoring the running state of services in a cluster system, characterized by comprising a first server and a second server, wherein the first server is configured with the device according to claim 11 or 12.
CN201610311715.8A 2016-05-11 2016-05-11 Method, device and system for monitoring service running state in a cluster system Active CN105978721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610311715.8A CN105978721B (en) 2016-05-11 2016-05-11 Method, device and system for monitoring service running state in a cluster system

Publications (2)

Publication Number Publication Date
CN105978721A true CN105978721A (en) 2016-09-28
CN105978721B CN105978721B (en) 2019-04-12

Family

ID=56993003

Country Status (1)

Country Link
CN (1) CN105978721B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075919A (en) * 2006-06-22 2007-11-21 腾讯科技(深圳)有限公司 Method and system for monitoring Internet service
CN101207519A (en) * 2007-12-13 2008-06-25 上海华为技术有限公司 Version server, operation maintenance unit and method for restoring failure
CN102231681A (en) * 2011-06-27 2011-11-02 中国建设银行股份有限公司 High availability cluster computer system and fault treatment method thereof
CN102291275A (en) * 2011-08-01 2011-12-21 烟台杰瑞网络商贸有限公司 Server cluster monitoring technology and method

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106161090A (en) * 2016-07-12 2016-11-23 许继集团有限公司 Monitoring method and device for a partitioned cluster system
CN106713007A (en) * 2016-11-15 2017-05-24 郑州云海信息技术有限公司 Alarm monitoring system and alarm monitoring method and device for server
CN107257384A (en) * 2017-07-24 2017-10-17 北京小米移动软件有限公司 Service state monitoring method and device
CN107257384B (en) * 2017-07-24 2021-08-17 北京小米移动软件有限公司 Service state monitoring method and device
CN109828883A (en) * 2017-11-23 2019-05-31 腾讯科技(北京)有限公司 Task data processing method and apparatus, storage medium and electronic device
CN109828883B (en) * 2017-11-23 2023-03-17 腾讯科技(北京)有限公司 Task data processing method and device, storage medium and electronic device
CN109361525A (en) * 2018-10-25 2019-02-19 珠海派诺科技股份有限公司 Method, device, control terminal and medium for restarting distributed deployment of multiple services
CN109361525B (en) * 2018-10-25 2021-08-13 珠海派诺科技股份有限公司 Method, device, control terminal and medium for restarting distributed deployment of multiple services
CN110531988A (en) * 2019-08-06 2019-12-03 新华三大数据技术有限公司 Trend prediction method for an application program and related apparatus
CN110445650A (en) * 2019-08-07 2019-11-12 中国联合网络通信集团有限公司 Detection alarm method, equipment and server
CN110445650B (en) * 2019-08-07 2022-06-10 中国联合网络通信集团有限公司 Detection alarm method, equipment and server
CN111565135A (en) * 2020-04-30 2020-08-21 吉林省鑫泽网络技术有限公司 Method for monitoring operation of server, monitoring server and storage medium

Also Published As

Publication number Publication date
CN105978721B (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN105978721A (en) Method, device and system for monitoring operation state of services in clustering system
CN103605722B (en) Database monitoring method, apparatus and device
CN109710394A (en) Scheduled task processing system and method
EP3210367B1 (en) System and method for disaster recovery of cloud applications
CN106856489A (en) Service node switching method and apparatus for a distributed storage system
CN107391276A (en) Distributed monitoring method, interception control device and system
CN108390907B (en) Management monitoring system and method based on Hadoop cluster
CN107682169B (en) Method and device for sending message by Kafka cluster
CN113704052B (en) Operation and maintenance system, method, equipment and medium of micro-service architecture
CN107544867B (en) Method, device and system for recovering intelligent network service
CN108268305A (en) System and method for automatic scaling of virtual machines
EP3439237A1 (en) Exception monitoring and alarming method and device
CN105429799A (en) Server backup method and device
CN106021070A (en) Method and device for server cluster monitoring
CN112422684A (en) Target message processing method and device, storage medium and electronic device
CN106330531A (en) Node fault recording and processing method and device
CN115202958A (en) Power anomaly monitoring method and device, electronic equipment and storage medium
CN111813348A (en) Node event processing device, method, equipment and medium in unified storage equipment
CN117130730A (en) Metadata management method for a federated Kubernetes cluster
JP2005301436A (en) Cluster system and failure recovery method for it
CN117880254A (en) Reconnection method for real-time communication
CN104734895A (en) Service monitoring system and service monitoring method
JP6418377B2 (en) Management target device, management device, and network management system
CN105025179A (en) Method and system for monitoring service agents of call center
CN114301763B (en) Distributed cluster fault processing method and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant