CN105978721B - Method, device and system for monitoring the operating status of services in a cluster system - Google Patents

Info

Publication number
CN105978721B
Authority
CN
China
Prior art keywords
server
service
monitoring
message
operating status
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610311715.8A
Other languages
Chinese (zh)
Other versions
CN105978721A (en)
Inventor
孙振华
丁医
陈铭
罗水华
崔磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN201610311715.8A
Publication of CN105978721A
Application granted
Publication of CN105978721B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the present invention discloses a method for monitoring the operating status of services in a cluster system. The method includes: a first server sends a first monitoring message to a second server, where the first monitoring message requests the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally; the first server receives a first response message returned by the second server in reply to the first monitoring message, where the first response message carries running state information of the first service on the second server; and if that running state information indicates that the operating status is abnormal, the first server sends recovery prompt information, which prompts recovery of the first service on the second server. Embodiments of the present invention also disclose a device and a system for monitoring the operating status of services in a cluster system.

Description

Method, device and system for monitoring the operating status of services in a cluster system
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device and system for monitoring the operating status of services in a cluster system.
Background art
In a cluster system, multiple servers are combined to run the same service or services concurrently. Because the same service is shared across multiple servers, the cluster system's capacity to carry that service is greatly increased. The performance of a cluster system can therefore rival that of a mainframe, at a lower cost than a mainframe. For this reason, cluster systems are now widely used.
In a cluster system, one or more services run on each server. If a service on one server runs abnormally, the cluster system's normal operation of that service is also affected, and the cluster system can no longer guarantee that it provides the service externally in a continuous and stable manner. The cluster system therefore needs to monitor the operating status of every service provided by every server. By monitoring operating status, the cluster system can recover a service whose operating status is abnormal on any server, and thus keep providing services externally in a continuous and stable manner.
In the prior art, in addition to the servers that provide services externally, a cluster system also includes a monitoring device that monitors the operating status of services on those servers. The monitoring device is independent of the other servers and is not used to provide services externally. By monitoring the services running on each server in the cluster system, the independent monitoring device can detect an abnormality on any server and recover from it, which ensures that the cluster system provides services externally as normal. However, the monitoring device is a hardware device deployed separately in the cluster system, and deploying a new hardware device requires the cluster system to consume additional software and hardware resources. In other words, the cluster system must consume additional software and hardware resources in order to monitor the operating status of services.
Summary of the invention
The technical problem to be solved by this application is to provide a method, device and system for monitoring the operating status of services in a cluster system, so that the operating status of services in the cluster system can be monitored continuously and stably without the cluster system consuming additional software and hardware resources for that monitoring. This improves the stability and reliability of monitoring and also saves system resources.
In a first aspect, a method for monitoring the operating status of services in a cluster system is provided. The method includes:
A first server sends a first monitoring message to a second server, where the first monitoring message requests the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
The first server receives a first response message returned by the second server in reply to the first monitoring message, where the first response message carries running state information of the first service on the second server;
If the running state information of the first service on the second server indicates that the operating status is abnormal, the first server sends recovery prompt information, which prompts recovery of the first service on the second server.
Optionally, the recovery prompt information is an operation instruction sent to the second server. The operation instruction triggers the second server to perform an exception handling operation for the first service, and the exception handling operation restores the first service to a normal operating status on the second server.
Optionally, the exception handling operation is restarting the first service on the second server, or the exception handling operation is the second server updating data from the database of the cluster system into memory.
Optionally, the recovery prompt information is an SMS alarm notification sent to an SMS platform. The SMS alarm notification triggers the SMS platform to send an alarm SMS to a preassigned user, and the alarm SMS indicates that the first service on the second server is in an abnormal operating status.
Optionally, the first server sending the first monitoring message to the second server is specifically:
The first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
Optionally, a first program and a second program are installed on the first server, and both the first program and the second program can reuse the communication function of the first server itself;
The first server sending the first monitoring message to the second server is specifically: the first server sends the first monitoring message to the second server through the first program;
The first server receiving the first response message returned by the second server in reply to the first monitoring message is specifically: the first server receives, through the second program, the first response message returned by the second server in reply to the first monitoring message.
Optionally, the method further includes:
The first server receives a second monitoring message sent by the second server, where the second monitoring message requests the operating status of a second service on the first server;
The first server generates a second response message based on running state information of the second service on the first server, and returns the second response message to the second server in reply to the second monitoring message.
Optionally, a first data table is stored on the first server, and the first data table records the current running state information of each service on the first server;
The first server generating the second response message based on running state information of the second service on the first server is specifically: the first server generates the second response message based on the running state information currently recorded in the first data table.
Optionally, the first server sending recovery prompt information if the running state information of the first service indicates that the operating status is abnormal includes:
If the running state information of the first service on the second server indicates that the operating status is abnormal, the first server records abnormal service information in a second data table in the database of the cluster system. The abnormal service information includes a monitored node identifier, a service type identifier and a monitoring node identifier: the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
The first server queries the second data table and, provided the abnormal service information was successfully recorded in the second data table, finds the abnormal service information;
The first server sends the recovery prompt information as indicated by the abnormal service information.
Optionally, after the first server sends the recovery prompt information, the method further includes:
After the first service is restored to a normal operating status on the second server, the first server deletes the abnormal service information from the second data table.
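The record, query and delete steps on the second data table can be sketched as follows. This is a minimal illustration only: it assumes an SQLite database stands in for the cluster database, and the table name `abnormal_service`, its column names and the helper function names are hypothetical, not taken from the patent.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per abnormal service on a monitored node.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS abnormal_service ("
        " monitored_node_id TEXT NOT NULL,"
        " service_type_id TEXT NOT NULL,"
        " monitoring_node_id TEXT NOT NULL,"
        " PRIMARY KEY (monitored_node_id, service_type_id))"
    )


def record_abnormal(conn, monitored: str, service: str, monitoring: str) -> None:
    # Step 1: the monitoring server records the abnormal service information.
    conn.execute(
        "INSERT OR REPLACE INTO abnormal_service VALUES (?, ?, ?)",
        (monitored, service, monitoring),
    )
    conn.commit()


def query_abnormal(conn, monitoring: str) -> list:
    # Step 2: the monitoring server reads back the rows it recorded.
    cursor = conn.execute(
        "SELECT monitored_node_id, service_type_id FROM abnormal_service"
        " WHERE monitoring_node_id = ?"
        " ORDER BY monitored_node_id, service_type_id",
        (monitoring,),
    )
    return cursor.fetchall()


def delete_abnormal(conn, monitored: str, service: str) -> None:
    # Final step: remove the row once the service is back to normal.
    conn.execute(
        "DELETE FROM abnormal_service"
        " WHERE monitored_node_id = ? AND service_type_id = ?",
        (monitored, service),
    )
    conn.commit()
```

Sending the recovery prompt information only for rows returned by `query_abnormal` mirrors the condition in the text that the prompt is sent only if the record was written successfully.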
In a second aspect, a device for monitoring the operating status of services in a cluster system is provided. The device is configured on a first server and includes:
A first sending unit, configured to send a first monitoring message to a second server, where the first monitoring message requests the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
A first receiving unit, configured to receive a first response message returned by the second server in reply to the first monitoring message, where the first response message carries running state information of the first service on the second server;
A second sending unit, configured to send recovery prompt information if the running state information of the first service on the second server indicates that the operating status is abnormal, where the recovery prompt information prompts recovery of the target service (that is, the first service) on the second server.
Optionally, the device further includes:
A second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message requests the operating status of a second service on the first server;
A generating unit, configured to generate a second response message based on running state information of the second service on the first server;
A returning unit, configured to return the second response message to the second server in reply to the second monitoring message.
In a third aspect, a system for monitoring the operating status of services in a cluster system is provided. The system includes a first server and a second server, and the first server is configured with the device of any one of the foregoing embodiments.
According to the embodiments provided by this application, among the multiple servers in a cluster system that provide services externally, one server can be used to monitor the operating status of services on another server. Specifically, assume that the first server and the second server are any two servers in the cluster system that provide services externally. The first server can send a first monitoring message to the second server, so that the second server returns a first response message carrying running state information of the first service on the second server. From that running state information, the first server can determine whether the first service is in an abnormal operating status on the second server and, if it is, send recovery prompt information to prompt recovery of the first service on the second server. Because a server that provides services externally monitors the operating status of services on other servers, the cluster system does not need to deploy a separate hardware device to monitor services, and does not consume additional software and hardware resources for such a device. This saves resources occupied by the cluster system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some of the embodiments of this application, and those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of the network system framework involved in an application scenario of an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for monitoring the operating status of services in a cluster system in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an example network scenario in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for monitoring the operating status of services in a cluster system in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a system for monitoring the operating status of services in a cluster system in an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
The inventors found that, in the prior art, because the cluster system deploys a separate hardware device as a monitoring device to monitor the operating status of services on each server that provides services externally, the cluster system has to consume additional software and hardware resources to monitor services. In addition, if the monitoring device itself becomes abnormal, the operating status of services on every server in the cluster system can no longer be monitored, and a service whose operating status is abnormal cannot be restored to normal. An independent monitoring device therefore struggles to monitor the operating status of services in a cluster system continuously and stably.
To solve these problems of the prior art, in the embodiments of the present invention, among the multiple servers in a cluster system that provide services externally, one server can be used to monitor the operating status of services on another server. Because a server that provides services externally monitors the operating status of services on other servers, the cluster system does not need to deploy a separate hardware device to monitor services, and does not consume additional software and hardware resources for such a device, which saves resources occupied by the cluster system. Furthermore, each of the servers that provide services externally can monitor the operating status of services on the other servers. Because these servers monitor one another, even if one or several of them become abnormal, the remaining servers can continue to monitor the operating status of services on all servers in the cluster system. Services whose operating status is abnormal can thus be restored to normal, and the operating status of services is monitored continuously and stably.
For example, one scenario of the embodiments of the present invention may be applied to the network system shown in Fig. 1. In this network system, server 101 and server 102 are any two servers in cluster system 103 that provide services externally. First, server 101 can send a monitoring message to server 102, where the monitoring message requests the operating status of a service on server 102. Then, server 101 receives a response message returned by server 102 in reply to the monitoring message, where the response message carries running state information of the service on server 102. If the running state information of the service on server 102 indicates that the operating status is abnormal, server 101 can send recovery prompt information, which can be used to prompt recovery of the service on server 102.
It should be understood that the above scenario is only one example scenario provided by the embodiments of the present invention, and the embodiments of the present invention are not limited to it.
The specific implementation of the embodiments of the present invention is described in detail below through embodiments, with reference to the accompanying drawings.
Referring to Fig. 2, a schematic flowchart of a method for monitoring the operating status of services in a cluster system in an embodiment of the present invention is shown. In this embodiment, the method may specifically include, for example:
201, A first server sends a first monitoring message to a second server, where the first monitoring message requests the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally.
In a specific implementation, if the first server needs to monitor the operating status of services on the second server, the first server can send the second server a first monitoring message requesting the operating status of the first service on the second server, so that the second server returns the running state information of the first service on the second server.
The first monitoring message may include, for example, a function identifier for collecting monitoring information and the identifier of the first server. In one example, the first monitoring message may use an XML format. The following is a sample format of a monitoring message: "000XXX<Monitor><Command>InformationCollection</Command><Params><MonitorServerID>CathayServer11</MonitorServerID></Params></Monitor>". Here, the "Command" field carries the identifier of the monitoring information collection function, namely "InformationCollection", and the "MonitorServerID" field carries the identifier of the first server, namely "CathayServer11".
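As an illustration of the sample format, the sketch below builds and parses such a monitoring message with Python's standard `xml.etree.ElementTree` module. It assumes the tag names are cased consistently (`Monitor`, `Command`, `Params`, `MonitorServerID`) and treats the leading `000XXX` as an opaque prefix copied from the sample, since the text does not state what it encodes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Copied verbatim from the sample message; its meaning is not stated in the text.
PREFIX = "000XXX"


def build_monitor_message(monitor_server_id: str) -> str:
    """Build a monitoring message carrying the monitoring server's identifier."""
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "InformationCollection"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    return PREFIX + ET.tostring(root, encoding="unicode")


def parse_monitor_message(raw: str) -> dict:
    """Strip the opaque prefix and extract the two fields described above."""
    root = ET.fromstring(raw[len(PREFIX):])
    return {
        "command": root.findtext("Command"),
        "monitor_server_id": root.findtext("Params/MonitorServerID"),
    }
```

A receiver would typically check that `command` equals `InformationCollection` before collecting status information.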
It should be understood that the first server is the device in the cluster system that performs monitoring: it monitors the operating status of services on other servers in the cluster system. In this embodiment, the first server denotes any server in the cluster system that provides services externally. Moreover, every server in the cluster system that provides services externally can be used to monitor the servers other than itself; that is, every such server can be the first server described in this embodiment.
It should be noted that the second server is the device in the cluster system that is monitored by the first server. In this embodiment, the second server can be any server in the cluster system, other than the first server, that provides services externally. Moreover, every such server other than the first server can have the operating status of its services monitored by the first server; that is, every server in the cluster system that provides services externally, other than the first server, can be the second server described in this embodiment.
In addition, the first service can be any service running on the second server. Every service running on the second server can have its operating status monitored by the first server; that is, every service running on the second server can be the first service described in this embodiment.
In some implementations of this embodiment, the first server may, for example, send the first monitoring message to all servers in the cluster system other than the first server. Any server that receives the first monitoring message acts as a second server: it obtains the running state information of all services on itself and feeds that information back to the first server. In this implementation, by sending a monitoring message once, the first server can obtain the operating status of all services on every other server in the cluster system.
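The broadcast variant can be sketched as follows. This is an assumption-laden illustration: `request_states` is a hypothetical callable standing in for the network exchange (send the monitoring message, wait for the response, and return a mapping from service name to state), and the server and service names in any usage are placeholders.

```python
from typing import Callable, Dict, List


def peers_of(cluster: List[str], self_id: str) -> List[str]:
    """Every server in the cluster except the monitoring server itself."""
    return [server for server in cluster if server != self_id]


def collect_states(
    cluster: List[str],
    self_id: str,
    request_states: Callable[[str], Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Ask each peer for the state of all its services in a single pass."""
    return {peer: request_states(peer) for peer in peers_of(cluster, self_id)}
```

Because every server can run the same routine with its own identifier as `self_id`, the servers end up monitoring one another.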
In some implementations of this embodiment, to keep monitoring stable and continuous, the first server can send monitoring messages periodically, so that it regularly obtains the current operating status of services on the second server. Specifically, in this embodiment, step 201 may be, for example: the first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
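A polling loop of this kind might look like the sketch below. The function and parameter names are hypothetical, `send_monitor` stands in for whatever actually transmits the first monitoring message, and the number of rounds is bounded only so that the example terminates; a real monitor would presumably loop until shut down.

```python
import time
from typing import Callable, Iterable


def poll_servers(
    targets: Iterable[str],
    send_monitor: Callable[[str], None],
    period_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send a monitoring message to every target once per monitoring period."""
    targets = list(targets)
    for round_index in range(rounds):
        for target in targets:
            send_monitor(target)
        # Wait one period between rounds, but not after the last one.
        if round_index < rounds - 1:
            sleep(period_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.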
In some implementations of this embodiment, so that the function of sending monitoring messages can be deployed on the first server using its existing communication function, a first program that reuses the existing communication function of the first server itself can be installed on the first server, and the first server can then send monitoring messages through the first program. Specifically, step 201 may be, for example: the first server sends the first monitoring message to the second server through the first program, where the first program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the first program can implement the function of sending the first monitoring message by directly reusing the MINA framework on the first server.
202, The first server receives a first response message returned by the second server in reply to the first monitoring message, where the first response message carries running state information of the first service on the second server.
In a specific implementation, in response to receiving the first monitoring message sent by the first server, the second server can generate the first response message based on the running state information of the first service on the second server and return it to the first server in reply to the first monitoring message. By receiving the first response message, the first server obtains the running state information of the first service on the second server.
More specifically, in some implementations of this embodiment, a data table may be stored on the second server. The current running state information of each service on the second server may be recorded in this data table, for example by updating it in real time. In response to receiving the first monitoring message, the second server can generate the first response message based on the running state information currently recorded in the data table. The first response message may, for example, carry all of the running state information currently recorded in the data table, which includes the running state information of the first service.
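The text does not give the format of the response message, so the sketch below invents one purely for illustration: a `MonitorResponse` element holding the responding server's identifier and one `Service` element per row of the status table. The element names, attribute names, state values and function name are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_response_message(server_id: str, status_table: Dict[str, str]) -> str:
    """Serialize every service state currently held in the status table."""
    root = ET.Element("MonitorResponse")  # hypothetical element name
    ET.SubElement(root, "ServerID").text = server_id
    services = ET.SubElement(root, "Services")
    for service_type, state in sorted(status_table.items()):
        # One element per service: its identifier and its current state.
        ET.SubElement(services, "Service", {"type": service_type, "state": state})
    return ET.tostring(root, encoding="unicode")
```

Here the in-memory dictionary plays the role of the data table; it would be updated elsewhere whenever a service changes state.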
In some implementations of this embodiment, so that the function of receiving response messages can be deployed on the first server using its existing communication function, a second program that reuses the existing communication function of the first server itself can be installed on the first server, and a listening interface can be set up for the second program, so that the first server can receive response messages through the listening interface of the second program. Specifically, step 202 may be, for example: the first server receives, through the second program, the first response message returned by the second server in reply to the first monitoring message, where the second program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the second program can implement the function of receiving the response message by directly reusing the MINA framework on the first server.
It should be understood that the first program used to send monitoring messages and the second program used to receive response messages can be the same application program or two different application programs.
In some implementations of this embodiment, in addition to the first server monitoring the operating status of services on the second server, the second server can also monitor the operating status of services on the first server; that is, different servers in the cluster system can monitor one another. Specifically, this embodiment may further include, for example: the first server receives a second monitoring message sent by the second server, where the second monitoring message requests the operating status of a second service on the first server; the first server generates a second response message based on running state information of the second service on the first server, and returns the second response message to the second server in reply to the second monitoring message. More specifically, a first data table may be stored on the first server, where the first data table records the current running state information of each service on the first server; the first server may then generate the second response message based on the running state information currently recorded in the first data table.
203, If the running state information of the first service on the second server indicates that the operating status is abnormal, the first server sends recovery prompt information, which prompts recovery of the first service on the second server.
In a specific implementation, the first server can examine the running state information of the first service on the second server and decide, based on the result, whether to send recovery prompt information. If the running state information indicates that the operating status is abnormal, the first server sends recovery prompt information. If the running state information indicates that the operating status is normal, the first server does not need to send recovery prompt information.
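The decision in step 203 can be sketched as follows, assuming the running state information has already been parsed into a mapping from service name to a state string. The state value `NORMAL` and the function names are assumptions for illustration; `send_recovery_prompt` stands in for whichever recovery mechanism is used.

```python
from typing import Callable, Dict, List


def abnormal_services(states: Dict[str, str]) -> List[str]:
    """Names of the services whose reported state is anything but normal."""
    return sorted(name for name, state in states.items() if state != "NORMAL")


def handle_response(
    server_id: str,
    states: Dict[str, str],
    send_recovery_prompt: Callable[[str, str], None],
) -> List[str]:
    """Send one recovery prompt per abnormal service; send none if all are normal."""
    abnormal = abnormal_services(states)
    for service in abnormal:
        send_recovery_prompt(server_id, service)
    return abnormal
```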
In this embodiment, the recovery prompt information sent by the first server prompts recovery of the abnormally running first service on the second server.
In some implementations of this embodiment, the first server can send the recovery prompt information by sending an operation instruction to the second server. In this case the recovery prompt information is the operation instruction sent to the second server. The operation instruction triggers the second server to perform an exception handling operation for the first service, and the exception handling operation restores the first service to a normal operating status on the second server.
It should be understood that different exception handling operations can be used for the different abnormal operating conditions that different services may exhibit.
For example, if the abnormal operating condition of the first service can be cleared by restarting the service, the exception handling operation may be restarting the first service on the second server, the operation instruction may be a restart command instructing the second server to restart the first service, and the recovery prompt information may be a restart command message carrying the restart command. The restart command message may include, for example, the function identifier of the restart command and the identifier of the first service. In one example, the restart command message may use an XML format. The following is a sample format of a restart command message: "000XXX<Monitor><Command>Restart_Service</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><ServiceType>CacheManager</ServiceType></Params></Monitor>". Here, the "Command" field carries the function identifier of the restart command, namely "Restart_Service", and the "ServiceType" field carries the identifier of the first service, namely "CacheManager".
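A restart command message in this sample format could be assembled as in the sketch below. As before, the tag casing is assumed to be consistent, the `000XXX` prefix is copied from the sample without interpretation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_restart_message(
    monitor_server_id: str, service_type: str, prefix: str = "000XXX"
) -> str:
    """Build a restart command message naming the service to restart."""
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "Restart_Service"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    ET.SubElement(params, "ServiceType").text = service_type
    # The prefix is treated as opaque; the text does not say what it encodes.
    return prefix + ET.tostring(root, encoding="unicode")
```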
As another example, if the abnormal running state of the first service is that its cached information is incorrect, the exception handling operation can be the second server updating the data of the first service from the database of the cluster system into memory, the operation instruction can be a cache refresh command instructing the second server to update the data of the first service from the database into memory, and the recovery prompt information can be a cache refresh command message that carries the cache refresh command. The cache refresh command message may include, for example, the function identifier of the cache refresh command and the identifier of the first service. In one example, the cache refresh command message uses an XML format. The following is a sample cache refresh command message: "000XXX<Tulip><Command>Refresh</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><CacheType>CTGConnectionPoolConfCache</CacheType></Params></Tulip>". Here, the "Command" field carries the function identifier of the cache refresh command, namely "Refresh", and the "CacheType" field carries the identifier of the first service, namely "CTGConnectionPoolConfCache".
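On the receiving side, the second server has to read the "Command" field and choose the matching exception handling operation. The following sketch illustrates one way to do that; it assumes a fixed six-character prefix as in the samples, consistently capitalized tags, and placeholder handlers that only return a description instead of really restarting a service or refreshing a cache. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX_LEN = 6  # assumption: every message starts with a 6-character prefix such as "000XXX"


def parse_command_message(message: str) -> dict:
    """Split a command message into its function identifier and parameters."""
    root = ET.fromstring(message[PREFIX_LEN:])
    params = root.find("Params")
    return {
        "command": root.findtext("Command"),
        "params": {child.tag: child.text for child in params} if params is not None else {},
    }


def handle_command(message: str, handlers: dict) -> str:
    """Dispatch the message to the handler registered for its function identifier."""
    parsed = parse_command_message(message)
    handler = handlers.get(parsed["command"])
    if handler is None:
        raise ValueError(f"unsupported command: {parsed['command']!r}")
    return handler(parsed["params"])


# Placeholder handlers (assumption): a real server would restart the service
# or reload the cache from the cluster database here.
HANDLERS = {
    "Restart_Service": lambda p: f"restart {p['ServiceType']}",
    "Refresh": lambda p: f"refresh {p['CacheType']}",
}

SAMPLE = ("000XXX<Tulip><Command>Refresh</Command><Params>"
          "<MonitorServerID>CathayServer11</MonitorServerID>"
          "<CacheType>CTGConnectionPoolConfCache</CacheType></Params></Tulip>")
result = handle_command(SAMPLE, HANDLERS)
print(result)
```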
In other implementations of this embodiment, the first server can send the recovery prompt information by triggering an SMS platform to send an alarm SMS to a designated user. In this case, the recovery prompt information is specifically an SMS alarm notification sent to the SMS platform. The SMS alarm notification triggers the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS tells the user that the first service on the second server is in an abnormal running state, so that the user can clear the abnormal running state.
In addition, after the first server sends the first monitoring message to the second server, if the first server does not receive the first response message that the second server should return for the first monitoring message, the first server can also send an SMS alarm notification to the SMS platform, triggering the SMS platform to send an SMS alarm to the pre-designated user to indicate that the second server is abnormal.
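The two outcomes described so far (an abnormal running state reported in the response, and no response at all) can be sketched as a single monitoring round. This is an illustration under assumptions, not the patented implementation: the transport, the recovery prompt, and the SMS platform are passed in as plain callables, and the status strings and the use of `None` or `TimeoutError` to signal a missing response are hypothetical.

```python
from typing import Callable, Optional


def monitor_once(request_status: Callable[[], Optional[str]],
                 send_recovery_prompt: Callable[[], None],
                 send_sms_alarm: Callable[[str], None]) -> str:
    """Run one monitoring round of the first server against the second server."""
    try:
        # Send the first monitoring message and wait for the first response message.
        status = request_status()
    except TimeoutError:
        status = None  # treat a timeout as "no response received"
    if status is None:
        # No first response message: the second server itself may be abnormal.
        send_sms_alarm("second server did not respond")
        return "no_response"
    if status != "normal":
        # The first service is running abnormally: send recovery prompt information.
        send_recovery_prompt()
        return "abnormal"
    return "normal"


events = []
outcome = monitor_once(lambda: "abnormal",
                       lambda: events.append("recovery prompt"),
                       lambda text: events.append("sms: " + text))
print(outcome, events)
```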
It should be noted that, in some implementations of this embodiment, each of the multiple servers that provide services externally in the cluster system can monitor the running state of services on the other servers. In this case, the same abnormal running state of the same service on the same server may be detected by several other servers. To prevent several servers from all handling the same abnormal running state, a data table can be kept in the database of the cluster system to record the abnormal running states that servers detect. A server handles a service in an abnormal running state by saving abnormal service information in this data table and then querying it. Specifically, step 203 may include, for example: if the running state information of the first service on the second server indicates that the running state is abnormal, the first server records abnormal service information in a second data table in the database of the cluster system; the first server queries the second data table and, provided the abnormal service information was successfully recorded in the second data table, finds the abnormal service information; the first server sends the recovery prompt information as indicated by the abnormal service information. The abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier: the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server. In addition, the abnormal service information may also include the name of the abnormal service, the name of the refresh service, the time of recording in the second data table, and so on. In the second data table, the primary key of the abnormal service information can be its monitored node identifier and service type identifier, so at most one record can exist for a given service on a given server.
It can be understood that the monitoring node identifier in the abnormal service information indicates which server is to handle that abnormal service information. When the first server finds an item of abnormal service information in the second data table, it can check the monitoring node identifier in that item. If the monitoring node identifier in the abnormal service information is the identifier of the first server itself, the first server handles the abnormal service information; if it is not, the first server does not handle the abnormal service information.
Furthermore, after step 203 is completed, if the first service has returned to a normal running state on the second server, the first server can delete the abnormal service information from the second data table.
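The record, query, handle, and delete cycle around the second data table can be sketched with an in-memory SQLite database. This is a hedged illustration: the table and column names are hypothetical, SQLite stands in for the cluster database, and `INSERT OR IGNORE` is one assumed way to make only the first writer's record succeed under the primary key (monitored node identifier, service type identifier).

```python
import sqlite3


def open_second_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the second data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE abnormal_service (
               monitored_node  TEXT NOT NULL,
               service_type    TEXT NOT NULL,
               monitoring_node TEXT NOT NULL,
               PRIMARY KEY (monitored_node, service_type))"""
    )
    return conn


def record_abnormal(conn, monitored_node, service_type, monitoring_node):
    """Try to record the abnormal service; a later duplicate is silently ignored."""
    conn.execute("INSERT OR IGNORE INTO abnormal_service VALUES (?, ?, ?)",
                 (monitored_node, service_type, monitoring_node))
    conn.commit()


def should_handle(conn, monitored_node, service_type, own_id) -> bool:
    """A server handles the record only if it is the recorded monitoring node."""
    row = conn.execute(
        "SELECT monitoring_node FROM abnormal_service "
        "WHERE monitored_node = ? AND service_type = ?",
        (monitored_node, service_type)).fetchone()
    return row is not None and row[0] == own_id


def clear_abnormal(conn, monitored_node, service_type):
    """Delete the record once the service runs normally again."""
    conn.execute("DELETE FROM abnormal_service "
                 "WHERE monitored_node = ? AND service_type = ?",
                 (monitored_node, service_type))
    conn.commit()


conn = open_second_table()
record_abnormal(conn, "node2", "CacheManager", "node1")  # node1 detects it first
record_abnormal(conn, "node2", "CacheManager", "node3")  # node3's duplicate is ignored
node1_handles = should_handle(conn, "node2", "CacheManager", "node1")
node3_handles = should_handle(conn, "node2", "CacheManager", "node3")
print(node1_handles, node3_handles)
```

In this sketch, only the server whose identifier was stored sends the recovery prompt information, which matches the rule that a server ignores records whose monitoring node identifier is not its own.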
It should be noted that this embodiment can be applied, for example, in the network scenario shown in Fig. 3. In this network scenario, the cluster system includes multiple servers that provide services externally, namely "node 1", "node 2", "node 3", and "node 4" shown in Fig. 3. Each of these servers can act as the first server of this embodiment and monitor the service running state on other servers, and can also act as the second server of this embodiment and have its service running state monitored by other servers. "DB" shown in Fig. 3 is the database described above in this embodiment. The "SMS alarm platform" shown in Fig. 3 is the SMS platform described above in this embodiment.
With the implementations provided in this embodiment, among the multiple servers that provide services externally in the cluster system, one server can monitor the running state of services on another server. Because a server that already provides services externally monitors the service running state on the other servers, the cluster system does not need to deploy a separate hardware device to monitor service running state, and no additional software or hardware resources are consumed by newly deployed hardware, which saves resources occupied by the cluster system. In addition, every one of these servers can monitor the running state of services on the other servers. Because the servers monitor each other's service running state, even if one or several of them become abnormal, the remaining servers in the cluster system can continue to monitor service running state on all servers, so that services in an abnormal running state can be restored and service running state is monitored continuously and stably.
Referring to Fig. 4, a schematic structural diagram of a device for monitoring the running state of services in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the device 400 can be configured on a first server. The device 400 may specifically include, for example:
a first sending unit 401, configured to send a first monitoring message to a second server, where the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit 402, configured to receive a first response message returned by the second server for the first monitoring message, where the first response message carries running state information of the first service on the second server;
a second sending unit 403, configured to send recovery prompt information if the running state information of the first service on the second server indicates that the running state is abnormal, where the recovery prompt information is used to prompt recovery of the target service on the second server, that is, the first service.
Optionally, in some implementations of this embodiment, the recovery prompt information may be, for example, an operation instruction sent to the second server. The operation instruction can be used to trigger the second server to execute an exception handling operation for the first service, and the exception handling operation can be used to restore the first service to a normal running state on the second server.
Optionally, in other implementations of this embodiment, the exception handling operation may be, for example, restarting the first service on the second server, or the second server updating data from the database of the cluster system into memory.
Optionally, in still other implementations of this embodiment, the recovery prompt information may be, for example, an SMS alarm notification sent to an SMS platform. The SMS alarm notification can be used to trigger the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS can be used to indicate that the first service on the second server is in an abnormal running state.
Optionally, in still other implementations of this embodiment, the first sending unit may be specifically configured to send the first monitoring message to the second server by polling, according to a preset monitoring period.
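Polling according to a preset monitoring period can be sketched as a simple loop. The names `poll_second_server`, `period_seconds`, and `rounds` are hypothetical, and the sender is passed in as a callable so the sketch stays independent of any particular transport; a real monitor would presumably loop indefinitely rather than for a fixed number of rounds.

```python
import time
from typing import Callable, List


def poll_second_server(send_monitoring_message: Callable[[], str],
                       period_seconds: float,
                       rounds: int,
                       sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send the first monitoring message once per period and collect the replies."""
    statuses = []
    for round_index in range(rounds):
        statuses.append(send_monitoring_message())
        # Wait one monitoring period before the next round (skip after the last).
        if round_index < rounds - 1:
            sleep(period_seconds)
    return statuses


# A period of 0 keeps this demonstration instantaneous.
statuses = poll_second_server(lambda: "normal", period_seconds=0, rounds=3)
print(statuses)
```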
Optionally, in still other implementations of this embodiment, a first program and a second program may be installed on the first server, for example, and both the first program and the second program can reuse the communication function of the first server itself;
the first sending unit 401 can be specifically configured to send the first monitoring message to the second server through the first program;
the first receiving unit 402 can be specifically configured to receive, through the second program, the first response message returned by the second server for the first monitoring message.
Optionally, in still other implementations of this embodiment, the device 400 may further include, for example:
a second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message is used to request the running state of a second service on the first server;
a generating unit, configured to generate a second response message based on the running state information of the second service on the first server;
a return unit, configured to return the second response message to the second server for the second monitoring message.
Optionally, in still other implementations of this embodiment, a first data table may be stored on the first server, for example, and the first data table can be used to record the current running state information of each service on the first server;
the generating unit can be specifically configured to generate the second response message based on the running state information currently recorded in the first data table.
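Generating the second response message from the first data table amounts to a lookup of the currently recorded state. In the sketch below the first data table is modelled as a plain dictionary and the response as a dictionary with two keys; both shapes, the "unknown" fallback, and the service name "OrderService" are assumptions made for illustration, since the text does not fix a response format.

```python
from typing import Dict


def generate_second_response(first_data_table: Dict[str, str],
                             service: str) -> Dict[str, str]:
    """Answer a second monitoring message from the currently recorded state."""
    # Fall back to "unknown" when the table has no entry for the service (assumption).
    status = first_data_table.get(service, "unknown")
    return {"service": service, "status": status}


# Hypothetical contents of the first data table on the first server.
first_data_table = {"CacheManager": "normal", "OrderService": "abnormal"}
response = generate_second_response(first_data_table, "OrderService")
print(response)
```

Reading from a locally maintained table means a monitoring request can be answered without probing the service at request time.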
Optionally, in still other implementations of this embodiment, the second sending unit 403 may be specifically configured, for example, to:
if the running state information of the first service on the second server indicates that the running state is abnormal, record abnormal service information in a second data table in the database of the cluster system, where the abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
query the second data table and, provided the abnormal service information was successfully recorded in the second data table, find the abnormal service information;
send the recovery prompt information as indicated by the abnormal service information.
Optionally, in still other implementations of this embodiment, the device 400 may further include, for example:
a deleting unit, configured to, in response to the second sending unit having sent the recovery prompt information, delete the abnormal service information from the second data table after the first service returns to a normal running state on the second server.
With the implementations provided in this embodiment, among the multiple servers that provide services externally in the cluster system, one server can monitor the running state of services on another server. Because a server that already provides services externally monitors the service running state on the other servers, the cluster system does not need to deploy a separate hardware device to monitor service running state, and no additional software or hardware resources are consumed by newly deployed hardware, which saves resources occupied by the cluster system. In addition, every one of these servers can monitor the running state of services on the other servers. Because the servers monitor each other's service running state, even if one or several of them become abnormal, the remaining servers in the cluster system can continue to monitor service running state on all servers, so that services in an abnormal running state can be restored and service running state is monitored continuously and stably.
Referring to Fig. 5, a schematic structural diagram of a system for monitoring the running state of services in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the system may specifically include, for example, a first server 501 and a second server 502, where the first server 501 is configured with the device of any one of the implementations of the embodiment shown in Fig. 4.
With the implementations provided in this embodiment, among the multiple servers that provide services externally in the cluster system, one server can monitor the running state of services on another server. Because a server that already provides services externally monitors the service running state on the other servers, the cluster system does not need to deploy a separate hardware device to monitor service running state, and no additional software or hardware resources are consumed by newly deployed hardware, which saves resources occupied by the cluster system. In addition, every one of these servers can monitor the running state of services on the other servers. Because the servers monitor each other's service running state, even if one or several of them become abnormal, the remaining servers in the cluster system can continue to monitor service running state on all servers, so that services in an abnormal running state can be restored and service running state is monitored continuously and stably.
In the embodiments of the present invention, the word "first" in names such as "first server", "first service", "first monitoring message", "first response message", and "first data table" is used only as a naming label and does not denote first in order. The same rule applies to "second" and so on.
From the above description of the implementations, those skilled in the art can clearly understand that all or some of the steps of the methods in the above embodiments can be implemented by software together with a general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as read-only memory (ROM)/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the method and device embodiments are substantially similar to the system embodiment, so they are described relatively briefly; for related details, refer to the corresponding description of the system embodiment. The device and system embodiments described above are only illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention.

Claims (11)

1. A method for monitoring the running state of services in a cluster system, characterized in that the method comprises:
a first server sending a first monitoring message to a second server, wherein the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
the first server receiving a first response message returned by the second server for the first monitoring message, wherein the first response message carries running state information of the first service on the second server;
if the running state information of the first service on the second server indicates that the running state is abnormal, the first server sending recovery prompt information, wherein the recovery prompt information is used to prompt recovery of the first service on the second server;
the first server receiving a second monitoring message sent by the second server, wherein the second monitoring message is used to request the running state of a second service on the first server;
the first server generating a second response message based on running state information of the second service on the first server, and returning the second response message to the second server for the second monitoring message.
2. The method according to claim 1, characterized in that the recovery prompt information is an operation instruction sent to the second server, the operation instruction is used to trigger the second server to execute an exception handling operation for the first service, and the exception handling operation is used to restore the first service to a normal running state on the second server.
3. The method according to claim 2, characterized in that the exception handling operation is restarting the first service on the second server, or the exception handling operation is the second server updating data from the database of the cluster system into memory.
4. The method according to claim 1, characterized in that the recovery prompt information is an SMS alarm notification sent to an SMS platform, the SMS alarm notification is used to trigger the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS is used to indicate that the first service on the second server is in an abnormal running state.
5. The method according to claim 1, characterized in that the first server sending the monitoring message to the second server is specifically:
the first server sending the first monitoring message to the second server by polling, according to a preset monitoring period.
6. The method according to claim 1, characterized in that a first program and a second program are installed on the first server, and the first program and the second program can reuse the communication function of the first server itself;
the first server sending the first monitoring message to the second server is specifically: the first server sending the first monitoring message to the second server through the first program;
the first server receiving the first response message returned by the second server for the monitoring message is specifically: the first server receiving, through the second program, the first response message returned by the second server for the monitoring message.
7. The method according to claim 1, characterized in that a first data table is stored on the first server, and the first data table is used to record the current running state information of each service on the first server;
the first server generating the second response message based on the running state information of the second service on the first server is specifically: the first server generating the second response message based on the running state information currently recorded in the first data table.
8. The method according to claim 1, characterized in that, if the running state information of the first service indicates that the running state is abnormal, the first server sending the recovery prompt information comprises:
if the running state information of the first service on the second server indicates that the running state is abnormal, the first server recording abnormal service information in a second data table in the database of the cluster system, wherein the abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
the first server querying the second data table and, provided the abnormal service information was successfully recorded in the second data table, finding the abnormal service information;
the first server sending the recovery prompt information as indicated by the abnormal service information.
9. The method according to claim 8, characterized in that, after the first server sends the recovery prompt information, the method further comprises:
after the first service returns to a normal running state on the second server, the first server deleting the abnormal service information from the second data table.
10. A device for monitoring the running state of services in a cluster system, characterized in that the device is configured on a first server and comprises:
a first sending unit, configured to send a first monitoring message to a second server, wherein the first monitoring message is used to request the running state of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, wherein the first response message carries running state information of the first service on the second server;
a second sending unit, configured to send recovery prompt information if the running state information of the first service on the second server indicates that the running state is abnormal, wherein the recovery prompt information is used to prompt recovery of the first service on the second server;
a second receiving unit, configured to receive a second monitoring message sent by the second server, wherein the second monitoring message is used to request the running state of a second service on the first server;
a generating unit, configured to generate a second response message based on running state information of the second service on the first server;
a return unit, configured to return the second response message to the second server for the second monitoring message.
11. A system for monitoring the running state of services in a cluster system, characterized in that the system comprises a first server and a second server, and the first server is configured with the device according to claim 10.
CN201610311715.8A 2016-05-11 2016-05-11 The methods, devices and systems of monitoring service operating status in a kind of group system Active CN105978721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610311715.8A CN105978721B (en) 2016-05-11 2016-05-11 The methods, devices and systems of monitoring service operating status in a kind of group system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610311715.8A CN105978721B (en) 2016-05-11 2016-05-11 The methods, devices and systems of monitoring service operating status in a kind of group system

Publications (2)

Publication Number Publication Date
CN105978721A CN105978721A (en) 2016-09-28
CN105978721B true CN105978721B (en) 2019-04-12

Family

ID=56993003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610311715.8A Active CN105978721B (en) 2016-05-11 2016-05-11 The methods, devices and systems of monitoring service operating status in a kind of group system

Country Status (1)

Country Link
CN (1) CN105978721B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106161090A (en) * 2016-07-12 2016-11-23 许继集团有限公司 The monitoring method of a kind of subregion group system and device
CN106713007A (en) * 2016-11-15 2017-05-24 郑州云海信息技术有限公司 Alarm monitoring system and alarm monitoring method and device for server
CN107257384B (en) * 2017-07-24 2021-08-17 北京小米移动软件有限公司 Service state monitoring method and device
CN109828883B (en) * 2017-11-23 2023-03-17 腾讯科技(北京)有限公司 Task data processing method and device, storage medium and electronic device
CN109361525B (en) * 2018-10-25 2021-08-13 珠海派诺科技股份有限公司 Method, device, control terminal and medium for restarting distributed deployment of multiple services
CN110531988B (en) * 2019-08-06 2023-06-06 新华三大数据技术有限公司 Application program state prediction method and related device
CN110445650B (en) * 2019-08-07 2022-06-10 中国联合网络通信集团有限公司 Detection alarm method, equipment and server
CN111565135A (en) * 2020-04-30 2020-08-21 吉林省鑫泽网络技术有限公司 Method for monitoring operation of server, monitoring server and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075919A (en) * 2006-06-22 2007-11-21 腾讯科技(深圳)有限公司 Method and system for monitoring Internet service
CN101207519A (en) * 2007-12-13 2008-06-25 上海华为技术有限公司 Version server, operation maintenance unit and method for restoring failure
CN102231681A (en) * 2011-06-27 2011-11-02 中国建设银行股份有限公司 High availability cluster computer system and fault treatment method thereof
CN102291275A (en) * 2011-08-01 2011-12-21 烟台杰瑞网络商贸有限公司 Server cluster monitoring technology and method

Also Published As

Publication number Publication date
CN105978721A (en) 2016-09-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant