CN105978721B - Method, device and system for monitoring the operating status of services in a cluster system - Google Patents
- Publication number
- CN105978721B CN201610311715.8A
- Authority
- CN
- China
- Prior art keywords
- server
- service
- monitoring
- message
- operating status
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Debugging And Monitoring (AREA)
Abstract
An embodiment of the invention discloses a method for monitoring the operating status of services in a cluster system. The method includes: a first server sends a first monitoring message to a second server, where the first monitoring message is used to request the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally; the first server receives a first response message returned by the second server for the first monitoring message, where the first response message carries operating status information of the first service on the second server; and if the operating status information of the first service on the second server indicates that the operating status is abnormal, the first server sends recovery prompt information, which is used to prompt recovery of the first service on the second server. Embodiments of the invention also disclose a device and a system for monitoring the operating status of services in a cluster system.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device and system for monitoring the operating status of services in a cluster system.
Background
In a cluster system, multiple servers can be combined to run the same service or services concurrently. Because the same service is shared across multiple servers, a cluster system greatly increases the load capacity available to that service. The performance of a cluster system can therefore rival that of a mainframe, at a lower cost than a mainframe. For these reasons, cluster systems are now widely used.
In a cluster system, one or more services run on each server. If a service on one server runs abnormally, the normal operation of that service in the cluster system is also affected, so the cluster system cannot guarantee that it will keep providing the service externally in a continuous and stable manner. The cluster system therefore needs to monitor the operating status of each service provided by each server. By monitoring service operating status, the cluster system can recover a service whose operating status is abnormal on any server, and so keep providing services externally in a continuous and stable manner.
In the prior art, besides the servers that provide services externally, the cluster system also includes a monitoring device that monitors service operating status for the servers in the system. The monitoring device is independent of the other servers and is not used to provide services externally. Because an independent monitoring device monitors the services running on each server in the cluster system, it can detect an abnormality on any server in the cluster system and recover from it, which ensures that the cluster system provides services externally as normal. However, the monitoring device is a hardware device deployed separately in the cluster system, and deploying an additional hardware device requires the cluster system to consume additional software and hardware resources. The cluster system therefore has to consume additional software and hardware resources in order to monitor service operating status.
Summary of the invention
The technical problem to be solved by this application is to provide a method, device and system for monitoring the operating status of services in a cluster system, so that service operating status in the cluster system can be monitored continuously and stably while the cluster system avoids consuming additional software and hardware resources for that monitoring. This improves the stability and reliability of monitoring and also saves system resources.
In a first aspect, a method for monitoring the operating status of services in a cluster system is provided. The method includes:
A first server sends a first monitoring message to a second server, where the first monitoring message is used to request the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
The first server receives a first response message returned by the second server for the first monitoring message, where the first response message carries operating status information of the first service on the second server;
If the operating status information of the first service on the second server indicates that the operating status is abnormal, the first server sends recovery prompt information, where the recovery prompt information is used to prompt recovery of the first service on the second server.
Optionally, the recovery prompt information is an operation instruction sent to the second server. The operation instruction is used to trigger the second server to perform an exception-handling operation for the first service, and the exception-handling operation is used to restore the first service to a normal operating status on the second server.
Optionally, the exception-handling operation is restarting the first service on the second server, or the exception-handling operation is the second server updating data from the database of the cluster system into memory.
Optionally, the recovery prompt information is an SMS alarm notification sent to an SMS platform. The SMS alarm notification is used to trigger the SMS platform to send an alarm SMS to a pre-specified user, and the alarm SMS is used to indicate that the first service on the second server is in an abnormal operating status.
Optionally, the first server sending the first monitoring message to the second server is specifically:
The first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
Optionally, a first program and a second program are installed on the first server, and both the first program and the second program can reuse the communication function of the first server itself;
The first server sending the first monitoring message to the second server is specifically: the first server sends the first monitoring message to the second server through the first program;
The first server receiving the first response message returned by the second server for the first monitoring message is specifically: the first server receives, through the second program, the first response message returned by the second server for the first monitoring message.
Optionally, the method further includes:
The first server receives a second monitoring message sent by the second server, where the second monitoring message is used to request the operating status of a second service on the first server;
The first server generates a second response message based on the operating status information of the second service on the first server, and returns the second response message to the second server for the second monitoring message.
Optionally, a first data table is stored on the first server, and the first data table is used to record the current operating status information of each service on the first server;
The first server generating the second response message based on the operating status information of the second service on the first server is specifically: the first server generates the second response message based on the operating status information currently recorded in the first data table.
Optionally, if the operating status information of the first service indicates that the operating status is abnormal, the first server sending the recovery prompt information includes:
If the operating status information of the first service on the second server indicates that the operating status is abnormal, the first server records abnormal service information in a second data table of the database of the cluster system, where the abnormal service information includes a monitored node identifier, a service type identifier and a monitoring node identifier; the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
The first server queries the second data table, and finds the abnormal service information if it was successfully recorded in the second data table;
The first server sends the recovery prompt information as indicated by the abnormal service information.
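As a minimal sketch of the record, query and send steps just listed (and of deleting the record once the service has recovered), the following Python uses an in-memory SQLite table as a stand-in for the second data table. The table name, column names and function names are assumptions made for illustration; the source only states that each record holds a monitored node identifier, a service type identifier and a monitoring node identifier.

```python
import sqlite3


def create_abnormal_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per abnormal service on a monitored node.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS abnormal_service ("
        "monitored_node TEXT, service_type TEXT, monitor_node TEXT, "
        "PRIMARY KEY (monitored_node, service_type))"
    )


def record_abnormal(conn, monitored_node, service_type, monitor_node):
    # Record which node observed which service as abnormal on which node.
    conn.execute(
        "INSERT OR REPLACE INTO abnormal_service VALUES (?, ?, ?)",
        (monitored_node, service_type, monitor_node),
    )
    conn.commit()


def query_abnormal(conn, monitor_node):
    # Return the (monitored node, service type) pairs recorded by this monitor.
    rows = conn.execute(
        "SELECT monitored_node, service_type FROM abnormal_service "
        "WHERE monitor_node = ? ORDER BY monitored_node, service_type",
        (monitor_node,),
    )
    return rows.fetchall()


def send_pending_prompts(conn, monitor_node, send_recovery_prompt):
    # Send one recovery prompt per record that was successfully stored.
    pending = query_abnormal(conn, monitor_node)
    for monitored_node, service_type in pending:
        send_recovery_prompt(monitored_node, service_type)
    return pending


def delete_abnormal(conn, monitored_node, service_type):
    # Remove the record once the service is back to a normal operating status.
    conn.execute(
        "DELETE FROM abnormal_service "
        "WHERE monitored_node = ? AND service_type = ?",
        (monitored_node, service_type),
    )
    conn.commit()
```

Sending only what the query returns means a prompt goes out only for information that was actually persisted, which matches the condition stated in the query step.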
Optionally, after the first server sends the recovery prompt information, the method further includes:
After the first service is restored to a normal operating status on the second server, the first server deletes the abnormal service information from the second data table.
In a second aspect, a device for monitoring the operating status of services in a cluster system is provided. The device is configured on a first server and includes:
A first sending unit, configured to send a first monitoring message to a second server, where the first monitoring message is used to request the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
A first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, where the first response message carries operating status information of the first service on the second server;
A second sending unit, configured to send recovery prompt information if the operating status information of the first service on the second server indicates that the operating status is abnormal, where the recovery prompt information is used to prompt recovery of the first service on the second server.
Optionally, the device further includes:
A second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message is used to request the operating status of a second service on the first server;
A generating unit, configured to generate a second response message based on the operating status information of the second service on the first server;
A returning unit, configured to return the second response message to the second server for the second monitoring message.
In a third aspect, a system for monitoring the operating status of services in a cluster system is provided. The system includes a first server and a second server, and the first server is configured with the device of any one of the foregoing implementations.
According to the embodiments provided by this application, among the multiple servers in a cluster system that provide services externally, one server can monitor the operating status of services on another server. Specifically, assuming that a first server and a second server are any two servers in the cluster system that provide services externally, the first server can send a first monitoring message to the second server, so that the second server returns a first response message carrying the operating status information of a first service on the second server. Based on that operating status information, the first server can determine whether the first service is in an abnormal operating status on the second server and, if so, can send recovery prompt information to prompt recovery of the first service on the second server. Because a server that provides services externally monitors the service operating status on other servers, the cluster system does not need to deploy a separate hardware device to monitor service operating status, and does not consume additional software and hardware resources for such a device, which saves the resources occupied by the cluster system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Clearly, the drawings described below are only some of the embodiments described in this application, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic diagram of the network system framework involved in an application scenario in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for monitoring the operating status of services in a cluster system in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an example network scenario in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for monitoring the operating status of services in a cluster system in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a system for monitoring the operating status of services in a cluster system in an embodiment of the present invention.
Detailed description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Clearly, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The inventor found through research that, in the prior art, because the cluster system separately deploys a hardware device as the monitoring device to monitor the service operating status on each server that provides services externally, the cluster system has to consume additional software and hardware resources in order to monitor service operating status. In addition, if the monitoring device itself becomes abnormal, the service operating status on each server in the cluster system cannot be monitored, and services whose operating status is abnormal cannot be restored to normal. An independent monitoring device therefore has difficulty monitoring the service operating status in the cluster system continuously and stably.
To solve the above problem of the prior art, in the embodiments of the present invention, among the multiple servers in a cluster system that provide services externally, one server can monitor the operating status of services on another server. Because a server that provides services externally monitors the service operating status on other servers, the cluster system does not need to deploy a separate hardware device to monitor service operating status, and does not consume additional software and hardware resources for such a device, which saves the resources occupied by the cluster system. In addition, among the multiple servers in the cluster system that provide services externally, each server can monitor the operating status of services on the other servers. Because the servers that provide services externally monitor one another's service operating status, even if one or several of them become abnormal, the other servers can continue to monitor the service operating status of all servers in the cluster system. Services whose operating status is abnormal can thus be restored to normal, and service operating status is monitored continuously and stably.
For example, one scenario of the embodiments of the present invention can be applied to the network system shown in Fig. 1. In this network system, server 101 and server 102 are any two servers in cluster system 103 that provide services externally. First, server 101 can send a monitoring message to server 102, where the monitoring message is used to request the operating status of a service on server 102. Then, server 101 receives a response message returned by server 102 for the monitoring message, where the response message carries operating status information of the service on server 102. If the operating status information of the service on server 102 indicates that the operating status is abnormal, server 101 can send recovery prompt information, where the recovery prompt information can be used to prompt recovery of the service on server 102.
It can be understood that the above scenario is only one example scenario provided by the embodiments of the present invention, and the embodiments of the present invention are not limited to this scenario.
The specific implementations of the embodiments of the present invention are described in detail below through embodiments, with reference to the accompanying drawings.
Referring to Fig. 2, a schematic flowchart of a method for monitoring the operating status of services in a cluster system in an embodiment of the present invention is shown. In this embodiment, the method may, for example, include:
201. A first server sends a first monitoring message to a second server, where the first monitoring message is used to request the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally.
In a specific implementation, if the first server needs to monitor the service operating status of the second server, the first server can send to the second server a first monitoring message that requests the operating status of the first service on the second server, so that the second server returns the operating status information of the first service on the second server.
The first monitoring message may, for example, include a function identifier for collecting monitoring information and the identifier of the first server. In one example, the first monitoring message may use an XML format. For example, the following is a sample format of a monitoring message: "000XXX<Monitor><Command>InformationCollection</Command><Params><MonitorServerID>CathayServer11</MonitorServerID></Params></Monitor>". Here, the "Command" field carries the identifier of the monitoring-information collection function, namely "InformationCollection"; the "MonitorServerID" field carries the identifier of the first server, namely "CathayServer11".
It can be understood that the first server is the device in the cluster system that has the monitoring function, and it monitors the service operating status on other servers in the cluster system. In this embodiment, the first server represents any server in the cluster system that provides services externally. Further, every server in the cluster system that provides services externally can be used to monitor the servers other than itself; that is, every such server can be the first server described in this embodiment.
It should be noted that the second server is a device in the cluster system that is monitored by the first server. In this embodiment, the second server can be any server in the cluster system, other than the first server, that provides services externally. Further, every server in the cluster system that provides services externally, other than the first server, can have its service operating status monitored by the first server; that is, every such server can be the second server described in this embodiment.
In addition, the first service can be any service running on the second server. Further, any service running on the second server can have its operating status monitored by the first server; that is, every service running on the second server can be the first service described in this embodiment.
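The sample monitoring message given for step 201 can be built and parsed with a few lines of standard-library Python. This is only a sketch: it assumes the element names are capitalized consistently (`Monitor`, `Command`, `Params`, `MonitorServerID`), and it treats the leading `000XXX` as an opaque placeholder prefix that is copied verbatim, since the source does not explain it. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: "000XXX" is an opaque placeholder prefix from the sample message.
PREFIX = "000XXX"


def build_monitor_message(monitor_server_id: str) -> str:
    # Build the XML body with the collection command and the sender's identifier.
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "InformationCollection"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    return PREFIX + ET.tostring(root, encoding="unicode")


def parse_monitor_message(raw: str) -> dict:
    # Strip the placeholder prefix, then read the two fields back out.
    root = ET.fromstring(raw[len(PREFIX):])
    return {
        "command": root.findtext("Command"),
        "monitor_server_id": root.findtext("Params/MonitorServerID"),
    }
```

For instance, `build_monitor_message("CathayServer11")` yields a string in the same shape as the sample message, and `parse_monitor_message` recovers the command and the identifier of the monitoring server from it.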
In some implementations of this embodiment, the first server may, for example, send the first monitoring message to all servers in the cluster system other than the first server. Each server that receives the first monitoring message acts as a second server, obtains the operating status information of all services running on it, and feeds that information back to the first server. In this implementation, by sending the monitoring message once, the first server can obtain the operating status of all services on all other servers in the cluster system.
In some implementations of this embodiment, to keep monitoring stable and continuous, the first server can send monitoring messages periodically, so as to regularly obtain the current service operating status of the second server. Specifically, in this embodiment, step 201 may, for example, be: the first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
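A polling sender of this kind might look like the sketch below. It is an illustration under assumptions, not the patented implementation: `poll_servers`, its parameters and the bounded `rounds` count are hypothetical, and the actual transport is abstracted behind a `send_monitor_message` callback.

```python
import time


def poll_servers(servers, send_monitor_message, period_seconds, rounds,
                 sleep=time.sleep):
    # In each round, send one monitoring message to every monitored server,
    # then wait one monitoring period before the next round.
    for round_index in range(rounds):
        for server in servers:
            send_monitor_message(server)
        if round_index < rounds - 1:
            sleep(period_seconds)
```

A long-running monitor would loop indefinitely rather than for a fixed number of rounds; the bound here only keeps the sketch easy to exercise, and the injectable `sleep` lets it run without real delays.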
In some implementations of this embodiment, to deploy the function of sending monitoring messages on the first server by means of its existing communication function, a first program that can reuse the existing communication function of the first server itself may be installed on the first server, so that the first server can send monitoring messages through the first program. Specifically, step 201 may, for example, be: the first server sends the first monitoring message to the second server through the first program, where the first program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the first program may implement the sending of the first monitoring message by directly reusing the MINA framework on the first server.
202. The first server receives a first response message returned by the second server for the first monitoring message, where the first response message carries operating status information of the first service on the second server.
In a specific implementation, in response to receiving the first monitoring message sent by the first server, the second server can generate the first response message based on the operating status information of the first service on the second server and return the first response message to the first server for the first monitoring message, so that the first server obtains the operating status information of the first service on the second server by receiving the first response message.
More specifically, in some implementations of this embodiment, a data table may, for example, be stored on the second server. The current operating status information of each service on the second server may be recorded in the data table, for example by real-time updates. In response to receiving the first monitoring message, the second server can generate the first response message based on the operating status information currently recorded in the data table. The first response message may, for example, carry all the operating status information currently recorded in the data table, which includes the operating status information of the first service.
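To make the table-to-response step concrete, here is a small Python sketch in which a dictionary stands in for the data table on the second server. The source does not give the layout of the response message, so the `MonitorResponse`, `Service`, `ServiceType` and `Status` element names, the status values and both function names are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_response_message(status_table: dict) -> str:
    # Serialize every status currently recorded in the table (hypothetical layout).
    root = ET.Element("MonitorResponse")
    for service_type, status in sorted(status_table.items()):
        entry = ET.SubElement(root, "Service")
        ET.SubElement(entry, "ServiceType").text = service_type
        ET.SubElement(entry, "Status").text = status
    return ET.tostring(root, encoding="unicode")


def parse_response_message(raw: str) -> dict:
    # Recover the service-to-status mapping on the monitoring side.
    root = ET.fromstring(raw)
    return {
        entry.findtext("ServiceType"): entry.findtext("Status")
        for entry in root.findall("Service")
    }
```

Because the response carries the whole table, the monitoring server sees the first service's status together with that of every other service on the same server.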
In some implementations of this embodiment, to deploy the function of receiving response messages on the first server by means of its existing communication function, a second program that can reuse the existing communication function of the first server itself may be installed on the first server, and a listening interface may be set for the second program, so that the first server can receive response messages through the listening interface of the second program. Specifically, step 202 may, for example, be: the first server receives, through the second program, the first response message returned by the second server for the first monitoring message, where the second program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the second program may implement the receiving of the first response message by directly reusing the MINA framework on the first server.
It can be understood that the first program used to send monitoring messages and the second program used to receive response messages can be the same application, or can be two different applications.
In some implementations of this embodiment, in addition to the first server monitoring the service operating status of the second server, the second server can also monitor the service operating status of the first server; that is, different servers in the cluster system can monitor one another's service operating status. Specifically, this embodiment may, for example, further include: the first server receives a second monitoring message sent by the second server, where the second monitoring message is used to request the operating status of a second service on the first server; the first server generates a second response message based on the operating status information of the second service on the first server, and returns the second response message to the second server for the second monitoring message. More specifically, a first data table may, for example, be stored on the first server, and the first data table is used to record the current operating status information of each service on the first server. The first server may then generate the second response message as follows: the first server generates the second response message based on the operating status information currently recorded in the first data table.
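The symmetry of this arrangement can be illustrated with a toy model in which every node both answers monitoring requests from its own status table and issues requests to a peer. The `ClusterNode` class, its method names and the `"NORMAL"`/`"ABNORMAL"` status values are assumptions for illustration; network transport is replaced by a direct method call.

```python
class ClusterNode:
    def __init__(self, node_id, status_table):
        self.node_id = node_id
        # Stand-in for the data table of current service statuses on this node.
        self.status_table = dict(status_table)

    def handle_monitor_request(self, requester_id):
        # Reply with a snapshot of the statuses currently recorded locally.
        return dict(self.status_table)

    def monitor(self, peer):
        # Ask the peer for its statuses and list the services that are not normal.
        response = peer.handle_monitor_request(self.node_id)
        return sorted(
            name for name, status in response.items() if status != "NORMAL"
        )
```

Since both roles live in the same class, any node can act as the first server toward a peer while also acting as the second server when that peer monitors it.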
203. If the operating status information of the first service on the second server indicates that the operating status is abnormal, the first server sends recovery prompt information, where the recovery prompt information is used to prompt recovery of the first service on the second server.
In a specific implementation, the first server can identify the operating status information of the first service on the second server and determine, according to the identification result, whether to send recovery prompt information. If the identification result is that the operating status information indicates an abnormal operating status, the first server sends recovery prompt information. If the identification result is that the operating status information indicates a normal operating status, the first server does not need to send recovery prompt information.
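The decision in step 203 reduces to a simple branch per service. The sketch below assumes the status information has already been parsed into a mapping from service name to status string, and that `"NORMAL"` marks a normal operating status; both the representation and the function name are hypothetical.

```python
def handle_status_info(statuses, send_recovery_prompt, normal_value="NORMAL"):
    # Send a recovery prompt only for services whose status is not normal;
    # services reported as normal are left alone.
    abnormal = sorted(
        name for name, status in statuses.items() if status != normal_value
    )
    for service_name in abnormal:
        send_recovery_prompt(service_name)
    return abnormal
```

Passing the prompt sender in as a callback keeps the decision independent of how the prompt is delivered, whether as an operation instruction to the second server or as an alarm notification.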
In this embodiment, the recovery prompt information sent by the first server is used to prompt recovery of the first service that is running abnormally on the second server.
In some implementations of this embodiment, the first server can send the recovery prompt information by sending an operation instruction to the second server. In this case, the recovery prompt information is specifically an operation instruction sent to the second server, the operation instruction is specifically used to trigger the second server to perform an exception-handling operation for the first service, and the exception-handling operation is specifically used to restore the first service to a normal operating status on the second server.
It can be understood that different exception-handling operations can be used for the different abnormal operating statuses that occur in different services.
For example, if the abnormal operating status of the first service can be cleared by restarting the service, the exception-handling operation can be restarting the first service on the second server, the operation instruction can be a restart command that instructs the second server to restart the first service, and the recovery prompt information can be a restart command message that carries the restart command. The restart command message may, for example, include the function identifier of the restart command and the identifier of the first service. In one example, the restart command message may use an XML format. For example, the following is a sample format of a restart command message: "000XXX<Monitor><Command>Restart_Service</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><ServiceType>CacheManager</ServiceType></Params></Monitor>". Here, the "Command" field carries the function identifier of the restart command, namely "Restart_Service"; the "ServiceType" field carries the identifier of the first service, namely "CacheManager".
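A restart command message in the shape of the sample above could be assembled as in the following sketch. It assumes consistently capitalized element names and copies the `000XXX` prefix verbatim as an opaque placeholder; the function name and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_restart_message(monitor_server_id, service_type, prefix="000XXX"):
    # Command identifies the restart function; ServiceType names the service
    # that the monitored server is asked to restart.
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "Restart_Service"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    ET.SubElement(params, "ServiceType").text = service_type
    return prefix + ET.tostring(root, encoding="unicode")
```

Building the message through an XML library rather than string concatenation keeps the tags balanced and escapes any special characters in the identifiers.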
As another example, if the abnormal operating status of the first service is that its cached information is incorrect, the exception handling operation may be the second server updating the data of the first service from the database of the cluster system into memory, the operation command may be a cache refresh command that instructs the second server to update the data of the first service from the database into memory, and the recovery prompt information may be a cache refresh command message that carries the cache refresh command. The cache refresh command message may include, for example, the function identifier of the cache refresh command and the identifier of the first service. In one example, the cache refresh command message uses XML format. The following is a sample cache refresh command message: "000XXX<Tulip><Command>Refresh</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><CacheType>CTGConnectionPoolConfCache</CacheType></Params></Tulip>". Here, the "Command" field carries the function identifier of the cache refresh command, namely "Refresh", and the "CacheType" field carries the identifier of the first service, namely "CTGConnectionPoolConfCache".
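On the receiving side, the second server has to read the "Command" field and carry out the matching exception handling operation. The Python sketch below shows one possible way to do this. The names `parse_command_message`, `handle_command`, and `HANDLERS` are hypothetical, the handlers only return a description instead of restarting or refreshing anything, and everything before the first "<" is assumed to be an opaque header.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def parse_command_message(message: str) -> Dict[str, object]:
    """Split a command message into its command name and parameters."""
    # Assumption: everything before the first "<" is an opaque header
    # (such as "000XXX") whose meaning the text does not specify.
    root = ET.fromstring(message[message.index("<"):])
    params_node = root.find("Params")
    params = {} if params_node is None else {
        child.tag: (child.text or "") for child in params_node
    }
    return {"command": root.findtext("Command"), "params": params}


def handle_command(message: str,
                   handlers: Dict[str, Callable[[Dict[str, str]], str]]) -> str:
    """Dispatch the message to the handler registered for its command."""
    parsed = parse_command_message(message)
    handler = handlers.get(parsed["command"])
    if handler is None:
        raise ValueError(f"unsupported command: {parsed['command']!r}")
    return handler(parsed["params"])


# Hypothetical handlers: a real server would restart the service or
# reload the cache from the database here.
HANDLERS = {
    "Restart_Service": lambda p: f"restart {p['ServiceType']}",
    "Refresh": lambda p: f"refresh {p['CacheType']}",
}

refresh_message = (
    "000XXX<Tulip><Command>Refresh</Command><Params>"
    "<MonitorServerID>CathayServer11</MonitorServerID>"
    "<CacheType>CTGConnectionPoolConfCache</CacheType>"
    "</Params></Tulip>"
)
result = handle_command(refresh_message, HANDLERS)
```

Keeping the handlers in a table keyed by the function identifier means a further exception handling operation can be supported by registering one more entry.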
In other implementations of this embodiment, the first server may send the recovery prompt information by triggering an SMS platform to send an alarm SMS to a designated user. In this case, the recovery prompt information is specifically an SMS alarm notification sent to the SMS platform. The SMS alarm notification triggers the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS indicates that the first service on the second server is in an abnormal operating status, so that the user can resolve the abnormal operating status.
In addition, after the first server sends the first monitoring message to the second server, if the first server does not receive the first response message that the second server returns for the first monitoring message, the first server may also send an SMS alarm notification to the SMS platform, triggering the SMS platform to send an alarm SMS to the pre-designated user to indicate that the second server is abnormal.
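The no-response case can be sketched in Python as follows. This is a minimal illustration under assumptions: `monitor_once` is a hypothetical name, the network call and the SMS platform are passed in as plain callables, and a missing response is modeled either as a `TimeoutError` or as a `None` return value.

```python
from typing import Callable, Optional


def monitor_once(send_request: Callable[[], Optional[str]],
                 send_sms_alarm: Callable[[str], None],
                 server_id: str) -> bool:
    """Send one monitoring message; raise an SMS alarm if nothing comes back.

    Returns True when a response arrived and False when an alarm was sent.
    """
    try:
        response = send_request()
    except TimeoutError:
        # Assumption: the transport signals a missing response by timing out.
        response = None
    if response is None:
        send_sms_alarm(f"No response from {server_id}: the server may be abnormal")
        return False
    return True
```

Passing the transport and the alarm channel in as parameters keeps the decision logic independent of any particular network library or SMS gateway.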
It should be noted that, in some implementations of this embodiment, each of the multiple servers in the cluster system that provide services externally may monitor the operating status of services on the other servers. In this case, the same abnormal operating status of the same service on the same server may be detected by several other servers. To prevent several servers from all handling the same abnormal operating status, a data table may be kept in the database of the cluster system to record the abnormal operating statuses that the servers detect. A server handles a service in an abnormal operating status by saving exception service information to this data table and then querying it. Specifically, step 203 may include, for example: if the operating status information of the first service on the second server indicates an abnormal operating status, the first server records exception service information in a second data table of the database of the cluster system; the first server queries the second data table and, provided the exception service information was successfully recorded in the second data table, finds the exception service information; and the first server sends the recovery prompt information as indicated by the exception service information. The exception service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier. The monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server. The exception service information may further include the name of the abnormal service, the name of the refresh service, the time at which the record was written to the second data table, and so on. In the second data table, the primary key of the exception service information may be its monitored node identifier together with its service type identifier, so that at most one record can exist for a given service on a given server.
It can be understood that the monitoring node identifier in the exception service information indicates the server that is to handle that exception service information. When the first server finds a piece of exception service information in the second data table, it can check the monitoring node identifier in that information. If the monitoring node identifier is the first server's own identifier, the first server handles the exception service information; if it is not, the first server does not handle it.
Furthermore, after step 203 has been executed, if the first service on the second server returns to a normal operating status, the first server may delete the exception service information from the second data table.
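The record, query, and delete sequence described above can be sketched with an in-process SQLite database. This is an assumption-laden illustration rather than the patented implementation: the table and column names are hypothetical, and SQLite's `INSERT OR IGNORE` stands in for whatever mechanism the cluster database uses to reject a second record with the same primary key.

```python
import sqlite3


def open_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) second data table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS abnormal_service (
               monitored_node_id TEXT NOT NULL,
               service_type_id   TEXT NOT NULL,
               monitor_node_id   TEXT NOT NULL,
               recorded_at       TEXT DEFAULT CURRENT_TIMESTAMP,
               PRIMARY KEY (monitored_node_id, service_type_id))"""
    )


def claim_abnormal_service(conn, monitored: str, service: str, me: str) -> bool:
    """Record the abnormal service, then check who owns the record.

    Returns True only if the stored monitoring node identifier is `me`,
    meaning this server should send the recovery prompt information.
    """
    # The primary key lets only the first insert for this service succeed.
    conn.execute(
        "INSERT OR IGNORE INTO abnormal_service "
        "(monitored_node_id, service_type_id, monitor_node_id) VALUES (?, ?, ?)",
        (monitored, service, me),
    )
    conn.commit()
    row = conn.execute(
        "SELECT monitor_node_id FROM abnormal_service "
        "WHERE monitored_node_id = ? AND service_type_id = ?",
        (monitored, service),
    ).fetchone()
    return row is not None and row[0] == me


def clear_abnormal_service(conn, monitored: str, service: str, me: str) -> None:
    """Delete the record once the service is back to normal."""
    conn.execute(
        "DELETE FROM abnormal_service WHERE monitored_node_id = ? "
        "AND service_type_id = ? AND monitor_node_id = ?",
        (monitored, service, me),
    )
    conn.commit()
```

With this layout, if two servers detect the same abnormal service, only the one whose insert succeeded finds its own identifier in the record, and the other one leaves the handling to it.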
It should be noted that this embodiment may be applied, for example, in the network scenario shown in Fig. 3. In this network scenario, the cluster system includes multiple servers that provide services externally, namely "node 1", "node 2", "node 3", and "node 4" shown in Fig. 3. Each of these servers can act as the first server of this embodiment and monitor the operating status of services on the other servers, and can also act as the second server of this embodiment and have the operating status of its services monitored by the other servers. "DB" shown in Fig. 3 is the database described above in this embodiment, and the "SMS alarm platform" shown in Fig. 3 is the SMS platform described above in this embodiment.
With the implementations provided in this embodiment, among the multiple servers in the cluster system that provide services externally, one server can monitor the operating status of services on another server. Because the monitoring is done by a server that already provides services externally, the cluster system does not need to deploy a separate hardware device to monitor service operating status, and no additional software or hardware resources are consumed by newly deployed hardware, which reduces the resources the cluster system occupies. In addition, each of these servers can monitor the operating status of services on the other servers. Because the servers monitor one another, even if one or several of them become abnormal, the remaining servers in the cluster system can continue to monitor the service operating status of all servers, so that services in an abnormal operating status can be restored and service operating status is monitored continuously and stably.
Referring to Fig. 4, a schematic structural diagram of a device for monitoring the operating status of a service in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the device 400 may be configured on a first server. The device 400 may, for example, specifically include:
a first transmission unit 401, configured to send a first monitoring message to a second server, where the first monitoring message is used to request the operating status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit 402, configured to receive a first response message that the second server returns for the first monitoring message, where the first response message carries operating status information of the first service on the second server;
a second transmission unit 403, configured to send recovery prompt information if the operating status information of the first service on the second server indicates an abnormal operating status, where the recovery prompt information is used to prompt restoration of the first service on the second server.
Optionally, in some implementations of this embodiment, the recovery prompt information may be, for example, an operation command sent to the second server. The operation command may be used to trigger the second server to perform an exception handling operation for the first service, and the exception handling operation may be used to restore the first service on the second server to a normal operating status.
Optionally, in other implementations of this embodiment, the exception handling operation may be, for example, restarting the first service on the second server, or the second server updating data from the database of the cluster system into memory.
Optionally, in still other implementations of this embodiment, the recovery prompt information may be, for example, an SMS alarm notification sent to an SMS platform. The SMS alarm notification may be used to trigger the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS may be used to indicate that the first service on the second server is in an abnormal operating status.
Optionally, in still other implementations of this embodiment, the first transmission unit 401 may be specifically configured to send the first monitoring message to the second server by polling, according to a preset monitoring period.
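Polling at a preset monitoring period can be pictured with the short Python sketch below. It rests on assumptions: `poll_servers` is a hypothetical name, the sending routine is injected as a callable, and the number of rounds is bounded so that the example terminates, whereas a real monitor would presumably loop until shut down.

```python
import time
from typing import Callable, Sequence


def poll_servers(server_ids: Sequence[str],
                 send_monitoring_message: Callable[[str], None],
                 period_seconds: float,
                 rounds: int,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Send a monitoring message to every server once per monitoring period."""
    for round_index in range(rounds):
        for server_id in server_ids:
            send_monitoring_message(server_id)
        # Wait one period between rounds, but not after the last round.
        if round_index < rounds - 1:
            sleep(period_seconds)
```

The `sleep` parameter exists only so that the loop can be exercised without real waiting, for example by passing a function that records the requested delays.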
Optionally, in still other implementations of this embodiment, a first program and a second program may be installed on the first server, and both programs may reuse the communication function of the first server itself;
the first transmission unit 401 may be specifically configured to send the first monitoring message to the second server through the first program;
the first receiving unit 402 may be specifically configured to receive, through the second program, the first response message that the second server returns for the first monitoring message.
Optionally, in still other implementations of this embodiment, the device 400 may further include, for example:
a second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message is used to request the operating status of a second service on the first server;
a generation unit, configured to generate a second response message based on operating status information of the second service on the first server;
a return unit, configured to return the second response message to the second server for the second monitoring message.
Optionally, in still other implementations of this embodiment, a first data table may be stored on the first server, and the first data table may be used to record the current operating status information of each service on the first server;
the generation unit may be specifically configured to generate the second response message based on the operating status information currently recorded in the first data table.
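As a rough illustration of generating a response from a locally kept status table, consider the Python sketch below. The response format is not specified in this section, so the element names "MonitorResponse", "ServiceType", and "Status", the status strings, and the dictionary standing in for the first data table are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Stand-in for the first data table: service identifier -> current status.
first_data_table: Dict[str, str] = {
    "CacheManager": "normal",
    "OrderService": "abnormal",  # hypothetical second entry
}


def build_response_message(status_table: Dict[str, str], service_type: str) -> str:
    """Answer a monitoring message from the currently recorded status."""
    # A service without a record is reported as "unknown" (an assumption).
    status = status_table.get(service_type, "unknown")
    root = ET.Element("MonitorResponse")
    ET.SubElement(root, "ServiceType").text = service_type
    ET.SubElement(root, "Status").text = status
    return ET.tostring(root, encoding="unicode")


response = build_response_message(first_data_table, "CacheManager")
```

Answering from the table means the responder does not have to probe the service at the moment a monitoring message arrives; it only reports what has already been recorded.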
Optionally, in still other implementations of this embodiment, the second transmission unit 403 may be specifically configured to:
if the operating status information of the first service on the second server indicates an abnormal operating status, record exception service information in a second data table of the database of the cluster system, where the exception service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
query the second data table and, provided the exception service information was successfully recorded in the second data table, find the exception service information;
send the recovery prompt information as indicated by the exception service information.
Optionally, in still other implementations of this embodiment, the device 400 may further include, for example:
a deletion unit, configured to delete the exception service information from the second data table after the second transmission unit has sent the recovery prompt information and the first service has returned to a normal operating status on the second server.
Referring to Fig. 5, a schematic structural diagram of a system for monitoring the operating status of a service in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the system may, for example, specifically include a first server 501 and a second server 502, where the first server 501 is configured with the device of any one of the implementations of the embodiment shown in Fig. 4.
In the embodiments of the present invention, the word "first" in names such as "first server", "first service", "first monitoring message", "first response message", and "first data table" serves only as a naming label and does not indicate first in order. The same rule applies to "second" and so on.
From the above description of the embodiments, those skilled in the art can clearly understand that all or some of the steps of the methods in the above embodiments can be implemented by software plus a general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For the parts that are the same or similar among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the other embodiments. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiment; for related details, refer to the corresponding description of the method embodiment. The device and system embodiments described above are merely illustrative. Modules described as separate parts may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. It should be noted that those of ordinary skill in the art can make a number of improvements and refinements without departing from the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.
Claims (11)
1. A method for monitoring the operating status of a service in a cluster system, characterized in that the method comprises:
a first server sending a first monitoring message to a second server, the first monitoring message being used to request the operating status of a first service on the second server, the first server and the second server being any two servers in the cluster system that provide services externally;
the first server receiving a first response message returned by the second server for the first monitoring message, the first response message carrying operating status information of the first service on the second server;
if the operating status information of the first service on the second server indicates an abnormal operating status, the first server sending recovery prompt information, the recovery prompt information being used to prompt restoration of the first service on the second server;
the first server receiving a second monitoring message sent by the second server, the second monitoring message being used to request the operating status of a second service on the first server;
the first server generating a second response message based on operating status information of the second service on the first server, and returning the second response message to the second server for the second monitoring message.
2. The method according to claim 1, characterized in that the recovery prompt information is an operation command sent to the second server, the operation command is used to trigger the second server to perform an exception handling operation for the first service, and the exception handling operation is used to restore the first service on the second server to a normal operating status.
3. The method according to claim 2, characterized in that the exception handling operation is restarting the first service on the second server, or the exception handling operation is the second server updating data from the database of the cluster system into memory.
4. The method according to claim 1, characterized in that the recovery prompt information is an SMS alarm notification sent to an SMS platform, the SMS alarm notification is used to trigger the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS is used to indicate that the first service on the second server is in an abnormal operating status.
5. The method according to claim 1, characterized in that the first server sending the first monitoring message to the second server is specifically:
the first server sending the first monitoring message to the second server by polling, according to a preset monitoring period.
6. The method according to claim 1, characterized in that a first program and a second program are installed on the first server, and the first program and the second program are both able to reuse the communication function of the first server itself;
the first server sending the first monitoring message to the second server is specifically: the first server sending the first monitoring message to the second server through the first program;
the first server receiving the first response message returned by the second server for the monitoring message is specifically: the first server receiving, through the second program, the second response message returned by the second server for the monitoring message.
7. The method according to claim 1, characterized in that a first data table is stored on the first server, and the first data table is used to record the current operating status information of each service on the first server;
the first server generating the second response message based on the operating status information of the second service on the first server is specifically: the first server generating the second response message based on the operating status information currently recorded in the first data table.
8. The method according to claim 1, characterized in that, if the operating status information of the first service indicates an abnormal operating status, the first server sending the recovery prompt information comprises:
if the operating status information of the first service on the second server indicates an abnormal operating status, the first server recording exception service information in a second data table of the database of the cluster system, the exception service information including a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier being the identifier of the second server, the service type identifier being the identifier of the first service, and the monitoring node identifier being the identifier of the first server;
the first server querying the second data table and, provided the exception service information was successfully recorded in the second data table, finding the exception service information;
the first server sending the recovery prompt information as indicated by the exception service information.
9. The method according to claim 8, characterized in that, after the first server sends the recovery prompt information, the method further comprises:
after the first service returns to a normal operating status on the second server, the first server deleting the exception service information from the second data table.
10. A device for monitoring the operating status of a service in a cluster system, characterized in that the device is configured on a first server and comprises:
a first transmission unit, configured to send a first monitoring message to a second server, the first monitoring message being used to request the operating status of a first service on the second server, the first server and the second server being any two servers in the cluster system that provide services externally;
a first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, the first response message carrying operating status information of the first service on the second server;
a second transmission unit, configured to send recovery prompt information if the operating status information of the first service on the second server indicates an abnormal operating status, the recovery prompt information being used to prompt restoration of the first service on the second server;
a second receiving unit, configured to receive a second monitoring message sent by the second server, the second monitoring message being used to request the operating status of a second service on the first server;
a generation unit, configured to generate a second response message based on operating status information of the second service on the first server;
a return unit, configured to return the second response message to the second server for the second monitoring message.
11. A system for monitoring the operating status of a service in a cluster system, characterized in that the system comprises a first server and a second server, the first server being configured with the device according to claim 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610311715.8A CN105978721B (en) | 2016-05-11 | 2016-05-11 | The methods, devices and systems of monitoring service operating status in a kind of group system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105978721A CN105978721A (en) | 2016-09-28 |
CN105978721B true CN105978721B (en) | 2019-04-12 |
Family
ID=56993003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610311715.8A Active CN105978721B (en) | 2016-05-11 | 2016-05-11 | The methods, devices and systems of monitoring service operating status in a kind of group system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105978721B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106161090A (en) * | 2016-07-12 | 2016-11-23 | 许继集团有限公司 | The monitoring method of a kind of subregion group system and device |
CN106713007A (en) * | 2016-11-15 | 2017-05-24 | 郑州云海信息技术有限公司 | Alarm monitoring system and alarm monitoring method and device for server |
CN107257384B (en) * | 2017-07-24 | 2021-08-17 | 北京小米移动软件有限公司 | Service state monitoring method and device |
CN109828883B (en) * | 2017-11-23 | 2023-03-17 | 腾讯科技(北京)有限公司 | Task data processing method and device, storage medium and electronic device |
CN109361525B (en) * | 2018-10-25 | 2021-08-13 | 珠海派诺科技股份有限公司 | Method, device, control terminal and medium for restarting distributed deployment of multiple services |
CN110531988B (en) * | 2019-08-06 | 2023-06-06 | 新华三大数据技术有限公司 | Application program state prediction method and related device |
CN110445650B (en) * | 2019-08-07 | 2022-06-10 | 中国联合网络通信集团有限公司 | Detection alarm method, equipment and server |
CN111565135A (en) * | 2020-04-30 | 2020-08-21 | 吉林省鑫泽网络技术有限公司 | Method for monitoring operation of server, monitoring server and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075919A (en) * | 2006-06-22 | 2007-11-21 | 腾讯科技(深圳)有限公司 | Method and system for monitoring Internet service |
CN101207519A (en) * | 2007-12-13 | 2008-06-25 | 上海华为技术有限公司 | Version server, operation maintenance unit and method for restoring failure |
CN102231681A (en) * | 2011-06-27 | 2011-11-02 | 中国建设银行股份有限公司 | High availability cluster computer system and fault treatment method thereof |
CN102291275A (en) * | 2011-08-01 | 2011-12-21 | 烟台杰瑞网络商贸有限公司 | Server cluster monitoring technology and method |
Also Published As
Publication number | Publication date |
---|---|
CN105978721A (en) | 2016-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |