CN105978721A - Method, device and system for monitoring operation state of services in clustering system - Google Patents
- Publication number: CN105978721A (application CN201610311715.8A)
- Authority: CN (China)
- Prior art keywords: server, service, message, monitoring, information
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method for monitoring the running state of services in a cluster system. In the method, a first server sends a first monitoring message to a second server, the first monitoring message requesting the running state of a first service on the second server, where the first server and the second server are any two servers in the cluster system that provide services externally; the first server receives a first response message returned by the second server in response to the first monitoring message, the first response message carrying running state information of the first service on the second server; and if that running state information indicates that the running state is abnormal, the first server sends recovery prompt information, which prompts recovery of the first service on the second server. The invention also discloses a device and a system for monitoring the running state of services in a cluster system.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device and system for monitoring the running state of services in a cluster system.
Background
In a cluster system, multiple servers can be grouped together to run the same one or more services concurrently. Because the same service is shared across multiple servers, the service capacity of the cluster system is greatly increased. Its performance can therefore rival that of a mainframe, while its cost is far lower. For these reasons, cluster systems are now widely used.
In a cluster system, one or more services run on each server. If a service on some server runs abnormally, the normal operation of that service in the cluster system is affected, and the cluster system can no longer guarantee that it will keep providing the service externally in a stable manner. The cluster system therefore needs to monitor the running state of every service provided by every server. By monitoring running states, the cluster system can recover a service whose running state is abnormal on any server, and so keep providing services externally in a continuous and stable manner.
In the prior art, besides the servers that provide services externally, a cluster system also includes a monitoring device that monitors the running state of services on those servers. The monitoring device is independent of the other servers and does not itself provide services externally. By monitoring the services running on each server, this independent device can detect an abnormality on any server in the cluster system and recover from it, so that the cluster system keeps providing services stably. However, the monitoring device is a separate hardware device deployed in the cluster system, and deploying an additional hardware device requires the cluster system to consume additional software and hardware resources. In other words, the cluster system has to consume extra software and hardware resources just to monitor the running state of services.
Summary of the invention
The technical problem to be solved by this application is to provide a method, device and system for monitoring the running state of services in a cluster system, so that the running state of services in the cluster system can be monitored continuously and stably without the cluster system consuming extra software and hardware resources for that purpose. This improves the stability and reliability of monitoring and also saves system resources.
In a first aspect, a method for monitoring the running state of services in a cluster system is provided. The method includes:
A first server sends a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, where the first server and the second server are any two servers in the cluster system that provide services externally;
The first server receives a first response message returned by the second server in response to the first monitoring message, the first response message carrying running state information of the first service on the second server;
If the running state information of the first service on the second server indicates that the running state is abnormal, the first server sends recovery prompt information, the recovery prompt information being used to prompt recovery of the first service on the second server.
Optionally, the recovery prompt information is an operation instruction to be sent to the second server. The operation instruction triggers the second server to perform an abnormality-handling operation on the first service, and the abnormality-handling operation restores the first service to a normal running state on the second server.
Optionally, the abnormality-handling operation is restarting the first service on the second server, or the abnormality-handling operation is the second server updating data from the database of the cluster system into memory.
Optionally, the recovery prompt information is an SMS alarm notification to be sent to an SMS platform. The SMS alarm notification triggers the SMS platform to send an alarm SMS to a pre-designated user, and the alarm SMS indicates that the first service on the second server is in an abnormal running state.
Optionally, the first server sending the first monitoring message to the second server is specifically:
The first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
Optionally, a first program and a second program are installed on the first server, and both the first program and the second program can reuse the communication function of the first server itself;
The first server sending the first monitoring message to the second server is specifically: the first server sends the first monitoring message to the second server through the first program;
The first server receiving the first response message returned by the second server in response to the monitoring message is specifically: the first server receives, through the second program, the first response message returned by the second server in response to the monitoring message.
Optionally, the method further includes:
The first server receives a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server;
The first server generates a second response message based on the running state information of the second service on the first server, and returns the second response message to the second server in response to the second monitoring message.
Optionally, a first data table is stored on the first server, the first data table being used to record the current running state information of each service on the first server;
The first server generating the second response message based on the running state information of the second service on the first server is specifically: the first server generates the second response message based on the running state information currently recorded in the first data table.
Optionally, the first server sending the recovery prompt information if the running state information of the first service indicates that the running state is abnormal includes:
If the running state information of the first service on the second server indicates that the running state is abnormal, the first server records abnormal-service information in a second data table in the database of the cluster system. The abnormal-service information includes a monitored-node identifier, a service-type identifier and a monitor-node identifier, where the monitored-node identifier is the identifier of the second server, the service-type identifier is the identifier of the first service, and the monitor-node identifier is the identifier of the first server;
The first server queries the second data table and, provided that the abnormal-service information was successfully recorded in the second data table, finds the abnormal-service information;
The first server sends the recovery prompt information as indicated by the abnormal-service information.
Optionally, after the first server sends the recovery prompt information, the method further includes:
After the first service has returned to a normal running state on the second server, the first server deletes the abnormal-service information from the second data table.
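The record, query, send and delete sequence described above can be sketched with an in-memory SQLite table. This is only an illustrative sketch under stated assumptions: the table name `abnormal_service`, the column names, and the use of a primary key with `INSERT OR IGNORE` (so that only the first monitor node to record an abnormality goes on to send the prompt) are hypothetical and do not come from the text.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for the "second data table" of abnormal-service records.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS abnormal_service (
               monitored_node TEXT NOT NULL,
               service_type   TEXT NOT NULL,
               monitor_node   TEXT NOT NULL,
               PRIMARY KEY (monitored_node, service_type)
           )"""
    )


def record_and_confirm(conn: sqlite3.Connection, monitored_node: str,
                       service_type: str, monitor_node: str) -> bool:
    """Record the abnormality, then query it back.

    Returns True only if the stored record names this monitor node, in which
    case the caller would go on to send the recovery prompt information.
    """
    conn.execute(
        "INSERT OR IGNORE INTO abnormal_service VALUES (?, ?, ?)",
        (monitored_node, service_type, monitor_node),
    )
    conn.commit()
    row = conn.execute(
        "SELECT monitor_node FROM abnormal_service "
        "WHERE monitored_node = ? AND service_type = ?",
        (monitored_node, service_type),
    ).fetchone()
    return row is not None and row[0] == monitor_node


def clear_record(conn: sqlite3.Connection, monitored_node: str,
                 service_type: str) -> None:
    """Delete the record once the service is back to a normal running state."""
    conn.execute(
        "DELETE FROM abnormal_service WHERE monitored_node = ? AND service_type = ?",
        (monitored_node, service_type),
    )
    conn.commit()
```

In this sketch a second monitor node that detects the same abnormality gets `False` back and would not send a duplicate prompt; whether the patent intends that deduplication is not stated, so treat it as one possible reading.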
In a second aspect, a device for monitoring the running state of services in a cluster system is provided. The device is configured on a first server and includes:
A first sending unit, configured to send a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, where the first server and the second server are any two servers in the cluster system that provide services externally;
A first receiving unit, configured to receive a first response message returned by the second server in response to the first monitoring message, the first response message carrying running state information of the first service on the second server;
A second sending unit, configured to send recovery prompt information if the running state information of the first service on the second server indicates that the running state is abnormal, the recovery prompt information being used to prompt recovery of the first service on the second server.
Optionally, the device further includes:
A second receiving unit, configured to receive a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server;
A generating unit, configured to generate a second response message based on the running state information of the second service on the first server;
A returning unit, configured to return the second response message to the second server in response to the second monitoring message.
In a third aspect, a system for monitoring the running state of services in a cluster system is provided. The system includes a first server and a second server, and the first server is configured with the device of any one of the foregoing implementations.
According to the implementations provided in this application, among the multiple servers in a cluster system that provide services externally, one server can be used to monitor the running state of services on another. Specifically, assume that a first server and a second server are any two servers in the cluster system that provide services externally. The first server can send a first monitoring message to the second server, so that the second server returns a first response message carrying running state information of a first service on the second server. From this running state information, the first server can determine whether the first service is in an abnormal running state on the second server and, if it is, send recovery prompt information to prompt recovery of the first service on the second server. Because one service-providing server monitors the service running state on another, the cluster system does not need to deploy a separate hardware device to monitor the running state of services, and does not need to consume extra software and hardware resources for a newly deployed hardware device. The resources occupied by the cluster system are thereby reduced.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of the framework of a network system involved in an application scenario of an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for monitoring the running state of services in a cluster system in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an example network scenario in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for monitoring the running state of services in a cluster system in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a system for monitoring the running state of services in a cluster system in an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The inventors found through research that, in the prior art, because the cluster system deploys a separate hardware device as a monitoring device to monitor the service running state on each service-providing server, the cluster system has to consume extra software and hardware resources to monitor the running state of services. Moreover, if the monitoring device itself becomes abnormal, the service running state on every server in the cluster system can no longer be monitored, and services whose running state is abnormal cannot be restored to normal. An independent monitoring device therefore struggles to monitor the service running state in the cluster system continuously and stably.
To solve these problems of the prior art, in the embodiments of the present invention, among the multiple servers in the cluster system that provide services externally, one server can be used to monitor the running state of services on another. Because service-providing servers monitor the service running state on other servers, the cluster system does not need to deploy a separate hardware device to monitor the running state of services, and does not need to consume extra software and hardware resources for a newly deployed hardware device. The resources occupied by the cluster system are thereby reduced. In addition, each of the service-providing servers in the cluster system can monitor the running state of services on the other servers. Because the service-providing servers monitor one another, even if one or several of them become abnormal, the other servers in the cluster system can continue to monitor the service running state of all servers, so that services whose running state is abnormal can be restored to normal and the service running state can be monitored continuously and stably.
For example, one scenario of the embodiments of the present invention can be applied to the network system shown in Fig. 1. In this network system, server 101 and server 102 are any two servers in cluster system 103 that provide services externally. First, server 101 can send a monitoring message to server 102, the monitoring message being used to request the running state of a service on server 102. Then, server 101 receives a response message returned by server 102 in response to the monitoring message, the response message carrying running state information of the service on server 102. If the running state information of the service on server 102 indicates that the running state is abnormal, server 101 can send recovery prompt information, which can be used to prompt recovery of the service on server 102.
It should be understood that the above scenario is only an example scenario provided by the embodiments of the present invention, and the embodiments of the present invention are not limited to it.
The specific implementations of ... in the embodiments of the present invention are described in detail below by way of embodiments, with reference to the accompanying drawings.
Referring to Fig. 2, a schematic flowchart of a method for monitoring the running state of services in a cluster system in an embodiment of the present invention is shown. In this embodiment, the method may, for example, specifically include:
201. A first server sends a first monitoring message to a second server, the first monitoring message being used to request the running state of a first service on the second server, where the first server and the second server are any two servers in the cluster system that provide services externally.
In a specific implementation, if the first server needs to monitor the service running state of the second server, the first server can send to the second server a first monitoring message requesting the running state of the first service on the second server, so that the second server returns the running state information of the first service on the second server.
The first monitoring message may include, for example, a function identifier for collecting monitoring information and the identifier of the first server. In one example, the first monitoring message may use an XML format. The following is a sample monitoring message: "000XXX<Monitor><Command>InformationCollection</Command><Params><MonitorServerID>CathayServer11</MonitorServerID></Params></Monitor>". Here, the "Command" field carries the identifier of the monitoring-information collection function, namely "InformationCollection", and the "MonitorServerID" field carries the identifier of the first server, namely "CathayServer11".
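As a rough illustration of the sample format above, the following sketch builds and parses such a monitoring message with Python's standard `xml.etree.ElementTree` module. The function names are hypothetical, and because the text does not explain the leading "000XXX", the sketch simply treats it as an opaque six-character prefix.

```python
import xml.etree.ElementTree as ET


def build_monitor_message(monitor_server_id: str, prefix: str = "000XXX") -> str:
    """Build a monitoring message shaped like the sample in the text."""
    root = ET.Element("Monitor")
    ET.SubElement(root, "Command").text = "InformationCollection"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    # Assumption: the prefix is passed through unchanged; its meaning is not
    # stated in the source text.
    return prefix + ET.tostring(root, encoding="unicode")


def parse_monitor_message(message: str, prefix_len: int = 6) -> dict:
    """Extract the command and the monitoring server's identifier."""
    root = ET.fromstring(message[prefix_len:])
    return {
        "command": root.findtext("Command"),
        "monitor_server_id": root.findtext("Params/MonitorServerID"),
    }
```

Calling `build_monitor_message("CathayServer11")` yields the same string as the sample, and `parse_monitor_message` recovers the two fields the second server would need in order to reply.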
It should be understood that the first server is the device in the cluster system that performs the monitoring function, that is, it monitors the service running state on other servers in the cluster system. In this embodiment, the first server represents any server in the cluster system that provides services externally. Further, every service-providing server in the cluster system can be used to monitor the servers other than itself; that is, every service-providing server in the cluster system can be the first server described in this embodiment.
It should be noted that the second server is the device in the cluster system that is monitored by the first server. In this embodiment, the second server can be any service-providing server in the cluster system other than the first server. Further, every service-providing server in the cluster system other than the first server can have its service running state monitored by the first server; that is, every such server can be the second server described in this embodiment.
In addition, the first service can be any service running on the second server. Further, the running state of any service running on the second server can be monitored by the first server; that is, every service running on the second server can be the first service described in this embodiment.
In some implementations of this embodiment, the first server may, for example, send the first monitoring message to all servers in the cluster system other than itself. Any server that receives the first monitoring message acts as a second server, obtains the health information of all tasks running on it and feeds that information back to the first server. In this implementation, by sending the monitoring message once, the first server can obtain the running state of all services on all other servers in the cluster system.
In some implementations of this embodiment, to keep monitoring continuous and stable, the first server can send the monitoring message periodically, so as to periodically obtain the current service running state of the second server. Specifically, in this embodiment, step 201 may, for example, be: the first server sends the first monitoring message to the second server by polling, according to a preset monitoring period.
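The periodic polling described above could look like the following sketch. The function and parameter names are illustrative, the transport is abstracted behind a caller-supplied `send_monitor_message` callback, and a fixed number of rounds is used only so that the example terminates; a real monitor would presumably loop until shut down.

```python
import time
from typing import Callable, Sequence


def poll_peers(
    peers: Sequence[str],
    send_monitor_message: Callable[[str], None],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send one monitoring message to every peer per round.

    `interval_seconds` plays the role of the preset monitoring period.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for round_index in range(rounds):
        for peer in peers:
            send_monitor_message(peer)
        # Wait one monitoring period before the next round, but not after the last.
        if round_index < rounds - 1:
            sleep(interval_seconds)
```

Passing every other server in the cluster as `peers` corresponds to the implementation in which one round of messages collects the state of all services on all other servers.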
In some implementations of this embodiment, to deploy the function of sending monitoring messages on the first server by means of its existing communication function, a first program that can reuse the existing communication function of the first server itself can be installed on the first server, so that the first server can send monitoring messages through the first program. Specifically, step 201 may, for example, be: the first server sends the first monitoring message to the second server through the first program, where the first program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the first program can implement the sending of the first monitoring message by directly reusing the MINA framework on the first server.
202. The first server receives a first response message returned by the second server in response to the first monitoring message, the first response message carrying running state information of the first service on the second server.
In a specific implementation, in response to receiving the first monitoring message sent by the first server, the second server can generate the first response message based on the running state information of the first service on the second server and return it to the first server in response to the first monitoring message, so that the first server obtains the running state information of the first service on the second server by receiving the first response message.
More specifically, in some implementations of this embodiment, a data table may, for example, be stored on the second server. The data table may record, by means of real-time updates, the current running state information of each service on the second server. In response to receiving the first monitoring message, the second server can generate the first response message based on the running state information currently recorded in the data table. The first response message may, for example, carry all the running state information currently recorded in the data table, which includes the running state information of the first service.
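A minimal sketch of the second server's side follows, assuming an in-memory table keyed by service identifier. The text does not give the layout of the response message, so the `MonitorResponse`, `MonitoredServerID`, `Services`, `Service` and `State` element names, as well as the state labels, are invented here purely for illustration.

```python
import xml.etree.ElementTree as ET


class ServiceStateTable:
    """In-memory stand-in for the data table of current running states."""

    def __init__(self) -> None:
        self._states: dict[str, str] = {}

    def update(self, service_type: str, state: str) -> None:
        # Called whenever a service's running state changes (real-time update).
        self._states[service_type] = state

    def snapshot(self) -> dict[str, str]:
        return dict(self._states)


def build_response(server_id: str, table: ServiceStateTable) -> str:
    """Serialize every currently recorded state into one response message."""
    root = ET.Element("MonitorResponse")  # hypothetical element names throughout
    ET.SubElement(root, "MonitoredServerID").text = server_id
    services = ET.SubElement(root, "Services")
    for service_type, state in sorted(table.snapshot().items()):
        entry = ET.SubElement(services, "Service")
        ET.SubElement(entry, "ServiceType").text = service_type
        ET.SubElement(entry, "State").text = state
    return ET.tostring(root, encoding="unicode")
```

Because the response carries the whole table, the state of the first service is simply one `Service` entry among the others.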
In some implementations of this embodiment, to deploy the function of receiving response messages on the first server by means of its existing communication function, a second program that can reuse the existing communication function of the first server itself can be installed on the first server, and a listening interface can be set up for the second program, so that the first server can receive response messages through the listening interface of the second program. Specifically, step 202 may, for example, be: the first server receives, through the second program, the first response message returned by the second server in response to the monitoring message, where the second program can reuse the communication function of the first server itself. In a more specific example, the communication function of the first server itself may be provided by the asynchronous communication framework MINA, and the second program can implement the receiving of the first response message by directly reusing the MINA framework on the first server.
It should be understood that the first program used to send monitoring messages and the second program used to receive response messages can be the same application program or two different application programs.
In some implementations of this embodiment, in addition to the first server monitoring the service running state of the second server, the second server can also monitor the service running state of the first server; that is, different servers in the cluster system can monitor each other's service running state. Specifically, this embodiment may, for example, further include: the first server receives a second monitoring message sent by the second server, the second monitoring message being used to request the running state of a second service on the first server; the first server generates a second response message based on the running state information of the second service on the first server, and returns the second response message to the second server in response to the second monitoring message. More specifically, a first data table may, for example, be stored on the first server, the first data table being used to record the current running state information of each service on the first server. The first server may then generate the second response message as follows: the first server generates the second response message based on the running state information currently recorded in the first data table.
203. If the running state information of the first service on the second server indicates that the running state is abnormal, the first server sends recovery prompt information, the recovery prompt information being used to prompt recovery of the first service on the second server.
In a specific implementation, the first server can examine the running state information of the first service on the second server and decide, according to the result, whether to send the recovery prompt information. If the result is that the running state information indicates an abnormal running state, the first server sends the recovery prompt information. If the result is that the running state information indicates a normal running state, the first server need not send the recovery prompt information.
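The decision in step 203 can be sketched as a small function that inspects the reported states and triggers a prompt only for abnormal ones. The state label `"normal"`, the function name and the callback signature are assumptions made for this sketch; how the prompt is actually delivered is left to the caller.

```python
from typing import Callable, Mapping

NORMAL = "normal"  # hypothetical label for a healthy running state


def handle_response(
    monitored_server_id: str,
    states: Mapping[str, str],
    send_recovery_prompt: Callable[[str, str], None],
) -> list[str]:
    """Send a recovery prompt for each abnormal service; return their identifiers."""
    abnormal = []
    for service_type, state in states.items():
        if state != NORMAL:
            # Abnormal running state: prompt recovery of this service.
            send_recovery_prompt(monitored_server_id, service_type)
            abnormal.append(service_type)
        # Normal running state: nothing is sent.
    return abnormal
```

The callback could wrap either of the two delivery options the text describes, an operation instruction sent to the second server or an alarm notification sent to an SMS platform.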
In this embodiment, the recovery prompt information sent by the first server is used to prompt recovery of the abnormally running first service on the second server.
In some implementations of this embodiment, the first server can send the recovery prompt information by sending an operation instruction to the second server. In this case, the recovery prompt information is specifically an operation instruction to be sent to the second server. The operation instruction triggers the second server to perform an abnormality-handling operation on the first service, and the abnormality-handling operation restores the first service to a normal running state on the second server.
It should be understood that different abnormality-handling operations can be used for the different abnormal running states that occur in different services.
For example, if the abnormal running status of the first service can be eliminated by restarting the service, the exception handling operation may be restarting the first service on the second server. The operation instruction may then be a restart command that instructs the second server to restart the first service, and the recovery prompt information may be a restart command message that carries the restart command. The restart command message may, for example, include the function identifier of the restart command and the identifier of the first service, and may, for example, use an XML format. The following is an example format of a restart command message: "000XXX<Monitor><Command>Restart_Service</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><ServiceType>CacheManager</ServiceType></Params></Monitor>". Here, the "Command" field carries the function identifier of the restart command, namely "Restart_Service", and the "ServiceType" field carries the identifier of the first service, namely "CacheManager".
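As a rough illustration of the restart command message described above, the following Python sketch assembles the XML body with the standard library. The `000XXX` prefix is copied verbatim from the example and treated as an opaque header, because the text does not explain it. The function name `build_restart_message` and its parameter names are hypothetical and not part of the described scheme.

```python
import xml.etree.ElementTree as ET

# Opaque prefix copied from the example message; its meaning is not
# explained in the text, so it is passed through unchanged (assumption).
HEADER = "000XXX"


def build_restart_message(monitor_server_id: str, service_type: str) -> str:
    """Build a restart command message shaped like the example above."""
    root = ET.Element("Monitor")
    # "Command" carries the function identifier of the restart command.
    ET.SubElement(root, "Command").text = "Restart_Service"
    params = ET.SubElement(root, "Params")
    ET.SubElement(params, "MonitorServerID").text = monitor_server_id
    # "ServiceType" carries the identifier of the first service.
    ET.SubElement(params, "ServiceType").text = service_type
    return HEADER + ET.tostring(root, encoding="unicode")


message = build_restart_message("CathayServer11", "CacheManager")
```

Building the body from elements rather than by string concatenation keeps the opening and closing tags matched, which is easy to get wrong when the message is written by hand.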
As another example, if the abnormal running status of the first service is that cached information is incorrect, the exception handling operation may be the second server updating the data of the first service from the database of the cluster system into memory. The operation instruction may then be a cache refresh command that instructs the second server to update the data of the first service from the database into memory, and the recovery prompt information may be a cache refresh command message that carries the cache refresh command. The cache refresh command message may, for example, include the function identifier of the cache refresh command and the identifier of the first service, and may, for example, use an XML format. The following is an example format of a cache refresh command message: "000XXX<Tulip><Command>Refresh</Command><Params><MonitorServerID>CathayServer11</MonitorServerID><CacheType>CTGConnectionPoolConfCache</CacheType></Params></Tulip>". Here, the "Command" field carries the function identifier of the cache refresh command, namely "Refresh", and the "CacheType" field carries the identifier of the first service, namely "CTGConnectionPoolConfCache".
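On the receiving side, the second server has to map the function identifier in a command message to an exception handling operation. The following is a minimal Python sketch of that dispatch under stated assumptions: the six-character prefix is stripped as an opaque header, and the handlers `restart_service` and `refresh_cache` are hypothetical stand-ins that only record what they were asked to do.

```python
import xml.etree.ElementTree as ET

HEADER_LEN = 6  # length of the opaque "000XXX" prefix (assumption)
actions = []    # records the operations performed, for illustration only


def restart_service(params):
    # Hypothetical stand-in for restarting the named service.
    actions.append(("restart", params.findtext("ServiceType")))


def refresh_cache(params):
    # Hypothetical stand-in for reloading service data from the database.
    actions.append(("refresh", params.findtext("CacheType")))


# Function identifier in the "Command" field -> exception handling operation.
HANDLERS = {"Restart_Service": restart_service, "Refresh": refresh_cache}


def handle_command_message(message: str) -> bool:
    """Parse a command message and run the matching handler.

    Returns False when the function identifier is not recognised.
    """
    root = ET.fromstring(message[HEADER_LEN:])
    handler = HANDLERS.get(root.findtext("Command"))
    if handler is None:
        return False
    handler(root.find("Params"))
    return True


handle_command_message(
    "000XXX<Tulip><Command>Refresh</Command><Params>"
    "<MonitorServerID>CathayServer11</MonitorServerID>"
    "<CacheType>CTGConnectionPoolConfCache</CacheType>"
    "</Params></Tulip>"
)
```

A lookup table keyed by the function identifier lets further exception handling operations be added without changing the parsing code.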
In other implementations of this embodiment, the first server may send the recovery prompt information by triggering an SMS platform to send an alert SMS message to a designated user. In that case, the recovery prompt information is specifically an SMS alert notification to be sent to the SMS platform. The SMS alert notification is used to trigger the SMS platform to send an alert SMS message to a pre-designated user, and the alert SMS message is used to indicate that the first service on the second server is in an abnormal running status, so that the user can eliminate the abnormal running status.
In addition, after the first server sends the first monitoring message to the second server, if the first server does not receive a first response message returned by the second server for the first monitoring message, the first server may also send an SMS alert notification to the SMS platform, triggering the SMS platform to send an alert SMS message to the pre-designated user to indicate that the second server is abnormal.
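The decision logic of one monitoring round can be sketched as follows: no response leads to an SMS alert, an abnormal status leads to recovery prompt information, and a normal status leads to no action. This is only an illustrative Python sketch. The callables `query_status`, `send_recovery`, and `send_sms_alert`, and the status string `"normal"`, are hypothetical placeholders for the real message exchange.

```python
from typing import Callable, Optional


def check_once(
    query_status: Callable[[], Optional[str]],
    send_recovery: Callable[[], None],
    send_sms_alert: Callable[[str], None],
) -> str:
    """Run one monitoring round and return the action taken.

    query_status returns the running status from the first response
    message, or None when no response arrived (for example on timeout).
    """
    status = query_status()
    if status is None:
        # No first response message: the second server itself may be abnormal.
        send_sms_alert("second server did not respond")
        return "alert"
    if status != "normal":
        # Abnormal running status: send recovery prompt information.
        send_recovery()
        return "recover"
    return "none"


log = []
result = check_once(lambda: None, lambda: log.append("recover"), log.append)
```

Passing the network operations in as callables keeps the decision itself free of I/O, so the three branches can be exercised without a second server or an SMS platform.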
It should be noted that, in some implementations of this embodiment, each of the multiple servers in the cluster system that provide services externally can monitor the service running status on the other servers. In this case, the same abnormal running status of the same service on the same server may be detected by multiple other servers. To prevent multiple servers from all handling the same abnormal running status, a data table may be kept in the database of the cluster system to record the abnormal running statuses detected by the servers. A server can handle a service in an abnormal running status by saving abnormal service information in this data table and querying the abnormal service information. Specifically, step 203 may, for example, include: if the running status information of the first service on the second server indicates that the running status is abnormal, the first server records abnormal service information in a second data table of the database of the cluster system; the first server queries the second data table and, if the abnormal service information was successfully recorded in the second data table, finds the abnormal service information; and the first server sends the recovery prompt information as indicated by the abnormal service information. The abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, where the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server. In addition, the abnormal service information may also include the name of the abnormal service, the name of the refresh service, the time of recording in the second data table, and so on. In the second data table, the primary key of the abnormal service information may be its monitored node identifier and service type identifier.
It can be understood that, in the abnormal service information, the monitoring node identifier indicates the server that is to handle the abnormal service information. When the first server finds abnormal service information in the second data table, it can read the monitoring node identifier in that abnormal service information. If the monitoring node identifier in the abnormal service information is the identifier of the first server itself, the first server handles the abnormal service information; if it is not the identifier of the first server itself, the first server does not handle the abnormal service information.
Furthermore, after step 203 has been performed, if the first service has recovered a normal running status on the second server, the first server may delete the abnormal service information from the second data table.
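The record, query, ownership check, and delete sequence around the second data table can be sketched with a small relational table. The following Python sketch uses an in-memory SQLite database purely for illustration. The text does not name a database product, and the table name `abnormal_service`, the column names, and the helper functions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the cluster database
conn.execute(
    """
    CREATE TABLE abnormal_service (
        monitored_node TEXT NOT NULL,
        service_type   TEXT NOT NULL,
        monitor_node   TEXT NOT NULL,
        PRIMARY KEY (monitored_node, service_type)
    )
    """
)


def claim_abnormal_service(conn, monitored_node, service_type, monitor_node):
    """Try to record the abnormal service; return True if this monitor owns it."""
    with conn:
        # The primary key makes a second insert for the same service a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO abnormal_service VALUES (?, ?, ?)",
            (monitored_node, service_type, monitor_node),
        )
    row = conn.execute(
        "SELECT monitor_node FROM abnormal_service "
        "WHERE monitored_node = ? AND service_type = ?",
        (monitored_node, service_type),
    ).fetchone()
    return row is not None and row[0] == monitor_node


def clear_abnormal_service(conn, monitored_node, service_type):
    """Delete the record once the service is running normally again."""
    with conn:
        conn.execute(
            "DELETE FROM abnormal_service "
            "WHERE monitored_node = ? AND service_type = ?",
            (monitored_node, service_type),
        )


first = claim_abnormal_service(conn, "node2", "CacheManager", "node1")
second = claim_abnormal_service(conn, "node2", "CacheManager", "node3")
```

Because the primary key is the pair of monitored node and service type, only the first server to record a given abnormal service sees its own identifier when it queries the table, so only that server goes on to send recovery prompt information.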
It should be noted that this embodiment may, for example, be applied to the network scenario shown in Fig. 3. In this network scenario, the cluster system includes multiple servers that provide services externally, namely "node 1", "node 2", "node 3", and "node 4" shown in Fig. 3. Each of these servers can act as the first server of this embodiment and monitor the service running status on other servers, and can also act as the second server of this embodiment and have its service running status monitored by other servers. The "DB" shown in Fig. 3 is the database referred to in this embodiment, and the "SMS alert platform" shown in Fig. 3 is the SMS platform referred to in this embodiment.
With the implementation provided by this embodiment, among the multiple servers in the cluster system that provide services externally, one server can be used to monitor the running status of services on another server. Because the servers that provide services externally monitor the service running status on other servers, the cluster system does not need to deploy a separate hardware device to monitor service running status, nor does it consume additional software and hardware resources for newly deployed hardware, which saves the resources occupied by the cluster system. In addition, each of the multiple servers that provide services externally can monitor the running status of services on the other servers. Because these servers monitor one another, even if one or some of them become abnormal, the other servers in the cluster system can continue to monitor the service running status of all servers, so that a service whose running status is abnormal can be restored to normal and service running status can be monitored continuously and stably.
Referring to Fig. 4, a schematic structural diagram of a device for monitoring the running status of a service in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the device 400 may be configured on a first server. The device 400 may, for example, specifically include:
a first sending unit 401, configured to send a first monitoring message to a second server, where the first monitoring message is used to request the running status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit 402, configured to receive a first response message returned by the second server for the first monitoring message, where the first response message carries running status information of the first service on the second server; and
a second sending unit 403, configured to send recovery prompt information if the running status information of the first service on the second server indicates that the running status is abnormal, where the recovery prompt information is used to prompt recovery of the first service on the second server.
Optionally, in some implementations of this embodiment, the recovery prompt information may, for example, be an operation instruction to be sent to the second server. The operation instruction may be used to trigger the second server to perform an exception handling operation on the first service, and the exception handling operation may be used to restore the first service to a normal running status on the second server.
Optionally, in other implementations of this embodiment, the exception handling operation may, for example, be restarting the first service on the second server, or the exception handling operation may, for example, be the second server updating data from the database of the cluster system into memory.
Optionally, in still other implementations of this embodiment, the recovery prompt information may, for example, be an SMS alert notification to be sent to an SMS platform. The SMS alert notification may be used to trigger the SMS platform to send an alert SMS message to a pre-designated user, and the alert SMS message may be used to indicate that the first service on the second server is in an abnormal running status.
Optionally, in still other implementations of this embodiment, the first sending unit may, for example, be specifically configured to send the first monitoring message to the second server by polling, according to a preset listening period.
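Polling with a preset listening period can be sketched as a simple loop. In the following illustrative Python sketch, the function `poll_servers`, the `rounds` limit, and the injectable `sleep` parameter are hypothetical conveniences. Looping over several target servers is also an assumption, since the text only describes sending to the second server.

```python
import time
from typing import Callable, Iterable


def poll_servers(
    servers: Iterable[str],
    send_monitoring_message: Callable[[str], None],
    listening_period: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send a monitoring message to each server once per listening period."""
    targets = list(servers)
    for _ in range(rounds):
        for server in targets:
            send_monitoring_message(server)
        # Wait for the preset listening period before the next round.
        sleep(listening_period)


sent = []
poll_servers(["node2", "node3"], sent.append, listening_period=30.0,
             rounds=2, sleep=lambda seconds: None)
```

A long-running monitor would loop indefinitely rather than for a fixed number of rounds. The bounded loop and the replaceable `sleep` are used here only so the sketch terminates immediately.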
Optionally, in still other implementations of this embodiment, a first program and a second program may, for example, be installed on the first server, and the first program and the second program may reuse the communication function of the first server itself;
the first sending unit 401 may be specifically configured to send the first monitoring message to the second server through the first program;
the first receiving unit 402 may be specifically configured to receive, through the second program, the first response message returned by the second server for the first monitoring message.
Optionally, in still other implementations of this embodiment, the device 400 may, for example, further include:
a second receiving unit, configured to receive a second monitoring message sent by the second server, where the second monitoring message is used to request the running status of a second service on the first server;
a generating unit, configured to generate a second response message based on running status information of the second service on the first server; and
a returning unit, configured to return the second response message to the second server for the second monitoring message.
Optionally, in still other implementations of this embodiment, the first server may, for example, store a first data table, and the first data table may be used to record the current running status information of each service on the first server;
the generating unit may be specifically configured to generate the second response message based on the running status information currently recorded in the first data table.
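Generating a response message from the currently recorded status can be sketched as a table lookup followed by serialization. The following Python sketch is illustrative only. The dictionary `status_table` stands in for the first data table, and the tag names `MonitorResponse`, `ServiceType`, and `Status`, as well as the `"unknown"` fallback, are hypothetical because this passage does not give the response message format.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "first data table": service identifier -> status.
status_table = {"CacheManager": "normal", "OrderService": "abnormal"}


def build_response_message(service_type: str) -> str:
    """Build a response message from the currently recorded status."""
    root = ET.Element("MonitorResponse")  # hypothetical tag names
    ET.SubElement(root, "ServiceType").text = service_type
    # A service missing from the table is reported as "unknown" (assumption).
    ET.SubElement(root, "Status").text = status_table.get(service_type, "unknown")
    return ET.tostring(root, encoding="unicode")


response = build_response_message("CacheManager")
```

Answering from a table that is kept up to date separately means the monitored server does not have to probe the service each time a monitoring message arrives.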
Optionally, in still other implementations of this embodiment, the second sending unit 403 may, for example, be specifically configured to:
if the running status information of the first service on the second server indicates that the running status is abnormal, record abnormal service information in a second data table of the database of the cluster system, where the abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
query the second data table and, if the abnormal service information was successfully recorded in the second data table, find the abnormal service information; and
send the recovery prompt information as indicated by the abnormal service information.
Optionally, in still other implementations of this embodiment, the device 400 may, for example, further include:
a deleting unit, configured to delete the abnormal service information from the second data table after the second sending unit has sent the recovery prompt information and the first service has recovered a normal running status on the second server.
With the implementation provided by this embodiment, among the multiple servers in the cluster system that provide services externally, one server can be used to monitor the running status of services on another server. Because the servers that provide services externally monitor the service running status on other servers, the cluster system does not need to deploy a separate hardware device to monitor service running status, nor does it consume additional software and hardware resources for newly deployed hardware, which saves the resources occupied by the cluster system. In addition, each of the multiple servers that provide services externally can monitor the running status of services on the other servers. Because these servers monitor one another, even if one or some of them become abnormal, the other servers in the cluster system can continue to monitor the service running status of all servers, so that a service whose running status is abnormal can be restored to normal and service running status can be monitored continuously and stably.
Referring to Fig. 5, a schematic structural diagram of a system for monitoring the running status of a service in a cluster system according to an embodiment of the present invention is shown. In this embodiment, the system may, for example, specifically include a first server 501 and a second server 502, where the first server 501 is configured with the device of any one of the implementations of the embodiment shown in Fig. 4.
With the implementation provided by this embodiment, among the multiple servers in the cluster system that provide services externally, one server can be used to monitor the running status of services on another server. Because the servers that provide services externally monitor the service running status on other servers, the cluster system does not need to deploy a separate hardware device to monitor service running status, nor does it consume additional software and hardware resources for newly deployed hardware, which saves the resources occupied by the cluster system. In addition, each of the multiple servers that provide services externally can monitor the running status of services on the other servers. Because these servers monitor one another, even if one or some of them become abnormal, the other servers in the cluster system can continue to monitor the service running status of all servers, so that a service whose running status is abnormal can be restored to normal and service running status can be monitored continuously and stably.
The word "first" in names such as "first server", "first service", "first monitoring message", "first response message", and "first data table" mentioned in the embodiments of the present invention is used only as a naming label and does not indicate first in order. The same rule applies to "second" and so on.
From the above description of the implementations, those skilled in the art can clearly understand that all or some of the steps of the methods in the above embodiments can be implemented by software plus a general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the device and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is only a preferred implementation of the present invention and is not intended to limit the protection scope of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (13)
1. A method for monitoring the running status of a service in a cluster system, characterized in that the method comprises:
a first server sending a first monitoring message to a second server, wherein the first monitoring message is used to request the running status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
the first server receiving a first response message returned by the second server for the first monitoring message, wherein the first response message carries running status information of the first service on the second server; and
if the running status information of the first service on the second server indicates that the running status is abnormal, the first server sending recovery prompt information, wherein the recovery prompt information is used to prompt recovery of the first service on the second server.
2. The method according to claim 1, characterized in that the recovery prompt information is an operation instruction to be sent to the second server, the operation instruction is used to trigger the second server to perform an exception handling operation on the first service, and the exception handling operation is used to restore the first service to a normal running status on the second server.
3. The method according to claim 2, characterized in that the exception handling operation is restarting the first service on the second server, or the exception handling operation is the second server updating data from the database of the cluster system into memory.
4. The method according to claim 1, characterized in that the recovery prompt information is an SMS alert notification to be sent to an SMS platform, the SMS alert notification is used to trigger the SMS platform to send an alert SMS message to a pre-designated user, and the alert SMS message is used to indicate that the first service on the second server is in an abnormal running status.
5. The method according to claim 1, characterized in that the first server sending the first monitoring message to the second server is specifically:
the first server sending the first monitoring message to the second server by polling, according to a preset listening period.
6. The method according to claim 1, characterized in that a first program and a second program are installed on the first server, and the first program and the second program reuse the communication function of the first server itself;
the first server sending the first monitoring message to the second server is specifically: the first server sending the first monitoring message to the second server through the first program;
the first server receiving the first response message returned by the second server for the first monitoring message is specifically: the first server receiving, through the second program, the first response message returned by the second server for the first monitoring message.
7. The method according to claim 1, characterized by further comprising:
the first server receiving a second monitoring message sent by the second server, wherein the second monitoring message is used to request the running status of a second service on the first server; and
the first server generating a second response message based on running status information of the second service on the first server, and returning the second response message to the second server for the second monitoring message.
8. The method according to claim 7, characterized in that the first server stores a first data table, and the first data table is used to record the current running status information of each service on the first server;
the first server generating the second response message based on the running status information of the second service on the first server is specifically: the first server generating the second response message based on the running status information currently recorded in the first data table.
9. The method according to claim 1, characterized in that, if the running status information of the first service indicates that the running status is abnormal, the first server sending the recovery prompt information comprises:
if the running status information of the first service on the second server indicates that the running status is abnormal, the first server recording abnormal service information in a second data table of the database of the cluster system, wherein the abnormal service information includes a monitored node identifier, a service type identifier, and a monitoring node identifier, the monitored node identifier is the identifier of the second server, the service type identifier is the identifier of the first service, and the monitoring node identifier is the identifier of the first server;
the first server querying the second data table and, if the abnormal service information was successfully recorded in the second data table, finding the abnormal service information; and
the first server sending the recovery prompt information as indicated by the abnormal service information.
10. The method according to claim 9, characterized in that, after the first server sends the recovery prompt information, the method further comprises:
after the first service recovers a normal running status on the second server, the first server deleting the abnormal service information from the second data table.
11. A device for monitoring the running status of a service in a cluster system, characterized in that the device is configured on a first server and comprises:
a first sending unit, configured to send a first monitoring message to a second server, wherein the first monitoring message is used to request the running status of a first service on the second server, and the first server and the second server are any two servers in the cluster system that provide services externally;
a first receiving unit, configured to receive a first response message returned by the second server for the first monitoring message, wherein the first response message carries running status information of the first service on the second server; and
a second sending unit, configured to send recovery prompt information if the running status information of the first service on the second server indicates that the running status is abnormal, wherein the recovery prompt information is used to prompt recovery of the first service on the second server.
12. The device according to claim 11, characterized by further comprising:
a second receiving unit, configured to receive a second monitoring message sent by the second server, wherein the second monitoring message is used to request the running status of a second service on the first server;
a generating unit, configured to generate a second response message based on running status information of the second service on the first server; and
a returning unit, configured to return the second response message to the second server for the second monitoring message.
13. A system for monitoring the running status of a service in a cluster system, characterized by comprising a first server and a second server, wherein the first server is configured with the device according to claim 11 or 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610311715.8A CN105978721B (en) | 2016-05-11 | 2016-05-11 | Method, device and system for monitoring operation state of services in clustering system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610311715.8A CN105978721B (en) | 2016-05-11 | 2016-05-11 | Method, device and system for monitoring operation state of services in clustering system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105978721A true CN105978721A (en) | 2016-09-28 |
CN105978721B CN105978721B (en) | 2019-04-12 |
Family
ID=56993003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610311715.8A Active CN105978721B (en) | 2016-05-11 | 2016-05-11 | Method, device and system for monitoring operation state of services in clustering system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105978721B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106161090A (en) * | 2016-07-12 | 2016-11-23 | 许继集团有限公司 | The monitoring method of a kind of subregion group system and device |
CN106713007A (en) * | 2016-11-15 | 2017-05-24 | 郑州云海信息技术有限公司 | Alarm monitoring system and alarm monitoring method and device for server |
CN107257384A (en) * | 2017-07-24 | 2017-10-17 | 北京小米移动软件有限公司 | Service state monitoring method and device |
CN109361525A (en) * | 2018-10-25 | 2019-02-19 | 珠海派诺科技股份有限公司 | Restart method, apparatus, controlling terminal and medium that distributed deployment services more |
CN109828883A (en) * | 2017-11-23 | 2019-05-31 | 腾讯科技(北京)有限公司 | Task data treating method and apparatus, storage medium and electronic device |
CN110445650A (en) * | 2019-08-07 | 2019-11-12 | 中国联合网络通信集团有限公司 | Detect alarm method, equipment and server |
CN110531988A (en) * | 2019-08-06 | 2019-12-03 | 新华三大数据技术有限公司 | The trend prediction method and relevant apparatus of application program |
CN111565135A (en) * | 2020-04-30 | 2020-08-21 | 吉林省鑫泽网络技术有限公司 | Method for monitoring operation of server, monitoring server and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075919A (en) * | 2006-06-22 | 2007-11-21 | 腾讯科技(深圳)有限公司 | Method and system for monitoring Internet service |
CN101207519A (en) * | 2007-12-13 | 2008-06-25 | 上海华为技术有限公司 | Version server, operation maintenance unit and method for restoring failure |
CN102231681A (en) * | 2011-06-27 | 2011-11-02 | 中国建设银行股份有限公司 | High availability cluster computer system and fault treatment method thereof |
CN102291275A (en) * | 2011-08-01 | 2011-12-21 | 烟台杰瑞网络商贸有限公司 | Server cluster monitoring technology and method |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106161090A (en) * | 2016-07-12 | 2016-11-23 | 许继集团有限公司 | The monitoring method of a kind of subregion group system and device |
CN106713007A (en) * | 2016-11-15 | 2017-05-24 | 郑州云海信息技术有限公司 | Alarm monitoring system and alarm monitoring method and device for server |
CN107257384A (en) * | 2017-07-24 | 2017-10-17 | 北京小米移动软件有限公司 | Service state monitoring method and device |
CN107257384B (en) * | 2017-07-24 | 2021-08-17 | 北京小米移动软件有限公司 | Service state monitoring method and device |
CN109828883A (en) * | 2017-11-23 | 2019-05-31 | 腾讯科技(北京)有限公司 | Task data processing method and apparatus, storage medium and electronic device |
CN109828883B (en) * | 2017-11-23 | 2023-03-17 | 腾讯科技(北京)有限公司 | Task data processing method and device, storage medium and electronic device |
CN109361525A (en) * | 2018-10-25 | 2019-02-19 | 珠海派诺科技股份有限公司 | Method, apparatus, control terminal and medium for restarting distributed deployment of multiple services |
CN109361525B (en) * | 2018-10-25 | 2021-08-13 | 珠海派诺科技股份有限公司 | Method, device, control terminal and medium for restarting distributed deployment of multiple services |
CN110531988A (en) * | 2019-08-06 | 2019-12-03 | 新华三大数据技术有限公司 | Trend prediction method and related apparatus for application programs |
CN110445650A (en) * | 2019-08-07 | 2019-11-12 | 中国联合网络通信集团有限公司 | Detection alarm method, equipment and server |
CN110445650B (en) * | 2019-08-07 | 2022-06-10 | 中国联合网络通信集团有限公司 | Detection alarm method, equipment and server |
CN111565135A (en) * | 2020-04-30 | 2020-08-21 | 吉林省鑫泽网络技术有限公司 | Method for monitoring operation of server, monitoring server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105978721B (en) | 2019-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105978721A (en) | Method, device and system for monitoring operation state of services in clustering system | |
CN103605722B (en) | Database monitoring method and device, equipment | |
CN109710394A (en) | Timing task processing system and method | |
EP3210367B1 (en) | System and method for disaster recovery of cloud applications | |
CN106856489A (en) | Service node switching method and apparatus for a distributed storage system | |
CN107391276A (en) | Distributed monitor method, interception control device and system | |
CN108390907B (en) | Management monitoring system and method based on Hadoop cluster | |
CN107682169B (en) | Method and device for sending message by Kafka cluster | |
CN113704052B (en) | Operation and maintenance system, method, equipment and medium of micro-service architecture | |
CN107544867B (en) | Method, device and system for recovering intelligent network service | |
CN108268305A (en) | System and method for automatic scaling of virtual machines | |
EP3439237A1 (en) | Exception monitoring and alarming method and device | |
CN105429799A (en) | Server backup method and device | |
CN106021070A (en) | Method and device for server cluster monitoring | |
CN112422684A (en) | Target message processing method and device, storage medium and electronic device | |
CN106330531A (en) | Node fault recording and processing method and device | |
CN115202958A (en) | Power abnormity monitoring method and device, electronic equipment and storage medium | |
CN111813348A (en) | Node event processing device, method, equipment and medium in unified storage equipment | |
CN117130730A (en) | Metadata management method for federal Kubernetes cluster | |
JP2005301436A (en) | Cluster system and failure recovery method for it | |
CN117880254A (en) | Reconnection method for real-time communication | |
CN104734895A (en) | Service monitoring system and service monitoring method | |
JP6418377B2 (en) | Management target device, management device, and network management system | |
CN105025179A (en) | Method and system for monitoring service agents of call center | |
CN114301763B (en) | Distributed cluster fault processing method and system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||