CN102148707A - Troubleshooting method and system of monitoring agents - Google Patents


Info

Publication number
CN102148707A
Authority
CN
China
Prior art keywords
monitoring agent
fault
takes place
handle
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100309810A
Other languages
Chinese (zh)
Inventor
王理想
刘成平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN2011100309810A
Publication of CN102148707A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a troubleshooting method and system for monitoring agents. In the method, a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent monitors the running status of the first monitoring agent, and, when it detects that the first monitoring agent has stopped running because of a fault, the second monitoring agent triggers the start-up procedure of the first monitoring agent.

Description

Fault handling method and system for a monitoring agent
Technical field
The present invention relates to the field of computer applications, and in particular to a fault handling method and system for a monitoring agent.
Background
Computers are now increasingly widespread, and their range of applications keeps growing. The adoption of personal computers has in turn promoted the widespread use of servers.
Today, large enterprises and institutions run tens of thousands of servers around the clock. With servers in such wide use, how to manage large numbers of servers, storage devices, and network devices effectively has become a growing concern for these organizations.
For this reason, major server manufacturers and software companies have released their own device management software. Such software falls into two types, according to whether an agent must be installed on the monitored resource node to provide the management functions. In the first type, a monitoring agent program is installed on the monitored node so that the management software can manage the monitored resource node. In the second type, no agent is installed, and management is performed through the Simple Network Management Protocol (SNMP), the Intelligent Platform Management Interface (IPMI), or similar protocols.
Of these two types, managing a node without installing a monitoring agent service program is the simplest and safest approach. However, SNMP and other protocols such as IPMI support only simple management of the equipment. As users' requirements for device management grow, this approach is increasingly unable to meet them. Installing a monitoring agent service program on the monitored resource node is therefore the more common approach at present.
In the course of realizing the present invention, the inventors found the following problem in the prior art:
A monitoring agent may stop running for various reasons. The monitoring center can then no longer communicate normally with the monitoring agent and therefore cannot continue to manage the monitored resource node. This is a weakness of the agent-based monitoring approach.
To solve the problem that a monitoring agent that has stopped running cannot communicate normally with the monitoring center, a device monitoring and management scheme with flexibility and monitoring fault tolerance is proposed.
Summary of the invention
The invention provides a fault handling method and system for a monitoring agent, to solve the problem in the prior art that a monitoring agent that has stopped running cannot be dealt with in time.
To solve the above technical problem, the invention provides the following technical solutions:
A fault handling method for a monitoring agent, in which a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein:
The second monitoring agent monitors the running status of the first monitoring agent;
When it detects that the first monitoring agent has stopped running because of a fault, the second monitoring agent triggers the start-up procedure of the first monitoring agent.
Further, the method also has the following feature: the second monitoring agent triggering the start-up procedure of the first monitoring agent comprises:
The second monitoring agent judges whether the fault that has occurred in the first monitoring agent needs to be handled;
If the fault needs to be handled, the second monitoring agent starts the first monitoring agent after handling the fault; otherwise, the second monitoring agent starts the first monitoring agent directly.
Further, the method also has the following feature: handling the fault that has occurred in the first monitoring agent comprises:
The second monitoring agent looks up, among the fault handling policies pre-stored locally, the handling policy corresponding to the fault that has occurred in the first monitoring agent;
If the corresponding handling policy is found, the second monitoring agent uses it to handle the fault that has occurred in the first monitoring agent;
If the corresponding handling policy is not found, the second monitoring agent obtains the handling policy corresponding to that fault from the monitoring center and then uses it to handle the fault that has occurred in the first monitoring agent; or, the second monitoring agent requests the monitoring center to handle the fault that has occurred in the first monitoring agent.
Further, the method also has the following feature: the second monitoring agent triggering the start-up procedure of the first monitoring agent comprises:
The second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running;
The monitoring center starts the first monitoring agent according to the report that the first monitoring agent has stopped running.
Further, the method also has the following feature: the second monitoring agent triggering the start-up procedure of the first monitoring agent comprises:
The second monitoring agent judges whether the fault that has occurred in the first monitoring agent needs to be handled;
If the fault needs to be handled, the second monitoring agent notifies the monitoring center to handle it; the monitoring center handles the fault that has occurred in the first monitoring agent and, after fault handling is complete, starts the first monitoring agent;
If the fault does not need to be handled, the second monitoring agent starts the first monitoring agent directly.
A fault handling system for a monitoring agent, in which a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent comprises:
A monitoring device, configured to monitor the running status of the first monitoring agent;
A processing unit, configured to trigger the start-up procedure of the first monitoring agent when it is detected that the first monitoring agent has stopped running because of a fault.
Further, the system also has the following feature: the processing unit comprises:
A judging module, configured to judge whether the fault that has occurred in the first monitoring agent needs to be handled;
A starting module, configured to start the first monitoring agent after the fault has been handled when the fault needs to be handled, and to start the first monitoring agent directly when the fault does not need to be handled.
Further, the system also has the following feature: the processing unit also comprises:
A lookup module, configured to look up, among the fault handling policies pre-stored locally, the handling policy corresponding to the fault that has occurred in the first monitoring agent;
A first processing module, configured to use the corresponding handling policy to handle the fault that has occurred in the first monitoring agent when that policy is found;
A second processing module, configured, when the corresponding handling policy is not found, to obtain the handling policy corresponding to that fault from the monitoring center and then use it to handle the fault that has occurred in the first monitoring agent; or to request the monitoring center to handle the fault that has occurred in the first monitoring agent.
Further, the system also has the following feature:
The processing unit comprises:
A reporting module, configured to report to the monitoring center that the first monitoring agent has stopped running;
The system also comprises:
A monitoring center, configured to start the first monitoring agent according to the report that the first monitoring agent has stopped running.
Further, the system also has the following feature:
The processing unit comprises:
A judging module, configured to judge whether the fault that has occurred in the first monitoring agent needs to be handled;
A notification module, configured to notify the monitoring center to handle the fault that has occurred in the first monitoring agent when the fault needs to be handled;
A starting module, configured to start the first monitoring agent directly when the fault does not need to be handled;
The system also comprises:
A monitoring center, configured to handle the fault that has occurred in the first monitoring agent and, after fault handling is complete, to start the first monitoring agent.
In the embodiments provided by the invention, the second monitoring agent monitors the running status of the first monitoring agent and, when the first monitoring agent stops running, triggers the procedure for starting it. This shortens the time needed to discover a fault in the first monitoring agent and the time needed to start it again, and so helps keep information flowing normally.
Description of drawings
Fig. 1 is a schematic flowchart of the fault handling method for a monitoring agent provided by the invention;
Fig. 2 is a schematic structural diagram of an embodiment of the fault handling system for a monitoring agent provided by the invention;
Fig. 3 is a schematic structural diagram of the processing unit in the system embodiment shown in Fig. 2;
Fig. 4 is another schematic structural diagram of the processing unit shown in Fig. 3;
Fig. 5 is another schematic structural diagram of the system embodiment shown in Fig. 2;
Fig. 6 is yet another schematic structural diagram of the system embodiment shown in Fig. 2.
Detailed description
To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another in any way.
Fig. 1 is a schematic flowchart of the fault handling method for a monitoring agent provided by the invention. The method embodiment shown in Fig. 1 comprises:
In embodiment one, the resource monitored by the monitoring center comprises a first monitoring agent and a second monitoring agent.
The monitored resource may be a physical device in a network system (such as a cloud computing operating system), and may be at least one of a server, a storage device (such as a database), and a transmission device (such as a switch or a router).
Step 101: the second monitoring agent receives information that the first monitoring agent is not running because of a fault;
The running status of the first monitoring agent may be monitored by the second monitoring agent itself, but is not limited to this. The monitoring function may also be implemented by a communication unit in the monitored resource node that communicates with the first monitoring agent. For example, if the communication unit receives no response within a preset time after sending information to the first monitoring agent, it determines that the first monitoring agent is not running, and then sends the second monitoring agent information that the first monitoring agent is not running because of a fault.
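The timeout check performed by the communication unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the polling approach, and the default timeout values are all hypothetical, and the transport used to reach the first monitoring agent is abstracted into two callables.

```python
import time
from typing import Callable


def is_agent_running(send_probe: Callable[[], None],
                     poll_response: Callable[[], bool],
                     timeout_s: float = 2.0,
                     poll_interval_s: float = 0.05) -> bool:
    """Send one probe to the first agent and wait up to timeout_s for a reply.

    send_probe and poll_response are hypothetical hooks standing in for
    whatever channel the communication unit actually uses.
    """
    send_probe()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll_response():
            return True          # a response arrived within the preset time
        time.sleep(poll_interval_s)
    return False                 # no response: treat the agent as not running


def probe_and_report(send_probe: Callable[[], None],
                     poll_response: Callable[[], bool],
                     notify_second_agent: Callable[[str], None],
                     timeout_s: float = 2.0,
                     poll_interval_s: float = 0.05) -> bool:
    """Probe the first agent; on timeout, inform the second agent (step 101)."""
    running = is_agent_running(send_probe, poll_response,
                               timeout_s, poll_interval_s)
    if not running:
        notify_second_agent("first monitoring agent is not running")
    return running
```

In practice the preset time would need to be tuned: too short a value risks restarting a healthy but slow agent, while too long a value delays detection.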
Step 102: the second monitoring agent triggers the procedure for starting the first monitoring agent.
Depending on which entity performs the operation of starting the first monitoring agent, the procedure can be triggered in the following two ways:
First way 102A: the first monitoring agent is started by the second monitoring agent, as follows:
Step 201: the second monitoring agent judges whether the fault that has occurred in the first monitoring agent needs to be handled;
For example, the second monitoring agent may store a list in advance that records which of the faults that can occur in the first monitoring agent need to be handled, and compare the fault against this list.
Step 202: if the fault that has occurred in the first monitoring agent needs to be handled, the first monitoring agent is started after the fault has been handled; otherwise, the first monitoring agent is started directly.
It should be noted that the purpose of step 201 is to ensure that, when the first monitoring agent is subsequently started, a single start is enough for it to run normally, which shortens the fault handling time. The first monitoring agent may of course also be started directly, because some faults disappear once it is restarted. For example, if the first monitoring agent stopped because the monitoring center sent a wrong command, simply starting it is enough. For other faults, however, such as the first monitoring agent lacking enough disk space to store log information, the fault still exists after the agent is started. The agent then has to be shut down and can be started again only after the fault has been handled. Starting the first monitoring agent directly may therefore lead to repeated restarts, which lengthens the fault handling time.
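Steps 201 and 202 can be sketched as a small decision function. The fault codes and callable names below are hypothetical and chosen only to mirror the two examples in the text (a wrong command from the monitoring center, and insufficient disk space for logs); the patent does not define a fault taxonomy.

```python
from typing import Callable, FrozenSet, List

# Hypothetical pre-stored list of faults that must be handled before a restart.
FAULTS_REQUIRING_HANDLING: FrozenSet[str] = frozenset({"DISK_FULL"})


def recover_first_agent(fault: str,
                        handle_fault: Callable[[str], None],
                        start_agent: Callable[[], None],
                        needs_handling: FrozenSet[str] = FAULTS_REQUIRING_HANDLING
                        ) -> List[str]:
    """Return the ordered list of actions taken for the given fault."""
    actions: List[str] = []
    if fault in needs_handling:      # step 201: compare against the stored list
        handle_fault(fault)          # step 202: handle the fault first
        actions.append("handled")
    start_agent()                    # start (or restart) the first agent
    actions.append("started")
    return actions
```

With this sketch, `recover_first_agent("DISK_FULL", ...)` handles the fault before starting the agent, whereas a fault that is absent from the list, such as a hypothetical `"BAD_COMMAND"`, leads straight to a start.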
The handling of the fault that has occurred in the first monitoring agent in step 202 may itself be carried out in the following two ways:
First way 202A, which specifically comprises:
Step A1: the second monitoring agent looks up the handling policy corresponding to the fault that has occurred in the first monitoring agent among the locally pre-stored handling policies for faults that need to be handled;
If the policy is found, step A5 is executed; otherwise, steps A2 to A5 are executed;
Step A2: the second monitoring agent queries the monitoring center for the handling policy corresponding to the fault that has occurred in the first monitoring agent;
Step A3: the monitoring center generates the handling policy corresponding to the fault that has occurred in the first monitoring agent;
Step A4: the monitoring center sends the handling policy corresponding to the fault that has occurred in the first monitoring agent to the second monitoring agent;
After step A4 is complete, step A5 is executed.
Step A5: the second monitoring agent uses the obtained handling policy to handle the fault that has occurred in the first monitoring agent;
Step A6: after detecting that fault handling is complete, the second monitoring agent starts the first monitoring agent.
In the first way 202A, if no handling policy corresponding to the fault is found in step A1, steps A2 to A5 may instead be replaced by the following: the second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running, and the monitoring center handles the fault of the first monitoring agent according to the information reported by the second monitoring agent.
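Steps A1 to A6 amount to a local lookup with a remote fallback, which can be sketched as below. The representation of a handling policy as a plain callable, and the name `fetch_policy_from_center`, are assumptions made for illustration; the patent does not specify how policies are encoded or transferred.

```python
from typing import Callable, Dict

Policy = Callable[[], None]   # hypothetical: a policy is just something runnable


def handle_then_start(fault: str,
                      local_policies: Dict[str, Policy],
                      fetch_policy_from_center: Callable[[str], Policy],
                      start_agent: Callable[[], None]) -> str:
    """Run steps A1 to A6 and return where the policy came from."""
    policy = local_policies.get(fault)               # step A1: local lookup
    source = "local"
    if policy is None:
        policy = fetch_policy_from_center(fault)     # steps A2 to A4
        source = "center"
    policy()                                         # step A5: handle the fault
    start_agent()                                    # step A6: start the agent
    return source
```

One possible refinement, not described in the source, would be to cache a policy fetched from the monitoring center in `local_policies` so that a repeat of the same fault is handled without another round trip.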
Second way 102B: the start-up of the first monitoring agent is managed by the monitoring center alone, or jointly by the monitoring center and the second monitoring agent, as follows:
In the first mode, the first monitoring agent is started only by the monitoring center, which specifically comprises:
The second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running; the monitoring center starts the first monitoring agent according to the report that the first monitoring agent has stopped running.
In this mode, whenever the first monitoring agent stops running, the second monitoring agent sends information to the monitoring center to trigger the procedure in which the monitoring center starts the first monitoring agent. In the prior art, the monitoring center can determine that the first monitoring agent has stopped only after receiving no information from it for a period of time. The advantage of this mode is that the monitoring center learns promptly that the first monitoring agent has stopped and can handle the fault quickly, which shortens the time during which the first monitoring agent is stopped; the operation required of the second monitoring agent is also simple.
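The report-then-start interaction of this mode can be sketched in-process as follows. The class and method names are hypothetical, and the network transport between the node and the monitoring center is deliberately left out; a real deployment would replace the direct method call with a message over whatever channel the center exposes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MonitoringCenter:
    # Hypothetical hook that starts the first agent on the named node.
    start_remote_agent: Callable[[str], None]
    reports: List[str] = field(default_factory=list)

    def on_stop_report(self, node_id: str) -> None:
        """Record the report and start the first agent on that node."""
        self.reports.append(node_id)
        self.start_remote_agent(node_id)


def second_agent_on_stop(node_id: str, center: MonitoringCenter) -> None:
    """The second agent's only duty in this mode: report the stop."""
    center.on_stop_report(node_id)
```

The second monitoring agent stays very simple here, which matches the advantage stated above, at the cost of routing every restart through the monitoring center.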
In the second mode, the monitoring center and the second monitoring agent jointly manage the start-up of the first monitoring agent, which specifically comprises:
The second monitoring agent checks whether the pre-stored information on faults that need to be handled includes the fault that has occurred in the first monitoring agent;
If the fault is not found in that information, the second monitoring agent starts the first monitoring agent directly;
If the fault is found in that information, the second monitoring agent notifies the monitoring center to handle the fault that has occurred in the first monitoring agent; the monitoring center handles the fault and, after fault handling is complete, starts the first monitoring agent.
Because some faults disappear once the first monitoring agent is restarted, the handling time can be shortened as follows: preferably, the second monitoring agent first judges whether the fault that has occurred in the first monitoring agent needs to be handled. If it does, the second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running; otherwise, the second monitoring agent starts the first monitoring agent directly. Faults that disappear on restart are thus dealt with by the second monitoring agent itself, which reduces both the reporting traffic of the second monitoring agent and the processing load of the monitoring center.
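The joint-management mode reduces to a routing decision on the second monitoring agent, sketched below. The function name, the string return values, and the fault codes used in the usage note are hypothetical.

```python
from typing import Callable, FrozenSet


def dispatch_recovery(fault: str,
                      faults_requiring_handling: FrozenSet[str],
                      start_locally: Callable[[], None],
                      notify_center: Callable[[str], None]) -> str:
    """Route a stopped-agent event and return which side performs the start."""
    if fault in faults_requiring_handling:
        # The monitoring center handles the fault, then starts the first agent.
        notify_center(fault)
        return "center"
    # The fault clears on restart, so the second agent starts the first agent.
    start_locally()
    return "local"
```

For instance, with `frozenset({"DISK_FULL"})` as the pre-stored set, a `"DISK_FULL"` fault is forwarded to the monitoring center, while any other fault results in a local start with no report sent.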
It should be noted that, in practical applications, when a monitored resource node comprises multiple monitoring agents, it is only necessary for a monitoring agent to be monitored by at least one of the remaining monitoring agents. For example, if there are two monitoring agents on the monitored resource node, each can monitor the running status of the other.
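The mutual-monitoring arrangement can be sketched as a mapping from each watcher to the agent it watches. The data layout and function name are assumptions for illustration only.

```python
from typing import Dict, List


def agents_to_restart(running: Dict[str, bool],
                      watchers: Dict[str, str]) -> List[str]:
    """Return the stopped agents whose watcher is still running.

    running maps an agent name to its current state;
    watchers maps each watcher to the agent it watches.
    """
    return sorted(
        watched
        for watcher, watched in watchers.items()
        if running.get(watcher, False) and not running.get(watched, False)
    )
```

With `watchers = {"A": "B", "B": "A"}`, a stopped `B` is restarted by `A` and vice versa. The sketch also makes one limitation visible: if both agents stop at the same time, the result is empty, so some other mechanism would be needed to recover the node.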
The agent functions of the multiple monitoring agents on a monitored resource node may be the same or different. For example, one monitoring agent may be responsible for providing interfaces for obtaining various kinds of information, while another is responsible for monitoring the running state of the system.
Fig. 2 is a schematic structural diagram of an embodiment of the fault handling system for a monitoring agent provided by the invention. With reference to the method embodiment shown in Fig. 1, the system shown in Fig. 2 comprises a monitored resource node that comprises a first monitoring agent and a second monitoring agent, wherein the agent functions of the first monitoring agent and the second monitoring agent are different, and the second monitoring agent comprises:
A monitoring device, configured to monitor the running status of the first monitoring agent;
A processing unit, configured to trigger the start-up procedure of the first monitoring agent when it is detected that the first monitoring agent has stopped running because of a fault.
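The two-part structure of the second monitoring agent can be sketched with small classes. The class names follow the text, but the `probe` and `trigger_start` hooks and the `check_once` method are hypothetical, introduced only to show how the monitoring device and the processing unit might cooperate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoringDevice:
    probe: Callable[[], bool]    # hypothetical check: True if first agent runs

    def first_agent_running(self) -> bool:
        return self.probe()


@dataclass
class ProcessingUnit:
    trigger_start: Callable[[], None]   # hypothetical start-up procedure hook

    def on_stopped(self) -> None:
        self.trigger_start()


@dataclass
class SecondMonitoringAgent:
    monitoring_device: MonitoringDevice
    processing_unit: ProcessingUnit

    def check_once(self) -> bool:
        """Run one monitoring cycle; return True if a start-up was triggered."""
        if self.monitoring_device.first_agent_running():
            return False
        self.processing_unit.on_stopped()
        return True
```

A caller would typically invoke `check_once` on a timer. Keeping detection and reaction in separate objects makes it straightforward to swap the processing unit for one of the variants of Fig. 3 to Fig. 6.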
Fig. 3 is a schematic structural diagram of the processing unit in the system embodiment shown in Fig. 2. The processing unit shown in Fig. 3 comprises:
A judging module, configured to judge whether the fault that has occurred in the first monitoring agent needs to be handled;
A starting module, configured to start the first monitoring agent after the fault has been handled when the fault needs to be handled, and to start the first monitoring agent directly when the fault does not need to be handled.
Fig. 4 is another schematic structural diagram of the processing unit shown in Fig. 3. The processing unit shown in Fig. 4 also comprises:
A lookup module, configured to look up, among the fault handling policies pre-stored locally, the handling policy corresponding to the fault that has occurred in the first monitoring agent;
A first processing module, configured to use the corresponding handling policy to handle the fault that has occurred in the first monitoring agent when that policy is found;
A second processing module, configured, when the corresponding handling policy is not found, to obtain the handling policy corresponding to that fault from the monitoring center and then use it to handle the fault that has occurred in the first monitoring agent; or to request the monitoring center to handle the fault that has occurred in the first monitoring agent.
Fig. 5 is another schematic structural diagram of the system embodiment shown in Fig. 2. The system shown in Fig. 5 is as follows:
The processing unit comprises:
A reporting module, configured to report to the monitoring center that the first monitoring agent has stopped running;
The system also comprises:
A monitoring center, configured to start the first monitoring agent according to the report that the first monitoring agent has stopped running.
Fig. 6 is yet another schematic structural diagram of the system embodiment shown in Fig. 2. The system shown in Fig. 6 is as follows:
The processing unit comprises:
A judging module, configured to judge whether the fault that has occurred in the first monitoring agent needs to be handled;
A notification module, configured to notify the monitoring center to handle the fault that has occurred in the first monitoring agent when the fault needs to be handled;
A starting module, configured to start the first monitoring agent directly when the fault does not need to be handled;
The system also comprises:
A monitoring center, configured to handle the fault that has occurred in the first monitoring agent and, after fault handling is complete, to start the first monitoring agent.
In the system embodiments provided by the invention, the second monitoring agent monitors the running status of the first monitoring agent and, when the first monitoring agent stops running, triggers the procedure for starting it. This shortens the time needed to discover a fault in the first monitoring agent and the time needed to start it again, and so helps keep information flowing normally.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be implemented as a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, unit, or device); when executed, it performs one of the steps of the method embodiments or a combination of them.
Alternatively, all or some of the steps of the above embodiments can be implemented with integrated circuits: the steps can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
Each device, functional module, or functional unit in the above embodiments can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When a device, functional module, or functional unit in the above embodiments is implemented as a software function module and is sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only specific embodiments of the present invention, and the protection scope of the invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.

Claims (10)

1. A fault handling method for a monitoring agent, characterized in that a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein:
The second monitoring agent monitors the running status of the first monitoring agent;
When it detects that the first monitoring agent has stopped running because of a fault, the second monitoring agent triggers the start-up procedure of the first monitoring agent.
2. The method according to claim 1, characterized in that the second monitoring agent triggering the start-up procedure of the first monitoring agent comprises:
The second monitoring agent judges whether the fault that has occurred in the first monitoring agent needs to be handled;
If the fault needs to be handled, the second monitoring agent starts the first monitoring agent after handling the fault; otherwise, the second monitoring agent starts the first monitoring agent directly.
3. The method according to claim 2, characterized in that handling the fault that has occurred in the first monitoring agent comprises:
The second monitoring agent looks up, among the fault handling policies pre-stored locally, the handling policy corresponding to the fault that has occurred in the first monitoring agent;
If the corresponding handling policy is found, the second monitoring agent uses it to handle the fault that has occurred in the first monitoring agent;
If the corresponding handling policy is not found, the second monitoring agent obtains the handling policy corresponding to that fault from the monitoring center and then uses it to handle the fault that has occurred in the first monitoring agent; or, the second monitoring agent requests the monitoring center to handle the fault that has occurred in the first monitoring agent.
4. method according to claim 1 is characterized in that, described second monitoring agent triggers the startup flow process of first monitoring agent, comprising:
The information that described second monitoring agent reports the operation of first monitoring agent to end to Surveillance center;
The information that described Surveillance center ends according to described first monitoring agent operation starts described first monitoring agent.
5. The method according to claim 1, characterized in that the second monitoring agent triggering the startup flow of the first monitoring agent comprises:
The second monitoring agent judges whether the fault of the first monitoring agent needs to be handled;
If the fault of the first monitoring agent needs to be handled, the second monitoring agent notifies the surveillance center to handle the fault of the first monitoring agent; the surveillance center handles the fault of that monitoring agent and, after the fault handling is completed, starts the first monitoring agent;
If the fault of the first monitoring agent does not need to be handled, the second monitoring agent starts the first monitoring agent directly.
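The branching in claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the classes `FirstAgent` and `SurveillanceCenter`, the function `trigger_startup`, and the in-memory event list are all hypothetical stand-ins, and a real center would be reached over a network rather than by a direct method call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FirstAgent:
    """Hypothetical stand-in for the first monitoring agent."""
    running: bool = False

    def start(self) -> None:
        self.running = True


@dataclass
class SurveillanceCenter:
    """Hypothetical stand-in for the surveillance center."""
    events: List[str] = field(default_factory=list)

    def handle_and_start(self, agent: FirstAgent, fault_code: str) -> None:
        # The center handles the fault first, then starts the agent.
        self.events.append(f"handled:{fault_code}")
        agent.start()
        self.events.append("started")


def trigger_startup(
    agent: FirstAgent,
    center: SurveillanceCenter,
    fault_code: str,
    needs_handling: bool,
) -> str:
    """Run by the second agent; returns who started the first agent."""
    if needs_handling:
        # Fault needs handling: notify the center, which handles and starts.
        center.handle_and_start(agent, fault_code)
        return "center"
    # No handling needed: the second agent starts the first agent directly.
    agent.start()
    return "direct"
```

The point of the split is that the second agent stays simple: it only decides whether handling is needed, while the handling logic lives in the center.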
6. A fault handling system for a monitoring agent, characterized in that a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent comprises:
A supervising device, configured to monitor the running state of the first monitoring agent;
A processing unit, configured to trigger the startup flow of the first monitoring agent when it is detected that the first monitoring agent has stopped running due to a fault.
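The division of labor in claim 6 (a supervising device that watches the running state, and a processing unit that triggers the startup flow) resembles a simple polling watchdog. The sketch below is one possible reading, with assumptions labeled: the function name `supervise`, the polling interval, and the bounded number of checks are all hypothetical, and the sketch treats any "not running" observation as a fault-induced stop, which the claim does not state.

```python
import time
from typing import Callable


def supervise(
    is_running: Callable[[], bool],
    trigger_startup: Callable[[], None],
    interval_s: float = 1.0,
    max_checks: int = 3,
) -> int:
    """Poll the first agent and trigger its startup flow when it is down.

    Returns the number of times the startup flow was triggered.
    max_checks bounds the loop so the sketch terminates; a real
    second agent would presumably run for the lifetime of the node.
    """
    restarts = 0
    for _ in range(max_checks):
        # Supervising device: observe the running state.
        if not is_running():
            # Processing unit: trigger the startup flow.
            trigger_startup()
            restarts += 1
        time.sleep(interval_s)
    return restarts
```

Passing the state check and the startup action as callables keeps the loop independent of how the first agent is actually probed (process table, heartbeat file, socket), which the claim leaves open.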
7. The system according to claim 6, characterized in that the processing unit comprises:
A judging module, configured to judge whether the fault of the first monitoring agent needs to be handled;
A starting module, configured to start the first monitoring agent after the fault of the first monitoring agent has been handled, when that fault needs to be handled; and to start the first monitoring agent directly when the fault of the first monitoring agent does not need to be handled.
8. The system according to claim 7, characterized in that the processing unit further comprises:
A lookup module, configured to look up, among the fault-handling strategies stored locally in advance, the handling strategy corresponding to the fault of the first monitoring agent;
A first processing module, configured to use that handling strategy to handle the fault of the first monitoring agent when the handling strategy corresponding to the fault of the first monitoring agent is found;
A second processing module, configured, when the handling strategy corresponding to the fault of the first monitoring agent is not found, to obtain the handling strategy corresponding to the fault of that monitoring agent from the surveillance center and then use that strategy to handle the fault of the first monitoring agent; or to request the surveillance center to handle the fault of the second monitoring agent.
9. The system according to claim 6, characterized in that:
The processing unit comprises:
A reporting module, configured to report to the surveillance center that the first monitoring agent has stopped running;
The system further comprises:
A surveillance center, configured to start the first monitoring agent according to the information that the first monitoring agent has stopped running.
10. The system according to claim 6, characterized in that:
The processing unit comprises:
A judging module, configured to judge whether the fault of the first monitoring agent needs to be handled;
A notification module, configured to notify the surveillance center to handle the fault of the first monitoring agent when that fault needs to be handled;
A starting module, configured to start the first monitoring agent directly when the fault of the first monitoring agent does not need to be handled;
The system further comprises:
A surveillance center, configured to handle the fault of the first monitoring agent and, after the fault handling is completed, to start the first monitoring agent.
CN2011100309810A 2011-01-28 2011-01-28 Troubleshooting method and system of monitoring agents Pending CN102148707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100309810A CN102148707A (en) 2011-01-28 2011-01-28 Troubleshooting method and system of monitoring agents

Publications (1)

Publication Number Publication Date
CN102148707A true CN102148707A (en) 2011-08-10

Family

ID=44422724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100309810A Pending CN102148707A (en) 2011-01-28 2011-01-28 Troubleshooting method and system of monitoring agents

Country Status (1)

Country Link
CN (1) CN102148707A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009146311A1 (en) * 2008-05-29 2009-12-03 Citrix Systems, Inc. Systems and methods for load balancing via a plurality of virtual servers upon failover using metrics from a backup virtual server
CN101751326A (en) * 2008-12-18 2010-06-23 中国银联股份有限公司 System test equipment and test execution and monitoring method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103368789A (en) * 2012-03-29 2013-10-23 日本电气株式会社 Cluster monitor, method for monitoring a cluster, and computer-readable recording medium
CN103368789B (en) * 2012-03-29 2017-08-25 日本电气株式会社 Cluster monitor, the method for monitoring cluster and computer readable recording medium storing program for performing
WO2020107212A1 (en) * 2018-11-27 2020-06-04 刘馥祎 Computing device maintenance method and apparatus, storage medium, and program product
US11983539B2 (en) 2018-11-27 2024-05-14 Hong Kong Sunstar Technology Co., Limited Method for computing device maintenance, apparatus, storage medium and program product

Similar Documents

Publication Publication Date Title
US9141491B2 (en) Highly available server system based on cloud computing
CN102360324B (en) Failure recovery method and equipment for failure recovery
US20050237926A1 (en) Method for providing fault-tolerant application cluster service
CN104408071A (en) Distributive database high-availability method and system based on cluster manager
CN103607297A (en) Fault processing method of computer cluster system
CN101032123B (en) Method and apparatus for determining impact of faults on network service
CN103019889A (en) Distributed file system and failure processing method thereof
US10868581B2 (en) Data center management using device identification over power-line
CN103354503A (en) Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof
CN105227385A (en) A kind of method and system of troubleshooting
CN103581225A (en) Distributed system node processing task method
CN103795553A (en) Switching of main and standby servers on the basis of monitoring
CN102394914A (en) Cluster brain-split processing method and device
CN102437935B (en) WEB application monitoring method and equipment
EP2637102B1 (en) Cluster system with network node failover
CN105978721A (en) Method, device and system for monitoring operation state of services in clustering system
CN106330523A (en) Cluster server disaster recovery system and method, and server node
CN110618864A (en) Interrupt task recovery method and device
CN112612545A (en) Configuration hot loading system, method, equipment and medium of server cluster
CN102375772A (en) Server monitoring method and device
CN110417600A (en) Node switching method, device and the computer storage medium of distributed system
CN101262479B (en) A network file share method, server and network file share system
CN110958151B (en) Keep-alive detection method, keep-alive detection device, node, storage medium and communication system
CN102148707A (en) Troubleshooting method and system of monitoring agents
CN101958925A (en) Method and device for controlling remote equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110810