CN102148707A - Troubleshooting method and system of monitoring agents - Google Patents
Troubleshooting method and system of monitoring agents
- Publication number: CN102148707A
- Authority: CN (China)
- Prior art keywords: monitoring agent, fault, takes place, handle, monitoring
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Debugging And Monitoring (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a troubleshooting method and system of monitoring agents. In the method, a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent monitors the operation state of the first monitoring agent; and the second monitoring agent triggers the start flow of the first monitoring agent when monitoring that the first monitoring agent stops due to failures.
Description
Technical field
The present invention relates to the field of computer applications, and in particular to a fault handling method and system for a monitoring agent.
Background art
Computers are becoming ever more widespread, and their range of applications ever wider. The spread of personal computers has in turn driven the widespread use of servers.
Today, large enterprises and institutions run tens of thousands of servers around the clock. As servers come into widespread use, how to manage large numbers of servers, storage devices, and network devices effectively has become a growing concern for these organizations.
For this reason, the major server vendors and software companies have each released their own device management software. Such software falls into two types, according to whether an agent must be installed on the monitored resource node to realize the management function. In the first type, a monitoring agent program is installed on the monitored node, and the management software manages the monitored resource node through it. In the second type, no agent is installed, and management is realized through the Simple Network Management Protocol (SNMP), the Intelligent Platform Management Interface (IPMI), or similar.
Of these two types, not installing a monitoring agent service program on the monitored node is the simplest and safest way to manage it. However, SNMP and other protocols such as IPMI can realize only simple management of a device. As users' requirements for device management grow, this approach is increasingly unable to meet their management needs. Installing a monitoring agent service program on the monitored resource node is therefore the more common approach at present.
In the course of realizing the present invention, the inventors found the following problem in the prior art:
A monitoring agent may stop running for a variety of reasons. When it does, the monitoring center can no longer communicate normally with the monitoring agent and therefore cannot continue to manage the monitored resource node. This is a weakness of the agent-based monitoring approach.
To solve the problem that a stopped monitoring agent cannot communicate normally with the monitoring center, we propose a device monitoring and management scheme that is flexible and tolerant of monitoring faults.
Summary of the invention
The invention provides a fault handling method and system for a monitoring agent, to solve the problem in the prior art that a monitoring agent that has stopped running cannot be dealt with in time.
To solve the above technical problem, the invention provides the following technical solutions:
A fault handling method for a monitoring agent, in which a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein:
the second monitoring agent monitors the running state of the first monitoring agent;
when it detects that the first monitoring agent has stopped running due to a fault, the second monitoring agent triggers the start-up flow of the first monitoring agent.
Further, the method also has the following feature: the second monitoring agent triggering the start-up flow of the first monitoring agent comprises:
the second monitoring agent judges whether the fault that occurred in the first monitoring agent needs to be handled;
if the fault needs to be handled, the second monitoring agent starts the first monitoring agent after the fault has been handled; otherwise, the second monitoring agent starts the first monitoring agent directly.
Further, the method also has the following feature: handling the fault that occurred in the first monitoring agent comprises:
the second monitoring agent looks up, among the fault handling strategies stored locally in advance, the handling strategy corresponding to the fault that occurred in the first monitoring agent;
if the corresponding handling strategy is found, the second monitoring agent uses it to handle the fault that occurred in the first monitoring agent;
if the corresponding handling strategy is not found, the second monitoring agent obtains it from the monitoring center and then uses it to handle the fault that occurred in the first monitoring agent; or, the second monitoring agent requests the monitoring center to handle the fault that occurred in the first monitoring agent.
Further, the method also has the following feature: the second monitoring agent triggering the start-up flow of the first monitoring agent comprises:
the second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running;
the monitoring center starts the first monitoring agent according to the information that the first monitoring agent has stopped running.
Further, the method also has the following feature: the second monitoring agent triggering the start-up flow of the first monitoring agent comprises:
the second monitoring agent judges whether the fault that occurred in the first monitoring agent needs to be handled;
if the fault needs to be handled, the second monitoring agent notifies the monitoring center to handle it; the monitoring center handles the fault that occurred in the first monitoring agent and, after the fault handling is complete, starts the first monitoring agent;
if the fault does not need to be handled, the second monitoring agent starts the first monitoring agent directly.
A fault handling system for a monitoring agent, in which a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent comprises:
a monitoring device, configured to monitor the running state of the first monitoring agent;
a processing unit, configured to trigger the start-up flow of the first monitoring agent when the first monitoring agent is detected to have stopped running due to a fault.
Further, the system also has the following feature: the processing unit comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a starting module, configured to start the first monitoring agent after the fault has been handled when the fault needs to be handled, and to start the first monitoring agent directly when the fault does not need to be handled.
Further, the system also has the following feature: the processing unit further comprises:
a lookup module, configured to look up, among the fault handling strategies stored locally in advance, the handling strategy corresponding to the fault that occurred in the first monitoring agent;
a first processing module, configured to use the handling strategy to handle the fault that occurred in the first monitoring agent when the corresponding handling strategy is found;
a second processing module, configured, when the corresponding handling strategy is not found, to obtain it from the monitoring center and then use it to handle the fault that occurred in the first monitoring agent; or, to request the monitoring center to handle the fault that occurred in the first monitoring agent.
Further, the system also has the following feature:
the processing unit comprises:
a reporting module, configured to report to the monitoring center that the first monitoring agent has stopped running;
the system further comprises:
a monitoring center, configured to start the first monitoring agent according to the information that the first monitoring agent has stopped running.
Further, the system also has the following feature:
the processing unit comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a notification module, configured to notify the monitoring center to handle the fault that occurred in the first monitoring agent when the fault needs to be handled;
a starting module, configured to start the first monitoring agent directly when the fault does not need to be handled;
the system further comprises:
a monitoring center, configured to handle the fault that occurred in the first monitoring agent and, after the fault handling is complete, to start the first monitoring agent.
In the embodiments provided by the invention, the second monitoring agent monitors the running state of the first monitoring agent and, when the first monitoring agent stops running, triggers the flow that starts it. This shortens the time needed to discover a fault in the first monitoring agent and the time needed to restart it, and so keeps information communication working normally.
Brief description of the drawings
Fig. 1 is a flowchart of the fault handling method for a monitoring agent provided by the invention;
Fig. 2 is a structural diagram of an embodiment of the fault handling system for a monitoring agent provided by the invention;
Fig. 3 is a structural diagram of the processing unit in the system embodiment shown in Fig. 2;
Fig. 4 is another structural diagram of the processing unit shown in Fig. 3;
Fig. 5 is another structural diagram of the system embodiment shown in Fig. 2;
Fig. 6 is yet another structural diagram of the system embodiment shown in Fig. 2.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another in any way.
Fig. 1 is a flowchart of the fault handling method for a monitoring agent provided by the invention. The method embodiment shown in Fig. 1 comprises the following.
In embodiment one, the resource monitored by the monitoring center comprises a first monitoring agent and a second monitoring agent.
The monitored resource can be a physical device in a network system (such as a cloud computing operating system), and may be at least one of a server, a storage device (such as a database), and a transmission device (such as a switch or router).
Step 101: the second monitoring agent receives information that the first monitoring agent has stopped running due to a fault;
The running state of the first monitoring agent may be monitored by the second monitoring agent itself, but is not limited to this. The monitoring function can also be realized by the communication module in the monitored resource node that communicates with the first monitoring agent. For example, if the communication module sends information to the first monitoring agent and receives no corresponding response within a preset time, the communication module determines that the first monitoring agent is not running and sends the second monitoring agent information that the first monitoring agent has stopped running due to a fault.
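The timeout check performed by the communication module can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the callback-style interface, and the default timeout and polling interval are all assumptions made for the example.

```python
import time
from typing import Callable


def agent_is_running(
    send_probe: Callable[[], None],
    response_received: Callable[[], bool],
    timeout_s: float = 1.0,
    poll_interval_s: float = 0.05,
) -> bool:
    """Send one probe and report whether a response arrives within timeout_s."""
    send_probe()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if response_received():
            return True
        time.sleep(poll_interval_s)
    return False


def check_and_notify(
    send_probe: Callable[[], None],
    response_received: Callable[[], bool],
    notify_second_agent: Callable[[str], None],
    timeout_s: float = 1.0,
) -> bool:
    """Tell the second monitoring agent when the first one is not running."""
    running = agent_is_running(send_probe, response_received, timeout_s)
    if not running:
        # Hypothetical message text; the source does not define a format.
        notify_second_agent("first monitoring agent is not running")
    return running
```

A monotonic clock is used for the deadline so that adjustments to the system clock do not lengthen or shorten the preset time.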
Step 102: the second monitoring agent triggers the flow that starts the first monitoring agent.
Depending on which entity performs the operation of starting the first monitoring agent, the flow can be triggered in the following two ways:
First way (102A): the first monitoring agent is started by the second monitoring agent, as follows:
Step 201: the second monitoring agent judges whether the fault that occurred in the first monitoring agent needs to be handled;
For example, the second monitoring agent may store a list in advance that records which of the faults that may occur in the first monitoring agent need to be handled, and compare the fault against this list.
Step 202: if the fault that occurred in the first monitoring agent needs to be handled, the first monitoring agent is started after the fault has been handled; otherwise, the first monitoring agent is started directly.
It should be noted that the purpose of step 201 is to ensure that, when the first monitoring agent is subsequently started, a single start is enough for it to run normally, which shortens the fault handling time. The first monitoring agent could of course also be started directly, because some faults disappear once the agent is restarted. For example, if the first monitoring agent stopped because the monitoring center sent it a wrong command, simply starting it again is enough. Other faults, however, persist after a restart, such as the first monitoring agent not having enough disk space to store log information. In that case the first monitoring agent must be shut down again and can only be restarted once the fault has been handled. Starting the first monitoring agent directly can therefore lead to repeated restarts, which increases the fault handling time.
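Steps 201 and 202 amount to a membership check followed by an optional handling step. The sketch below assumes faults are identified by string codes; the code `disk_full` and the function names are hypothetical, chosen to mirror the disk-space example above.

```python
from typing import Callable, FrozenSet, List

# Hypothetical fault codes; the source only gives the disk-space example.
FAULTS_NEEDING_HANDLING: FrozenSet[str] = frozenset({"disk_full"})


def start_first_agent(
    fault: str,
    handle_fault: Callable[[str], None],
    start_agent: Callable[[], None],
    needs_handling: FrozenSet[str] = FAULTS_NEEDING_HANDLING,
) -> List[str]:
    """Handle the fault first if the stored list says so, then start the agent."""
    actions: List[str] = []
    if fault in needs_handling:  # step 201: compare against the stored list
        handle_fault(fault)
        actions.append("handled")
    start_agent()  # step 202: start after handling, or directly
    actions.append("started")
    return actions
```

For a fault such as a wrong command from the monitoring center, which is not in the list, the agent is started immediately and no handling step runs.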
The handling of the fault that occurred in the first monitoring agent in step 202 can itself be carried out in the following ways:
First way (202A), which specifically comprises:
Step A1: the second monitoring agent looks up the handling strategy corresponding to the fault that occurred in the first monitoring agent among the handling strategies, stored locally in advance, for faults that need to be handled;
If it is found, step A5 is executed; otherwise, steps A2 to A5 are executed;
Step A2: the second monitoring agent asks the monitoring center for the handling strategy corresponding to the fault that occurred in the first monitoring agent;
Step A3: the monitoring center generates the handling strategy corresponding to the fault that occurred in the first monitoring agent;
Step A4: the monitoring center sends the handling strategy corresponding to the fault that occurred in the first monitoring agent to the second monitoring agent;
After step A4 is completed, step A5 is executed.
Step A5: the second monitoring agent uses the handling strategy it has obtained to handle the fault that occurred in the first monitoring agent;
Step A6: after detecting that the fault handling is complete, the second monitoring agent starts the first monitoring agent.
In the first way (202A), if no handling strategy corresponding to the fault is found in step A1, steps A2 to A5 can instead be replaced by the following: the second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running; the monitoring center then handles the fault of the first monitoring agent according to the information reported by the second monitoring agent.
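Steps A1 to A6 can be sketched as a local lookup with a fallback to the monitoring center. The representation of a handling strategy as a callable, the dictionary used as the local store, and the names below are assumptions for illustration; the sketch also treats the return of the strategy call as the signal that fault handling is complete.

```python
from typing import Callable, Dict

Strategy = Callable[[], None]  # assumed representation of a handling strategy


def handle_and_start(
    fault: str,
    local_strategies: Dict[str, Strategy],
    fetch_from_center: Callable[[str], Strategy],
    start_agent: Callable[[], None],
) -> str:
    """Run steps A1-A6 and return where the strategy came from."""
    strategy = local_strategies.get(fault)  # step A1: local lookup
    source = "local"
    if strategy is None:
        strategy = fetch_from_center(fault)  # steps A2-A4: ask the center
        source = "center"
    strategy()  # step A5: apply the strategy
    start_agent()  # step A6: start once handling has returned
    return source
```

The alternative described above, in which the monitoring center handles the fault itself, would replace the `fetch_from_center` branch with a report to the center.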
Second way (102B): the start of the first monitoring agent is managed by the monitoring center alone, or jointly by the monitoring center and the second monitoring agent, as follows:
In the first variant, the first monitoring agent is started only by the monitoring center, which specifically comprises:
The second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running; the monitoring center starts the first monitoring agent according to the information that the first monitoring agent has stopped running.
In this variant, whenever the first monitoring agent stops running, the second monitoring agent sends information to the monitoring center to trigger the flow in which the monitoring center starts the first monitoring agent. In the prior art, the monitoring center can conclude that the first monitoring agent has stopped only after receiving nothing from it for some period of time. By comparison, the monitoring center here learns promptly that the first monitoring agent has stopped and can handle the fault quickly, which shortens the time for which the first monitoring agent is stopped. In addition, the operation of the second monitoring agent is simple.
In the second variant, the monitoring center and the second monitoring agent jointly manage the start of the first monitoring agent, which specifically comprises:
The second monitoring agent looks up whether the fault that occurred in the first monitoring agent is included in the information, stored in advance, about faults that need to be handled;
If the fault is not found in the information about faults that need to be handled, the second monitoring agent starts the first monitoring agent directly;
If the fault is found in the information about faults that need to be handled, the second monitoring agent notifies the monitoring center to handle it; the monitoring center handles the fault that occurred in the first monitoring agent and, after the fault handling is complete, starts the first monitoring agent.
Because some faults disappear once the first monitoring agent is restarted, the second monitoring agent can preferably first judge whether the fault that occurred in the first monitoring agent needs to be handled, in order to shorten the handling time. If it does, the second monitoring agent reports to the monitoring center that the first monitoring agent has stopped running; otherwise, the second monitoring agent starts the first monitoring agent directly. Faults that disappear after a restart are thus dealt with directly by the second monitoring agent, which reduces both the reporting traffic from the second monitoring agent and the handling workload of the monitoring center.
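The reduction in reporting traffic under joint management can be illustrated with a small dispatcher. The fault codes and callback names are hypothetical; only faults that need handling reach the monitoring center, and the rest result in a direct local start.

```python
from typing import Callable, Iterable, List


def dispatch_stop_events(
    faults: Iterable[str],
    needs_handling: Callable[[str], bool],
    start_locally: Callable[[], None],
    notify_center: Callable[[str], None],
) -> List[str]:
    """Route each stop event to the center or to a direct local start."""
    outcomes: List[str] = []
    for fault in faults:
        if needs_handling(fault):
            notify_center(fault)  # the center handles the fault, then starts the agent
            outcomes.append("center")
        else:
            start_locally()  # the fault clears on restart, so no report is sent
            outcomes.append("local")
    return outcomes
```

In a run with three stop events of which one needs handling, the monitoring center receives one report instead of three.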
It should be noted that in practice, when a monitored resource node comprises multiple monitoring agents, it is sufficient for a monitoring agent to be monitored by at least one of the remaining monitoring agents. For example, if there are two monitoring agents on the monitored resource node, the two can monitor each other's running state.
Of course, the functions performed by the multiple monitoring agents on a monitored resource node may be the same or different. For example, one monitoring agent may be responsible for providing interfaces for obtaining various kinds of information, while another is responsible for monitoring how the system is running.
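The two-agent case, in which each agent watches the other, can be sketched as below. The `Agent` class, its fields, and the `pair` helper are assumptions for illustration, and setting `running = True` stands in for the real start-up flow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    name: str
    running: bool = True
    peer: Optional["Agent"] = field(default=None, repr=False, compare=False)

    def check_peer(self) -> bool:
        """Restart the peer if it has stopped; return True if a restart was triggered."""
        if not self.running or self.peer is None or self.peer.running:
            return False
        self.peer.running = True  # stands in for the real start-up flow
        return True


def pair(a: Agent, b: Agent) -> None:
    """Make two agents on the same node monitor each other."""
    a.peer, b.peer = b, a
```

With this arrangement either agent can recover the other, but nothing on the node recovers them if both stop at the same moment.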
Fig. 2 is a structural diagram of an embodiment of the fault handling system for a monitoring agent provided by the invention. In combination with the method embodiment shown in Fig. 1, in the system shown in Fig. 2 the monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the agent functions of the first monitoring agent and the second monitoring agent are different, and the second monitoring agent comprises:
a monitoring device, configured to monitor the running state of the first monitoring agent;
a processing unit, configured to trigger the start-up flow of the first monitoring agent when the first monitoring agent is detected to have stopped running due to a fault.
Fig. 3 is a structural diagram of the processing unit in the system embodiment shown in Fig. 2. The processing unit shown in Fig. 3 comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a starting module, configured to start the first monitoring agent after the fault has been handled when the fault needs to be handled, and to start the first monitoring agent directly when the fault does not need to be handled.
Fig. 4 is another structural diagram of the processing unit shown in Fig. 3. The processing unit shown in Fig. 4 further comprises:
a lookup module, configured to look up, among the fault handling strategies stored locally in advance, the handling strategy corresponding to the fault that occurred in the first monitoring agent;
a first processing module, configured to use the handling strategy to handle the fault that occurred in the first monitoring agent when the corresponding handling strategy is found;
a second processing module, configured, when the corresponding handling strategy is not found, to obtain it from the monitoring center and then use it to handle the fault that occurred in the first monitoring agent; or, to request the monitoring center to handle the fault that occurred in the first monitoring agent.
Fig. 5 is another structural diagram of the system embodiment shown in Fig. 2. The system shown in Fig. 5 is as follows:
the processing unit comprises:
a reporting module, configured to report to the monitoring center that the first monitoring agent has stopped running;
the system further comprises:
a monitoring center, configured to start the first monitoring agent according to the information that the first monitoring agent has stopped running.
Fig. 6 is yet another structural diagram of the system embodiment shown in Fig. 2. The system shown in Fig. 6 is as follows:
the processing unit comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a notification module, configured to notify the monitoring center to handle the fault that occurred in the first monitoring agent when the fault needs to be handled;
a starting module, configured to start the first monitoring agent directly when the fault does not need to be handled;
the system further comprises:
a monitoring center, configured to handle the fault that occurred in the first monitoring agent and, after the fault handling is complete, to start the first monitoring agent.
In the system embodiment provided by the invention, the second monitoring agent monitors the running state of the first monitoring agent and, when the first monitoring agent stops running, triggers the flow that starts it. This shortens the time needed to discover a fault in the first monitoring agent and the time needed to restart it, and so keeps information communication working normally.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented as a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, or apparatus); when executed, it performs one of the steps of the method embodiment or a combination of them.
Alternatively, all or part of the steps of the above embodiments can be implemented with integrated circuits: the steps can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is thus not limited to any specific combination of hardware and software.
Each device, functional module, or functional unit in the above embodiments can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When a device, functional module, or functional unit in the above embodiments is implemented in the form of a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only specific embodiments of the present invention, but the scope of protection of the invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.
Claims (10)
1. A fault handling method for a monitoring agent, characterized in that a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein:
the second monitoring agent monitors the running state of the first monitoring agent;
when it detects that the first monitoring agent has stopped running due to a fault, the second monitoring agent triggers the start-up flow of the first monitoring agent.
2. The method according to claim 1, characterized in that the second monitoring agent triggering the start-up flow of the first monitoring agent comprises:
the second monitoring agent judges whether the fault that occurred in the first monitoring agent needs to be handled;
if the fault needs to be handled, the second monitoring agent starts the first monitoring agent after the fault has been handled; otherwise, the second monitoring agent starts the first monitoring agent directly.
3. The method according to claim 2, characterized in that handling the fault that occurred in the first monitoring agent comprises:
the second monitoring agent looks up, among the fault handling strategies stored locally in advance, the handling strategy corresponding to the fault that occurred in the first monitoring agent;
if the corresponding handling strategy is found, the second monitoring agent uses it to handle the fault that occurred in the first monitoring agent;
if the corresponding handling strategy is not found, the second monitoring agent obtains it from the monitoring center and then uses it to handle the fault that occurred in the first monitoring agent; or, the second monitoring agent requests the monitoring center to handle the fault that occurred in the first monitoring agent.
4. The method according to claim 1, characterized in that the second monitoring agent triggering the startup procedure of the first monitoring agent comprises:
the second monitoring agent reporting to the surveillance center information indicating that the first monitoring agent has stopped running;
the surveillance center starting the first monitoring agent according to the information indicating that the first monitoring agent has stopped running.
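The report-then-start variant of claim 4 could look roughly like the sketch below. The `on_stopped_report` method, the agent identifier and the in-memory registry are assumptions made for illustration; the claim does not describe the message format or transport.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FirstAgent:
    """Hypothetical stand-in for the first monitoring agent."""
    running: bool = False


class SurveillanceCenter:
    """Hypothetical center that starts an agent when told it has stopped."""

    def __init__(self, agents: Dict[str, FirstAgent]) -> None:
        self._agents = agents          # agent id -> agent handle
        self.reports: List[str] = []   # ids reported as stopped

    def on_stopped_report(self, agent_id: str) -> None:
        self.reports.append(agent_id)
        # The center, not the second agent, performs the start.
        self._agents[agent_id].running = True


class SecondAgent:
    """Hypothetical second monitoring agent that only reports."""

    def __init__(self, center: SurveillanceCenter) -> None:
        self._center = center

    def trigger_startup(self, agent_id: str) -> None:
        self._center.on_stopped_report(agent_id)
```

Here the second agent never touches the first agent directly; its only action is the report, and the restart is carried out by the center.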
5. The method according to claim 1, characterized in that the second monitoring agent triggering the startup procedure of the first monitoring agent comprises:
the second monitoring agent judging whether the fault that occurred in the first monitoring agent needs to be handled;
if the fault that occurred in the first monitoring agent needs to be handled, the second monitoring agent notifying the surveillance center to handle the fault that occurred in the first monitoring agent; the surveillance center handling the fault that occurred in the monitoring agent and, after the fault handling is completed, starting the first monitoring agent;
if the fault that occurred in the first monitoring agent does not need to be handled, the second monitoring agent starting the first monitoring agent directly.
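The split of responsibilities in claim 5 can be sketched as below, assuming a caller-supplied `needs_handling` predicate and a hypothetical `started_by` field that exists only to make the two paths visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FirstAgent:
    """Hypothetical stand-in for the first monitoring agent."""
    running: bool = False
    fault: Optional[str] = None
    started_by: Optional[str] = None  # illustration only


@dataclass
class SurveillanceCenter:
    """Hypothetical center that handles a fault, then starts the agent."""
    handled: List[Optional[str]] = field(default_factory=list)

    def handle_and_start(self, agent: FirstAgent) -> None:
        # Fault handling is completed before the agent is started.
        self.handled.append(agent.fault)
        agent.fault = None
        agent.running, agent.started_by = True, "center"


def trigger_startup(agent: FirstAgent,
                    center: SurveillanceCenter,
                    needs_handling: Callable[[Optional[str]], bool]) -> None:
    """Second agent's decision: delegate to the center or start directly."""
    if needs_handling(agent.fault):
        center.handle_and_start(agent)
    else:
        agent.running, agent.started_by = True, "second_agent"
```

Compared with claim 2, the handling step moves from the second agent to the surveillance center, while the direct-start path stays with the second agent.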
6. A fault handling system for a monitoring agent, characterized in that a monitored resource node comprises a first monitoring agent and a second monitoring agent, wherein the second monitoring agent comprises:
a monitoring device, configured to monitor the running state of the first monitoring agent;
a processing unit, configured to trigger the startup procedure of the first monitoring agent when it is detected that the first monitoring agent has stopped running due to a fault.
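One way to picture the two components of claim 6 is the sketch below. The probe callable, the `check_once` polling method and the counter are assumptions; the claim does not say how the running state is observed, and this sketch does not distinguish a fault from any other reason for the agent being stopped.

```python
from typing import Callable


class MonitoringDevice:
    """Observes the first agent through a caller-supplied probe (assumption)."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self._probe = probe

    def first_agent_running(self) -> bool:
        return self._probe()


class ProcessingUnit:
    """Triggers the startup procedure of the first agent."""

    def __init__(self, start_first_agent: Callable[[], None]) -> None:
        self._start = start_first_agent
        self.startups_triggered = 0

    def trigger_startup(self) -> None:
        self.startups_triggered += 1
        self._start()


class SecondAgent:
    """Second monitoring agent composed of the two parts above."""

    def __init__(self, monitor: MonitoringDevice,
                 processor: ProcessingUnit) -> None:
        self.monitor = monitor
        self.processor = processor

    def check_once(self) -> bool:
        """Run one monitoring pass; return True if a startup was triggered."""
        if self.monitor.first_agent_running():
            return False
        self.processor.trigger_startup()
        return True
```

A caller would invoke `check_once` periodically; the polling interval and scheduling mechanism are left open here because the claim does not specify them.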
7. The system according to claim 6, characterized in that the processing unit comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a starting module, configured to start the first monitoring agent after the fault that occurred in the first monitoring agent has been handled when that fault needs to be handled, and to start the first monitoring agent directly when that fault does not need to be handled.
8. The system according to claim 7, characterized in that the processing unit further comprises:
a search module, configured to search the locally pre-stored troubleshooting strategies for the processing policy corresponding to the fault that occurred in the first monitoring agent;
a first processing module, configured to use that processing policy to handle the fault that occurred in the first monitoring agent when the processing policy corresponding to the fault that occurred in the first monitoring agent is found;
a second processing module, configured to, when the processing policy corresponding to the fault that occurred in the first monitoring agent is not found, obtain the processing policy corresponding to the fault that occurred in the monitoring agent from the surveillance center and then use that processing policy to handle the fault that occurred in the first monitoring agent; or, to request the surveillance center to handle the fault that occurred in the second monitoring agent.
9. The system according to claim 6, characterized in that:
the processing unit comprises:
a reporting module, configured to report to the surveillance center information indicating that the first monitoring agent has stopped running;
the system further comprises:
a surveillance center, configured to start the first monitoring agent according to the information indicating that the first monitoring agent has stopped running.
10. The system according to claim 6, characterized in that:
the processing unit comprises:
a judging module, configured to judge whether the fault that occurred in the first monitoring agent needs to be handled;
a notification module, configured to notify the surveillance center to handle the fault that occurred in the first monitoring agent when that fault needs to be handled;
a starting module, configured to start the first monitoring agent directly when the fault that occurred in the first monitoring agent does not need to be handled;
the system further comprises:
a surveillance center, configured to handle the fault that occurred in the first monitoring agent and, after the fault handling is completed, to start the first monitoring agent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100309810A CN102148707A (en) | 2011-01-28 | 2011-01-28 | Troubleshooting method and system of monitoring agents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100309810A CN102148707A (en) | 2011-01-28 | 2011-01-28 | Troubleshooting method and system of monitoring agents |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102148707A (en) | 2011-08-10 |
Family
ID=44422724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011100309810A Pending CN102148707A (en) | 2011-01-28 | 2011-01-28 | Troubleshooting method and system of monitoring agents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102148707A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009146311A1 (en) * | 2008-05-29 | 2009-12-03 | Citrix Systems, Inc. | Systems and methods for load balancing via a plurality of virtual servers upon failover using metrics from a backup virtual server |
CN101751326A (en) * | 2008-12-18 | 2010-06-23 | 中国银联股份有限公司 | System test equipment and test execution and monitoring method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103368789A (en) * | 2012-03-29 | 2013-10-23 | 日本电气株式会社 | Cluster monitor, method for monitoring a cluster, and computer-readable recording medium |
CN103368789B (en) | 2012-03-29 | 2017-08-25 | 日本电气株式会社 | Cluster monitor, method for monitoring a cluster, and computer-readable recording medium |
WO2020107212A1 (en) * | 2018-11-27 | 2020-06-04 | 刘馥祎 | Computing device maintenance method and apparatus, storage medium, and program product |
US11983539B2 (en) | 2018-11-27 | 2024-05-14 | Hong Kong Sunstar Technology Co., Limited | Method for computing device maintenance, apparatus, storage medium and program product |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9141491B2 (en) | Highly available server system based on cloud computing | |
CN102360324B (en) | Failure recovery method and equipment for failure recovery | |
US20050237926A1 (en) | Method for providing fault-tolerant application cluster service | |
CN104408071A (en) | Distributive database high-availability method and system based on cluster manager | |
CN103607297A (en) | Fault processing method of computer cluster system | |
CN101032123B (en) | Method and apparatus for determining impact of faults on network service | |
CN103019889A (en) | Distributed file system and failure processing method thereof | |
US10868581B2 (en) | Data center management using device identification over power-line | |
CN103354503A (en) | Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof | |
CN105227385A (en) | A kind of method and system of troubleshooting | |
CN103581225A (en) | Distributed system node processing task method | |
CN103795553A (en) | Switching of main and standby servers on the basis of monitoring | |
CN102394914A (en) | Cluster brain-split processing method and device | |
CN102437935B (en) | WEB application monitoring method and equipment | |
EP2637102B1 (en) | Cluster system with network node failover | |
CN105978721A (en) | Method, device and system for monitoring operation state of services in clustering system | |
CN106330523A (en) | Cluster server disaster recovery system and method, and server node | |
CN110618864A (en) | Interrupt task recovery method and device | |
CN112612545A (en) | Configuration hot loading system, method, equipment and medium of server cluster | |
CN102375772A (en) | Server monitoring method and device | |
CN110417600A (en) | Node switching method, device and the computer storage medium of distributed system | |
CN101262479B (en) | A network file share method, server and network file share system | |
CN110958151B (en) | Keep-alive detection method, keep-alive detection device, node, storage medium and communication system | |
CN102148707A (en) | Troubleshooting method and system of monitoring agents | |
CN101958925A (en) | Method and device for controlling remote equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20110810 |