CN106713014B - Monitored host in monitoring system, monitoring system and monitoring method - Google Patents

Monitored host in monitoring system, monitoring system and monitoring method

Info

Publication number
CN106713014B
Authority
CN
China
Prior art keywords
monitoring
service
module
agent
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611088934.0A
Other languages
Chinese (zh)
Other versions
CN106713014A (en)
Inventor
唐德平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huawei Cloud Computing Technology Co ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201611088934.0A priority Critical patent/CN106713014B/en
Publication of CN106713014A publication Critical patent/CN106713014A/en
Application granted granted Critical
Publication of CN106713014B publication Critical patent/CN106713014B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0695 Management of faults, events, alarms or notifications the faulty arrangement being the maintenance, administration or management system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A monitored host, a monitoring system and a monitoring method are provided to enable monitoring of the interaction between the monitored host and an external service system. The monitored host comprises a monitoring client, an agent tool module and a service module. The agent tool module provides a service interface to the service module, and the service module records a key associated with each service failure category and a fault description parameter set in one-to-one correspondence with each key. When interaction between the service module and a service system outside the host fails, the service module sends the key for that service failure and the values of the corresponding fault description parameters to the agent tool module through the service interface. The agent tool module writes the values of the fault description parameters into the template file corresponding to the key, then generates monitoring information and reports it to the monitoring server. In this way, service failures caused by faults outside the monitored host are monitored.

Description

Monitored host in monitoring system, monitoring system and monitoring method
Technical Field
The present application relates to the field of network technologies, and in particular, to a monitored host, a monitoring system, and a monitoring method in a monitoring system.
Background
Zabbix is an open-source distributed monitoring system that can monitor data from network equipment. As shown in fig. 1, the Zabbix monitoring system includes a service host and several monitored hosts; only one monitored host is shown in fig. 1. The service host comprises a Zabbix web graphical user interface (GUI), a Zabbix database and a Zabbix server. In a device monitoring scheme implemented with Zabbix, a Zabbix client and monitoring scripts are installed on the monitored host. A user adds configuration information, such as monitoring items, to the Zabbix server through the Zabbix web GUI, and configures the keys of the monitoring items and the corresponding monitoring scripts in the configuration file of the monitoring client. The Zabbix client synchronizes the configuration information, such as the monitoring items, from the Zabbix server, schedules the corresponding monitoring scripts to collect monitoring data according to that configuration, and reports the collected data to the Zabbix server. The Zabbix server stores the received monitoring data in the Zabbix database, and the user can view the monitoring results through the Zabbix web GUI.
The monitored host runs a service process, through which a user interacts with a service system outside the host; the service system may be a communication system, a database system, a web service system or the like. Because the Zabbix system monitors only the monitored host, it cannot detect a fault in time when the link between the monitored host and the service system fails or when the service system itself fails. For example, the monitored host may be connected to a communication system, such as a short message gateway of an operator, and a user of the monitored host sends short messages through that gateway. When the communication link between the monitored host and the short message gateway fails, or the gateway itself fails, the user cannot send short messages through the monitored host, and the user's short message service fails.
Disclosure of Invention
The embodiment of the application provides a monitored host, a monitoring system and a monitoring method in a monitoring system, which are used for solving the problem that the monitoring system cannot find a fault in time when a link between the monitored host and a service system fails or the service system fails.
The embodiments of the application provide the following technical solutions. In a first aspect, a monitored host in a monitoring system is provided. The monitored host includes a monitoring client, an agent tool module and a service module. The agent tool module provides a service interface to the service module, and the service module records a key associated with each service failure category and a fault description parameter set in one-to-one correspondence with each key. The agent tool module records the correspondence between each key and a template file, and the template file includes the fault description parameter set corresponding to the key. When interaction between the service module and a service system outside the host fails, the service module sends the key for that service failure and the values of the corresponding fault description parameters to the agent tool module through the service interface. The agent tool module writes the values of the fault description parameters into the template file corresponding to the key, then generates monitoring information and reports it to the monitoring server. By adding the agent tool module to the monitored host and having it provide a service interface to the service module, the embodiments of the invention define how the service module reports a fault after a service failure, so that the monitoring system can monitor service failures caused by faults outside the monitored host. The service module does not need to be coupled to the monitoring system; it only needs to define, for each of its abnormal scenarios, a key and the fault description parameters in JSON format.
In a possible design, after writing the values of the fault description parameters into a template file, the agent tool module generates values, where the values are character strings corresponding to the values of the fault description parameters; correspondingly, the monitoring information includes the key corresponding to the service failure category and the value.
In another possible design, the agent tool module may report the monitoring information to the monitoring server by calling a command line tool of the monitoring client; or, the agent module sends the monitoring information to the monitoring client, so that the monitoring client sends the monitoring information to the monitoring server.
In another possible design, the agent module provides a local loopback address to the service module through the service interface, and receives the service fault information transmitted by the service module in an HTTP manner.
The fault description parameters can adopt JSON objects.
By combining and changing the template file and the JSON object, the embodiments of the invention can customize the content and format of the monitoring information as needed and report the customized monitoring information to the monitoring server, which makes it easier for a system administrator to check the details of a service abnormality. In addition, because stateless HTTP communication is used between the service module and the agent tool module, a failure of a monitoring system process does not affect the service module, so the user's service is not affected and the safety of the service is ensured.
The agent tool module can also execute a flow control policy to limit the reporting frequency of service failures of the same type. The flow control policy includes limiting the reporting frequency of the monitoring information corresponding to the same key value to no more than a preset value.
The agent tool module may be combined with the monitoring client.
In a second aspect, there is provided a monitoring system comprising: the monitoring client and the agent tool module run on a monitored host, and the agent tool module provides a service interface for the business module; wherein the agent module has a function of implementing the agent module described in the above first aspect.
In a third aspect, a monitoring method is provided, where corresponding to the first aspect, the service module, the agent module, and the monitoring server execute functions of corresponding modules in the first aspect.
In a fourth aspect, there is provided another monitored host in a monitoring system, wherein the monitored host has a function of implementing the behavior of the monitored host in the first aspect and any one of the possible designs. The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the monitored host in the monitoring system includes a transceiver and a processor, wherein the processor is configured to invoke a set of program code to perform the method as described in the second aspect and any one of the possible designs.
In a fifth aspect, there is provided a computer storage medium for storing computer software instructions for a monitored host of the above aspects, comprising a program designed for executing the above aspects.
Drawings
FIG. 1 is a prior art architecture diagram of a Zabbix monitoring system;
FIG. 2 is a diagram of a monitoring system architecture in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a monitoring method in an embodiment of the present application;
FIG. 4 is a schematic diagram of the hardware structure of a monitored host in the monitoring system in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, the present application will be further described with reference to the accompanying drawings.
FIG. 2 is a schematic structural diagram of a monitoring system provided in an embodiment of the present invention. The monitoring system includes a monitored host 11 and a monitoring server 12 and is connected to a service system 14 through a network 13. Specifically, the monitored host 11 includes a monitoring client 111, an agent module 112 and a service module 113. The agent module 112 provides a service interface to the service module 113, the service module 113 interacts with the agent module 112 through the service interface, and the service module 113 interacts with the service system 14 by calling an interface of the service system 14.
In one possible scenario, a user logs in to the monitored host 11, runs an application on the host 11, accesses to the external service system 14 through the service module 113, and accesses a service provided by the service system 14. For example, the service system 14 may be a short message system, and the service module 113 is connected to a short message center of the service system 14 through the network 13, and sends a short message to the outside through the short message center.
In order to achieve the above object, in the embodiment of the present invention, an agent module 112 is newly added to the host 11, the agent module 112 provides a service interface to the service module 113, and when the service module 113 detects a service failure, service failure information is reported to the agent module 112 through the service interface, so that the monitoring information of the service failure is reported to the monitoring server through the agent module 112.
In one possible scenario, the host 11 may be any physical server in a physical server cluster, which may be a cloud computing physical server cluster providing cloud services to users; in another possible scenario, the host 11 may be a stand-alone physical server. The monitoring server 12 may run on a separate physical server.
In one possible design, the service module 113 may be a service process for handling services. For example, the service module 113 may be an NS (Notification Service) process that communicates with a short message center.
The service module 113 is connected to the external service system 14, and accesses services provided by the service system 14. When the service access fails, the service module 113 determines the category of the service failure and the set of fault description parameters of the service failure.
It should be noted that the category of a service failure represents the factor that caused the failure. For example, the categories of service failure may include a link failure, an account failure, a service system failure and the like, and the fault description parameter set may include parameters that accurately describe the cause of the service failure, such as a process identifier, a service system address and a failure indication. Different service failure categories may correspond to different fault description parameter sets. Those skilled in the art will understand that the categories of service failure and the fault description parameter sets may be defined flexibly for different scenarios, and the embodiments of the present invention are not limited to the above examples.
Further, the monitoring system may describe the monitoring information in a key-value format. In that case, before the monitoring system starts to operate, the service module 113 records a key associated with each service failure category and a fault description parameter set in one-to-one correspondence with each key, and sends each key together with its fault description parameter set to the agent module 112. In one possible design, each service failure category is assigned a key that uniquely identifies it. For example, the service failure categories may be connection timeout, no link response, link port failure and the like, and the corresponding keys may be chosen freely or according to a predefined rule.
The agent tool module 112 records a correspondence between a key and a template file, where the template file includes a fault description parameter set corresponding to the key.
In one possible design, the agent module 112 provides a RESTful interface to the service module 113, and the request body of the interface may be arbitrary data in JSON (JavaScript Object Notation) format. The agent tool module 112 generates a template file, and each fault description parameter in the fault description parameter set is metadata in the template file. One service failure type may correspond to one template file. JSON is a lightweight syntax for data exchange; for a detailed description, see https://www.w3.org/TR/json-ld/.
The service module 113 interacts with the service system 14 through the network 13, and when a service fails, sends service failure information to the agent module 112 through the service interface, where the service failure information includes keys corresponding to the type of the service failure and values of each failure description parameter in the failure description parameter set, where the values of each failure description parameter can accurately represent information of the service failure, including a service name, a failure reason, and the like, and specifically, the service name can be represented by a process identifier.
The agent tool module 112 receives the service fault information sent by the service module 113 through the service interface, searches the template file corresponding to the key according to the correspondence, writes the value of each fault description parameter into the template file, and generates monitoring information.
In a possible design, the agent tool module 112 reads data in JSON format provided by the service module 113 (service process) calling the RESTful interface, writes values of each fault description parameter into a template file, and generates final monitoring information. Specifically, the agent tool module 112 generates a value according to the written template file, where the value is a character string corresponding to the value of each fault description parameter, and correspondingly, the monitoring information includes a key corresponding to the type of the service failure and the value.
The agent module 112 reports the generated monitoring information to the monitoring server 12 through the monitoring client 111.
The agent tool module 112 may call a command line tool of the monitoring client 111 in a synchronous or asynchronous manner, and report the monitoring information to the monitoring server 12; alternatively, the agent module 112 sends the monitoring information to the monitoring client 111, so that the monitoring client 111 sends the monitoring information to the monitoring server 12.
In order to improve the monitoring efficiency of the monitoring system and avoid repeated and high-frequency reporting of the same fault, the agent module 112 may further have a flow control function to limit the number of times of repeated sending of the same type of monitoring information. For example, the agent module 112 executes a flow control policy, where the flow control policy includes that the reporting frequency of the monitoring information corresponding to the same key value is not greater than a preset value. Those skilled in the art understand that the preset value can be flexibly set by a system administrator according to requirements, and preferably, the reporting frequency can be set according to the service importance.
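The flow control policy described above can be sketched as a per-key sliding-window limiter. This is only an illustration under stated assumptions: the class name `KeyRateLimiter`, the window length and the injectable clock are hypothetical, and the patent does not specify how the preset value is enforced.

```python
import time


class KeyRateLimiter:
    """Allow at most `max_reports` per key within a sliding `window` of seconds."""

    def __init__(self, max_reports, window, clock=time.monotonic):
        self.max_reports = max_reports
        self.window = window
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._sent = {}  # key -> timestamps of reports still inside the window

    def allow(self, key):
        """Return True if a report for `key` may be sent now, and record it."""
        now = self.clock()
        recent = [t for t in self._sent.get(key, []) if now - t < self.window]
        allowed = len(recent) < self.max_reports
        if allowed:
            recent.append(now)
        self._sent[key] = recent
        return allowed
```

An agent tool module built this way would call `allow(key)` before reporting and silently drop the report when it returns False; a more important service could be given a larger `max_reports`.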
In one possible design, the agent module 112 may be deployed alone or in combination with the monitoring client 111.
The agent module 112 provides a local loopback address (e.g., 127.0.0.1) to the service module 113 through a RESTful service interface, and receives the service failure information sent by the service module 113 over the Hypertext Transfer Protocol (HTTP). For a description of the representational state transfer (REST) architecture and RESTful interfaces, see https:// zh.
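A loopback-only HTTP interface of this kind might look like the following sketch, which uses Python's standard `http.server`. The handler name, the in-memory `received` list and the `{"key": ..., "params": ...}` request shape are assumptions made for illustration; the patent only states that JSON data is posted to a RESTful interface on a local loopback address.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # failure reports accepted so far (kept in memory for the sketch)


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            report = json.loads(self.rfile.read(length))
            received.append((report["key"], report["params"]))
            status = 200
        except (ValueError, KeyError, TypeError):
            status = 400  # malformed JSON or missing fields
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_agent(port=0):
    """Bind to the loopback address only; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 rather than all interfaces means only processes on the monitored host can reach the interface, which matches the local loopback design described above.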
The embodiment of the present application provides a monitored host 11 in a monitoring system, in which an agent module 112 (AgentTool) is added to the host 11. This scheme solves the problem of reporting a fault after a service failure of the service module 113. First, the service module 113 does not need to be coupled to the monitoring system; it only needs to define, for each of its abnormal scenarios, a key and the fault description parameters in JSON format. Second, by combining and changing the template file and the JSON object, the content and format of the monitoring information can be customized as needed and the customized monitoring information can be reported to the monitoring server, so that a system administrator can conveniently check the details of a service abnormality. Third, because stateless HTTP communication is used between the service module 113 and the agent module 112, a failure of a monitoring system process does not affect the service module, so the user's service is not affected and service security is ensured.
In the embodiment of the present invention, the monitoring system may be a Zabbix system, and the command line tool may be a Zabbix Sender, which may transmit a Key/Value parameter.
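When the command line tool is Zabbix Sender, the call made by the agent tool module could be sketched as below. The `-z`, `-s`, `-k` and `-o` options are the standard `zabbix_sender` options for server, host name, key and value; the function names and the timeout are hypothetical, and the server address and host name are supplied by the caller.

```python
import subprocess


def build_sender_command(server, host, key, value, sender="zabbix_sender"):
    """Return the argv list for reporting one key/value pair."""
    # An argv list (not a shell string) keeps multi-line values intact
    # and avoids shell quoting problems.
    return [sender, "-z", server, "-s", host, "-k", key, "-o", value]


def send_report(server, host, key, value, timeout=10):
    """Run the command synchronously and report whether it exited cleanly."""
    cmd = build_sender_command(server, host, key, value)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0
```

For an asynchronous call, the same argv list could instead be handed to a worker thread or to `subprocess.Popen`, so that the agent tool module does not block while the report is sent.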
Based on the architecture of the monitoring system shown in fig. 2, the monitoring method provided in the embodiment of the present application will be described below.
Referring to fig. 3, a monitoring method according to an embodiment of the present application is shown.
Step 301: the service module accesses the service system through the network, and determines the key corresponding to the service failure category and the fault description parameter set of the service failure according to the possible abnormal condition of the service.
The category of a service failure represents the factor that caused the failure. For example, the categories of service failure may include a link failure, an account failure, a service system failure and the like, and the fault description parameter set may include parameters that accurately describe the cause of the service failure, such as a process identifier, a service system address and a failure indication. Different service failure categories may correspond to different fault description parameter sets. Each fault description parameter in the fault description parameter set may be in JSON format.
Examples are as follows:
Key:smn-001-001
JSON body (set of fault description parameters):
{
"Subject":"Channel Checking",
"ServiceName":"SMN-NS",
"ServiceAddress":"127.0.0.1",
"Error":"Error"
}
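As a minimal sketch of how a service module might record a key and its fault description parameter set, the following Python uses the example above. The `FAILURE_KEYS` table, the `build_failure_info` helper and the `{"key": ..., "params": ...}` envelope are hypothetical names introduced only for illustration; the patent does not prescribe an implementation.

```python
import json

# Hypothetical registry: one key per service failure category, mapped
# one-to-one to the names in its fault description parameter set.
FAILURE_KEYS = {
    "smn-001-001": ("Subject", "ServiceName", "ServiceAddress", "Error"),
}


def build_failure_info(key, values):
    """Return the JSON text a service module could send for one failure."""
    expected = FAILURE_KEYS[key]  # KeyError if the key was never recorded
    missing = [name for name in expected if name not in values]
    if missing:
        raise ValueError(f"missing fault description parameters: {missing}")
    # Keep only the recorded parameters, in their recorded order.
    body = {name: values[name] for name in expected}
    return json.dumps({"key": key, "params": body})


info = build_failure_info(
    "smn-001-001",
    {
        "Subject": "Channel Checking",
        "ServiceName": "SMN-NS",
        "ServiceAddress": "127.0.0.1",
        "Error": "Can not connect channel",
    },
)
```

Checking the values against the recorded parameter set before sending keeps the service module and the template file in step: a parameter that the template expects can never be silently omitted.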
Step 302: The service module sends the key corresponding to each service failure category, together with the fault description parameter set in one-to-one correspondence with that key, to the agent tool module.
In one possible design, the agent module provides a RESTful interface to the service module.
Step 303: The agent tool module receives the keys sent by the service module and the fault description parameter sets in one-to-one correspondence with the keys, and records the correspondence between each key and a template file, where the template file includes the fault description parameter set corresponding to the key.
The agent tool module generates a template file, and each fault description parameter in the fault description parameter set is a dynamic variable in the template file and used for representing metadata. One service failure type may correspond to one template file.
Template file example:
[Figure: template file example, not reproduced in this text]
Step 304: A system administrator logs in to the monitoring server through the graphical user interface of the monitoring system and creates a key and a monitoring index.
In one possible design, the agent module may send the key and the set of fault description parameters to the monitoring server, and the monitoring server creates the key and a monitoring index, where the monitoring index may be in a text format and used to present the received monitoring information.
Step 305: The service module accesses the service system through the network. When it detects a service failure, it calls the service interface provided by the agent tool module and sends service failure information to the agent tool module. The service failure information includes the key corresponding to the type of the service failure and the value of each fault description parameter in the fault description parameter set; these values accurately describe the service failure, including the service name, the failure reason and the like.
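On the service module side, step 305 could be sketched as a single HTTP POST to the agent tool module. The port, the path and the request shape below are assumptions, since the patent only specifies a RESTful interface on a local loopback address; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: only the loopback address comes from the description;
# the port and path are hypothetical.
AGENT_URL = "http://127.0.0.1:8650/failures"


def make_report_request(key, params, url=AGENT_URL):
    """Build (but do not send) the HTTP POST carrying service failure info."""
    payload = json.dumps({"key": key, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def report_failure(key, params, timeout=2.0):
    """Send the report; return False on network errors instead of raising."""
    try:
        request = make_report_request(key, params)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and socket timeouts are OSError subclasses: a monitoring
        # outage is swallowed here so that it does not disturb the service.
        return False
```

Returning False rather than raising reflects the decoupling the description aims for: the service process keeps running even when the agent tool module is unavailable.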
Step 306: The agent tool module receives the service failure information sent by the service module through the service interface, looks up the template file corresponding to the key according to the recorded correspondence, writes the value of each fault description parameter into the template file, and generates monitoring information.
Step 307: The agent tool module sends the monitoring information to the monitoring server.
Specifically, the agent tool module reads data in a JSON format in the fault description parameter set, and writes the value of each fault description parameter into a template file to generate a value, where the value is a character string corresponding to the value of each fault description parameter. And the agent tool module takes the key corresponding to the service failure type and the generated value as the input parameters of the command line tool, and calls the command line tool of the monitoring client to send monitoring information (namely the key and the generated value) to the monitoring server.
Specifically, the agent tool module may call a command line tool of the monitoring client in a synchronous or asynchronous manner, and report the monitoring information to the monitoring server; or, the agent module sends the monitoring information to the monitoring client, so that the monitoring client sends the monitoring information to the monitoring server.
The agent tool module and the monitoring server may communicate using a remote call protocol based on JSON remote procedure call (JSON-RPC).
Examples of monitoring information are as follows:
key:smn-001-001
value:
Senior Alert:Channel Checking
Components SMN-NS 127.0.0.1
Error:Can not connect channel
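The template lookup and value generation of steps 306 and 307 can be sketched with Python's `string.Template`. The template text below is hypothetical: the original template figure is not available in this text, so the layout is inferred from the sample value above, and the `TEMPLATES` table and function name are illustrative only.

```python
from string import Template

# Hypothetical template for key smn-001-001, inferred from the sample value;
# each $Name placeholder is one fault description parameter.
TEMPLATES = {
    "smn-001-001": Template(
        "Senior Alert:$Subject\n"
        "Components $ServiceName $ServiceAddress\n"
        "Error:$Error"
    ),
}


def render_monitoring_info(key, params):
    """Look up the template for `key` and fill in the fault description values."""
    value = TEMPLATES[key].substitute(params)  # KeyError if a parameter is missing
    return key, value


key, value = render_monitoring_info(
    "smn-001-001",
    {
        "Subject": "Channel Checking",
        "ServiceName": "SMN-NS",
        "ServiceAddress": "127.0.0.1",
        "Error": "Can not connect channel",
    },
)
```

With these inputs, `value` is the three-line string shown in the monitoring information example; changing the template alone changes the report format without touching the service module.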
Step 308: The monitoring server receives the monitoring information and, on determining that the service has failed, triggers an alarm to notify the system administrator.
By adding the agent tool module to the monitored host and having it provide a service interface to the service module, the embodiments of the invention define how the service module reports a fault after a service failure, so that the monitoring system can monitor service failures caused by faults outside the monitored host. First, the service module does not need to be coupled to the monitoring system; it only needs to define, for each of its abnormal scenarios, a key and the fault description parameters in JSON format. Second, by combining and changing the template file and the JSON object, the content and format of the monitoring information can be customized as needed and the customized monitoring information can be reported to the monitoring server, so that a system administrator can conveniently check the details of a service abnormality. Third, because stateless HTTP communication is used between the service module and the agent tool module, a failure of a monitoring system process does not affect the service module, so the user's service is not affected and the safety of the service is ensured.
Corresponding to the monitoring system and the monitoring method, an embodiment of the present invention provides a monitored host, including the monitoring client, the agent module, and the service module. Each module in the monitored host executes the functions in the monitoring system and the monitoring method, and the embodiment of the invention is not repeated again.
Corresponding to the foregoing monitoring system and monitoring method, an embodiment of the present invention provides another monitored host, including the foregoing monitoring client and agent module. Each module in the monitored host executes the functions in the monitoring system and the monitoring method, and the embodiment of the invention is not repeated again. The service module may be located on another host in a network connection relationship with the monitored host.
Based on the same inventive concept, referring to fig. 4, an embodiment of the present application further provides another monitored host 400 in a monitoring system, which includes a transceiver 401, a processor 402 and a memory 403, where both the transceiver 401 and the memory 403 are connected to the processor 402. It should be noted that the connection manner shown in fig. 4 is only one possible example: the transceiver 401 and the memory 403 may both be connected to the processor 402 with no connection between the transceiver 401 and the memory 403, or other connection manners may be used.
The memory 403 stores a set of programs, and the processor 402 is configured to call the programs stored in the memory 403 to perform the functions of the modules of the monitored host in the monitoring system and the monitoring method shown in fig. 2 and 3.
In FIG. 4, the processor 402 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 402 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
The memory 403 may include a volatile memory, such as a random-access memory (RAM); the memory 403 may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 403 may also include a combination of the above kinds of memory.
The physical server where the monitoring server is located may also adopt a hardware structure as shown in fig. 4. The embodiment of the invention is not described in detail.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
These computer program codes may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (17)

1. A monitored host, comprising: a monitoring client, an agent tool module and a business module, wherein the agent tool module provides a service interface for the business module,
the service module is used for recording a key associated with each service failure category and fault description parameter sets corresponding to the keys one by one;
the agent tool module is used for recording the corresponding relation between a key and a template file, and the template file comprises a fault description parameter set corresponding to the key;
the service module is also used for interacting with a service system through a network, and when a service fails, service fault information is sent to the agent tool module through the service interface, wherein the service fault information comprises keys corresponding to the type of the service failure and values of fault description parameters in a fault description parameter set;
the agent tool module is further configured to receive service fault information sent by the service module through the service interface, search a template file corresponding to the key according to the correspondence, write values of the fault description parameters into the template file, and generate monitoring information;
and the agent tool module is also used for reporting the generated monitoring information to the monitoring server through the monitoring client.
2. The monitored host according to claim 1, wherein the agent module is further configured to generate a value after writing the value of each fault description parameter into a template file, where the value is a character string corresponding to the value of each fault description parameter;
correspondingly, the monitoring information includes the key corresponding to the service failure category and the value.
3. The monitored host of claim 1,
the agent tool module is specifically used for calling a command line tool of the monitoring client and reporting the monitoring information to the monitoring server; or,
the agent tool module is specifically configured to send the monitoring information to the monitoring client, so that the monitoring client sends the monitoring information to the monitoring server.
4. The monitored host according to any one of claims 1-3, wherein the agent module is specifically configured to provide a local loopback address to the service module through the service interface, and receive the service fault information transmitted by the service module in a hypertext transfer protocol (HTTP) manner.
5. A monitored host according to any one of claims 1-3,
the agent tool module is further configured to execute a flow control strategy, where the flow control strategy includes that the reporting frequency of monitoring information corresponding to the same key value is limited to be not greater than a preset value.
6. A monitored host as claimed in any one of claims 1 to 3, wherein the service interface employs HTTP; the fault description parameters are JavaScript Object Notation (JSON) objects.
7. A monitored host as claimed in any one of claims 1 to 3, wherein said agent module is co-located with said monitoring client.
8. A monitoring system is characterized by comprising a monitoring client, an agent tool module and a monitoring server, wherein the monitoring client and the agent tool module run on a monitored host, the agent tool module provides a service interface for a business module,
the agent tool module is used for recording the corresponding relation between the keys and a template file, wherein the template file comprises a fault description parameter set corresponding to the keys, and each key corresponds to a service failure category;
the agent tool module is also used for receiving service fault information sent by the service module through the service interface, wherein the service fault information comprises keys corresponding to the types of the service failure and values of all fault description parameters in the fault description parameter set;
the agent tool module is further configured to search a template file corresponding to the key according to the corresponding relationship, write the fault description parameter into the template file, generate monitoring information, and report the generated monitoring information to the monitoring server through the monitoring client;
and the monitoring server is used for receiving the monitoring information.
9. The monitoring system according to claim 8, wherein the agent module is further configured to generate a value after writing the value of each fault description parameter into a template file, where the value is a character string corresponding to the value of each fault description parameter;
correspondingly, the monitoring information includes the key corresponding to the service failure category and the value.
10. A monitoring system in accordance with claim 8,
the agent tool module is specifically used for calling a command line tool of the monitoring client and reporting the monitoring information to the monitoring server; or,
the agent tool module is specifically configured to send the monitoring information to the monitoring client, so that the monitoring client sends the monitoring information to the monitoring server.
11. A monitoring system according to any one of claims 8-10,
the agent tool module is specifically configured to provide a local loopback address to the service module through the service interface, and receive the service fault information transmitted by the service module in an HTTP manner.
12. A monitoring system according to any one of claims 8-10,
the agent tool module is further configured to execute a flow control strategy, where the flow control strategy includes that the reporting frequency of monitoring information corresponding to the same key value is limited to be not greater than a preset value.
13. A method for monitoring a service, comprising:
the service module records a key associated with each service failure category and a fault description parameter set corresponding to the key one by one, and sends the key corresponding to each service failure category and the fault description parameter set corresponding to the key one by one to the agent tool module;
the agent tool module records the corresponding relation between the key and the template file, and the template file comprises a fault description parameter set corresponding to the key;
the service module interacts with a service system through a network, and when a service fails, service fault information is sent to the agent tool module through the service interface, wherein the service fault information comprises keys corresponding to the type of the service failure and values of fault description parameters in a fault description parameter set;
the agent tool module receives the service fault information sent by the service module through the service interface, searches the template file corresponding to the key according to the corresponding relation, writes the value of each fault description parameter into the template file, and generates monitoring information;
and the agent tool module reports the generated monitoring information to the monitoring server through the monitoring client.
14. The monitoring method according to claim 13, wherein the value is a character string corresponding to the value of each fault description parameter, and accordingly, the monitoring information includes a key corresponding to the category in which the current service fails and the value.
15. The monitoring method of claim 13, wherein the agent module reporting the generated monitoring information to the monitoring server through the monitoring client comprises:
the agent tool module calls a command line tool of the monitoring client and reports the monitoring information to the monitoring server; or,
the agent tool module sends the monitoring information to the monitoring client so that the monitoring client sends the monitoring information to the monitoring server.
16. The monitoring method according to any one of claims 13-15, wherein the receiving, by the agent module, the service failure information sent by the service module via the service interface comprises:
the agent tool module provides a local loopback address for the business module through the service interface and receives the business fault information transmitted by the business module in an HTTP mode.
17. A method of monitoring as claimed in any of claims 13 to 15, the method further comprising:
and the agent tool module executes a flow control strategy, wherein the flow control strategy comprises that the reporting frequency of the monitoring information corresponding to the same key value is limited to be not more than a preset value.
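
The agent tool module behaviour recited in the claims above (key-to-template lookup, writing fault description parameter values into the template, and per-key flow control) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical choice made for illustration and is not taken from the patent: the `AgentTool` class, the `$name` placeholder syntax of Python's `string.Template`, the JSON field names `key` and `params`, and the reading of the reporting-frequency limit as a minimum interval between two reports for the same key. The `report` callback merely stands in for the monitoring client.

```python
import json
import time
from string import Template
from typing import Callable, Dict, Optional


class AgentTool:
    """Hypothetical agent tool module: key -> template lookup plus per-key flow control."""

    def __init__(self, report: Callable[[str, str], None],
                 min_interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._report = report                      # stands in for the monitoring client
        self._templates: Dict[str, Template] = {}  # correspondence between key and template
        self._last_sent: Dict[str, float] = {}     # time of the last report per key
        self._min_interval_s = min_interval_s      # assumed form of the "preset value"
        self._clock = clock

    def register_template(self, key: str, template_text: str) -> None:
        """Record the correspondence between a fault key and its template."""
        self._templates[key] = Template(template_text)

    def handle_fault(self, payload: str) -> Optional[str]:
        """Handle one service fault message given as a JSON string.

        Returns the generated value if it was reported, or None if the
        report was suppressed by flow control.
        """
        message = json.loads(payload)
        key = message["key"]
        template = self._templates[key]            # raises KeyError for an unknown key
        # Write the fault description parameter values into the template.
        value = template.substitute(message["params"])

        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval_s:
            return None                            # too soon for this key: drop the report
        self._last_sent[key] = now
        self._report(key, value)                   # monitoring information = key + value
        return value
```

In this sketch a service module would register one template per failure category, for example `agent.register_template("pay.timeout", "order $order_id failed: $reason")`, and later pass `{"key": "pay.timeout", "params": {"order_id": "A17", "reason": "timeout"}}` as a JSON string to `handle_fault`. How the message actually reaches the agent (for instance over HTTP on a local loopback address) and how the monitoring client forwards it are left out.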
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611088934.0A CN106713014B (en) 2016-11-30 2016-11-30 Monitored host in monitoring system, monitoring system and monitoring method

Publications (2)

Publication Number Publication Date
CN106713014A CN106713014A (en) 2017-05-24
CN106713014B true CN106713014B (en) 2020-01-10

Family

ID=58934415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611088934.0A Active CN106713014B (en) 2016-11-30 2016-11-30 Monitored host in monitoring system, monitoring system and monitoring method

Country Status (1)

Country Link
CN (1) CN106713014B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110752939B (en) * 2018-07-24 2022-09-16 成都华为技术有限公司 Service process fault processing method, notification method and device
CN112769622A (en) * 2021-01-18 2021-05-07 孙冬英 Cluster service fault early warning system based on RPC service monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1848764A (en) * 2005-12-22 2006-10-18 华为技术有限公司 Server and network equipment long-distance management maintenance system and realizing method
CN103003802A (en) * 2010-07-15 2013-03-27 思科技术公司 Monitoring of systems along a path
CN103176892A (en) * 2011-12-20 2013-06-26 阿里巴巴集团控股有限公司 Page monitoring method and system
CN105915405A (en) * 2016-03-29 2016-08-31 深圳市中博科创信息技术有限公司 Large-scale cluster node performance monitoring system

Also Published As

Publication number Publication date
CN106713014A (en) 2017-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220222

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20221206

Address after: 518129 Huawei Headquarters Office Building 101, Wankecheng Community, Bantian Street, Longgang District, Shenzhen, Guangdong

Patentee after: Shenzhen Huawei Cloud Computing Technology Co.,Ltd.

Address before: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee before: Huawei Cloud Computing Technology Co.,Ltd.