WO2018214009A1

WO2018214009A1 - Server monitoring method and system

Info

Publication number: WO2018214009A1
Application number: PCT/CN2017/085437
Authority: WO
Inventors: 王一庭; 牛丽华; 蒋民
Original assignee: 深圳中兴力维技术有限公司
Priority date: 2017-05-23
Filing date: 2017-05-23
Publication date: 2018-11-29

Abstract

The present invention provides a server monitoring method and system belonging to the technical field of machine room operation and maintenance. The method comprises: a master node server selecting from an operation and maintenance template a template parameter corresponding to a monitored host, and sending the template parameter to a slave node server corresponding to the monitored host; the slave node server comparing data generated by the monitored host with the template parameter, and the slave node server reporting data to the master node server when the data generated by the monitored host matches the template parameter; and the master node server reporting the data to an operation and maintenance platform. The server monitoring method and system provided by the present invention reduce complexities of operation and maintenance parameter acquisition for different types of servers in an operation and maintenance system, perform centralized management on operation and maintenance of a system by means of deploying master and slave node servers, perform unified processing on operation and maintenance of the same type of servers through use of templates, and flexibly handle differentiation of the same type of operation and maintenance parameters by means of template inheritance.

Description

Specification Name of Invention: Server Monitoring Method and System

Technical field

[0001] The present invention relates to the technical field of computer room operation and maintenance, and in particular, to a server monitoring method and system.

Background art

[0002] In equipment room operation and maintenance, server monitoring is very important. A large amount of server data usually needs to be monitored, such as hardware resource usage, the number of software transactions, and the number of requests. As a system continues to expand, however, the number of server types also increases, and the parameters that need to be monitored differ between types. For example, a storage server focuses on the IOPS and storage space of the system, while an algorithm server focuses on CPU usage and is not sensitive to hard disk usage. Separate monitoring parameters therefore have to be set for different devices. If the number of devices is small, configuring the monitoring parameters separately is relatively easy, but as the number and types of devices increase, the configuration becomes very cumbersome. A template-based monitoring method is therefore proposed here, supporting both manual configuration and automatic judgment based on historical data.

Technical problem

[0003] The main purpose of the present invention is to provide a server monitoring method and system, which are convenient for setting and updating corresponding monitoring parameters for different hosts.

Problem solution

Technical solution

[0004] In order to achieve the above object, the present invention provides a server monitoring method, where the method includes:

[0005] The master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template, and sends the template parameter to the slave node server corresponding to the monitored host;

[0006] The slave node server compares the data generated by the monitored host with the template parameter, and when the data generated by the monitored host meets the template parameter, the slave node server reports the data to the master node server;

[0007] The master node server reports the data to the operation and maintenance platform.

[0008] Optionally, the method further includes:

[0009] The master node server generates an alarm parameter value from the data reported by the slave node server multiple times within a predetermined interval, and determines whether the alarm parameter value is less than an abnormal boundary value; if so, the master node server reports to the operation and maintenance platform to raise an alarm.

[0010] Optionally, the method further includes:

[0011] The master node server automatically generates an alarm template, and sends the alarm template to the slave node server and reports to the operation and maintenance platform;

[0012] The operation and maintenance platform reports the alarm template to the user end.

[0013] Optionally, before the master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template, the method further includes:

[0014] the operation and maintenance platform receives an operation and maintenance template configured by the user end;

[0015] The operation and maintenance platform saves the operation and maintenance template to the primary node server.

[0016] Optionally, the template parameters include CPU usage, memory usage, network input/output, hard disk input/output, remaining space of the hard disk, number of network connections, monitoring of important service ports, and the various parameters of the deployed dedicated software itself.

[0017] In addition, in order to achieve the above object, the present invention further provides a server monitoring system, where the system includes a master node server, at least one monitored host, a slave node server corresponding to the monitored host, and an operation and maintenance platform, wherein:

[0018] The master node server is configured to select a template parameter corresponding to the monitored host in an operation and maintenance template, and send the template parameter to the slave node server corresponding to the monitored host;

[0019] The slave node server is configured to compare the data generated by the monitored host with the template parameter, and report the data to the master node server when the data generated by the monitored host meets the template parameter;

[0020] The primary node server is further configured to report the data to the operation and maintenance platform.

[0021] Optionally, the master node server is further configured to generate an alarm parameter value from the data reported by the slave node server multiple times within a predetermined interval, and determine whether the alarm parameter value is less than an abnormal boundary value; if so, it reports to the operation and maintenance platform to raise an alarm.

[0022] Optionally, the master node server is further configured to automatically generate an alarm template, send the alarm template to the slave node server, and report it to the operation and maintenance platform;

[0023] The operation and maintenance platform is configured to report the alarm template to the user end.

[0024] Optionally, the operation and maintenance platform is further configured to receive an operation and maintenance template configured by the user end, and save the operation and maintenance template to the master node server.

[0025] Optionally, the template parameters include CPU usage, memory usage, network input/output, hard disk input/output, remaining space of the hard disk, number of network connections, monitoring of important service ports, and the various parameters of the deployed dedicated software itself.

Advantageous effects of the invention

Beneficial effect

[0026] In the server monitoring method and system provided by the present invention, the master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template and sends the template parameter to the slave node server corresponding to the monitored host. The slave node server compares the data generated by the monitored host with the template parameter; when the data meets the template parameter, the slave node server reports the data to the master node server, and the master node server reports the data to the operation and maintenance platform. This reduces the complexity of acquiring operation and maintenance parameters for different types of servers in the operation and maintenance system. The master-slave deployment gives the system centralized operation and maintenance management, the use of templates gives servers of the same type unified processing, and template inheritance allows differences among operation and maintenance parameters of the same type to be handled flexibly.

Brief description of the drawing

DRAWINGS

[0027] FIG. 1 is a schematic flowchart of a server monitoring method according to a first embodiment of the present invention;

[0028] FIG. 2 is a schematic diagram of an example of a server monitoring method according to a preferred embodiment of the present invention;

[0029] FIG. 3 is a schematic flowchart of a server monitoring method according to a second embodiment of the present invention;

[0030] FIG. 4 is a schematic flowchart of a server monitoring method according to a third embodiment of the present invention;

[0031] FIG. 5 is a schematic structural diagram of a server monitoring system according to a fourth embodiment of the present invention.

[0032] The implementation, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings.

Embodiments of the invention

[0033] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative, and are not to be construed as limiting.

[0034] FIG. 1 is a schematic flowchart of a server monitoring method according to a preferred embodiment of the present invention. The method includes the following steps:

[0035] Step 110: The master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template, and sends the template parameter to the slave node server corresponding to the monitored host.

[0036] Specifically, in the operation and maintenance platform, one master node server is deployed independently and there are multiple monitored hosts. Each monitored host acts as a monitored terminal, and a slave node server is deployed independently on each monitored host. The slave node server connects to the master node server in order to respond to the operation and maintenance parameter commands issued by the master node server and to report operation and maintenance data and alarms.

[0037] The operation and maintenance template is configured on the master node server and contains template parameters corresponding to each monitored host, so that the master node server selects the template parameter corresponding to a monitored host and sends it to the slave node server that monitors that host.
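The selection step in [0037] can be pictured with a minimal Python sketch. Everything named here is a hypothetical illustration and not part of the specification: the template contents, the host names, the metric keys, and the function names are all assumptions.

```python
# Hypothetical operation and maintenance templates keyed by host type.
# The metric names and threshold values are illustrative assumptions.
TEMPLATES = {
    "storage": {"network_io": 800.0, "disk_io": 500.0, "disk_free_gb": 100},
    "streaming": {"cpu_usage": 40.0, "memory_usage": 80.0, "connections": 5000},
}

# Hypothetical inventory: monitored host -> host type.
HOSTS = {"stor-01": "storage", "media-01": "streaming"}


def select_template_parameters(host, hosts=HOSTS, templates=TEMPLATES):
    """Return a copy of the template parameters that apply to one monitored host."""
    return dict(templates[hosts[host]])


def build_dispatch(hosts=HOSTS, templates=TEMPLATES):
    """Map each monitored host to the parameters its slave node server should receive."""
    return {host: select_template_parameters(host, hosts, templates) for host in hosts}


dispatch = build_dispatch()
print(dispatch["media-01"])
```

Returning a copy keeps a slave node server's working parameters separate from the master copy of the template, which is a design choice of this sketch rather than something the text prescribes.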

[0038] Further, the template parameters include at least: central processing unit (CPU) usage, memory usage, network input/output (I/O), hard disk input/output (I/O), remaining space of the hard disk, number of network connections, monitoring of important service ports, and the various parameters of the deployed dedicated software itself.

[0039] Further, because current servers are diverse, different types of slave nodes need to monitor different parameters. For example, a streaming media forwarding server only needs to pay attention to CPU usage, memory usage, network I/O, the number of network connections, and the service parameters of the service itself, whereas a storage server has a different focus: network I/O, hard disk I/O, the remaining space of the hard disk, and its own service parameters.

[0040] Further, the monitored host of a backbone node and the monitored host of an edge node need different parameters to be monitored.

[0041] Further, the slave node server obtains the data of the corresponding monitored host.

[0042] Further, the slave node server and the master node server have a unified interface protocol to ensure communication consistency.

[0043] Step 120: The slave node server compares the data generated by the monitored host with the template parameter; when the data generated by the monitored host meets the template parameter, the slave node server reports the data to the master node server.

[0044] Specifically, the slave node server obtains the data of the monitored host and compares the acquired data with the template parameters. When the template parameters are met, which indicates abnormal data, the operation and maintenance data of the monitored host is reported to the master node server.
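As a rough illustration of the comparison in step 120, the following Python sketch filters samples against template thresholds. It assumes that "meeting the template parameter" means exceeding the alarm threshold, as in the CPU usage example of the specification; the `Sample` type, the metric names, and the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    host: str      # monitored host that produced the value
    metric: str    # name of the monitored parameter
    value: float   # measured value


def exceeded(samples, thresholds):
    """Return the samples whose value is above the template threshold.

    Metrics that the template does not list are ignored, since the
    slave node server only monitors what the template asks for.
    """
    return [
        s for s in samples
        if s.metric in thresholds and s.value > thresholds[s.metric]
    ]


# Hypothetical template parameters received from the master node server.
thresholds = {"cpu_usage": 40.0, "memory_usage": 80.0}

samples = [
    Sample("media-01", "cpu_usage", 55.0),     # above 40 -> reported
    Sample("media-01", "memory_usage", 61.0),  # below 80 -> not reported
    Sample("media-01", "disk_io", 900.0),      # not in template -> ignored
]

to_report = exceeded(samples, thresholds)
print([(s.metric, s.value) for s in to_report])
```

Only the entries in `to_report` would be sent on to the master node server in this sketch.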

[0045] By way of example, as shown in FIG. 2, the template parameters of the equipment room include CPU usage, memory usage, network I/O, hard disk I/O, remaining space of the hard disk, number of network connections, status of important service ports, and so on. The corresponding alarm thresholds are a1-a6.

[0046] The monitored hosts in the equipment room include an intelligent analysis server, a storage server, and a streaming media forwarding server. The template parameters of the intelligent analysis server include CPU usage and memory usage; the corresponding alarm thresholds are b1-b2. The template parameters of the storage server include CPU usage, memory usage, network I/O, hard disk I/O, remaining space of the hard disk, and network connection parameters; the corresponding alarm thresholds are c1-c6. The template parameters of the streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d1-d4. The streaming media forwarding server also includes servers of edge nodes: a core network streaming media forwarding server, a backbone network streaming media forwarding server, and an edge streaming media forwarding server. The template parameters of the core network streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d11-d14. The template parameters of the backbone network streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d21-d24. The template parameters of the edge streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d31-d34. In this example, relative to the servers of the edge nodes, the streaming media forwarding server is the server of the main node.

[0047] If the template parameter d1 of the streaming media forwarding server of the backbone node is set to 40%, then when the CPU usage of that server, as acquired by the slave node server, is greater than 40%, an alarm is generated and the data is reported to the master node server. The template parameter d31 of the edge streaming media forwarding server can be set to 30%; when the CPU usage of the edge streaming media forwarding server acquired by the slave node server is greater than 30%, an alarm is generated and the data is reported to the master node server. In other words, the edge node server is configured on the basis of the backbone node server: the template parameters of the backbone node only need to be inherited by the template of the edge node, and the template parameters of other servers do not need to be changed, which reduces the workload to the greatest extent.
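The inheritance described in [0047] can be sketched in Python with `collections.ChainMap`, which looks a key up in the child mapping first and falls back to the parent. Only the 40% and 30% CPU thresholds come from the text; the other metric names and values are assumptions made for illustration.

```python
from collections import ChainMap

# Backbone template: the 40% CPU threshold is from the text,
# the remaining values are hypothetical placeholders.
backbone = {
    "cpu_usage": 40.0,
    "memory_usage": 80.0,
    "network_io": 800.0,
    "connections": 5000,
}

# The edge template only states what differs from its parent.
edge_overrides = {"cpu_usage": 30.0}

# Lookups hit edge_overrides first, then fall back to backbone.
edge = ChainMap(edge_overrides, backbone)

print(edge["cpu_usage"])     # overridden in the edge template
print(edge["memory_usage"])  # inherited from the backbone template
```

Because the child holds only its overrides, a later change to an inherited backbone value is visible through `edge` without touching the edge template.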

[0048] Step 130: The primary node server reports the data to the operation and maintenance platform.

[0049] Specifically, the master node server reports the received data to the operation and maintenance platform, and the operation and maintenance platform reports the data to the client, so that the operator processes the data through the client.

[0050] In the server monitoring method of this embodiment, the master node server selects the template parameter corresponding to the monitored host in the operation and maintenance template and sends the template parameter to the slave node server corresponding to the monitored host. The slave node server compares the data generated by the monitored host with the template parameter; when the data meets the template parameter, the slave node server reports the data to the master node server, and the master node server reports the data to the operation and maintenance platform. This reduces the complexity of acquiring operation and maintenance parameters for different types of servers in the operation and maintenance system. The master-slave deployment gives the system centralized operation and maintenance management, the use of templates gives servers of the same type unified processing, and template inheritance allows differences among operation and maintenance parameters of the same type to be handled flexibly.

[0051] Referring to FIG. 3, a second embodiment of the present invention further provides a server monitoring method, where the method includes:

[0052] Step 310: The master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template, and sends the template parameter to the slave node server corresponding to the monitored host.

[0053] Step 320: The slave node server compares the data generated by the monitored host with the template parameter, and when the data generated by the monitored host meets the template parameter, the slave node server reports the data to the master node server.

[0054] Step 330: The primary node server reports the data to the operation and maintenance platform.

[0055] Steps 310-330 are the same as steps 110-130 of the first embodiment, and the shared content is not described again in this embodiment.

[0056] Step 340: The master node server generates an alarm parameter value from the data reported by the slave node server multiple times within a predetermined interval, and determines whether the alarm parameter value is less than an abnormal boundary value; if so, the method proceeds to step 350.

[0057] Specifically, the operation and maintenance parameter value X obtained by the slave node server can be modeled as approximately Gaussian-distributed. (If the curve of an operation and maintenance parameter is asymmetric, log X can be used in place of X so that the curve tends toward a Gaussian distribution.) A template can then be generated automatically from the historical operation and maintenance data to determine whether an alarm needs to be raised.

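[0058] Suppose that within a period of time T the operation and maintenance parameter x_j generates t data points {x_j(1), x_j(2), ..., x_j(t)}, where j indexes the kinds of operation and maintenance parameters that participate in the judgment calculation.

[0059] The mean μ_j over the period is taken:

$$\mu_j = \frac{1}{t}\sum_{i=1}^{t} x_j(i)$$

[0060] The standard deviation σ_j is taken:

$$\sigma_j = \sqrt{\frac{1}{t}\sum_{i=1}^{t}\bigl(x_j(i)-\mu_j\bigr)^2}$$

[0061] A Gaussian function is established for the alarm parameter value:

$$f(x) = \prod_{j}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$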

[0062] The abnormal boundary value s is determined from n normal values of the historical samples reported by the slave node server to the master node server (excluding all outliers in the sampling interval):

[0063] (The formula for s is not legible in the source text.)

[0064] It is determined whether the alarm parameter value f(x) is less than the abnormal boundary value s; if so, the method proceeds to step 350.
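The judgment in step 340 can be sketched in Python under explicit assumptions. The text names a mean, a standard deviation, a Gaussian function f(x), and a boundary s, but the formulas themselves are not legible here, so this sketch assumes that f(x) is the product of independent per-parameter Gaussian densities and that s is the smallest density among the historical normal samples. Both choices, along with the metric names and data, are illustrative only.

```python
import math
from statistics import fmean, pstdev


def fit(history):
    """Estimate (mean, standard deviation) per metric from historical values."""
    return {metric: (fmean(vals), pstdev(vals)) for metric, vals in history.items()}


def density(x, model):
    """Assumed form of f(x): product of per-metric Gaussian densities.

    Requires every standard deviation to be non-zero.
    """
    p = 1.0
    for metric, (mu, sigma) in model.items():
        exponent = -((x[metric] - mu) ** 2) / (2 * sigma ** 2)
        p *= math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma)
    return p


def is_abnormal(x, model, s):
    """Alarm condition from the text: f(x) < s."""
    return density(x, model) < s


# Hypothetical normal history for two metrics, with outliers already removed.
history = {"cpu": [30, 32, 29, 31, 33], "mem": [60, 61, 59, 62, 58]}
model = fit(history)

# Assumed boundary: the lowest density seen among the normal samples.
normal_points = [dict(cpu=c, mem=m) for c, m in zip(history["cpu"], history["mem"])]
s = min(density(p, model) for p in normal_points)

print(is_abnormal({"cpu": 95, "mem": 60}, model, s))
print(is_abnormal({"cpu": 31, "mem": 60}, model, s))
```

For a skewed metric, the same functions could be applied to `math.log(value)` instead of the raw value, in line with the log X remark in the text.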

[0065] Step 350: The master node server reports to the operation and maintenance platform to raise an alarm.

[0066] Specifically, when f(x) < s, the master node server determines that the monitored machine is abnormal and reports the alarm to the operation and maintenance platform.

[0067] Step 360: The master node server automatically generates an alarm template, and sends the alarm template to the slave node server and reports to the operation and maintenance platform.

[0068] Specifically, the alarm template is generated automatically from the alarm data reported by the slave node server, sent to the slave node server, and reported to the operation and maintenance platform.

[0069] Further, the master node server can deliver the data directly to the slave node server over TCP, with XML as the data format, and the data takes effect on the slave node server immediately, which ensures that operation and maintenance data and alarms are acquired in real time.
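A possible shape for the XML payload mentioned in [0069] is sketched below in Python. The element and attribute names (`template`, `param`, `host`, `name`, `threshold`) and the 4-byte length prefix are assumptions; the specification states only that XML is carried over TCP and does not define a schema or framing.

```python
import struct
import xml.etree.ElementTree as ET


def encode_template(host, thresholds):
    """Serialize template parameters as UTF-8 XML bytes (hypothetical schema)."""
    root = ET.Element("template", {"host": host})
    for name, value in thresholds.items():
        ET.SubElement(root, "param", {"name": name, "threshold": str(value)})
    return ET.tostring(root, encoding="utf-8")


def decode_template(payload):
    """Parse the XML bytes back into (host, thresholds)."""
    root = ET.fromstring(payload)
    thresholds = {p.get("name"): float(p.get("threshold")) for p in root.iter("param")}
    return root.get("host"), thresholds


def frame(payload):
    """Prefix the payload with its length so a TCP receiver knows where it ends."""
    return struct.pack("!I", len(payload)) + payload


payload = encode_template("media-01", {"cpu_usage": 40.0, "connections": 5000})
message = frame(payload)
print(decode_template(payload))
```

In a real deployment `message` would be written to a connected socket, for example with `socket.sendall`; TCP is a byte stream, so some framing such as the assumed length prefix is needed for the receiver to separate messages.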

[0070] Further, the automatic alarm template is generated periodically: the calculation over the larger set of sampled data is performed once every period of time, which avoids the load of repeated calculations.

[0071] Further, when the master node server sends the alarm template to the slave node server, T is automatically reset to zero.

[0072] Further, the master node server generally does not actively connect to the slave node server; only when operation and maintenance parameters need to be changed does the master node server actively send signaling to the slave node server.

[0073] Step 370: The operation and maintenance platform reports the alarm template to the user end.

[0074] In the server monitoring method of this embodiment, the master node server generates an alarm parameter value from the data reported by the slave node server multiple times within a predetermined interval and, when the alarm parameter value is less than the abnormal boundary value, reports to the operation and maintenance platform to raise an alarm. The master node server also automatically generates an alarm template and automatically updates the template parameters, which simplifies operation.

[0075] Referring to FIG. 4, a third embodiment of the present invention further provides a server monitoring method. In the third embodiment, the server monitoring method is a further improvement based on the first embodiment and the second embodiment, the difference being that the following steps are further included before step 110 or step 310:

[0076] Step 410: The operation and maintenance platform receives the operation and maintenance template configured by the client.

[0077] Specifically, the operator configures the operation and maintenance template through the user end, and the user end uploads the operation and maintenance template to the operation and maintenance platform, so that the operation and maintenance platform receives the operation and maintenance template.

[0078] The template for a slave node server can be configured manually or automatically on the master node server. Monitored hosts of the same type can share one monitoring template, which contains the monitoring parameters required by servers of that type; different operation and maintenance templates are set for different types of monitored hosts. A template specifies the operation and maintenance parameters of the monitored host, including hardware performance parameters and parameters of the deployed software services. The configuration of the template parameters determines whether the operation and maintenance data of the monitored host is reported, and includes the alarm threshold and whether historical operation and maintenance data is stored.
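One way to picture the template contents listed in [0078] is the following Python sketch. The class and field names are hypothetical; only the three per-parameter settings (whether to report, the alarm threshold, and whether to store history) are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterRule:
    alarm_threshold: float       # value above which an alarm is raised
    report: bool = True          # whether the data is reported at all
    store_history: bool = False  # whether historical data is kept


@dataclass
class OMTemplate:
    host_type: str                              # e.g. "storage"
    rules: dict = field(default_factory=dict)   # metric name -> ParameterRule

    def should_report(self, metric):
        """True if the template lists the metric and marks it for reporting."""
        rule = self.rules.get(metric)
        return rule is not None and rule.report


# Hypothetical template shared by all storage-type monitored hosts.
storage = OMTemplate(
    host_type="storage",
    rules={
        "disk_io": ParameterRule(alarm_threshold=500.0, store_history=True),
        "cpu_usage": ParameterRule(alarm_threshold=90.0, report=False),
    },
)

print(storage.should_report("disk_io"), storage.should_report("cpu_usage"))
```

Keeping one rule object per parameter makes it straightforward for a sub-template to override a single field while leaving the rest unchanged.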

[0079] A manually configured template on the master node server can be inherited. When a different server needs to be added and it differs only slightly from an existing template, most parameters of the original template can be inherited and only a small number need to be modified to generate a new sub-template. Templates are inherited in a tree structure.

[0080] In addition to manual configuration, new templates can also be generated automatically from historical operation and maintenance data, so as to achieve an optimal configuration of the operation and maintenance parameters.

[0081] Step 420: The operation and maintenance platform saves the operation and maintenance template to the primary node server.

[0082] Specifically, after the operation and maintenance template is configured on the user end, it is first determined whether a sub-template corresponding to an edge node server is to be used. If so, the sub-template parameters are configured by inheriting the main template parameters; if not, the parameters of the operation and maintenance template are saved directly to the master node server.

[0083] Further, when the operation and maintenance strategy parameters need to be modified and optimized, the operation and maintenance template is modified directly on the operation and maintenance platform. After the command is sent to the master node server, the master node server communicates directly with the slave node server, so that the modified operation and maintenance template parameters take effect immediately.

[0084] In the server monitoring method of this embodiment, the operation and maintenance platform receives the configured operation and maintenance template and saves it to the master node server. Using the template gives servers of the same type consistent operation and maintenance processing, and template inheritance allows differences among operation and maintenance parameters of the same type to be handled flexibly.

[0085] Referring to FIG. 5, a fourth embodiment of the present invention provides a server monitoring system, which includes: an operation and maintenance platform 510, a client 520 connected to the operation and maintenance platform 510, and a master node server 530. The master node server 530 is in communication with at least one slave node server 540. Each slave node server 540 corresponds to a monitored host (not shown).

[0086] The master node server 530 is configured to select a template parameter corresponding to the monitored host in the operation and maintenance template, and send the template parameter to the slave node server 540 corresponding to the monitored host.

[0087] Specifically, in the operation and maintenance platform 510, one master node server 530 is deployed independently and there are multiple monitored hosts. Each monitored host acts as a monitored terminal, and a slave node server 540 is deployed independently on each monitored host. The slave node server 540 connects to the master node server 530 in order to respond to the operation and maintenance parameter commands issued by the master node server 530 and to report operation and maintenance data and alarms.

[0088] The master node server 530 is configured with an operation and maintenance template that contains template parameters corresponding to each monitored host, so that the master node server 530 selects the template parameter corresponding to a monitored host and sends it to the slave node server 540 that monitors that host.

[0089] Further, the template parameters include at least: central processing unit (CPU) usage, memory usage, network input/output (I/O), hard disk input/output (I/O), remaining space of the hard disk, number of network connections, monitoring of important service ports, and the various parameters of the deployed dedicated software itself.

[0090] Further, because current servers are diverse, different types of slave node servers 540 need to monitor different parameters. For example, a streaming media forwarding server only needs to pay attention to CPU usage, memory usage, network I/O, the number of network connections, and the service parameters of the service itself, whereas a storage server has a different focus: network I/O, hard disk I/O, the remaining space of the hard disk, and its own service parameters.

[0091] Further, the monitored host of the backbone node and the monitored host of the edge node need different parameters to be monitored.

[0092] Further, the slave node server 540 acquires the data of the corresponding monitored host.

[0093] Further, the slave node server 540 and the master node server 530 have a unified interface protocol to ensure communication consistency.

[0094] The slave node server 540 is configured to compare the data generated by the monitored host with the template parameter, and when the data generated by the monitored host meets the template parameter, the slave node server 540 reports the data to the master node server 530.

[0095] Specifically, the slave node server 540 obtains the data of the corresponding monitored host and compares the acquired data with the template parameters. When the template parameters are met, which indicates abnormal data, the operation and maintenance data of the monitored host is reported to the master node server 530.

[0096] By way of example, as shown in FIG. 2, the template parameters of the equipment room include CPU usage, memory usage, network I/O, hard disk I/O, remaining space of the hard disk, number of network connections, status of important service ports, and so on. The corresponding alarm thresholds are a1-a6.

[0097] The monitored hosts in the equipment room include an intelligent analysis server, a storage server, and a streaming media forwarding server. The template parameters of the intelligent analysis server include CPU usage and memory usage; the corresponding alarm thresholds are b1-b2. The template parameters of the storage server include CPU usage, memory usage, network I/O, hard disk I/O, remaining space of the hard disk, and network connection parameters; the corresponding alarm thresholds are c1-c6. The template parameters of the streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d1-d4. The streaming media forwarding server further includes servers of edge nodes: a core network streaming media forwarding server, a backbone network streaming media forwarding server, and an edge streaming media forwarding server. The template parameters of the core network streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d11-d14. The template parameters of the backbone network streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d21-d24. The template parameters of the edge streaming media forwarding server include CPU usage, memory usage, network I/O, and number of network connections; the corresponding alarm thresholds are d31-d34. In this example, relative to the servers of the edge nodes, the streaming media forwarding server is the server of the main node.

[0098] If the template parameter d1 of the streaming media forwarding server of the backbone node is set to 40%, then when the CPU usage of that server, as acquired by the slave node server 540, is greater than 40%, an alarm is generated and the data is reported to the master node server 530. The template parameter d31 of the edge streaming media forwarding server may be set to 30%; when the CPU usage of the edge streaming media forwarding server acquired by the slave node server 540 is greater than 30%, an alarm is generated and the data is reported to the master node server 530. In other words, the edge node server is configured on the basis of the backbone node server: the template parameters of the backbone node only need to be inherited by the template of the edge node, and the template parameters of other servers do not need to be changed, which reduces the workload to the greatest extent.

[0099] The master node server 530 is further configured to report the data to the operation and maintenance platform 510.

[0100] Specifically, the master node server 530 reports the received data to the operation and maintenance platform 510, and the operation and maintenance platform 510 reports the data to the client 520, so that the operator processes the data through the client 520.

[0101] The master node server 530 is further configured to generate an alarm parameter value from the data reported multiple times by the slave node server 540 within a predetermined interval, determine whether the alarm parameter value is smaller than the abnormal boundary value, and if so, report to the operation and maintenance platform 510 to raise an alarm.

[0102] Specifically, the operation parameter value X obtained from the slave node server 540 can be modeled as approximately Gaussian (if the distribution of the operation parameter is asymmetric, log X can be used instead of X so that it is as close to a Gaussian distribution as possible). A template can then be generated automatically from the historical operation and maintenance data to determine whether an alarm needs to be reported.

[0103] Suppose that, over a period of time T, an operation and maintenance parameter xi produces t data points {x(1), x(2), ..., x(t)}, and that a total of j kinds of operation and maintenance parameters participate in the judgment calculation.

[0104] Compute the mean μj over the period:

[0105] Compute the standard deviation σj:

[0106] Establish a Gaussian function f(x) for the alarm parameter value:
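The formulas for [0104] to [0106] are not reproduced in this text. One plausible reading, assuming the standard model of independent Gaussian parameters, is given below. Here x_j(i) denotes the i-th of the t samples of the j-th parameter, the product runs over all parameter kinds taking part in the calculation, and the 1/t normalisation of the standard deviation is an assumption (1/(t-1) would be equally consistent with the text).

```latex
\mu_j = \frac{1}{t}\sum_{i=1}^{t} x_j(i), \qquad
\sigma_j = \sqrt{\frac{1}{t}\sum_{i=1}^{t}\bigl(x_j(i)-\mu_j\bigr)^2}, \qquad
f(x) = \prod_{j} \frac{1}{\sqrt{2\pi}\,\sigma_j}
       \exp\!\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)
```

Under this reading, f(x) is large for samples close to the historical mean of every parameter and falls towards zero as any single parameter moves far from its mean.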

[0107] Based on the n normal values among the historical samples reported by the slave node server 540 to the primary node server 530 (excluding all outliers in the sampling interval), the abnormal boundary value s is determined:

[0108] s = (formula not legible in the source)

[0109] It is determined whether the alarm parameter value f(x) is smaller than the abnormal boundary value s, and if so, a report is made to the operation and maintenance platform 510 to raise an alarm.

[0110] More specifically, when f(x) < s, the master node server 530 determines that the machine under operation and maintenance is abnormal and reports it to the operation and maintenance platform 510 for an alarm.
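The check f(x) < s can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the per-parameter densities are multiplied as for independent Gaussians, the standard deviation uses the 1/t form, and the rule for s (the lowest f(x) observed among known-normal samples) is an assumption made here because the source formula for s is not legible. All function names and sample values are hypothetical.

```python
import math


def fit_gaussian(history):
    """Estimate per-parameter mean and standard deviation.

    `history` maps a parameter name to its list of historical samples.
    Uses the population (1/t) form, which is an assumption of this sketch.
    """
    model = {}
    for name, values in history.items():
        t = len(values)
        mu = sum(values) / t
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / t)
        model[name] = (mu, sigma)
    return model


def alarm_value(model, sample):
    """Product of the per-parameter Gaussian densities, f(x)."""
    f = 1.0
    for name, (mu, sigma) in model.items():
        z = (sample[name] - mu) / sigma
        f *= math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)
    return f


def boundary_from_normals(model, normal_samples):
    """Assumed rule: s is the lowest f(x) seen among known-normal samples."""
    return min(alarm_value(model, x) for x in normal_samples)


# Hypothetical CPU usage history (percent) with outliers already removed.
history = {"cpu_usage": [20.0, 22.0, 19.0, 21.0, 23.0, 20.0]}
model = fit_gaussian(history)
normals = [{"cpu_usage": v} for v in history["cpu_usage"]]
s = boundary_from_normals(model, normals)

spike_is_abnormal = alarm_value(model, {"cpu_usage": 80.0}) < s
typical_is_abnormal = alarm_value(model, {"cpu_usage": 21.0}) < s
```

In this sketch a reading far from the historical mean yields a density below s and would be reported, while a reading near the mean would not. A parameter with zero historical variance would need separate handling, which is omitted here.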

[0111] The master node server 530 is further configured to automatically generate an alarm template, send it to the slave node server 540, and report it to the operation and maintenance platform 510.

[0112] Specifically, the alarm template is automatically generated according to the alarm data reported by the slave node server 540, and is sent to the slave node server 540 and reported to the operation and maintenance platform 510.

[0113] Further, the master node server 530 can deliver the template directly to the slave node server 540 using XML as the data transfer format, to ensure that operation and maintenance data and alarms are obtained promptly.
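As an illustration of XML-based template delivery, the sketch below serialises a template on the sending side and parses it on the receiving side using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`template`, `parameter`, `name`, `threshold`), the function names, and the values are hypothetical; the source does not specify the XML schema.

```python
import xml.etree.ElementTree as ET


def template_to_xml(template_name, thresholds):
    """Serialise a template to an XML string (element names are hypothetical)."""
    root = ET.Element("template", {"name": template_name})
    for metric, threshold in thresholds.items():
        ET.SubElement(root, "parameter", {"name": metric, "threshold": str(threshold)})
    return ET.tostring(root, encoding="unicode")


def template_from_xml(payload):
    """Parse the XML string back into (template_name, thresholds)."""
    root = ET.fromstring(payload)
    thresholds = {
        p.get("name"): float(p.get("threshold")) for p in root.findall("parameter")
    }
    return root.get("name"), thresholds


# Round trip: what the master node would send and what the slave node would recover.
payload = template_to_xml("edge_streaming", {"cpu_usage": 30.0, "memory_usage": 70.0})
name, thresholds = template_from_xml(payload)
```

A self-describing text format like this lets the slave node apply a changed threshold as soon as the message arrives, without a separate schema negotiation step.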

[0114] Further, the automatic alarm template is generated periodically: the calculation over the larger set of sampled data is performed once every period of time, thereby avoiding the load of repeated calculations.

[0115] Further, when the master node server 530 sends the alarm template to the slave node server 540,

[0116] Further, the master node server 530 does not actively connect to the slave node server 540. Only when the operation and maintenance parameters need to be changed, the master node server 530 actively sends signaling to the slave node server 540.

[0117] The operation and maintenance platform 510 is configured to report the alarm template to the client 520.

[0118] The operation and maintenance platform 510 is further configured to receive an operation and maintenance template configured by the client 520, and save the operation and maintenance template to the primary node server 530.

[0119] Specifically, the operator configures the operation and maintenance template through the client 520, and the client 520 uploads the operation and maintenance template to the operation and maintenance platform 510, so that the operation and maintenance platform 510 receives the operation and maintenance template.

[0120] The master node server 530 can configure the template of the slave node server 540 manually or automatically. Monitored hosts of the same type can use one monitoring template, which includes the monitoring parameters required by that type of server. Different operation and maintenance templates are set for different types of monitored hosts. A template specifies the operation and maintenance parameters of the monitored host, including hardware performance parameters and parameters of the deployed software services. The settings of the template parameters specify whether the operation and maintenance data of the monitored host is reported, including the alarm threshold and whether the historical operation and maintenance data is stored.

[0121] A manually configured template on the primary node server 530 can be inherited. When a different server needs to be added and it differs only slightly from an existing template, most of the parameters of the original template can be inherited, and only a small number of parameters need to be modified to generate a new child template. Templates are inherited in a tree structure.
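Tree-structured template inheritance can be sketched as below. The `Template` class, its `resolve` method, the metric keys, and every value except the 40% and 30% CPU thresholds are assumptions made for illustration; the source does not describe a concrete data structure.

```python
class Template:
    """A monitoring template that may inherit from a parent template."""

    def __init__(self, name, overrides=None, parent=None):
        self.name = name
        self.overrides = dict(overrides or {})  # parameters set on this node only
        self.parent = parent                    # None for the root of the tree

    def resolve(self):
        """Merge thresholds from the root of the tree down to this template."""
        merged = self.parent.resolve() if self.parent else {}
        merged.update(self.overrides)  # child values replace inherited ones
        return merged


# Parent template for streaming media forwarding servers (values are hypothetical).
streaming = Template(
    "streaming_forwarding",
    {"cpu_usage": 40.0, "memory_usage": 70.0, "network_io": 80.0, "connections": 1000},
)

# Child template: only the CPU threshold differs; everything else is inherited.
edge = Template("edge_streaming_forwarding", {"cpu_usage": 30.0}, parent=streaming)
edge_params = edge.resolve()
```

Storing only the overrides on each child keeps a change to the parent visible to every descendant that has not overridden that parameter, which matches the stated goal of modifying only a small number of parameters per new server type.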

[0122] In addition to manual configuration, new templates can also be generated automatically from historical operation and maintenance data, so as to achieve an optimal configuration of the operation and maintenance parameters.

[0123] After the operation and maintenance template is configured on the client 520, it is first determined whether to use the sub-template corresponding to the edge node server. If so, the main template parameters are inherited to configure the sub-template parameters; if not, the parameters of the operation and maintenance template are saved directly to the master node server 530.

[0124] Further, when the operation and maintenance policy parameters need to be modified, the operation and maintenance template is modified directly on the operation and maintenance platform 510. After the instruction is sent to the primary node server 530, the primary node server 530 communicates directly with the slave node server 540, so that the modified operation and maintenance template parameters take effect.

[0125] In the server monitoring system of this embodiment, the master node server 530 selects a template parameter corresponding to the monitored host from the operation and maintenance template and sends the template parameter to the slave node server 540 corresponding to the monitored host. The slave node server 540 compares the data generated by the monitored host with the template parameter. When the data generated by the monitored host meets the template parameter, the slave node server 540 reports the data to the master node server 530, and the master node server 530 reports the data to the operation and maintenance platform 510. This reduces the complexity of acquiring operation and maintenance parameters for different types of servers in the operation and maintenance system. The master-slave deployment of node servers allows the operation and maintenance of the system to be managed in a unified manner, templates provide consistent handling of operation and maintenance for servers of the same type, and template inheritance flexibly handles differences among operation and maintenance parameters of the same type.

[0126] It should be noted that the term "comprising" or any other variation thereof is intended to cover a non-exclusive inclusion, such that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements that are inherent to such a process, method, article, or system. Without further restriction, an element defined by the statement "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or system that includes the element.

[0127] The foregoing serial numbers of the embodiments of the present invention are merely for description and do not represent the relative merits of the embodiments.

[0128] Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such understanding, the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the various embodiments of the present invention.

The above are only the preferred embodiments of the present invention and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Industrial applicability

[0130] In the server monitoring method and system provided by the present invention, the primary node server selects a template parameter corresponding to the monitored host from the operation and maintenance template and sends the template parameter to the slave node server corresponding to the monitored host. The slave node server compares the data generated by the monitored host with the template parameter. When the data generated by the monitored host meets the template parameter, the slave node server reports the data to the primary node server, and the primary node server reports the data to the operation and maintenance platform. This reduces the complexity of acquiring operation and maintenance parameters for different types of servers in the operation and maintenance system. The master-slave deployment of node servers allows the operation and maintenance of the system to be managed in a unified manner, templates provide consistent handling of operation and maintenance for servers of the same type, and template inheritance flexibly handles differences among operation and maintenance parameters of the same type.

Claims

[Claim 1] A server monitoring method, the method comprising:

The master node server selects a template parameter corresponding to the monitored host from the operation and maintenance template, and sends the template parameter to the slave node server corresponding to the monitored host;

The slave node server compares the data generated by the monitored host with the template parameter, and when the data generated by the monitored host meets the template parameter, the slave node server reports the data to the master node server;

The master node server reports the data to the operation and maintenance platform.

[Claim 2] The server monitoring method according to claim 1, wherein the method further comprises: the master node server generating an alarm parameter value from the data reported multiple times by the slave node server within a predetermined interval, and determining whether the alarm parameter value is smaller than an abnormal boundary value; if so, the master node server reports to the operation and maintenance platform to raise an alarm.

[Claim 3] The server monitoring method according to claim 2, wherein the method further comprises: the master node server automatically generating an alarm template, and transmitting the template to the slave node server and reporting to the operation and maintenance platform;

The operation and maintenance platform reports the alarm template to the user end.

[Claim 4] The server monitoring method according to claim 1, wherein before the master node server selects a template parameter corresponding to the monitored host in the operation and maintenance template, the method further includes:

The operation and maintenance platform receives an operation and maintenance template configured by the user end;

The operation and maintenance platform saves the operation and maintenance template to the primary node server.

[Claim 5] The server monitoring method according to any one of claims 1 to 4, wherein the template parameters include central processor occupancy rate, memory usage, network input/output, hard disk input/output, remaining space of the hard disk, number of network connections, listening status of important service ports, and usage of various parameters of the deployed dedicated software.

[Claim 6] A server monitoring system, the system comprising a master node server, at least one monitored host, a slave node server corresponding to the monitored host, and an operation and maintenance platform, wherein the master node server is configured to select, from an operation and maintenance template, a template parameter corresponding to the monitored host, and to send the template parameter to the slave node server corresponding to the monitored host;

The slave node server is configured to compare the data generated by the monitored host with the template parameter, and when the data generated by the monitored host meets the template parameter, report the data to the primary node server;

The primary node server is further configured to report the data to the operation and maintenance platform.

[Claim 7] The server monitoring system according to claim 6, wherein the master node server is further configured to generate an alarm parameter value from the data reported multiple times by the slave node server within a predetermined interval, determine whether the alarm parameter value is smaller than the abnormal boundary value, and if so, report to the operation and maintenance platform to raise an alarm.

[Claim 8] The server monitoring system according to claim 7, wherein the master node server is further configured to automatically generate an alarm template, and send the template to the slave node server and report to the operation and maintenance platform;

The operation and maintenance platform is configured to report the alarm template to the user end.

[Claim 9] The server monitoring system according to claim 6, wherein the operation and maintenance platform is further configured to receive an operation and maintenance template configured by the user end, and save the operation and maintenance template to the primary node server.

[Claim 10] The server monitoring system according to any one of claims 6 to 9, wherein the template parameters include central processor occupancy rate, memory usage, network input/output, hard disk input/output, remaining space of the hard disk, number of network connections, listening status of important service ports, and usage of various parameters of the deployed dedicated software.