CN112162907A

CN112162907A - Health degree evaluation method based on monitoring index data

Info

Publication number: CN112162907A
Application number: CN202011059652.4A
Authority: CN
Inventors: 程永新; 林小勇; 麦锦花
Original assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Current assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-01

Abstract

The invention discloses a health degree evaluation method based on monitoring index data, comprising the following steps: S1) uniformly managing the network, middleware, database and servers as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning weights to the key performance indexes according to their importance levels; S2) setting performance index deduction rules in line with the alarm rules, and calculating the node health degree according to those deduction rules; S3) calculating the level health degree from the weights of the network level and network element level to which each node belongs, together with the node health degree; S4) dividing the system into layers, setting a weight for each layer, and calculating the system health degree. The method addresses the problem that the existing operation and maintenance mechanism only passively receives fault information, and it reduces the service acceptance fault rate caused by terminal performance problems.

Description

Health degree evaluation method based on monitoring index data

Technical Field

The invention relates to a monitoring data evaluation method, in particular to a health degree evaluation method based on monitoring index data.

Background

As market competition intensifies and a growing customer base demands greater business support capability, the load on business systems keeps increasing, and so do the reliability and stability requirements placed on the underlying IT resources. Faults such as degraded server performance, slow network cards or unavailable services become far more likely while business applications are running, and they can prevent many basic services from being delivered. To keep an unavailable service system from disrupting key services, IT administrators need to use software and hardware to continuously monitor the factors that may affect service system availability, notify the relevant personnel as soon as a fault occurs, and determine its root cause. The fault can then be resolved in the shortest possible time, which reduces service system downtime, improves availability and ultimately improves user satisfaction.

The prior art has the following defects:

1. high dependence on people: business staff in each city judge the performance problems of business handling terminals manually;

2. passive receipt of fault information: only the preliminary fault reports forwarded from each city by the customer service sub-unit are received, and the reported information is vague and incomplete;

3. lack of performance index data: fault reports contain no specific performance indexes, so the cause of a fault cannot be located quickly.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a health degree evaluation method based on monitoring index data that overcomes the existing operation and maintenance mechanism's reliance on passively received fault information and reduces the service acceptance fault rate caused by terminal performance problems.

The technical scheme adopted by the invention to solve the above technical problem is a health degree evaluation method based on monitoring index data, comprising the following steps: S1) uniformly managing the network, middleware, database and servers as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning weights to the key performance indexes according to their importance levels; S2) setting performance index deduction rules in line with the alarm rules, and calculating the node health degree according to those deduction rules; S3) calculating the level health degree from the weights of the network level and network element level to which each node belongs, together with the node health degree; S4) dividing the system into layers, setting a weight for each layer, and calculating the system health degree.

In the health degree evaluation method based on the monitoring index data, the key performance indexes in step S1 include the utilization of the host CPU, memory, disk IO and network IO.

In the health degree evaluation method based on the monitoring index data, in step S2 the operating state of the service system is divided into two states, available and unavailable; if the service system is unavailable, the health degrees of all nodes associated with it are 0.

In the health degree evaluation method based on the monitoring index data, the operation and maintenance states of the network devices, middleware, databases and hosts associated with the service system are likewise divided into available and unavailable; if a node's state is unavailable, its health degree is 0.

In the above health degree evaluation method based on monitoring index data, in step S2 a bypass monitoring server captures Web requests, network transmission information and server response information, and uses them to determine whether the network devices, middleware, databases and hosts associated with the service system are available.

In the health degree evaluation method based on the monitoring index data, if an unavailable host belongs to a cluster, the cluster health degree is used as the node health degree, and a fixed score is deducted each time a host in the cluster becomes unavailable, until the node health degree reaches 0.

In the health degree evaluation method based on monitoring index data, in step S3 the availability of each node in the network layer is determined first; if a node in the layer is unavailable, its weight is divided proportionally among the remaining nodes.
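The weight redistribution in step S3 can be sketched as follows. This is a minimal Python illustration, not the patented implementation; it assumes node weights are kept in a dictionary keyed by a hypothetical node name, and it reads "proportionally" as each remaining node absorbing the removed weight in proportion to its own weight. The function name `redistribute_weights` is illustrative.

```python
def redistribute_weights(weights: dict[str, float],
                         available: dict[str, bool]) -> dict[str, float]:
    """Shift the weight of unavailable nodes onto the available ones.

    Each available node's weight is scaled by total / available_total, so the
    layer's total weight is preserved and unavailable nodes end up with 0.
    Nodes missing from `available` are treated as unavailable (assumption).
    """
    total = sum(weights.values())
    live_total = sum(w for name, w in weights.items() if available.get(name, False))
    if live_total == 0:
        # No available node is left to absorb the weight.
        return {name: 0.0 for name in weights}
    return {
        name: (w * total / live_total if available.get(name, False) else 0.0)
        for name, w in weights.items()
    }


# Four equally weighted nodes, one unavailable: the other three rise to 100/3 each.
print(redistribute_weights({"a": 25, "b": 25, "c": 25, "d": 25},
                           {"a": True, "b": True, "c": True, "d": False}))
```

With equal weights, proportional redistribution is the same as an equal split, which matches the four-node example in the detailed description.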

In the health degree evaluation method based on the monitoring index data, the system health degree in step S4 is calculated as follows:

system health degree = (network layer health degree × network layer weight + storage health degree × storage weight + host health degree × host weight + database health degree × database weight + middleware health degree × middleware weight)/(sum of the layer weights).

Compared with the prior art, the invention has the following beneficial effects: the method actively collects and analyzes real user experience data, judges the performance of the service system from the user's perspective, defines performance indexes and calculates the terminal health degree with a specific formula. Terminals with a low health degree can then be analyzed and handled proactively, hardware performance problems, network problems or application system bottlenecks can be located quickly, terminal performance faults are reduced and user perception is improved. This solves the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the service acceptance fault rate caused by terminal performance problems.

Drawings

FIG. 1 is a schematic flow chart of the health degree evaluation method based on monitoring index data according to the present invention;

FIG. 2 is a schematic diagram of the present invention applied to a telecom operator's online CRM service acceptance system.

Detailed Description

The invention is further described below with reference to the figures and examples.

FIG. 1 is a schematic flow chart of the health degree evaluation method based on monitoring index data according to the present invention.

Referring to FIG. 1, the health degree evaluation method based on monitoring index data provided by the present invention includes the following steps: (1) configuring key performance indexes and availability indexes; (2) configuring the weights; (3) calculating the health degree; (4) displaying the health degree.

1. Key performance index

First, the network, middleware, databases, servers and similar components are managed uniformly as configuration items, and a relation model among the configuration items is established according to actual needs. Key performance indexes are then abstracted, the relations among them are established, and an importance-level weight is set for each key performance index.
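One way to hold the configuration items, their key performance index weights and the relations between them is sketched below. This is a hypothetical Python data model assuming a simple named-reference graph; the class names, fields and example item names (`crm-app`, `crm-db`, `host-1`) are illustrative and are not specified by the source.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """One configuration item: a network device, middleware, database or server."""
    name: str
    kind: str  # "network", "middleware", "database" or "server"
    kpi_weights: dict[str, float] = field(default_factory=dict)  # KPI name -> importance weight
    related: list[str] = field(default_factory=list)             # names of related items


@dataclass
class RelationModel:
    """Registry of configuration items plus the relations declared between them."""
    items: dict[str, ConfigItem] = field(default_factory=dict)

    def add(self, item: ConfigItem) -> None:
        self.items[item.name] = item

    def neighbours(self, name: str) -> list[ConfigItem]:
        # Follow the declared relations, ignoring names that are not registered.
        return [self.items[n] for n in self.items[name].related if n in self.items]


model = RelationModel()
model.add(ConfigItem("crm-app", "middleware", {"cpu": 50, "memory": 50}, ["crm-db", "host-1"]))
model.add(ConfigItem("crm-db", "database"))
model.add(ConfigItem("host-1", "server",
                     {"cpu": 25, "memory": 25, "disk_io": 25, "network_io": 25}))
print([item.name for item in model.neighbours("crm-app")])  # ['crm-db', 'host-1']
```

Storing relations as names rather than object references keeps items independently registrable; a real deployment would more likely load this model from a configuration management database.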

2. Availability index

The operating state of the service system is divided into available and unavailable. The health degree is defined only while the service system is available; if the service system is unavailable, the node health degrees are invalid and set to 0.

Likewise, the operation and maintenance states of the network devices, middleware, databases and hosts associated with the service system are divided into available and unavailable; if one of them is unavailable, its node health degree is invalid, i.e. 0.
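The two availability rules above act as gates on top of any computed health degree. The following is a minimal Python sketch of that gating, assuming health degrees and availability flags are held in dictionaries keyed by a hypothetical node name; the function name `apply_availability` is illustrative.

```python
def apply_availability(node_health: dict[str, float],
                       node_available: dict[str, bool],
                       service_available: bool) -> dict[str, float]:
    """Zero node health degrees according to the two availability rules."""
    if not service_available:
        # Rule 1: an unavailable service system invalidates every associated node.
        return {name: 0.0 for name in node_health}
    # Rule 2: an unavailable node scores 0. A node with no reported state is
    # treated as unavailable here (assumption, not stated in the source).
    return {name: (h if node_available.get(name, False) else 0.0)
            for name, h in node_health.items()}


print(apply_availability({"db": 87.5, "host": 100.0},
                         {"db": True, "host": False}, service_available=True))
```

Checking the service-level gate first means node-level states are only consulted while the service system itself is available, matching the order in which the two rules are stated.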

3. Node health calculation

Key performance indicators (e.g., host CPU, memory, disk IO and network IO) and their weights (e.g., CPU utilization 25, memory 25, disk IO 25, network IO 25) are set.

A performance index deduction rule is set, which can be kept in line with the alarm rules: for example, a general alarm deducts 20 points, a severe alarm deducts 50 points and a fatal alarm deducts 100 points.

Performance index deduction = (deduction of index 1 × weight of index 1 + … + deduction of index N × weight of index N)/(sum of weights).

For example, if the CPU utilization triggers a severe alarm, the performance index deduction of the machine is:

performance index deduction = (50 × 25 + 0 + 0 + 0)/(25 + 25 + 25 + 25) = 12.5 points;

health degree = 100 - 12.5 = 87.5 points.
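The weighted deduction formula and the CPU example can be checked with a short Python sketch. The alarm-level names and the `DEDUCTION_BY_LEVEL` table are illustrative assumptions that mirror the example rule (20, 50 and 100 points); the function name is hypothetical.

```python
# Points deducted per alarm level, mirroring the example rule
# (the level names are illustrative).
DEDUCTION_BY_LEVEL = {"normal": 0, "general": 20, "severe": 50, "fatal": 100}


def performance_deduction(alarm_levels: dict[str, str],
                          weights: dict[str, float]) -> float:
    """sum(deduction_i * weight_i) / sum(weights); KPIs without an alarm deduct 0."""
    total_weight = sum(weights.values())
    weighted = sum(DEDUCTION_BY_LEVEL[alarm_levels.get(kpi, "normal")] * w
                   for kpi, w in weights.items())
    return weighted / total_weight


weights = {"cpu": 25, "memory": 25, "disk_io": 25, "network_io": 25}
deduction = performance_deduction({"cpu": "severe"}, weights)
print(deduction, 100 - deduction)  # 12.5 87.5
```

Dividing by the sum of weights keeps the deduction on a 0 to 100 scale regardless of how the individual weights are chosen.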

An availability index deduction rule is also set: if the whole device is unavailable, 100 points are deducted directly.

For example, if the host is unavailable because it has crashed:

health degree = 100 - 100 = 0 points.

The deduction for clusters can be configured separately. For a high-availability cluster, for example, 50 points are deducted for each unavailable host;

with one host down, the health degree of the high-availability cluster = 100 - 50 = 50 points;

health degree algorithm: points are deducted until none remain, with a minimum of 0 points;

health degree = 100 - availability index deduction - performance index deduction.
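Combining the availability deduction, the per-host cluster deduction and the 0-point floor gives the node health calculation sketched below in Python. The parameter names are hypothetical; the 50-point default comes from the cluster example, and a cluster is assumed to be scored as a single node, as described for clustered hosts.

```python
def node_health(performance_deduction: float,
                device_unavailable: bool = False,
                hosts_down_in_cluster: int = 0,
                deduction_per_host: float = 50.0) -> float:
    """Health = 100 - availability deduction - performance deduction, floored at 0.

    A standalone unavailable device loses the full 100 points.
    A high-availability cluster scored as one node loses `deduction_per_host`
    points for each unavailable host (50 in the example).
    """
    if device_unavailable:
        availability_deduction = 100.0
    else:
        availability_deduction = hosts_down_in_cluster * deduction_per_host
    return max(0.0, 100.0 - availability_deduction - performance_deduction)


print(node_health(12.5))                           # 87.5: severe CPU alarm only
print(node_health(0.0, device_unavailable=True))   # 0.0: crashed host
print(node_health(0.0, hosts_down_in_cluster=1))   # 50.0: one cluster host down
print(node_health(12.5, hosts_down_in_cluster=3))  # 0.0: floored at the minimum
```

Applying `max(0.0, ...)` once at the end implements "deduct until none remain" without tracking the order in which deductions occur.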

4. Hierarchical health algorithm

The network level health degree is obtained by multiplying the original network level health degree by the weight proportion of the available devices, so the health degree drops rapidly when many devices are unavailable. The level health degree is calculated from the weights of the network level and network element level to which each node (network device, middleware, database, etc.) belongs, together with the node health degree. A network element is the smallest unit that can be monitored and managed in network management; it consists of one or more boards or chassis and can independently perform a given transmission function.

First, whether each node is available is determined from its availability index. If, for example, one of the four nodes in a network layer is unavailable, its weight is divided proportionally among the remaining nodes, which then account for a larger share of the total weight. If a further node becomes unavailable, its (now larger) weight is again divided proportionally among the nodes that remain, and so on. With n nodes of equal weight this gives: tier availability = 1 - [total weight/n + total weight/(n-1) + total weight/(n-2) + …]/total weight, with one term for each unavailable node. The hierarchical health degree is then calculated as:
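The tier availability formula can be evaluated directly, as in the Python sketch below. It takes the formula literally, with one term per unavailable node; the floor at 0 is an added assumption, because for four nodes with three unavailable the literal sum gives 1 - (1/4 + 1/3 + 1/2) = 1 - 13/12, which is negative. The function name is hypothetical.

```python
def tier_availability(total_weight: float, n: int, unavailable: int) -> float:
    """Tier availability = 1 - [W/n + W/(n-1) + ...] / W, one term per unavailable node.

    The i-th unavailable node removes the redistributed share W/(n-i).
    The result is floored at 0 (assumption, not stated in the source).
    """
    if n <= 0 or not 0 <= unavailable <= n:
        raise ValueError("need n > 0 and 0 <= unavailable <= n")
    lost = sum(total_weight / (n - i) for i in range(unavailable))
    return max(0.0, 1.0 - lost / total_weight)


# Four equally weighted nodes with 0..4 of them unavailable.
print([round(tier_availability(100, 4, k), 3) for k in range(5)])  # [1.0, 0.75, 0.417, 0.0, 0.0]
```

The total weight cancels out, so the factor depends only on n and the number of unavailable nodes; the drop from 0.75 to about 0.417 for the second failure shows the accelerating decline the description aims for.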

network layer health degree = (node 1 health degree × node 1 weight + … + node N health degree × node N weight)/(sum of node weights).
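The layer formula is a weighted average of node health degrees. The Python sketch below implements it and adds an optional `availability` factor for the multiplication by the proportion of available devices described for the network level. How the two are meant to combine is one possible reading, not confirmed by the source; the function name and parameters are hypothetical.

```python
def layer_health(nodes: list[tuple[float, float]], availability: float = 1.0) -> float:
    """Weighted average of node health degrees, optionally scaled by tier availability.

    nodes: (node health degree, node weight) pairs.
    availability: 1.0 reproduces the plain formula; a tier availability factor
    applies the network-level multiplication (one possible reading).
    """
    total_weight = sum(weight for _, weight in nodes)
    if total_weight == 0:
        return 0.0
    weighted = sum(health * weight for health, weight in nodes)
    return availability * weighted / total_weight


# Four equally weighted nodes; one scores 87.5 and one is unavailable (health 0).
print(layer_health([(87.5, 25), (100, 25), (100, 25), (0, 25)]))  # 71.875
```

If the availability factor is used, the node weights passed in should already have the unavailable nodes' weight redistributed; otherwise an unavailable node is penalized twice.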

5. System health degree algorithm

The system is divided into layers (network layer, storage, host, middleware, database) and a weight is set for each layer, e.g., network layer 100, storage 60, host 80, middleware 60, database 70; then

system health degree = (network layer health degree × network layer weight + storage health degree × storage weight + host health degree × host weight + database health degree × database weight + middleware health degree × middleware weight)/(sum of the layer weights).
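The system-level formula with the example layer weights can be sketched in Python as follows. The layer keys and the sample layer health degrees are illustrative assumptions; only the weights (100, 60, 80, 60, 70) come from the description.

```python
# Example layer weights from the description.
LAYER_WEIGHTS = {"network": 100, "storage": 60, "host": 80,
                 "middleware": 60, "database": 70}


def system_health(layer_health: dict[str, float],
                  layer_weights: dict[str, float]) -> float:
    """Weighted average of layer health degrees over the layer weights."""
    total = sum(layer_weights.values())
    return sum(layer_health[layer] * weight
               for layer, weight in layer_weights.items()) / total


# Illustrative layer health degrees (not from the source).
health = {"network": 71.875, "storage": 100, "host": 87.5,
          "middleware": 100, "database": 100}
print(round(system_health(health, LAYER_WEIGHTS), 2))  # 89.7
```

Passing the weights explicitly, rather than as a mutable default argument, keeps the function reusable for other layer configurations; a missing layer raises `KeyError` instead of being silently scored.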

As shown in FIG. 2, the invention is applied to a telecom operator's online CRM service acceptance system. Real-time network bypass monitoring of terminal traffic data allows real user experience data to be collected and analyzed actively, the performance of the service system to be judged from the user's perspective, performance indexes to be defined, and the terminal health degree to be calculated with a specific formula. Terminals with a low health degree are analyzed and handled proactively, and hardware performance problems, network problems or application system bottlenecks are located quickly, for example through page loading time alarms, server response time alarms and network response time alarms. Terminal performance faults are thereby reduced and user perception is improved; this solves the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the service acceptance fault rate caused by terminal performance problems.

Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A health degree assessment method based on monitoring index data is characterized by comprising the following steps:

S1) uniformly managing the network, middleware, database and servers as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning weights to the key performance indexes according to their importance levels;

S2) setting performance index deduction rules in line with the alarm rules, and calculating the node health degree according to those deduction rules;

S3) calculating the level health degree from the weights of the network level and network element level to which each node belongs, together with the node health degree;

S4) dividing the system into layers, setting a weight for each layer, and calculating the system health degree.

2. The health degree evaluation method based on monitoring index data according to claim 1, wherein the key performance indexes in step S1 include the utilization of the host CPU, memory, disk IO and network IO.

3. The health assessment method according to claim 1, wherein in step S2 the operating state of the service system is divided into two states, available and unavailable, and if the service system is unavailable, the health degrees of all nodes associated with the service system are 0.

4. The health assessment method according to claim 3, wherein the operation and maintenance status of the network device, middleware, database, and host associated with the service system is divided into available and unavailable states, and if the operation and maintenance status is unavailable, the node health is 0.

5. The health assessment method according to claim 4, wherein the step S2 employs a bypass monitoring server to obtain the Web request, the network transmission information and the server response information, and determine whether the operation and maintenance status of the network device, the middleware, the database and the host associated with the service system is available.

6. The method of claim 4, wherein if the operation and maintenance status of a host is unavailable and the host belongs to a cluster, the cluster health degree is used as the node health degree, and a fixed score is deducted each time a host in the cluster becomes unavailable, until the node health degree reaches 0.

7. The health assessment method according to claim 1, wherein step S3 first determines the availability of the nodes in the network layer, and if a node in the network layer is unavailable, the weight of the unavailable node is divided proportionally among the remaining nodes.

8. The health degree evaluation method based on monitoring index data according to claim 1, wherein the system health degree in step S4 is calculated as follows: