CN112162907A - Health degree evaluation method based on monitoring index data - Google Patents
Health degree evaluation method based on monitoring index data
- Publication number
- CN112162907A
- Authority
- CN
- China
- Prior art keywords
- health
- health degree
- node
- network
- unavailable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3089—Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3438—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a health degree evaluation method based on monitoring index data, comprising the following steps: S1) uniformly treating the network, the middleware, the database and the server as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning a weight to each key performance index according to its importance level; S2) setting a performance index deduction rule in step with the alarm rule, and calculating the node health degree according to the performance index deduction rule; S3) calculating the level health degree from the network level to which the node belongs, the weight of the network element level, and the node health degree; S4) dividing the system into layers, setting layer weights, and calculating the system health degree. The method addresses the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the rate of service acceptance failures caused by terminal performance problems.
Description
Technical Field
The invention relates to a monitoring data evaluation method, in particular to a health degree evaluation method based on monitoring index data.
Background
Growing market competition and the stronger business support capability demanded by a growing customer base put increasing pressure on business systems, so the requirements on the reliability and stability of the underlying IT resources keep rising. While a business application is running, faults such as degraded server performance, network lag or unavailable services become much more likely, and when they occur many basic services cannot be carried out. To keep an unavailable service system from affecting key services, an IT administrator must be able to continuously monitor, through software and hardware equipment, the factors that may affect the availability of the service system, notify the relevant personnel as soon as a fault occurs, and determine its root cause, so that the fault is resolved in the shortest time, the downtime of the service system is reduced, its availability is improved, and user satisfaction is ultimately improved.
The prior art has the following defects:
1. High dependence on people: business personnel in the cities judge performance problems of the business handling terminals manually;
2. Passive receipt of fault information: the customer service sub-unit only receives the initial fault information reported by the cities, and the reported information is vague and incomplete;
3. Lack of performance index data: the fault information contains no specific performance indexes, so the cause of a fault cannot be located quickly.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a health degree evaluation method based on monitoring index data, which can solve the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduce the rate of service acceptance failures caused by terminal performance problems.
The technical scheme adopted by the invention to solve this problem is a health degree evaluation method based on monitoring index data, comprising the following steps: S1) uniformly treating the network, the middleware, the database and the server as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning a weight to each key performance index according to its importance level; S2) setting a performance index deduction rule in step with the alarm rule, and calculating the node health degree according to the performance index deduction rule; S3) calculating the level health degree from the network level to which the node belongs, the weight of the network element level, and the node health degree; S4) dividing the system into layers, setting layer weights, and calculating the system health degree.
In the health degree evaluation method based on the monitoring index data, the key performance index in step S1 includes the usage rates of the host CPU, the memory, the disk IO, and the network IO.
In the health degree evaluation method based on the monitoring index data, in step S2, the operation state of the service system is divided into two states, i.e., available state and unavailable state, and if the service system is unavailable, the health degrees of all nodes associated with the service system are 0.
In the health degree evaluation method based on the monitoring index data, the operation and maintenance states of the network device, the middleware, the database and the host associated with the service system are divided into an available state and an unavailable state, and if the operation and maintenance states are unavailable, the health degree of the node is 0.
In the above health degree evaluation method based on monitoring index data, in step S2, the bypass monitoring server is used to obtain the Web request, the network transmission information, and the server response information, and determine whether the operation and maintenance states of the network device, the middleware, the database, and the host associated with the service system are available.
In the health degree evaluation method based on the monitoring index data, if the operation and maintenance state of the host is unavailable and the host is one of the hosts in a cluster, the cluster health degree is used as the node health degree, and a certain score is deducted each time a host in the cluster becomes unavailable, until the node health degree is 0.
In the health degree evaluation method based on the monitoring index data, in step S3, the availability of each node in the network layer is first determined, and if there is an unavailable node in the network layer, the weight of the unavailable node is divided proportionally among the weights of the remaining nodes.
In the health degree evaluation method based on the monitoring index data, the system health degree in step S4 is calculated as follows:
system health degree = (network layer health degree × network layer weight + storage health degree × storage weight + host health degree × host weight + database health degree × database weight + middleware health degree × middleware weight)/(node weight sum).
Compared with the prior art, the invention has the following beneficial effects: the health degree evaluation method based on monitoring index data actively collects and analyzes real user experience data, subjectively judges the performance of the service system, formulates performance indexes, calculates the health degree of the terminal through a specific formula, actively analyzes and handles terminals with a low health degree, and quickly locates hardware performance problems, network problems or application system bottlenecks, thereby reducing terminal performance faults and improving user perception. The problem that the existing operation and maintenance mechanism only passively receives fault information is thus solved, and the rate of service acceptance failures caused by terminal performance problems is reduced.
Drawings
FIG. 1 is a schematic view of the health degree evaluation process based on monitoring index data according to the present invention;
FIG. 2 is a schematic diagram of the present invention applied to the online CRM service acceptance system of a telecom operator.
Detailed Description
The invention is further described below with reference to the figures and examples.
Fig. 1 is a schematic view of a health assessment process based on monitoring index data according to the present invention.
Referring to FIG. 1, the health degree evaluation method based on monitoring index data provided by the present invention includes the following steps: 1. configuring the key performance indexes and the availability indexes; 2. configuring the weights; 3. calculating the health degree; 4. displaying the health degree.
1. Key performance index
First, the network, the middleware, the database, the server and so on are managed uniformly as configuration items, and a relation model among the configuration items is established according to actual needs. Key performance indexes are then abstracted, the relations among them are established, and a weight is set for each key performance index according to its importance level.
2. Availability index
The operation state of the service system is divided into an available state and an unavailable state, the health degree is established on the basis that the service system is available, and if the service system is unavailable, the node health degree is invalid and is 0.
Meanwhile, the operation and maintenance states of the network equipment, the middleware, the database and the host associated with the service system are divided into an available state and an unavailable state, and if the network equipment, the middleware, the database and the host are unavailable, the health degree of the node is invalid, namely 0.
3. Node health calculation
Key performance indexes (e.g., host CPU, memory, disk IO, network IO) and their weights (e.g., CPU utilization 25, memory 25, disk IO 25, network IO 25) are set.
A performance index deduction rule is set, and it can be set in step with the alarm rule: for example, 20 points are deducted for an alarm, 50 points for a serious alarm, and 100 points for a fatal alarm.
Performance index deduction = (index 1 deduction × index 1 weight + … + index N deduction × index N weight)/(weight sum).
For example, if a serious alarm is raised on CPU utilization, the performance index deduction for the machine is:
performance index deduction = (50 × 25 + 0 + 0 + 0)/(25 + 25 + 25 + 25) = 12.5 points;
health degree = 100 - 12.5 = 87.5 points.
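The worked example above can be reproduced with a short Python sketch. The deduction values (20, 50, 100) and the four weights of 25 come from the text; the function name, the dictionary keys and the level name `warning` are illustrative assumptions, not part of the patent.

```python
from typing import Mapping

# Deduction points per alarm level, as given in the example in the text.
# The key names ("warning", "serious", "fatal") are assumed labels.
ALARM_DEDUCTION = {"none": 0, "warning": 20, "serious": 50, "fatal": 100}


def performance_deduction(alarms: Mapping[str, str],
                          weights: Mapping[str, float]) -> float:
    """Weighted average of the per-index deductions.

    alarms maps an index name to its current alarm level; indexes that
    are missing from alarms are treated as having no alarm.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(ALARM_DEDUCTION[alarms.get(name, "none")] * weight
                   for name, weight in weights.items())
    return weighted / total_weight


# Example from the text: a serious alarm on CPU utilization only.
weights = {"cpu": 25, "memory": 25, "disk_io": 25, "network_io": 25}
deduction = performance_deduction({"cpu": "serious"}, weights)
print(deduction)        # 12.5
print(100 - deduction)  # 87.5
```

Keeping the alarm-to-points mapping in one table mirrors the idea that the deduction rule is configured alongside the alarm rule.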
An availability index deduction rule is also set: when the whole device is unavailable, 100 points are deducted directly.
For example, if the host is unavailable because it has crashed:
health degree = 100 - 100 = 0 points.
The rule can be adjusted for clusters: in a high-availability cluster, each host that goes down deducts a set number of points, for example 50;
health degree of the high-availability cluster = 100 - 50 = 50 points.
Health degree algorithm: deductions continue until no points remain, with a minimum of 0 points;
health degree = 100 - availability index deduction - performance index deduction.
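The node health rule can be sketched in Python as follows. This is a minimal sketch, assuming the availability deduction is 100 points for a standalone device and a configurable amount (50 in the text's example) per unavailable host for a cluster; the function and parameter names are hypothetical.

```python
def node_health(performance_deduction: float,
                available: bool = True,
                cluster_hosts_down: int = 0,
                availability_points: float = 100.0,
                cluster_points_per_host: float = 50.0) -> float:
    """Health degree = 100 - availability deduction - performance deduction.

    The result is floored at 0, following the rule that deductions stop
    once no points remain.
    """
    if cluster_hosts_down > 0:
        # Cluster member: deduct a fixed score per unavailable host.
        availability_deduction = cluster_points_per_host * cluster_hosts_down
    elif not available:
        # Standalone device that is down: deduct the full score.
        availability_deduction = availability_points
    else:
        availability_deduction = 0.0
    return max(0.0, 100.0 - availability_deduction - performance_deduction)


print(node_health(12.5))                       # 87.5
print(node_health(0.0, available=False))       # 0.0
print(node_health(0.0, cluster_hosts_down=1))  # 50.0
```

Treating the cluster case before the standalone case is a design choice of this sketch: it lets a cluster keep a non-zero score while at least one host is still serving.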
4. Hierarchical health algorithm
The network level health degree algorithm multiplies the original network level health degree by the weight proportion of the available devices, so that the health degree falls quickly when many devices are unavailable. The level health degree is calculated from the weights of the network level and the network element level to which the node (network device, middleware, database, etc.) belongs, together with the node health degree. A network element is the smallest unit that can be monitored and managed in network management; it consists of one or more boards or frames and can independently perform a given transmission function.
First, whether each node is available is judged from its availability index. If, for example, one of four nodes in a network layer is unavailable, the weight of the unavailable node is divided proportionally among the remaining nodes, so that each remaining node accounts for a larger share of the total weight; if another node then becomes unavailable, its weight is again divided proportionally among the nodes that remain, and so on: tier availability = 1 - [total weight/n + total weight/(n-1) + total weight/(n-2) + …]/total weight. The hierarchical health degree is then calculated by the formula:
network layer health degree = (node 1 health degree × node 1 weight + … + node N health degree × node N weight)/(node weight sum).
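The Python sketch below shows one possible reading of the layer calculation, not a definitive implementation. It assumes that (a) an unavailable node's weight is spread over the available nodes in proportion to their own weights, (b) the tier availability factor loses 1/n for the first unavailable node, 1/(n-1) for the second, and so on, floored at 0, and (c) the layer health degree is the weighted average of the available nodes multiplied by that factor. All function names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[float, float, bool]  # (health degree, weight, available)


def redistribute_weights(nodes: List[Node]) -> List[float]:
    """Spread the weight of unavailable nodes over the available ones,
    in proportion to each available node's own weight."""
    total = sum(weight for _, weight, _ in nodes)
    live = sum(weight for _, weight, ok in nodes if ok)
    if live == 0:
        return [0.0 for _ in nodes]
    return [weight * total / live if ok else 0.0 for _, weight, ok in nodes]


def tier_availability(n_total: int, n_unavailable: int) -> float:
    """1 - [1/n + 1/(n-1) + ...], one term per unavailable node.

    Flooring at 0 is an assumption of this sketch.
    """
    penalty = sum(1.0 / (n_total - k) for k in range(n_unavailable))
    return max(0.0, 1.0 - penalty)


def layer_health(nodes: List[Node]) -> float:
    """Weighted average of available nodes, scaled by tier availability."""
    weights = redistribute_weights(nodes)
    total = sum(weights)
    if total == 0:
        return 0.0  # empty layer or no available node
    weighted = sum(health * weight
                   for (health, _, ok), weight in zip(nodes, weights)
                   if ok) / total
    down = sum(1 for _, _, ok in nodes if not ok)
    return weighted * tier_availability(len(nodes), down)


# Four equally weighted nodes, one of them unavailable.
nodes = [(100.0, 25, True), (100.0, 25, True), (80.0, 25, True), (0.0, 25, False)]
print(tier_availability(4, 1))  # 0.75
print(layer_health(nodes))      # approximately 70.0
```

With this reading, each further unavailable node costs more than the previous one, which matches the stated intent that the health degree drops quickly as more devices become unavailable.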
5. System health degree algorithm
The system is divided into layers (network layer, storage, host, middleware, database) and layer weights are set, e.g., network layer 100, storage 60, host 80, middleware 60, database 70; then
system health degree = (network layer health degree × network layer weight + storage health degree × storage weight + host health degree × host weight + database health degree × database weight + middleware health degree × middleware weight)/(node weight sum).
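As a sketch of this final aggregation in Python: the layer weights below are the example values from the text, while the per-layer health degrees are hypothetical inputs chosen only for illustration, and the function name is an assumption.

```python
from typing import Mapping


def system_health(layer_health: Mapping[str, float],
                  layer_weight: Mapping[str, float]) -> float:
    """Weighted average of the layer health degrees."""
    total_weight = sum(layer_weight.values())
    if total_weight <= 0:
        raise ValueError("layer weights must sum to a positive value")
    return sum(layer_health[name] * weight
               for name, weight in layer_weight.items()) / total_weight


# Example layer weights from the text.
LAYER_WEIGHTS = {"network": 100, "storage": 60, "host": 80,
                 "middleware": 60, "database": 70}

# Hypothetical layer health degrees, for illustration only.
health = {"network": 70.0, "storage": 100.0, "host": 87.5,
          "middleware": 100.0, "database": 100.0}

print(round(system_health(health, LAYER_WEIGHTS), 2))  # 89.19
```

Because the network layer carries the largest weight in this example, a drop in its health degree pulls the system score down more than the same drop in any other layer.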
As shown in FIG. 2, the invention is applied to the online CRM service acceptance system of a telecom operator. Terminal traffic data is monitored in real time through a network bypass, so that real user experience data can be actively collected and analyzed, the performance of the service system is subjectively judged, performance indexes are formulated, and the health degree of the terminal is calculated through a specific formula. Terminals with a low health degree are actively analyzed and handled, and hardware performance problems, network problems or application system bottlenecks (signalled, for example, by page loading time alarms, server response time alarms and network response time alarms) are quickly located, which reduces terminal performance faults and improves user perception. The problem that the existing operation and maintenance mechanism only passively receives fault information is thus solved, and the rate of service acceptance failures caused by terminal performance problems is reduced.
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A health degree assessment method based on monitoring index data is characterized by comprising the following steps:
S1) uniformly treating the network, the middleware, the database and the server as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning a weight to each key performance index according to its importance level;
S2) setting a performance index deduction rule in step with the alarm rule, and calculating the node health degree according to the performance index deduction rule;
S3) calculating the level health degree from the network level to which the node belongs, the weight of the network element level, and the node health degree;
S4) dividing the system into layers, setting layer weights, and calculating the system health degree.
2. The health degree evaluation method based on monitoring index data as claimed in claim 1, wherein the key performance indexes in step S1 include the usage rates of the host CPU, memory, disk IO and network IO.
3. The health assessment method according to claim 1, wherein the step S2 is to divide the operation status of the service system into two statuses of available and unavailable, and if the service system is unavailable, the health of all nodes associated with the service system is 0.
4. The health assessment method according to claim 3, wherein the operation and maintenance status of the network device, middleware, database, and host associated with the service system is divided into available and unavailable states, and if the operation and maintenance status is unavailable, the node health is 0.
5. The health assessment method according to claim 4, wherein the step S2 employs a bypass monitoring server to obtain the Web request, the network transmission information and the server response information, and determine whether the operation and maintenance status of the network device, the middleware, the database and the host associated with the service system is available.
6. The health degree evaluation method according to claim 4, wherein if the operation and maintenance state of the host is unavailable and the host is one of the hosts in a cluster, the cluster health degree is used as the node health degree, and a certain score is deducted each time a host in the cluster becomes unavailable, until the node health degree is 0.
7. The health degree evaluation method according to claim 1, wherein step S3 first determines the availability of each node in the network layer, and if there is an unavailable node in the network layer, the weight of the unavailable node is divided proportionally among the weights of the remaining nodes.
8. The health degree evaluation method based on monitoring index data as claimed in claim 1, wherein the system health degree in step S4 is calculated as follows:
system health degree = (network layer health degree × network layer weight + storage health degree × storage weight + host health degree × host weight + database health degree × database weight + middleware health degree × middleware weight)/(node weight sum).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011059652.4A CN112162907A (en) | 2020-09-30 | 2020-09-30 | Health degree evaluation method based on monitoring index data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011059652.4A CN112162907A (en) | 2020-09-30 | 2020-09-30 | Health degree evaluation method based on monitoring index data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112162907A (en) | 2021-01-01 |
Family
ID=73861642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011059652.4A Pending CN112162907A (en) | 2020-09-30 | 2020-09-30 | Health degree evaluation method based on monitoring index data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112162907A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113159638A (en) * | 2021-05-17 | 2021-07-23 | 国网山东省电力公司电力科学研究院 | Intelligent substation layered health degree index evaluation method and device |
CN113312234A (en) * | 2021-05-18 | 2021-08-27 | 福建天泉教育科技有限公司 | Health detection optimization method and terminal |
CN113487316A (en) * | 2021-07-22 | 2021-10-08 | 银清科技有限公司 | Distributed payment system security processing method and device |
CN114095345A (en) * | 2021-10-22 | 2022-02-25 | 深信服科技股份有限公司 | Method, device, equipment and storage medium for evaluating health condition of host network |
CN114116431A (en) * | 2022-01-25 | 2022-03-01 | 深圳市明源云科技有限公司 | System operation health detection method and device, electronic equipment and readable storage medium |
CN114338424A (en) * | 2021-12-29 | 2022-04-12 | 中国电信股份有限公司 | Evaluation method and evaluation device for operation health degree of Internet of things |
CN115174353A (en) * | 2022-07-14 | 2022-10-11 | 中国工商银行股份有限公司 | Fault root cause determination method, device, equipment and medium |
CN115190039A (en) * | 2022-07-31 | 2022-10-14 | 苏州浪潮智能科技有限公司 | Equipment health evaluation method, system, equipment and storage medium |
CN115473834A (en) * | 2022-09-14 | 2022-12-13 | 中国电信股份有限公司 | Monitoring task scheduling method and system |
CN116127149A (en) * | 2023-04-14 | 2023-05-16 | 杭州悦数科技有限公司 | Quantification method and system for health degree of graph database cluster |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022242181A1 (en) * | 2021-05-17 | 2022-11-24 | 国网山东省电力公司电力科学研究院 | Method and apparatus for evaluating health degree indexes of layers of smart substation |
CN113159638A (en) * | 2021-05-17 | 2021-07-23 | 国网山东省电力公司电力科学研究院 | Intelligent substation layered health degree index evaluation method and device |
US11954210B2 (en) | 2021-05-17 | 2024-04-09 | State Grid Shandong Electric Power Research Institute | Hierarchical health index evaluation method and apparatus for intelligent substation |
CN113312234A (en) * | 2021-05-18 | 2021-08-27 | 福建天泉教育科技有限公司 | Health detection optimization method and terminal |
CN113487316A (en) * | 2021-07-22 | 2021-10-08 | 银清科技有限公司 | Distributed payment system security processing method and device |
CN113487316B (en) * | 2021-07-22 | 2024-05-03 | 银清科技有限公司 | Distributed payment system security processing method and device |
CN114095345A (en) * | 2021-10-22 | 2022-02-25 | 深信服科技股份有限公司 | Method, device, equipment and storage medium for evaluating health condition of host network |
CN114338424A (en) * | 2021-12-29 | 2022-04-12 | 中国电信股份有限公司 | Evaluation method and evaluation device for operation health degree of Internet of things |
CN114116431A (en) * | 2022-01-25 | 2022-03-01 | 深圳市明源云科技有限公司 | System operation health detection method and device, electronic equipment and readable storage medium |
CN115174353B (en) * | 2022-07-14 | 2024-04-16 | 中国工商银行股份有限公司 | Fault root cause determining method, device, equipment and medium |
CN115174353A (en) * | 2022-07-14 | 2022-10-11 | 中国工商银行股份有限公司 | Fault root cause determination method, device, equipment and medium |
CN115190039B (en) * | 2022-07-31 | 2023-08-08 | 苏州浪潮智能科技有限公司 | Equipment health evaluation method, system, equipment and storage medium |
CN115190039A (en) * | 2022-07-31 | 2022-10-14 | 苏州浪潮智能科技有限公司 | Equipment health evaluation method, system, equipment and storage medium |
CN115473834A (en) * | 2022-09-14 | 2022-12-13 | 中国电信股份有限公司 | Monitoring task scheduling method and system |
CN115473834B (en) * | 2022-09-14 | 2024-04-02 | 中国电信股份有限公司 | Monitoring task scheduling method and system |
CN116127149A (en) * | 2023-04-14 | 2023-05-16 | 杭州悦数科技有限公司 | Quantification method and system for health degree of graph database cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112162907A (en) | Health degree evaluation method based on monitoring index data | |
US7953691B2 (en) | Performance evaluating apparatus, performance evaluating method, and program | |
CN109039833B (en) | Method and device for monitoring bandwidth state | |
US7783605B2 (en) | Calculating cluster availability | |
CN101632093A (en) | Be used to use statistical analysis to come the system and method for management of performance fault | |
US8250400B2 (en) | Method and apparatus for monitoring data-processing system | |
CN110287081A (en) | A kind of service monitoring system and method | |
JPH08307524A (en) | Method and equipment for discriminating risk in abnormal conditions of constitutional element of communication network | |
CN109670690A (en) | Data information center monitoring and early warning method, system and equipment | |
CN101997709A (en) | Root alarm data analysis method and system | |
US10402298B2 (en) | System and method for comprehensive performance and availability tracking using passive monitoring and intelligent synthetic transaction generation in a transaction processing system | |
CN109992473A (en) | Monitoring method, device, equipment and the storage medium of application system | |
CN108769170A (en) | A kind of cluster network fault self-checking system and method | |
CN108632086A (en) | A kind of concurrent job operation troubles localization method | |
JP2007249373A (en) | Monitoring system of distributed program | |
KR20190002280A (en) | Apparatus and method for managing trouble using big data of 5G distributed cloud system | |
CN117632897A (en) | Dynamic capacity expansion and contraction method and device | |
CN115150253B (en) | Fault root cause determining method and device and electronic equipment | |
CN116204393A (en) | Wind control management method and device of business system | |
US11210159B2 (en) | Failure detection and correction in a distributed computing system | |
CN113541982A (en) | Network element health early warning method and device, computing equipment and computer storage medium | |
TWI712880B (en) | Information service availability management method and system | |
CN111694705A (en) | Monitoring method, device, equipment and computer readable storage medium | |
CN118282838A (en) | Quality evaluation method and device for service system | |
CN117076270A (en) | System stability evaluation method, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||