CN112162907A - Health degree evaluation method based on monitoring index data - Google Patents

Health degree evaluation method based on monitoring index data

Info

Publication number
CN112162907A
Authority
CN
China
Prior art keywords
health
health degree
node
network
unavailable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011059652.4A
Other languages
Chinese (zh)
Inventor
程永新
林小勇
麦锦花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai New Torch Network Information Technology Ltd By Share Ltd
Original Assignee
Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority to CN202011059652.4A
Publication of CN112162907A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a health degree evaluation method based on monitoring index data, comprising the following steps: S1) treating the network, the middleware, the database and the server uniformly as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning each key performance index a weight according to its importance level; S2) setting performance index deduction rules in step with the alarm rules, and calculating the node health degree according to the deduction rules; S3) calculating the level health degree from the network level to which each node belongs, the weights at the network element level, and the node health degree; S4) dividing the system into levels, setting level weights, and calculating the system health degree. The method addresses the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the rate of service acceptance faults caused by terminal performance problems.

Description

Health degree evaluation method based on monitoring index data
Technical Field
The invention relates to a monitoring data evaluation method, in particular to a health degree evaluation method based on monitoring index data.
Background
Intensifying market competition and a growing customer base demand ever greater business support capability, which puts increasing pressure on business systems and raises the requirements on the reliability and stability of the underlying IT resources. Faults such as degraded server performance, network slowness or unavailable services become far more likely while business applications are running, and when they occur many basic services cannot be carried out. To keep an unavailable business system from affecting key services, IT administrators need to use software and hardware equipment to continuously monitor the factors that may affect availability, notify the relevant personnel as soon as a fault occurs, and determine its root cause, so that the fault is resolved in the shortest possible time, downtime is reduced, availability is improved, and user satisfaction ultimately rises.
The prior art has the following defects:
1. High dependence on people: business personnel in each city judge performance problems of the service handling terminals manually;
2. Passive reception of fault information: the customer service sub-unit only receives the fault information initially reported by each city, and the reported information is vague and incomplete;
3. Lack of performance index data: the fault information contains no specific performance indexes, so the cause of a fault cannot be located quickly.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a health degree evaluation method based on monitoring index data that solves the problem of the existing operation and maintenance mechanism, which only passively receives fault information, and that reduces the rate of service acceptance faults caused by terminal performance problems.
To solve this technical problem, the invention provides a health degree evaluation method based on monitoring index data, comprising the following steps: S1) treating the network, the middleware, the database and the server uniformly as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning each key performance index a weight according to its importance level; S2) setting performance index deduction rules in step with the alarm rules, and calculating the node health degree according to the deduction rules; S3) calculating the level health degree from the network level to which each node belongs, the weights at the network element level, and the node health degree; S4) dividing the system into levels, setting level weights, and calculating the system health degree.
In the above method, the key performance indexes in step S1 include the utilization of the host CPU, memory, disk IO and network IO.
In the above method, in step S2 the operating state of the service system is divided into two states, available and unavailable; if the service system is unavailable, the health degree of every node associated with the service system is 0.
In the above method, the operation and maintenance state of each network device, middleware, database and host associated with the service system is likewise divided into available and unavailable; if a node is unavailable, its health degree is 0.
In the above method, in step S2 a bypass monitoring server obtains the Web requests, the network transmission information and the server response information, and determines from them whether the network devices, middleware, databases and hosts associated with the service system are available.
In the above method, if a host is unavailable and it is one host in a cluster, the cluster health degree is used as the node health degree, and a set number of points is deducted each time one host in the cluster becomes unavailable, until the node health degree reaches 0.
In the above method, in step S3 the availability of each node in the network layer is determined first; if the network layer contains an unavailable node, the weight of the unavailable node is distributed proportionally among the remaining nodes.
In the health degree evaluation method based on the monitoring index data, the system health degree in step S4 is calculated as follows:
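$$\text{system health degree} = \frac{\text{network layer health} \times \text{network layer weight} + \text{storage health} \times \text{storage weight} + \text{host health} \times \text{host weight} + \text{database health} \times \text{database weight} + \text{middleware health} \times \text{middleware weight}}{\text{sum of weights}}$$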
Compared with the prior art, the invention has the following beneficial effects: the method actively collects and analyzes data on users' real experience, subjectively judges the performance of the service system, formulates performance indexes, calculates the health degree of each terminal by a specific formula, and actively analyzes and handles terminals with a low health degree, so that hardware performance problems, network problems and application system bottlenecks are located quickly, terminal performance faults are reduced, and user perception is improved. It thereby solves the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the rate of service acceptance faults caused by terminal performance problems.
Drawings
FIG. 1 is a schematic view of a health assessment process based on monitoring index data according to the present invention;
FIG. 2 is a schematic diagram of the invention applied to the online CRM service acceptance system of a telecom operator.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 is a schematic view of a health assessment process based on monitoring index data according to the present invention.
Referring to FIG. 1, the health degree evaluation method based on monitoring index data provided by the invention includes the following steps: 1. configuring the key performance indexes and the availability indexes; 2. configuring the weights; 3. calculating the health degree; 4. displaying the health degree.
1. Key performance index
First, the network, the middleware, the database, the server and the like are managed uniformly as configuration items, and a relation model among the configuration items is established according to actual needs. Key performance indexes are then abstracted, the relations between them are established, and each key performance index is given a weight according to its importance level.
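As an illustration only, the configuration items and their relation model could be represented as below. This is a minimal sketch: the class name `ConfigItem`, its fields, and the item names `crm-db` and `crm-host-1` are hypothetical and do not come from the patent; the weights reuse the 25/25/25/25 example given in the node health calculation.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """One managed configuration item (network device, middleware, database, server)."""
    name: str
    kind: str  # e.g. "network", "middleware", "database", "server"
    kpi_weights: dict[str, float] = field(default_factory=dict)  # KPI name -> importance weight
    depends_on: list[str] = field(default_factory=list)  # names of related configuration items


# Hypothetical inventory; names and relations are invented for illustration.
items = {
    "crm-db": ConfigItem("crm-db", "database"),
    "crm-host-1": ConfigItem(
        "crm-host-1",
        "server",
        kpi_weights={"cpu": 25, "memory": 25, "disk_io": 25, "network_io": 25},
        depends_on=["crm-db"],
    ),
}


def related(name: str) -> list[ConfigItem]:
    """Return the configuration items that the named item depends on."""
    return [items[dep] for dep in items[name].depends_on]
```

Keeping the relations as names rather than object references is one simple way to let the relation model be edited without rebuilding every item.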
2. Availability index
The operating state of the service system is either available or unavailable. The health degree is meaningful only while the service system is available; if the service system is unavailable, the node health degree is invalid and is set to 0.
Likewise, the operation and maintenance state of each network device, middleware, database and host associated with the service system is either available or unavailable; if such a node is unavailable, its health degree is invalid, i.e. 0.
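The availability rule acts as a gate in front of any score. A minimal sketch, assuming the availability of the service system and of the node have already been determined elsewhere; the function name `gate_by_availability` is hypothetical:

```python
def gate_by_availability(raw_health: float, system_available: bool, node_available: bool) -> float:
    """Return 0 when the service system or the node itself is unavailable, else the raw score."""
    if not system_available or not node_available:
        return 0.0
    return raw_health
```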
3. Node health calculation
Key performance indexes (e.g., host CPU, memory, disk IO, network IO) and their weights (e.g., CPU utilization 25, memory 25, disk IO 25, network IO 25) are set.
Performance index deduction rules are set, and can be set in step with the alarm rules: for example, an ordinary alarm deducts 20 points, a serious alarm deducts 50 points, and a fatal alarm deducts 100 points.
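One way to keep the deduction rule in step with the alarm rule is to derive both from the same severity level. The sketch below is an assumption about how that could look: the severity names, the threshold values and the function `severity_for` are hypothetical, and only the 20/50/100 deductions come from the text.

```python
# Deduction per alarm severity, following the 20 / 50 / 100 example in the text.
DEDUCTION_BY_SEVERITY = {"none": 0, "warning": 20, "serious": 50, "fatal": 100}


def severity_for(value: float, thresholds: dict[str, float]) -> str:
    """Pick the highest severity whose threshold the value reaches."""
    level = "none"
    for name in ("warning", "serious", "fatal"):  # ordered from mildest to worst
        if value >= thresholds[name]:
            level = name
    return level


# Hypothetical CPU utilization thresholds in percent; not taken from the patent.
cpu_thresholds = {"warning": 70.0, "serious": 85.0, "fatal": 95.0}
deduction = DEDUCTION_BY_SEVERITY[severity_for(90.0, cpu_thresholds)]
```

With these assumed thresholds, a CPU utilization of 90% raises a serious alarm and therefore a 50-point deduction for that index.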
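The performance index deduction is the weighted average of the per-index deductions:
$$\text{performance index deduction} = \frac{\text{index 1 deduction} \times \text{index 1 weight} + \dots + \text{index N deduction} \times \text{index N weight}}{\text{sum of weights}}$$
For example, if the CPU utilization raises a serious alarm and the other three indexes raise none, the performance index deduction for the machine is:
performance index deduction = (50 × 25 + 0 + 0 + 0)/(25 + 25 + 25 + 25) = 12.5 points;
health degree = 100 − 12.5 = 87.5 points.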
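The worked example can be checked with a few lines of code. This is a sketch that assumes the denominator is the sum of all four index weights (100), which is what reproduces the 12.5-point result; the function and key names are hypothetical.

```python
def performance_deduction(deductions: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-index deductions; indexes without an alarm deduct 0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one index must have a non-zero weight")
    weighted = sum(deductions.get(name, 0.0) * w for name, w in weights.items())
    return weighted / total_weight


weights = {"cpu": 25, "memory": 25, "disk_io": 25, "network_io": 25}
deduction = performance_deduction({"cpu": 50}, weights)  # serious CPU alarm only -> 12.5
health = 100 - deduction  # 87.5
```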
An availability index deduction rule is also set: when the whole device is unavailable, 100 points are deducted directly.
For example, if the host has crashed and is unavailable:
health degree = 100 − 100 = 0 points.
For clusters the deduction can be adjusted: in a high-availability cluster, each host that goes down deducts, for example, 50 points;
health degree of the high-availability cluster = 100 − 50 = 50 points.
Health degree algorithm: points are deducted until none remain, so the minimum is 0 points;
health degree = 100 − availability index deduction − performance index deduction.
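Combining the two deductions with the floor of 0 points might look as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the default of 50 points per failed cluster host is simply the example value from the text, not a fixed rule.

```python
def cluster_availability_deduction(hosts_down: int, per_host: float = 50.0) -> float:
    """Deduct a fixed number of points per failed cluster host, capped at 100."""
    return min(100.0, hosts_down * per_host)


def node_health(performance_deduction: float, availability_deduction: float = 0.0) -> float:
    """Health degree = 100 - availability deduction - performance deduction, never below 0."""
    return max(0.0, 100.0 - availability_deduction - performance_deduction)


standalone_down = node_health(0.0, availability_deduction=100.0)          # crashed host
ha_one_down = node_health(0.0, cluster_availability_deduction(1))         # one cluster host down
ha_two_down = node_health(12.5, cluster_availability_deduction(2))        # both down, floor applies
```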
4. Hierarchical health algorithm
The network level health degree algorithm multiplies the original network level health degree by the weight proportion of the available devices, so the health degree drops rapidly when many devices are unavailable. The level health degree is calculated from the network level to which each node (network device, middleware, database, etc.) belongs, the weights at the network element level, and the node health degree. A network element is the smallest unit that can be monitored and managed in network management; it consists of one or more boards or frames and can independently perform a certain transmission function.
First, whether each node is available is judged from its availability index. If, for example, one of four nodes in the network layer is unavailable, the weight of the unavailable node is distributed proportionally among the remaining nodes, so that each remaining node accounts for a larger share of the total weight; each time a further node becomes unavailable, its weight is again distributed proportionally among the nodes that remain, and so on:
$$\text{level availability} = 1 - \frac{\dfrac{\text{total weight}}{n} + \dfrac{\text{total weight}}{n-1} + \dfrac{\text{total weight}}{n-2} + \dots}{\text{total weight}}$$
where n is the number of nodes in the level. The level health degree is then calculated by the formula:
$$\text{network layer health} = \frac{\text{node 1 health} \times \text{node 1 weight} + \dots + \text{node N health} \times \text{node N weight}}{\text{sum of node weights}}$$
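The level calculation can be sketched as below. Several points are assumptions rather than statements from the patent: the availability formula is read as adding one term per unavailable node with equal node weights, the result is floored at 0, redistributing an unavailable node's weight is modeled by normalizing over the remaining weights, and the final score is the weighted mean multiplied by the level availability. Function names are hypothetical.

```python
def level_availability(n: int, unavailable: int) -> float:
    """1 - [1/n + 1/(n-1) + ...], one term per unavailable node, floored at 0."""
    if not 0 <= unavailable <= n:
        raise ValueError("unavailable must be between 0 and n")
    lost = sum(1.0 / (n - i) for i in range(unavailable))
    return max(0.0, 1.0 - lost)


def level_health(nodes: list[tuple[float, float, bool]]) -> float:
    """nodes holds (health, weight, available) triples for one level.

    Unavailable nodes are dropped and the remaining weights renormalized,
    which is equivalent to sharing the lost weight proportionally.
    """
    available = [(h, w) for h, w, ok in nodes if ok]
    total_weight = sum(w for _, w in available)
    if total_weight == 0:
        return 0.0
    weighted_mean = sum(h * w for h, w in available) / total_weight
    return weighted_mean * level_availability(len(nodes), len(nodes) - len(available))


# Four equally weighted nodes, one of them unavailable (illustrative values).
score = level_health([(100, 1, True), (80, 1, True), (90, 1, True), (0, 1, False)])
```

Under this reading, two unavailable nodes out of four already cut the availability factor to about 0.42, which matches the remark that the health degree falls quickly as more devices fail.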
5. System health degree algorithm
The system is divided into levels (network layer, storage, host, middleware, database) and a weight is set for each level, e.g., network layer 100, storage 60, host 80, middleware 60, database 70. Then
$$\text{system health} = \frac{\text{network layer health} \times \text{network layer weight} + \text{storage health} \times \text{storage weight} + \text{host health} \times \text{host weight} + \text{database health} \times \text{database weight} + \text{middleware health} \times \text{middleware weight}}{\text{sum of weights}}$$
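The system score is again a weighted average, this time over levels. In the sketch below the weights are the example values from the text, while the per-level health values and the function name `system_health` are hypothetical; the denominator is assumed to be the sum of the level weights.

```python
def system_health(level_scores: dict[str, float], level_weights: dict[str, float]) -> float:
    """Weighted average of the level health degrees."""
    total_weight = sum(level_weights.values())
    if total_weight == 0:
        raise ValueError("at least one level must have a non-zero weight")
    return sum(level_scores[name] * w for name, w in level_weights.items()) / total_weight


level_weights = {"network": 100, "storage": 60, "host": 80, "middleware": 60, "database": 70}
# Hypothetical level scores for illustration.
level_scores = {"network": 67.5, "storage": 100.0, "host": 87.5, "middleware": 100.0, "database": 100.0}
overall = system_health(level_scores, level_weights)
```

Because the network layer carries the largest weight in this example, a degraded network layer pulls the overall score down more than an equally degraded storage or middleware level would.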
As shown in FIG. 2, the invention is applied to the online CRM service acceptance system of a telecom operator, where terminal traffic data is monitored in real time through a network bypass. Data on users' real experience is actively collected and analyzed, the performance of the service system is subjectively judged, performance indexes are formulated, and the health degree of each terminal is calculated by a specific formula. Terminals with a low health degree are actively analyzed and handled, so that hardware performance problems, network problems and application system bottlenecks (signalled, for example, by page loading time alarms, server response time alarms and network response time alarms) are located quickly, terminal performance faults are reduced, and user perception is improved. The invention thereby solves the problem that the existing operation and maintenance mechanism only passively receives fault information, and reduces the rate of service acceptance faults caused by terminal performance problems.
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A health degree evaluation method based on monitoring index data, characterized by comprising the following steps:
S1) treating the network, the middleware, the database and the server uniformly as configuration items, establishing a relation model among the configuration items, extracting key performance indexes, and assigning each key performance index a weight according to its importance level;
S2) setting performance index deduction rules in step with the alarm rules, and calculating the node health degree according to the deduction rules;
S3) calculating the level health degree from the network level to which each node belongs, the weights at the network element level, and the node health degree;
S4) dividing the system into levels, setting level weights, and calculating the system health degree.
2. The health degree evaluation method based on monitoring index data according to claim 1, characterized in that the key performance indexes in step S1 include the utilization of the host CPU, memory, disk IO and network IO.
3. The health degree evaluation method based on monitoring index data according to claim 1, characterized in that in step S2 the operating state of the service system is divided into two states, available and unavailable, and if the service system is unavailable, the health degree of every node associated with the service system is 0.
4. The health degree evaluation method based on monitoring index data according to claim 3, characterized in that the operation and maintenance state of each network device, middleware, database and host associated with the service system is divided into available and unavailable, and if a node is unavailable, its health degree is 0.
5. The health degree evaluation method based on monitoring index data according to claim 4, characterized in that in step S2 a bypass monitoring server obtains the Web requests, the network transmission information and the server response information, and determines whether the network devices, middleware, databases and hosts associated with the service system are available.
6. The health degree evaluation method based on monitoring index data according to claim 4, characterized in that if a host is unavailable and it is one host in a cluster, the cluster health degree is used as the node health degree, and a set number of points is deducted each time one host in the cluster becomes unavailable, until the node health degree reaches 0.
7. The health degree evaluation method based on monitoring index data according to claim 1, characterized in that in step S3 the availability of each node in the network layer is determined first, and if the network layer contains an unavailable node, the weight of the unavailable node is distributed proportionally among the remaining nodes.
8. The health degree evaluation method based on monitoring index data according to claim 1, characterized in that the system health degree in step S4 is calculated as follows:
$$\text{system health degree} = \frac{\text{network layer health} \times \text{network layer weight} + \text{storage health} \times \text{storage weight} + \text{host health} \times \text{host weight} + \text{database health} \times \text{database weight} + \text{middleware health} \times \text{middleware weight}}{\text{sum of weights}}$$
CN202011059652.4A 2020-09-30 2020-09-30 Health degree evaluation method based on monitoring index data Pending CN112162907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011059652.4A CN112162907A (en) 2020-09-30 2020-09-30 Health degree evaluation method based on monitoring index data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011059652.4A CN112162907A (en) 2020-09-30 2020-09-30 Health degree evaluation method based on monitoring index data

Publications (1)

Publication Number Publication Date
CN112162907A true CN112162907A (en) 2021-01-01

Family

ID=73861642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011059652.4A Pending CN112162907A (en) 2020-09-30 2020-09-30 Health degree evaluation method based on monitoring index data

Country Status (1)

Country Link
CN (1) CN112162907A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022242181A1 (en) * 2021-05-17 2022-11-24 国网山东省电力公司电力科学研究院 Method and apparatus for evaluating health degree indexes of layers of smart substation
CN113159638A (en) * 2021-05-17 2021-07-23 国网山东省电力公司电力科学研究院 Intelligent substation layered health degree index evaluation method and device
US11954210B2 (en) 2021-05-17 2024-04-09 State Grid Shandong Electric Power Research Institute Hierarchical health index evaluation method and apparatus for intelligent substation
CN113312234A (en) * 2021-05-18 2021-08-27 福建天泉教育科技有限公司 Health detection optimization method and terminal
CN113487316A (en) * 2021-07-22 2021-10-08 银清科技有限公司 Distributed payment system security processing method and device
CN113487316B (en) * 2021-07-22 2024-05-03 银清科技有限公司 Distributed payment system security processing method and device
CN114095345A (en) * 2021-10-22 2022-02-25 深信服科技股份有限公司 Method, device, equipment and storage medium for evaluating health condition of host network
CN114338424A (en) * 2021-12-29 2022-04-12 中国电信股份有限公司 Evaluation method and evaluation device for operation health degree of Internet of things
CN114116431A (en) * 2022-01-25 2022-03-01 深圳市明源云科技有限公司 System operation health detection method and device, electronic equipment and readable storage medium
CN115174353B (en) * 2022-07-14 2024-04-16 中国工商银行股份有限公司 Fault root cause determining method, device, equipment and medium
CN115174353A (en) * 2022-07-14 2022-10-11 中国工商银行股份有限公司 Fault root cause determination method, device, equipment and medium
CN115190039B (en) * 2022-07-31 2023-08-08 苏州浪潮智能科技有限公司 Equipment health evaluation method, system, equipment and storage medium
CN115190039A (en) * 2022-07-31 2022-10-14 苏州浪潮智能科技有限公司 Equipment health evaluation method, system, equipment and storage medium
CN115473834A (en) * 2022-09-14 2022-12-13 中国电信股份有限公司 Monitoring task scheduling method and system
CN115473834B (en) * 2022-09-14 2024-04-02 中国电信股份有限公司 Monitoring task scheduling method and system
CN116127149A (en) * 2023-04-14 2023-05-16 杭州悦数科技有限公司 Quantification method and system for health degree of graph database cluster

Similar Documents

Publication Publication Date Title
CN112162907A (en) Health degree evaluation method based on monitoring index data
US7953691B2 (en) Performance evaluating apparatus, performance evaluating method, and program
CN109039833B (en) Method and device for monitoring bandwidth state
US7783605B2 (en) Calculating cluster availability
CN101632093A (en) System and method for managing performance faults using statistical analysis
US8250400B2 (en) Method and apparatus for monitoring data-processing system
CN110287081A (en) A kind of service monitoring system and method
JPH08307524A (en) Method and equipment for discriminating risk in abnormal conditions of constitutional element of communication network
CN109670690A (en) Data information center monitoring and early warning method, system and equipment
CN101997709A (en) Root alarm data analysis method and system
US10402298B2 (en) System and method for comprehensive performance and availability tracking using passive monitoring and intelligent synthetic transaction generation in a transaction processing system
CN109992473A (en) Monitoring method, device, equipment and the storage medium of application system
CN108769170A (en) A kind of cluster network fault self-checking system and method
CN108632086A (en) A kind of concurrent job operation troubles localization method
JP2007249373A (en) Monitoring system of distributed program
KR20190002280A (en) Apparatus and method for managing trouble using big data of 5G distributed cloud system
CN117632897A (en) Dynamic capacity expansion and contraction method and device
CN115150253B (en) Fault root cause determining method and device and electronic equipment
CN116204393A (en) Wind control management method and device of business system
US11210159B2 (en) Failure detection and correction in a distributed computing system
CN113541982A (en) Network element health early warning method and device, computing equipment and computer storage medium
TWI712880B (en) Information service availability management method and system
CN111694705A (en) Monitoring method, device, equipment and computer readable storage medium
CN118282838A (en) Quality evaluation method and device for service system
CN117076270A (en) System stability evaluation method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination