CN109669836B - Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium - Google Patents

Publication number
CN109669836B
Authority
CN
China
Prior art keywords
alarm
score
abnormal point
fault type
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811118210.5A
Other languages
Chinese (zh)
Other versions
CN109669836A (en
Inventor
方振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN201811118210.5A
Publication of CN109669836A
Application granted
Publication of CN109669836B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Abstract

The invention discloses an intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium. The method comprises the following steps: acquiring all alarm events within a preset time, and acquiring all alarm abnormal points in each alarm event; and, for each alarm abnormal point, performing the following steps: counting the alarm indexes corresponding to the alarm abnormal point, and acquiring the current traffic of the alarm abnormal point from the alarm indexes; calculating the association score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule; and determining the fault type of the alarm abnormal point according to its association score, current traffic and alarm score. The invention addresses the technical problem in the prior art that the accuracy of fault analysis decreases as the degree of data coupling in IT operation and maintenance platforms increases.

Description

Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an intelligent IT operation and maintenance analysis method, apparatus, device, and readable storage medium.
Background
When a function of a traditional IT operation and maintenance platform cannot operate normally, a fault analysis mechanism is used to analyze the platform and determine which monitoring indexes are abnormal. However, conventional fault analysis monitors from a single dimension only, that is, it checks whether the value of a given monitoring index falls within an abnormal range. With the development of information technology, different monitoring indexes frequently call one another, and the data coupling between indexes keeps increasing: a monitoring index may itself be normal and yet appear abnormal because a related monitoring index is abnormal. If the platform is analyzed with the traditional method, the monitoring indexes cannot be cross-analyzed and merged, so the system cannot identify the monitoring index that is truly abnormal. As a result, the accuracy of fault analysis drops, the system works inefficiently, and normal operation of the system functions cannot be maintained.
Disclosure of Invention
The main object of the invention is to provide an intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium, so as to solve the technical problem in the prior art that the accuracy of fault analysis decreases as the degree of data coupling in the IT operation and maintenance platform increases.
In order to achieve the above object, the present invention provides an intelligent IT operation and maintenance analysis method, which includes:
acquiring all alarm events within preset time, and acquiring all alarm abnormal points in each alarm event;
for each alarm abnormal point, the following steps are performed:
counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
calculating the associated score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score.
Optionally, the step of calculating the associated score of the alarm abnormal point through a pre-stored first rule includes:
acquiring an upstream call relation and a downstream call relation preset by the alarm abnormal point, and acquiring an upstream network weight corresponding to the upstream call relation and a downstream network weight corresponding to the downstream call relation;
and setting the sum of the upstream network weight and the downstream network weight as the associated score of the alarm abnormal point.
Optionally, the step of calculating the alarm score of the alarm abnormal point through a pre-stored second rule includes:
acquiring the target node category of the alarm abnormal point, and acquiring the target node weight score of the target node category of the alarm abnormal point based on the association relation between each pre-stored node category and the node weight score;
acquiring a pre-stored event emergency weight score of the alarm index;
and setting the sum of the weight score of the target node and the emergency weight score of the event as the alarm score of the alarm abnormal point.
Optionally, the step of determining the fault type of the alarm abnormal point according to the associated score of the alarm abnormal point, the current traffic and the alarm score includes:
and if the alarm score is smaller than the first threshold value, the current traffic is larger than the second threshold value and the association score is larger than the third threshold value, judging that the fault type of the alarm abnormal point is the association fault type.
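Optionally, after the step of judging that the fault type of the alarm abnormal point is the association fault type if the alarm score is smaller than the first threshold value, the current traffic is larger than the second threshold value and the association score is larger than the third threshold value, the method further includes: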
and if the alarm score is greater than or equal to a first threshold value and the current traffic is less than or equal to a second threshold value, judging that the fault type of the alarm abnormal point is a node fault type.
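The two optional judging rules above can be written compactly as a piecewise rule. The symbols are notation introduced here for illustration only and do not appear in the patent: $S_a$ is the alarm score, $S_r$ the association score, $T$ the current traffic, and $\theta_1$, $\theta_2$, $\theta_3$ the first, second and third thresholds.

```latex
\text{fault type} =
\begin{cases}
\text{association fault type}, & S_a < \theta_1 \ \text{and}\ T > \theta_2 \ \text{and}\ S_r > \theta_3,\\[4pt]
\text{node fault type}, & S_a \ge \theta_1 \ \text{and}\ T \le \theta_2.
\end{cases}
```

The two conditions cannot hold at the same time. The text does not state what is decided for the remaining combinations, so they are left out of the rule rather than guessed.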
Optionally, after the step of judging that the fault type of the alarm abnormal point is the node fault type if the alarm score is greater than or equal to the first threshold value and the current traffic is less than or equal to the second threshold value, the method further includes:
determining a target early warning mode of the alarm abnormal point according to the pre-stored association relation between fault types and early warning modes, wherein the early warning modes comprise voice early warning modes with different tones and mail early warning modes;
and carrying out early warning processing on the alarm abnormal point based on the target early warning mode.
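A minimal sketch of the lookup described above, assuming the pre-stored association is a simple in-memory table. The names `FaultType`, `WARNING_MODES`, `select_warning_mode` and `warn`, and the concrete pairing of fault types with modes, are hypothetical; the text only says that the modes include voice warnings with different tones and mail warnings.

```python
from enum import Enum


class FaultType(Enum):
    ASSOCIATION = "association fault"
    NODE = "node fault"


# Hypothetical pre-stored association between fault type and early warning mode.
# Which fault type maps to which mode is an assumption made for illustration.
WARNING_MODES = {
    FaultType.ASSOCIATION: {"channel": "voice", "tone": "tone_a"},
    FaultType.NODE: {"channel": "mail", "tone": None},
}


def select_warning_mode(fault_type: FaultType) -> dict:
    """Look up the target early warning mode for a fault type."""
    return WARNING_MODES[fault_type]


def warn(node: str, fault_type: FaultType) -> str:
    """Build the warning text; a real system would play audio or send mail."""
    mode = select_warning_mode(fault_type)
    if mode["channel"] == "voice":
        return f"[voice:{mode['tone']}] {node}: {fault_type.value}"
    return f"[mail] {node}: {fault_type.value}"
```

Keeping the mapping in a table rather than in branching code mirrors the "pre-stored association relation" wording: changing the warning mode for a fault type only requires changing the table.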
Optionally, after the step of determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score, the method further includes:
recording the current fault type of the alarm abnormal point, and generating an analysis report of the alarm abnormal point according to the fault type, wherein the analysis report comprises the association score of the alarm abnormal point, the current traffic and the alarm score.
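As an illustration of what such an analysis report might contain, the sketch below serializes the fault type together with the three values named in the text. The `AnalysisReport` class, the `build_report` function and the JSON layout are assumptions; the patent does not specify a report format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnalysisReport:
    """Hypothetical report record for one alarm abnormal point."""
    node: str
    fault_type: str
    association_score: float
    current_traffic: float
    alarm_score: float


def build_report(node, fault_type, association_score, current_traffic, alarm_score) -> str:
    """Record the fault type and the three scores as a JSON document."""
    report = AnalysisReport(node, fault_type, association_score, current_traffic, alarm_score)
    return json.dumps(asdict(report), sort_keys=True)
```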
The invention also provides an intelligent IT operation and maintenance analysis device, which comprises:
the first acquisition module is used for acquiring all alarm events in preset time and acquiring all alarm abnormal points in each alarm event;
a processing module which, for each alarm abnormal point, comprises:
the statistics sub-module is used for counting the alarm indexes corresponding to the alarm abnormal points and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
the calculation sub-module is used for calculating the associated score of the alarm abnormal point through a pre-stored first rule and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and the determining submodule is used for determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score.
Optionally, the computing submodule includes:
the first acquisition unit is used for acquiring an upstream call relation and a downstream call relation preset by the alarm abnormal point, and acquiring an upstream network weight corresponding to the upstream call relation and a downstream network weight corresponding to the downstream call relation;
and the first setting unit is used for setting the sum of the upstream network weight and the downstream network weight as the associated score of the alarm abnormal point.
Optionally, the computing submodule further includes:
the second acquisition unit is used for acquiring the target node category of the alarm abnormal point and acquiring the target node weight score of the target node category of the alarm abnormal point based on the association relation between each pre-stored node category and the node weight score;
a third obtaining unit, configured to obtain a pre-stored emergency weight score of the event of the alarm indicator;
and a second setting unit configured to set a sum of the target node weight score and the event emergency weight score as an alarm score of the alarm abnormal point.
Optionally, the determining submodule includes:
and the first judging unit is used for judging that the fault type of the abnormal alarm point is the associated fault type if the alarm score is smaller than a first threshold value, the current traffic is larger than a second threshold value and the associated score is larger than a third threshold value.
Optionally, the determining submodule further includes:
and the second judging unit is used for judging that the fault type of the abnormal alarm point is a node fault type if the alarm score is larger than or equal to a first threshold value and the current traffic is smaller than or equal to a second threshold value.
Optionally, the intelligent IT operation and maintenance analysis device further includes:
the early warning determining module is used for determining a target early warning mode of the warning abnormal point according to the association relation between the pre-stored fault type and the early warning mode, wherein the early warning mode comprises voice early warning modes with different tone colors and mail early warning modes;
and the early warning module is used for carrying out early warning processing on the abnormal warning points based on the target early warning mode.
Optionally, the intelligent IT operation and maintenance analysis device further includes:
the recording module is used for recording the fault type of the alarm abnormal point at the present time and generating an analysis report of the alarm abnormal point according to the fault type, wherein the analysis report comprises the association score of the alarm abnormal point, the current traffic and the alarm score.
In addition, to achieve the above object, the present invention also provides intelligent IT operation and maintenance analysis equipment, including: a memory, a processor, a communication bus, and an intelligent IT operation and maintenance analysis program stored on the memory,
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the intelligent IT operation and maintenance analysis program to implement the following steps:
acquiring all alarm events within preset time, and acquiring all alarm abnormal points in each alarm event;
for each alarm abnormal point, the following steps are performed:
counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
calculating the associated score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score.
In addition, to achieve the above object, the present invention also provides a readable storage medium storing one or more programs executable by one or more processors for:
acquiring all alarm events within preset time, and acquiring all alarm abnormal points in each alarm event;
for each alarm abnormal point, the following steps are performed:
counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
calculating the associated score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score.
The method acquires all alarm events within a preset time and all alarm abnormal points in each alarm event, and performs the following steps for each alarm abnormal point: counting the alarm indexes corresponding to the alarm abnormal point, and acquiring the current traffic of the alarm abnormal point from the alarm indexes; calculating the association score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule; and determining the fault type of the alarm abnormal point according to its association score, current traffic and alarm score. Because all alarm events within the preset time are collected first and the fault type of each alarm abnormal point is then determined from three dimensions (association score, current traffic and alarm score) rather than from a single dimension, the fault type can be determined more accurately, which in turn gives a more accurate troubleshooting direction for handling the alarm abnormal point. This avoids the situation in which analysis from a single dimension prevents the system from identifying the abnormal monitoring index in time, and thereby solves the prior-art problem of reduced fault analysis accuracy.
Drawings
FIG. 1 is a flowchart of a first embodiment of an intelligent IT operation and maintenance analysis method according to the present invention;
FIG. 2 is a detailed flowchart of the step of calculating the associated score of the alarm outlier by a pre-stored first rule in a second embodiment of the intelligent IT operation and maintenance analysis method of the present invention;
FIG. 3 is a schematic diagram of a device architecture of a hardware operating environment involved in a method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The present invention provides an intelligent IT operation and maintenance analysis method. In a first embodiment of the method, referring to FIG. 1, the intelligent IT operation and maintenance analysis method includes:
step S10, acquiring all alarm events in preset time, and acquiring all alarm abnormal points in each alarm event;
for each alarm abnormal point, the following steps are performed:
step S20, counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
step S30, calculating the associated score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and step S40, determining the fault type of the alarm abnormal point according to the associated score of the alarm abnormal point, the current traffic and the alarm score.
The method comprises the following specific steps:
step S10, acquiring all alarm events in preset time, and acquiring all alarm abnormal points in each alarm event;
This embodiment provides an intelligent IT operation and maintenance analysis method that may be applied to an intelligent IT operation and maintenance analysis system (referred to below simply as the system). The system collects all alarm events, obtains the alarm abnormal points and the corresponding alarm indexes from the alarm events, and obtains the current traffic of each alarm abnormal point from the alarm indexes. It then calculates the alarm score and the association score of each alarm abnormal point according to the preset weights, the current traffic and the like, and determines through threshold comparison whether the current alarm abnormal point is an association fault or a node fault. The system thereby obtains a fault handling direction and handles the fault according to that direction.
Specifically, the intelligent IT operation and maintenance analysis system contains a monitoring subsystem, which monitors each IT service subsystem to acquire the alarm events related to it. From the point of view of the intelligent IT operation and maintenance analysis system, each IT service subsystem is a node, and these nodes together form the data structure of the system. In this data structure, a directed connection between two nodes indicates which subsystem calls which, so that the association relation of each alarm abnormal point can be obtained later. In this embodiment, the alarm events within a preset period, for example the most recent 1 hour, are extracted from each monitoring subsystem in real time, and the alarm abnormal points and alarm indexes involved in each event are extracted from those alarm events. For example, suppose the system acquires 2 alarm events: 1. node A cannot call node B, because the ip1 address of node A and the ip2 address of node B cannot reach each other; 2. the database of node A cannot be called normally. The system then obtains the alarm abnormal points A and B and the alarm indexes ip1, ip2 and database call abnormality.
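The extraction in step S10 can be sketched as a time-window filter followed by a flattening step. The `AlarmEvent` record, the `collect` function, the index labels and the timestamps below are all hypothetical; only the two example events and the 1-hour window come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AlarmEvent:
    """Hypothetical alarm event: a timestamp plus (node, alarm index) pairs."""
    time: datetime
    indexes: tuple


def collect(events, now, window=timedelta(hours=1)):
    """Return the alarm abnormal points and (node, index) pairs within the window."""
    recent = [e for e in events if now - window <= e.time <= now]
    pairs = [pair for e in recent for pair in e.indexes]
    points = sorted({node for node, _ in pairs})
    return points, pairs


# Arbitrary reference time chosen for the example.
now = datetime(2018, 9, 25, 12, 0)
events = [
    # Event 1: node A cannot call node B (ip1 and ip2 cannot reach each other).
    AlarmEvent(now - timedelta(minutes=10),
               (("A", "ip1 unreachable"), ("B", "ip2 unreachable"))),
    # Event 2: the database of node A cannot be called normally.
    AlarmEvent(now - timedelta(minutes=5), (("A", "database call abnormal"),)),
    # An older event that falls outside the 1-hour window and is ignored.
    AlarmEvent(now - timedelta(hours=3), (("C", "stale alarm"),)),
]
points, pairs = collect(events, now)
```

With these inputs, `points` holds the abnormal points A and B, and `pairs` holds the three alarm indexes from the two recent events.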
After obtaining each alarm abnormal point, executing the following steps aiming at each alarm abnormal point:
step S20, counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
The intelligent IT operation and maintenance analysis system classifies the alarm indexes statistically so that each alarm index is placed under the alarm abnormal point it belongs to, which shows what alarm indexes exist at each alarm abnormal point. In the example above, the alarm indexes of node A are that the ip1 address is unreachable and that the database cannot be called, and the alarm index of node B is that the ip2 address is unreachable. At the same time, the system acquires the current traffic of each alarm abnormal point from the alarm indexes. The current traffic refers to the amount of data currently transmitted by the alarm abnormal point; in particular, it may be the key current traffic. The key current traffic, or the current traffic, represents the state of the data throughput of the alarm abnormal point, such as whether the throughput is high or low. In general, if an index of an alarm abnormal point is abnormal, the key current traffic or the current traffic also changes greatly, and this change indirectly reflects the existence of a fault at the alarm abnormal point.
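Step S20 amounts to grouping alarm indexes by the abnormal point they belong to and reading a traffic value for each point. The sketch below assumes each alarm index record carries a `traffic` field and that the last record listed for a node holds its current traffic; the record layout, the traffic figures and the name `group_by_point` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical alarm index records. The "traffic" field (data volume currently
# transmitted by the node, in arbitrary units) is an assumption for illustration.
records = [
    {"node": "A", "index": "ip1 unreachable", "traffic": 120},
    {"node": "B", "index": "ip2 unreachable", "traffic": 40},
    {"node": "A", "index": "database call abnormal", "traffic": 95},
]


def group_by_point(records):
    """Return ({node: [alarm indexes]}, {node: current traffic})."""
    indexes = defaultdict(list)
    traffic = {}
    for rec in records:
        indexes[rec["node"]].append(rec["index"])
        # Assumption: the most recently listed record holds the current traffic.
        traffic[rec["node"]] = rec["traffic"]
    return dict(indexes), traffic


indexes, traffic = group_by_point(records)
```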
Step S30, calculating the associated score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
In this embodiment, the association score of the alarm abnormal point is calculated through a pre-stored first rule, and the alarm score of the alarm abnormal point is calculated through a pre-stored second rule. The association score is a quantitative measure of how much the current alarm abnormal point is affected by other alarm abnormal points, and the alarm score is the alarm degree of each alarm abnormal point. The first rule may be a call weight rule, and the second rule may be a node importance level weight rule.
Specifically, referring to FIG. 2, the step of calculating the association score of the alarm abnormal point through a pre-stored first rule includes:
step S31, an upstream call relation and a downstream call relation preset by the alarm abnormal point are obtained, and an upstream network weight corresponding to the upstream call relation and a downstream network weight corresponding to the downstream call relation are obtained;
In this embodiment, the system obtains the upstream and downstream call relations of the alarm abnormal point, and obtains the upstream network weight corresponding to the upstream call relation and the downstream network weight corresponding to the downstream call relation. For example, if the upstream calling node of alarm abnormal point B is node A and its downstream called node is node C, then the upstream call relation of B is that B is called by A, and the downstream call relation of B is that B calls C. The upstream network weight corresponding to A calling B and the downstream network weight corresponding to B calling C therefore need to be obtained. The call weights of the different alarm abnormal points are preset; for example, for node B, the upstream network weight of A calling B is 3.
Step S32, setting the sum of the upstream network weight and the downstream network weight as the association score of the alarm abnormal point.
For example, when the downstream network weight of B calling C is 5 and the upstream network weight of A calling B is 3, the association score of node B is 3 + 5 = 8.
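The worked example (A calls B with weight 3, B calls C with weight 5) can be reproduced with a small weight table keyed by caller and callee. The table layout and the name `association_score` are assumptions; summing over several callers or callees, when a node has more than one, is also an assumption that goes beyond the single-edge example in the text.

```python
# Hypothetical preset call weights, keyed by (caller, callee).
CALL_WEIGHTS = {("A", "B"): 3, ("B", "C"): 5}


def association_score(node, weights=CALL_WEIGHTS):
    """Sum the upstream and downstream network weights of a node."""
    # Upstream: edges where some other node calls this node.
    upstream = sum(w for (caller, callee), w in weights.items() if callee == node)
    # Downstream: edges where this node calls some other node.
    downstream = sum(w for (caller, callee), w in weights.items() if caller == node)
    return upstream + downstream
```

For node B this gives 3 + 5 = 8, matching the example; node A has only a downstream edge and node C only an upstream edge.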
The step of calculating the alarm score of the alarm abnormal point through a pre-stored second rule comprises the following steps:
step S33, obtaining a target node category of the alarm abnormal point, and obtaining a target node weight score of the target node category of the alarm abnormal point based on a pre-stored association relation between each node category and the node weight score;
In this embodiment, the association relation between each node category and its node weight score is pre-stored. For example, according to the association relation, the node category of node A is a and its node weight score is x, while the node category of node B is b and its node weight score is y. That is, after an alarm abnormal point is obtained, its target node category is obtained first, and the target node weight score of the alarm abnormal point is then looked up from that category.
Step S34, obtaining a pre-stored event emergency weight score of the alarm index;
In this embodiment, the event emergency weight score of the alarm index also needs to be obtained. This score is preset; for example, if the alarm index is a network ip, the event emergency weight score corresponding to the network ip can be retrieved directly.
And step S35, setting the sum of the weight score of the target node and the emergency weight score of the event as the alarm score of the alarm abnormal point.
The alarm score is the alarm degree of each alarm abnormal point and represents how critical the node at the alarm abnormal point is. Because the abnormal point itself has a preset node weight score and each alarm event has a preset event emergency weight score, the alarm score is calculated by setting the sum of the target node weight score and the event emergency weight scores as the alarm score of the alarm abnormal point. For example, suppose the abnormal point is a database node whose target node weight score is 3, and it has two alarm events a and b whose event emergency weight scores are 4 and 3 respectively. The alarm score of the abnormal point is then target node weight score + event emergency weight score a + event emergency weight score b = 3 + 4 + 3 = 10.
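The database-node example can be checked with two lookup tables, one for node categories and one for alarm events. The table names `NODE_WEIGHTS` and `EVENT_URGENCY` and the function `alarm_score` are hypothetical; the weights 3, 4 and 3 are the ones given in the example.

```python
# Hypothetical pre-stored tables.
NODE_WEIGHTS = {"database": 3}     # node category -> node weight score
EVENT_URGENCY = {"a": 4, "b": 3}   # alarm event -> event emergency weight score


def alarm_score(category, events):
    """Node weight score plus the emergency weight score of every alarm event."""
    return NODE_WEIGHTS[category] + sum(EVENT_URGENCY[e] for e in events)
```

Calling `alarm_score("database", ["a", "b"])` adds 3 + 4 + 3, the same sum as in the example above.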
And step S40, determining the fault type of the alarm abnormal point according to the associated score of the alarm abnormal point, the current traffic and the alarm score.
In this embodiment, after the association score, the current traffic and the alarm score are obtained, the fault type of the alarm abnormal point is determined from them. The fault types include an association fault type and a node fault type: the association fault type means that the alarm abnormal point is abnormal because it is affected by an associated node, and the node fault type means that the node is not affected by association and the node itself is faulty.
In this embodiment, all alarm events within a preset time and all alarm abnormal points in each alarm event are acquired, and the following steps are performed for each alarm abnormal point: counting the alarm indexes corresponding to the alarm abnormal point, and acquiring the current traffic of the alarm abnormal point from the alarm indexes; calculating the association score of the alarm abnormal point through a pre-stored first rule, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule; and determining the fault type of the alarm abnormal point according to its association score, current traffic and alarm score. Because the fault type of each alarm abnormal point is determined from three dimensions (association score, current traffic and alarm score) rather than from a single dimension, the fault type can be determined more accurately, which in turn gives a more accurate troubleshooting direction for handling the alarm abnormal point. This avoids the situation in which analysis from a single dimension prevents the system from identifying the abnormal monitoring index in time, and thereby solves the prior-art problem of reduced fault analysis accuracy.
Further, the present invention provides another embodiment of the intelligent IT operation and maintenance analysis method, in which the step of determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score includes:
and step S41, if the alarm score is smaller than a first threshold value, the current traffic is larger than a second threshold value and the association score is larger than a third threshold value, judging that the fault type of the alarm abnormal point is the association fault type.
In this embodiment, if the alarm score is smaller than the first threshold and the current traffic is larger than the second threshold, the data processing capability of the alarm abnormal point is within the normal range, so the abnormality is most likely caused by association. When the association score is also larger than the third threshold, the system therefore determines that the fault type of the alarm abnormal point is the association fault type.
Besides determining that the fault type of the alarm abnormal point is the association fault type when the alarm score is smaller than the first threshold, the current traffic is larger than the second threshold and the association score is larger than the third threshold, the determining step further includes:
and step S42, if the alarm score is greater than or equal to a first threshold value and the current traffic is less than or equal to a second threshold value, judging that the fault type of the alarm abnormal point is a node fault type.
In this embodiment, an alarm score greater than or equal to the first threshold indicates that the alarm severity of the node at the alarm abnormal point itself exceeds the threshold, and a current traffic less than or equal to the second threshold indicates that the data processing capability of the node is below the threshold. When both conditions hold, the system determines that the fault type of the alarm abnormal point is the node fault type.
In this embodiment, if the alarm score is smaller than the first threshold, the current traffic is larger than the second threshold, and the association score is larger than the third threshold, the fault type of the alarm abnormal point is determined to be the association fault type, and the fault type is determined from three dimensions of the alarm score, the current traffic, and the association score, so that the accuracy of IT operation and maintenance analysis is improved.
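Steps S41 and S42 amount to two threshold tests, sketched below. The function name, the string labels and the threshold parameter names are assumptions for illustration. The text does not say what happens when neither condition holds, so this sketch returns `None` in that case as a placeholder.

```python
from typing import Optional

ASSOCIATION_FAULT = "association_fault"  # hypothetical label
NODE_FAULT = "node_fault"                # hypothetical label


def classify_fault(
    alarm_score: float,
    current_traffic: float,
    association_score: float,
    first_threshold: float,
    second_threshold: float,
    third_threshold: float,
) -> Optional[str]:
    """Apply the S41 and S42 conditions to one alarm abnormal point."""
    # S41: low alarm score, normal traffic and high association score.
    if (
        alarm_score < first_threshold
        and current_traffic > second_threshold
        and association_score > third_threshold
    ):
        return ASSOCIATION_FAULT
    # S42: high alarm score and low traffic mean the node itself has failed.
    if alarm_score >= first_threshold and current_traffic <= second_threshold:
        return NODE_FAULT
    # Not covered by S41 or S42; left undetermined here (an assumption).
    return None
```

For example, with thresholds 5, 100 and 6, a point with alarm score 3, traffic 500 and association score 8 satisfies S41, while a point with alarm score 7 and traffic 50 satisfies S42.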
Further, the present invention provides another embodiment of the intelligent IT operation and maintenance analysis method. In this embodiment, after the step of determining that the fault type of the alarm abnormal point is the node fault type if the alarm score is greater than or equal to the first threshold and the current traffic is less than or equal to the second threshold, the method further includes:
step S50, determining a target early warning mode of the warning abnormal point according to the association relation between the pre-stored fault type and the early warning mode, wherein the early warning mode comprises voice early warning modes with different tone colors and mail early warning modes;
In this embodiment, when an abnormal event occurs, for example when an alarm abnormality occurs at a certain node, an early warning is issued. The intelligent IT operation and maintenance analysis system pre-stores an association relation between fault types and early warning modes, so the target early warning mode of the alarm abnormal point can be determined; the early warning modes include voice early warning modes with different timbres and a mail early warning mode. It should be noted that when the fault type is the node fault type, the early warning mode is the mail early warning mode, and when the fault type is the association fault type, the early warning mode is a voice early warning mode. For the association fault type, the association degree of the alarm abnormal point can be obtained, and a voice early warning with a higher or lower pitch is issued according to the comparison between the association degree and a pre-stored threshold.
And step S60, carrying out early warning processing on the abnormal alarm points based on the target early warning mode.
After the target early warning mode is obtained, early warning processing is carried out on the alarm abnormal points based on the target early warning mode, so that the operation and maintenance personnel can determine the association degree range of the alarm abnormal points, and the operation and maintenance experience of the operation and maintenance personnel is improved.
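The mapping from fault type to early warning mode in steps S50 and S60 could look like the sketch below. The function name, the dictionary layout and the default threshold value are hypothetical. The description only says the pitch depends on comparing the association degree with a pre-stored threshold, so treating a degree above the threshold as the high pitch is an assumption.

```python
from typing import Dict


def select_warning_mode(
    fault_type: str,
    association_degree: float = 0.0,
    degree_threshold: float = 0.5,  # stands in for the pre-stored threshold
) -> Dict[str, str]:
    """Pick the target early warning mode for one alarm abnormal point."""
    if fault_type == "node_fault":
        # Node faults are reported by mail.
        return {"channel": "mail"}
    if fault_type == "association_fault":
        # Association faults are reported by voice; the pitch reflects
        # how the association degree compares with the threshold.
        # Assumption: a degree above the threshold maps to the high pitch.
        pitch = "high" if association_degree > degree_threshold else "low"
        return {"channel": "voice", "pitch": pitch}
    raise ValueError(f"unknown fault type: {fault_type!r}")
```

Keeping the selection separate from the code that actually sends the mail or plays the voice alert mirrors the split between step S50 (choose the mode) and step S60 (carry out the warning).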
Further, after the step of determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score, the method includes:
and step S70, recording the fault type of the alarm abnormal point at the present time, and generating an analysis report of the alarm abnormal point according to the fault type, wherein the analysis report comprises the association score of the alarm abnormal point, the current traffic and the alarm score.
In this embodiment, the fault type of the alarm abnormal point is recorded, and an analysis report of the alarm abnormal point is generated according to the fault type. The analysis report includes the association score, the current traffic and the alarm score of the alarm abnormal point. Operation and maintenance personnel can use the report to query, summarize or handle alarm abnormalities later, in particular to query or summarize abnormal conditions that arise while an alarm abnormal point is being corrected, which gives direction and convenience to subsequent handling.
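A report record of the kind described in step S70 might be built as in the sketch below. The field names, the JSON output and the `recorded_at` timestamp are illustrative assumptions; the description only requires that the report contain the fault type, the association score, the current traffic and the alarm score.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_analysis_report(
    node_id: str,
    fault_type: str,
    association_score: float,
    current_traffic: float,
    alarm_score: float,
    recorded_at: Optional[datetime] = None,
) -> dict:
    """Record the fault type together with the three scores behind it."""
    # Default to the current UTC time when no timestamp is supplied.
    timestamp = recorded_at if recorded_at is not None else datetime.now(timezone.utc)
    return {
        "node_id": node_id,
        "recorded_at": timestamp.isoformat(),
        "fault_type": fault_type,
        "association_score": association_score,
        "current_traffic": current_traffic,
        "alarm_score": alarm_score,
    }


def report_to_json(report: dict) -> str:
    """Serialize a report so it can be stored and queried later."""
    return json.dumps(report, indent=2, sort_keys=True)
```

Storing the three scores next to the fault type lets operation and maintenance personnel see afterwards why a point was classified the way it was.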
Referring to fig. 3, fig. 3 is a schematic device structure diagram of a hardware running environment according to an embodiment of the present invention.
The intelligent IT operation and maintenance analysis device of the embodiment of the invention may be a PC, or a terminal device such as a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player or a portable computer.
As shown in fig. 3, the intelligent IT operation and maintenance analysis device may include: a processor 1001, such as a CPU, memory 1005, and a communication bus 1002. Wherein a communication bus 1002 is used to enable connected communication between the processor 1001 and a memory 1005. The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Optionally, the intelligent IT operation and maintenance analysis device may further include a target user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The target user interface may comprise a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally further comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
It will be appreciated by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the intelligent IT operation and maintenance analysis device, which may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components.
As shown in fig. 3, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, and an intelligent IT operation and maintenance analysis program. The operating system is a program that manages and controls the hardware and software resources of the intelligent IT operation and maintenance analysis device and supports the operation of the intelligent IT operation and maintenance analysis program and other software and/or programs. The network communication module is used to enable communication between components within the memory 1005 and with other hardware and software in the intelligent IT operation and maintenance analysis device.
In the intelligent IT operation and maintenance analysis apparatus shown in fig. 3, a processor 1001 is configured to execute an intelligent IT operation and maintenance analysis program stored in a memory 1005, to implement the steps of the intelligent IT operation and maintenance analysis method described in any one of the above.
The specific implementation manner of the intelligent IT operation and maintenance analysis device is basically the same as that of each embodiment of the intelligent IT operation and maintenance analysis method, and is not repeated here.
The invention also provides an intelligent IT operation and maintenance analysis device, which comprises:
the first acquisition module is used for acquiring all alarm events in preset time and acquiring all alarm abnormal points in each alarm event;
for each alarm outlier, there is a processing module comprising:
the statistics sub-module is used for counting the alarm indexes corresponding to the alarm abnormal points and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
the calculation sub-module is used for calculating the associated score of the alarm abnormal point through a pre-stored first rule and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and the determining submodule is used for determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score.
The specific implementation of the intelligent IT operation and maintenance analysis device is basically the same as the above-mentioned embodiments of the intelligent IT operation and maintenance analysis method, and will not be described in detail herein.
The present invention provides a readable storage medium storing one or more programs that are further executable by one or more processors for implementing the steps of the intelligent IT operation and maintenance analysis method described in any one of the above.
The specific implementation manner of the readable storage medium of the present invention is basically the same as the above embodiments of the intelligent IT operation and maintenance analysis method, and will not be described herein.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein, or any application, directly or indirectly, within the scope of the invention.

Claims (9)

1. An intelligent IT operation and maintenance analysis method, which is characterized by comprising the following steps:
acquiring all alarm events within preset time, and acquiring all alarm abnormal points in each alarm event;
for each alarm outlier, the following steps are performed:
counting the alarm indexes corresponding to the alarm abnormal points, and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
acquiring an upstream call relation and a downstream call relation preset by the alarm abnormal point, acquiring an upstream network weight corresponding to the upstream call relation and a downstream network weight corresponding to the downstream call relation, setting the sum of the upstream network weight and the downstream network weight as an associated score of the alarm abnormal point, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score, wherein the fault type comprises an association fault type and a node fault type.
2. The intelligent IT operation and maintenance analysis method according to claim 1, wherein the calculating the alarm score of the alarm outlier by the pre-stored second rule comprises:
acquiring the target node category of the alarm abnormal point, and acquiring the target node weight score of the target node category of the alarm abnormal point based on the association relation between each pre-stored node category and the node weight score;
acquiring a pre-stored event emergency weight score of the alarm index;
and setting the sum of the weight score of the target node and the emergency weight score of the event as the alarm score of the alarm abnormal point.
3. The intelligent IT operation and maintenance analysis method according to claim 1, wherein the determining the fault type of the alarm outlier according to the associated score of the alarm outlier, the current traffic volume, and the alarm score comprises:
and if the alarm score is smaller than the first threshold value, the current traffic is larger than the second threshold value and the association score is larger than the third threshold value, judging that the fault type of the alarm abnormal point is the association fault type.
4. The intelligent IT operation and maintenance analysis method according to claim 3, wherein the step of determining that the fault type of the alert abnormal point is the associated fault type if the alert score is less than a first threshold, the current traffic is greater than a second threshold, and the associated score is greater than a third threshold comprises:
and if the alarm score is greater than or equal to a first threshold value and the current traffic is less than or equal to a second threshold value, judging that the fault type of the alarm abnormal point is a node fault type.
5. The intelligent IT operation and maintenance analysis method according to claim 4, wherein the step of determining that the fault type of the alarm abnormal point is the node fault type if the alarm score is greater than or equal to a first threshold value and the current traffic is less than or equal to a second threshold value comprises:
determining a target early warning mode of the abnormal alarm point according to the association relation between the pre-stored fault type and the early warning mode, wherein the early warning mode comprises voice early warning modes with different tone colors and mail early warning modes;
and carrying out early warning processing on the alarm abnormal point based on the target early warning mode.
6. The intelligent IT operation and maintenance analysis method according to any one of claims 1 to 5, wherein the step of determining the fault type of the alert outlier based on the associated score of the alert outlier, the current traffic volume, and the alert score comprises:
and recording the fault type of the alarm abnormal point at the present time, and generating an analysis report of the alarm abnormal point according to the fault type, wherein the analysis report comprises the association score of the alarm abnormal point, the current traffic and the alarm score.
7. An intelligent IT operation and maintenance analysis device, characterized in that the intelligent IT operation and maintenance analysis device comprises:
the first acquisition module is used for acquiring all alarm events in preset time and acquiring all alarm abnormal points in each alarm event;
for each alarm outlier, there is a processing module comprising:
the statistics sub-module is used for counting the alarm indexes corresponding to the alarm abnormal points and acquiring the current traffic of the alarm abnormal points from the alarm indexes;
the calculation sub-module is used for acquiring an upstream call relation and a downstream call relation preset by the alarm abnormal point, acquiring an upstream network weight corresponding to the upstream call relation and a downstream network weight corresponding to the downstream call relation, setting the sum of the upstream network weight and the downstream network weight as an associated score of the alarm abnormal point, and calculating the alarm score of the alarm abnormal point through a pre-stored second rule;
and the determining submodule is used for determining the fault type of the alarm abnormal point according to the association score of the alarm abnormal point, the current traffic and the alarm score, wherein the fault type comprises the association fault type and the node fault type.
8. An intelligent IT operation and maintenance analysis device, characterized in that the intelligent IT operation and maintenance analysis device comprises: a memory, a processor, a communication bus, and an intelligent IT operation and maintenance analysis program stored on the memory,
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the intelligent IT operation and maintenance analysis program to implement the steps of the intelligent IT operation and maintenance analysis method according to any one of claims 1 to 6.
9. A readable storage medium, characterized in that it has stored thereon an intelligent IT operation and maintenance analysis program, which when executed by a processor, implements the steps of the intelligent IT operation and maintenance analysis method according to any one of claims 1 to 6.
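For illustration only, the two score computations recited in claims 1 and 2 reduce to simple sums, as sketched below. The function names, the lookup tables and every weight value are hypothetical; the claims do not give concrete categories or numbers.

```python
from typing import Dict


def association_score(upstream_weight: float, downstream_weight: float) -> float:
    """Claim 1: sum of the upstream and downstream network weights."""
    return upstream_weight + downstream_weight


def alarm_score(
    node_category: str,
    alarm_index: str,
    node_weight_scores: Dict[str, float],
    event_urgency_scores: Dict[str, float],
) -> float:
    """Claim 2: node weight score plus event urgency weight score."""
    return node_weight_scores[node_category] + event_urgency_scores[alarm_index]


# Hypothetical pre-stored tables; the values are made up for the example.
NODE_WEIGHT_SCORES = {"database": 5, "application": 3}
EVENT_URGENCY_SCORES = {"cpu_usage": 2, "response_time": 4}

example_association = association_score(3, 4)
example_alarm = alarm_score(
    "database", "response_time", NODE_WEIGHT_SCORES, EVENT_URGENCY_SCORES
)
```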
CN201811118210.5A 2018-09-25 2018-09-25 Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium Active CN109669836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811118210.5A CN109669836B (en) 2018-09-25 2018-09-25 Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811118210.5A CN109669836B (en) 2018-09-25 2018-09-25 Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN109669836A CN109669836A (en) 2019-04-23
CN109669836B true CN109669836B (en) 2023-04-28

Family

ID=66141631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811118210.5A Active CN109669836B (en) 2018-09-25 2018-09-25 Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN109669836B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110601875B (en) * 2019-08-15 2022-08-19 平安普惠企业管理有限公司 Information output method, information output apparatus, management device, and computer-readable storage medium
CN110557281B (en) * 2019-08-21 2022-04-26 北京市天元网络技术股份有限公司 Intelligent operation and maintenance method and device based on CMDB and alarm map

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7760861B1 (en) * 2005-10-31 2010-07-20 At&T Intellectual Property Ii, L.P. Method and apparatus for monitoring service usage in a communications network
CN107678907A (en) * 2017-05-22 2018-02-09 平安科技(深圳)有限公司 Database business logic monitoring method, system and storage medium
CN107886242A (en) * 2017-11-10 2018-04-06 平安科技(深圳)有限公司 Data monitoring method, device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924533B2 (en) * 2005-04-14 2014-12-30 Verizon Patent And Licensing Inc. Method and system for providing automated fault isolation in a managed services network
US9538402B2 (en) * 2011-09-30 2017-01-03 Nokia Solutions And Networks Oy Fault management traffic reduction in heterogeneous networks
US8868736B2 (en) * 2012-04-27 2014-10-21 Motorola Mobility Llc Estimating a severity level of a network fault
CN107547262B (en) * 2017-07-25 2021-07-06 新华三技术有限公司 Method and device for generating alarm level and network management equipment
CN107832200A (en) * 2017-10-24 2018-03-23 平安科技(深圳)有限公司 Alert processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109669836A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
CN110826071B (en) Software vulnerability risk prediction method, device, equipment and storage medium
CN109753406B (en) Interface performance monitoring method, device, equipment and computer readable storage medium
CN107291586B (en) Application program analysis method and device
CN109633351B (en) Intelligent IT operation and maintenance fault positioning method, device, equipment and readable storage medium
US20200410376A1 (en) Prediction method, training method, apparatus, and computer storage medium
CN109669836B (en) Intelligent IT operation and maintenance analysis method, device, equipment and readable storage medium
CN108337127B (en) Application performance monitoring method, system, terminal and computer readable storage medium
CN112035320B (en) Service monitoring method and device, electronic equipment and readable storage medium
CN109992473A (en) Monitoring method, device, equipment and the storage medium of application system
CN111158926B (en) Service request analysis method, device and equipment
CN112333763A (en) Network selection method and device
CN110659179A (en) Method and device for evaluating system running condition and electronic equipment
CN110796552A (en) Risk prompting method and device
CN111626498A (en) Equipment operation state prediction method, device, equipment and storage medium
CN213547561U (en) Internet of things sensing equipment evaluation system
CN112187946B (en) System and method for evaluating sensing equipment of Internet of things
CN116416764A (en) Alarm threshold generation method and device, electronic equipment and storage medium
CN112363895B (en) System fault positioning method and device and electronic equipment
CN113453261A (en) Abnormal cell identification method and device and electronic equipment
CN112019390A (en) Network fault positioning method and related device
CN111538889A (en) Interface request method, device, equipment and computer readable storage medium
EP3640821B1 (en) Coefficient calculation method, component calling method, device, medium, server, and terminal
CN113534209A (en) Position reporting method and device based on tracker, storage medium and terminal
WO2023186090A1 (en) Verification method, apparatus and device
CN115470859A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant