CN113806180A - Unsupervised intelligent noise reduction processing method - Google Patents


Info

Publication number
CN113806180A
Authority
CN
China
Prior art keywords
alarm
importance
time
noise reduction
unsupervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111117474.0A
Other languages
Chinese (zh)
Other versions
CN113806180B (en)
Inventor
雷建椿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tengyun Yuezhi Technology Shenzhen Co ltd
Original Assignee
Tengyun Yuezhi Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tengyun Yuezhi Technology Shenzhen Co ltd
Priority to CN202111117474.0A
Publication of CN113806180A
Application granted
Publication of CN113806180B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses an unsupervised intelligent noise reduction processing method that helps operation and maintenance personnel make sense of an alarm storm, saves labor cost, and quickly locates the root cause alarm, so that the problem can be solved before a larger incident occurs. The method first uses the correlation among alarms to cluster them without supervision, then applies business rules and expert experience to perform root cause analysis, and finally absorbs the feedback of operation and maintenance personnel automatically, so that the algorithm becomes more accurate as it is used. Operation and maintenance personnel can thus focus on and solve the actual problem, and the method improves itself through self-learning.

Description

Unsupervised intelligent noise reduction processing method
Technical Field
The invention relates to the technical field of AIOps intelligent operation and maintenance systems, and in particular to an unsupervised intelligent noise reduction processing method.
Background
Traditional operation and maintenance dealt with a monolithic structure, for example application services deployed on a single server, where a person only needs to log in and check the application services to judge whether they are running normally. With the development of big data, operation and maintenance personnel often face many interrelated service objects, and therefore use operation and maintenance tools to monitor each of them. Because the service objects are associated with one another, a problem in one service can trigger alarms in other services, and an alarm storm forms when a large number of alarms occur within a short time. When an alarm storm arrives, operation and maintenance personnel often need to spend a great deal of time extracting the useful information. Sometimes the useful information is drowned out by the sheer number of alarms and cannot be extracted at all, which makes the work of operation and maintenance personnel more difficult.
Existing methods build a statistical model from historical data to predict a threshold for future alarm volume. When the actual number of alarms exceeds the threshold, Apriori or policy-based association rules are used to merge alarms and send the merged alarms to the same operation and maintenance personnel. This approach has some value for intelligent noise reduction, but has the following shortcomings in specific application scenarios:
1. it lacks root cause localization: in practice, operation and maintenance personnel care most about the root cause alarm within an alarm storm, because resolving the root cause alarm resolves the storm;
2. the algorithm model is not adaptive: it compresses alarms based on historical data only and cannot absorb the experience that operation and maintenance personnel gain each time they subsequently handle an alarm storm.
Accordingly, the prior art is deficient and needs improvement.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an unsupervised intelligent noise reduction processing method.
The technical scheme of the invention is as follows: an unsupervised intelligent noise reduction processing method is provided, comprising the following steps:
step 1: generating a model, and carrying out initial training through five modules in sequence: data requirement, data preprocessing, feature processing, category generation and alarm importance generation, and intelligent noise reduction and root cause output;
step 2: updating the relationship matrix and the importance matrix on the basis of the initial training to absorb the information contained in newly added samples, thereby performing incremental training;
step 3: taking the operation and maintenance personnel's handling of the algorithm results as user feedback information, writing it back to the database, and iterating with the user feedback information as newly added data;
step 4: processing the alarm storm according to the training and iterative learning.
Further, the alarm data in the data requirement in the step 1 is collected by the cloud platform and stored in the database.
Further, the data preprocessing in step 1 specifically comprises the following steps:
according to business knowledge, NAME + '|' + IP_ADDRESS is used as the alarm primary key, and missing values and abnormal values are handled at the same time: missing values are supplemented, discarded, or retained, and abnormal values are discarded directly.
Further, the specific steps of the feature processing in step 1 are as follows:
converting the alarm attribute to form a useful feature; the alarm attribute is divided into a time attribute and a non-time attribute, and a corresponding conversion method is adopted according to the difference of the alarm attribute.
Further, for the time attribute, the specific processing manner is as follows:
a. extracting the maximum time and the minimum time in all samples;
b. taking the minimum time as the start and the maximum time as the end, dividing the range with a step of one minute; for each time point, the alarm is marked '1' if its survival period contains the time point and '0' otherwise.
Further, for non-time attributes, if two alarms belong to the same tenant, the pair is marked 1.0 to indicate a relationship; if they do not belong to the same tenant, the pair is marked 0; a relationship matrix is thereby obtained.
Further, the specific steps of the category generation and the alarm importance generation in step 1 are as follows:
after the feature processing is completed, the vector representation of each alarm is obtained and cluster analysis is carried out on it; a KNN algorithm based on cosine distance selects the n alarms nearest to each alarm to form links, and a threshold is set to filter out links with weak correlation. Each alarm is taken as a node and each link as an edge to form a graph, and each connected subgraph of the graph is a category.
Further, the intelligent noise reduction and root cause output in step 1 specifically comprises the following steps:
calculating the corresponding confidence from the classification results and the importance of each alarm:
$$\text{confidence} = \frac{\text{time correlation coefficient} + 1}{4} + \frac{\text{master alarm importance} - \text{slave alarm importance}}{2 \times (\text{maximum importance in class} - \text{minimum importance in class})}$$
Further, the update rule in the incremental training of step 2 is as follows:
if an alarm is new, its similarities that are supported by data are calculated and filled in directly, its similarities that have no data support are filled with 0, and the importance matrix is handled in the same way;
if an alarm already exists, its new similarity is calculated and averaged with the original similarity, and the importance matrix is handled in the same way.
With this scheme, the invention first uses the correlation among alarms to cluster them without supervision, then applies business rules and expert experience to perform root cause analysis, and finally absorbs the feedback of operation and maintenance personnel automatically, so that the algorithm becomes more accurate as it is used. Operation and maintenance personnel can thus focus on and solve the actual problem, and the method improves itself through self-learning.
Drawings
FIG. 1 is a block flow diagram of the present invention.
Fig. 2 is an initial training flowchart.
Fig. 3 is a flow chart of incremental training.
Fig. 4 is a flow chart of user feedback iteration.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the present invention provides an unsupervised intelligent noise reduction processing method, which includes the following steps:
step 1: referring to fig. 2, a model is generated and initially trained through five modules in sequence: data requirement, data preprocessing, feature processing, category generation and alarm importance generation, and intelligent noise reduction and root cause output.
Data requirements (database):
the alarm data is collected by the cloud platform and stored in the database. In this embodiment, the database includes the following attributes: NAME (event name), CREATE_TIME (creation time), receiver_TIME (recovery time), CLOSE_TIME (closing time), IP_ADDRESS (IP address), age (alarm source), ower (tenant), and company ID.
Data preprocessing:
the alarm primary key identifies an alarm; according to business knowledge, NAME + '|' + IP_ADDRESS is used as the alarm primary key. Missing values and abnormal values are handled at the same time: missing values are supplemented, discarded, or retained, and abnormal values are discarded directly.
1. if the primary key or CREATE_TIME is missing, the alarm sample is discarded directly;
2. if CLOSE_TIME is missing, it is supplemented with the recovery time; if the recovery time is also missing, the sample is discarded;
3. if any other field is missing, the record is retained without further processing;
4. alarm data that is abnormal from a business point of view, such as creation time > closing time, is discarded directly.
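The four preprocessing rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess`, the dictionary layout of an alarm record, and the field name `RECOVER_TIME` for the recovery time are assumptions made for the example.

```python
from datetime import datetime

def preprocess(alarms):
    """Apply the four preprocessing rules to a list of raw alarm records (dicts)."""
    cleaned = []
    for alarm in alarms:
        name, ip = alarm.get("NAME"), alarm.get("IP_ADDRESS")
        create = alarm.get("CREATE_TIME")
        # Rule 1: discard samples missing a primary-key part or CREATE_TIME.
        if not name or not ip or create is None:
            continue
        # Rule 2: fill a missing CLOSE_TIME from the recovery time
        # (RECOVER_TIME is a hypothetical field name); discard if both are missing.
        close = alarm.get("CLOSE_TIME") or alarm.get("RECOVER_TIME")
        if close is None:
            continue
        # Rule 4: discard abnormal records whose creation time is after closing time.
        if create > close:
            continue
        # Rule 3: all other fields are kept as they are, even when missing.
        record = dict(alarm)
        record["CLOSE_TIME"] = close
        record["KEY"] = f"{name}|{ip}"  # alarm primary key: NAME + '|' + IP_ADDRESS
        cleaned.append(record)
    return cleaned

raw = [
    {"NAME": "cpu_high", "IP_ADDRESS": "10.0.0.1",
     "CREATE_TIME": datetime(2021, 8, 9, 16, 10, 37), "CLOSE_TIME": None,
     "RECOVER_TIME": datetime(2021, 8, 9, 16, 13, 37)},
    {"NAME": None, "IP_ADDRESS": "10.0.0.2",
     "CREATE_TIME": datetime(2021, 8, 9, 16, 12, 37),
     "CLOSE_TIME": datetime(2021, 8, 9, 16, 18, 37)},
    {"NAME": "disk_full", "IP_ADDRESS": "10.0.0.3",
     "CREATE_TIME": datetime(2021, 8, 11, 16, 10, 37),
     "CLOSE_TIME": datetime(2021, 8, 9, 16, 13, 37)},
]
kept = preprocess(raw)
```

In this hypothetical input only the first record survives: the second has no NAME, and the third is created after it is closed.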
Feature processing:
converting the alarm attribute to form a useful feature; the alarm attribute is divided into a time attribute and a non-time attribute, and a corresponding conversion method is adopted according to the difference of the alarm attribute.
For the time attribute, the specific processing mode is as follows:
a. extracting the maximum time and the minimum time in all samples;
b. taking the minimum time as the start and the maximum time as the end, dividing the range with a step of one minute; for each time point, the alarm is marked '1' if its survival period contains the time point and '0' otherwise.
TABLE 1 alarm example
ID | CREATE_TIME | CLOSE_TIME
A | 2021-08-09 16:10:37 | 2021-08-09 16:13:37
B | 2021-08-09 16:12:37 | 2021-08-09 16:18:37
Sample features of alarm generation are as follows (date markers omitted):
TABLE 2 alarm time characteristics Table
ID 16:10 16:11 16:12 16:13 16:14 16:15 16:16 16:17 16:18 16:19 16:20
A 1 1 1 1 0 0 0 0 0 0 0
B 0 0 1 1 1 1 1 1 1 0 0
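The minute-by-minute marking can be sketched as below. The sketch assumes that both example alarms fall on 2021-08-09 and that creation and closing times are truncated to the minute before comparison, which is the interpretation that reproduces the marks in Table 2; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def minute_floor(ts):
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)

def time_features(alarms, start, end):
    """Return the one-minute time points and a 0/1 feature row per alarm.

    alarms maps an alarm ID to its (create_time, close_time) pair.
    """
    points = []
    t = minute_floor(start)
    while t <= minute_floor(end):
        points.append(t)
        t += timedelta(minutes=1)
    features = {}
    for alarm_id, (create, close) in alarms.items():
        lo, hi = minute_floor(create), minute_floor(close)
        # Mark 1 when the time point lies inside the alarm's survival period.
        features[alarm_id] = [1 if lo <= p <= hi else 0 for p in points]
    return points, features

alarms = {
    "A": (datetime(2021, 8, 9, 16, 10, 37), datetime(2021, 8, 9, 16, 13, 37)),
    "B": (datetime(2021, 8, 9, 16, 12, 37), datetime(2021, 8, 9, 16, 18, 37)),
}
points, features = time_features(
    alarms, datetime(2021, 8, 9, 16, 10), datetime(2021, 8, 9, 16, 20)
)
```

With the 16:10 to 16:20 range used in Table 2, `features["A"]` and `features["B"]` match the two rows of that table.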
Based on the alarm time feature table (Table 2), a time correlation coefficient matrix is generated. Assume that the time correlation coefficient matrix (named coe_time) is as follows:
TABLE 3 time correlation coefficient table (coe_time)
[Table 3 appears as an image in the original publication and is not reproduced here.]
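The text does not say which correlation coefficient is used. As one possible reading, the sketch below assumes a Pearson correlation between the 0/1 time feature rows, which yields values in [-1, 1]; the choice of Pearson, the zero returned for constant rows, and the function names are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant row carries no correlation information
    return cov / sqrt(var_x * var_y)

def correlation_matrix(features):
    """Build a nested dict coe[a][b] from per-alarm feature rows."""
    return {a: {b: pearson(fa, fb) for b, fb in features.items()}
            for a, fa in features.items()}

# Time feature rows of alarms A and B from Table 2.
table2 = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
}
coe_time = correlation_matrix(table2)
```

Note that the coefficients the patent assumes for its worked example are illustrative values and are not derived from Table 2, so this sketch is not expected to reproduce them.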
For feature extraction from non-time attributes, take the tenant field as an example: if two alarms belong to the same tenant, the pair is marked as related and represented by 1.0; otherwise it is represented by 0. The following tenant relationship matrix can thus be obtained.
TABLE 4 tenant relationship table
A B C D
A 1.0 1.0 0 1.0
B 1.0 1.0 0 0
C 0 0 1.0 1.0
D 1.0 0 1.0 1.0
Similarly, corresponding relationship matrices can be derived for the alarm source, company, and network segment; these are assumed here to be identical to the tenant relationship matrix.
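A relationship matrix for any non-time attribute can be built with a simple pairwise comparison, sketched below. The tenant names are hypothetical and the sketch does not try to reproduce Table 4, whose values are assumed by the patent; treating a missing attribute as unrelated to every other alarm is also an assumption.

```python
def relation_matrix(values):
    """Return matrix[a][b] = 1.0 when alarms a and b share the attribute value, else 0.0.

    values maps an alarm ID to its attribute value (for example its tenant).
    """
    ids = list(values)
    matrix = {}
    for a in ids:
        matrix[a] = {}
        for b in ids:
            if a == b:
                matrix[a][b] = 1.0  # an alarm is always related to itself
            elif values[a] is not None and values[a] == values[b]:
                matrix[a][b] = 1.0
            else:
                matrix[a][b] = 0.0  # different or missing attribute (assumption)
    return matrix

# Hypothetical tenants, for illustration only.
tenants = {"A": "tenant-1", "B": "tenant-1", "C": "tenant-2", "D": None}
tenant_relation = relation_matrix(tenants)
```

The same function can be applied to the alarm source, company, and network segment fields to obtain the other relationship matrices.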
Category generation and alarm importance generation (generating the coefficient matrix, KNN, connected subgraphs, generating the alarm importance matrix, generating the alarm importance):
after the feature processing is completed, the vector representation of each alarm is obtained and cluster analysis is carried out on it. Because the training is unsupervised, a distance must be defined first; since the individual element values of the vector representation are not directly comparable with one another, the cosine distance is used to measure the distance between two alarms. The invention uses a KNN algorithm based on cosine distance to select the n alarms nearest to each alarm, thereby forming links, and sets a threshold to filter out links with weak correlation. Each alarm is taken as a node and each link as an edge to form a graph, and each connected subgraph of the graph is a category.
The specific exemplary steps in this embodiment are as follows:
1. merging the features of the various relationship tables:
all relationships of the alarms are as follows:
TABLE 5 alarm relationship table
Alarm | Time features | Tenant features | Alarm source features | Company features | Network segment features
A | (1.0,0.1,0.1,0.1) | (1.0,1.0,0,1.0) | (1.0,1.0,0,1.0) | (1.0,1.0,0,1.0) | (1.0,1.0,0,1.0)
B | (0.1,1.0,0.7,0.6) | (1.0,1.0,0,0) | (1.0,1.0,0,0) | (1.0,1.0,0,0) | (1.0,1.0,0,0)
C | (0.5,0.7,1.0,0.4) | (0,0,1.0,1.0) | (0,0,1.0,1.0) | (0,0,1.0,1.0) | (0,0,1.0,1.0)
D | (0.8,0.6,0.4,1.0) | (1.0,0,1.0,1.0) | (1.0,0,1.0,1.0) | (1.0,0,1.0,1.0) | (1.0,0,1.0,1.0)
The vector representation of each alarm is:
TABLE 6 alarm vector table
Alarm | Feature vector
A | (1.0,0.1,0.1,0.1,1.0,1.0,0,1.0,1.0,1.0,0,1.0,1.0,1.0,0,1.0,1.0,1.0,0,1.0)
B | (0.1,1.0,0.7,0.6,1.0,1.0,0,0,1.0,1.0,0,0,1.0,1.0,0,0,1.0,1.0,0,0)
C | (0.5,0.7,1.0,0.4,0,0,1.0,1.0,0,0,1.0,1.0,0,0,1.0,1.0,0,0,1.0,1.0)
D | (0.8,0.6,0.4,1.0,1.0,0,1.0,1.0,1.0,0,1.0,1.0,1.0,0,1.0,1.0,1.0,0,1.0,1.0)
2. Using the KNN unsupervised clustering algorithm based on cosine distance, return the n nearest alarms for each alarm:
the cosine similarity between two alarm vectors x and y is:
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$
KNN uses 1 minus this value as the distance. Here, with n set to 2, the 2 nearest alarms and their distances are as follows:
TABLE 7 two nearest alarms and distances
Alarm | Two nearest alarms | Corresponding distances
A | (A,B) | (0,0.2650897)
B | (B,A) | (0,0.2650897)
C | (C,D) | (0,0.18749536)
D | (D,C) | (0,0.18749536)
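The nearest-alarm search can be checked with a small brute-force sketch that uses the vectors of Table 6 and takes 1 minus the cosine similarity as the distance. The function names and the brute-force search are assumptions for illustration; a production system would more likely use a library nearest-neighbor index.

```python
from math import sqrt

# Time features followed by four identical relationship features (Tables 5 and 6).
TIME = {"A": (1.0, 0.1, 0.1, 0.1), "B": (0.1, 1.0, 0.7, 0.6),
        "C": (0.5, 0.7, 1.0, 0.4), "D": (0.8, 0.6, 0.4, 1.0)}
RELATION = {"A": (1.0, 1.0, 0.0, 1.0), "B": (1.0, 1.0, 0.0, 0.0),
            "C": (0.0, 0.0, 1.0, 1.0), "D": (1.0, 0.0, 1.0, 1.0)}
vectors = {k: TIME[k] + RELATION[k] * 4 for k in TIME}

def cosine_distance(x, y):
    """Distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def nearest(vectors, n):
    """For each alarm, return its n nearest alarms (itself included) with distances."""
    result = {}
    for a, va in vectors.items():
        ranked = sorted((cosine_distance(va, vb), b) for b, vb in vectors.items())
        result[a] = [(b, d) for d, b in ranked[:n]]
    return result

neighbors = nearest(vectors, n=2)
```

Each alarm's first neighbor is itself at a distance of approximately 0; the second neighbors and their distances agree with Table 7.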
Because the set of nodes over which distances are calculated includes the alarm itself, the nearest node is always the alarm itself, at distance 0. Assume the threshold is 0.2, so that every link with a distance greater than 0.2 is discarded (the smaller the distance, the more similar the alarms). According to the table above, the link between A and B is discarded and only the link between D and C remains. The four alarms A, B, C, and D are input as the nodes of the graph and (D, C) as its edge; solving for connected subgraphs yields three of them: A, B, and (C, D).
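The threshold filter and the connected-subgraph step can be sketched with a plain depth-first search. The link list below copies the non-self links and distances from Table 7; the function name and the depth-first traversal are illustrative choices, not taken from the patent.

```python
def connected_components(nodes, edges):
    """Return the connected subgraphs of an undirected graph as sorted node lists."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], []
        seen.add(node)
        while stack:  # depth-first traversal of one subgraph
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components

# (alarm, neighbor, distance) links from Table 7, self links omitted.
links = [("A", "B", 0.2650897), ("B", "A", 0.2650897),
         ("C", "D", 0.18749536), ("D", "C", 0.18749536)]
THRESHOLD = 0.2
edges = [(a, b) for a, b, distance in links if distance <= THRESHOLD]
categories = connected_components(["A", "B", "C", "D"], edges)
```

Only the C-D link passes the threshold, so the sketch yields the three categories described above: A alone, B alone, and C together with D.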
The alarms are now classified; the alarms within each category then need to be ranked by importance to determine the master-slave relationship.
3. Historically, most root cause alarms have a shorter survival time than other alarms, because the other alarms are caused by the root cause alarm and resolving the root cause alarm resolves them as well.
Rule one: if the time period of alarm A is a proper subset of the time period of alarm B, alarm A is considered more important than alarm B, and important[A][B] = 2 and important[B][A] = -2 are assigned (important is the importance matrix).
Rule two: if the time period of alarm A equals the time period of alarm B, alarms A and B are considered equally important, and important[A][B] = 1 and important[B][A] = 1 are assigned.
Rule three: if the time periods of alarms A and B satisfy neither of the two relationships above, the importance of alarms A and B cannot be distinguished, and important[A][B] = 0 and important[B][A] = 0 are assigned.
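The three rules can be sketched as a pairwise comparison of survival periods. The periods below are hypothetical (start, end) minute offsets chosen to trigger each rule; they are not the alarms of the worked example, whose importance matrix is assumed by the patent rather than derived.

```python
def importance_matrix(periods):
    """Build important[a][b] from survival periods given as (start, end) pairs."""
    ids = list(periods)
    important = {a: {b: 0 for b in ids} for a in ids}
    for a in ids:
        for b in ids:
            start_a, end_a = periods[a]
            start_b, end_b = periods[b]
            if (start_a, end_a) == (start_b, end_b):
                important[a][b] = 1    # rule two: identical periods
            elif start_b <= start_a and end_a <= end_b:
                important[a][b] = 2    # rule one: a lies strictly inside b
            elif start_a <= start_b and end_b <= end_a:
                important[a][b] = -2   # rule one, seen from the enclosing alarm
            # rule three: any other overlap or no overlap leaves 0
    return important

# Hypothetical periods in minutes: X lies inside Y, Z only overlaps Y, W equals X.
periods = {"X": (2, 4), "Y": (0, 6), "Z": (5, 9), "W": (2, 4)}
important = importance_matrix(periods)
```

Because every alarm's period equals itself, rule two also explains the value 1 on the diagonal of the importance matrix.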
According to the above rules, in combination with the alarm time profile table of table 2, it is assumed that the importance matrix is formed as follows:
TABLE 8 rule importance Table
A B C D
A 1 2 0 1
B -2 1 2 0
C 0 -2 1 1
D 1 0 1 1
A rule importance summary for each alarm is formed from the rule importance table (Table 8):
TABLE 9 rule importance summary table
Alarm | A | B | C | D
Rule importance summary | 4 | 1 | 0 | 3
Because different alarms have different impact (for example, a machine going down and a service anomaly differ in severity), the invention follows the Weiying cloud and assigns different weights to the alarm levels.
TABLE 10 alarm level importance table
[Table 10 appears as an image in the original publication and is not reproduced here.]
Assume that the levels of alarms A, B, C, and D are as follows:
TABLE 11 importance of each alarm's level
Alarm | Level | Score
A | 0-recovery event | 0
B | 1-information level event | 1
C | 3-failure level event | 3
D | 4-accident level event | 4
Combining the rule importance summary in Table 9 with the level scores in Table 11 gives the final importance of each alarm:
TABLE 12 importance of each alarm
Alarm | A | B | C | D
Importance | 4 | 2 | 3 | 7
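The arithmetic behind Tables 9 and 12 can be reproduced with a short sketch: each alarm's rule importance summary is the sum of its row in Table 8, and the final importance adds the level score from Table 11. The variable names are hypothetical; the additive combination is inferred from the table values rather than stated explicitly in the text.

```python
# Rule importance matrix from Table 8, stored row by row.
RULE_IMPORTANCE = {
    "A": {"A": 1, "B": 2, "C": 0, "D": 1},
    "B": {"A": -2, "B": 1, "C": 2, "D": 0},
    "C": {"A": 0, "B": -2, "C": 1, "D": 1},
    "D": {"A": 1, "B": 0, "C": 1, "D": 1},
}
# Level scores from Table 11.
LEVEL_SCORE = {"A": 0, "B": 1, "C": 3, "D": 4}

# Table 9: sum each alarm's row of the rule importance matrix.
rule_summary = {alarm: sum(row.values()) for alarm, row in RULE_IMPORTANCE.items()}

# Table 12: add the level score to the rule importance summary.
final_importance = {alarm: rule_summary[alarm] + LEVEL_SCORE[alarm]
                    for alarm in RULE_IMPORTANCE}
```

Within the category containing C and D, alarm D has the higher final importance (7 against 3), so D is the master alarm and C the slave.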
Intelligent noise reduction and root cause output (master-slave alarms):
calculating the corresponding confidence from the classification results and the importance of each alarm:
$$\text{confidence} = \frac{\text{time correlation coefficient} + 1}{4} + \frac{\text{master alarm importance} - \text{slave alarm importance}}{2 \times (\text{maximum importance in class} - \text{minimum importance in class})}$$
Special cases: if the maximum and minimum importance values in the class are equal, the importance term (the second term) is set to 0; the confidence of a class containing a single alarm is set directly to 1.
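As a check on the confidence formula, the sketch below evaluates it for the D-C pair, using the time correlation coefficient 0.4 between D and C and the importances 7 and 3 from the worked example. The function name is hypothetical, and treating the importance term as 0 when the maximum and minimum are equal is an interpretation of the special case.

```python
SINGLE_ALARM_CONFIDENCE = 1.0  # a class with a single alarm gets confidence 1

def pair_confidence(time_corr, master_imp, slave_imp, class_max, class_min):
    """Confidence that `master` is the root cause of `slave` within one class."""
    spread = class_max - class_min
    if spread == 0:
        importance_term = 0.0  # special case: all importances in the class are equal
    else:
        importance_term = (master_imp - slave_imp) / (2 * spread)
    return (time_corr + 1) / 4 + importance_term

# D is master (importance 7), C is slave (importance 3), coe_time[D][C] = 0.4.
confidence_dc = pair_confidence(0.4, master_imp=7, slave_imp=3, class_max=7, class_min=3)
```

The result is (0.4 + 1)/4 + (7 - 3)/(2 × 4) = 0.35 + 0.5 = 0.85, the 85% reported for the D-C pair.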
The final output is:
TABLE 13 master-slave alarm output table
Master alarm | Slave alarm | Confidence
A | | 100.00%
B | | 100.00%
D | C | 85.00%
When alarms are actually sent to operation and maintenance personnel and alarms D and C occur together, only alarm D is sent, which achieves the goal of intelligent noise reduction. Alarm C is merged into alarm D, so operation and maintenance personnel only need to resolve alarm D to resolve alarm C at the same time.
Step 2: referring to fig. 3, the relationship matrix and the importance matrix are updated on the basis of the initial training, and the information contained in newly added samples is absorbed for incremental training. The incremental training follows essentially the same route as the initial training: the data passes in sequence through preprocessing, feature processing, model training, and result output. What changes is mainly the update of the relationship matrix and the importance matrix, whose purpose is to absorb the information contained in the newly added samples. For example, if A and B are more closely related in the new samples and their master-slave relationship is clearer, the model needs to iterate toward a closer relationship and a clearer master-slave distinction. Only the time relationship matrix is adjustable, because adjusting the relationship matrices of other attributes would distort their original meaning, so the update rules (applied only to the time relationship matrix) are set as follows:
1. if an alarm is new, its similarities that are supported by data are calculated and filled in directly, its similarities that have no data support are filled with 0, and the importance matrix is handled in the same way;
2. if an alarm already exists, its new similarity is calculated and averaged with the original similarity, and the importance matrix is handled in the same way.
In this embodiment, the original time correlation coefficient matrix is:
TABLE 14 original time correlation coefficient table
A B C D
A 1.0 0.1 0.1 0.1
B 0.1 1.0 0.7 0.6
C 0.5 0.7 1.0 0.4
D 0.8 0.6 0.4 1.0
The new alarms form the following time relationship matrix:
TABLE 15 new time correlation coefficient table
A B E
A 1 0.5 0.5
B 0.5 1 0.7
E 0.5 0.7 1
The updated time relationship matrix is:
TABLE 16 final time correlation coefficient table
[Table 16 appears as an image in the original publication and is not reproduced here.]
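The two incremental update rules can be sketched as a merge of the original matrix (Table 14) with the matrix formed by the new alarms (Table 15). The function name is hypothetical, and leaving a pair unchanged when the new batch says nothing about it is an assumption; the values the sketch computes are its own results, not values copied from Table 16.

```python
def incremental_update(original, new):
    """Merge a new correlation matrix into the original one.

    Both matrices are nested dicts indexed as matrix[a][b].
    """
    ids = list(original) + [k for k in new if k not in original]
    updated = {}
    for a in ids:
        updated[a] = {}
        for b in ids:
            old = original.get(a, {}).get(b)
            fresh = new.get(a, {}).get(b)
            if old is not None and fresh is not None:
                updated[a][b] = (old + fresh) / 2  # existing pair: average old and new
            elif fresh is not None:
                updated[a][b] = fresh              # new alarm with data support
            elif old is not None:
                updated[a][b] = old                # no new evidence: keep (assumption)
            else:
                updated[a][b] = 0.0                # new alarm without data support
    return updated

table14 = {
    "A": {"A": 1.0, "B": 0.1, "C": 0.1, "D": 0.1},
    "B": {"A": 0.1, "B": 1.0, "C": 0.7, "D": 0.6},
    "C": {"A": 0.5, "B": 0.7, "C": 1.0, "D": 0.4},
    "D": {"A": 0.8, "B": 0.6, "C": 0.4, "D": 1.0},
}
table15 = {
    "A": {"A": 1.0, "B": 0.5, "E": 0.5},
    "B": {"A": 0.5, "B": 1.0, "E": 0.7},
    "E": {"A": 0.5, "B": 0.7, "E": 1.0},
}
updated = incremental_update(table14, table15)
```

Under these rules the A-B coefficient becomes the average of 0.1 and 0.5, the new alarm E takes its coefficients with A and B directly, and its coefficients with C and D are filled with 0.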
The importance matrix is updated in the same way as the time correlation coefficient matrix. The final output then follows the subsequent steps of the initial training flowchart, giving the updated relationships and confidences between alarms.
Step 3: referring to fig. 4, the operation and maintenance personnel's handling of the algorithm results is taken as user feedback information, written back to the database, and used as newly added data for iteration.
Because what is absorbed is feedback from operation and maintenance personnel, the data preprocessing and feature processing steps are not needed, unlike in incremental training. The algorithm mainly updates the time correlation coefficient matrix and the importance matrix. The user gives one of two verdicts on a master-slave relationship, approval or disapproval, and the model iterates in a different direction for each, so the update rules (applied to the time relationship matrix) are as follows:
Assume A is the master and B is the slave.
1. On approval, the time correlation coefficients coe[A][B] and coe[B][A] are each increased by 0.1 while being kept within [-1, 1]; a value that would leave the interval is clamped to the nearest bound. The importance coefficient important[A][B] is raised by 2 and important[B][A] is lowered by 2;
2. On disapproval, coe[A][B] and coe[B][A] are each reduced by 0.1 while being kept within [-1, 1]; a value that would leave the interval is clamped to the nearest bound. The importance coefficient important[B][A] is raised by 2 and important[A][B] is lowered by 2. In this embodiment, the specific steps are as follows:
Original time relationship matrix:
TABLE 17 original time correlation coefficient table
[Table 17 appears as an image in the original publication and is not reproduced here.]
Assuming that the A master B slave is approved and the C master D slave is disapproved, update to:
TABLE 18 Final Table of time correlation coefficients
A B C D
A 1.0 0.2 0.1 0.1
B 0.2 1.0 0.7 0.6
C 0.5 0.7 1.0 0.3
D 0.8 0.6 0.3 1.0
Comparing the values before and after the change, the A-B distance decreases (the alarms become closer) and the C-D distance increases (the alarms become more distant), which meets the business requirement.
Importance matrix:
TABLE 19 original rule importance Table
A B C D
A 1 2 0 1
B -2 1 2 0
C 0 -2 1 1
D 1 0 1 1
Assuming that the A master B slave is approved and the C master D slave is disapproved, update to:
TABLE 20 final table of rule importance
A B C D
A 1 4 0 1
B -4 1 2 0
C 0 -2 1 -1
D 1 0 3 1
Comparing the values before and after the change: for A and B, the importance of A increases and that of B decreases; for C and D, the importance of C decreases and that of D increases, which meets the business requirement. The final output then follows the subsequent steps of the initial training flowchart, giving the updated relationships and confidences between alarms.
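The feedback update can be sketched as follows, starting from the coefficients of Table 14 (assumed here to be the original matrix of this step) and the rule importance values of Table 19. The function names are hypothetical; the step sizes 0.1 and 2 and the clamping interval [-1, 1] come from the update rules above.

```python
def clamp(value, low=-1.0, high=1.0):
    """Keep a coefficient inside [low, high]."""
    return max(low, min(high, value))

def apply_feedback(coe, important, master, slave, approved):
    """Update both matrices in place for one approve/disapprove verdict."""
    step = 0.1 if approved else -0.1
    coe[master][slave] = clamp(coe[master][slave] + step)
    coe[slave][master] = clamp(coe[slave][master] + step)
    delta = 2 if approved else -2
    important[master][slave] += delta  # approval strengthens the master
    important[slave][master] -= delta  # and weakens the slave; disapproval reverses it

coe = {
    "A": {"A": 1.0, "B": 0.1, "C": 0.1, "D": 0.1},
    "B": {"A": 0.1, "B": 1.0, "C": 0.7, "D": 0.6},
    "C": {"A": 0.5, "B": 0.7, "C": 1.0, "D": 0.4},
    "D": {"A": 0.8, "B": 0.6, "C": 0.4, "D": 1.0},
}
important = {
    "A": {"A": 1, "B": 2, "C": 0, "D": 1},
    "B": {"A": -2, "B": 1, "C": 2, "D": 0},
    "C": {"A": 0, "B": -2, "C": 1, "D": 1},
    "D": {"A": 1, "B": 0, "C": 1, "D": 1},
}
apply_feedback(coe, important, master="A", slave="B", approved=True)
apply_feedback(coe, important, master="C", slave="D", approved=False)
```

After the two verdicts, the changed entries agree with Tables 18 and 20 up to floating-point rounding.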
Step 4: the alarm storm is processed according to the training and iterative learning.
The method can learn the characteristics of the samples under the condition that the samples are not marked, classify the samples, analyze the importance of the samples and lock the root cause alarm, and can absorb the operation experience of operation and maintenance personnel to perform continuous optimization iteration so as to improve the prediction accuracy of the algorithm.
In summary, the invention first uses the correlation among alarms to cluster them without supervision, then applies business rules and expert experience to perform root cause analysis, and finally absorbs the feedback of operation and maintenance personnel automatically, so that the algorithm becomes more accurate as it is used. Operation and maintenance personnel can thus focus on and solve the actual problem, and the method improves itself through self-learning.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An unsupervised intelligent noise reduction processing method, characterized by comprising the following steps:
step 1: generating a model, and carrying out initial training through five modules in sequence: data requirement, data preprocessing, feature processing, category generation and alarm importance generation, and intelligent noise reduction and root cause output;
step 2: updating the relationship matrix and the importance matrix on the basis of the initial training to absorb the information contained in newly added samples, thereby performing incremental training;
step 3: taking the operation and maintenance personnel's handling of the algorithm results as user feedback information, writing it back to the database, and iterating with the user feedback information as newly added data;
step 4: processing the alarm storm according to the training and iterative learning.
2. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the alarm data in the data requirement in step 1 is collected by a cloud platform and stored in a database.
3. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the data preprocessing in step 1 comprises the following specific steps:
according to business knowledge, NAME + '|' + IP_ADDRESS is used as the alarm primary key; missing values and abnormal values are processed at the same time: missing values are handled by one of three methods (supplementing, discarding, or retaining), and abnormal values are directly discarded.
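The preprocessing in claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the field names NAME and IP_ADDRESS come from the claim, while SEVERITY, START, END, the default fill value, and the outlier rule are hypothetical placeholders, since the claim does not say which fields receive which treatment.

```python
from typing import Optional

# Hypothetical policy: which missing-value strategy applies to which field.
FILL_DEFAULTS = {"SEVERITY": "unknown"}  # supplement: fill with a default value
REQUIRED = ("NAME", "IP_ADDRESS")         # discard: record is unusable without these
# Any other missing field is retained as-is.


def is_outlier(record: dict) -> bool:
    """Hypothetical outlier rule: an alarm that ends before it starts."""
    start, end = record.get("START"), record.get("END")
    return start is not None and end is not None and end < start


def preprocess(record: dict) -> Optional[dict]:
    """Return a cleaned copy with a KEY field, or None if the record is dropped."""
    if any(record.get(field) in (None, "") for field in REQUIRED):
        return None  # missing value handled by discarding
    if is_outlier(record):
        return None  # abnormal values are directly discarded
    cleaned = dict(record)
    for field, default in FILL_DEFAULTS.items():
        if cleaned.get(field) in (None, ""):
            cleaned[field] = default  # missing value handled by supplementing
    # Alarm primary key as described in the claim: NAME + '|' + IP_ADDRESS
    cleaned["KEY"] = cleaned["NAME"] + "|" + cleaned["IP_ADDRESS"]
    return cleaned
```

A record without NAME or IP_ADDRESS cannot form a primary key, which is why this sketch discards it rather than retaining it.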
4. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the specific steps of the feature processing in the step 1 are as follows:
converting the alarm attributes into useful features; the alarm attributes are divided into time attributes and non-time attributes, and a conversion method corresponding to each type of attribute is adopted.
5. The unsupervised intelligent noise reduction processing method according to claim 4, wherein the time attributes are processed as follows:
a. extracting the maximum time and the minimum time in all samples;
b. taking the minimum time as the start and the maximum time as the end, dividing the range with a step length of one minute; if the alarm survival time period contains a time point, that point is marked '1', and if the alarm survival time period does not contain the time point, that point is marked '0'.
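Steps a and b can be illustrated with a short sketch. It assumes that each alarm has a single survival period given as a (start, end) pair of datetime values and that both endpoints count as covered; the function names are hypothetical and not taken from the patent.

```python
from datetime import datetime, timedelta


def minute_grid(t_min: datetime, t_max: datetime) -> list:
    """Time points from t_min to t_max inclusive, one per minute."""
    points = []
    t = t_min
    while t <= t_max:
        points.append(t)
        t += timedelta(minutes=1)
    return points


def build_time_features(alarms: dict) -> dict:
    """alarms maps an alarm key to its (start, end) survival period.

    Returns a 0/1 vector per alarm: 1 where the survival period
    contains the grid point, 0 where it does not.
    """
    # Step a: extract the minimum and maximum time over all samples.
    t_min = min(start for start, _ in alarms.values())
    t_max = max(end for _, end in alarms.values())
    # Step b: divide the range minute by minute and mark each point.
    grid = minute_grid(t_min, t_max)
    return {
        key: [1 if start <= t <= end else 0 for t in grid]
        for key, (start, end) in alarms.items()
    }
```

Alarms that are active during the same minutes end up with overlapping 1-entries, which is what lets a later similarity measure treat them as related in time.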
6. The unsupervised intelligent noise reduction processing method according to claim 4, wherein for non-time attributes, if two alarms belong to the same tenant, the pair is marked 1.0 to indicate that a relation exists; if the two alarms do not belong to the same tenant, the pair is marked 0; a relation matrix is thereby obtained.
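A minimal sketch of the tenant-based relation matrix follows. It assumes the tenant of each alarm is available as a plain list in a fixed alarm order; the function name is hypothetical.

```python
def tenant_relation_matrix(tenants: list) -> list:
    """Return an n x n matrix with 1.0 where two alarms share a tenant, else 0.0."""
    n = len(tenants)
    return [
        [1.0 if tenants[i] == tenants[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

Under this rule the diagonal is 1.0, because every alarm trivially shares a tenant with itself, and the matrix is symmetric.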
7. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the specific steps of the category generation and the alarm importance generation in the step 1 are as follows:
after the feature processing is completed, a vector representation of each alarm is obtained, and cluster analysis is carried out on that basis; a KNN algorithm based on cosine distance is used to select the n alarms closest to each alarm to form links, and a threshold is set to filter out weakly correlated links; each alarm is taken as a node and each link as an edge to form a graph, and each connected subgraph of the graph is a category.
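The category generation described in claim 7 might look like the sketch below. It is written under several assumptions: alarms are given as plain numeric vectors, the threshold is applied to cosine similarity (that is, 1 minus cosine distance), and connected subgraphs are found with a union-find structure, which the claim does not prescribe. The function and parameter names are hypothetical.

```python
import math


def cosine_similarity(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def knn_links(vectors: list, n_neighbors: int, min_similarity: float) -> set:
    """Link each alarm to its n nearest alarms, dropping weakly correlated links."""
    links = set()
    for i, u in enumerate(vectors):
        scored = sorted(
            ((cosine_similarity(u, v), j) for j, v in enumerate(vectors) if j != i),
            reverse=True,
        )
        for sim, j in scored[:n_neighbors]:
            if sim >= min_similarity:
                links.add((min(i, j), max(i, j)))  # undirected edge
    return links


def connected_components(n_nodes: int, links: set) -> list:
    """Each connected subgraph of the alarm graph is one category."""
    parent = list(range(n_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

Because categories are connected subgraphs rather than fixed-size clusters, the number of categories does not have to be chosen in advance; it follows from n and the threshold.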
8. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the intelligent noise reduction and root cause output in step 1 comprises the following specific steps:
calculating corresponding confidence degrees according to the classification results and each alarm importance table;
confidence = (time correlation coefficient + 1)/4 + (master alarm importance - slave alarm importance)/2 × (maximum importance value within the class - minimum importance value within the class).
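The confidence formula can be sketched as a small function. The translated formula is ambiguous about operator grouping, so this sketch implements the literal left-to-right reading by default and offers a second, assumed reading in which the importance difference is divided by twice the in-class importance range. The `normalized` flag and all parameter names are hypothetical and do not come from the patent.

```python
def confidence(
    time_corr: float,
    master_importance: float,
    slave_importance: float,
    class_max: float,
    class_min: float,
    normalized: bool = False,
) -> float:
    """Confidence that the master alarm is the root cause of the slave alarm."""
    time_term = (time_corr + 1) / 4
    diff = master_importance - slave_importance
    spread = class_max - class_min
    if normalized:
        # Assumed alternative reading: diff / (2 * spread), guarded against spread == 0.
        importance_term = diff / (2 * spread) if spread else 0.0
    else:
        # Literal reading of the formula as written: (diff / 2) * spread.
        importance_term = diff / 2 * spread
    return time_term + importance_term
```

If the time correlation coefficient lies in [-1, 1] and both importances lie within the in-class range, the normalized reading keeps the result in [0, 1], whereas the literal reading does not bound it. Which reading was intended cannot be determined from the translated text alone.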
9. The unsupervised intelligent noise reduction processing method according to claim 1, wherein the update rule in the incremental training of step 2 is as follows:
if the alarm is a new alarm, its similarities that are supported by data are calculated directly and filled in, its similarities that lack data support are filled with 0, and the importance matrix is processed in the same way;
if the alarm is an original alarm, its new similarity is calculated and averaged with the original similarity, and the importance matrix is processed in the same way.
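The update rule of claim 9 can be sketched as follows, assuming the relation matrix is stored sparsely as a dictionary keyed by sorted alarm-key pairs. The storage layout and names are hypothetical, and keeping the old value for an original pair that has no new observation is an assumption the claim does not address.

```python
from itertools import combinations


def update_matrix(old: dict, fresh: dict, known_alarms: set, all_alarms: set) -> dict:
    """Merge similarities from newly added samples into the existing matrix.

    old:          {(a, b): value} from earlier training, with a < b
    fresh:        {(a, b): value} computed from the newly added samples only
    known_alarms: alarm keys that existed before this update
    all_alarms:   every alarm key after this update
    """
    updated = {}
    for pair in combinations(sorted(all_alarms), 2):
        a, b = pair
        if a in known_alarms and b in known_alarms:
            old_value = old.get(pair, 0.0)
            if pair in fresh:
                # Original alarm: average the new similarity with the original one.
                updated[pair] = (old_value + fresh[pair]) / 2
            else:
                updated[pair] = old_value  # assumption: no new data, keep old value
        else:
            # New alarm: fill with the computed value if data supports it, else 0.
            updated[pair] = fresh.get(pair, 0.0)
    return updated
```

Per the claim, the same function could be applied to the importance matrix by passing importance values instead of similarities.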
CN202111117474.0A 2021-09-23 2021-09-23 Unsupervised intelligent noise reduction processing method Active CN113806180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111117474.0A CN113806180B (en) 2021-09-23 2021-09-23 Unsupervised intelligent noise reduction processing method

Publications (2)

Publication Number Publication Date
CN113806180A (en) 2021-12-17
CN113806180B (en) 2022-08-12

Family

ID=78940118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117474.0A Active CN113806180B (en) 2021-09-23 2021-09-23 Unsupervised intelligent noise reduction processing method

Country Status (1)

Country Link
CN (1) CN113806180B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664374A (en) * 2018-05-17 2018-10-16 腾讯科技(深圳)有限公司 Fault warning model creation method, apparatus, fault alarming method and device
CN109460396A (en) * 2018-10-12 2019-03-12 中国平安人寿保险股份有限公司 Model treatment method and device, storage medium and electronic equipment
CN113259379A (en) * 2021-06-15 2021-08-13 中国航空油料集团有限公司 Abnormal alarm identification method, device, server and storage medium based on incremental learning
CN113328869A (en) * 2020-02-28 2021-08-31 华为技术有限公司 Alarm aggregation method and device

Also Published As

Publication number Publication date
CN113806180B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN110609759B (en) Fault root cause analysis method and device
CN108415789B (en) Node fault prediction system and method for large-scale hybrid heterogeneous storage system
CN103761173A (en) Log based computer system fault diagnosis method and device
CN111885040A (en) Distributed network situation perception method, system, server and node equipment
EP3968243A1 (en) Method and apparatus for realizing model training, and computer storage medium
CN108470022B (en) Intelligent work order quality inspection method based on operation and maintenance management
CN113497726B (en) Alarm monitoring method, alarm monitoring system, computer readable storage medium and electronic equipment
CN112367303B (en) Distributed self-learning abnormal flow collaborative detection method and system
CN112217674B (en) Alarm root cause identification method based on causal network mining and graph attention network
CN109547251B (en) Service system fault and performance prediction method based on monitoring data
Al-Janabi A proposed framework for analyzing crime data set using decision tree and simple k-means mining algorithms
CN113723452A (en) Large-scale anomaly detection system based on KPI clustering
CN114024829A (en) Fault repairing method, device, equipment and storage medium of power communication network
CN112085869A (en) Civil aircraft flight safety analysis method based on flight parameter data
CN113268370B (en) Root cause alarm analysis method, system, equipment and storage medium
CN114090393A (en) Method, device and equipment for determining alarm level
CN112199805B (en) Power transmission line hidden danger identification model evaluation method and device
CN113806180B (en) Unsupervised intelligent noise reduction processing method
CN111694957B (en) Method, equipment and storage medium for classifying problem sheets based on graph neural network
CN113824575B (en) Method and device for identifying fault node, computing equipment and computer storage medium
CN109635008B (en) Equipment fault detection method based on machine learning
CN116545867A (en) Method and device for monitoring abnormal performance index of network element of communication network
CN116541166A (en) Super-computing power scheduling server and resource management method
CN108521346B (en) Method for positioning abnormal nodes of telecommunication bearer network based on terminal data
CN115514627A (en) Fault root cause positioning method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant