CN116319255A - Root cause positioning method, device, equipment and storage medium based on KPI - Google Patents


Info

Publication number
CN116319255A
Authority
CN
China
Legal status
Pending
Application number
CN202310002419.XA
Other languages
Chinese (zh)
Inventor
杨仁凤
李月平
张安国
Current Assignee
Ruijie Networks Co Ltd
Original Assignee
Ruijie Networks Co Ltd
Application filed by Ruijie Networks Co Ltd
Priority to CN202310002419.XA
Publication of CN116319255A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Abstract

The application provides a KPI-based root cause positioning method, apparatus, device and storage medium. The method comprises: for each target cell, calculating the association degree between two different KPIs from their sampling values in the same time period, and determining which KPIs are associated; determining the contribution degree of each influence factor of a KPI to that KPI, based on sampling values of the KPI and of each of its influence factors at different moments; determining the importance of each KPI based on different first importance evaluation indexes of the KPI; and determining the fault influence of any KPI, and the root cause of the fault, based on the importance of the KPI, the association degrees with the other KPIs associated with it, and the contribution degrees of its influence factors. The method and apparatus can locate faults efficiently and accurately, shorten fault recovery time, and improve operation and maintenance efficiency and user experience.

Description

Root cause positioning method, device, equipment and storage medium based on KPI
Technical Field
The present disclosure relates to the field of AI applications in communication technologies, and in particular to a KPI-based root cause positioning method, apparatus, device and storage medium.
Background
Operation and maintenance personnel monitor various key performance indicators (KPIs) to locate the KPI in which a problem arises and the detailed cause behind it, then solve the problem and restore service. Accurate and efficient root cause positioning is therefore very important for improving operation and maintenance efficiency, user experience and the like.
Existing KPI root cause positioning methods fall into three main categories. First, operation and maintenance personnel search for the problem manually, which is very time-consuming, and the accuracy depends on personal experience, so a uniform level is difficult to achieve. Second, root cause positioning is performed by constructing a fault propagation graph, but the information required to construct such a graph is currently insufficient, and the graph changes as the service changes, which makes it difficult to maintain. Third, the root cause is located by top-down or bottom-up search-and-prune anomaly detection; in practice, once any step is misjudged, the computation needed to locate the root cause increases greatly and the positioning performance is poor.
However, in an actual 5G scenario, the number of KPIs grows rapidly and the KPIs show no periodicity or obvious regular features, so existing KPI root cause positioning methods cannot solve the 5G root cause positioning problem well. A method is therefore needed that can accurately and efficiently locate the problematic KPI even in a 5G scenario, so as to shorten fault recovery time and improve operation and maintenance efficiency, user experience and the like.
Disclosure of Invention
With the KPI-based root cause positioning method, apparatus, device and storage medium provided by the present application, faults can be located quickly, efficiently and accurately, and the specific KPIs involved and the root causes of the faults can be identified, which shortens fault recovery time and improves operation and maintenance efficiency and user experience.
In a first aspect, the present application provides a KPI-based root cause positioning method, including:
for all KPIs of each target cell, calculating the association degree between every two different KPIs according to their sampling values in the same time period, and determining the associated KPIs;
determining the contribution degree of each counter to a KPI based on sampling values of the KPI at different moments and of each influence-factor counter affecting the KPI at the corresponding moments; and determining the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI;
for any target cell, determining the fault influence of any KPI based on the importance of the KPI, the association degrees with the other KPIs associated with it, and the contribution degrees of the counters affecting it; and
determining the root cause of the fault based on the fault influence of the KPIs of the target cells.
In one or more possible embodiments, determining the root cause of the fault based on the fault influence of the KPIs of the target cells includes:
sorting the KPIs of the target cells by fault influence and determining the top n KPIs with the largest fault influence; and
for each of the top n KPIs, sorting its counters by contribution degree and determining the top m counters with the largest contribution degree as the root cause of the fault, where m and n are positive integers.
In one or more possible embodiments, the method further includes:
determining the importance of each counter of each target cell based on different second importance evaluation indexes of the counter; and
for any target cell, determining the fault influence of any counter based on the importance of the counter and its contribution degrees to all KPIs it affects.
In one or more possible embodiments, each KPI and each counter is mapped to a node, a connection is established between the nodes of any two associated KPIs and between the nodes of a KPI and a counter that affects it, and the fault influence of a KPI or of a counter is determined as follows:
for any node, determining the node's own influence from its importance and a first weight factor, and determining the influence given to the node by the other nodes connected to it from the association degree or contribution degree of each of those nodes and a second weight factor;
determining the fault influence of the node from its own influence and the influence given to it by the other connected nodes;
where the first weight factor is the reciprocal of the total number of nodes, and the second weight factor is the fault influence of the other node connected to the node.
In one or more possible embodiments, the method further includes:
mapping each KPI and each counter to a node; connecting the nodes of two associated KPIs through a first edge whose weight is the association degree between the two KPIs; and
connecting the nodes of a KPI and a counter that has an influence relationship with it through a second edge whose weight is the contribution degree of the counter to the KPI, thereby obtaining a KPI association propagation graph.
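The graph construction above can be sketched with plain Python dictionaries. This is a minimal sketch under stated assumptions: both edge kinds are stored in both directions (the text does not fix edge directions), and the function name `build_propagation_graph` as well as every KPI and counter name in the usage lines are hypothetical.

```python
from collections import defaultdict


def build_propagation_graph(kpi_assoc, counter_contrib):
    """Build a weighted KPI association propagation graph as an adjacency dict.

    kpi_assoc:       {(kpi_a, kpi_b): association_degree}   -> first edges
    counter_contrib: {(counter, kpi): contribution_degree}  -> second edges
    Returns {node: {neighbour: edge_weight}}.
    Assumption: every edge is stored in both directions.
    """
    graph = defaultdict(dict)
    for (a, b), weight in kpi_assoc.items():
        graph[a][b] = weight  # first edge: weight = association degree
        graph[b][a] = weight
    for (counter, kpi), weight in counter_contrib.items():
        graph[counter][kpi] = weight  # second edge: weight = contribution degree
        graph[kpi][counter] = weight
    return dict(graph)


# Usage with hypothetical KPI and counter names.
g = build_propagation_graph(
    {("RRC_setup_SR", "UE_ctx_SR"): 0.85},
    {("rrc_fail_timer_expiry", "RRC_setup_SR"): 0.6},
)
```

A KPI node thus ends up with both kinds of neighbours (associated KPIs and the counters that affect it), which is the structure the fault-influence calculation walks over.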
In one or more possible embodiments, the method further includes:
acquiring at least one key KPI for each of a plurality of cells;
when an acquired key KPI has a preset threshold, comparing the key KPI with its preset threshold to determine whether it is abnormal;
when an acquired key KPI has no preset threshold, predicting whether the key KPI is abnormal by using a probabilistic statistical model; and screening out the target cells whose key KPIs are abnormal.
In one or more possible embodiments, predicting whether the key KPI is abnormal by using a probabilistic statistical model includes:
inputting all sampling values of a key KPI that has no preset threshold into the probabilistic statistical model; using the model to cluster all sampling values into two classes, and determining the abnormal class according to the number of sampling values in each class; and
determining whether the key KPI is abnormal according to the class of its current sampling value.
In one or more possible embodiments, the method further includes: evaluating the accuracy of the probabilistic statistical model, and determining whether the key KPI is abnormal according to the result of the accuracy evaluation.
In one or more possible embodiments, determining the contribution degree of each counter to the KPI by linear fitting includes:
taking the values of a KPI at a plurality of moments as a vector y, and the values of each counter affecting the KPI at the corresponding moments as a vector x, to obtain a plurality of discrete points; and
linearly fitting the discrete points by partial least squares to obtain the vector k in the linear regression y = kx, and taking k as the contribution degree of the counters to the KPI.
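The partial least squares fit can be sketched as a PLS1 (NIPALS) regression in NumPy. This is an illustrative implementation, not the applicant's code. It centres the data and therefore also returns an intercept, which the plain form y = kx does not mention, and by default it uses as many latent components as there are counters (in which case the result coincides with ordinary least squares). The function name `pls_contribution` and the sample data are hypothetical.

```python
import numpy as np


def pls_contribution(X, y, n_components=None):
    """PLS1 (NIPALS) fit of y ~ X @ k + intercept; returns (k, intercept).

    X: (n_samples, n_counters) counter values at each sampling moment.
    y: (n_samples,) KPI values at the same moments.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centred residual matrices
    n_components = n_components or X.shape[1]
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # y is already fully explained
            break
        w = w / norm
        t = Xr @ w  # latent score vector
        tt = t @ t
        p = Xr.T @ t / tt  # X loading
        c = (yr @ t) / tt  # y loading
        Xr = Xr - np.outer(t, p)  # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    k = W @ np.linalg.solve(P.T @ W, q)  # regression vector in counter space
    return k, y_mean - x_mean @ k


# Hypothetical data: one KPI and two counters sampled at six moments.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [2 * x1 + 0.5 * x2 + 1 for x1, x2 in X]
k, intercept = pls_contribution(X, y)
```

The entries of `k` are read as the contribution degrees of the two counters. Because they depend on the scale of each counter, standardising the counters first may be worth considering; the text does not say whether that is done.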
In one or more possible embodiments, the first importance evaluation indexes include any one or more of the following:
whether the KPI is a key KPI; whether the KPI is abnormal; and first initial weights set respectively for different non-key KPIs.
In one or more possible embodiments, the second importance evaluation index is a second initial weight set for each counter, or a weight of each counter calculated according to whether the KPIs it affects are abnormal.
In one or more possible embodiments, calculating the association degree between every two different KPIs and determining the associated KPIs specifically includes:
calculating the association degree between every two different KPIs by normalized cross-correlation, and determining any two KPIs whose association degree is larger than a set threshold as associated KPIs.
In one or more possible embodiments, the key KPIs include any one or more of the following:
radio access success rate, user equipment (UE) context establishment success rate, radio resource control (RRC) connection establishment success rate, and RRC re-establishment success rate.
In a second aspect, the present application further provides a KPI-based root cause positioning apparatus, including:
an association degree determining module, configured to calculate, for all KPIs of each target cell, the association degree between every two different KPIs according to their sampling values at the same time, and to determine the associated KPIs;
a contribution degree determining module, configured to determine the contribution degree of each counter to a KPI according to sampling values of the KPI at different moments and of each influence-factor counter affecting the KPI at the corresponding moments;
an importance determining module, configured to determine the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI;
an influence determining module, configured to determine, for any target cell, the fault influence of any KPI based on the importance of the KPI, the association degrees with the other KPIs associated with it, and the contribution degrees of the counters affecting it; and
a fault determining module, configured to determine the root cause of the fault based on the fault influence of the KPIs of the target cells.
In a third aspect, the present application further provides a KPI-based root cause positioning device, comprising at least one processor and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the KPI-based root cause positioning method according to any implementation of the first aspect.
In a fourth aspect, the present application further provides a storage medium storing a computer program that causes a computer to execute the KPI-based root cause positioning method according to any implementation of the first aspect.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application and do not constitute an undue limitation on the application.
FIG. 1 is a flow chart of a KPI-based root cause positioning method according to an embodiment of the present application;
FIG. 2 is a flow chart of determining the target cells according to an embodiment of the present application;
FIG. 3 is a flow chart of predicting whether a key KPI is abnormal by using a probabilistic statistical model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an abnormality of a key KPI according to an embodiment of the present application;
FIG. 5 is a KPI association propagation graph according to an embodiment of the present application;
FIG. 6 is a graph illustrating the association between two KPIs according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an apparatus provided according to one embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus provided according to one embodiment of the present application;
FIG. 9 is a schematic diagram of a computer storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, fault root cause positioning generally falls into two categories. The first category locates the fault root cause by top-down or bottom-up search-and-prune anomaly detection. In practice, once any step is misjudged, the computation required for positioning increases greatly and the positioning performance is poor. Moreover, the accuracy of the anomaly detection is hard to guarantee and the correlation between KPIs is not considered; many KPIs in a 5G scenario show no periodicity or obvious regularity, so it is difficult to predict from historical trends whether particular points or intervals are abnormal. Some of these methods rely on prediction rules, which can hardly cover most cases. In terms of efficiency, for a dataset with 10 KPIs, each having 5 dimension elements, there are 5^10 = 9,765,625 different combinations; analysing them manually is very time-consuming and easily affected by personal experience. A 5G scenario has thousands to tens of thousands of KPIs, each with hundreds of dimension elements, so even a tree-style pruned search is inefficient and the accuracy of anomaly detection is hard to guarantee. The second category uses knowledge graphs or machine learning to establish causal relationships between KPIs. Causal relationships do exist between KPIs, but actual services change very quickly and these relationships are hard to capture. This approach avoids the inefficient tree-style search and takes the correlation between KPIs into account, but it is limited by the richness of the knowledge graph, and the results achieved are not very good.
Based on the above problems, the present application provides a KPI-based root cause positioning method, which, as shown in FIG. 1, includes the following steps.
Step 101: for all KPIs of each target cell, calculate the association degree between every two different KPIs according to their sampling values in the same time period, and determine the associated KPIs.
All KPIs include key KPIs and non-key KPIs. For the sampling values of every two different KPIs of each target cell in the same time period, the association degree between the two KPIs can be calculated by normalized cross-correlation. Because a period contains KPI sampling values at multiple moments, the sampling values at all of these moments are used together when computing the normalized cross-correlation.
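A minimal sketch of this step, assuming the association degree is the normalized cross-correlation with the largest magnitude over a small window of time lags (lag 0 by default, where it reduces to the Pearson correlation) and that a pair is associated when this value is strictly larger than the threshold. The lag window, the 0.8 threshold and the function names are assumptions, not values from the source.

```python
import math
from itertools import combinations


def _pearson(x, y):
    """Zero-lag normalized cross-correlation (Pearson) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant series carries no correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ncc(a, b, max_lag=0):
    """Largest-magnitude normalized cross-correlation of a and b over lags."""
    n = len(a)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag compares a[t + lag] with b[t]; negative lag the reverse.
        x = a[lag:] if lag >= 0 else a[:n + lag]
        y = b[:n - lag] if lag >= 0 else b[-lag:]
        if len(x) < 2:
            continue
        r = _pearson(x, y)
        if abs(r) > abs(best):
            best = r
    return best


def associated_kpis(samples, threshold=0.8, max_lag=0):
    """samples: {kpi_name: [values in the same period]} -> {(a, b): degree}."""
    result = {}
    for a, b in combinations(sorted(samples), 2):
        degree = ncc(samples[a], samples[b], max_lag)
        if degree > threshold:  # "larger than a set threshold"
            result[(a, b)] = degree
    return result


# Hypothetical KPI series sampled at the same five moments.
samples = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 3, 4, 1, 2]}
res = associated_kpis(samples, threshold=0.8)
```

Here A and B move together (degree 1.0) and are reported, while C is negatively correlated with both (degree -0.8) and is not. Whether strongly negative correlations should also count as association is a design choice the text leaves open; comparing `abs(degree)` instead would include them.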
Step 102: for each target cell, determine the contribution degree of each counter to a KPI based on sampling values of the KPI at different moments and of each influence-factor counter affecting the KPI at the corresponding moments.
A KPI may be affected by one or more influence-factor counters, and one counter may affect one or more KPIs; the KPIs and counters can be predefined. During root cause positioning, for each KPI in a given period, the KPI value and the corresponding counter values at each sampling moment form one discrete point. Because the KPI and its counters are linearly related, their relationship can be determined by linear fitting, which gives the contribution degree of each counter to the KPI.
Step 103: determine the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI.
At least one first importance evaluation index can be predefined for the KPIs. Because KPIs are divided into key KPIs and non-key KPIs, corresponding first importance evaluation indexes can be defined separately for each kind. The first importance evaluation indexes may include any one or more of the following: whether the KPI is a key KPI, whether the KPI is abnormal, and first initial weights set respectively for different non-key KPIs.
Step 104: for any target cell, determine the fault influence of any KPI according to the importance of the KPI, the association degrees with the other KPIs associated with it, and the contribution degree of each counter affecting it.
The importance of the KPI, the association degrees of the other KPIs associated with it, and the contribution degrees of the counters affecting it all bear on the KPI's fault influence: the higher the KPI's importance, the higher its fault influence; the higher its association degrees, the higher its fault influence; and the higher the contribution degrees of the counters affecting it, the higher its fault influence. The fault influence of the KPI can therefore be determined by combining these factors.
Step 105: determine the root cause of the fault based on the fault influence of the KPIs of each target cell.
The method considers these factors together: the effect of a KPI's importance on its fault influence, and the effect of its association degrees with other associated KPIs. Because a KPI also has one or more counters, the contribution degrees between these counters and the KPI also affect its fault influence. KPIs with relatively large fault influence can thus be located directly, so faults can be located quickly and accurately, the specific KPIs and root causes involved can be identified, fault recovery time is shortened, and operation and maintenance efficiency and user experience are improved.
An embodiment of the present application further provides a specific implementation for determining the target cells, which, as shown in FIG. 2, includes the following steps.
Step 201: acquire at least one key KPI for each of a plurality of cells.
In one or more possible embodiments, when the key KPIs of the cells are acquired, key and non-key KPIs are distinguished by a predefined KPI classification. Key KPIs are also the causes with relatively large influence on faults, so restricting the screening to them effectively improves the efficiency of finding abnormal KPIs. The key KPIs include any one or more of the following: radio access success rate, user equipment (UE) context establishment success rate, radio resource control (RRC) connection establishment success rate, and RRC re-establishment success rate. There are typically tens to hundreds of key KPIs; their classification is defined by the operator or company and includes but is not limited to any one or more of the above.
The radio access success rate reflects UE access quality. Its statistics are collected at the RRC layer of the centralized unit (CU), and within the statistics period it is obtained by multiplying the RRC connection establishment success rate, the success rate of establishing the NG interface signalling connection between the radio access network and the 5G core network, and the success rate of establishing the initial quality of service flow (QoS Flow). The UE context establishment success rate reflects UE access quality; its statistics are collected at the CU RRC layer, and it is the ratio of the number of successful UE context establishments to the number of UE context establishment requests in the statistics period. The RRC connection establishment success rate reflects UE access quality; its statistics are collected at the CU RRC layer, and it is the ratio of the number of successful RRC connection establishments to the number of RRC connection establishment requests in the statistics period. The RRC re-establishment success rate reflects UE service retainability; its statistics are collected at the CU RRC layer, and it is the ratio of the number of completed RRC re-establishments to the number of RRC re-establishment requests in the statistics period.
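These definitions reduce to simple counter arithmetic. A sketch with hypothetical function and parameter names, assuming every request count is non-zero in the statistics period:

```python
def ratio(successes, requests):
    """Success rate of one procedure in a statistics period (requests > 0 assumed)."""
    return successes / requests


def radio_access_success_rate(rrc_ok, rrc_req, ng_ok, ng_req, qos_ok, qos_req):
    """Product of the RRC connection, NG signalling and initial QoS Flow setup rates."""
    return ratio(rrc_ok, rrc_req) * ratio(ng_ok, ng_req) * ratio(qos_ok, qos_req)


# Hypothetical counter values for one statistics period.
access_rate = radio_access_success_rate(990, 1000, 980, 1000, 970, 1000)
ue_ctx_rate = ratio(495, 500)  # UE context establishment success rate
```

With these numbers the radio access success rate is 0.99 × 0.98 × 0.97 ≈ 0.9411, which shows how three individually high stage rates compound into a noticeably lower end-to-end rate.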
Step 202: when an acquired key KPI has a preset threshold, compare the key KPI with its preset threshold to determine whether it is abnormal.
For some key KPIs, a value larger than the preset threshold indicates an abnormality; for others, a value smaller than the preset threshold indicates an abnormality.
As shown in FIG. 4, the threshold of a certain key KPI is 99.0; if the value of the key KPI is less than 99.0, the key KPI is determined to be abnormal.
In one or more possible embodiments, when a preset threshold is used to judge whether a key KPI is abnormal, either of two approaches can be chosen according to specific service requirements: all sampling values in a period are combined according to the operator-defined calculation formula of the KPI, and the KPI over the whole period is compared with the threshold; or the sampling value of the key KPI at a given moment is compared directly with the threshold to determine whether the key KPI is abnormal at that moment.
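Both judging approaches can be sketched in a few lines. The function names are hypothetical, and the whole-period variant assumes a ratio-type KPI recomputed from summed success and request counts, which is only one possible operator-defined formula:

```python
def is_abnormal(value, threshold, lower_is_bad=True):
    """Compare one KPI value with its preset threshold.

    lower_is_bad=True:  abnormal when the value falls below the threshold
    lower_is_bad=False: abnormal when the value exceeds the threshold
    """
    return value < threshold if lower_is_bad else value > threshold


def period_abnormal(successes, requests, threshold, lower_is_bad=True):
    """Judge a whole period: rebuild the success rate (in %) from summed counts.

    Assumption: the KPI is a ratio of summed successes to summed requests.
    """
    value = 100.0 * sum(successes) / sum(requests)
    return is_abnormal(value, threshold, lower_is_bad)


# Threshold of 99.0 as in the FIG. 4 example; sample values are hypothetical.
instant = is_abnormal(98.7, 99.0)                          # single moment
period = period_abnormal([995, 990], [1000, 1000], 99.0)   # 99.25 % over the period
```

Summing counts before dividing weights each sampling moment by its traffic volume, so a quiet interval with a poor rate does not dominate the period judgement as it would if per-moment percentages were simply averaged.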
Step 203: when an acquired key KPI has no preset threshold, predict whether the key KPI is abnormal by using a probabilistic statistical model.
The probabilistic statistical model, learned from a large amount of data, can classify key KPIs that currently have no preset threshold. For each such key KPI, the probabilistic statistical model corresponding to that KPI is used to predict whether it is abnormal.
Step 204: screen out the target cells with abnormal key KPIs.
A cell is determined to be a target cell as soon as it has at least one abnormal key KPI.
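The screening rule is a simple "any abnormal key KPI" test per cell. A sketch, assuming the per-KPI abnormality flags have already been produced by the threshold comparison or the probabilistic model; the function name and the cell and KPI names are hypothetical:

```python
def screen_target_cells(key_kpi_status):
    """key_kpi_status: {cell_id: {key_kpi_name: is_abnormal}} -> sorted target cell ids."""
    return sorted(
        cell for cell, kpis in key_kpi_status.items() if any(kpis.values())
    )


targets = screen_target_cells({
    "cell-1": {"RRC_setup_SR": False, "UE_ctx_SR": True},
    "cell-2": {"RRC_setup_SR": False, "UE_ctx_SR": False},
    "cell-3": {},
})
```

Only `cell-1` is selected; a cell with no abnormal flag, or with no key KPIs reported, is left out of the subsequent root cause analysis.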
In one or more possible implementations, determining the root cause of the fault based on the fault influence of the KPIs of the target cells includes: sorting the KPIs by fault influence and determining the top n KPIs with the largest fault influence, which are the causes of the fault; then, for each of the top n KPIs, sorting its counters by contribution degree and taking the top m counters as the root causes of the fault. Different KPIs fail for different reasons, and the root cause can essentially be determined from each counter's contribution degree to the KPI. Here m and n are positive integers.
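The two-level top-n / top-m selection can be sketched with `heapq.nlargest`. The function name, the default n = m = 2 and all KPI and counter names are hypothetical. Counters are ranked by their raw contribution value, as the text says "larger contribution degree"; ranking by absolute value would be an alternative if regression coefficients can be negative.

```python
import heapq


def locate_root_causes(kpi_influence, counter_contrib, n=2, m=2):
    """Return {kpi: [top-m counters]} for the top-n KPIs by fault influence.

    kpi_influence:   {kpi: fault_influence}
    counter_contrib: {kpi: {counter: contribution_degree}}
    """
    top_kpis = heapq.nlargest(n, kpi_influence, key=kpi_influence.get)
    causes = {}
    for kpi in top_kpis:
        contrib = counter_contrib.get(kpi, {})
        causes[kpi] = heapq.nlargest(m, contrib, key=contrib.get)
    return causes


causes = locate_root_causes(
    {"RRC_setup_SR": 0.31, "UE_ctx_SR": 0.12, "RRC_reest_SR": 0.05},
    {
        "RRC_setup_SR": {"admission_fail": 0.7, "air_if_timer_expiry": 0.4, "cell_reject": 0.1},
        "UE_ctx_SR": {"ctx_setup_fail": 0.5},
    },
)
```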
In one or more possible embodiments, the method further includes: determining the importance of each counter of each target cell based on different second importance evaluation indexes of the counter; and, for any target cell, determining the fault influence of any counter based on its importance and its contribution degrees to all KPIs it affects. The second importance evaluation index is a second initial weight set for each counter, or a weight of each counter calculated according to whether the KPIs it affects are abnormal.
The second initial weight is a fixed, preset importance: the importance of every counter is a set initial value regardless of its relationship to abnormal KPIs. Alternatively, the importance of a counter is calculated in the same way as for a non-key KPI, for example 0.6×0.3+0.4×0.8=0.5 or 0.6×0.3+0.4×0.2=0.26. The fault influence of any counter is then determined from its importance and its contribution degrees to all KPIs it affects.
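The two numeric examples above are consistent with a weighted sum of the form 0.6 × a + 0.4 × b. The text does not say what a and b stand for, so the sketch below treats them as a hypothetical initial weight and a hypothetical abnormality score; only the arithmetic is taken from the source.

```python
def weighted_importance(initial_weight, anomaly_score, alpha=0.6, beta=0.4):
    """Hypothetical reading of the example: alpha * initial_weight + beta * anomaly_score.

    The meaning of both inputs is an assumption; alpha and beta are the 0.6 / 0.4
    coefficients that appear in the worked numbers.
    """
    return alpha * initial_weight + beta * anomaly_score


high = round(weighted_importance(0.3, 0.8), 2)  # 0.6*0.3 + 0.4*0.8
low = round(weighted_importance(0.3, 0.2), 2)   # 0.6*0.3 + 0.4*0.2
```

Rounding to two decimals removes floating-point noise such as 0.5000000000000001 and reproduces the 0.5 and 0.26 quoted in the text.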
In one or more possible embodiments, each KPI and each counter is mapped to a node; connections are established between the nodes of two associated KPIs and between the nodes of a KPI and a counter that affects it; and the fault influence of a KPI or a counter is determined as follows. For any node, the node's own influence is determined from its importance and a first weight factor, and the influence given to the node by the other nodes connected to it is determined from the association degree or contribution degree of each of those nodes and a second weight factor; the fault influence of the node is then determined from its own influence and the influence given to it by the connected nodes. The first weight factor is the reciprocal of the total number of nodes, and the second weight factor is the fault influence of the other connected node. The calculation formula is as follows:
[Formula image BDA0004034473950000111 not reproduced in this text version; its symbols are defined below.]
Here NR_i denotes the fault influence of the i-th node and N denotes the number of nodes. IMT_i denotes the importance of the i-th node, and C is a constant used to adjust the relative share of the node's own influence and the influence given by the other nodes connected to it. In(V_i) denotes the set of all nodes pointing to node i, and Out(V_j) denotes the set of nodes that node j points to. W_ji denotes the association degree or contribution degree of node j to node i. In the formula, the left-hand term represents the influence of the node itself, and the right-hand term represents the influence given by the other nodes connected to it. The fault influence of every node can thus be obtained. From these fault influences, the KPIs with a large influence on the fault are identified, and the specific causes of the fault are obtained from the contribution degrees of the counters corresponding to those KPIs. For example, if the fault influences show that the RRC connection establishment success rate is the KPI causing the fault, then ranking the contribution degrees of its counters, such as the number of successful RRC connection establishments, the number of RRC connection establishment requests, and failure causes such as admission failure, air-interface timer expiry and cell rejection, identifies one or more of these counters as the root cause of the fault.
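Because the formula image is missing, the sketch below assumes a weighted-PageRank form that is consistent with the symbol definitions above: NR_i = (1 − C) · IMT_i / N + C · Σ_{j ∈ In(V_i)} [W_ji / Σ_{k ∈ Out(V_j)} W_jk] · NR_j, iterated until the values stop changing. The split of C between the two terms, the normalisation by the outgoing weights of node j, C = 0.85, treating every edge as usable in both directions, and the function name `fault_influence` are all assumptions, not the applicant's published formula.

```python
def fault_influence(graph, importance, c=0.85, max_iter=1000, tol=1e-12):
    """Iterate the assumed weighted-PageRank form of NR_i until it converges.

    graph:      {j: {i: W_ji}} directed edges j -> i
    importance: {node: IMT}
    """
    nodes = set(graph) | set(importance)
    for nbrs in graph.values():
        nodes |= set(nbrs)
    nodes = sorted(nodes)
    n = len(nodes)
    # Denominator of the assumed normalisation: sum of W_jk over Out(V_j).
    out_sum = {j: sum(graph.get(j, {}).values()) for j in nodes}
    nr = {i: 1.0 / n for i in nodes}
    for _ in range(max_iter):
        # Left term: the node's own influence, (1 - C) * IMT_i / N.
        new = {i: (1 - c) * importance.get(i, 0.0) / n for i in nodes}
        # Right term: influence passed along each edge j -> i.
        for j in nodes:
            if out_sum[j] > 0:
                for i, w in graph.get(j, {}).items():
                    new[i] += c * w / out_sum[j] * nr[j]
        delta = max(abs(new[i] - nr[i]) for i in nodes)
        nr = new
        if delta < tol:
            break
    return nr


# Hypothetical star-shaped graph: node B is linked to A, C and D.
edges = {("B", "A"): 0.9, ("B", "C"): 0.6, ("B", "D"): 0.3}
graph = {}
for (u, v), w in edges.items():  # assumption: each edge works in both directions
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w
importance = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.5}
nr = fault_influence(graph, importance)
ranking = sorted(nr, key=nr.get, reverse=True)
```

In this toy graph B ranks first despite a lower importance than A, because it collects influence from all three neighbours; A comes second because it has the highest importance and the strongest edge to B.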
In one or more possible embodiments, as shown in fig. 5, the method further includes the following:
- mapping each KPI and each counter to a node;
- connecting the nodes of two associated KPIs through a first edge, whose weight is the association degree between the two KPIs;
- connecting the node of a KPI with the node of each counter that influences it through a second edge, whose weight is the contribution degree of the counter to the KPI, thereby obtaining a KPI association propagation graph.

The graph makes the connections between each KPI and the other KPIs and counters easier to see. The relations between nodes become concise and clear, so problems can be located quickly.
In one or more possible embodiments, predicting whether a key KPI is abnormal using a probabilistic statistical model, as shown in fig. 3, includes:
step 301, inputting all sampling values of any key KPI without a preset threshold value into a probability statistical model;
step 302, aggregating all sampling values into two classifications by using the probability statistical model, and determining abnormal classifications according to the number of sampling values of each classification;
step 303, determining whether the key KPI is abnormal according to the classification of the current sampling value of the key KPI.
In these embodiments, all sampling values of a key KPI that has no preset threshold are clustered into two classes. The abnormal class is identified from the number of sampling values in each class. The class of the current sampling value then decides whether the key KPI is abnormal. The criterion used to judge whether a key KPI is abnormal with the probability statistical model can be chosen according to specific service requirements.
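The patent does not name the probability statistical model. As one possible instance, the two-class clustering of steps 301 to 303 can be sketched with a one-dimensional, two-component Gaussian mixture fitted by expectation-maximization (EM). Several choices here are assumptions for illustration: the Gaussian mixture itself, initialising the two means at the minimum and maximum sample, the variance floor, and the function names. Following the text, the class with fewer samples is treated as abnormal, and the last sample is taken as the current sampling value.

```python
import math
from typing import List


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log-density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def fit_two_class_gmm(values: List[float], iters: int = 200,
                      min_var: float = 1e-6) -> List[int]:
    """Fit a 1-D two-component Gaussian mixture by EM; return a 0/1 label per sample."""
    n = len(values)
    mean = sum(values) / n
    var_all = max(sum((v - mean) ** 2 for v in values) / n, min_var)
    mu = [min(values), max(values)]  # start the two means at the extremes
    var = [var_all, var_all]
    weight = [0.5, 0.5]
    resp0 = [0.5] * n                # posterior probability of component 0
    for _ in range(iters):
        # E-step, computed in log space to avoid underflow.
        for i, v in enumerate(values):
            d = (math.log(weight[1]) + _log_normal(v, mu[1], var[1])
                 - math.log(weight[0]) - _log_normal(v, mu[0], var[0]))
            resp0[i] = 1.0 / (1.0 + math.exp(min(d, 700.0)))
        # M-step: re-estimate mixture weights, means and variances.
        n0 = sum(resp0)
        n1 = n - n0
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the current labels
        mu[0] = sum(r * v for r, v in zip(resp0, values)) / n0
        mu[1] = sum((1 - r) * v for r, v in zip(resp0, values)) / n1
        var[0] = max(sum(r * (v - mu[0]) ** 2 for r, v in zip(resp0, values)) / n0, min_var)
        var[1] = max(sum((1 - r) * (v - mu[1]) ** 2 for r, v in zip(resp0, values)) / n1, min_var)
        weight = [n0 / n, n1 / n]
    return [0 if r >= 0.5 else 1 for r in resp0]


def is_current_abnormal(samples: List[float]) -> bool:
    """Steps 301-303: cluster all samples, call the minority class abnormal,
    and check the class of the last (current) sample."""
    labels = fit_two_class_gmm(samples)
    counts = [labels.count(0), labels.count(1)]
    if counts[0] == counts[1]:
        return False  # no minority class; treated as normal (assumption)
    minority = 0 if counts[0] < counts[1] else 1
    return labels[-1] == minority
```

A real deployment would more likely use an established library implementation of Gaussian mixtures. The hand-written EM only keeps the sketch dependency-free.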
In one or more possible embodiments, the method further includes the following. A model evaluation index is determined from two sets of classifications in a historical time period: the key KPI anomaly classifications predicted by the probability statistical model for the input key KPIs, and the actual key KPI anomaly classifications. The model evaluation index is used to assess the accuracy of the probability statistical model, and whether the key KPI is abnormal is determined according to the result of that accuracy assessment.

Evaluation is more involved for anomaly detection than for ordinary classification. The main reason is that anomaly detection data sets are usually unbalanced: normal samples are plentiful and abnormal samples are scarce. Looking only at accuracy can therefore be misleading. If 90% of the test set is normal, a model that predicts every sample as normal still reaches 90% accuracy, yet it misses every anomaly, which is what we actually care about. The following approach can therefore be used to judge whether the anomaly detection results are correct. T/F indicates whether the prediction matches the actual state, and P/N indicates whether the prediction is positive or negative, giving the confusion matrix shown in the following table:
[Table: confusion matrix of predicted versus actual states (TP, FP, FN, TN); image not reproduced]
From this, the following formulas can be derived:
$$R = \mathrm{TPR} = \frac{TP}{TP + FN}$$
$$P = \frac{TP}{TP + FP}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
Recall R is also called the true positive rate (TPR); the larger its value, the better the performance. Precision P is the number of samples predicted abnormal that are actually abnormal, divided by the total number of samples predicted abnormal; the larger this ratio, the better the performance.
F1 is the harmonic mean of P and R; the larger its value, the better the performance. Anomaly detection therefore focuses mainly on the recall and precision indices. The average recall and average precision over N KPI anomaly detections are defined as follows:
$$\bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i$$
$$\bar{P} = \frac{1}{N}\sum_{i=1}^{N} P_i$$
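The evaluation indices can be computed as in the following sketch, which also reproduces the "90% accuracy" pitfall described above. The function names are hypothetical. The sketch treats "abnormal" (`True`) as the positive class, and it returns 0.0 whenever a denominator is zero; both are assumptions rather than conventions stated in the text.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(predicted: Sequence[bool],
                        actual: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 with abnormal (True) as the positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_recall_precision(
        per_kpi: List[Tuple[Sequence[bool], Sequence[bool]]]) -> Tuple[float, float]:
    """Average recall and average precision over N KPIs' (predicted, actual) pairs."""
    scores = [precision_recall_f1(pred, act) for pred, act in per_kpi]
    n = len(scores)
    return sum(s[1] for s in scores) / n, sum(s[0] for s in scores) / n
```

With 9 normal samples and 1 abnormal sample, predicting "normal" everywhere gives an accuracy of 0.9 but a recall of 0.0. This is why recall and precision are preferred here.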
In addition, because anomaly detection is performed only on key KPIs, detection efficiency can be greatly improved.
For example, using one week of actual KPI time-series data (2022-03-26 to 2022-04-01), the accuracy indices obtained according to the above definitions are as follows:
[Table: accuracy indices of key KPI anomaly detection for 2022-03-26 to 2022-04-01; image not reproduced]
These indices reflect the accuracy of the probability statistical model in detecting anomalies in key KPIs.
The following describes possible implementations of calculating the association degree between every two different KPIs and determining associated KPIs.
The association degree between every two different KPIs is calculated by normalized cross-correlation, and two KPIs whose association degree is greater than a set threshold are determined to be associated KPIs. Let $x_i$ be the sampling value of KPI x at time i and $y_i$ the sampling value of KPI y at time i. The association degree of KPIs x and y is then calculated as follows:
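$$r_{xy} = \frac{1}{n}\sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x\, s_y}$$
wherein:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
$$s_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$s_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}$$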
where n is the number of KPI samples acquired from the start of data acquisition up to time i. $\bar{x}$ and $\bar{y}$ are the means of the sampling values of KPI x and KPI y, respectively, over that period. $s_x$ and $s_y$ are the standard deviations of the sampling values of KPI x and KPI y, respectively.
For example, the association degree of the radio access success rate and the RRC connection establishment success rate is 0.97, as shown in fig. 6.
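A minimal sketch of the normalized cross-correlation and the threshold test follows. It assumes equal-length, time-aligned sample series and 1/n (population) statistics for the means and standard deviations. It also assumes an association degree of 0.0 when either series is constant. The function names and the threshold value in the example are hypothetical. As in the text, a pair counts as associated only when the degree is strictly greater than the threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple


def association_degree(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized cross-correlation of two equal-length KPI sample series."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_x = math.sqrt(sum((v - x_mean) ** 2 for v in x) / n)
    s_y = math.sqrt(sum((v - y_mean) ** 2 for v in y) / n)
    if s_x == 0 or s_y == 0:
        return 0.0  # a constant series carries no correlation information
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    return cov / (n * s_x * s_y)


def associated_pairs(kpis: Dict[str, Sequence[float]],
                     threshold: float) -> List[Tuple[str, str, float]]:
    """All KPI pairs whose association degree exceeds the threshold."""
    names = sorted(kpis)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = association_degree(kpis[names[i]], kpis[names[j]])
            if r > threshold:
                pairs.append((names[i], names[j], r))
    return pairs
```

For instance, `[1, 2, 3, 4]` and `[2, 4, 6, 8]` have an association degree of 1.0, while `[1, 2, 3, 4]` and `[1, 3, 2, 4]` have 0.8. With a threshold of 0.9, only the first pair is reported as associated.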
Possible embodiments for determining the contribution degree of each influence-factor counter to the KPI it affects are given below.
In one or more possible embodiments, determining the contribution of each counter to the KPI includes: taking each KPI at a plurality of moments as a vector y, and taking each counter affecting the KPI at the corresponding moment as a vector x to obtain a plurality of discrete points; and performing linear fitting on the discrete points by using a partial least square method to obtain a vector k in a linear regression curve y=kx, and determining the vector k as the contribution degree of the counter to the KPI.
The contribution degree between a KPI and its influence-factor counters is calculated by partial least squares (PLS). The calculation is based on the KPI's calculation formula and on prior knowledge of the related counters. The calculation formula of the RRC connection establishment success rate is:
RRC.SuccConnEstab/RRC.AttConnEstab*100%
From this formula, 2 counters can be obtained: RRC.SuccConnEstab, the number of successful RRC connection establishments, and RRC.AttConnEstab, the number of RRC connection establishment requests. Prior knowledge of the RRC connection establishment success rate gives 4 counters: rrc.failconnestab, rrc.failconnestab.failbase (air interface timer timeout), rrc.failconnestab.reject, and rrc.failconnestab.other. In the calculation flow, the KPI data is abstracted into a vector y, and each counter related to the KPI is abstracted into an independent variable x. Standard partial least squares is then trained to obtain the vector k in y = kx, and the contribution degree of each x is recorded from the k vector.
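The following is a minimal pure-Python sketch of the PLS step. It assumes a single KPI response (PLS1) fitted with the NIPALS algorithm on mean-centred data, so the intercept is absorbed by the centring instead of being fitted explicitly. The function name and the choice of NIPALS are assumptions, not the patent's stated implementation. The returned list plays the role of the vector k in y = kx, with one contribution degree per counter.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pls1_contributions(X: List[List[float]], y: List[float],
                       n_components: int) -> List[float]:
    """Return k with (y - mean(y)) ~ sum_j k[j] * (x_j - mean(x_j)), via PLS1/NIPALS.

    X[i][j] is counter j at moment i; y[i] is the KPI at moment i.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # deflated X
    f = [v - y_mean for v in y]                                # deflated y
    W: List[List[float]] = []
    P: List[List[float]] = []
    Q: List[float] = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # X^T y
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:
            break  # y is already fully explained
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                                # scores
        tt = _dot(t, t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
    # R = W (P^T W)^{-1}; P^T W is unit upper-triangular, so solve column by column.
    R: List[List[float]] = []
    for a, w in enumerate(W):
        r = list(w)
        for b in range(a):
            coef = _dot(P[b], w)
            r = [r[j] - coef * R[b][j] for j in range(p)]
        R.append(r)
    return [sum(R[a][j] * Q[a] for a in range(len(R))) for j in range(p)]
```

With as many components as linearly independent counters, PLS reduces to ordinary least squares. For example, data generated as y = 2·x1 + 3·x2 yields contributions close to `[2, 3]`. Using fewer components is what gives PLS its robustness when counters are strongly collinear.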
The following is a description of possible implementations for determining the importance of key KPIs.
The first importance evaluation index includes any one or more of the following: whether the KPI is a key KPI, whether it is an abnormal KPI, and first initial weights set respectively for different non-key KPIs. For example, the importance of a key KPI is calculated with the following formula:
$$\mathrm{IMT}_i = \alpha \times \mathrm{isKey}(i) + \beta \times \mathrm{isAnomaly}(i)$$
where isKey(i) indicates whether KPI i is a key KPI, taking the value 0.7 for a key KPI and 0.3 for a non-key KPI. isAnomaly(i) indicates whether KPI i has an abnormal value, taking the value 0.8 if it is abnormal and 0.2 otherwise. α is 0.6 and β is 0.4. For example, a key KPI that is abnormal has importance IMT = 0.6 × 0.7 + 0.4 × 0.8 = 0.74. When evaluating the importance of a non-key KPI, one can check whether it is associated with a key KPI. If so, whether the non-key KPI is abnormal is determined from the associated key KPI. Its first initial weight, that is, its importance, is then 0.6 × 0.3 + 0.4 × 0.8 = 0.5 or 0.6 × 0.3 + 0.4 × 0.2 = 0.26.
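The importance formula with the example values (α = 0.6, β = 0.4, isKey ∈ {0.7, 0.3}, isAnomaly ∈ {0.8, 0.2}) transcribes directly into code. The function names are hypothetical. Following the text, a non-key KPI takes its anomaly flag from the key KPI it is associated with. Treating a non-key KPI with no associated key KPI as not abnormal is an assumption.

```python
from typing import Optional


def kpi_importance(is_key: bool, is_anomaly: bool,
                   alpha: float = 0.6, beta: float = 0.4) -> float:
    """IMT_i = alpha * isKey(i) + beta * isAnomaly(i), using the example scores."""
    key_score = 0.7 if is_key else 0.3
    anomaly_score = 0.8 if is_anomaly else 0.2
    return alpha * key_score + beta * anomaly_score


def non_key_importance(associated_key_abnormal: Optional[bool]) -> float:
    """Importance of a non-key KPI, inheriting the anomaly flag of its associated
    key KPI; None (no associated key KPI) is treated as not abnormal."""
    return kpi_importance(False, bool(associated_key_abnormal))
```

`kpi_importance(True, True)` reproduces the 0.74 of the worked example. `non_key_importance(True)` and `non_key_importance(False)` give the 0.5 and 0.26 quoted for non-key KPIs.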
With the KPI-based root cause positioning method, faults can be located quickly, efficiently and accurately, and the specific KPIs and root causes of faults can be found. This shortens fault recovery time and improves operation and maintenance efficiency and user experience.
Based on the same inventive concept, the second aspect of the present application further provides a root cause positioning device based on KPI, as shown in fig. 7, including:
the association degree determining module 701 is configured to calculate, for all KPIs of each target cell, an association degree between each two different KPIs according to sampling values of the two different KPIs at the same time, and determine the KPIs with association;
the contribution degree determining module 702 is configured to determine, for each target cell, a contribution degree of each counter to each KPI based on sampling values of different moments of each KPI and each influence factor counter affecting the KPI at a corresponding moment;
an importance determining module 703, configured to determine the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI;
an influence determining module 704, configured to determine, for any target cell, a fault influence of any KPI based on importance of the KPI, association degrees corresponding to other KPIs associated with the KPI, and contribution degrees of counters affecting the KPI;
A fault determining module 705, configured to determine a root cause of the fault based on a fault impact of KPIs of each target cell.
Based on the same inventive concept, the third aspect of the present application further provides a root cause positioning device based on KPIs, the device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the KPI-based root cause localization methods as provided by the embodiments of the first aspect.
As shown in fig. 8, the device includes a processor 801, a memory 802, a communication interface 803, and a bus 804. The processor 801, the memory 802 and the communication interface 803 are connected to each other through the bus 804.
The processor 801 is configured to read and execute the instructions in the memory 802, so that the at least one processor can execute the KPI-based root cause positioning method provided in the foregoing embodiments.
The memory 802 is configured to store the instructions and programs of the KPI-based root cause positioning method provided in the foregoing embodiments.
The bus 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The processor 801 may be a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), or any combination of a CPU, an NP and a GPU. It may also be a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Based on the same inventive concept, the fourth aspect further provides a storage medium storing a computer program for causing a computer to execute any one of the KPI-based root cause localization methods as provided by the embodiments of the first aspect.
As shown in fig. 9, the memory may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
The memory may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (16)

1. A root cause positioning method based on KPIs, comprising:
for all KPIs of each target cell, calculating the association degree between every two different KPIs according to sampling values of the two KPIs in the same time period, and determining associated KPIs;
determining the contribution degree of each counter to the KPI based on sampling values of different moments of each KPI and each influence factor counter affecting the KPI at the corresponding moment;
Determining the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI;
for any target cell, determining the fault influence of any KPI based on the importance of the KPI, the association degrees corresponding to the other KPIs associated with the KPI, and the contribution degrees of the counters affecting the KPI;
determining the root cause of the fault based on the fault influence of the KPIs of each target cell.
2. The method of claim 1, wherein determining the root cause of the failure based on the failure impact of the KPIs of each target cell comprises:
sorting the KPIs of each target cell by fault influence, and determining the top n KPIs with the largest fault influence;
and, for each of the top n KPIs, sorting its counters by contribution degree, and determining the top m counters of each KPI as the root cause of the fault, wherein m and n are positive integers.
3. The method according to claim 1 or 2, further comprising:
determining the importance of each counter based on different second importance evaluation indexes of each counter of each target cell;
for any target cell, determining the fault influence of any counter based on the importance of the counter and the contribution degree of the counter to all KPIs influenced by the counter.
4. A method according to claim 3, wherein mapping each KPI and counter to a node respectively, establishing a connection between nodes corresponding to two KPIs with an association relationship, and a connection between nodes corresponding to KPIs with an influence relationship and counters, and determining a fault influence of the KPI/a fault influence of the counter, specifically includes:
for any node, determining the influence of the node according to the importance of the node and a first weight factor, and determining the influence given to the node by other nodes connected with the node according to the association degree/contribution degree and a second weight factor respectively corresponding to all other nodes connected with the node;
determining the fault influence of the node according to the influence of the node and the influence given to the node by other nodes connected with the node;
the first weight factor is the reciprocal of the number of all nodes, and the second weight factor is the fault influence of other nodes connected with the node.
5. The method as recited in claim 1, further comprising:
mapping each KPI and counter into nodes respectively;
connecting nodes corresponding to two KPIs with association relation through a first edge, wherein the weight of the first edge is the association degree between the two KPIs;
and connecting the node of a KPI and the node of each counter having an influence relationship with it through a second edge, to obtain a KPI association propagation graph, wherein the weight of the second edge is the contribution degree of the counter to the KPI.
6. The method as recited in claim 1, further comprising:
acquiring at least one key performance index KPI corresponding to each of a plurality of cells;
when determining that any acquired key KPI has a preset threshold, comparing the key KPI with a corresponding preset threshold, and determining whether the key KPI is abnormal;
when any acquired key KPI is determined to have no preset threshold value, predicting whether the key KPI is abnormal or not by adopting a probability statistical model;
and screening out target cells with abnormal key KPIs.
7. The method of claim 6, wherein predicting whether the key KPI is abnormal using a probabilistic statistical model comprises:
inputting all sampling values of any key KPI without a preset threshold value into a probability statistical model;
the probability statistical model is utilized to aggregate all sampling values into two classifications, and abnormal classifications are determined according to the number of sampling values of each classification;
and determining whether the key KPI is abnormal according to the classification of the current sampling value of the key KPI.
8. The method according to claim 6 or 7, further comprising:
and carrying out accuracy evaluation on the probability statistical model, and determining whether the key KPI is abnormal according to the accuracy evaluation result of the probability model.
9. The method of claim 1, wherein determining the contribution of each counter to the KPI comprises:
taking each KPI at a plurality of moments as a vector y, and taking each counter affecting the KPI at the corresponding moment as a vector x to obtain a plurality of discrete points;
and performing linear fitting on the discrete points by using a partial least square method to obtain a vector k in a linear regression curve y=kx, and determining the vector k as the contribution degree of the counter to the KPI.
10. The method according to claim 1 or 6, wherein the first importance assessment indicator comprises any one or more of:
whether the KPI is a key KPI, whether the KPI is an abnormal KPI, and first initial weights set respectively for different non-key KPIs.
11. The method according to claim 3, wherein the second importance evaluation index is a second initial weight set for each counter, or the weight of each counter is calculated based on whether the KPI associated with the counter is abnormal.
12. The method according to claim 1, wherein calculating the degree of association between each two different KPIs, determining the associated KPIs, comprises:
and calculating the association degree between every two different KPIs by a normalized cross-correlation method, and determining two KPIs with association degree larger than a set threshold as associated KPIs.
13. The method of claim 6, wherein the key KPIs comprise any one or more of the following:
radio access success rate, user equipment (UE) context establishment success rate, radio resource control (RRC) connection establishment success rate, and RRC re-establishment success rate.
14. Root cause positioning device based on KPI, characterized by comprising:
the association degree determining module is used for calculating, for all KPIs of each target cell, the association degree between every two different KPIs according to sampling values of the two KPIs at the same time, and determining associated KPIs;
the contribution degree determining module is used for determining the contribution degree of each counter to the KPI according to sampling values of different moments of the KPI and each influence factor counter affecting the KPI at the corresponding moment;
the importance determining module is used for determining the importance of each KPI of each target cell based on different first importance evaluation indexes of the KPI;
the influence determining module is used for determining, for any target cell, the fault influence of any KPI based on the importance of the KPI, the association degrees corresponding to the other KPIs associated with the KPI, and the contribution degrees of the counters affecting the KPI;
and the fault determining module is used for determining the root cause of the fault based on the fault influence of the KPIs of the target cells.
15. A KPI-based root cause positioning device, the device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the KPI-based root cause localization method according to any of claims 1-13.
16. A storage medium storing a computer program for causing a computer to perform the KPI-based root cause localization method according to any one of claims 1-13.
CN202310002419.XA 2023-01-03 2023-01-03 Root cause positioning method, device, equipment and storage medium based on KPI Pending CN116319255A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310002419.XA CN116319255A (en) 2023-01-03 2023-01-03 Root cause positioning method, device, equipment and storage medium based on KPI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310002419.XA CN116319255A (en) 2023-01-03 2023-01-03 Root cause positioning method, device, equipment and storage medium based on KPI

Publications (1)

Publication Number Publication Date
CN116319255A true CN116319255A (en) 2023-06-23

Family

ID=86824716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310002419.XA Pending CN116319255A (en) 2023-01-03 2023-01-03 Root cause positioning method, device, equipment and storage medium based on KPI

Country Status (1)

Country Link
CN (1) CN116319255A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117560706A (en) * 2024-01-12 2024-02-13 亚信科技(中国)有限公司 Root cause analysis method, root cause analysis device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination