WO2017118380A1 - Fingerprinting root cause analysis in cellular systems - Google Patents


Info

Publication number
WO2017118380A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
network
performance
anomaly
rules
Prior art date
Application number
PCT/CN2017/070156
Other languages
French (fr)
Inventor
Kai Yang
Yanjia SUN
Ruilin LIU
Jin Yang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to CN201780006162.8A priority Critical patent/CN108463973A/en
Priority to EP17735818.1A priority patent/EP3395012A4/en
Publication of WO2017118380A1 publication Critical patent/WO2017118380A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/16Threshold monitoring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/04Arrangements for maintaining operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/08Testing, supervising or monitoring using real traffic

Definitions

  • the performance of a cellular network is affected by a collection of factors such as the data and voice traffic load, the RF coverage, the level of inter-cell interference, the location of users, and hardware failures.
  • the performance of a few wireless cells within a cellular network may appear abnormal, and mobile users that are served by these cells will suffer from poor user experience. A poor user experience will give rise to customer dissatisfaction.
  • operators often need to detect the abnormal behaviors and then take actions to fix the problems.
  • operators rely on network experts to analyze the behavior of a particular cell to identify the root causes.
  • Traditional approaches to root cause analysis for wireless cellular networks are generally based on a correlation study or rely heavily on engineering knowledge. Such approaches are often heuristic in nature, and it is in general difficult to quantify their accuracy. These approaches are also very time-consuming: it may take a few hours, if not days, to identify the root causes of performance degradation.
  • One aspect comprises a processor implemented method of identifying a root cause of degraded network quality in a wireless network.
  • the method includes accessing historical network performance data, the performance data including a time sequenced measure of performance indicators for the network.
  • the method further includes evaluating the historical performance data to determine regularly occurring associations between indicators to define a set of rules characterizing the associations of the wireless network, and storing the set of rules in a data structure.
  • the method comprises monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data.
  • the method includes detecting an anomaly in a performance indicator in the analysis data and the anomaly is matched to at least one rule in the set of rules.
  • the method provides for outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
  • the evaluating comprises determining said regularly occurring associations between indicators using an associative learning algorithm.
  • the evaluating comprises applying one of an apriori algorithm, an FP-growth algorithm or an ECLAT algorithm and ranking an output of the evaluating by lift.
  • the method further includes determining one or more co-occurring anomalies in other performance indicators, and wherein the step of matching the anomaly to at least one rule includes accessing the data structure and determining a similarity between the anomaly, any co-occurring anomalies, and a rule in the data structure using a k-nearest neighbor algorithm.
  • the outputting comprises listing, for a performance indicator, one or more thresholds of an anomaly, and for each threshold, one or more root cause classifications and an indication of the amount of effect each of the one or more root cause classifications has on the network.
  • the historical data and the analysis data each include a set of quantified key quality indicators and key performance indicators.
  • the method further includes adjusting an element of the network to address a root cause identified by the outputting.
  • One general aspect includes a non-transitory computer-readable medium storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform the steps of: computing a set of rules characterizing associations between performance indicators of elements of the wireless network based on one or both of engineering data and historical network data, the associations reflecting a set of elements having an effect on at least one other element of the wireless network; monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data; detecting an anomaly in at least one performance indicator in the analysis data; detecting co-occurring anomalies to said anomaly; matching the anomaly and co-occurring anomalies detected to at least one rule in the set of rules; and outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
  • computing a set of rules comprises computer instructions to determine regularly occurring associations between indicators using processor implemented association learning.
  • computing a set of rules comprises computer instructions to execute one of an apriori algorithm, an FP-growth algorithm or an ECLAT algorithm on the historical data, and to rank an output of rules by a computed lift value.
  • computing a set of rules comprises instructions to store the set of rules in a data structure.
  • the computer instructions for matching the anomaly and co-occurring anomalies includes computer instructions to access the data structure and code configured to detect a similarity between a set including the anomaly and the co-occurring anomalies with a rule in the data structure.
  • outputting includes computer instructions for listing, for a performance indicator, one or more thresholds of an anomaly, and for each threshold, one or more root cause classifications and an indication of the amount of effect each of the one or more root cause classifications has on the network.
  • computing a set of rules characterizing associations between performance indicators of elements of the wireless network comprises computer instructions for comparing the historical data and the analysis data, wherein each includes a set of quantified key quality indicators and key performance indicators.
  • a mobile network monitoring system for a cellular network includes a processing system including at least one processor, storage coupled to the processor, and a network interface. Instructions are stored on the storage operable to instruct the at least one processor to access historical network performance data, the performance data including a time sequenced measure of performance indicators for the cellular network. The instructions are operable to instruct the at least one processor to compute a set of rules characterizing regularly occurring associations between a group of performance indicators, each rule based on one or both of engineering knowledge data and the historical performance data, each rule defining for the group of indicators a set of indicators whose performance affects one indicator in the group, and to store the set of rules in the storage.
  • the instructions are operable to instruct the at least one processor to monitor the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface.
  • the instructions are operable to instruct the at least one processor to detect an anomaly in at least one performance indicator in the analysis data received via the network interface, and detect other anomalies co-occurring in time with said anomaly.
  • the instructions are operable to instruct the at least one processor to match the anomaly and anomalies co-occurring in time to at least one rule in the set of rules, and output an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
  • the analysis data is monitored in real time.
  • the analysis data is accessed periodically.
  • a mobile network monitoring system for a cellular network.
  • the system comprises an accessing element that accesses historical network performance data, the performance data comprising a time sequenced measure of performance indicators for the cellular network; a computing element that computes a set of rules characterizing regularly occurring associations between a group of performance indicators, each rule based on one or both of engineering knowledge data and the historical performance data, each rule defining for the group of indicators a set of indicators whose performance affects one indicator in the group, and stores the set of rules in the storage; a monitoring element that monitors the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface; a detecting element that detects an anomaly in at least one performance indicator in the analysis data received via the network interface, and detects other anomalies co-occurring in time with said anomaly; a matching element that matches the anomaly and anomalies co-occurring in time to at least one rule in the set of rules; and an outputting element that outputs an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
  • Figure 1 depicts functional and structural components of an exemplary system in which the present system and method may be implemented.
  • Figure 2 is a flowchart illustrating a method for creating a fingerprint database in accordance with the system and method.
  • Figure 3 is a flowchart illustrating a method for analyzing a network utilizing the fingerprint database.
  • Figure 4 is a table illustrating a series of rules stored in the fingerprint database in accordance with the system and method.
  • Figure 5 is a graph illustrating the effectiveness of matching one or more performance indicators to a rule.
  • Figure 6A is an illustration of data from a cell compared with the output of a rule defining a key performance indicator relationship.
  • Figure 6B is an exemplary output report for the data illustrated in Figure 6A.
  • FIG. 7 is a block diagram of a processing device suitable for implementing the system and method.
  • a system and method are disclosed to allow discovery of root causes of network quality issues.
  • the system and method use association rule learning to create a fingerprint database for root cause analysis of the degraded user experiences with cellular wireless networks.
  • a fingerprint database is built using historical performance data comprising engineering knowledge and/or data mining of patterns from historic network data. Once the fingerprint database is built, the system and method can monitor for anomalies in analysis performance data comprising key performance indicators (KPIs) and key quality indicators (KQIs) of the network. If an anomaly is detected, a co-occurrence analysis is used to identify abnormal patterns in other key quality and performance indicators that happen simultaneously. The determined co-occurring anomalies in indicators are then matched to the fingerprint database to find the associated potential root causes of the quality issue. The matching can be performed by comparing the identified abnormal patterns with the records in the historic knowledge database using a similarity measure.
  • the analysis system and method may be implemented as part of a network management system which allows engineers to adjust parameters of a network based on the output provided by the system. It should be understood that the present system and method may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the system and method to those skilled in the art. Indeed, the system and method is intended to cover alternatives, modifications and equivalents of these embodiments, which are included within the scope and spirit of the system and method as defined by the appended claims. Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the system and method. However, it will be clear to those of ordinary skill in the art that the present system and method may be practiced without such specific details.
  • Figure 1 depicts functional and structural components of an embodiment of the system which performs root cause analysis.
  • Figure 1 includes a network 100 which is the subject network to be monitored using the system. Although only one network is illustrated, multiple networks may be monitored, each having their own fingerprint database constructed based on such network’s historical network data and engineering data.
  • the network 100 may comprise any wired or wireless network that provides communication connectivity for devices.
  • the network 100 may include various cellular network and packet data network components such as a base transceiver station (BTS), a node-B, a base station controller (BSC), a radio network controller (RNC), a serving GPRS support node (SGSN), a gateway GPRS support node (GGSN), a WAP gateway, a mobile switching center (MSC), a short message service center (SMSC), a home location register (HLR), a visitor location register (VLR), an Internet protocol multimedia subsystem (IMS), and/or the like.
  • the network 100 may employ any of the known and available communication protocols, such as Code Division Multiple Access (CDMA), Global System for Mobile communications (GSM), Universal Mobile Telecommunications System (UMTS), Long Term Evolution (LTE), or any other network protocol that facilitates communication between the communication network 100 and network enabled devices.
  • the network 100 may include other types of devices and nodes for receiving and transmitting voice, data, and combination information to and from radio transceivers, networks, the Internet, and other content delivery networks.
  • the network may support communication from any portable or non-portable communication device having a network connectivity function, such as a cellular telephone, a computer, or a tablet, which can operatively connect to the communication network 100.
  • KQI: Key Quality Indicator; QoS: Quality of Service; KPI: Key Performance Indicator.
  • KPIs are internal indicators based on time-referenced network counters. Such KPIs are evaluated in the context of other counters and related to KQIs.
  • Each KPI and KQI is a time-referenced measure of the particular indicator. Variations in each KPI and KQI can be tracked to a particular time indication.
  • Network KPIs may be measured and monitored using defined standard interfaces in the wireless network. These KPIs include multiple network performance counters and timers. For example, in a mobile data service network, the service accessibility may be determined through the Packet Data Protocol (PDP) Context Activation Success Rate KPI, which may be an aggregated ratio of the successful PDP context activations to PDP context attempts. This KPI indicates the ability of the mobile subscriber to access the packet switched service.
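The PDP Context Activation Success Rate KPI described above can be sketched as a simple aggregation. This is a minimal illustration; the function name and the zero-attempt convention are assumptions, not taken from the patent.

```python
def pdp_activation_success_rate(successes: int, attempts: int) -> float:
    """Aggregated ratio of successful PDP context activations to attempts.

    Returned as a percentage. Zero attempts yields 0.0 to avoid division
    by zero (a convention assumed here for illustration).
    """
    if attempts == 0:
        return 0.0
    return 100.0 * successes / attempts

# e.g. 4,850 successful activations out of 5,000 attempts -> 97.0 percent
rate = pdp_activation_success_rate(4850, 5000)
```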
  • a customer utilizing a mobile device that communicates with a web server via a communication network 100 will have a perceived performance experience.
  • a network may include a large number of physical sub-systems, and network components, making problem identification, analysis or resolution difficult tasks.
  • a customer may experience an average download throughput rate of 1 Mbps during a time where a peak number of customer service sessions are being handled by the network, and a throughput rate of 2 Mbps otherwise.
  • root causes can be identified using the analysis system herein.
  • one example KQI is the HTTP Large Page Display Rate, measured in kbps, which is a metric of a web browsing session (shown as HTTP_Page_Large_Display_Rate in Figure 6A).
  • an anomaly in this rate can be correlated to an anomaly in a KPI (Total_DLPS_Traffic_Bits in Figure 6A), and these can be used to determine the root cause of this problem in the network.
  • a network monitoring system 150 may include a database processor 152, a fingerprint database 155, a network monitor 160, and anomaly detector 165, an anomaly and root cause analyzer 170 and an output generator 180.
  • a network monitoring system 150 may be implemented in a computer system comprising one or more computing devices of various types. One example of such a computing system is illustrated in Figure 7.
  • Network monitoring system 150 may be a discrete system, or it may be integrated within other systems including the systems and components within the communication network 100.
  • Database processor 152 performs association rule learning on historical network data 110 and engineering data 120 to create the fingerprint database 155.
  • the historical network data 110 comprises historical network performance data as characterized by the KPIs and KQIs available for the network and is sequenced in time.
  • the database provides a set of rules reflecting the relationships between KPIs and KQIs that influence network performance.
  • a method performed by the database processor is illustrated in Figure 2.
  • the database processor may be enabled by code operable to instruct a processing device to perform the method of Figure 2 or by processing specific hardware adapted to implement the learning algorithms discussed herein programmatically. Hence, the creation of the fingerprint database is performed in an automated fashion once access to historical network data 110 and engineering data is provided.
  • the fingerprint database 155 is utilized by the anomaly detector 165 and analyzer 170 to determine which factors may contribute to network quality issues, and thereby identify possible root causes of network quality issues.
  • a small example of the rules in the fingerprint database is illustrated in the data structure shown in Figure 4. The creation and use of the fingerprint database 155 is discussed further below.
  • the fingerprint database 155 may be updated periodically as new historical network data 110 or engineering data 120 is provided for the network 100.
  • the network monitor 160 accesses various components of network 100 to monitor analysis data for defined KPI and KQI data in real time and/or periodically. Anomalies in the data accessed by the network monitor 160 are then analyzed by analyzer 170 when the anomaly detector 165 detects an anomaly. Alternatively, anomaly detector 165 and/or analyzer 170 may periodically analyze stored data in addition to or instead of analyzing data in real time.
  • the anomaly detector 165 compares the historical data for KPIs to detect variations in KPI and KQI analysis data. Each KPI and KQI has a normal range of operational values which can be defined. When a data outlier in KPI or KQI analysis data occurs, an anomaly may be indicated. When a KQI or KPI anomaly is detected, the anomaly detector 165 may further perform a co-occurrence analysis to search other KPIs and KQIs and determine whether simultaneous or near-simultaneous anomalies have also occurred in them. These detected, co-occurring anomalies are then analyzed by a matching algorithm in the analyzer 170 relative to data retrieved from the fingerprint database 155 to determine likely root causes of the anomalies detected.
  • a detection cycle is triggered during which a fragment of analysis data is received. Time stamps of the data fragment are recorded.
  • the anomaly detector calculates the variance in any data point according to minimum threshold variations defined for each parameter. Such variations may be determined from historical data for each parameter.
  • the system may include a set of default thresholds for each monitored parameter, with such thresholds being adjustable by an operator of the network monitoring system 150.
  • a sample data fragment is illustrated in Figure 6A.
  • the analyzer 170 performs a matching analysis as described below.
  • anomalies in the KPI and KQI data analysis data which are co-occurring in time are matched to learned rules in the fingerprint database to identify potential root causes of network issues.
  • the matching between the KPI data, KQI data and the fingerprint database may in one embodiment be performed by comparing the identified abnormal patterns with rules derived from records in the historic knowledge database under a given similarity measure.
  • One example is the k-nearest neighbors (KNN) algorithm.
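The similarity-based matching described above can be sketched as follows. The patent names KNN but does not fix a distance measure, so this sketch assumes binary anomaly patterns compared to rule LHS indicator sets with Jaccard similarity; all rule names and root-cause labels are invented for illustration.

```python
from collections import Counter

def knn_match(anomaly_pattern: set, fingerprint_rules: list, k: int = 3):
    """Match a set of co-occurring anomalous indicators to fingerprint rules.

    fingerprint_rules: list of (lhs_indicator_set, root_cause) pairs.
    Similarity is the Jaccard overlap between the observed anomaly set and
    a rule's LHS indicator set (one plausible similarity measure).
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    # Rank rules by similarity to the observed anomaly pattern.
    scored = sorted(fingerprint_rules,
                    key=lambda r: jaccard(anomaly_pattern, r[0]),
                    reverse=True)
    neighbors = scored[:k]
    # Majority vote over the k nearest rules' root-cause labels.
    votes = Counter(cause for _, cause in neighbors)
    return votes.most_common(1)[0][0]

rules = [({"KPI1", "KPI2"}, "congestion"),
         ({"KPI2", "KPI3"}, "congestion"),
         ({"KPI4"}, "hardware_failure")]
cause = knn_match({"KPI1", "KPI2", "KPI3"}, rules)  # -> "congestion"
```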
  • FIG. 2 is a flowchart illustrating a method for creating a fingerprint database 155.
  • historical KPI data, and quantifiable KQI data if available, are accessed.
  • engineering knowledge is accessed. Engineering knowledge can comprise rules defined by one or more network engineers based on network knowledge which define the relationships between performance indicators and quality indicators.
  • step 210 is skipped, and the fingerprint database is created solely from engineering data knowledge.
  • step 220 is skipped and the fingerprint database is created solely from historical data.
  • both engineering knowledge at 220 and historical data at 210 are utilized to create the fingerprint database.
  • the historical data is analyzed using association rule learning to create a database of regularly occurring associations in the data. For example, a regularly occurring spike in DLPS Traffic Bits (a KPI) may correlate with drops in Large Page Display Rate Throughput (a KQI). This association is seen in the exemplary data shown in Figure 6A below.
  • Association rule learning is a method for discovering interesting relations between variables in large databases.
  • An association rule is expressed as “If X occurs, then Y occurs.” Every rule is composed of two different sets of items X and Y, where X is the antecedent or left hand side (LHS) and Y is the consequent or right hand side (RHS).
  • X defines the set of items of KPI for which a consequence or KQI occurs.
  • as an example, a rule found in the sales data of a supermarket might indicate that if a customer buys diapers and milk together, they are likely to also buy beer.
  • constraints on various measures of significance and interest may be used.
  • the best-known constraints are minimum thresholds on support and confidence.
  • let X be an item-set, X ⇒ Y an association rule, and T a set of transactions of a given database.
  • the support value of X (supp (X) ) with respect to T is defined as the proportion of transactions in the database which contain the item-set X.
  • supp (X ∪ Y) is the joint probability of finding both X and Y together in a random basket. For example, if the item-set {diaper, milk, beer} occurs in 1 out of 5 transactions, it has a support of 0.2 since it occurs in 20% of all transactions.
  • the argument of supp () is a set of preconditions, and thus becomes more restrictive as it grows.
  • the confidence value of a rule, with respect to a set of transactions T, is the proportion of the transactions containing X which also contain Y; in other words, the probability of finding Y in a basket given that the basket already contains X: conf (X ⇒ Y) = supp (X ∪ Y) / supp (X).
  • supp (X ⁇ Y) means the support of the union of the items in X and Y.
  • confidence can thus be read as a conditional probability: the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
  • the Lift of a rule is defined as: lift (X ⇒ Y) = supp (X ∪ Y) / (supp (X) × supp (Y) ).
  • Lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole) , measured against a random choice targeting model.
  • a targeting model is doing a good job if the response within the target is much better than the average for the population as a whole.
  • Lift is simply the ratio of these values: target response divided by average response.
  • association rules may be analyzed to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time.
  • a minimum support threshold is applied to find all frequent item-sets in a database.
  • a minimum confidence constraint is applied to these frequent item-sets in order to form rules.
  • in one embodiment, a lift of 1.5 is set as a threshold, and all rules calculated to have a lift of less than 1.5 are discarded. In another embodiment, a higher or lower threshold may be set. The minimum meaningful lift is 1.0, since a lift of 1.0 indicates that the LHS and RHS are independent.
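A minimal sketch of the support, confidence, and lift computations defined above, reproducing the {diaper, milk, beer} example with a support of 0.2. The five toy baskets are invented for illustration.

```python
def supp(itemset: frozenset, transactions: list) -> float:
    """Proportion of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs: frozenset, rhs: frozenset, transactions: list) -> float:
    """P(RHS in basket | LHS in basket) = supp(LHS ∪ RHS) / supp(LHS)."""
    return supp(lhs | rhs, transactions) / supp(lhs, transactions)

def lift(lhs: frozenset, rhs: frozenset, transactions: list) -> float:
    """supp(LHS ∪ RHS) / (supp(LHS) * supp(RHS)); 1.0 means independence."""
    return supp(lhs | rhs, transactions) / (
        supp(lhs, transactions) * supp(rhs, transactions))

# Five toy baskets; {diaper, milk, beer} appears in 1 of 5 -> support 0.2,
# matching the worked example in the text.
T = [frozenset(t) for t in ({"diaper", "milk", "beer"}, {"diaper", "milk"},
                            {"milk"}, {"beer", "bread"}, {"bread"})]
s = supp(frozenset({"diaper", "milk", "beer"}), T)                     # 0.2
c = confidence(frozenset({"diaper", "milk"}), frozenset({"beer"}), T)  # 0.5
l = lift(frozenset({"diaper", "milk"}), frozenset({"beer"}), T)        # 1.25
```

Under the 1.5 lift threshold discussed above, this example rule (lift 1.25) would be discarded.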
  • association rule learning is performed by calculating relations using any one of the Apriori, ECLAT or FP-Growth algorithms.
  • an Apriori algorithm is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation) where each transaction is seen as a set of items (an itemset) .
  • given a threshold C, the Apriori algorithm identifies the item sets which are subsets of at least C transactions in the database. Frequent subsets are extended one item at a time (candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
  • Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k − 1, then prunes the candidates which have an infrequent sub-pattern. The algorithm terminates when frequent itemsets cannot be extended any more. However, it has to generate a large number of candidate itemsets and scans the data set as many times as the length of the longest frequent itemset.
  • the Apriori algorithm can be expressed in pseudocode, where Ck denotes the k-th candidate itemsets and Lk the k-th frequent itemsets.
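Since the pseudocode itself is not reproduced here, the following is a compact illustrative implementation using the Ck/Lk notation above; it is a sketch of the textbook algorithm, not the patent's own pseudocode.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets with support >= min_support.

    Ck holds the k-th candidate itemsets and Lk the k-th frequent
    itemsets, mirroring the notation in the text.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # L1: frequent single items.
    items = {i for t in transactions for i in t}
    Lk = {frozenset({i}) for i in items if support(frozenset({i})) >= min_support}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Candidate generation: join L(k-1) with itself to length-k sets.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset, then count.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = {c for c in Ck if support(c) >= min_support}
        frequent |= Lk
        k += 1
    return frequent

freq = apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 0.5)
```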
  • in a first pass, the FP-Growth (Frequent Pattern growth) algorithm counts the occurrences of items (attribute-value pairs) in the dataset and stores them in a 'header table'.
  • in the second pass, it builds the FP-tree structure by inserting instances. Items in each instance are sorted in descending order of their frequency in the dataset, so that the tree can be processed quickly. Items in each instance that do not meet the minimum coverage threshold are discarded. If many instances share their most frequent items, the FP-tree provides high compression close to the tree root.
  • recursive processing of this compressed version of the main dataset grows large item sets directly, instead of generating candidate items and testing them against the entire database. Growth starts from the bottom of the header table (having the longest branches), by finding all instances matching a given condition. A new tree is created, with counts projected from the original tree corresponding to the set of instances that are conditional on the attribute, each node getting the sum of its children's counts. Recursive growth ends when no individual items conditional on the attribute meet the minimum support threshold, and processing continues on the remaining header items of the original FP-tree. Once the recursive process has completed, all large item sets with minimum coverage have been found, and association rule creation begins.
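The two passes described above can be sketched as follows. This builds only the FP-tree and header table; the recursive pattern-growth mining step is omitted for brevity, and all names are illustrative assumptions.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_count):
    """Two-pass FP-tree construction (tree building only).

    Pass 1 counts item occurrences into header-table counts; pass 2
    inserts each transaction with items sorted by descending global
    frequency, discarding items below min_count.
    """
    # Pass 1: global item counts ("header table" counts).
    counts = Counter(i for t in transactions for i in t)
    keep = {i for i, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = {}  # item -> list of nodes, used for traversal during mining
    for t in transactions:
        # Pass 2: sort surviving items by descending frequency (ties
        # broken by name for determinism) so shared prefixes compress
        # near the root.
        items = sorted((i for i in t if i in keep),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = FPNode(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
    return root, header

root, header = build_fp_tree(
    [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}], min_count=2)
```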
  • step 220 may be performed using an Equivalence Class Transformation (ECLAT) algorithm.
  • the ECLAT algorithm uses tidset intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree.
  • the ECLAT algorithm is defined recursively.
  • the initial call uses all the single items with their tidsets.
  • the function IntersectTidsets verifies each itemset-tidset pair ⟨X, t (X) ⟩ against all the other pairs ⟨Y, t (Y) ⟩ to generate new candidates N_xy. If a new candidate is frequent, it is added to the set P_x. Then, recursively, it finds all the frequent itemsets in the X branch.
  • the algorithm searches in a depth-first search manner to find all the frequent sets.
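The tidset-intersection idea above can be sketched recursively as follows. Variable names loosely follow the description (N_xy candidates, P_x branch sets); this is an assumption-laden illustration, not a reference implementation.

```python
def eclat(transactions, min_support):
    """Depth-first ECLAT: support computed via tidset intersection.

    Each itemset X is paired with its tidset t(X), the ids of the
    transactions containing X; supp(X ∪ Y) = |t(X) ∩ t(Y)| / n, so no
    database rescan is needed when extending a prefix.
    """
    n = len(transactions)
    # Initial call: every single item with its tidset.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(frozenset({item}), set()).add(tid)

    frequent = {}  # itemset -> support

    def recurse(pairs):
        for i, (X, tX) in enumerate(pairs):
            frequent[X] = len(tX) / n
            # Intersect with the remaining pairs to form candidates N_xy;
            # frequent candidates are collected into P_x for the X branch.
            Px = []
            for Y, tY in pairs[i + 1:]:
                N_xy, tN = X | Y, tX & tY
                if len(tN) / n >= min_support:
                    Px.append((N_xy, tN))
            if Px:
                recurse(Px)

    initial = sorted(((X, t) for X, t in tidsets.items()
                      if len(t) / n >= min_support),
                     key=lambda p: sorted(p[0]))
    recurse(initial)
    return frequent

freq = eclat([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_support=2 / 3)
```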
  • the fingerprint database is built programmatically using any of the aforementioned methods or their variations by the database processor 152.
  • Figure 4 is a table illustrating a portion of a data structure containing a series of five rules stored in the fingerprint database.
  • a first or LHS column illustrates a set of KPIs which may be present in analyzed data, while the RHS column illustrates the result (effect) on the factor in the RHS column when the set of items in the LHS of the associative rule is present.
  • Support, confidence and Lift values for each of the 5 rules illustrated are also shown.
  • Each of the rules is defined for a video initiation duration factor (Video_Init_Duration) .
  • the exemplary table was generated as a portion of rules from historical data on a 200 cell network.
  • KPI1 MeanTotalTcpUtilityRatio
  • KPI2 TotalDLPSTrafficBits
  • KPI3 RLC_AM_Disc_HsdpaTrfPDU
  • Video_Init_Duration is High (>4.91s) .
  • Figure 5 is a graph 502 illustrating the effectiveness of matching one or more performance indicators to a rule. The graph shows how combining the factors on the left hand side (LHS) of a rule increases the accuracy of predicting the effect on the KQI factor on the right hand side (RHS). Graph 502 compares the respective effectiveness of evaluating all factors, one factor alone, two factors, and three factors.
• if a matching determines that all three KPI anomalies are satisfied, then there is a 93 percent chance that the video initiation duration is bad.
  • a selection of three factors is analyzed. In other embodiments, more or fewer factors may be utilized in each rule.
  • association rules are stored in the fingerprint database 155 at step 240.
  • Figure 3 is a flowchart illustrating a method for analyzing a network utilizing the fingerprint database. The method of Figure 3 may be performed in real time or periodically as a method of monitoring analysis data from a network.
  • KPI and KQI analysis data is accessed.
  • the analysis data may include the same, more or fewer performance indicators than the historical data 120.
  • the analysis data may be accessed in real time by the network monitor.
  • KPI and KQI data may be accessed and analyzed periodically at 310.
  • the analysis data is monitored for anomalies. Examples of a KQI anomaly and a KPI anomaly are illustrated at 610 and 620 in Figure 6A. Monitoring may be performed by receiving analysis network data via the network interface of Figure 1.
  • an anomaly is detected at 320 when a data point in the analysis data varies in excess of a minimum threshold variation defined for each parameter.
  • the system may include a set of default thresholds for each monitored parameter, with such thresholds being adjustable by an operator of the network monitor 150.
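A minimal sketch of the threshold test at step 320, assuming an anomaly is a sample that deviates from the series mean by more than a per-parameter threshold (the actual thresholds and deviation measure are operator-configurable):

```python
def detect_anomalies(series, threshold):
    """Return indices of samples deviating from the series mean by more
    than `threshold`, expressed in the indicator's own units."""
    mean = sum(series) / len(series)
    return [i for i, value in enumerate(series) if abs(value - mean) > threshold]
```

For a series [10, 10, 10, 10, 50] with a threshold of 20, only the final sample (index 4) is flagged as anomalous.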
  • two anomalies are illustrated for each of HTTP_PageLargeDisplayRate KQI and a TotalDLPSTrafficBit KPI.
• Step 330 determines simultaneous or near-simultaneous occurrences of anomalies in the analysis data.
• the anomalies are data points for any parameter having a greater-than-threshold deviation from the average data value for the parameter. Note that the two anomalies for the KQI HTTP_PageLargeDisplayRate correspond simultaneously in time with the two anomalies for TotalDLPSTrafficBit.
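Step 330 can be sketched as grouping anomaly timestamps across indicators; `window` is an assumed parameter controlling how close in time two anomalies must be to count as near-simultaneous:

```python
def co_occurrences(anomaly_times, window=0):
    """anomaly_times: dict mapping indicator name -> list of anomaly times.

    Returns a dict mapping each time at which two or more indicators are
    anomalous (within `window` time units) to the set of those indicators."""
    all_times = sorted({t for times in anomaly_times.values() for t in times})
    result = {}
    for t in all_times:
        nearby = {name for name, times in anomaly_times.items()
                  if any(abs(u - t) <= window for u in times)}
        if len(nearby) > 1:  # keep only co-occurring anomalies
            result[t] = nearby
    return result
```

Applied to a situation like Figure 6A, where both indicators spike at the same two instants, the sketch maps both instants to the pair of indicators, while a lone anomaly in a third indicator is ignored.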
  • the analyzer 170 performs step 340 using a matching by a k-nearest neighbor (KNN) algorithm.
  • the KNN algorithm stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions) .
• a distance measure such as Euclidean or Hamming distance may be used.
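A sketch of the matching at step 340, encoding each rule's LHS and the observed anomalies as binary vectors and ranking fingerprints by Hamming distance; the root-cause labels and the value of k are illustrative:

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def match_rules(observed, fingerprints, k=1):
    """observed: binary tuple, one slot per monitored indicator (1 = anomalous).

    fingerprints: dict mapping a root-cause label to the rule's LHS pattern.
    Returns the k labels whose patterns are nearest to the observation."""
    ranked = sorted(fingerprints,
                    key=lambda label: hamming(observed, fingerprints[label]))
    return ranked[:k]
```

With fingerprints {"congestion": (1, 1, 0), "coverage": (0, 1, 1)}, an observed pattern (1, 1, 0) is matched to the congestion fingerprint.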
• the output may take many forms of reporting or user interfaces designed to provide a network engineer with an indication of a root cause of the anomaly.
• One example of an output is shown in Figure 6B, though numerous alternatives for output reports are contemplated.
• Figure 6A is an illustration of data from a cell compared with the output of a rule defining a key performance indicator relationship.
• Four data sources (two KQIs and two KPIs) are illustrated on the left hand side of Figure 6A.
  • the chart on the left shows two outliers in the HTTP_PageLargeDisplayRate KQI and a Total DLPS Traffic Bit KPI.
• Figure 6B illustrates one possible output interface generated at step 350.
• root causes can be grouped into different categories. Examples of categories of root causes include high traffic load, poor RF coverage, and hardware failure.
• the root causes may be classified as congestion (too many users or inadequate capacity for a given cell in a network) and coverage issues (inadequate signal strength or reach).
• An output may be generated characterizing the relationship of the key performance indicator to categories of root causes, as in Figure 6B.
• the threshold change of the performance indicator is provided in the first column, and the contribution of each root cause for the cell data in Figure 6B is listed in terms of congestion, coverage and both causes (mixed).
• a threshold for degradation in any one or more KPIs may be set, and the output shown in Figure 6B provides, for each percentage, how much of the degradation is attributable to congestion, how much to coverage, and how much to mixed factors.
• a user interface may provide an alert of one or more quality or performance indicators experiencing an anomaly, with the interface providing a facility to provide further information on the root cause (or potential root causes ordered by confidence or lift).
  • engineering knowledge stored in database 120 as engineering data may be utilized to classify root causes.
  • engineering data may be a known characterization of a network problem based on human knowledge of the construction of the network or previous events.
• Engineering knowledge of a particular cell's capacity may be linked in a rule associating a detected problem in traffic congestion with that cell.
• if a performance counter in analysis data returns an anomaly for a particular cell, engineering data for that cell may reflect an associated rule indicating a limited capacity for that cell as a potential root cause of the anomaly.
• categories of root causes can be classified into workable identifications for network engineers. This classification makes reporting of root causes more efficient.
  • FIG. 7 is a block diagram of a processing device suitable for implementing the system and method.
  • the computing system 702 may include, for example, a processor 710, random access memory (RAM) 720, non-volatile storage 730, a display unit (output device) 750, an input device 760, and a network interface device 740.
  • the computing system 702 may be embedded into a personal computer, mobile computer, mobile phone, tablet, or other suitable processing device.
• Illustrated in non-volatile storage 730 are functional components which may be implemented by instructions operable to cause processor 710 to implement one or more of the processes described herein. While illustrated as part of non-volatile storage 730, such instructions may operate to cause the processor to perform the various processes described herein using any one or more of the hardware components illustrated in Figure 7.
  • Non-volatile storage 730 may comprise any combination of one or more computer readable media.
  • the computer readable media may be a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer system 702 can include a set of instructions that can be executed to cause computer system 702 to perform any one or more of the methods or computer based functions disclosed herein.
• Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language and conventional procedural programming languages.
  • the program code may execute entirely on the computer system 702, partly on the computer system 702, as a stand-alone software package, partly on the computer system 702 and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service.
  • the computing system 702 includes a processor 710.
  • a processor 710 for computing system 702 is configured to execute software instructions in order to perform functions as described in the various embodiments herein.
  • a processor 710 for a computing system 702 may be a general purpose processor or may be part of an application specific integrated circuit (ASIC) .
  • a processor 710 for a computing system 702 may also be a microprocessor, a microcomputer, a processor chip, a controller, a microcontroller, a digital signal processor (DSP) , a state machine, or a programmable logic device.
  • a processor 710 for a computing system 702 may also be a logical circuit, including a programmable gate array (PGA) such as a field programmable gate array (FPGA) , or another type of circuit that includes discrete gate and/or transistor logic.
  • a processor 710 for a computing system 702 may be a central processing unit (CPU) , a graphics processing unit (GPU) , or both. Additionally, any processor described herein may include multiple processors, parallel processors, or both. Multiple processors may be included in, or coupled to, a single device or multiple devices.
• the computing system 702 includes a RAM 720 and a non-volatile storage 730 that can communicate with each other, and with the processor 710, via a bus 708. Illustrated in the non-volatile storage 730 are the following components: a database creator 731 which may be utilized to create the fingerprint database 155, a network monitor 732 which may be utilized by the processor to create the network monitor 160 of Figure 1, an anomaly detector 734 which may be utilized by the processor to create the anomaly detector 165 of Figure 1 to detect data anomalies and perform co-occurrence analysis, an analyzer 736 which may be utilized to create the analyzer 170 of Figure 1, and a user interface generator 738 which may be utilized to generate any of the output reports discussed herein. Each of the components may comprise instructions capable of causing the processor 710 to execute steps to perform the methods discussed herein.
  • the computing system 702 may further include a display unit (output device) 750, such as a liquid crystal display (LCD) , an organic light emitting diode (OLED) , a flat panel display, a solid state display, or a cathode ray tube (CRT) .
• the computing system 702 may include an input device 760, such as a keyboard/virtual keyboard or touch-sensitive input screen or speech input with speech recognition, and which may include a cursor control device, such as a mouse or touch-sensitive input screen or pad.
  • Memories described herein are tangible storage mediums that can store data and executable instructions, and are non-transitory during the time instructions are stored therein.
  • a memory described herein is an article of manufacture and/or machine component.
• Memories described herein are computer-readable mediums from which data and executable instructions can be read by a computer.
  • Memories as described herein may be random access memory (RAM) , read only memory (ROM) , flash memory, electrically programmable read only memory (EPROM) , electrically erasable programmable read-only memory (EEPROM) , registers, a hard disk, a removable disk, tape, compact disk read only memory (CD- ROM) , digital versatile disk (DVD) , floppy disk, Blu-ray disk, or any other form of storage medium known in the art.
  • Memories may be volatile or non-volatile, secure and/or encrypted, unsecure and/or unencrypted.
  • These computer program instructions may also be stored in a computer readable medium that when executed can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions which when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • Such computer readable media specifically excludes signals.
  • the computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • the subject matter herein advantageously provides a processor implemented method of identifying a root cause of degraded network quality in a wireless network.
  • the method accesses historical network performance data comprising a time sequenced measure of performance indicators for the network.
  • the historical data is evaluated using machine implemented association learning to determine regularly occurring associations between indicators and thereby define a set of rules characterizing the associations of the wireless network.
  • the rules are stored as a set of rules in a data structure.
• the method monitors the wireless network by accessing analysis data reporting time sequenced performance indicator data. Anomalies in monitored performance indicators are then detected in the analysis data, and matched to at least one rule in the set of rules. An indication of a root cause of degradation in the wireless network resulting from the anomaly in the performance indicator is then output.
  • the method disclosed herein thus provides a relatively low complexity and automated method for determining root causes in a wireless network.
  • the system and method can be extended to any network or system wherein quantifiable key performance indicators and key quality indicators are formed.
  • the system and method is capable of adaptively learning rules based on both historical data and engineering information, and therefore may learn new associations as time goes on and the fingerprint database is updated.
• the system and method includes a database processor (152) for computing a set of rules using the processor, the rules identifying associations between performance indicators of the cellular network based on one or both of engineering knowledge and historical network data, the associations including a set of indicators having an effect on at least one other indicator of the cellular network, and storing the set of rules in the storage; a network monitor (160) for monitoring the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface; an anomaly detector (165) for detecting an anomaly in at least one performance indicator in the analysis data received via the network interface, and detecting other anomalies co-occurring in time with said anomaly; an analyzer (170) for matching the anomaly and anomalies co-occurring in time to at least one rule in the set of rules; and an output (180) for outputting an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
  • a database processor for computing a set of rules using the processor, the rules identifying associations between performance indicators of the cellular network


Abstract

A processor implemented method of identifying a root cause of degraded network quality in a wireless network. The method includes accessing historical network performance data, the performance data including a time sequenced measure of performance indicators for the network. The method evaluates the historical performance data to determine regularly occurring associations between indicators to define a set of rules characterizing the associations of the wireless network, and stores the set of rules in a data structure. The wireless network is monitored by accessing analysis data reporting time sequenced performance indicator data. Next, anomalies are detected in a performance indicator in the analysis data and matched to at least one rule in the set of rules. The method outputs an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.

Description

FINGERPRINTING ROOT CAUSE ANALYSIS IN CELLULAR SYSTEMS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. non-provisional patent application Serial No. 14/991,598, filed on January 8, 2016 and entitled “Fingerprinting Root Cause Analysis in Cellular Systems” , which is incorporated herein by reference as if reproduced in its entirety.
BACKGROUND
The performance of a cellular network is affected by a collection of factors such as the data and voice traffic load, the RF coverage, the level of inter-cell interference, the location of users, and hardware failures. In many cases, the performance of a few wireless cells within a cellular network may appear abnormal, and mobile users that are served by these cells will suffer from a poor user experience. A poor user experience will give rise to customer dissatisfaction. As a remedy, operators often need to detect the abnormal behaviors and then take actions to fix the problems. Traditionally, operators rely on network experts to analyze the behavior of a particular cell to identify the root causes. Traditional approaches of root cause analysis for wireless cellular networks are generally based on a correlation study or rely heavily on engineering knowledge. Such approaches are often heuristic in nature, and it is in general difficult to quantify their accuracy. These approaches are also very time-consuming. It may take a few hours, if not days, to identify the root causes of the performance degradation.
SUMMARY
One aspect comprises a processor implemented method of identifying a root cause of degraded network quality in a wireless network. The method includes accessing historical network performance data, the performance data including a time sequenced measure of performance indicators for the network. The method further includes evaluating the historical performance data to determine regularly occurring associations between indicators to define a set of rules characterizing the associations of the wireless network, and storing the set of rules in a data structure. Subsequent to the evaluating, the method comprises monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data. Next, the method includes detecting an anomaly in a performance indicator in the analysis data and the anomaly is matched to at least one rule in the set of rules. The method provides for outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
According to one implementation, the evaluating comprises determining said regularly occurring associations between indicators using an associative learning algorithm.
According to one implementation, the evaluating comprises applying one of an apriori algorithm, a FN-algorithm or an ECLAT algorithm and ranking an output of the evaluating by lift.
According to one implementation, subsequent to said evaluating, the method further includes determining one or more co-occurring anomalies in other performance indicators, and wherein the step of matching the anomaly to at least one rule includes accessing the data structure and determining a similarity between the anomaly, any co-occurring anomalies, and a rule in the data structure using a k-nearest neighbor algorithm.
According to one implementation, the outputting comprises listing, for a performance indicator, one or more thresholds of an anomaly, and for each threshold, one or more root cause classifications and an indication of the amount of effect each of the one or more root cause classifications has on the network.
According to one implementation, the historical data and the analysis data each include a set of quantified key quality indicators and key performance indicators.
According to one implementation, the method further includes adjusting an element of the network to address a root cause identified by the outputting.
One general aspect includes a non-transitory computer-readable medium storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform the steps of: computing a set of rules characterizing associations between performance indicators of elements of the wireless network based on one or both of engineering data and historical network data, the associations reflecting a set of elements having an effect on at least one other element of the wireless network; monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data; detecting an anomaly in at least one performance indicator in the analysis data; detecting co-occurring anomalies to said anomaly; matching the anomaly and co-occurring anomalies detected to at least one rule in the set of rules; and outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
According to one implementation, computing a set of rules comprises computer instructions to determine regularly occurring associations between indicators using processor implemented association learning.
According to one implementation, computing a set of rules comprises computer instructions to execute one of an apriori algorithm, a FN-algorithm or an ECLAT algorithm on the historical data, and ranking an output of rules by a computed lift value.
According to one implementation, computing a set of rules comprises instructions to store the set of rules in a data structure, and wherein the computer instructions for matching the anomaly and co-occurring anomalies includes computer instructions to access the data structure and code configured to detect a similarity between a set including the anomaly and the co-occurring anomalies with a rule in the data structure.
According to one implementation, outputting includes computer instructions for listing, for a performance indicator, one or more thresholds of an anomaly, and for each threshold, one or more root cause classifications and an indication of the amount of effect each of the one or more root cause classifications has on the network.
According to one implementation, computing a set of rules characterizing associations between performance indicators of elements of the wireless network comprises computer instructions for comparing the historical data and the analysis data, wherein each includes a set of quantified key quality indicators and key performance indicators.
In another aspect, a mobile network monitoring system for a cellular network is provided. The system includes a processing system including at least one processor, storage coupled to the processor, and a network interface. Instructions are stored on the storage operable to instruct the at least one processor to access historical network performance data, the performance data including a time sequenced measure of performance indicators for the cellular network. The instructions are operable to instruct the at least one processor to compute a set of rules characterizing regularly occurring associations between a group of performance indicators, each rule based on one or both of engineering knowledge data and the historical performance data, each rule defining, for the group of indicators, a set of indicators whose performance affects one indicator in the group, and store the set of rules in the storage. The instructions are operable to instruct the at least one processor to monitor the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface. The instructions are operable to instruct the at least one processor to detect an anomaly in at least one performance indicator in the analysis data received via the network interface, and detect other anomalies co-occurring in time with said anomaly. The instructions are operable to instruct the at least one processor to match the anomaly and anomalies co-occurring in time to at least one rule in the set of rules, and output an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
According to one implementation, the analysis data is monitored in real time.
According to one implementation, the analysis data is accessed periodically.
In another aspect, a mobile network monitoring system for a cellular network is provided. The system comprises an accessing element that accesses historical network performance data, the performance data comprising a time sequenced measure of performance indicators for the cellular network; a computing element that computes a set of rules characterizing regularly occurring associations between a group of performance indicators, each rule based on one or both of engineering knowledge data and the historical performance data, each rule defining, for the group of indicators, a set of indicators whose performance affects one indicator in the group, and stores the set of rules in the storage; a monitoring element that monitors the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface; a detecting element that detects an anomaly in at least one performance indicator in the analysis data received via the network interface, and detects other anomalies co-occurring in time with said anomaly; a matching element that matches the anomaly and anomalies co-occurring in time to at least one rule in the set of rules; and an outputting element that outputs an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 depicts functional and structural components of an exemplary system in which the present system and method may be implemented.
Figure 2 is a flowchart illustrating a method for creating a fingerprint database in accordance with the system and method.
Figure 3 is a flowchart illustrating a method for analyzing a network utilizing the fingerprint database.
Figure 4 is a table illustrating a series of rules stored in the fingerprint database in accordance with the system and method.
Figure 5 is a graph illustrating the effectiveness of matching one or more performance indicators to a rule.
Figure 6A is an illustration of data from a cell compared with the output of a rule defining a key performance indicator relationship.
Figure 6B is an exemplary output report for the data illustrated in Figure 6A.
Figure 7 is a block diagram of a processing device suitable for implementing the system and method.
DETAILED DESCRIPTION
A system and method are disclosed to allow discovery of root causes of network quality issues. In one embodiment, the system and method use association rule learning to create a fingerprint database for root cause analysis of degraded user experiences with cellular wireless networks. A fingerprint database is built using historical performance data comprising engineering knowledge and/or data mining of patterns from historic network data. Once the fingerprint database is built, the system and method can monitor for anomalies in analysis performance data comprising key performance indicators (KPIs) and key quality indicators (KQIs) of the network. If an anomaly is detected, a co-occurrence analysis is used to identify abnormal patterns in other key quality and performance indicators that happen simultaneously. The determined co-occurring anomalies in indicators are then matched against the fingerprint database to find the associated potential root causes of the quality issue. The matching can be performed by comparing the identified abnormal patterns with the records in the historic knowledge database using a similarity measure.
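The monitoring loop described above (detect anomalies, find co-occurrences, match against the fingerprint database) can be sketched end to end. This is an illustrative simplification, not the claimed implementation: rules are reduced to sets of indicator names, and the similarity measure is reduced to a set-difference distance.

```python
def analyze(rules, analysis, thresholds, window=0):
    """rules: dict root-cause label -> set of indicator names (rule LHS).
    analysis: dict indicator name -> list of (time, value) samples.
    thresholds: dict indicator name -> allowed deviation from the mean.
    Returns a list of (time, matched root-cause label) reports."""
    # 1. Threshold-based anomaly detection per indicator.
    anomaly_times = {}
    for name, samples in analysis.items():
        values = [v for _, v in samples]
        mean = sum(values) / len(values)
        times = [t for t, v in samples if abs(v - mean) > thresholds[name]]
        if times:
            anomaly_times[name] = times

    # 2. Co-occurrence: indicators anomalous at (nearly) the same time.
    all_times = sorted({t for ts in anomaly_times.values() for t in ts})
    reports = []
    for t in all_times:
        pattern = {n for n, ts in anomaly_times.items()
                   if any(abs(u - t) <= window for u in ts)}
        if len(pattern) < 2:
            continue
        # 3. Match the pattern to the closest fingerprint rule.
        best = min(rules, key=lambda label: len(rules[label] ^ pattern))
        reports.append((t, best))
    return reports
```

For example, with a congestion rule covering a throughput KQI and a traffic KPI, simultaneous dips in both at one time instant produce a single congestion report for that instant; the indicator names used in such an example are hypothetical.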
The analysis system and method may be implemented as part of a network management system which allows engineers to adjust parameters of a network based on the output provided by the system. It should be understood that the present system and method may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the system and method to those skilled in the art. Indeed, the system and method is intended to cover alternatives, modifications and equivalents of these embodiments, which are included within the scope and spirit of the  system and method as defined by the appended claims. Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the system and method. However, it will be clear to those of ordinary skill in the art that the present system and method may be practiced without such specific details.
Figure 1 depicts functional and structural components of an embodiment of the system which performs root cause analysis. Figure 1 includes a network 100 which is the subject network to be monitored using the system. Although only one network is illustrated, multiple networks may be monitored, each having their own fingerprint database constructed based on such network’s historical network data and engineering data.
The network 100 may comprise any wired or wireless network that provides communication connectivity for devices. The network 100 may include various cellular network and packet data network components such as a base transceiver station (BTS), a node-B, a base station controller (BSC), a radio network controller (RNC), a serving GPRS support node (SGSN), a gateway GPRS support node (GGSN), a WAP gateway, a mobile switching center (MSC), short message service centers (SMSC), a home location register (HLR), a visitor location register (VLR), an Internet protocol multimedia subsystem (IMS), and/or the like. The network 100 may employ any of the known and available communication protocols, such as Code Division Multiple Access (CDMA), Global System for Mobile communications (GSM), Universal Mobile Telecommunications System (UMTS), Long Term Evolution (LTE), or any other network protocol that facilitates communication between the communication network 100 and network enabled devices. The communication network 100 may also be compatible with future mobile communication standards including, but not limited to, LTE-Advanced and WIMAX-Advanced. The network 100 may include other types of devices and nodes for receiving and transmitting voice, data, and combination information to and from radio transceivers, networks, the Internet, and other content delivery networks. Any portable or non-portable communication device having a network connectivity function, such as a cellular telephone, a computer, or a tablet, can operatively connect to the communication network 100.
Key Quality Indicators (KQIs) are generally external indicators that serve as the basis for Quality of Service (QoS) assessment as perceived by the user. Some KQIs are quantifiable and reportable via the network (including the examples provided herein), while others may not be reportable by the network itself, but are nevertheless perceived by a user. Key Performance Indicators (KPIs) are internal indicators based on time-referenced network counters. Such KPIs are evaluated in the context of other counters and related to KQIs. Each KPI and KQI is a time-referenced measure of the particular indicator. Variations in each KPI and KQI can be tracked to a particular time indication. Network KPIs may be measured and monitored using defined standard interfaces in the wireless network. These KPIs include multiple network performance counters and timers. For example, in a mobile data service network, the service accessibility may be determined through the Packet Data Protocol (PDP) Context Activation Success Rate KPI, which may be an aggregated ratio of the successful PDP context activations to PDP context attempts. This KPI indicates the ability of the mobile subscriber to access the packet switched service.
In, for example, a web browsing session, a customer utilizing a mobile device that communicates with a web server via a communication network 100 will have a perceived performance experience. Such a network may include a large number of physical sub-systems and network components, making problem identification, analysis, and resolution difficult tasks. In the context of the web browsing session, in one example a customer may experience an average download throughput rate of 1 Mbps during a time when a peak number of customer service sessions are being handled by the network, and a throughput rate of 2 Mbps otherwise. In a scenario where the download throughput rate for a customer service session deviates significantly from these learned trends, a root cause can be identified using the analysis system herein. One KQI discussed herein is the HTTP Large Page Display rate, measured in kbps, which is a metric of a web browsing session used as an example (HTTP_Page_Large_Display_Rate in Figure 6A) . As discussed herein, an anomaly in this rate can be correlated to an anomaly in a KPI (Total_DLPS_Traffic_Bits in Figure 6A) , and these are used to determine the root cause of this problem in the network.
Returning to Figure 1, a network monitoring system 150 may include a database processor 152, a fingerprint database 155, a network monitor 160, an anomaly detector 165, an anomaly and root cause analyzer 170, and an output generator 180. A network monitoring system 150 may be implemented in a computer system comprising one or more computing devices of various types. One example of such a computing system is illustrated in Figure 7. Network monitoring system 150 may be a discrete system, or it may be integrated within other systems including the systems and components within the communication network 100.
Database processor 152 performs association rule learning on historical network data 110 and engineering data 120 to create the fingerprint database 155. The historical network data 110 comprises historical network performance data as characterized by the KPIs and KQIs available for the network and is sequenced in time. The database provides a set of rules reflecting the relationships between KPIs and KQIs that influence network performance. A method performed by the database processor is illustrated in Figure 2. The database processor may be enabled by code operable to instruct a processing device to perform the method of Figure 2, or by processing-specific hardware adapted to implement the learning algorithms discussed herein programmatically. Hence, the creation of the fingerprint database is performed in an automated fashion once access to historical network data 110 and engineering data is provided.
The fingerprint database 155 is utilized by the anomaly detector 165 and analyzer 170 to determine which factors may contribute to network quality issues, and thereby identify possible root causes of network quality issues. A small example of the rules in the fingerprint database is illustrated in the data structure shown in Figure 4. The creation and use of the fingerprint database 155 is discussed further below. The fingerprint database 155 may be updated periodically as new historical network data 110 or engineering data 120 is provided for the network 100.
The network monitor 160 accesses various components of network 100 to monitor analysis data for defined KPI and KQI data in real time and/or periodically. Anomalies in the data accessed by the network monitor 160 are then analyzed by the analyzer 170 when the anomaly detector 165 detects an anomaly. Alternatively, the anomaly detector 165 and/or the analyzer 170 may periodically analyze stored data in addition to or instead of analyzing data in real time.
The anomaly detector 165 compares historical data for KPIs and KQIs to detect variations in the KPI and KQI analysis data. Each KPI and KQI has a normal range of operational values which can be defined. When a data outlier in KPI or KQI analysis data occurs, an anomaly may be indicated. When a KQI or KPI anomaly is detected, the anomaly detector 165 may further perform a co-occurrence analysis to search other KPIs and KQIs to determine whether simultaneous or near-simultaneous anomalies have also occurred. These detected, co-occurring anomalies are then analyzed by a matching algorithm in the analyzer 170 relative to data retrieved from the fingerprint database 155 to determine likely root causes of the anomalies detected. To detect anomalies in analysis data, for each parameter monitored, a detection cycle is triggered during which a fragment of analysis data is received. Time stamps of the data fragment are recorded. The anomaly detector calculates the variance of any data point according to minimum threshold variations defined for each parameter. Such variations may be determined from historical data for each parameter. The system may include a set of default thresholds for each monitored parameter, with such thresholds being adjustable by an operator of the network monitoring system 150. A sample data fragment is illustrated in Figure 6A.
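The threshold-based outlier check described above can be sketched as follows. This is a minimal illustration; the function name, sample values, and the 0.15 threshold are hypothetical and not taken from the system.

```python
def detect_anomalies(samples, mean, min_threshold):
    """Flag (timestamp, value) points whose deviation from the historical
    mean exceeds the minimum threshold variation for this parameter."""
    return [(t, v) for t, v in samples if abs(v - mean) > min_threshold]

# Hypothetical fragment of a KPI time series: (timestamp, value) pairs.
fragment = [(0, 0.31), (1, 0.29), (2, 0.57), (3, 0.30)]

# With a historical mean of 0.30 and a threshold of 0.15, only the
# spike at timestamp 2 is reported as an anomaly.
print(detect_anomalies(fragment, mean=0.30, min_threshold=0.15))  # [(2, 0.57)]
```

In practice, the mean and threshold for each parameter would be derived from the historical data 110 or supplied as the adjustable defaults mentioned above.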
The analyzer 170 performs a matching analysis as described below. In the matching analysis, anomalies in the KPI and KQI analysis data which co-occur in time are matched to learned rules in the fingerprint database to identify potential root causes of network issues. The matching between the KPI data, the KQI data, and the fingerprint database may in one embodiment be performed by comparing the identified abnormal patterns with rules derived from records in the historic knowledge database under a given similarity measure. One example is the k-nearest neighbors (KNN) algorithm.
Figure 2 is a flowchart illustrating a method for creating a fingerprint database 155. At step 210, historical KPI data and quantifiable KQI data, if available, are accessed. At step 220, engineering knowledge is accessed. Engineering knowledge can comprise rules defined by one or more network engineers based on network knowledge which define the relationships between performance indicators and quality indicators. In one embodiment, step 210 is skipped, and the fingerprint database is created solely from engineering knowledge. In another embodiment, step 220 is skipped and the fingerprint database is created solely from historical data. In a further embodiment, both engineering knowledge at 220 and historical data at 210 are utilized to create the fingerprint database.
At 230, historical data and/or engineering knowledge is analyzed using association rule learning to create a database of regularly occurring associations in the data. For example, a regularly occurring spike in DLPS Traffic Bits (a KPI) may correlate with drops in Large Page Display Rate Throughput (a KQI) . This association is seen in the exemplary data shown in Figure 6A below.
Association rule learning is a method for discovering interesting relations between variables in large databases. An association rule is expressed as “If X occurs, then Y occurs. ” Every rule is composed of two different sets of items X and Y, where X is the antecedent or left hand side (LHS) and Y is the consequent or right hand side (RHS) . In the present system and method, X defines the set of KPI items for which a consequence, or KQI, occurs. For example, the rule
{diaper, milk} ⇒ {beer}
found in the sales data of a supermarket would indicate that if a customer buys diapers and milk together, they are likely to also buy beer. In the context of a cellular network, the association between performance indicators of a MeanTotalTCPUtilityRatio, TotalDLPSTrafficBits, and VS_RLC_AM_Disc_HSDPATrfPDU. packet (discarded traffic data in HSDPA RLC layer packets) may be associated with an occurrence of slow video initialization as illustrated in the rules of Figure 4.
In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest may be used. The best-known constraints are minimum thresholds on support and confidence.
Let X be an item-set, X ⇒ Y an association rule, and T a set of transactions of a given database. The support value of X (supp (X) ) with respect to T is defined as the proportion of transactions in the database which contain the item-set X; applied to the union X ∪ Y, supp (X ∪ Y) is the joint probability of finding both X and Y together in a random basket. For example, if an item-set {diaper, milk, beer} occurs in 1 out of 5 transactions, it has a support of 0.2 since it occurs in 20% of all transactions. The argument of supp () is a set of preconditions, and thus becomes more restrictive as it grows.
The confidence value of a rule, X ⇒ Y, with respect to a set of transactions T, is the proportion of the transactions that contain X which also contain Y; in other words, the probability of finding Y in a basket given that the basket already contains X.
Confidence is defined as:

conf (X ⇒ Y) = supp (X ∪ Y) /supp (X) ,

where supp (X ∪ Y) means the support of the union of the items in X and Y. Thus confidence can be interpreted as an estimate of the conditional probability: the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
The lift of a rule is defined as:

lift (X ⇒ Y) = supp (X ∪ Y) / (supp (X) × supp (Y) ) ,

or the ratio of the observed support to that expected if X and Y were independent.
Lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole) , measured against a random choice targeting model. A targeting model is doing a good job if the response within the target is much better than the average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response. Hence:
lift (X ⇒ Y) = conf (X ⇒ Y) /supp (Y)

= supp (X ∪ Y) / (supp (X) × supp (Y) )
High lift therefore implies a strong association between two items. Generally, association rules may be analyzed to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. A minimum support threshold is applied to find all frequent item-sets in a database. A minimum confidence constraint is applied to these frequent item-sets in order to form rules.
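Under the definitions above, support, confidence, and lift can be computed directly from a set of transactions. The following Python sketch uses the supermarket example from the text; the basket contents beyond the one stated transaction are invented for illustration.

```python
def supp(itemset, transactions):
    """Proportion of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(lhs, rhs, transactions):
    """supp(X ∪ Y) / supp(X): probability of Y in baskets containing X."""
    return supp(lhs | rhs, transactions) / supp(lhs, transactions)

def lift(lhs, rhs, transactions):
    """conf(X ⇒ Y) / supp(Y): observed vs. expected if independent."""
    return conf(lhs, rhs, transactions) / supp(rhs, transactions)

baskets = [{"diaper", "milk", "beer"}, {"diaper", "milk"},
           {"milk"}, {"beer"}, {"diaper", "beer"}]

# {diaper, milk, beer} occurs in 1 of 5 baskets, so its support is 0.2.
print(supp({"diaper", "milk", "beer"}, baskets))       # 0.2
print(conf({"diaper", "milk"}, {"beer"}, baskets))     # 0.5
```

With these baskets, the rule {diaper, milk} ⇒ {beer} has confidence 0.5 and lift 0.5/0.6 ≈ 0.83, so it would be discarded under the 1.5 lift threshold discussed below.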
In the present system, a lift of 1.5 is set as a threshold, and all rules calculated to have a lift of less than 1.5 are discarded. In other embodiments, a higher or lower threshold may be set. The minimum meaningful lift threshold is 1.0, since a lift of 1.0 indicates that the LHS and RHS are independent.
In one embodiment, association rule learning is performed by calculating relations using any one of an Apriori, ECLAT, or FP-Growth algorithm. For example, the Apriori algorithm is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits) , where each transaction is seen as a set of items (an itemset) . Given a threshold C, the Apriori algorithm identifies the item-sets which are subsets of at least C transactions in the database. Frequent subsets are extended one item at a time (candidate generation) , and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
Apriori uses breadth-first search and a hash tree structure to count candidate item-sets efficiently. It generates candidate item-sets of length k from item-sets of length k - 1, and then prunes the candidates which have an infrequent sub-pattern. The algorithm terminates when the frequent itemsets cannot be extended any further. However, it has to generate a large number of candidate itemsets, and it scans the data set as many times as the length of the longest frequent itemset. The Apriori algorithm can be written in pseudocode as follows:
Input: data set D, minimum support minsup
Output: frequent itemsets L
L1 = {frequent 1-itemsets in D}
for (k = 2; Lk-1 is not empty; k++) :
    Ck = candidate itemsets of size k generated from Lk-1
    for each transaction t in D:
        increment the count of every candidate in Ck contained in t
    Lk = {c in Ck : count (c) >= minsup}
return L = the union of all Lk
In the above pseudocode, Ck means k-th candidate itemsets and Lk means k-th frequent itemsets.
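The candidate-generate-and-test loop described above might be sketched compactly in Python as follows. This is an unoptimized illustration: it omits the hash tree and the subset-pruning step and simply counts every candidate against the data.

```python
def apriori(transactions, minsup):
    """Return all itemsets contained in at least minsup transactions."""
    count = lambda c: sum(c <= t for t in transactions)
    # L1: frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    Lk = {c for c in items if count(c) >= minsup}
    frequent, k = [], 2
    while Lk:
        frequent.extend(Lk)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Test candidates against the data; keep only the frequent ones.
        Lk = {c for c in Ck if count(c) >= minsup}
        k += 1
    return frequent

baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
print(sorted(sorted(s) for s in apriori(baskets, minsup=2)))
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['c']]
```

Here Ck and Lk correspond to the k-th candidate and k-th frequent itemsets of the pseudocode; the loop ends when no k-itemset reaches the minimum support.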
Alternatively, a Frequent Pattern (FP) growth algorithm may be used to determine rules. In the first pass, the algorithm counts the occurrences of items (attribute-value pairs) in the dataset and stores them in a header table. In the second pass, it builds the FP-tree structure by inserting instances. Items in each instance have to be sorted in descending order of their frequency in the dataset, so that the tree can be processed quickly. Items in each instance that do not meet the minimum coverage threshold are discarded. If many instances share their most frequent items, the FP-tree provides high compression close to the tree root.
Recursive processing of this compressed version of the main dataset grows large item-sets directly, instead of generating candidate items and testing them against the entire database. Growth starts from the bottom of the header table (having the longest branches) , by finding all instances matching a given condition. A new tree is created, with counts projected from the original tree corresponding to the set of instances that are conditional on the attribute, with each node getting the sum of its children's counts. Recursive growth ends when no individual items conditional on the attribute meet the minimum support threshold, and processing continues on the remaining header items of the original FP-tree. Once the recursive process has completed, all large item-sets with minimum coverage have been found, and association rule creation begins.
Still further, step 230 may be performed using an Equivalence Class Transformation (ECLAT) algorithm. The ECLAT algorithm uses tidset intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree. The ECLAT algorithm is defined recursively. The initial call uses all the single items with their tidsets. In each recursive call, the function IntersectTidsets verifies each itemset-tidset pair <X, t (X) > against all other pairs <Y, t (Y) > to generate new candidates Nxy. If a new candidate is frequent, it is added to the set Px. Then, recursively, it finds all the frequent itemsets in the X branch. The algorithm searches in a depth-first manner to find all the frequent sets.
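The tidset-intersection idea can be sketched as follows. This is a simplified recursive version in Python: the candidate-pair bookkeeping of the full algorithm is reduced to plain set intersections, and the variable names are illustrative.

```python
def eclat(prefix, items, minsup, out):
    """items: list of (item, tidset) pairs; out collects frequent itemsets."""
    for i, (item, tids) in enumerate(items):
        if len(tids) >= minsup:
            out.append((prefix | {item}, len(tids)))
            # Intersect this item's tidset with the remaining items to form
            # the conditional candidates for this branch, then recurse.
            suffix = [(other, tids & otids) for other, otids in items[i + 1:]
                      if len(tids & otids) >= minsup]
            eclat(prefix | {item}, suffix, minsup, out)
    return out

# Vertical layout: each item maps to the set of transaction ids containing it.
vertical = [("a", {0, 1, 2}), ("b", {0, 1}), ("c", {1, 2})]
found = eclat(frozenset(), vertical, minsup=2, out=[])
print(sorted((sorted(s), n) for s, n in found))
# [(['a'], 3), (['a', 'b'], 2), (['a', 'c'], 2), (['b'], 2), (['c'], 2)]
```

Because support is computed by intersecting tidsets as the search descends depth-first, no transaction-by-transaction counting pass is needed, in contrast to the Apriori sketch above.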
Hence, the fingerprint database is built programmatically using any of the aforementioned methods or their variations by the database processor 152.
Figure 4 is a table illustrating a portion of a data structure storing a series of five rules in the fingerprint database. The first, or LHS, column illustrates a set of KPIs which may be present in analyzed data, while the RHS column illustrates the result (effect) on the RHS factor when the set of items in the LHS of the associative rule is present. Support, confidence, and lift values for each of the five rules illustrated are also shown. Each of the rules is defined for a video initiation duration factor (Video_Init_Duration) . The exemplary table was generated as a portion of rules from historical data on a 200-cell network.
In Row 1 of the table, three performance indicators are shown: MeanTotalTCPUtilityRatio, TotalDLPSTrafficBits, and VS_RLC_AM_Disc_HSDPATrfPDU. packet. Rule 1 has the highest lift of the rules listed in the table. In Row 2 of the table, a second rule defines MeanTotalTCPUtilityRatio, VS_RLC_AM_Disc_HSDPATrfPDU. packet, and VS_HSDPA_MeanChThrougput_TotalBytes. byte as affecting Video_Init_Duration. Although having lower support than rules 3 to 5, rule 2 has greater lift.
In the example of Figure 4, when the exemplary values of the three KPIs measured are, for example, MeanTotalTcpUtilityRatio (KPI1) >= 0.84, TotalDLPSTrafficBits (KPI2) >= 0.57 Mb, and RLC_AM_Disc_HsdpaTrfPDU. packet (KPI3) >= 1.7k, then Video_Init_Duration is high (> 4.91 s) .
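Matching measured KPI values against such a rule amounts to checking each LHS threshold. A sketch follows, using the thresholds quoted above; the dictionary structure and function names are illustrative, not the patent's actual data format.

```python
# Rule 1 of Figure 4, expressed as minimum thresholds on each LHS KPI.
rule = {
    "lhs": {"MeanTotalTcpUtilityRatio": 0.84,
            "TotalDLPSTrafficBits": 0.57,          # Mb
            "RLC_AM_Disc_HsdpaTrfPDU.packet": 1700},
    "rhs": "Video_Init_Duration > 4.91 s",
}

def rule_fires(rule, kpis):
    """The LHS matches when every KPI meets or exceeds its threshold."""
    return all(kpis.get(name, 0) >= limit
               for name, limit in rule["lhs"].items())

measured = {"MeanTotalTcpUtilityRatio": 0.91,
            "TotalDLPSTrafficBits": 0.60,
            "RLC_AM_Disc_HsdpaTrfPDU.packet": 2100}
print(rule_fires(rule, measured))  # True
```

When the LHS fires, the RHS effect is predicted with the confidence recorded for the rule in the fingerprint database.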
Figure 5 is a graph 502 illustrating the effectiveness of matching one or more performance indicators to a rule. As illustrated in Figure 5, the graph 502 shows how combining the factors on the left hand side (LHS) of a rule increases the accuracy of predicting the effect on the KQI factor on the right hand side (RHS) . Graph 502 illustrates the respective effectiveness of evaluating all cases, one factor alone, two factors, and three factors. The relative distribution of all cases of Video_Init_Duration is 0.33, while combining two factors (MeanTotalTCPUtilityRatio >= 0.84 and RLC_AM_Disc_HsdpaTrfPDU. packet >= 1.67k) increases the likelihood that Video_Init_Duration will be affected to 0.9, and adding TotalDLPSTrafficBits >= 0.57 Mb increases the likelihood to 0.93.
Hence, if the matching determines that all three KPI anomalies are satisfied, then there is a 93 percent chance that the video initiation duration is bad. In one embodiment, a selection of three factors is analyzed. In other embodiments, more or fewer factors may be utilized in each rule.
Returning to Figure 2, once the association rules have been computed at step 230, the rules are stored in the fingerprint database 155 at step 240.
Figure 3 is a flowchart illustrating a method for analyzing a network utilizing the fingerprint database. The method of Figure 3 may be performed in real time or periodically as a method of monitoring analysis data from a network.
At 310, KPI and KQI analysis data is accessed. The analysis data may include the same, more, or fewer performance indicators than the historical data 110. In one embodiment, the analysis data may be accessed in real time by the network monitor. Alternatively or in addition, KPI and KQI data may be accessed and analyzed periodically at 310.
At 320, for each KPI and each KQI, the analysis data is monitored for anomalies. Examples of a KQI anomaly and a KPI anomaly are illustrated at 610 and 620 in Figure 6A. Monitoring may be performed by receiving analysis network data via the network interface of Figure 1.
As noted, an anomaly is detected at 320 when a data point in the analysis data varies in excess of a minimum threshold variation defined for each parameter. The system may include a set of default thresholds for each monitored parameter, with such thresholds being adjustable by an operator of the network monitoring system 150. In the example of Figure 6A, two anomalies (data outliers) are illustrated for each of the HTTP_PageLargeDisplayRate KQI and the TotalDLPSTrafficBit KPI.
Once a data anomaly is detected, a co-occurrence analysis of the analysis data is performed at 330. Step 330 determines simultaneous or near-simultaneous occurrences between anomalies in the analysis data. In the example of Figure 6A, a determination has been made that the KQI and KPI outliers co-occurred at two distinct time periods. The anomalies are data points for any parameter having a greater-than-threshold deviation from the average data values for the parameter. Note that the two anomalies for the KQI HTTP_PageLargeDisplayRate correspond in time with the two anomalies for TotalDLPSTrafficBit.
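Pairing anomalies across two series within a small time window might be sketched as follows; the timestamps and the one-unit window are hypothetical.

```python
def co_occurring(a_times, b_times, window=1):
    """Pair anomaly timestamps from two series that fall within
    `window` time units of each other (near-simultaneous occurrences)."""
    return [(t, u) for t in a_times for u in b_times if abs(t - u) <= window]

# Hypothetical anomaly timestamps for a KQI series and a KPI series.
kqi_anomalies = [3, 17]
kpi_anomalies = [3, 16, 40]
print(co_occurring(kqi_anomalies, kpi_anomalies))  # [(3, 3), (17, 16)]
```

The isolated KPI anomaly at timestamp 40 has no KQI counterpart and is excluded, which is the filtering step 330 performs before fingerprint matching.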
Once all simultaneous or near-simultaneous anomalies have been detected within a given time frame, fingerprint matching against the rules in the database is performed at 340 to determine which indicators likely created a network problem. In one embodiment, the analyzer 170 performs step 340 using matching by a k-nearest neighbors (KNN) algorithm.
The KNN algorithm stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions) . A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors as measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor using a distance measure, such as Euclidean or Hamming distance. In the case of similarity classification, the most prevalent rule or a ranked number of rules (causes) may be returned.
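For the fingerprint match, each rule's LHS can be encoded as a 0/1 pattern over the monitored KPIs and compared to the detected anomaly pattern with a Hamming distance. A minimal KNN-style sketch follows; the rule patterns and cause labels are invented for illustration.

```python
def knn_match(observed, rules, k=1):
    """Rank fingerprint rules by Hamming distance between the observed
    anomaly pattern and each rule's LHS indicator pattern."""
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    return sorted(rules, key=lambda r: hamming(observed, r["pattern"]))[:k]

# 0/1 anomaly patterns over four monitored KPIs (illustrative only).
rules = [{"pattern": (1, 1, 1, 0), "cause": "congestion"},
         {"pattern": (0, 0, 1, 1), "cause": "coverage"}]
observed = (1, 1, 0, 0)
print(knn_match(observed, rules)[0]["cause"])  # congestion
```

With k greater than 1, the returned list provides the ranked set of candidate causes mentioned above, rather than a single nearest rule.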
Finally, an output is generated at 350. The output may take many forms of reporting or user interfaces designed to provide a network engineer with an indication of a root cause of the anomaly. One example of an output is shown in Figure 6B, though numerous alternatives for output reports are contemplated.
Figure 6B is an illustration of data from a cell compared with the output of a rule defining a key performance indicator relationship. Four data sources (two KQIs and two KPIs) are illustrated on the left hand side of Figure 6A. The chart on the left shows two outliers in the HTTP_PageLargeDisplayRate KQI and the Total DLPS Traffic Bit KPI. To understand the cause of the KQI and KPI outliers, it is known from the fingerprint database that the KQI will degrade with a fair amount of certainty when the KPI outliers are found.
Figure 6B illustrates one possible output interface generated at step 350. In order to generate an output that is usable for a user, root causes can be grouped into different categories. Examples of categories of root causes include high traffic load, poor RF coverage, and hardware failure.
In the report generated and output at Figure 6B, in one application, the root causes may be classified as congestion (too many users or inadequate capacity for a given cell in the network) and coverage issues (inadequate signal strength or reach) . An output may be generated characterizing the relationship of the key performance indicator to categories of root causes, as in Figure 6B. In Figure 6B, the threshold change of the performance indicator is provided in the first column, and the contribution of each root cause for the cell data is listed in terms of congestion, coverage, and both causes (mixed) . A threshold for degradation in any one or more KPIs may be set, and the output shown in Figure 6B provides, for each percentage, how much of the degradation is attributable to congestion, how much to coverage, and how much to mixed factors.
It will be recognized that numerous alternative forms of output may be provided. In one alternative, a user interface may provide an alert of one or more quality or performance indicators experiencing an anomaly, with the interface providing a facility to present further information on the root cause (or potential root causes ordered by confidence or lift) .
In the case of the mobile network, engineering knowledge stored in database 120 as engineering data may be utilized to classify root causes. For example, engineering data may be a known characterization of a network problem based on human knowledge of the construction of the network or previous events. Engineering knowledge of a particular cell’s capacity may be linked in a rule associating a detected problem in traffic congestion related to that cell. Where a performance counter in analysis data returns an anomaly for a particular cell, engineering data for that cell may reflect an associated rule indicating a limited capacity for that cell as a potential root cause of the anomaly. Based on engineering knowledge, categories of root causes can be classified into workable identifications for network engineers. This classification makes reporting of root causes more efficient.
Figure 7 is a block diagram of a processing device suitable for implementing the system and method. The computing system 702 may include, for example, a processor 710, random access memory (RAM) 720, non-volatile storage 730, a display unit (output device) 750, an input device 760, and a network interface device 740. In certain embodiments, the computing system 702 may be embedded into a personal computer, mobile computer, mobile phone, tablet, or other suitable processing device.
Illustrated in non-volatile storage 730 are functional components which may be implemented by instructions operable to cause processor 710 to implement one or more of the processes described herein. While illustrated as part of non-volatile storage 730, such instructions may operate to cause the processor to perform the various processes described herein using any one or more of the hardware components illustrated in Figure 7. These functional components are described further below with respect to Figure 7.
Non-volatile storage 730 may comprise any combination of one or more computer readable media. The computer readable media may be a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a computer diskette, a hard disk, a random access memory (RAM) , a read-only memory (ROM) , an erasable programmable read-only memory (EPROM or Flash memory) , an appropriate optical fiber with a repeater, a compact disc read-only memory (CD-ROM) , an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer system 702 can include a set of instructions that can be executed to cause computer system 702 to perform any one or more of the methods or computer based functions disclosed herein. Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages and conventional procedural programming languages. The program code may execute entirely on the computer system 702, partly on the computer system 702 as a stand-alone software package, partly on the computer system 702 and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service.
As illustrated in FIG. 7, the computing system 702 includes a processor 710. A processor 710 for computing system 702 is configured to execute software instructions in order to perform functions as described in the various embodiments herein. A processor 710 for a computing system 702 may be a general purpose processor or may be part of an application specific integrated circuit (ASIC) . A processor 710 for a computing system 702 may also be a microprocessor, a microcomputer, a processor chip, a controller, a microcontroller, a digital signal processor (DSP) , a state machine, or a programmable logic device. A processor 710 for a computing system 702 may also be a logical circuit, including a programmable gate array (PGA) such as a field programmable gate array (FPGA) , or another type of circuit that includes discrete gate and/or transistor logic. A processor 710 for a computing system 702 may be a central processing unit (CPU) , a graphics processing unit (GPU) , or both. Additionally, any processor described herein may include multiple processors, parallel processors, or both. Multiple processors may be included in, or coupled to, a single device or multiple devices.
Moreover, the computing system 702 includes a RAM 720 and a non-volatile storage 730 that can communicate with each other, and with the processor 710, via a bus 708. Illustrated in the non-volatile storage 730 are components including a database creator 731, which may be utilized to create the fingerprint database 155; a network monitor 732, which may be utilized by the processor to implement the network monitor 160 of Figure 1; an anomaly detector 734, which may be utilized by the processor to implement the anomaly detector 165 of Figure 1 to detect data anomalies and perform co-occurrence analysis; an analyzer 736, which may be utilized to implement the analyzer 170 of Figure 1; and a user interface generator 738, which may generate any of the output reports discussed herein. Each of the components may comprise instructions capable of causing the processor 710 to execute steps to perform the methods discussed herein.
As shown, the computing system 702 may further include a display unit (output device) 750, such as a liquid crystal display (LCD) , an organic light emitting diode (OLED) , a flat panel display, a solid state display, or a cathode ray tube (CRT) . Additionally, the computing system 702 may include an input device 760, such as a keyboard/virtual keyboard or touch-sensitive input screen or speech input with speech recognition, and which may include a cursor control device, such as a mouse or touch-sensitive input screen or pad.
Memories described herein are tangible storage mediums that can store data and executable instructions, and are non-transitory during the time instructions are stored therein. A memory described herein is an article of manufacture and/or machine component. Memories described herein are computer-readable mediums from which data and executable instructions can be read by a computer. Memories as described herein may be random access memory (RAM) , read only memory (ROM) , flash memory, electrically programmable read only memory (EPROM) , electrically erasable programmable read-only memory (EEPROM) , registers, a hard disk, a removable disk, tape, compact disk read only memory (CD-ROM) , digital versatile disk (DVD) , floppy disk, Blu-ray disk, or any other form of storage medium known in the art. Memories may be volatile or non-volatile, secure and/or encrypted, unsecure and/or unencrypted.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable instruction execution apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that when executed can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions which when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks. Such computer readable media specifically excludes signals. The computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer  implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The subject matter herein advantageously provides a processor implemented method of identifying a root cause of degraded network quality in a wireless network. The method accesses historical network performance data comprising a time sequenced measure of performance indicators for the network. The historical data is evaluated using machine implemented association learning to determine regularly occurring associations between indicators and thereby define a set of rules characterizing the associations of the wireless network. The rules are stored as a set of rules in a data structure. After evaluation of the historical data, the method monitors the wireless network by accessing analysis data reporting time sequenced performance indicator data. Anomalies in monitored performance indicators are then detected in the analysis data and matched to at least one rule in the set of rules. An indication of a root cause of degradation in the wireless network resulting from the anomaly in the performance indicator is then output.
The method disclosed herein thus provides a relatively low complexity and automated method for determining root causes in a wireless network. The system and method can be extended to any network or system in which quantifiable key performance indicators and key quality indicators are defined. The system and method are capable of adaptively learning rules based on both historical data and engineering information, and therefore may learn new associations as time goes on and the fingerprint database is updated.
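Before any rule can be matched, anomalies must first be detected in the time sequenced indicator data. The description leaves the detector unspecified, so the sketch below uses a simple trailing-window z-score threshold as one plausible choice; the window size, threshold, and sample data are illustrative assumptions only.

```python
import statistics

def detect_anomalies(series, window=8, z_threshold=3.0):
    """Flag indices whose value deviates from the trailing-window mean
    by more than z_threshold standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.fmean(recent)
        spread = statistics.pstdev(recent)
        if spread > 0 and abs(series[i] - mean) / spread > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical time sequenced KPI samples (e.g. a drop-call rate per interval):
kpi_series = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 25.0]
print(detect_anomalies(kpi_series))  # [9]: the final sample is anomalous
```

Indices flagged this way, together with anomalies co-occurring in the same interval on other indicators, form the query set handed to the rule-matching step.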
In accordance with the above advantages, the system and method include: a database processor (152) for computing a set of rules using the processor, the rules identifying associations between performance indicators of the cellular network based on one or both of engineering knowledge and historical network data, the associations including a set of indicators having an effect on at least one other indicator of the cellular network, and for storing the set of rules in the storage; a network monitor (160) for monitoring the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface; an anomaly detector (165) for detecting an anomaly in at least one performance indicator in the analysis data received via the network interface, and for detecting other anomalies co-occurring in time with said anomaly; an analyzer (170) for matching the anomaly and anomalies co-occurring in time to at least one rule in the set of rules; and an output (180) for outputting an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
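The matching performed by the analyzer (170) can be illustrated with a set-similarity nearest-neighbour lookup: the observed set of co-occurring anomalies is compared against each stored fingerprint rule, and the closest rule supplies the root-cause label. The Jaccard measure, the field names, and the fingerprint entries below are hypothetical stand-ins for the k-nearest-neighbor matching mentioned in the claims.

```python
def match_fingerprint(anomaly_set, fingerprints, k=1):
    """Return the k fingerprint rules whose anomaly sets are most similar
    (by Jaccard index) to the observed set of co-occurring anomalies."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return sorted(fingerprints,
                  key=lambda fp: jaccard(anomaly_set, fp["anomalies"]),
                  reverse=True)[:k]

# Hypothetical fingerprint database: anomaly patterns mapped to root causes.
fingerprints = [
    {"anomalies": {"high_interference", "low_throughput"},
     "root_cause": "external interference"},
    {"anomalies": {"high_prb_utilization", "low_throughput"},
     "root_cause": "cell congestion"},
]
observed = {"high_interference", "low_throughput", "call_drop"}
best = match_fingerprint(observed, fingerprints)[0]
print(best["root_cause"])  # external interference
```

The returned rule's root-cause label is what the output (180) would report, optionally with the similarity score as a confidence indication.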
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

  1. A processor implemented method of identifying a root cause of degraded network quality in a wireless network, comprising:
    accessing historical network performance data, the performance data comprising a time sequenced measure of performance indicators for the network;
    evaluating the historical performance data to determine regularly occurring associations between indicators to define a set of rules characterizing the associations of the wireless network, and storing the set of rules in a data structure;
    subsequent to said evaluating, monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data;
    detecting an anomaly in a performance indicator in the analysis data;
    matching the anomaly to at least one rule in the set of rules; and
    outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
  2. The method of claim 1, wherein the evaluating comprises determining said regularly occurring associations between indicators using an associative learning algorithm.
  3. The method of claim 2, wherein the evaluating comprises applying one of an Apriori algorithm, an FN-algorithm, or an ECLAT algorithm and ranking an output of the evaluating by lift.
  4. The method of any one of claims 1 to 3, wherein subsequent to said evaluating, the method further includes determining one or more co-occurring anomalies in other performance indicators, and wherein the step of matching the anomaly to at least one rule includes accessing the data structure and determining a similarity between the anomaly, any co-occurring anomalies, and a rule in the data structure using a k-nearest neighbor algorithm.
  5. The method of any one of claims 1 to 4, wherein the outputting comprises listing, for a performance indicator, one or more thresholds of an anomaly, and, for each threshold, one or more root cause classifications and an indication of an amount of effect each of the one or more root cause classifications has on the network.
  6. The method of any one of claims 1 to 5, wherein the historical data and the analysis data each include a set of quantified key quality indicators and key performance indicators.
  7. The method of any one of claims 1 to 6, further including adjusting an element of the network to address a root cause identified by the outputting.
  8. A non-transitory computer-readable medium storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform the steps of:
    computing a set of rules characterizing associations between performance indicators of elements of the wireless network based on one or both of engineering data and historical network data, the associations reflecting a set of elements having an effect on at least one other element of the wireless network;
    monitoring the wireless network by accessing analysis data reporting time sequenced performance indicator data;
    detecting an anomaly in at least one performance indicator in the analysis data;
    detecting co-occurring anomalies to said anomaly;
    matching the anomaly and co-occurring anomalies detected to at least one rule in the set of rules; and
    outputting an indication of a cause of degradation in the wireless network resulting from the anomaly in the performance indicator.
  9. The non-transitory computer-readable medium of claim 8, wherein the computer instructions for performing the steps of computing a set of rules comprise computer instructions to determine regularly occurring associations between indicators using processor implemented association learning.
  10. The non-transitory computer-readable medium of claim 9, wherein the computer instructions for performing the steps of computing a set of rules comprise computer instructions to execute one of an Apriori algorithm, an FN-algorithm, or an ECLAT algorithm on the historical data, and to rank an output of rules by a computed lift value.
  11. The non-transitory computer-readable medium of claim 10, wherein the computer instructions for performing the steps of computing a set of rules comprise instructions to store the set of rules in a data structure, and wherein the computer instructions for matching the anomaly and co-occurring anomalies include computer instructions to access the data structure and code configured to detect a similarity between a set including the anomaly and the co-occurring anomalies with a rule in the data structure.
  12. The non-transitory computer-readable medium of any one of claims 8 to 11, wherein the computer instructions for outputting include computer instructions for listing, for a performance indicator, one or more thresholds of an anomaly, and, for each threshold, one or more root cause classifications and an indication of an amount of effect each of the one or more root cause classifications has on the network.
  13. The non-transitory computer-readable medium of any one of claims 8 to 12, wherein the computer instructions for computing a set of rules characterizing associations between performance indicators of elements of the wireless network comprise computer instructions for comparing the historical data and the analysis data, wherein each includes a set of quantified key quality indicators and key performance indicators.
  14. A mobile network monitoring system for a cellular network, comprising:
    a processing system including at least one processor, storage coupled to the processor, and a network interface;
    instructions stored on the storage operable to instruct the at least one processor to:
    access historical network performance data, the performance data comprising a time sequenced measure of performance indicators for the cellular network;
    compute a set of rules characterizing regularly occurring associations between a group of performance indicators, each rule based on one or both of engineering knowledge data and the historical performance data, each rule defining, for the group of indicators, a set of indicators whose performance affects one indicator in the group, and store the set of rules in the storage;
    monitor the cellular network by accessing time sequenced analysis data of network performance indicators via the network interface;
    detect an anomaly in at least one performance indicator in the analysis data received via the network interface, and detect other anomalies co-occurring in time with said anomaly;
    match the anomaly and anomalies co-occurring in time to at least one rule in the set of rules; and
    output an indication of a cause of degradation in the cellular network resulting from the anomaly in the performance indicator.
  15. The mobile network monitoring system of claim 14, wherein the instructions configured to compute a set of rules comprise instructions to compute regularly occurring associations between indicators using processor implemented association learning.
  16. The mobile network monitoring system of any one of claims 14 and 15, wherein the instructions configured to compute a set of rules are configured to cause the processor to execute one of an Apriori algorithm, an FN-algorithm, or an ECLAT algorithm on the one or both of engineering knowledge and historical network data, and rank the rules by a computed lift value.
  17. The mobile network monitoring system of any one of claims 14 to 16, wherein the instructions to compute a set of rules further include instructions to store the set of rules in a data structure, and wherein the instructions to match the anomaly and co-occurring anomalies include instructions to access the data structure and instructions to detect a similarity between a set including the anomaly and the co-occurring anomalies with a rule in the data structure.
  18. The mobile network monitoring system of any one of claims 14 to 17, wherein the output comprises listing, for a performance indicator, one or more thresholds of an anomaly, and, for each threshold, one or more root cause classifications and an indication of an amount of effect each of the one or more root cause classifications has on the network.
  19. The mobile network monitoring system of any one of claims 14 to 18, wherein the analysis data is monitored in real time.
  20. The mobile network monitoring system of any one of claims 14 to 19, wherein the analysis data is accessed periodically.
PCT/CN2017/070156 2016-01-08 2017-01-04 Fingerprinting root cause analysis in cellular systems WO2017118380A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780006162.8A CN108463973A (en) 2016-01-08 2017-01-04 Fingerprinting root cause analysis in cellular systems
EP17735818.1A EP3395012A4 (en) 2016-01-08 2017-01-04 Fingerprinting root cause analysis in cellular systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/991,598 2016-01-08
US14/991,598 US10397810B2 (en) 2016-01-08 2016-01-08 Fingerprinting root cause analysis in cellular systems

Publications (1)

Publication Number Publication Date
WO2017118380A1 true WO2017118380A1 (en) 2017-07-13

Family

ID=59273670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/070156 WO2017118380A1 (en) 2016-01-08 2017-01-04 Fingerprinting root cause analysis in cellular systems

Country Status (4)

Country Link
US (1) US10397810B2 (en)
EP (1) EP3395012A4 (en)
CN (1) CN108463973A (en)
WO (1) WO2017118380A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109963295A (en) * 2017-12-26 2019-07-02 中国移动通信集团上海有限公司 Method and device for determining a performance indicator monitoring threshold
WO2020242275A1 (en) 2019-05-30 2020-12-03 Samsung Electronics Co., Ltd. Root cause analysis and automation using machine learning
CN114286360A (en) * 2020-09-27 2022-04-05 中国移动通信集团设计院有限公司 Wireless network communication optimization method and device, electronic equipment and storage medium
WO2022075893A1 (en) * 2020-10-06 2022-04-14 Telefonaktiebolaget Lm Ericsson (Publ) Radio node and method in a wireless communications network

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10795757B2 (en) 2017-06-29 2020-10-06 Level 3 Communications, Llc Streaming server statistics and predictive mitigation
CN110837841B (en) * 2018-08-17 2024-05-21 北京亿阳信通科技有限公司 KPI degradation root cause identification method and device based on random forest
US11228506B2 (en) * 2018-09-06 2022-01-18 Hewlett Packard Enterprise Development Lp Systems and methods for detecting anomalies in performance indicators of network devices
US11388040B2 (en) 2018-10-31 2022-07-12 EXFO Solutions SAS Automatic root cause diagnosis in networks
US11645293B2 (en) 2018-12-11 2023-05-09 EXFO Solutions SAS Anomaly detection in big data time series analysis
CN109815042B (en) * 2019-01-21 2022-05-27 南方科技大学 Abnormal factor positioning method, abnormal factor positioning device, server and storage medium
CN110147387B (en) * 2019-05-08 2023-06-09 腾讯科技(上海)有限公司 Root cause analysis method, root cause analysis device, root cause analysis equipment and storage medium
US11271835B2 (en) 2019-05-10 2022-03-08 Cisco Technology, Inc. Composite key performance indicators for network health monitoring
US11138163B2 (en) 2019-07-11 2021-10-05 EXFO Solutions SAS Automatic root cause diagnosis in networks based on hypothesis testing
US11558271B2 (en) * 2019-09-04 2023-01-17 Cisco Technology, Inc. System and method of comparing time periods before and after a network temporal event
CN110609858A (en) * 2019-09-17 2019-12-24 南京邮电大学 Index association method based on Apriori algorithm
CN110633195B (en) * 2019-09-29 2023-01-03 北京博睿宏远数据科技股份有限公司 Performance data display method and device, electronic equipment and storage medium
CN113079521B (en) * 2020-01-03 2023-03-21 中国移动通信集团广东有限公司 Call quality optimization method, device and equipment
EP4070197A1 (en) * 2020-01-30 2022-10-12 Huawei Technologies Co., Ltd. Device for monitoring a computer network system
US11522766B2 (en) 2020-02-12 2022-12-06 EXFO Solutions SAS Method and system for determining root-cause diagnosis of events occurring during the operation of a communication network
GB2594512B (en) * 2020-04-30 2022-08-24 Spatialbuzz Ltd Network fault diagnosis
US11743272B2 (en) * 2020-08-10 2023-08-29 International Business Machines Corporation Low-latency identification of network-device properties
US20220099532A1 (en) * 2020-09-25 2022-03-31 General Electric Company Systems and methods for operating a power generating asset
CN112732771A (en) * 2020-11-06 2021-04-30 河北上晟医疗科技发展有限公司 Application of association rule mining technology based on PACS system
US11457371B2 (en) * 2021-01-08 2022-09-27 Verizon Patent And Licensing Inc. Systems and methods for determining baselines for network parameters used to configure base stations
CN112689299B (en) * 2021-01-14 2022-07-01 广州市贝讯通信技术有限公司 Cell load distribution method and device based on FP-growth algorithm
US11343373B1 (en) 2021-01-29 2022-05-24 T-Mobile Usa, Inc. Machine intelligent isolation of international calling performance degradation
CN115150250B (en) * 2021-03-31 2024-01-12 中国电信股份有限公司 Causal learning-based method and causal learning-based device for positioning abnormal root cause of Internet of things
US11936542B2 (en) 2021-04-02 2024-03-19 Samsung Electronics Co., Ltd. Method of solving problem of network and apparatus for performing the same
CN113626090B (en) * 2021-08-06 2023-12-29 济南浪潮数据技术有限公司 Method, device, equipment and readable medium for configuring server firmware
CN113923099B (en) * 2021-09-03 2022-12-27 华为技术有限公司 Root cause positioning method for communication network fault and related equipment
US11800398B2 (en) 2021-10-27 2023-10-24 T-Mobile Usa, Inc. Predicting an attribute of an immature wireless telecommunication network, such as a 5G network
CN114338424A (en) * 2021-12-29 2022-04-12 中国电信股份有限公司 Evaluation method and evaluation device for operation health degree of Internet of things
CN114338351B (en) * 2021-12-31 2024-01-12 天翼物联科技有限公司 Network anomaly root cause determination method and device, computer equipment and storage medium
CN114641028A (en) * 2022-03-21 2022-06-17 中国联合网络通信集团有限公司 User perception data determination method and device, electronic equipment and storage medium
WO2024018257A1 (en) * 2022-07-19 2024-01-25 Telefonaktiebolaget Lm Ericsson (Publ) Early detection of irregular patterns in mobile networks
CN117272398B (en) * 2023-11-23 2024-01-26 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008129243A2 (en) 2007-04-18 2008-10-30 Zenulta Limited Method of identifying a root cause of a network event
US20100123575A1 (en) 2008-11-14 2010-05-20 Qualcomm Incorporated System and method for facilitating capacity monitoring and recommending action for wireless networks
WO2014146690A1 (en) 2013-03-19 2014-09-25 Nokia Solutions And Networks Oy System and method for rule creation and parameter adaptation by data mining in a self-organizing network
CN104396188A (en) * 2012-03-30 2015-03-04 阿尔卡特朗讯 System and method for root cause analysis of mobile network performance problems
US20150148040A1 (en) * 2013-11-26 2015-05-28 At&T Intellectual Property I, Lp Anomaly correlation mechanism for analysis of handovers in a communication network
EP2894813A1 (en) 2014-01-08 2015-07-15 Telefonaktiebolaget L M Ericsson (publ) Technique for creating a knowledge base for alarm management in a communications network
US20150333998A1 (en) * 2014-05-15 2015-11-19 Futurewei Technologies, Inc. System and Method for Anomaly Detection

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480640B1 (en) * 2003-12-16 2009-01-20 Quantum Leap Research, Inc. Automated method and system for generating models from data
US7844701B2 (en) * 2005-08-01 2010-11-30 Network Appliance, Inc. Rule-based performance analysis of storage appliances
US8135990B2 (en) * 2006-08-11 2012-03-13 Opnet Technologies, Inc. Multi-variate network survivability analysis
US7483934B1 (en) * 2007-12-18 2009-01-27 International Busniess Machines Corporation Methods involving computing correlation anomaly scores
US8595200B2 (en) * 2012-01-03 2013-11-26 Wizsoft Ltd. Finding suspicious association rules in data records
US9026851B2 (en) * 2012-09-05 2015-05-05 Wipro Limited System and method for intelligent troubleshooting of in-service customer experience issues in communication networks
US9203689B2 (en) 2012-10-26 2015-12-01 International Business Machines Corporation Differential dynamic host configuration protocol lease allocation
US9424121B2 (en) * 2014-12-08 2016-08-23 Alcatel Lucent Root cause analysis for service degradation in computer networks
US10042697B2 (en) * 2015-05-28 2018-08-07 Oracle International Corporation Automatic anomaly detection and resolution system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3395012A4
SZABOLCS NOVACZKI ET AL.: "Radio Channel Degradation Detection and Diagnosis Based on a Statistical Analysis", VEHICULAR TECHNOLOGY CONFERENCE, 2011

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109963295A (en) * 2017-12-26 2019-07-02 中国移动通信集团上海有限公司 A kind of method and device of determining performance indicator monitoring thresholding
CN109963295B (en) * 2017-12-26 2022-04-08 中国移动通信集团上海有限公司 Method and device for determining performance index monitoring threshold
WO2020242275A1 (en) 2019-05-30 2020-12-03 Samsung Electronics Co., Ltd. Root cause analysis and automation using machine learning
EP3921980A4 (en) * 2019-05-30 2022-04-06 Samsung Electronics Co., Ltd. Root cause analysis and automation using machine learning
US11496353B2 (en) 2019-05-30 2022-11-08 Samsung Electronics Co., Ltd. Root cause analysis and automation using machine learning
CN114286360A (en) * 2020-09-27 2022-04-05 中国移动通信集团设计院有限公司 Wireless network communication optimization method and device, electronic equipment and storage medium
CN114286360B (en) * 2020-09-27 2023-09-05 中国移动通信集团设计院有限公司 Wireless network communication optimization method and device, electronic equipment and storage medium
WO2022075893A1 (en) * 2020-10-06 2022-04-14 Telefonaktiebolaget Lm Ericsson (Publ) Radio node and method in a wireless communications network

Also Published As

Publication number Publication date
EP3395012A4 (en) 2018-12-05
US20170201897A1 (en) 2017-07-13
US10397810B2 (en) 2019-08-27
EP3395012A1 (en) 2018-10-31
CN108463973A (en) 2018-08-28

Similar Documents

Publication Publication Date Title
US10397810B2 (en) Fingerprinting root cause analysis in cellular systems
US10482158B2 (en) User-level KQI anomaly detection using markov chain model
CN109983798B (en) Prediction of performance indicators in cellular networks
US20200059805A1 (en) Association rule analysis and data visualization for mobile networks
US20230188409A1 (en) Network system fault resolution via a machine learning model
JP7145764B2 (en) Network advisor based on artificial intelligence
US10068176B2 (en) Defect prediction method and apparatus
WO2017215647A1 (en) Root cause analysis in a communication network via probabilistic network structure
US10664837B2 (en) Method and system for real-time, load-driven multidimensional and hierarchical classification of monitored transaction executions for visualization and analysis tasks like statistical anomaly detection
US11736339B2 (en) Automatic root cause diagnosis in networks
EP3895077A1 (en) Explainability-based adjustment of machine learning models
US10819735B2 (en) Resolving customer communication security vulnerabilities
US10602223B2 (en) Methods and apparatus to categorize media impressions by age
JP6097889B2 (en) Monitoring system, monitoring device, and inspection device
JP7195264B2 (en) Automated decision-making using step-by-step machine learning
US20170324759A1 (en) Network sampling based path decomposition and anomaly detection
US10291493B1 (en) System and method for determining relevant computer performance events
EP4120653A1 (en) Communication network performance and fault analysis using learning models with model interpretation
US9479414B1 (en) System and method for analyzing computing performance
US20200342340A1 (en) Techniques to use machine learning for risk management
WO2018188733A1 (en) A computer implemented data processing method
Yu et al. TraceRank: Abnormal service localization with dis‐aggregated end‐to‐end tracing data in cloud native systems
Paul et al. The importance of contextualization of crowdsourced active speed test measurements
US20200202231A1 (en) Self-generating rules for internet of things
CN111222897B (en) Client Internet surfing satisfaction prediction method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17735818

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2017735818

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017735818

Country of ref document: EP

Effective date: 20180723