WO2011009000A1 - Method and apparatus for telecommunications network performance anomaly events detection and notification - Google Patents


Publication number
WO2011009000A1
WO2011009000A1 PCT/US2010/042192 US2010042192W WO2011009000A1 WO 2011009000 A1 WO2011009000 A1 WO 2011009000A1 US 2010042192 W US2010042192 W US 2010042192W WO 2011009000 A1 WO2011009000 A1 WO 2011009000A1
Authority
WO
WIPO (PCT)
Prior art keywords
npi
degradation
detection processor
oms
threshold
Prior art date
Application number
PCT/US2010/042192
Other languages
French (fr)
Inventor
Channarong Tontiruttananon
Kuntaporn Saiyos
Deborah Case
Peter Wenzel
Aamir Sattar
Original Assignee
Nortel Networks Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nortel Networks Limited
Priority to CA2768220A (CA2768220A1)
Priority to EP10735395A (EP2454848A1)
Priority to US13/383,971 (US8862119B2)
Publication of WO2011009000A1
Priority to US14/489,925 (US20150004964A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0686 Additional information in the notification, e.g. enhancement of specific meta-data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Definitions

  • the invention pertains to the detection of network performance anomaly events based on Network Performance Indicator (NPI) Operational Measurements (OM).
  • NPI Network Performance Indicator
  • OM Operational Measurements
  • OM's Operational Measurements
  • NPIs Network Performance Indicators
  • NPI process to help avoid or reduce the outage downtime of the network and other problems such as memory leak by devising a way to automatically process the NPIs to detect the occurrence of slow and persistent NPI OM degradation, severe and sudden degradation in NPI OM, and potential outage events and raise an appropriate log or alarm to alert the operator of the observed performance anomaly so that they can be investigated and dealt with in a timely manner.
  • Figure 1 illustrates a network environment that incorporates the claimed network degradation processor that is configured to perform the claimed steps of the invention
  • Figure 2 is a flow diagram illustrating operation of a severe & abrupt NPI degradation detection node and rule in accordance with the principles of the present invention
  • Figure 3 is a flow diagram illustrating the severe performance degradation alarms used to identify a probable network outage.
  • Figure 4 is a flow diagram illustrating the operation of an NPI Slow & Persistent Degradation Detection node based on NPI Monitoring.
  • the present invention will now be described in connection with an exemplary embodiment for a mobile (cellular) telephone network.
  • the present invention is broadly applicable to many other types of network degradation detection schemes and to networks other than mobile telephone networks.
  • Figure 1 illustrates a network environment that incorporates the claimed network degradation processor that is configured to perform the claimed steps of the invention.
  • a network environment 100 includes a core network 101 coupled to an internet network 103, a PSTN network 105, and a cellular/radio network 107.
  • Various communication interfaces are coupled to the networks to enable communication between users of the network.
  • a VoIP phone 103-1, and laptop computer 103-2, and a desktop computer 103-3 are coupled to the internet network 103; a landline telephone 105-1 is connected to PSTN network 105; and a laptop computer 107-1, a mobile telephone 107-2, a machine-to-machine device 107-3 and a fixed wireless access device 107-4 are coupled to cellular/radio network 107.
  • Core network 101 includes a mobile switching center 101-1, a data support node 101-2, a home location register 101-3, and other network functionality 101-4.
  • core network 101 also includes network degradation detection processor 101-5, which can comprise a severe and abrupt NPI degradation detection node, an NPI slow and persistent degradation detection node, or a combination of both.
  • Network degradation processor 101-5 is a processor that is configurable to perform the steps described in connection with Figures 2-4 and to perform the processes described herein.
  • Figure 2 is a flow diagram illustrating operation of a severe & abrupt NPI degradation detection node and rule in accordance with the principles of the present invention.
  • the degradation detection is taking place in a mobile telephone network, but it is understood that the invention is not limited to this example.
  • Specific exemplary algorithms for performing the steps described in connection with the figures are provided in later sections of this application.
  • processing begins by reading and storing the current (most recent) NPI OM, e.g., an operational measurement related to call success rate for the telephone network.
  • call failures, SMS failures, handover failures, etc. will be occurring every hour, but on a per-second basis, there typically will not be very many such failures.
  • the NPI OMs will be taken every few minutes, every 10 minutes, etc. It is understood that the choice of how often to take such measurements is within the discretion of the network operator.
  • the mean (average) value of the last n immediately preceding NPI OMs is calculated by adding up the NPI OM values of the last n immediately-preceding NPI OMs and dividing the sum by n.
  • the value of n is small, e.g., 2 or 3, so that the current NPI OM value is being compared to only the last few NPI OMs rather than a larger window spanning a larger time period.
  • the network has NPI OMs taken every 10 minutes, and the current NPI OM value is taken at time t and if n is decided to be 3, then the NPI OM values at t-10 minutes, t-20 minutes, and t-30 minutes would be combined and then divided by 3 to determine the mean value for the purpose of step 203.
  • the value of n should be small, e.g., 2 or 3.
  • the value of n can be changed depending on the needs of the network operator, and an n value of 10 could, for example, still be considered "small" for the purpose of this invention.
  • the variance value of the last n NPI OMs is calculated by taking the standard deviation of the last n NPI OMs (the NPI OMs taken at t-10 minutes, t-20 minutes, and t-30 minutes in this example).
  • the processes of steps 203 and 205 are calculations of a moving average for the mean and variance values of the NPI OMs within the moving average window defined by the value of n.
  • a Severe and Abrupt Performance Degradation Threshold is calculated using the moving averages calculated in steps 203 and 205.
  • the threshold essentially identifies what was, in this example, a "normal" rate of call success (and thus call failures) over the last n (3 in this example) NPI measurement periods and establishes a predetermined rate of call success (and failure) that will be considered as acceptable.
  • a comparison is made to determine if the current NPI OM has crossed the SAPDT, indicating the existence of a severe performance degradation relative to the current moving average window. If the comparison indicates the existence of a severe performance degradation, the process proceeds to step 213 where a severe performance degradation alarm is triggered, and any action desired can be taken by the network operator or other monitoring entity. If the comparison indicates no existence of a severe performance degradation, the process proceeds back to step 201, where the moving window is "moved" and the process begins again on the next current NPI and on the new set of n NPI OMs.
  • the severe performance degradation alarms are used to identify a probable network outage.
  • certain conditions for example, periods when heavy network traffic is experienced, may increase the number of certain NPI alarms for a particular NPI, but not for others.
  • the severe performance degradation alarms triggered using the process of Figure 2 can be used to identify such network outage conditions and trigger a potential outage alarm so that measures can be taken to investigate and/or correct any problems that may be occurring.
  • Step 301 indicates the beginning of a new NPI OM
  • n with regard to Figure 3 corresponds to the number of NPI OMs being measured by the system.
  • at steps 305-1, 305-2, 305-3, ... 305-n, a determination is made as to whether or not a severe performance degradation (SPD) alarm has been issued for any of the NPIs being measured.
  • SPD severe performance degradation
  • for example, if an SPD alarm has been issued for a particular NPI, a "1" can be forwarded to the summing process 307, and if no SPD alarm has been issued for a particular NPI, a "0" can be forwarded to the summing process 307.
  • the summing process determines how many of the NPI OMs are indicating an SPD condition as indicated by the issuance of an SPD alarm.
  • a determination is made as to whether or not a Potential Outage Alarm threshold had been met. This threshold can be arbitrarily set by the network operator so that a certain number of simultaneous SPD alarms during the same NPI measurement period must occur before an alarm condition is considered to exist.
  • the Potential Outage Alarm threshold is set at 3, and if the system is set to issue a Potential Outage Alarm if the threshold is exceeded, then if 4 or more NPI OMs cause SPD alarms to be triggered at the same time, at step 313 a potential outage alarm is issued so that investigations and/or corrective actions can be instituted, and then at step 311 the summing process 307 is reset. If the potential outage alarm threshold is not exceeded (i.e., if the number of SPD alarms issued for a particular NPI measurement period is 3 or fewer), at step 311 the summing process 307 is reset (e.g., the sum is returned to zero to await the next NPI measurement period data).
  • the process detects potential outages by detecting simultaneous large abrupt and sudden change in, for example, the call success rates from multiple NPIs.
  • the false alarm probability is reduced while maximizing the probability of outage detection.
  • Each NPI has a severe and abrupt performance degradation dynamic threshold which adapts to changes in the most recent mean and variance values of the NPI success rate.
  • the severe and abrupt NPI degradation decision rule compares the most recent measurement of NPI call success rate against this dynamic threshold to determine if there is a statistically significant large and sudden change from the most recent mean value. If the NPI call success rate has dropped below the threshold, a severe performance degradation alarm is issued for that NPI.
  • a Potential Outage Alarm is set whenever a pre-determined number of severe performance degradation alarms are raised in the same NPI measurement period.
  • An objective of this embodiment is NPI Slow & Persistent Degradation Detection Based on NPI Monitoring.
  • Slow & persistent degradation of NPIs is detected to enable early detection of problems like memory leak and thus help reduce outage events.
  • the process uses the uniformly most powerful (UMP) hypothesis testing to identify the slow & persistent NPIs degradation events.
  • the UMP test is the test with highest detection probability under the constraint that the false alarm probability does not exceed a given value.
  • a slow & persistent NPI degradation decision rule compares a test statistic, which is a function of M most recent measurements of NPI call success rate, against a fixed threshold to determine if there is a statistically significant NPI degradation trend.
  • the slow performance degradation alarm is issued for that NPI. If the slow performance degradation alarm is currently set and the test statistic at time instant i for that NPI is greater than the threshold, the slow performance degradation alarm is cleared for that NPI.
  • the most recent NPI value is read and stored by the system.
  • a moving average is obtained for the NPI data immediately preceding the current NPI value; however, in contrast to the embodiment of Figure 2, in this embodiment, a larger window size is utilized, for example, the window could be of a size that would cover several days, weeks or months of NPI OMs.
  • the moving average can be obtained, for example, by low-pass filtering the data to smooth out the NPIs daily cycle fluctuation.
  • the slope of the data trend line resulting from the smoothed data is determined by, for example using the linear least square fit for some or all of the data trend line.
  • the determined slope is compared with a predetermined Slope and
  • step 411 it is determined that the determined slope does not meet or exceed the threshold, the process continues back to step 401 to perform the same steps for the next current NPI value.
  • the resetting of the alarms can be triggered by using similar processes that essentially flip the process around so that the threshold levels to be met are indicative of a recovery in performance (abatement) rather than a problem with performance.
  • NPI OMs dynamic thresholding approach automatically detects an onset/abatement of network performance anomaly events utilizing Network Performance Indicators Operation Measurement (NPI OMs) as input data.
  • Network performance anomaly events considered in this document are the following: 1. Severe and abrupt degradation of the NPI OM,
  • Table 1 NPI log, set/clear alarm behaviors based on network states
  • NPI Voice Core Network Performance Indicator
  • Section 4.1 presents an algorithm assuming floating point arithmetic is used, whereas Section 4.2 presents an algorithm for when the calculation is to be performed using integer arithmetic.
  • T slow -1.6772
  • a test statistic constructed from a 7-day moving average estimate of the mean and variance of each of the NPI OMs can be used. Suppose there are J samples of the OMs over the 7-day period, then the sample mean value of the m -th NPI at the time instant k is given by
  • Performance Anomaly Clear Alarm Algorithm Use the same ordering of the MNPI as the severe & abrupt NPI degradation/outage detection.
  • a scheme for network performance anomaly detection has been disclosed based on 1) detecting severe and sudden change in the NPI OM 2) Detecting slow and persistent degradation of the NPI OM. Furthermore, utilizing multiple NPIs helps reduce the false alarm probability while maximizing the probability of outage detection.
  • Each NPI has two network performance degradation thresholds which dynamically adapt to changes in the most recent mean and variance values of the NPI success rate in the severe and abrupt NPI degradation detection.
  • the severe and abrupt performance degradation decision rule compares the most recent measurement of NPI call success rate against this dynamic threshold value in order to determine if there is a statistically significant large and sudden change from the most recent mean value. At any particular time instant at which the NPI call success rate dropped below the threshold, a network performance anomaly alarm is issued for that NPI.
  • the second dynamic threshold for network performance anomaly detection uses a low pass filter (i.e. long moving average window) to smooth out the normal NPIs daily fluctuation in order to discriminate real versus fictitious slow downward trend in the NPI OM
  • abatement dynamic thresholds are proposed. The first two thresholds concern the detection of the abatement event related to the severe and abrupt NPI degradation detection and the slow and persistent NPI detection algorithm. The last abatement dynamic threshold is used to check whether the network performance has recovered to its long-term average performance level. Once a network performance anomaly alarm has been set, it can only be cleared when the relevant NPI OM value exceeds all three abatement thresholds.
  • Software programming code which embodies the present invention is typically stored in permanent storage. In a client/server environment, such software programming code may be stored with storage associated with a server.
  • the software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, or hard drive, or CD ROM.
  • the code may be distributed on such media, or may be distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems.
  • the techniques and methods for embodying software program code on physical media and/or distributing software code via networks are well known and will not be further discussed herein.
  • program instructions may be provided to a processor to produce a machine, such that the instructions that execute on the processor create means for implementing the functions specified in the illustrations.
  • the computer program instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer-implemented process such that the instructions that execute on the processor provide steps for implementing the functions specified in the illustrations.
  • the figures support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Monitoring And Testing Of Exchanges (AREA)

Abstract

In order to provide an early and more accurate determination of network problems, current NPI OMs are compared with samples of recent historical NPI OMs so that changes in the NPI OM are detected based on current overall network conditions rather than on conditions that may have existed at statistically insignificant earlier operational periods. By constantly adjusting a performance threshold, against which the current NPI OM is compared, by using a smaller and very recent sampling of NPIs (in the case of sudden and abrupt performance-NPI degradation detection) or a larger and greater number of NPIs over a wider time period (in the case of slow and persistent NPI degradation detection) to establish the threshold, detection results are more accurate and meaningful.

Description

METHOD AND APPARATUS FOR TELECOMMUNICATIONS NETWORK PERFORMANCE ANOMALY EVENTS DETECTION AND NOTIFICATION
Cross-Reference To Related Applications
This application claims the priority of U.S. Provisional Application No. 61/225,672, filed July 15, 2009, the entire contents of which are hereby incorporated fully by reference.
Field Of The Invention
The invention pertains to the detection of network performance anomaly events based on Network Performance Indicator (NPI) Operational Measurements (OM).
Background Of The Invention
As communications technology has evolved, communications technology users have become increasingly reliant on the ability to communicate almost instantaneously with others all over the globe. With this technology seemingly available everywhere, users of network resources have come to perceive performance delays of as little as 2-3 seconds as unacceptable. Time delays in data transfers and dropped phone calls in mobile telephone systems irritate and alienate customers, and thus service providers try to pay close attention to performance problems and correct them as quickly as possible. Operational Measurements (OMs) in the context of network performance are network parameters that are measured and used as Network Performance Indicators (NPIs). These measurements can include call success rates, call termination rates, Quality of Service (QoS) measurements, traffic and routing measurements, network outage statistics, and the like. These OMs are typically measured over a fixed period of time, referred to as "OM transfer periods".
Early detection of network performance anomalies could help avoid network outage events. A slow and persistent degradation of NPIs can indicate an issue such as a memory leak. Additionally, simultaneous large and abrupt changes in, for example, the call success rates from multiple NPIs can indicate the onset of outage events (the outage can be partial, i.e. losing > 10% of capacity, or total). Therefore, it would be desirable to utilize the NPI process to help avoid or reduce the outage downtime of the network and other problems such as memory leaks by devising a way to automatically process the NPIs to detect the occurrence of slow and persistent NPI OM degradation, severe and sudden degradation in NPI OM, and potential outage events, and to raise an appropriate log or alarm to alert the operator of the observed performance anomaly so that it can be investigated and dealt with in a timely manner.
There are many relevant existing stochastic process control algorithms that are routinely used in various industries to monitor product quality such as Shewhart, EWMA, and Page's CUSUM control charts. However, these standard quality control algorithms only deal with detecting deviations of the monitored quality metric from a fixed (known or unknown) mean value that is constant over time. In the NPI performance anomaly detection problem, the mean value of success rates can fluctuate slowly over time in normal operation (e.g., due to the change in traffic level or services usage pattern during the day), and thus only a statistically significant large and abrupt degradation, or a slow but steady degradation, from the most recent average success rates would indicate a possible onset of a new outage. This time-varying statistical characteristic of the NPI prevents direct application of these traditional stochastic process control algorithms.
Summary Of The Invention
In order to provide an early and more accurate determination of network problems, current NPI OMs are compared with samples of recent historical NPI OMs so that changes in the NPI OM are detected based on current overall network conditions rather than on conditions that may have existed at statistically insignificant earlier operational periods. By constantly adjusting a performance threshold, against which the current NPI OM is compared, by using a smaller and very recent sampling of NPIs (in the case of sudden and abrupt performance-NPI degradation detection) or a larger and greater number of NPIs over a wider time period (in the case of slow and persistent NPI degradation detection) to establish the threshold, detection results are more accurate and meaningful. Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
Brief Description Of The Drawings
Figure 1 illustrates a network environment that incorporates the claimed network degradation processor that is configured to perform the claimed steps of the invention;
Figure 2 is a flow diagram illustrating operation of a severe & abrupt NPI degradation detection node and rule in accordance with the principles of the present invention;
Figure 3 is a flow diagram illustrating the severe performance degradation alarms used to identify a probable network outage; and
Figure 4 is a flow diagram illustrating the operation of an NPI Slow & Persistent Degradation Detection node based on NPI Monitoring.
Detailed Description Of The Embodiments
The present invention will now be described in connection with an exemplary embodiment for a mobile (cellular) telephone network. However, it should be noted that the present invention is broadly applicable to many other types of network degradation detection schemes and to networks other than mobile telephone networks.
Figure 1 illustrates a network environment that incorporates the claimed network degradation processor that is configured to perform the claimed steps of the invention. As shown in Figure 1, a network environment 100 includes a core network 101 coupled to an internet network 103, a PSTN network 105, and a cellular/radio network 107. Various communication interfaces are coupled to the networks to enable communication between users of the network. For example, a VoIP phone 103-1, and laptop computer 103-2, and a desktop computer 103-3 are coupled to the internet network 103; a landline telephone 105-1 is connected to PSTN network 105; and a laptop computer 107-1, a mobile telephone 107-2, a machine-to-machine device 107-3 and a fixed wireless access device 107-4 are coupled to cellular/radio network 107.
Core network 101 includes a mobile switching center 101-1, a data support node 101-2, a home location register 101-3, and other network functionality 101-4. In addition, however, core network 101 also includes network degradation detection processor 101-5, which can comprise a severe and abrupt NPI degradation detection node, an NPI slow and persistent degradation detection node, or a combination of both. Network degradation processor 101-5 is a processor that is configurable to perform the steps described in connection with Figures 2-4 and to perform the processes described herein.
Figure 2 is a flow diagram illustrating operation of a severe & abrupt NPI degradation detection node and rule in accordance with the principles of the present invention. In this example, the degradation detection is taking place in a mobile telephone network, but it is understood that the invention is not limited to this example. Specific exemplary algorithms for performing the steps described in connection with the figures are provided in later sections of this application.
As shown at step 201, processing begins by reading and storing the current (most recent) NPI OM, e.g., an operational measurement related to call success rate for the telephone network. In a typical large telephone network operating normally, call failures, SMS failures, handover failures, etc., will be occurring every hour, but on a per-second basis, there typically will not be very many such failures. Thus, in such a system, the NPI OMs will be taken every few minutes, every 10 minutes, etc. It is understood that the choice of how often to take such measurements is within the discretion of the network operator.
At step 203, the mean (average) value of the last n immediately preceding NPI OMs is calculated by adding up the NPI OM values of the last n immediately preceding NPI OMs and dividing the sum by n. For the purpose of detecting severe and abrupt degradation, the value of n is small, e.g., 2 or 3, so that the current NPI OM value is being compared to only the last few NPI OMs rather than a larger window spanning a larger time period. Thus, for example, if the network has NPI OMs taken every 10 minutes, the current NPI OM value is taken at time t, and n is decided to be 3, then the NPI OM values at t-10 minutes, t-20 minutes, and t-30 minutes would be added together and then divided by 3 to determine the mean value for the purpose of step 203. In the example given above, it is suggested that the value of n should be small, e.g., 2 or 3. However, it is understood that the value of n can be changed depending on the needs of the network operator, and an n value of 10 could, for example, still be considered "small" for the purpose of this invention.
At step 205, the variance value of the last n NPI OMs is calculated by taking the standard deviation of the last n NPI OMs (the NPI OMs taken at t-10 minutes, t-20 minutes, and t-30 minutes in this example). As is clear, the processes of steps 203 and 205 are calculations of a moving average for the mean and variance values of the NPI OMs within the moving average window defined by the value of n.
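The window statistics of steps 203 and 205 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the algorithm given later in the application: the function name `window_stats`, the sample values, and the choice of the population standard deviation (rather than the sample standard deviation) are all assumptions.

```python
from statistics import mean, pstdev

def window_stats(history, n=3):
    """Return (mean, standard deviation) of the last n NPI OM values.

    `history` holds the measurements that precede the current one,
    oldest first. The population standard deviation is an assumption;
    the text does not say which estimator is intended.
    """
    if n < 1 or len(history) < n:
        raise ValueError("need at least n previous measurements")
    window = history[-n:]  # the moving window of the n most recent values
    return mean(window), pstdev(window)

# Hypothetical call success rates (percent) at t-30, t-20 and t-10 minutes.
previous = [98.0, 99.0, 97.0]
mu, sigma = window_stats(previous, n=3)
```

When the next measurement arrives, it is appended to `history` and the same call yields the statistics for the shifted window.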
At step 207, a Severe and Abrupt Performance Degradation Threshold (SAPDT) is calculated using the moving averages calculated in steps 203 and 205. The threshold essentially identifies what was, in this example, a "normal" rate of call success (and thus call failures) over the last n (3 in this example) NPI measurement periods and establishes a predetermined rate of call success (and failure) that will be considered as acceptable.
Conversely, this also establishes the point at which the rate of call success (and failure) has become unacceptable. This enables a comparison of what was "normal" degradation over the previous n sample periods with what the current level of degradation is (described below with respect to steps 211 and 213). Specific examples of algorithms for performing the calculation of the SAPDT are provided later in this application.
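As a rough illustration of how a dynamic threshold of this kind might behave, the sketch below assumes the SAPDT takes the form "window mean minus k standard deviations". That form, the multiplier `k`, the function names, and the sample values are assumptions made for illustration only; the application's own SAPDT algorithms appear in its later sections and may differ.

```python
from statistics import mean, pstdev

def sapdt(history, n=3, k=3.0):
    """Assumed threshold form: mean of the last n values minus k standard deviations."""
    window = history[-n:]
    return mean(window) - k * pstdev(window)

def is_severe_degradation(current, history, n=3, k=3.0):
    """True when the current success rate has dropped below the dynamic threshold."""
    return current < sapdt(history, n, k)

# Hypothetical call success rates (percent) for the three preceding periods.
previous = [98.0, 99.0, 97.0]
alarm_on_drop = is_severe_degradation(90.0, previous)    # large, sudden drop
alarm_on_normal = is_severe_degradation(97.5, previous)  # ordinary fluctuation
```

Because the threshold is recomputed from the most recent window on every measurement, it follows the normal slow drift in success rate instead of staying fixed.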
At step 211, a comparison is made to determine if the current NPI OM has crossed the SAPDT, indicating the existence of a severe performance degradation relative to the current moving average window. If the comparison indicates the existence of a severe performance degradation, the process proceeds to step 213 where a severe performance degradation alarm is triggered, and any action desired can be taken by the network operator or other monitoring entity. If the comparison indicates no severe performance degradation, the process proceeds back to step 201, where the moving window is "moved" and the process begins again on the next current NPI and on the new set of n NPI OMs. Using the process of the embodiment described with respect to Figure 2, the comparison is made based on a changing and up-to-date threshold, in contrast to prior art systems which use fixed and potentially out-of-date thresholds.
In another embodiment of the invention, described now with respect to Figure 3, the severe performance degradation alarms are used to identify a probable network outage. In the everyday operation of a network, certain conditions, for example, periods when heavy network traffic is experienced, may increase the number of alarms for a particular NPI, but not for others. However, if within a particular NPI OM measurement period multiple NPIs indicate severe performance degradation simultaneously, this could indicate a partial or complete network outage. In accordance with the operations described in connection with Figure 3, the severe performance degradation alarms triggered using the process of Figure 2 can be used to identify such network outage conditions and trigger a potential outage alarm so that measures can be taken to investigate and/or correct any problems that may be occurring.
Referring to Figure 3, step 301 indicates the beginning of a new NPI OM measurement period. In this example, there are multiple NPI OMs taking place simultaneously, as indicated by steps 303-1, 303-2, 303-3, ..., 303-n (the value of n with regard to Figure 3 corresponds to the number of NPI OMs being measured by the system). At steps 305-1, 305-2, 305-3, ..., 305-n, a determination is made as to whether or not a severe performance degradation (SPD) alarm has been issued for any of the NPIs being measured. The process for issuing or not issuing an SPD alarm for each NPI can follow the process described with respect to Figure 2.
As can be seen, if an SPD alarm has been issued for a particular NPI OM, this fact is conveyed to a summing process 307 (for example, if an SPD alarm has been issued for a particular NPI, a "1" can be forwarded to the summing process 307, and if no SPD alarm has been issued for a particular NPI, a "0" can be forwarded to the summing process 307).
At step 307, the summing process determines how many of the NPI OMs are indicating an SPD condition as indicated by the issuance of an SPD alarm. At step 309, a determination is made as to whether or not a Potential Outage Alarm threshold has been met. This threshold can be arbitrarily set by the network operator so that a certain number of simultaneous SPD alarms during the same NPI measurement period must occur before an alarm condition is considered to exist. For example, if the Potential Outage Alarm threshold is set at 3, and if the system is set to issue a Potential Outage Alarm if the threshold is exceeded, then if 4 or more NPI OMs cause SPD alarms to be triggered at the same time, at step 313 a potential outage alarm is issued so that investigations and/or corrective actions can be instituted, and then at step 311 the summing process 307 is reset. If the potential outage alarm threshold is not exceeded (i.e., if the number of SPD alarms issued for a particular NPI measurement period is 3 or less), at step 311 the summing process 307 is reset (e.g., the sum is returned to zero to await the next NPI measurement period data).
To summarize the operations performed by the processes of Figures 2 and 3, the process detects potential outages by detecting simultaneous large, abrupt and sudden changes in, for example, the call success rates from multiple NPIs. The false alarm probability is reduced while maximizing the probability of outage detection. Each NPI has a severe and abrupt performance degradation dynamic threshold which adapts to changes in the most recent mean and variance values of the NPI success rate. The severe and abrupt NPI degradation decision rule compares the most recent measurement of NPI call success rate against this dynamic threshold to determine if there is a statistically significant large and sudden change from the most recent mean value.
If the NPI call success rate has dropped below the threshold, a severe performance degradation alarm is issued for that NPI. A Potential Outage Alarm is set whenever a pre-determined number of severe performance degradation alarms are raised in the same NPI measurement period. An additional embodiment of the invention is described with reference to Figure 4.
An objective of this embodiment is NPI Slow & Persistent Degradation Detection Based on NPI Monitoring. Slow & persistent degradation of NPIs is detected to enable early detection of problems such as a memory leak and thus help reduce outage events. The process uses uniformly most powerful (UMP) hypothesis testing to identify slow & persistent NPI degradation events. The UMP test is the test with the highest detection probability under the constraint that the false alarm probability does not exceed a given value. In an example using call success rate as the NPI of interest, a slow & persistent NPI degradation decision rule compares a test statistic, which is a function of the M most recent measurements of NPI call success rate, against a fixed threshold to determine if there is a statistically significant NPI degradation trend. If the test statistic at a time instant i drops below the threshold, the slow performance degradation alarm is issued for that NPI. If the slow performance degradation alarm is currently set and the test statistic at time instant i for that NPI is greater than the threshold, the slow performance degradation alarm is cleared for that NPI.
Referring to Figure 4, at step 401, the most recent NPI value is read and stored by the system. At step 403, a moving average is obtained for the NPI data immediately preceding the current NPI value; however, in contrast to the embodiment of Figure 2, in this embodiment a larger window size is utilized. For example, the window could be of a size that would cover several days, weeks or months of NPI OMs. Although the exact number of NPI OMs that would be considered "large" is a decision left up to the network operator, it is contemplated that a "larger window" would comprise more than 10 NPI measurement periods beyond the time period t. The moving average can be obtained, for example, by low-pass filtering the data to smooth out the daily cycle fluctuation of the NPIs.
At step 407, the slope of the data trend line resulting from the smoothed data is determined by, for example, using a linear least squares fit for some or all of the data trend line. At step 409, the determined slope is compared with a predetermined Slow and Persistent Degradation threshold. At step 411, if the determined slope meets or exceeds the predetermined Slow and Persistent Degradation threshold, a Long Term Performance Anomaly alarm is triggered so that investigation and/or corrective measures can be taken. If at step 411 it is determined that the determined slope does not meet or exceed the threshold, the process continues back to step 401 to perform the same steps for the next current NPI value.
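The smoothing and slope comparison of Figure 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a plain trailing moving average stands in for the low-pass filter, the threshold is taken to be a negative slope per sample, and "meets or exceeds" is read as the fitted slope being at least that steep. The function names are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out

def least_squares_slope(ys):
    """Slope of the least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

def slow_degradation(values, window, slope_threshold):
    """True when the smoothed trend declines at least as steeply as the threshold."""
    return least_squares_slope(moving_average(values, window)) <= slope_threshold

# Example: a success rate that loses 0.1 points per measurement period.
falling = [100.0 - 0.1 * i for i in range(60)]
flat = [99.0] * 60
```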
For any of the alarm conditions described with respect to Figures 2-4, the resetting of the alarms can be triggered by using similar processes that essentially flip the process around so that the threshold levels to be met are indicative of a recovery in performance (abatement) rather than a problem with performance.
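As an illustration of how onset and abatement can share one rule, the following Python sketch applies the outage decision function given in step 4 of Section 3.1 of this application: the outage state is set when the number of simultaneous SPD alarms reaches the outage threshold, cleared when no SPD alarm remains, and otherwise left unchanged. Note that the Figure 3 description above uses "exceeded" rather than "reached"; the sketch follows the Section 3.1 form. The function name and argument names are hypothetical.

```python
def update_outage_state(spd_flags, prev_state, outage_threshold=3):
    """Return the new outage state (0 or 1) for one NPI measurement period.

    spd_flags  -- one 0/1 flag per NPI, 1 meaning an SPD alarm is active
    prev_state -- outage state carried over from the previous period
    """
    active = sum(spd_flags)          # the summing process of step 307
    if active == 0:
        return 0                     # no degradation anywhere: clear
    if active >= outage_threshold:
        return 1                     # enough simultaneous alarms: set
    return prev_state                # in between: hold the previous state

state = 0
for flags in ([1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]):
    state = update_outage_state(flags, state)
```

Holding the previous state between the two limits avoids toggling the alarm while only some of the NPIs have recovered.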
What follows are examples of specific algorithms and elements that can be used to perform the processes described in Figures 2-4.
1. Summary
An NPI OM dynamic thresholding approach is described that automatically detects the onset and abatement of network performance anomaly events, using Network Performance Indicator Operational Measurements (NPI OMs) as input data. The network performance anomaly events considered in this document are the following:
1. Severe and abrupt degradation of the NPI OM,
2. Potential network outage events,
3. Slow and persistent degradation of the NPI OM.
The following are the NPI performance anomaly onset and abatement events defined in this algorithm:
A1i) Severe and abrupt performance degradation event detected for the i-th NPI
A2i) Slow and persistent performance degradation event detected for the i-th NPI
A3) Network outage event detected (detection of simultaneous severe and abrupt performance degradation from the NPIs in some non-empty set K)
R1i) Severe and abrupt performance anomaly event recovery detected for the i-th NPI
R2i) Slow and persistent performance anomaly event recovery detected for the i-th NPI
R3) Network outage event recovery detected (detection of recovery from performance degradation for every NPI in non-empty set K)
R4i) Recovery to long term average performance event detected for the i-th NPI
Figure imgf000011_0001
Table 1 : NPI log, set/clear alarm behaviors based on network states
2. Introduction
The G/U and CDMA Voice Core Network Performance Indicator (NPI) process provides measurements of various call success rates at the end of every k OM transfer periods. A slow and persistent degradation of NPIs can indicate an issue such as a memory leak. Early detection of the network performance anomaly problem could help avoid network outage events. Additionally, simultaneous large, abrupt and sudden changes in the call success rates from multiple NPIs can indicate the onset of an outage event (the outage can be partial, i.e., losing > 10% of capacity, or total). Therefore, it is of interest to utilize the NPI process to help avoid or reduce the outage downtime of the network by devising an algorithm that automatically processes the NPI data in order to detect the occurrence of slow and persistent NPI OM degradation, severe and sudden degradation in NPI OM, and potential outage events, and that raises an appropriate log or alarm to alert the operator of the observed performance anomaly so that it can be investigated and dealt with in a timely manner.
3. Detection of Severe and Abrupt NPI Degradation and Potential Outage Event
An algorithm for detecting severe and abrupt degradation of an NPI and for detecting a potential outage event is summarized in the next two Sections. Section 3.1 presents an algorithm assuming floating point arithmetic is used, whereas Section 3.2 presents an algorithm for when the calculation is to be performed using integer arithmetic.
3.1 Summary of the Severe and Abrupt NPI Degradation/Outage Detection Algorithm (Floating Point Implementation)
0) Initialization: Define the ordering of the $M$ NPIs, e.g., $m = 1$ for the Mobile Originated call success rate, $m = 2$ for the Land Originated call success rate, etc. Then,
- read in the first NPI call success rate measurements, $u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$, and set
- $c = 3$, $\lambda = 0.8$, $b = 1 + \sqrt{\dfrac{(1-\lambda)(2+\lambda)}{1+\lambda}}$
- values of the m-th NPI severe and abrupt NPI degradation secondary threshold; $g_m = 0$, $m = 1, 2, \ldots, M$
- values of the m-th NPI performance anomaly set alarm indicator; $p_m = 0$, $m = 1, 2, \ldots, M$
- values of the set outage alarm indicator; $P = 0$
- values of the m-th NPI call success rates; $u_{m,\mathrm{previous}} = u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
- values of the weighted mean of the m-th NPI; $w_{m,\mathrm{current}} = u_{m,\mathrm{current}}$, $w_{m,\mathrm{previous}} = u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
- values of the standard variation of the m-th NPI; $\sigma_{m,\mathrm{current}} = 1/b$, $\sigma_{m,\mathrm{previous}} = 1/b$, $m = 1, 2, \ldots, M$
- values of the severe and abrupt NPI degradation thresholds; $T_m = 0$, $m = 1, 2, \ldots, M$. (This will suppress the outage alarm at the next decision instant after initialization.)
- values of the severe and abrupt NPI degradation abatement thresholds; $Z_m = 100$, $m = 1, 2, \ldots, M$.
- value of the outage detection threshold; $T_{\mathrm{outage}} = 3$
- values of the abrupt change decision function value; $d1_m = 0$, $m = 1, 2, \ldots, M$
- value of the outage decision function value; $D = 0$ (i.e., no outage alarm at initialization.)
1) At the next NPI measurement time instant, read in the new NPI values, $u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
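2) For $m = 1, 2, \ldots, M$, set the abrupt change decision function value
$$d1_m = \begin{cases} 0, & \text{if } u_{m,\mathrm{current}} \ge Z_m \text{ (there is no sudden change of the } m\text{-th NPI)} \\ d1_m, & \text{if } T_m \le u_{m,\mathrm{current}} < Z_m \text{ (there is no change of the } m\text{-th NPI alarm state)} \\ 1, & \text{if } u_{m,\mathrm{current}} < T_m \text{ (there is a sudden change of the } m\text{-th NPI)} \end{cases}$$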
3) For $m = 1, 2, \ldots, M$, if $d1_m = 1$ and $p_m = 0$, issue an alarm for the m-th NPI performance anomaly and set $p_m = 1$, $g_m = u_{m,\mathrm{previous}} - 10$
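4) Compute the outage decision function value
$$D = \begin{cases} 0, & \text{if } \sum_{m=1}^{M} d1_m = 0 \text{ (no outage in this NPI measurement interval)} \\ D, & \text{if } 0 < \sum_{m=1}^{M} d1_m < T_{\mathrm{outage}} \text{ (no change in outage state)} \\ 1, & \text{if } \sum_{m=1}^{M} d1_m \ge T_{\mathrm{outage}} \text{ (possible outage in this NPI measurement interval)} \end{cases}$$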
5) If $D = 1$ and $P = 0$, issue an alarm for possible outage and set $P = 1$
6) If $D = 0$ and $P = 1$, clear outage alarm and set $P = 0$
7) For $m = 1, 2, \ldots, M$, set $w_{m,\mathrm{current}} = \lambda\, u_{m,\mathrm{current}} + (1 - \lambda)\, w_{m,\mathrm{previous}}$
8) For m = 1,2,... M, set
Figure imgf000014_0001
where
Figure imgf000014_0002
9) For m = 1, 2, ..., M, set the severe and abrupt NPI degradation onset/abatement thresholds
Figure imgf000014_0003
Figure imgf000014_0004
where
Figure imgf000014_0005
10) Set um,previous = um,current, m = 1 ,2, ... M
11) Set wm,previous = Wm,current, m = 1,2,... M
12) Set σm,previous = σm,current, m = 1,2,... M
13) Repeat step 1).
3.2 Summary of a Severe & Abrupt NPI Degradation/Outage Detection Algorithm (Integer Arithmetic Implementation)
0) Initialization: Define the ordering of the $M$ NPIs, e.g., $m = 1$ for the Mobile Originated call success rate, $m = 2$ for the Land Originated call success rate, etc. Then,
- read in the first NPI call success rate measurements, $u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$, and set
Figure imgf000015_0001
- values of the m -th NPI severe and abrupt NPI degradation secondary threshold; gm = 0, m = 1,2,..., M
- values of the m -th NPI performance anomaly set alarm indicator; pm = 0, m = 1,2,..., M
- values of the set outage alarm indicator; $P = 0$
- values of the m-th NPI call success rates; $u_{m,\mathrm{previous}} = u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
- values of the weighted mean of the m-th NPI; $w_{m,\mathrm{current}} = u_{m,\mathrm{current}}$, $w_{m,\mathrm{previous}} = u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
- values of the standard variation of the m-th NPI; $\sigma_{m,\mathrm{current}} = 10000/b = 64$, $\sigma_{m,\mathrm{previous}} = 10000/b = 64$, $m = 1, 2, \ldots, M$
- values of the severe and abrupt NPI degradation thresholds; Tm = 0, m = 1,2,..., M. (This will suppress the outage alarm at the next decision instant after initialization.)
- values of the severe and abrupt NPI degradation abatement thresholds; Zm = 100, m = 1,2,..., M.
- value of the outage detection threshold; Toutage = 3
- values of the abrupt change decision function value; d1m = 0, m = 1,2,..., M
- value of the outage decision function value; $D = 0$ (i.e., no outage alarm at initialization.)
1) At the next NPI measurement time instant, read in the new NPI values, $u_{m,\mathrm{current}}$, $m = 1, 2, \ldots, M$
2) For $m = 1, 2, \ldots, M$, set the abrupt change decision function value
Figure imgf000016_0004
3) For $m = 1, 2, \ldots, M$, if $d1_m = 1$ and $p_m = 0$, issue an alarm for the m-th NPI performance anomaly and set $p_m = 1$, $g_m = u_{m,\mathrm{previous}} - 10$
4) Compute the outage decision function value
Figure imgf000016_0001
5) If $D = 1$ and $P = 0$, issue an alarm for possible outage and set $P = 1$
6) If $D = 0$ and $P = 1$, clear outage alarm and set $P = 0$
7) For m = 1,2,... M, set
Figure imgf000016_0003
8) For m = 1,2,... M, set
Figure imgf000016_0002
where
Figure imgf000017_0001
9) For m = 1,2,.. M, set the severe and abrupt NPI degradation onset/abatement thresholds
Figure imgf000017_0002
where
Figure imgf000017_0003
10) Set um,previous = um,current, m = 1 ,2, ... M
11) Set wm,previous = Wm,current, m = 1,2,... M
12) Set σm,previous = σm,current, m = 1,2,... M
13) Repeat step 1).
4. Slow and Persistent NPI Degradation Detection Algorithm
4.1 Summary of a Slow and Persistent NPI Degradation Detection Algorithm (Floating Point Implementation)
0) Initialization: Use the same ordering of the $M$ NPIs as in the severe & abrupt NPI degradation/outage detection algorithm. Then initialize
values of the NPI OM moving average window; $W = 100$
values of the slow and persistent NPI degradation moving test window; $N = 50$
values of the m-th NPI slow and persistent degradation test statistics; $Q_m = 0$, $m = 1, 2, \ldots, M$
values of the minimum negative slope threshold of -3% per week, $b_0 =$
Figure imgf000018_0001
values of the W present and past m-th NPI values; $u_{m,j} = -1$, $j = 0, 1, \ldots, W$, $m = 1, 2, \ldots, M$; $u_{m,\mathrm{new}} = 0$, $m = 1, 2, \ldots, M$
values of the N m-th NPI average values; $y_{m,j} = -1$, $j = 0, 1, \ldots, N$, $m = 1, 2, \ldots, M$
Figure imgf000018_0002
Figure imgf000018_0003
values of the m-th NPI performance anomaly set alarm indicator; $p_m = 0$, $m = 1, 2, \ldots, M$
values of the slow and persistent NPI degradation decision function value; $d2_m = 0$, $m = 1, 2, \ldots, M$
values of the slow and persistent NPI degradation detection threshold; $T_{\mathrm{slow}} = -1.6772$
1) At the next NPI measurement time instant, read in the new NPI values, $u_{m,\mathrm{new}}$, $m = 1, 2, \ldots, M$
2) For $m = 1, 2, \ldots, M$, update the set of $W + 1$ present and past m-th NPI values
Figure imgf000019_0001
Figure imgf000019_0007
3) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the set of N m-th NPI average values: for $j = 1 : N-1$, set $y_{m,j} = y_{m,j+1}$, end
Figure imgf000019_0005
4) For $m = 1, 2, \ldots, M$, if $y_{m,1} \neq -1$, update the m-th NPI slow and persistent degradation test statistics $Q_m$:
Figure imgf000019_0002
Figure imgf000019_0003
Figure imgf000019_0004
Figure imgf000019_0006
5) For $m = 1, 2, \ldots, M$, if $y_{m,1} \neq -1$, set the slow and persistent NPI degradation decision function value
(there is no slow change of the m-th NPI)
(there is a slow change of the m-th NPI)
Figure imgf000020_0002
6) For $m = 1, 2, \ldots, M$, if $d2_m = 1$ and $p_m = 0$, issue an alarm for the m-th NPI performance anomaly and set $p_m = 1$
7) Repeat step 1).
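The test statistic formulas of this section are given as figures and are not reproduced here. As a hedged illustration only, the following Python sketch assumes that the statistic is a t-type statistic on the least-squares slope of the N windowed averages, tested against the minimum negative slope b0. This assumption is suggested, but not confirmed, by the listed threshold of -1.6772 for N = 50, which is numerically close to the lower 5% critical value of a Student's t distribution with 48 degrees of freedom. The function name and the example value of b0 are hypothetical.

```python
import math

T_SLOW = -1.6772  # detection threshold listed in the initialization step

def slope_t_statistic(ys, b0):
    """(fitted slope - b0) / standard error of the slope, for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))
    if sse == 0.0:
        # Perfect fit: the statistic is unbounded unless the slope equals b0.
        return 0.0 if slope == b0 else math.copysign(math.inf, slope - b0)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return (slope - b0) / std_err

# Example windows of N = 50 averages with a small alternating disturbance.
noise = [0.1 if i % 2 else -0.1 for i in range(50)]
falling = [100.0 - 0.05 * i + noise[i] for i in range(50)]
steady = [99.0 + noise[i] for i in range(50)]
q_falling = slope_t_statistic(falling, b0=-0.01)
q_steady = slope_t_statistic(steady, b0=-0.01)
```

Under this reading, a slow change is declared when the statistic drops below the threshold, which matches the decision rule described for Figure 4.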
4.2 Summary of a Slow and Persistent NPI Degradation Detection Algorithm (Integer Arithmetic Implementation: Signed 32 Bits)
0) Initialization: Use the same ordering of the $M$ NPIs as in the severe & abrupt NPI degradation/outage detection algorithm, then initialize
values of the NPI OM moving average window; $W = 100$
values of the slow and persistent NPI degradation moving test window; $N = 50$
values of the m-th NPI slow and persistent degradation test statistics; $Q_m = 0$, $m = 1, 2, \ldots, M$
values of the W present and past m-th NPI values; $u_{m,j} = -1$, $j = 0, 1, \ldots, W$, $m = 1, 2, \ldots, M$; $u_{m,\mathrm{new}} = 0$, $m = 1, 2, \ldots, M$
values of the N m-th NPI average values; $y_{m,j} = -1$, $j = 0, 1, \ldots, N$, $m = 1, 2, \ldots, M$
value of the integer scale factor; $G = 1000$
values of the minimum negative slope threshold of -3% per week,
Figure imgf000020_0003
Figure imgf000020_0001
Figure imgf000020_0004
values of the m-th NPI performance anomaly set alarm indicator; pm = 0,m = 1,2, ...,M
values of the slow and persistent NPI degradation decision function value; d2m = 0,m = 1,2, ..., M
values of the slow and persistent NPI degradation detection threshold;
Figure imgf000021_0001
1) At the next NPI measurement time instant, read in the new NPI values,
$u_{m,\mathrm{new}}$, $m = 1, 2, \ldots, M$
2) For $m = 1, 2, \ldots, M$, update the set of $W + 1$ present and past m-th NPI values: for $j = 0 : W-1$, set $u_{m,j} = u_{m,j+1}$, end
$u_{m,W} = u_{m,\mathrm{new}}$
3) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the set of N m-th NPI average values: for $j = 1 : N-1$, set $y_{m,j} = y_{m,j+1}$, end
Figure imgf000021_0005
Figure imgf000021_0002
4) For $m = 1, 2, \ldots, M$, if $y_{m,1} \neq -1$, update the m-th NPI slow and persistent degradation test statistics $Q_m$:
Figure imgf000021_0004
Figure imgf000022_0001
5) For $m = 1, 2, \ldots, M$, if $y_{m,1} \neq -1$, set the slow and persistent NPI degradation decision function value
(there is no slow change of the m-th NPI)
(there is a slow change of the m-th NPI)
Figure imgf000022_0002
6) For $m = 1, 2, \ldots, M$, if $d2_m = 1$ and $p_m = 0$, issue an alarm for the m-th NPI performance anomaly and set $p_m = 1$
7) Repeat step 1).
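The integer implementation keeps fractional precision by multiplying by the scale factor G = 1000 before dividing. A minimal Python sketch of that idea, assuming integer percentage inputs and a signed 32-bit accumulator; the function name `scaled_mean` and the explicit overflow check are illustrative and are not taken from this application:

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
G = 1000               # integer scale factor from the initialization step

def scaled_mean(values, scale=G):
    """Return floor(scale * mean(values)) using integer operations only."""
    if not values:
        raise ValueError("values must not be empty")
    total = sum(values) * scale
    if total > INT32_MAX:
        # Python integers do not overflow, so the 32-bit limit is checked by hand.
        raise OverflowError("scaled sum exceeds the signed 32-bit range")
    return total // len(values)

# Three call success rates in whole percent; the result carries three decimals.
example = scaled_mean([99, 98, 98])
```

With W = 100 samples of at most 100 percent each, the scaled sum stays at or below 10,000,000, well inside the signed 32-bit range.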
5. Recovery of NPI Long Term Average Performance Detection Algorithm
Once the system has entered the network performance anomaly state and the alarm has been set, in order to declare that the network performance anomaly event has abated and the system has entered the 'normal' state, it is necessary to make sure that the system performance has reverted back to its most recent long term average performance. To achieve this recovery detection goal, a test statistic constructed from a 7-day moving average estimate of the mean and variance of each of the NPI OMs can be used. Suppose there are J samples of the OMs over the 7-day period; then the sample mean value of the m-th NPI at the time instant k is given by
Figure imgf000022_0003
Let $q_{m,k}$ be defined by
Figure imgf000022_0004
then the sample variance of the m-th NPI at the time instant k is given by
Figure imgf000023_0001
With the above recursive relations for sample mean and sample variance, it is straightforward to construct a recovery to long term performance detection algorithm.
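The recursive relations themselves appear as figures and are not reproduced here. The following Python sketch shows one standard way to maintain a sliding-window mean and variance recursively, by keeping a running sum and a running sum of squares over the last J samples. It is an assumption that this matches the figures; the class name `SlidingStats` and its methods are hypothetical.

```python
from collections import deque

class SlidingStats:
    """Mean and sample variance over the most recent `size` values."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("size must be at least 2")
        self.size = size
        self.window = deque()
        self.total = 0.0    # running sum of the values in the window
        self.sum_sq = 0.0   # running sum of squares of the values

    def update(self, value):
        """Add the newest value and drop the oldest once the window is full."""
        self.window.append(value)
        self.total += value
        self.sum_sq += value * value
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.total -= old
            self.sum_sq -= old * old

    def mean(self):
        return self.total / len(self.window)

    def variance(self):
        n = len(self.window)
        if n < 2:
            return 0.0
        # Clamp at zero to absorb small negative rounding errors.
        return max(0.0, (self.sum_sq - n * self.mean() ** 2) / (n - 1))

stats = SlidingStats(size=3)
for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
    stats.update(sample)
```

Each update costs a constant amount of work regardless of J, which matters when J covers seven days of samples.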
5.1 Summary of a Recovery of NPI Long Term Average Performance Detection Algorithm (Floating Point Implementation)
0.a) Initialization Option 1 (without using data from prior to the start time of the algorithm): Use the same ordering of the $M$ NPIs as in the severe & abrupt NPI degradation/outage detection algorithm. Then initialize
- values of the NPI OM moving average window is the number of samples during a 7-day period; $J$
- values of the m-th NPI long term average performance recovery threshold; $V_m = 0$, $m = 1, 2, \ldots, M$
- values of the J present and past m-th NPI; $u_{m,j} = -1$, $j = 0, 1, \ldots, J$, $m = 1, 2, \ldots, M$; $u_{m,\mathrm{new}} = 0$, $m = 1, 2, \ldots, M$
- values of the m-th NPI sample average; $l_m = -1$, $m = 1, 2, \ldots, M$
- values of the m -th NPI sum of square; qm=J2, m=1,2,..., M
- values of the m-th NPI sample variance; $L_m = 0$, $m = 1, 2, \ldots, M$
- values of the m-th NPI long term average performance recovery decision function value; $d3_m = 0$, $m = 1, 2, \ldots, M$
1) At the next NPI measurement time instant, read in the new NPI values, $u_{m,\mathrm{new}}$, $m = 1, 2, \ldots, M$
2) For $m = 1, 2, \ldots, M$, update the set of $J + 1$ present and past m-th NPI values: for $j = 0 : J-1$, set $u_{m,j} = u_{m,j+1}$, end
Figure imgf000024_0006
3) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample average values
Figure imgf000024_0003
4) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample variance values
Figure imgf000024_0001
5) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample sum of squares values
Figure imgf000024_0004
6) For m = 1 ,2, ... M, if lm≠ 100, update the m-th NPI long term average performance recovery threshold Vm:
Figure imgf000024_0002
For $m = 1, 2, \ldots, M$, if $l_m \neq 100$, set the long term average performance recovery decision function value
(recovery of long term average performance of the m-th NPI)
Figure imgf000024_0005
(long term average performance of the m-th NPI has not recovered yet)
7) Repeat step 1).
5.2 Summary of a Recovery of NPI Long Term Average Performance Detection Algorithm (Integer Arithmetic Implementation)
0) Initialization: Use the same ordering of the $M$ NPIs as in the severe & abrupt NPI degradation/outage detection algorithm. Then initialize
- values of the NPI OM moving average window is the number of samples during a 7-day period; $J$
- values of the m-th NPI long term average performance recovery threshold; $V_m = 0$, $m = 1, 2, \ldots, M$
- values of the J present and past m-th NPI; $u_{m,j} = -1$, $j = 0, 1, \ldots, J$, $m = 1, 2, \ldots, M$; $u_{m,\mathrm{new}} = 0$, $m = 1, 2, \ldots, M$
- values of the m -th NPI sample average; lm = 100, m = 1,2,..., M
- values of the m -th NPI sum of square; qm = Q, m = 1,2,..., M
- values of the m-th NPI sample variance; $L_m = 0$, $m = 1, 2, \ldots, M$
- values of the m-th NPI long term average performance recovery decision function value; $d3_m = 0$, $m = 1, 2, \ldots, M$
1) At the next NPI measurement time instant, read in the new NPI values, $u_{m,\mathrm{new}}$, $m = 1, 2, \ldots, M$
2) For $m = 1, 2, \ldots, M$, update the set of $J + 1$ present and past m-th NPI values: for $j = 0 : J-1$, set $u_{m,j} = u_{m,j+1}$, end
Figure imgf000025_0002
3) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample average values
Figure imgf000025_0001
4) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample variance values
Figure imgf000026_0004
5) For $m = 1, 2, \ldots, M$, if $u_{m,0} \neq -1$, update the m-th NPI sample sum of squares values
Figure imgf000026_0001
6) For $m = 1, 2, \ldots, M$, if $l_m \neq -100$, update the m-th NPI long term average performance recovery threshold $V_m$:
Figure imgf000026_0002
7) For $m = 1, 2, \ldots, M$, if $l_m \neq -100$, set the long term average performance recovery decision function value
(recovery of long term average performance of the m-th NPI)
(long term average performance of the m-th NPI has not recovered yet)
Figure imgf000026_0003
8) Repeat step 1).
6. Performance Anomaly Clear Alarm Algorithm
Use the same ordering of the $M$ NPIs as in the severe & abrupt NPI degradation/outage detection algorithm.
1) At the next NPI measurement time instant, after all three performance anomaly detection algorithms have finished executing, for $m = 1, 2, \ldots, M$, read $d1_m$, $d2_m$, $d3_m$ and $p_m$
2) If $d1_m = 0$, $d2_m = 0$, $d3_m = 0$, and $p_m = 1$, clear the alarm for the NPI performance anomaly of the m-th NPI and set $p_m = 0$
3) Repeat 1).
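The clear alarm rule above can be sketched in a few lines of Python. The list-based representation of the per-NPI indicators and the function name `clear_alarms` are assumptions made for illustration.

```python
def clear_alarms(d1, d2, d3, p):
    """Clear the set-alarm indicator of every NPI whose three decision values are 0.

    d1, d2, d3 -- per-NPI decision function values from the three detection algorithms
    p          -- per-NPI set-alarm indicators, modified in place
    Returns the indices of the NPIs whose alarm was cleared.
    """
    cleared = []
    for m in range(len(p)):
        if d1[m] == 0 and d2[m] == 0 and d3[m] == 0 and p[m] == 1:
            p[m] = 0
            cleared.append(m)
    return cleared

# Example with three NPIs: only the first has an alarm set and all decisions at 0.
p = [1, 1, 0]
cleared = clear_alarms(d1=[0, 1, 0], d2=[0, 0, 0], d3=[0, 0, 0], p=p)
```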
As set forth above, a scheme for network performance anomaly detection has been disclosed based on 1) detecting severe and sudden change in the NPI OM and 2) detecting slow and persistent degradation of the NPI OM. Furthermore, utilizing multiple NPIs helps reduce the false alarm probability while maximizing the probability of outage detection. Each NPI has two network performance degradation thresholds. In the severe and abrupt NPI degradation detection, the threshold dynamically adapts to changes in the most recent mean and variance values of the NPI success rate. The severe and abrupt performance degradation decision rule compares the most recent measurement of NPI call success rate against this dynamic threshold value in order to determine if there is a statistically significant large and sudden change from the most recent mean value. At any particular time instant at which the NPI call success rate drops below the threshold, a network performance anomaly alarm is issued for that NPI.
The second dynamic threshold for network performance anomaly detection uses a low pass filter (i.e., a long moving average window) to smooth out the normal daily fluctuation of the NPIs in order to discriminate a real from a fictitious slow downward trend in the NPI OM performance. To ascertain that the network performance anomaly event has abated, three abatement dynamic thresholds are proposed. The first two thresholds concern the detection of the abatement event related to the severe and abrupt NPI degradation detection algorithm and the slow and persistent NPI degradation detection algorithm. The last abatement dynamic threshold is used to check whether the network performance has recovered to its long term average performance level. Once a network performance anomaly alarm has been set, it can only be cleared when the relevant NPI OM value exceeds all three abatement thresholds.
The above-described steps can be implemented using standard well-known programming techniques. The novelty of the above-described embodiment lies not in the specific programming techniques but in the use of the steps described to achieve the described results. Software programming code which embodies the present invention is typically stored in permanent storage. In a client/server environment, such software programming code may be stored with storage associated with a server. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, or hard drive, or CD ROM. The code may be distributed on such media, or may be distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems. The techniques and methods for embodying software program code on physical media and/or distributing software code via networks are well known and will not be further discussed herein.
It will be understood that each element of the illustrations, and combinations of elements in the illustrations, can be implemented by general and/or special purpose hardware- based systems that perform the specified functions or steps, or by combinations of general and/or special-purpose hardware and computer instructions.
These program instructions may be provided to a processor to produce a machine, such that the instructions that execute on the processor create means for implementing the functions specified in the illustrations. The computer program instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer-implemented process such that the instructions that execute on the processor provide steps for implementing the functions specified in the illustrations.
Accordingly, the figures support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions.
Although the present invention has been described with respect to a specific preferred embodiment thereof, various changes and modifications may be suggested to one skilled in the art and it is intended that the present invention encompass such changes and modifications as fall within the scope of the appended claims.

Claims

We claim:
1. A method of detecting performance anomaly events in a communication network comprising at least a network degradation detection processor, the method comprising:
reading current Network Performance Indicator (NPI) Operational Measurements (OMs) at a current time t into the degradation detection processor;
determining, using the degradation detection processor, a dynamic performance degradation threshold which changes based on a plurality of historical NPI OM values immediately preceding the current NPI OMs;
comparing, using the degradation detection processor, the current NPI OMs with the dynamic performance degradation threshold; and
generating an indication of an alarm condition if the comparison indicates the existence of a performance degradation condition.
2. The method of claim 1, wherein the dynamic performance degradation threshold comprises a severe and abrupt performance degradation threshold, and wherein the degradation detection processor repeatedly takes NPI OMs during NPI measurement periods, and wherein the severe and abrupt performance degradation threshold is determined using the average value and variance value of the NPI OMs read by the degradation detection processor during no more than the last ten NPI measurement periods immediately preceding the current NPI OM.
3. The method of claim 1, wherein the dynamic performance degradation threshold comprises a severe and abrupt performance degradation threshold, and wherein the degradation detection processor repeatedly takes NPI OMs during NPI measurement periods, and wherein the severe and abrupt performance degradation threshold is determined using the average value and variance value of the NPI OMs read by the degradation detection processor during no more than the last five NPI measurement periods immediately preceding the current NPI OM.
4. The method of claim 2, further comprising:
inputting a predetermined potential outage alarm threshold value into the degradation detection processor, said predetermined potential outage alarm threshold corresponding to the occurrence of a predetermined plurality of alarm condition indications being generated by said degradation detection processor within the same NPI measurement period;
monitoring the number of alarm condition indications generated by said degradation detection processor during a particular NPI measurement period;
comparing the number of alarm condition indications generated by said degradation detection processor during said particular NPI measurement period with said potential outage alarm threshold value; and
generating a potential outage alarm if the comparison of the number of alarm condition indications and said potential outage alarm threshold value indicates the existence of a potential outage condition.
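The counting step of claim 4 can be sketched in Python as below. This is a sketch under assumptions rather than the claimed implementation: each alarm indication is represented by the identifier of the measurement period in which it was generated, and a potential outage is flagged when the count in a period reaches the threshold (the claim does not say whether the comparison is "reaches" or "exceeds"). The function name and the data are hypothetical.

```python
from collections import Counter


def potential_outage_periods(alarm_periods, outage_threshold):
    """Return the measurement periods whose alarm count reaches the threshold.

    `alarm_periods` holds one period identifier per alarm indication.
    """
    counts = Counter(alarm_periods)
    return sorted(p for p, n in counts.items() if n >= outage_threshold)


# Hypothetical alarm log: three alarms in period 1, one in period 2, two in period 3.
alarms = [1, 1, 2, 1, 3, 3]
print(potential_outage_periods(alarms, outage_threshold=3))
```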
5. A method of detecting performance anomaly events in a communication network comprising at least a network degradation detection processor, the method comprising:
reading current Network Performance Indicator (NPI) Operational Measurements (OMs) at a current time t into the degradation detection processor;
determining, using the degradation detection processor, a data trend line based on a moving average of NPI OMs for historical NPI OM values immediately preceding the current NPI OMs;
determining, using the degradation detection processor, the slope of the data trend line;
comparing, using the degradation detection processor, the slope of the data trend line with a predetermined slow & persistent degradation threshold; and
generating an indication of an alarm condition if the comparison indicates the existence of a long-term performance anomaly.
6. The method of claim 5, wherein the degradation detection processor repeatedly takes NPI OMs during NPI measurement periods and wherein the data trend line is determined using the moving average of the NPI OMs read by the degradation detection processor during at least the last ten NPI measurement periods immediately preceding the current NPI OM.
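The slow and persistent check of claims 5 and 6 can be illustrated with the Python sketch below. The claims do not say how the trend line is fitted, so the least-squares fit here is an assumption. The sketch also assumes that a negative slope means degradation, and the function names, window size, and slope threshold are hypothetical.

```python
def moving_average(values, window):
    """Simple moving average over consecutive `window`-sized spans."""
    if window < 1 or len(values) < window:
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def trend_slope(points):
    """Least-squares slope of `points` against their index 0, 1, 2, ..."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    x_mean = (n - 1) / 2
    y_mean = sum(points) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(points))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def is_slow_degradation(history, window=10, slope_threshold=-0.05):
    """True if the smoothed NPI trend falls faster than the threshold slope."""
    smoothed = moving_average(history, window)
    return trend_slope(smoothed) < slope_threshold


# Hypothetical NPI values that lose 0.5 points per measurement period.
declining = [100 - 0.5 * i for i in range(20)]
print(is_slow_degradation(declining))
```

Smoothing first keeps a single noisy period from dominating the slope, which matters when the aim is to catch a gradual decline rather than an abrupt drop.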
7. A computer readable medium containing computer instructions for detecting performance anomaly events in a communication network comprising at least a network degradation detection processor, the medium comprising computer executable instructions for:
reading current Network Performance Indicator (NPI) Operational Measurements (OMs) at a current time t into the degradation detection processor;
determining a dynamic performance degradation threshold which changes based on a plurality of historical NPI OM values immediately preceding the current NPI OMs;
comparing the current NPI OMs with the dynamic performance degradation threshold; and
generating an indication of an alarm condition if the comparison indicates the existence of a performance degradation condition.
8. The computer readable product of claim 7, wherein the dynamic performance degradation threshold comprises a severe and abrupt performance degradation threshold, and wherein the degradation detection processor repeatedly takes NPI OMs during NPI measurement periods, and wherein the severe and abrupt performance degradation threshold is determined using the average value and variance value of the NPI OMs read by the degradation detection processor during no more than the last ten NPI measurement periods immediately preceding the current NPI OM.
9. The computer readable product of claim 7, wherein the dynamic performance degradation threshold comprises a severe and abrupt performance degradation threshold, and wherein the degradation detection processor repeatedly takes NPI OMs during NPI measurement periods, and wherein the severe and abrupt performance degradation threshold is determined using the average value and variance value of the NPI OMs read by the degradation detection processor during no more than the last five NPI measurement periods immediately preceding the current NPI OM.
10. The computer readable product of claim 8, further comprising:
inputting a predetermined potential outage alarm threshold value into the degradation detection processor, said predetermined potential outage alarm threshold corresponding to the occurrence of a predetermined plurality of alarm condition indications being generated by said degradation detection processor within the same NPI measurement period;
monitoring the number of alarm condition indications generated by said degradation detection processor during a particular NPI measurement period;
comparing the number of alarm condition indications generated by said degradation detection processor during said particular NPI measurement period with said potential outage alarm threshold value; and
generating a potential outage alarm if the comparison of the number of alarm condition indications and said potential outage alarm threshold value indicates the existence of a potential outage condition.
PCT/US2010/042192 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification WO2011009000A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA2768220A CA2768220A1 (en) 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification
EP10735395A EP2454848A1 (en) 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification
US13/383,971 US8862119B2 (en) 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification
US14/489,925 US20150004964A1 (en) 2009-07-15 2014-09-18 Method and apparatus for telecommunications network performance anomaly events detection and notification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US22567209P 2009-07-15 2009-07-15
US61/225,672 2009-07-15

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/383,971 A-371-Of-International US8862119B2 (en) 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification
US14/489,925 Continuation US20150004964A1 (en) 2009-07-15 2014-09-18 Method and apparatus for telecommunications network performance anomaly events detection and notification

Publications (1)

Publication Number Publication Date
WO2011009000A1 (en) 2011-01-20

Family

ID=42674592

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/042192 WO2011009000A1 (en) 2009-07-15 2010-07-15 Method and apparatus for telecommunications network performance anomaly events detection and notification

Country Status (4)

Country Link
US (2) US8862119B2 (en)
EP (1) EP2454848A1 (en)
CA (1) CA2768220A1 (en)
WO (1) WO2011009000A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9414244B2 (en) 2013-07-22 2016-08-09 Motorola Solutions, Inc. Apparatus and method for determining context-aware and adaptive thresholds in a communications system

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10381869B2 (en) * 2010-10-29 2019-08-13 Verizon Patent And Licensing Inc. Remote power outage and restoration notification
CN102955719B (en) * 2011-08-31 2016-06-22 国际商业机器公司 The defining method of doubtful memory overflow and device
US10938650B1 (en) * 2018-03-07 2021-03-02 Amdocs Development Limited System, method, and computer program for improving a quality of experience based on artificial intelligence
US9638792B2 (en) * 2014-07-09 2017-05-02 Verizon Patent And Licensing Inc. Method and apparatus for detecting obstacles in propagation path of a wireless communication signal
US10390341B2 (en) * 2014-10-31 2019-08-20 Realtek Semiconductor Corp. Wireless communication system and associated wireless communication method
EP3041283B1 (en) * 2014-12-30 2019-05-29 Comptel Corporation Prediction of failures in cellular radio access networks and scheduling of preemptive maintenance
US20170070397A1 (en) * 2015-09-09 2017-03-09 Ca, Inc. Proactive infrastructure fault, root cause, and impact management
US10546241B2 (en) * 2016-01-08 2020-01-28 Futurewei Technologies, Inc. System and method for analyzing a root cause of anomalous behavior using hypothesis testing
US10609587B2 (en) 2016-05-01 2020-03-31 Teoco Corporation System, method, and computer program product for location-based detection of indicator anomalies
CN110266532B (en) * 2019-06-17 2021-11-16 福建广电网络集团股份有限公司 CMTS-based unified network management method and system
CN113758608B (en) * 2020-07-30 2023-11-07 北京京东振世信息技术有限公司 Alarm processing method and device
CN112565009A (en) * 2020-11-27 2021-03-26 中盈优创资讯科技有限公司 Processing method and device based on custom performance threshold alarm rule
CN115001954B (en) * 2022-05-30 2023-06-09 广东电网有限责任公司 Network security situation awareness method, device and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1065827A1 (en) * 1999-06-29 2001-01-03 Lucent Technologies Inc. Method and apparatus for detecting service anomalies in transaction-oriented networks
US20080016412A1 (en) * 2002-07-01 2008-01-17 Opnet Technologies, Inc. Performance metric collection and automated analysis
US7467067B2 (en) * 2006-09-27 2008-12-16 Integrien Corporation Self-learning integrity management system and related methods
US7542428B1 (en) * 2004-01-28 2009-06-02 Cox Communications, Inc. Geographical network alarm viewer

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7765294B2 (en) * 2006-06-30 2010-07-27 Embarq Holdings Company, Llc System and method for managing subscriber usage of a communications network
US8130793B2 (en) * 2006-08-22 2012-03-06 Embarq Holdings Company, Llc System and method for enabling reciprocal billing for different types of communications over a packet network

Also Published As

Publication number Publication date
US8862119B2 (en) 2014-10-14
CA2768220A1 (en) 2011-01-20
US20130324111A1 (en) 2013-12-05
EP2454848A1 (en) 2012-05-23
US20150004964A1 (en) 2015-01-01

Similar Documents

Publication Publication Date Title
EP2454848A1 (en) Method and apparatus for telecommunications network performance anomaly events detection and notification
US8175253B2 (en) System and method for automated performance monitoring for a call servicing system
US8700761B2 (en) Method and system for detecting and managing a fault alarm storm
US7855952B2 (en) Silent failure identification and trouble diagnosis
CN108989135B (en) Network equipment fault detection method and device
KR20060051451A (en) Method of monitoring wireless network performance
CN113824768B (en) Health check method and device in load balancing system and flow forwarding method
CN109861856B (en) Method and device for notifying system fault information, storage medium and computer equipment
CN110955586A (en) System fault prediction method, device and equipment based on log
WO2018125628A1 (en) A network monitor and method for event based prediction of radio network outages and their root cause
CN109992473A (en) Monitoring method, device, equipment and the storage medium of application system
CN106933677A (en) System exception processing method and processing device
CN107231344B (en) Flow cleaning method and device
CN112152833B (en) Network abnormity alarm method and device and electronic equipment
US20150281008A1 (en) Automatic derivation of system performance metric thresholds
WO2017059904A1 (en) Anomaly detection in a data packet access network
CN112436958B (en) Method, system, device and medium for predicting failure of data center network device
CN111601329A (en) Method and device for processing port interrupt alarm
CN115002001B (en) Method, device, equipment and medium for detecting sub-health of cluster network
CN111078503A (en) Abnormity monitoring method and system
CN116416764A (en) Alarm threshold generation method and device, electronic equipment and storage medium
CN101192962A (en) Generation and recovery method for adhesion value alarm in telecom network management system
CN109710552B (en) Bus transmission quality evaluation method, system and computer storage medium
CN113297039A (en) Data monitoring method and device
US7908546B2 (en) Methods and apparatus for detection of performance conditions in processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10735395

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2768220

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010735395

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 13383971

Country of ref document: US