WO2003054704A1 - Method and system for analyzing and predicting the behavior of systems - Google Patents

Info

Publication number
WO2003054704A1
WO2003054704A1 PCT/US2002/040837 US0240837W WO03054704A1 WO 2003054704 A1 WO2003054704 A1 WO 2003054704A1 US 0240837 W US0240837 W US 0240837W WO 03054704 A1 WO03054704 A1 WO 03054704A1
Authority
WO
WIPO (PCT)
Prior art keywords
input variable
time
selected input
computing
variance
Prior art date
Application number
PCT/US2002/040837
Other languages
French (fr)
Inventor
David Helsper
David Hamoki
Amanda Rasmussen
Robert Jannarone
Jean-François HUARD
Original Assignee
Netuitive Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netuitive Inc.
Priority to EP02795970A (EP1468361A1)
Priority to AU2002360691A (AU2002360691A1)
Priority to CA2471013A (CA2471013C)
Publication of WO2003054704A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/81 Threshold

Definitions

  • Patent No. 6,289,330 and co-pending U.S. Patent Application Serial No. 09/811,163.
  • This invention relates to computerized monitoring and modeling systems and, more specifically, relates to a method and system for modeling, estimating, predicting, and detecting abnormal behavior in systems, such as complex computer networks, servers, and other computer-based systems.
  • alarm thresholds should be adjustable to conform to changes in the normal system operational pattern. For example, low alarm thresholds may be appropriate during low system usage periods, whereas much larger alarm thresholds may be appropriate for higher system usage periods. Accordingly, systems analysts have attempted to design monitoring systems with alarm thresholds that track the expected normal operational pattern of a monitored system. In particular, historical usage patterns for the monitored system may be analyzed to detect normal usage patterns, and deterministic functions may then be "fit" to the historical pattern to develop a predictive function for the normal operational pattern of the system.
  • the alarm thresholds may then be set based on the predictive estimate of the normal operational pattern of the system.
  • These types of predictive monitoring systems exhibit two major drawbacks. First, historical data is not always available for the monitored system and, even when it is available, the task of developing a predictive function for the normal operational pattern of the system based on historical usage patterns is technically challenging, expensive, and time-consuming. Second, the normal behavior of complex computer networks tends to change over time, which periodically renders the historically-determined
  • the present invention meets the needs described above for modeling, estimating, predicting, and detecting abnormal behavior in systems, such as complex computer networks.
  • Although this system will be referred to as a "monitoring system" for descriptive convenience, it should be understood that it may perform additional functions, such as predicting abnormal system behavior, activating alerts and alarms based on detected or predicted system problems, implementing response actions to detected system problems, accumulating descriptive statistics, implementing user-defined system tuning, and so forth.
  • the monitoring system typically includes a baseline model that automatically captures and models normal system behavior, a correlation model that employs multivariate autoregression analysis to detect abnormal system behavior, and an alarm service that weights and scores a variety of alerts to determine alarm status and appropriate response actions. Individually and in combination, these features represent significant advances in the field of network monitoring and modeling systems that overcome a number of persistent shortcomings in prior monitoring systems.
  • the baseline model automatically captures and updates signatures for multiple input variables representing normal system operation on an ongoing basis. Although historical data may be used to initialize the system, this step is optional, and the system may "bootstrap" itself into a working model without the need for extensive analysis of historical data. More importantly, the monitoring system automatically updates the baseline model for each set of data inputs (i.e., each time trial) to continually refine the model while detecting and adapting to changes in the normal system operation that occur over time. The baseline model also tracks and models each input variable individually, and decomposes the signature for each input variable into components that track different characteristics of the variable. For example, each input variable may be modeled by a global trend component, a cyclical component, and a seasonal component.
  • Modeling and continually updating these components separately permits a more accurate identification of the erratic component of the input variable, which typically reflects abnormal patterns when they occur. This improved modeling accuracy allows the system to implement tighter alarm thresholds while maintaining acceptable levels of false alarms.
  • the historical usage pattern for each input variable is represented by a time-based baseline mean and variance, which are in turn based on combinations of the means and variances for the components making up the baseline. This obviates the need for a large database of historical analysis and periodic reevaluation of that historical data.
  • the time-based baseline means and variances are updated in a weighted manner that gives accentuated relevance to the measurement received during the most recent time trial, which allows the model to adapt appropriately to changes in the normal operational pattern.
  • the weighting parameters are typically implemented through user-defined inputs, which allows the user to tune this aspect of the baseline model for each input variable individually.
  • the correlation model is defined by a "two cycle" use of a connection weight matrix to compute expected values of the erratic components of the input variables for current and future time trials, followed by updating the connection weight matrix to reflect learning from the current time trial.
  • the correlation model receives input values, computes expected output values for the current time trial (i.e., imputed estimates), computes expected output values for future time trials (i.e., forecast estimates), and updates the learned parameters expressed in the connection weight matrix.
  • monitoring and forecasting using current data and a historically determined connection weight matrix represents the first cycle
  • updating the connection weight matrix to reflect learning from the current data represents the second cycle.
  • This two cycle process is repeated for each time trial to implement monitoring, forecasting and learning concurrently as the correlation model operates over a number of time trials.
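The two-cycle loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented algorithm: the class name `TwoCycleCorrelationSketch`, the `learning_rate` parameter, and the delta-rule weight update are hypothetical stand-ins for whatever learning rule the correlation model actually uses, and forecast estimates for future time trials are omitted for brevity.

```python
class TwoCycleCorrelationSketch:
    """Illustrative stand-in for the correlation model (assumed, not from the source)."""

    def __init__(self, num_vars, learning_rate=0.05):
        self.n = num_vars
        self.rate = learning_rate  # hypothetical tuning parameter
        # Connection weight matrix: weights[i][j] maps variable j onto variable i.
        self.weights = [[0.0] * num_vars for _ in range(num_vars)]

    def impute(self, erratic):
        """Cycle 1: estimate each erratic component from the other variables."""
        return [
            sum(self.weights[i][j] * erratic[j] for j in range(self.n) if j != i)
            for i in range(self.n)
        ]

    def learn(self, erratic, imputed):
        """Cycle 2: fold the current trial into the weights (assumed delta rule)."""
        for i in range(self.n):
            error = erratic[i] - imputed[i]
            for j in range(self.n):
                if j != i:
                    self.weights[i][j] += self.rate * error * erratic[j]

    def step(self, erratic):
        """Run both cycles for one time trial and return the imputed estimates."""
        imputed = self.impute(erratic)  # uses weights learned from earlier trials
        self.learn(erratic, imputed)    # then updates them with the current trial
        return imputed
```

Calling `step` once per time trial keeps monitoring and learning interleaved: the estimate for a trial never uses weights that have already seen that trial's data.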
  • the monitoring function involves receiving erratic components for the current time trial and computing imputed estimates for those same erratic components for the current time trial based on the actual data received for the current time trial and learned parameters that reflect observed covariance relationships among the erratic components based on previous time trials.
  • This aspect of the correlation model recognizes and takes advantage of the fact that one input variable may vary instantaneously as a function of other variables. For example, consider an instance in which an application response time is directly affected by other monitored variables, including the quantity of network traffic, the number of users accessing the application, the availability of buffer space, and the cache memory level. Therefore, the expected value of the application response time necessarily varies instantaneously with changes in these other variables.
  • the correlation model properly models this relationship by using real-time data (i.e., measurements for the current time trial) and the learned parameters to compute the imputed estimates for the current time trial.
  • This type of modeling accuracy is not possible with systems that rely on historical data alone to compute the predictive indicators of system performance, and cannot be implemented without the use of a multivariate correlation model.
  • the alarm service computes imputed estimate alerts and forecast estimate alerts for each input individually. Typically, these alerts are based on threshold values based on user-defined parameters, which allows the user to tune this aspect of the alarm service for each input variable individually, and for imputed estimate alerts and forecast estimate alerts separately.
  • the alarm service includes an alarm generator that accumulates and weights the various alerts generated for the various input variables and computes an alarm score based on the overall alert condition. This avoids false alarms from isolated outliers or data errors.
  • the present invention may be implemented as a monitoring system deployed on a host computer system, which may be local or remote, or it may be expressed in computer-executable instructions stored on a computer storage medium.
  • the monitoring system continually receives measurements defining signatures for a plurality of input variables reflecting the behavior of a monitored system, such as a computer network. Each signature includes a time series of measurements including historical measurements for past time trials and a current measurement for a current time trial.
  • the monitoring system also computes a time-based baseline mean and variance for a selected input variable based on the historical measurements for the selected input variable.
  • the monitoring system then computes an erratic component for the selected input variable by comparing the measurement for the selected input value for the current time trial to the time-based baseline mean for the selected input variable.
  • the monitoring system then computes an imputed estimate, a forecast estimate, or both.
  • the monitoring system may compute an imputed estimate for the selected input variable based on erratic components computed for other input variables for the current time trial and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables.
  • the monitoring system determines an alert status for the imputed estimate based on the imputed estimate and the erratic component for the input variable for the current time trial.
  • the monitoring system may compute a forecast estimate for the selected input variable based on erratic components computed for other input variables and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables.
  • the monitoring system determines an alert status for the forecast estimate.
  • the monitoring system also updates the time-based baseline mean and variance for the selected input variable based on the measurement received for the selected input value for the current time trial.
  • the process described above may be implemented for a single input variable or it may be repeated for multiple input variables.
  • the forecast estimate process is usually implemented for multiple future forecasts, for example to produce short, medium or long range operational forecasts.
  • the process is also repeated for multiple current time trials as new data arrives, and the monitoring system runs in an operational mode to provide the alarm service while it refines and adapts the baseline model and correlation models.
  • the monitoring system typically computes a confidence value for the imputed estimate, a threshold value for the imputed estimate based on the confidence value, and an imputed estimate alert value reflecting the difference between the imputed estimate and the erratic component for the selected input variable.
  • the monitoring system may then determine an alert status for the imputed estimate by comparing the alert value to the threshold value for the imputed estimate.
  • the confidence value for the imputed estimate may be based on a standard error associated with the imputed estimate
  • the threshold value for the imputed estimate may be based on the standard error and a user-defined configuration parameter.
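The imputed-estimate alert test described above might look like the following. This is a minimal sketch under assumptions: the source says only that the threshold is "based on" the standard error and a user-defined parameter, so the multiplicative form and the names `imputed_alert` and `sensitivity` are hypothetical.

```python
def imputed_alert(erratic, imputed, standard_error, sensitivity=3.0):
    """Return (alert_value, threshold, is_alert) for one variable in one time trial.

    `sensitivity` stands in for the user-defined configuration parameter;
    multiplying it by the standard error is an assumed form of the threshold.
    """
    threshold = sensitivity * standard_error
    # Alert value: how far the observed erratic component is from its imputed estimate.
    alert_value = abs(erratic - imputed)
    return alert_value, threshold, alert_value > threshold
```

For example, `imputed_alert(5.0, 1.0, 1.0)` reports an alert because the observed erratic component is four standard errors away from its imputed estimate.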
  • the monitoring system may also compute a threshold value for the forecast estimate and determine an alert status for the forecast estimate by comparing the forecast estimate to the threshold value.
  • the threshold value may be based on the time-based baseline variance for the selected input variable and a user-defined configuration parameter.
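A forecast alert check along these lines could be sketched as follows. The source does not give the formula, so this assumes the threshold is a user-chosen multiple of the baseline standard deviation and that the forecast is an estimate of the erratic component, whose magnitude is compared to the threshold. The names `forecast_alert` and `sensitivity` are hypothetical.

```python
import math


def forecast_alert(forecast, baseline_variance, sensitivity=3.0):
    """Return (threshold, is_alert) for a forecast of the erratic component.

    Assumed form: threshold = sensitivity * sqrt(time-based baseline variance).
    """
    threshold = sensitivity * math.sqrt(baseline_variance)
    return threshold, abs(forecast) > threshold
```

Because the threshold scales with the baseline variance for the relevant time slice, it widens automatically during periods when the variable is normally more volatile.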
  • the monitoring system computes the time-based baseline mean and variance for the selected input variable by decomposing the signature for the input variable into components and computing a mean and variance for each component. The monitoring system then combines the means for the components to obtain the time-based baseline mean, and also combines the variances for the components to obtain the time-based baseline variance. More specifically, the monitoring system typically decomposes the signature for the input variable into components by defining a repeating cycle for the historical measurements. The monitoring system then divides the cycle into a number of contiguous time periods or "time slices" such that each cycle includes a similar set of time periods, each having corresponding time indices within each cycle.
  • the monitoring system then computes a global trend component for the selected input variable reflecting measurements received for the selected input variable for temporally contiguous time indices. Also typically, the monitoring system computes a cyclical component for the selected input variable reflecting data accumulated across multiple cycles for each time index. This allows the monitoring system to update the time-based baseline mean by computing an updated mean for each component based on a weighted sum including the baseline mean for the component and the measurement received for the selected input variable for the current time trial, and summing the updated means for the components.
  • the monitoring system computes the updated time-based baseline variance by computing an updated variance for each component based on a weighted sum including the baseline variance for the component and the measurement received for the selected input variable for the current time trial, and summing the updated variances for the components.
  • the monitoring system receives imputed estimate and forecast estimate alerts corresponding to multiple input measurements, weights the alerts, and computes an alert score based on the weighted alerts. The monitoring system then determines whether to activate an alarm condition based on the alert score.
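The weighting and scoring step can be illustrated with a short sketch. The source does not specify the scoring rule, so a plain weighted sum compared against a cutoff is assumed here. The function names, the `(variable, kind)` keys, and the default weight of 1.0 are all hypothetical.

```python
def alert_score(alerts, weights):
    """Sum the weights of all active alerts.

    alerts:  dict mapping (variable, kind) -> bool, where kind is
             "imputed" or "forecast" (hypothetical key layout).
    weights: dict mapping the same keys -> float; missing keys default to 1.0.
    """
    return sum(weights.get(key, 1.0) for key, active in alerts.items() if active)


def alarm_active(alerts, weights, alarm_threshold):
    """Activate the alarm only when the combined score reaches the cutoff."""
    return alert_score(alerts, weights) >= alarm_threshold
```

With a cutoff above the largest single weight, one isolated alert cannot trigger an alarm on its own, which matches the stated goal of avoiding false alarms from isolated outliers or data errors.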
  • the present invention greatly improves upon preexisting methods and systems for modeling, estimating, predicting and detecting abnormal behavior in computer networks, servers and other computer-based systems that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns.
  • the specific techniques and structures employed by the invention to improve over the drawbacks of prior monitoring systems to accomplish the advantages described above will become apparent from the following detailed description of the embodiments of the invention and the appended drawings and claims.
  • FIG. 1 is a functional block diagram illustrating the structure and operation of a monitoring system configured to implement the present invention.
  • FIG. 2 is a logic flow diagram illustrating a routine for provisioning the adaptive baseline engine component of the monitoring system.
  • FIG. 3 is a logic flow diagram illustrating a routine for running the adaptive baseline engine component of the monitoring system.
  • FIG. 4 is a logic flow diagram illustrating a routine for running the engine system component of the monitoring system.
  • FIG. 5 is a logic flow diagram illustrating a routine for running the adaptive correlation engine component of the monitoring system.
  • FIG. 6 is a logic flow diagram illustrating a routine for running a real-time alert detector component of the monitoring system.
  • FIG. 7 is a logic flow diagram illustrating a routine for running a forecast alert detector component of the monitoring system.
  • FIG. 8 is a logic flow diagram illustrating a routine for running an alarm generator component of the monitoring system.
  • FIG. 9A is a graph illustrating a typical signature of an input variable for the monitoring system.
  • FIG. 9B is a graph illustrating a global trend component of the input variable signature shown in FIG. 9A.
  • FIG. 9C is a graph illustrating a cyclical component of the input variable signature shown in FIG. 9A.
  • FIG. 9D is a graph illustrating a seasonal or scheduled component of the input variable signature shown in FIG. 9A.
  • FIG. 9E is a graph illustrating an erratic component of the input variable signature shown in FIG. 9A.
  • FIG. 10 is a graph illustrating time slicing and sampling of the input variable signature shown in FIG. 9A.
  • FIG. 11 is a series of graphs illustrating a process for capturing the cyclical component of the input variable signature shown in FIG. 9A.
  • the present invention may be embodied in a system intended to address the task of modeling, estimating, predicting and detecting abnormal behavior in systems that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns.
  • this system will be referred to as a "monitoring system" for descriptive convenience.
  • this system may perform additional functions, such as predicting abnormal system behavior, activating alerts and alarms based on detected or predicted system problems, implementing response actions to detected system problems, accumulating descriptive statistics, implementing user-defined system tuning, and so forth.
  • monitoring system may be used for any of a wide range of systems, such as industrial processes, financial systems, electric power systems, aviation control systems, and any other type of system for which one or more input
  • the monitoring system is intended to provide an alarm service for systems, such as computer networks and server systems, exhibiting erratic motions that contain repeating characteristics, such as global trends, cyclical variations, and seasonal variations.
  • the monitoring system learns these behaviors by capturing the signatures of input variables measured for the monitored system and adapting the signatures over time, improving their accuracy and automatically learning changes in behavior as they occur.
  • Because the monitoring system captures the signatures based on an analysis of received data over a time frame, it bypasses the typical modeling stage of the commonly used methods of analysis. That is, the present modeling system captures and continually updates the monitored system's signatures, and uses these captured signatures to model the system rather than relying on deterministic curve fitting or other classical types of data analysis, such as time series analysis and statistical analysis.
  • any of the components shown as part of the monitoring system may be deployed as part of an integrated system or as separate components in separate enclosures.
  • any of the components of the monitoring system may be deployed in a combined enclosure, or they may be deployed in separate enclosures, or they may be combined in any manner suitable to a particular application.
  • each element may be located in a single physical location, or it may be distributed in a distributed computing environment.
  • the input variables may be provided from the monitored system to the monitoring system, and the alarm service output may be delivered from the monitoring system to the monitored system, with the internal computations of the monitoring system being implemented on a remote computer.
  • a single monitoring system computer may be used to support multiple monitoring applications.
  • a stand-alone monitoring system may be deployed at the location of the monitored system.
  • many other configurations may be used to implement the invention.
  • FIG. 1 is a functional block diagram illustrating the structure and operation of a monitoring system 5 configured to implement the present invention.
  • the monitoring system 5 includes an engine system 10, an adaptive baseline model (ABE) 20, an adaptive correlation engine (ACE) 30, and an alarm service 40, which includes a real-time alert detector 42, a forecast alert detector 44, and an alarm generator 46.
  • ABE adaptive baseline model
  • ACE adaptive correlation engine
  • Although these components are typically deployed as separate computer objects, they may be further subdivided or combined with each other or other objects in any manner suitable for a particular application.
  • any of the components may be deployed locally or remotely, and may be combined with other functions and services.
  • the engine system 10 functions as a coordinator for the other system elements and manages the overall system activity during the running operation of the monitoring system 5.
  • the engine system 10 receives a set of measurements defining input variables for each time trial in a continually recurring set of time trials, which is represented by the input variable Y(t).
  • Although the input variables typically include a vector containing multiple measurements, each input variable may be handled in the same manner. Therefore, for descriptive convenience, the methodology of the monitoring system is described for the representative input variable Y(t), which is also referred to as the "subject input variable," to distinguish it from the other input variables. However, it should be understood that this same methodology applies to all of the input variables in a multi-variable vector represented by Y(t).
  • each input of several input variables is treated as the "subject input variable" for its own processing, and all of these input variables are processed for each time trial, typically simultaneously, although they could be processed sequentially in some order.
  • the methodology is described below as applicable to a single "subject input variable" Y(t), and it is to be understood that multiple input variables are typically processed in a similar manner, either simultaneously or in some order.
  • the engine system 10 may also receive additional input variables that are not processed as signatures, such as status indicators and configuration parameters. However, these variables may be ignored for the purpose of describing the inventive aspects of the present monitoring system.
  • the operation of the engine system 10, as described below, is typically repeated for each time trial in a continual time series of time trials.
  • the engine system 10 continually receives measurements for the representative input variable Y(t), to define a time-based signature of measured values for that variable.
  • the ABE 20 maintains a time-based baseline model for the representative input variable Y(t). More specifically, the ABE 20 defines a repeating cycle for the input variable Y(t) and computes a time-based mean and variance for each time period or "time slice" of the cycle. In other words, the ABE 20 defines a time-based baseline model for the input variable Y(t) that includes a time-based baseline mean and variance for each time index in the cycle.
  • the cycle is typically defined by a user input parameter specifying a number of time slices "n" in the cycle. This parameter together with the inherent frequency of the input variable Y(t) defines a repeating cycle, which is typically the same for all of the input variables.
  • the time-based baseline model for the input variable Y(t) is typically composed from a number of components, which the ABE 20 tracks individually.
  • the signature for the input variable Y(t) may typically be decomposed into a global trend component G(t), a cyclical component C(t), a seasonal or scheduled component S(t), and an erratic component e(t), as given by the decomposition equation 904 shown in FIG. 9A.
  • G(t) global trend component
  • C(t) a cyclical component
  • S(t) seasonal or scheduled component
  • e(t) an erratic component
  • FIGS. 9A-E illustrate the decomposition of the representative input variable Y(t) into an illustrative set of components including the global trend component G(t), the cyclical component C(t), the seasonal or scheduled component S(t), and the erratic component e(t).
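Decomposition equation 904 appears only in FIG. 9A and is not reproduced in the text. Assuming an additive form, which is consistent with the summary's statement that the component means are summed to obtain the baseline mean, the decomposition and the resulting erratic component can be written as:

```latex
Y(t) = G(t) + C(t) + S(t) + e(t)
\quad\Longrightarrow\quad
e(t) = Y(t) - \bigl[\, G(t) + C(t) + S(t) \,\bigr]
```

Under this reading, the erratic component is whatever remains of the measurement after the modeled baseline components are removed.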
  • FIG. 9A is a time graph 902 illustrating a typical signature for the representative input variable Y(t), which is shown as a continuous function that may be sampled to obtain a time series for the input variable Y(t).
  • Although the signature 902 for the input variable Y(t) is shown as a continuous function, it should be understood that the signature for input variable Y(t) may be received as a time series of contiguous discrete points corresponding to the measurements received by the engine system 10 for the input variable Y(t) for each in a continual series of time periods or time slices. In this case, sampling is not required.
  • FIG. 9B is a graph illustrating the global trend component 906, which is referred to as G(t).
  • the global trend component G(t) is typically captured by computing a temporally contiguous running mean and variance for the input variable Y(t). That is, referring to FIG. 9A, the global trend component G(t) may be captured by computing a running mean and variance "horizontally" for a series of temporally contiguous data points forming the signature 902 of the input variable Y(t). Horizontal computation refers generally to including in the computation a number of temporally contiguous data points across the graph 902 shown in FIG. 9A horizontally from left to right to obtain the running mean and variance for the global trend component 906, which is shown in FIG. 9B.
  • running mean and variance generally refer to a weighted mean and variance that is continually updated as new data values are received for new time trials.
  • the values forming the time series defining the signature for the representative input variable Y(t) are typically weighted to give accentuated significance to more recent data.
  • a decreasing series of learning blocks may be used to weight the data points of the time series Y(t) when computing the running mean and variance.
  • the ABE 20 may represent all of the historical data for the input variable Y(t) as a single mean and variance pair, or as several mean and variance pairs representing different historical blocks, which may then be weighted and factored in with the newly received input data value for each current time trial in the continuing time series.
  • the running mean and variance for the global trend component G(t) may be represented by the following symbols:
  • Updated global trend mean: $\mu_G(t) = w_G \, \mu_G(t-1) + (1 - w_G) \, Y(t)$
  • Updated global trend variance: $\sigma_G^2(t) = w_G \, \sigma_G^2(t-1) + (1 - w_G) \, [Y(t) - \mu_G(t)]^2$
  • the historical weighting parameter w G is typically a user-specified parameter, which may be adjusted to "tune" the baseline model to adapt at a desired rate of change in the input variable Y(t) occurring over time.
  • the running mean and variance computation shown above may be considered as the simplest approach with a single learning block extending over all input data. That is, the historical data is represented by a single block having a running mean and variance.
  • the learning function may be more complicated, for example by including multiple historical points or blocks, each having a separate learning weight. Nevertheless, the relatively simple learning function shown above may be suitable for many applications. An alternative decomposition and learning function for the global trend component is given later in the description.
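  • As a minimal sketch of the single-block learning function described above (not the patented implementation), the running mean and variance update can be written in Python. The function name `update_running_stats`, the starting values, and the sample data are illustrative assumptions; `w` plays the role of the historical weighting parameter $w_G$.

```python
def update_running_stats(mean_prev, var_prev, y, w):
    """Blend the previous running mean/variance with one new sample y.

    w is the historical weight (0 <= w <= 1): the share given to history.
    """
    # Updated mean: weighted blend of the old mean and the new value.
    mean = w * mean_prev + (1.0 - w) * y
    # Updated variance: weighted blend of the old variance and the squared
    # deviation of the new value from the *updated* mean.
    var = w * var_prev + (1.0 - w) * (y - mean) ** 2
    return mean, var


# Illustrative run over a short, made-up series (values are assumptions).
mean, var = 10.0, 0.0
for y in [10.0, 12.0, 11.0, 30.0]:
    mean, var = update_running_stats(mean, var, y, w=0.9)
```

  • A value of `w` close to 1 makes the baseline adapt slowly, while a smaller value gives more significance to recent data.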
  • FIG. 9C is a graph illustrating the cyclical component 908, which is referred to as C(t).
  • the cyclical component C(t) is typically captured by dividing the temporally contiguous signature Y(t) into a number of repeating cycles that each include "n" time periods or time slices, as shown generally in FIG. 10.
  • each cycle will have the same set of time indices that refer to the same recurring time periods.
  • the time cycle may represent a year with "n" equal to 8,760, producing one time slice for each hour of the year.
  • Consecutive cycles may then be thought of as being arranged in a vertical stack, as shown in FIG. 11, such that the time slices align vertically for multiple cycles.
  • the cyclical component C(t) is typically captured by computing a running mean and variance for each time slice over multiple cycles.
  • the cyclical component C(t) is typically computed by computing the running mean and variance of similar time slices over multiple cycles for the input variable Y(t).
  • This computation, which is depicted graphically in FIG. 11, is referred to generally as the "time-slice mean and variance."
  • the cyclical component C(t) shown in FIG. 9C may be captured by "vertically" summing the measurements from the same time slice over multiple cycles of the input variable Y(t), as shown in FIG. 11. This process is repeated for each time period in the cycle to compute the baseline signature of the component C(t).
  • time-slice values forming time series Y(t) are typically weighted to give accentuated significance to more recent data in the computation of the cyclical component.
  • a decreasing series of learning blocks may be used to weight the time-slice values from successive cycles when computing the running mean and variance for the cyclical component C(t).
  • the ABE 20 may represent the historical data for each time slice as a single mean and variance pair, or as several mean and variance pairs representing different historical blocks, which may then be weighted and factored in with the newly received input data value for each current time trial in the continuing time series.
  • the mean and variance for the cyclical component C(t) may be represented by the following symbols:
  • Cyclical mean as of the same time index in the previous cycle: $\mu_C(t-n)$
  • Updated cyclical variance: $\sigma_C^2(t) = w_C \, \sigma_C^2(t-n) + (1 - w_C) \, [Y_C(t) - \mu_C(t)]^2$
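  • The "vertical" time-slice computation can be sketched as one running mean and variance per time slice, indexed by the position of the time trial within the cycle. This is an illustrative sketch only: the class name `TimeSliceStats`, the zero initialization, and the zero-based counter `t` are assumptions, and the caller is assumed to supply the adjusted measurement $Y_C(t)$ as `y_c`.

```python
class TimeSliceStats:
    """One running mean/variance pair per time slice of a repeating cycle."""

    def __init__(self, n, w):
        self.n = n              # number of time slices per cycle
        self.w = w              # historical weight for the cyclical component
        self.mean = [0.0] * n   # assumed starting values
        self.var = [0.0] * n

    def update(self, t, y_c):
        # The same slice index recurs every n time trials, so slot k holds
        # the statistics "as of the same time index in the previous cycle".
        k = t % self.n
        mean = self.w * self.mean[k] + (1.0 - self.w) * y_c
        var = self.w * self.var[k] + (1.0 - self.w) * (y_c - mean) ** 2
        self.mean[k], self.var[k] = mean, var
        return mean, var
```

  • With `n` equal to 8,760 and hourly samples, each slot would track one hour of the year across successive years.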
  • FIG. 9D is a graph illustrating the seasonal or scheduled component 910, which is referred to as S(t).
  • the seasonal or schedule component S(t) may be captured through a variety of analyses applied to the historical data for the input variable Y(t), or it may be established by a user input reflecting a known event, such as scheduled maintenance or another scheduled activity that will have a predictable effect on the observed system behavior.
  • the seasonal or schedule component S(t) represents known or predictable system events, which should be removed from the erratic component e(t) to avoid masking abnormal system behavior with this component of the representative input variable Y(t).
  • the seasonal or scheduled component S(t) should also be removed from the input variable Y(t) to obtain the unpredictable erratic component e(t).
  • When the seasonal or scheduled component S(t) is a known deterministic parameter, it may be subtracted directly from Y(t) when computing the erratic component e(t).
  • When the seasonal or scheduled component S(t) is captured from measured data or modeled as a probabilistic function, it may be treated as a probabilistic component of the input variable Y(t) represented by a running mean and variance just like the global trend component G(t) and the cyclical component C(t). Therefore, the derivation of the seasonal or scheduled component S(t) will not be further developed, except to note that it may be included as a probabilistic component of the input variable Y(t) along with the captured global trend component G(t) and the captured cyclical component C(t).
  • the following parameters are defined for the seasonal or scheduled component S(t) for this purpose:
  • the input variable Y(t) may be decomposed into any number of additional components corresponding to observable or scheduled events, which can each be factored into the time-based baseline model maintained by the ABE 20 in a similar manner.
  • instead of adjusting the input measurement value on the cyclical component, $Y_C(t)$, to obtain a zero-mean cyclical component C(t), an adjusted input measurement $Y_G(t)$ could have been defined in a similar manner to $Y_C(t)$ and used for the global trend signature computation.
  • the global trend could be viewed as a mean correction factor, and should normally be around zero (zero mean) except when unusual events happen and the global trends start shifting.
  • any adjusted input measurement should take into account all the other components.
  • FIG. 9E is a graph illustrating the erratic component 912, which is referred to as e(t).
  • the input variable Y(t) is decomposed into the global trend component G(t), the cyclical component C(t), the seasonal or scheduled component S(t), and the erratic component e(t) using the decomposition equation 904, which is shown adjacent to FIG. 9A.
  • Y(t) and e(t) may be expressed as follows:
  • $Y(t) = G(t) + C(t) + S(t) + e(t)$
  • $e(t) = Y(t) - G(t) - C(t) - S(t)$
  • the erratic component e(t) is used as the basic process control variable for the predictive and monitoring operations because it is derived from the measured value Y(t) with the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t) removed. Therefore, the erratic component e(t) is selected for further analysis because it reflects the abnormal system behaviors without the masking effects of the other components, which can be relatively large, widely fluctuating, and constantly changing over time. Moreover, the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t) are separately modeled and tracked to detect any model changes that occur in these components over time.
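  • the time-based baseline mean and variance, $\mu(t)$ and $\sigma^2(t)$, are obtained by summing the component means and variances:
  • Time-based baseline mean: $\mu(t) = \mu_G(t) + \mu_C(t) + \mu_S(t)$
  • Time-based baseline variance: $\sigma^2(t) = \sigma_G^2(t) + \sigma_C^2(t) + \sigma_S^2(t)$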
  • the ABE 20 maintains a time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for each input variable, as represented by the input variable Y(t), for each time index in the time cycle. Further, the ABE 20 updates the time-based baseline mean and variance by updating and summing the components as shown above, for each time trial. In this manner, the baseline model maintained by the ABE 20 automatically tracks changes in the baseline components over time. Accordingly, the ABE 20 is operative to return the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for each input variable for any given time index (t). In addition, the ABE 20 is operative to update the baseline model using the input data received for any given time trial.
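  • A short sketch of how the component statistics combine into the time-based baseline, and how the erratic component follows from it, is given here. The function names and the numeric values are illustrative assumptions; each component is assumed to be represented by a (mean, variance) pair, and the erratic component is taken as the measurement minus the summed component means.

```python
def time_based_baseline(components):
    """Sum (mean, variance) pairs for the global, cyclical and seasonal parts."""
    mean = sum(m for m, _ in components)
    var = sum(v for _, v in components)
    return mean, var


def erratic_component(y, baseline_mean):
    """Residual left after the predictable baseline is removed from Y(t)."""
    return y - baseline_mean


# Illustrative component statistics for one time index (made-up values).
g = (100.0, 4.0)    # global trend mean and variance
c = (5.0, 1.0)      # cyclical mean and variance
s = (-2.0, 0.25)    # seasonal or scheduled mean and variance
mu, sigma2 = time_based_baseline([g, c, s])
e = erratic_component(110.0, mu)
```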
  • the engine system 10 interacts with the ABE 20 by invoking the ABE for a particular current or future time trial, as desired. To do so, the engine system 10 only needs to specify the desired time index, and the ABE 20 returns the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for that time index. In particular, for each current time trial (t), the engine system 10 typically invokes the ABE 20 to obtain the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for that time trial. The monitoring system 5 then uses the time-based baseline mean and variance for the current time trial in a monitoring process, which is also referred to as an imputing process for the current time trial.
  • the engine system 10 typically invokes the ABE 20 for a number of future time trials, represented by the time index (t+T), to obtain the time-based baseline means and variances [$\mu(t+T)$ and $\sigma^2(t+T)$] for those future time periods.
  • the monitoring system 5 uses these time-based baseline means and variances for the future time trials in a forecasting process, which is also referred to as a predicting process.
  • the engine system 10 receives the real-time input variable Y(t) for the current time trial (t) and invokes the ABE 20 by sending the time index for the current time trial to the ABE.
  • the ABE 20 then computes the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] from its components as shown below:
  • Time-based baseline mean: $\mu(t) = \mu_G(t) + \mu_C(t) + \mu_S(t)$
  • Time-based baseline variance: $\sigma^2(t) = \sigma_G^2(t) + \sigma_C^2(t) + \sigma_S^2(t)$
  • the ABE 20 then returns the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for the current time trial (t) to the engine system 10, which computes the erratic component e(t) for the input variable Y(t) for the current time trial (t) as shown below:
  • Erratic component: $e(t) = Y(t) - \mu(t)$
  • Note that the time-based baseline mean and variance are returned to the engine system prior to updating the baseline with the input measurement from the current time trial. This ensures that the current erratic component remains independent of the learned history and that the baseline is unbiased when the estimation and alert detection are performed as described below.
  • the engine system 10 then invokes the ACE 30 by sending the erratic component e(t) to the ACE.
  • the ACE 30 is a multivariate correlation engine described in the following commonly-owned patents and patent applications, which are hereby incorporated by reference: U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
  • the ACE computes an imputed estimate for the erratic component based on the values received for the other input variables for the current time trial and learned parameters representing observed relationships between the other variables and the subject input variable Y(t).
  • the monitoring or imputing step is based on real-time data for the current time trial for all of the input variables other than the subject input variable Y(t), and the covariance parameters represented by the learned parameters in the ACE, which represent observed relationships between the other variables and the subject input variable Y(t).
  • the ACE 30 then returns the imputed estimate and the standard error for the imputed estimate to the engine system 10. These parameters are referred to as shown below:
  • Imputed estimate: $e_I(t)$
  • the engine system 10 then invokes the real-time alert detector 42 by sending it the erratic component e(t), the imputed estimate $e_I(t)$, and the standard error $\sigma_I(t)$ for the imputed estimate.
  • the real-time alert detector 42 then computes a threshold value for the imputed estimate based on a confidence value, in this instance the standard error $\sigma_I(t)$.
  • other confidence values may be used, such as those based on the variance, standard deviation, or other statistics associated with the baseline model, the imputed estimate, or other parameters.
  • the threshold value may be computed as a weighting factor multiplied by the selected confidence value, where the weighting factor is a user-defined parameter that allows the user to tune this aspect of the monitoring system.
  • an alert status for the imputed estimate is typically determined by comparing the magnitude of the difference between the measured erratic component e(t) and the imputed estimate $e_I(t)$ for the erratic component to the threshold value, as shown below:
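  • The threshold comparison described above can be sketched as follows. This is a hedged illustration rather than the patent's exact test: the function name `imputed_estimate_alert` and the parameter name `k` (the user-defined weighting factor) are assumptions, and the standard error is used as the confidence value.

```python
def imputed_estimate_alert(e, e_imputed, std_err, k):
    """Return True when the measured erratic component departs from the
    imputed estimate by more than k times the confidence value."""
    # Threshold = user-defined weighting factor * selected confidence value.
    threshold = k * std_err
    return abs(e - e_imputed) > threshold


# Illustrative values (assumptions): a deviation of 4.0 against a
# threshold of 2.0 * 1.5 = 3.0 indicates an alert status.
alert = imputed_estimate_alert(e=5.0, e_imputed=1.0, std_err=1.5, k=2.0)
```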
  • the preceding description applies to the operation of the monitoring system 5 in the monitoring or imputing phase for the current time trial, which involves the computation of the imputed estimate for the current time trial, and the determination of imputed estimate alerts based on a comparison of the imputed estimate and the corresponding erratic component e(t), which is derived from the received measurement for the subject input variable Y(t) for the current time trial.
  • the imputing phase is only the first half of the operation.
  • the monitoring system 5 also updates the ABE 20 and the ACE 30 for the data received during the current time trial, and then computes forecast estimates and associated forecast alerts for one or more future time trials.
  • These future time trials are referred to by the time index (t+T), and it should be understood that the monitoring system 5 typically performs the forecasting process described below for each of several future time trials (e.g., every time trial in the cycle, every time trial for multiple cycles into the future, or another desired set of future time trials). Because the forecasting process is identical for each future time trial except for the time index, the forecasting process is described only for the representative future time index (t+T). In sum, it should be appreciated that the entire monitoring process is repeated for each of several input variables, for the current time trial and for each of many future time trials, as the system runs over time.
  • the engine system 10 invokes the ABE 20 and supplies it with the real-time input value Y(t) for the current time trial.
  • the ABE 20 updates the time-based mean and variance as described previously in accordance with the equation shown below for the global trend component G(t), cyclical component C(t), and seasonal or scheduled component S(t). Additional components may be added as appropriate for a particular application.
  • Updated global trend mean: $\mu_G(t) = w_G \, \mu_G(t-1) + (1 - w_G) \, Y(t)$
  • Updated global trend variance: $\sigma_G^2(t) = w_G \, \sigma_G^2(t-1) + (1 - w_G) \, [Y(t) - \mu_G(t)]^2$
  • Updated cyclical mean: $\mu_C(t) = w_C \, \mu_C(t-n) + (1 - w_C) \, Y_C(t)$
  • Updated cyclical variance: $\sigma_C^2(t) = w_C \, \sigma_C^2(t-n) + (1 - w_C) \, [Y_C(t) - \mu_C(t)]^2$
  • Updated seasonal or scheduled mean: $\mu_S(t) = w_S \, \mu_S(t-m) + (1 - w_S) \, Y_S(t)$
  • Updated time-based baseline mean: $\mu(t) = \mu_G(t) + \mu_C(t) + \mu_S(t)$
  • Updated time-based baseline variance: $\sigma^2(t) = \sigma_G^2(t) + \sigma_C^2(t) + \sigma_S^2(t)$
  • n corresponds to the number of time slices per cycle and m corresponds to the number of time slices since the last similar seasonal or scheduled event.
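  • The query-then-update ordering of the baseline can be sketched with a small class covering only the global trend and cyclical components. Everything named here is hypothetical: the class `AdaptiveBaselineSketch`, the helper `process_trial`, the zero starting values, and in particular the assumption that the adjusted measurement is $Y_C(t) = Y(t) - \mu_G(t)$, since the definition of $Y_C(t)$ is not reproduced in this section. The seasonal component is omitted for brevity.

```python
class AdaptiveBaselineSketch:
    """Toy baseline with a global trend and an n-slice cyclical component."""

    def __init__(self, n, w_g, w_c):
        self.n, self.w_g, self.w_c = n, w_g, w_c
        self.mu_g, self.var_g = 0.0, 0.0   # assumed starting values
        self.mu_c = [0.0] * n
        self.var_c = [0.0] * n

    def baseline(self, t):
        # Works for the current index t or a future index t + T, because
        # the cyclical slot is looked up modulo the cycle length.
        k = t % self.n
        return self.mu_g + self.mu_c[k], self.var_g + self.var_c[k]

    def update(self, t, y):
        k = t % self.n
        self.mu_g = self.w_g * self.mu_g + (1 - self.w_g) * y
        self.var_g = self.w_g * self.var_g + (1 - self.w_g) * (y - self.mu_g) ** 2
        y_c = y - self.mu_g  # ASSUMPTION: adjusted measurement for C(t)
        mu_c = self.w_c * self.mu_c[k] + (1 - self.w_c) * y_c
        self.var_c[k] = self.w_c * self.var_c[k] + (1 - self.w_c) * (y_c - mu_c) ** 2
        self.mu_c[k] = mu_c


def process_trial(abe, t, y):
    """Read the baseline first, form the erratic component, then learn."""
    mean, var = abe.baseline(t)
    e = y - mean
    abe.update(t, y)
    return e, mean, var
```

  • Reading the baseline before calling `update` keeps the erratic component for the current time trial independent of the measurement that is about to be learned.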
  • the ACE 30 automatically updates its learned parameters in the manner described in U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
  • the engine system 10 interacts with the ABE 20 to perform forecasting operations by invoking the ABE for a particular representative future time trial (t+T). To do so, the engine system 10 only needs to specify the future time index (t+T), and the ABE 20 returns the expected time-based baseline mean and variance [$\mu(t+T)$ and $\sigma^2(t+T)$] for that time index. In particular, for each time trial the engine system 10 typically invokes the ABE 20 to obtain the time-based baseline mean and variance [$\mu(t+T)$ and $\sigma^2(t+T)$]. The monitoring system then uses the time-based baseline mean and variance for the selected future time trial in the forecast process, which is also referred to as a predicting process.
  • the engine system 10 invokes the ABE 20 by sending the time index (t+T) to the ABE.
  • the ABE 20 then computes the time-based baseline mean and variance [$\mu(t+T)$ and $\sigma^2(t+T)$] from its components as shown below:
  • Time-based baseline mean: $\mu(t+T) = \mu_G(t+T) + \mu_C(t+T) + \mu_S(t+T)$
  • Time-based baseline variance: $\sigma^2(t+T) = \sigma_G^2(t+T) + \sigma_C^2(t+T) + \sigma_S^2(t+T)$
  • the engine system 10 then invokes the ACE 30 for the selected future time trial.
  • the ACE 30 then returns the forecast estimate and the standard error for the forecast estimate to the engine system 10.
  • Forecast estimate: $e_F(t+T)$
  • the engine system 10 then invokes the forecast alert detector 44 by sending it the forecast estimate $e_F(t+T)$, the standard error for the forecast estimate $\sigma_F(t+T)$, and other values that may be used as or to compute a confidence value, such as the time-based baseline standard deviation $\sigma(t+T)$ (which is the square root of the time-based baseline variance $\sigma^2(t+T)$ returned by the ABE 20).
  • the forecast alert detector 44 then computes a threshold value for the forecast estimate based on a confidence value, in this particular example the time-based baseline standard deviation $\sigma(t+T)$.
  • the threshold value may be a weighting factor multiplied by the confidence value, where the weighting factor is a user-defined parameter that allows the user to tune this aspect of the monitoring system.
  • an alert status for the forecast estimate is typically determined by comparing the magnitude of the forecast estimate to the threshold value, as shown below:
  • a value of two (2) has been found to be suitable for the user-defined weighting parameter $k_2$.
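  • The forecast threshold test described above can be sketched as a comparison of the magnitude of the forecast estimate against $k_2$ times the time-based baseline standard deviation. The function name `forecast_estimate_alert` and the argument names are illustrative assumptions; the default of 2 for `k2` follows the value reported as suitable in the text.

```python
import math


def forecast_estimate_alert(e_forecast, baseline_var, k2=2.0):
    """Return True when |forecast estimate| exceeds k2 * baseline std dev."""
    # The ABE returns a variance, so the standard deviation is its square root.
    threshold = k2 * math.sqrt(baseline_var)
    return abs(e_forecast) > threshold


# Illustrative values (assumptions): variance 9.0 gives a threshold of 6.0.
alert = forecast_estimate_alert(e_forecast=7.0, baseline_var=9.0)
```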
  • the forecast alert detector 44 then informs the alarm generator 46 of any forecast estimate alerts; the alarm generator 46 tracks and scores the alerts and generates one or more alarms.
  • the forecast alert detector 44 could apply other suitable threshold tests instead of or in addition to the test described above.
  • a forecast estimate threshold test may be based on the standard error for the forecast estimate $\sigma_F(t+T)$, the forecast estimate $e_F(t+T)$, and the expected baseline mean $\mu(t+T)$.
  • the monitoring system 5 receives the input value Y(t) for the current time trial, and invokes the ABE 20 for the current time trial, which returns the time-based baseline mean and variance as of the previous time trial [$\mu(t-1)$ and $\sigma^2(t-1)$].
  • Let k denote the current time index within the cycle.
  • the baseline mean and variances for the current time trial are then given as:
  • the global trend component G(t) is then updated as follows:
  • $\mu_G(t) = [\, w_G(t) \, G(t) + \mu_G(t-1) \,] / (1 + w_G(t))$
  • $\sigma_G^2(t) = w_G(t) \, d^2 + \sigma_G^2(t-1) / (1 + w_G(t))$
  • $\mu_C(k) = [\, w_C(t) \, C(t) + \mu_G(t-n) \,] / (1 + w_G(t))$
  • $\sigma_C^2(k) = \sigma_G^2(t) / 2$
  • $w_G(t) = 1 / (N^* + 1 \bmod N^*)$; where $N^*$ may or may not equal N,
  • FIG. 2 is a logic flow diagram illustrating a routine 200 for setting up or provisioning the adaptive baseline engine (ABE) 20 component of the monitoring system 5 in advance of running the system.
  • routine 200 will also refer to FIGS. 9A-E, 10 and 11 for clarity.
  • the ABE 20 receives historical data for a representative input variable Y(t), as shown in FIG. 9A.
  • the ABE 20 then processes the historical data to establish the time-based baseline mean and variance [$\mu(t)$ and $\sigma^2(t)$] for that variable.
  • the use of historical data is optional, and the ABE 20 may alternatively begin the process without historical data.
  • the ABE 20 uses a temporally contiguous running mean (i.e., the mean and variance for the cyclical component [$\mu_C(t)$ and $\sigma_C^2(t)$] computed horizontally for the first cycle). Thereafter, the cyclical component C(t) is computed as a time-slice mean and variance, as shown in FIG. 11.
  • Step 202 is followed by step 204, in which the ABE 20 defines a baseline cycle, such as a week, month or year.
  • the cycle is depicted as a calendar week in FIG. 11.
  • step 204 is followed by step 206, in which the ABE 20 receives a user input parameter "n" that defines the number of time slices in the cycle, as shown in FIG. 10.
  • the ABE 20 samples the input data to obtain a time series of "n" data points for each cycle, which is also shown in FIG. 10.
  • the user input parameter "n" sets the cycle length by defining the number of data points included in each cycle.
  • step 206 is followed by step 208, in which the ABE 20 effectively divides the historical data into a number of repeating cycles, each having "n" time slices that are numbered consecutively to establish a time index (i.e., time slices 1 through n) that repeats for each cycle.
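  • Dividing the historical data into repeating cycles with a shared time index can be sketched as follows. The function names, the zero-based sample counter `t`, and the choice to drop a trailing partial cycle are illustrative assumptions, not details taken from the description.

```python
def split_into_cycles(history, n):
    """Cut a time series into consecutive cycles of n time slices each.

    A trailing partial cycle is dropped (an assumption made for simplicity).
    """
    full_cycles = len(history) // n
    return [history[i * n:(i + 1) * n] for i in range(full_cycles)]


def time_slice_index(t, n):
    """Map a zero-based sample counter t to a slice number in 1..n."""
    return t % n + 1


# Illustrative data: seven samples with n = 3 yield two full cycles.
cycles = split_into_cycles([1, 2, 3, 4, 5, 6, 7], n=3)
# Reading the same slice "vertically" across the stacked cycles.
first_slice_values = [cycle[0] for cycle in cycles]
```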
  • step 208 is followed by step 210, in which the ABE 20 computes the global trend component G(t) of the input variable Y(t) by computing a temporally contiguous running mean and variance for the input variable Y(t), as depicted in FIG. 9B.
  • the global trend component G(t) is captured by computing a weighted mean and variance [$\mu_G(t)$, $\sigma_G^2(t)$] horizontally along the graph of Y(t) shown in FIG. 9A to obtain the global trend component G(t), which is shown in FIG. 9B.
  • Step 210 is followed by step 212, in which the ABE 20 computes the cyclical component C(t) of the input variable Y(t) by computing the time-slice running mean and variance as shown in FIG. 11. That is, the cyclical component C(t) is captured by computing a weighted mean and variance [$\mu_C(t)$, $\sigma_C^2(t)$] vertically for the same time slice over multiple cycles, for each time increment in the cycle, as shown in FIG. 11. The resulting cyclical component C(t) is shown in FIG. 9C.
  • the ABE 20 may then capture other components of the input variable Y(t), such as a mean and variance [$\mu_S(t)$, $\sigma_S^2(t)$] for the seasonal or scheduled component S(t) shown in FIG.
  • the objective of the provisioning routine 200 is to decompose the input variable Y(t) into a number of components representing relatively predictable behaviors so that the erratic component e(t) may be isolated for further processing, as shown in FIG. 9E.
  • the decomposition process allows the relatively predictable components of the time-based baseline for the input variable Y(t) to be captured and modeled individually.
  • the baseline includes the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t).
  • Step 212 is followed by step 214, in which the ABE 20 is ready for the run routine 300 shown on FIG. 3.
  • FIG. 3 is a logic flow diagram illustrating a routine 300 for running the ABE 20.
  • the ABE 20 receives a time index (t) for the current time trial from the engine system 10.
  • Step 302 is followed by step 304, in which the ABE 20 computes the time-based baseline mean for the current time trial (t), typically by summing the component means as shown below:
  • Time-based baseline mean: $\mu(t) = \mu_G(t) + \mu_C(t) + \mu_S(t)$
  • Step 304 is followed by step 306, in which the ABE 20 computes the time-based baseline variance for the current time trial (t), typically by summing the component variances as shown below:
  • Time-based baseline variance: $\sigma^2(t) = \sigma_G^2(t) + \sigma_C^2(t) + \sigma_S^2(t)$
  • Step 306 is followed by step 308, in which the ABE 20 returns the time-based baseline mean and variance [$\mu(t)$, $\sigma^2(t)$] for the current time trial to the engine system 10.
  • Step 308 is followed by step 310, in which the ABE 20 receives the input value Y(t) for the current time trial from the engine system 10.
  • Step 310 is followed by step 312, in which the ABE 20 updates the baseline mean $\mu(t)$, typically by updating the component means as shown below:
  • Updated global trend mean: $\mu_G(t) = w_G \, \mu_G(t-1) + (1 - w_G) \, Y(t)$
  • Updated seasonal or scheduled mean: $\mu_S(t)$
  • Updated time-based baseline mean: $\mu(t) = \mu_G(t) + \mu_C(t) + \mu_S(t)$
  • Step 312 is followed by step 314, in which the ABE 20 updates the baseline variance $\sigma^2(t)$, typically by updating the component variances as shown below:
  • Updated global trend variance: $\sigma_G^2(t) = w_G \, \sigma_G^2(t-1) + (1 - w_G) \, [Y(t) - \mu_G(t)]^2$
  • Updated cyclical variance: $\sigma_C^2(t) = w_C \, \sigma_C^2(t-n) + (1 - w_C) \, [Y(t) - \mu_C(t-n) - \mu_G(t)]^2$
  • Updated time-based baseline variance: $\sigma^2(t) = \sigma_G^2(t) + \sigma_C^2(t) + \sigma_S^2(t)$
  • step 314 is followed by step 316, in which the ABE 20 receives a time index (t+T) for a future time trial from the engine system 10.
  • step 316 is followed by step 318, in which the ABE 20 computes the time-based baseline mean for the future time trial (t+T), typically by summing the component means as shown below:
  • Step 318 is followed by step 320, in which the ABE 20 computes the time-based baseline variance for the future time trial (t+T), typically by summing the component variances as shown below:
  • Time-based baseline variance: $\sigma^2(t+T) = \sigma_G^2(t+T) + \sigma_C^2(t+T) + \sigma_S^2(t+T)$
  • Step 320 is followed by step 322, in which the ABE 20 returns the baseline mean and variance for the future time trial [$\mu(t+T)$, $\sigma^2(t+T)$] to the engine system 10.
  • Step 322 is followed by step 324, in which the ABE 20 determines whether the engine system 10 has specified another future time trial. If the engine system 10 has specified another future time trial, the "YES" branch is followed to step 316, and the ABE 20 computes and returns the baseline mean and variance [$\mu(t+T)$, $\sigma^2(t+T)$] for the additional future time trial. If the engine system 10 has not specified another future time trial, the "NO" branch is followed to step 303, in which the ABE 20 waits to receive another time index from the engine system 10 for the next current time trial.
  • the routine 300 described above is typically performed simultaneously or sequentially for each of several input variables, and for each in a continual series of current time trials.
  • FIG. 4 is a logic flow diagram illustrating a routine 400 for running the engine system 10 of the monitoring system 5.
  • the engine system 10 receives a measurement for the representative input variable Y(t) for the current time trial (t).
  • step 402 is followed by step 404, in which the engine system 10 invokes the ABE 20 by sending it the time index for the current time trial (t).
  • step 404 is followed by step 406, in which the engine system 10 receives the time-based baseline mean and variance [$\mu(t)$, $\sigma^2(t)$] for the current time trial (t) from the ABE 20.
  • step 406 is followed by step 408, in which the engine system 10 computes the erratic component e(t) for the current time trial as shown below:
  • Step 408 is followed by step 410, in which the engine system 10 invokes the ACE 30 by sending it the erratic component e(t) for the current time trial.
  • step 412 in which the engine system 10 invokes the
  • Step 412 is followed by step 414, in which the engine system 10 receives the time-based baseline mean and variance [$\mu(t+T)$, $\sigma^2(t+T)$] for the future time trial (t+T) from the ABE 20.
  • Step 416 is followed by step 418, in which the engine system 10 invokes the alarm service 20 by sending it one or more of the following parameters: the erratic component e(t), the baseline mean $\mu(t)$, the baseline variance $\sigma^2(t)$, the imputed estimate $e_I(t)$, the standard error for the imputed estimate $\sigma_I(t)$, the forecast estimate $e_F(t+T)$, and the standard error for the forecast estimate $\sigma_F(t+T)$.
  • Step 418 is followed by step 420, in which the engine system 10 waits for the next time trial, at which time it loops back to step 402 and repeats routine 400 for the next time trial. It should also be understood that for each time trial, routine 400 is repeated for each of several desired input variables, as represented by the input variable Y(t), and that steps 414 through 418 are typically repeated for each of several future time trials, as desired.
  • FIG. 5 is a logic flow diagram illustrating a routine 500 for running the ACE 30.
  • the ACE 30 receives the erratic components e(t) for all of the applicable input variables, as represented by the input variable Y(t), for the current time trial (t).
  • Step 502 is followed by step 504, in which the ACE 30 computes an imputed estimate $e_I(t)$ for each input variable.
  • the imputed estimate $e_I(t)$ for the subject input variable Y(t) is based on the data received for the current time trial for all of the other input variables and the learned parameters in the ACE 30, which represent observed relationships between the erratic component e(t) for the subject input variable and the erratic components for the other input variables.
  • This allows the imputed estimate $e_I(t)$ to reflect the data received for the current time trial and covariance relationships based on historical time trials represented by the learned parameters in the ACE 30.
  • Step 504 is followed by step 506, in which the ACE 30 invokes the real-time alert detector 42 and sends it the erratic component e(t) and the imputed estimate $e_I(t)$.
  • Step 506 is followed by step 508, in which the ACE 30 updates its learned parameters using the data received for the input variables for the current time trial.
  • This updating process is described in the following commonly-owned patents and patent applications: U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
  • Step 508 is followed by step 510, in which the ACE 30 computes the forecast estimate $e_F(t+T)$. Note that the learning is performed by the ACE 30 after the imputing step 504 and before the forecasting step 510.
  • Step 512 is followed by step 514, in which the ACE 30 returns the forecast estimate $e_F(t+T)$ and the standard error for the forecast estimate $\sigma_F(t+T)$ to the engine system 10, which in turn invokes the alert detector 42 by sending it one or more of these parameters and/or the baseline variance $\sigma^2(t+T)$.
  • the ACE 30 waits for the next time trial. Again, it should also be understood that for each time trial, routine 500 is repeated for each of several desired input variables, as represented by the input variable Y(t), and that steps 510 through 512 are typically repeated for each of several future time trials, as desired.
  • FIG. 6 is a logic flow diagram illustrating a routine 600 for running the real-time alert detector 42.
  • the real-time alert detector 42 receives the erratic component e(t) for the subject input variable Y(t) from the engine system 10.
  • step 602 is followed by step 604, in which the real-time alert detector 42 receives the imputed estimate $e_I(t)$ for the subject input variable Y(t) from the engine system 10.
  • step 604 is followed by step 606, in which the real-time alert detector 42 performs a threshold alert test, typically by performing the following operation:
  • Step 606 is followed by step 608, in which the real-time alert detector 42 determines whether an alert status is indicated based on the preceding or a similar threshold test. If an alert status is indicated, the "YES" branch is followed to step 610, in which the real-time alert detector 42 informs the alarm generator 46 of the imputed estimate alert. Step 610 is followed by step 612, in which the real-time alert detector 42 waits for the next time trial, at which time it loops to step 602 and repeats routine 600 for the next time trial.
  • routine 600 is typically repeated for each of several desired input variables, as represented by the input variable Y(t).
  • FIG. 7 is a logic flow diagram illustrating a routine 700 for running the forecast alert detector 44.
  • the forecast alert detector 44 receives the baseline variance $\sigma^2(t+T)$ for the subject input variable Y(t) for the future time trial (t+T) from the engine system 10.
  • step 702 is followed by step 704, in which the forecast alert detector 44 receives the forecast estimate $e_F(t+T)$ for the future time trial (t+T).
  • step 704 is followed by step 706, in which the forecast alert detector 44 performs a threshold alert test, typically by performing the following operation:
  • Step 706 is followed by step 708, in which the forecast alert detector 44 determines whether an alert status is indicated based on the preceding or a similar threshold test. If an alert status is indicated, the "YES" branch is followed to step 710, in which the forecast alert detector 44 informs the alarm generator 46 of the forecast estimate alert. Step 710 is followed by step 712, in which the forecast alert detector 44 waits for the next time trial, at which time it loops to step 702 and repeats routine 700 for the next time trial.
  • routine 700 is typically repeated for each of several future time trials, and for each of several desired input variables, as desired.
  • FIG. 8 is a logic flow diagram illustrating a routine 800 for running the alarm generator 46.
  • In step 802, the alarm generator 46 receives real-time alerts from the real-time alert detector 42.
  • Step 802 is followed by step 804, in which the alarm generator 46 receives forecast alerts from the forecast alert detector 44.
  • Step 804 is followed by step 806, in which the alarm generator 46 weights the alerts and computes an alarm score.
  • Many different weighting and scoring methodologies will become apparent to those skilled in the art. In particular, it is desirable to make the specific weights and alert combinations user-defined parameters so that the user may tune this aspect of the monitoring function based on experience with the system.
  • Step 806 is followed by step 808, in which the alarm generator 46 determines whether an alarm status is indicated based on the preceding alert scoring process.
  • In step 810, the alarm generator 46 activates an alarm condition, and may take additional actions, such as restarting a software application, rebooting a server, activating backup systems, rerouting network traffic, dropping nonessential or interruptible activities, transmitting e-mail alarms, and so forth.
  • In step 812, the alarm generator 46 waits for the next time trial, at which time it loops to step 802 and repeats routine 800 for the next time trial.
  • Routine 800 is typically repeated for each of several future time trials, and for each of several desired input variables, as desired.
  • In view of the foregoing, it will be appreciated that the present invention greatly improves upon preexisting methods and systems for modeling, estimating, predicting and detecting abnormal behavior in computer networks that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns. It should be understood that the foregoing relates only to the exemplary embodiments of the present invention, and that numerous changes may be made therein without departing from the spirit and scope of the invention as defined by the following claims.
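The alert tests of routines 600 and 700 and the alert scoring of routine 800 can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not the operations of the specification: the function names, the multiplier `k` (standing in for the user-defined configuration parameter), the use of `k` times the square root of the baseline variance as the forecast threshold, and the simple weighted-sum score are all hypothetical choices made for illustration.

```python
import math

def realtime_alert(erratic, imputed, std_error, k=3.0):
    """Step 606 sketch: flag an alert when the erratic component differs from
    its imputed estimate by more than k standard errors (k is hypothetical)."""
    return abs(erratic - imputed) > k * std_error

def forecast_alert(forecast, baseline_variance, k=3.0):
    """Step 706 sketch: flag an alert when the forecast estimate exceeds a
    threshold derived from the baseline variance (assumed: k * sqrt(variance))."""
    return abs(forecast) > k * math.sqrt(baseline_variance)

def alarm_score(alerts, weights):
    """Step 806 sketch: sum the user-defined weights of the alerts that fired."""
    return sum(weights[name] for name, fired in alerts.items() if fired)

def alarm_indicated(alerts, weights, alarm_threshold):
    """Step 808 sketch: an alarm is indicated when the score reaches the threshold."""
    return alarm_score(alerts, weights) >= alarm_threshold
```

For example, with a real-time alert weighted 1.0 and a forecast alert weighted 0.5, a single real-time alert yields a score of 1.0, which would activate an alarm at a threshold of 1.0 while a lone forecast alert would not. Making the weights and threshold arguments reflects the point above that these values are best left as user-tunable parameters.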

Abstract

A monitoring system including a baseline model that automatically captures and models normal system behavior, a correlation model that employs multivariate autoregression analysis to detect abnormal system behavior, and an alarm service that weights and scores a variety of alerts to determine an alarm status and implement appropriate response actions. The baseline model decomposes the input variables into a number of components representing relatively predictable behaviors so that the erratic component e(t) may be isolated for further processing. These components include a global trend component, a cyclical component, and a seasonal component. Modeling and continually updating these components separately permits a more accurate identification of the erratic component of the input variable, which typically reflects abnormal patterns when they occur.

Description

METHOD AND SYSTEM FOR ANALYZING AND PREDICTING
THE BEHAVIOR OF SYSTEMS
REFERENCE TO RELATED APPLICATIONS
This application claims priority to commonly-owned U.S. Provisional Patent Application Serial No. 60/342,312 filed on December 19, 2001. This application incorporates by reference the disclosures of the following commonly-owned patents and patent applications: U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
TECHNICAL FIELD
This invention relates to computerized monitoring and modeling systems and, more specifically, relates to a method and system for modeling, estimating, predicting, and detecting abnormal behavior in systems, such as complex computer networks, servers, and other computer-based systems.
BACKGROUND OF THE INVENTION
Many systems, and in particular complex computer networks, exhibit operational patterns that include broad trends that evolve gradually, cyclical components that fluctuate widely on a largely predictable basis, seasonal or scheduled events that result in even wider fluctuations that occur according to known or detectable schedules, and other less predictable or erratic components. In particular, abnormal system behavior indicative of system problems, such as those caused by individual component overloads and localized equipment or software failures, tends to exhibit relatively erratic operational patterns that are often smaller than the wide fluctuations that occur in the normal usage patterns. These abnormal operational patterns, which are superimposed on top of normal operational patterns that fluctuate widely and change continuously over time, can be difficult to reliably detect.
For this reason, systems analysts have long been engaged in a continuing challenge to develop increasingly effective ways to reliably detect real abnormal system behavior indicative of system problems while avoiding false alarms based on the normal operational patterns. A fundamental difficulty in this challenge arises from the fact that abnormal system behavior can sometimes be masked by normal operational patterns, which causes real system problems to go undetected. Conversely, normal operational patterns can sometimes be misdiagnosed as abnormal system behavior indicative of system problems, which causes false alarms. To combat this two-sided challenge, systems analysts often attempt to "tune" error detection systems to reliably identify real system problems while avoiding an unacceptable level of false alarms. Consistently acceptable tuning is not always possible because loosening the alarm thresholds tends to increase the occurrence of real problems that go undetected, whereas tightening the alarm thresholds tends to increase the occurrence of false alarms.
In addition, it has been observed that alarm thresholds should be adjustable to conform to changes in the normal system operational pattern. For example, low alarm thresholds may be appropriate during low system usage periods, whereas much larger alarm thresholds may be appropriate for higher system usage periods. Accordingly, systems analysts have attempted to design monitoring systems with alarm thresholds that track the expected normal operational pattern of a monitored system. In particular, historical usage patterns for the monitored system may be analyzed to detect normal usage patterns, and deterministic functions may then be "fit" to the historical pattern to develop a predictive function for the normal operational pattern of the system. The alarm thresholds may then be set based on the predictive estimate of the normal operational pattern of the system.
These types of predictive monitoring systems exhibit two major drawbacks. First, historical data is not always available for the monitored system and, even when it is available, the task of developing a predictive function for the normal operational pattern of the system based on historical usage patterns is technically challenging, expensive, and time consuming. Second, the normal behavior of complex computer networks tends to change over time, which periodically renders the historically-determined predictive functions obsolete. Combating this problem requires periodic updating of the historical analysis, which adds further cost and complexity to the monitoring system. Moreover, unpredictable changes in the behavior of the monitored system can still occur, resulting in systemic failure of the monitoring system.
Still searching for reliable solutions to the system monitoring challenge, analysts have implemented systems that automatically update their predictive functions on an on-going basis. However, these types of systems may encounter problems related to the sensitivity of the updating process. For example, a predictive function that is updated too quickly can misdiagnose a real system problem as a developing change in the normal system behavior, whereas a predictive function that is updated too slowly can produce false alarms based on legitimate changes in the normal system behavior.
Moreover, the opportunity for truly adaptive monitoring systems to take full advantage of observable patterns in the normal behavior of complex computer networks remains largely unmet for a variety of reasons, including the inherent difficulty of the underlying problem, high data rates in the monitored systems, fast and highly fluctuating changes in normal system behavior, high levels of system complexity, and high levels of sophistication required in the monitoring systems themselves.
Therefore, a continuing need exists for more effective methods and systems for modeling, estimating, predicting and detecting abnormal behavior in computer networks that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns.
SUMMARY OF THE INVENTION
The present invention meets the needs described above for modeling, estimating, predicting, and detecting abnormal behavior in systems, such as complex computer networks. Although this system will be referred to as a "monitoring system" for descriptive convenience, it should be understood that it may perform additional functions, such as predicting abnormal system behavior, activating alerts and alarms based on detected or predicted system problems, implementing response actions to detected system problems, accumulating descriptive statistics, implementing user-defined system tuning, and so forth. The monitoring system typically includes a baseline model that automatically captures and models normal system behavior, a correlation model that employs multivariate autoregression analysis to detect abnormal system behavior, and an alarm service that weights and scores a variety of alerts to determine alarm status and appropriate response actions. Individually and in combination, these features represent significant advances in the field of network monitoring and modeling systems that overcome a number of persistent shortcomings in prior monitoring systems.
More specifically, the baseline model automatically captures and updates signatures for multiple input variables representing normal system operation on an ongoing basis. Although historical data may be used to initialize the system, this step is optional, and the system may "bootstrap" itself into a working model without the need for extensive analysis of historical data. More importantly, the monitoring system automatically updates the baseline model for each set of data inputs (i.e., each time trial) to continually refine the model while detecting and adapting to changes in the normal system operation that occur over time. The baseline model also tracks and models each input variable individually, and decomposes the signature for each input variable into components that track different characteristics of the variable. For example, each input variable may be modeled by a global trend component, a cyclical component, and a seasonal component. Modeling and continually updating these components separately permits a more accurate identification of the erratic component of the input variable, which typically reflects abnormal patterns when they occur. This improved modeling accuracy allows the system to implement tighter alarm thresholds while maintaining acceptable levels of false alarms. Further, the historical usage pattern for each input variable is represented by a time-based baseline mean and variance, which are in turn based on combinations of the means and variances for the components making up the baseline. This obviates the need for a large database of historical analysis and periodic reevaluation of that historical data. In addition, the time-based baseline means and variances are updated in a weighted manner that gives accentuated relevance to the measurement received during the most recent time trial, which allows the model to adapt appropriately to changes in the normal operational pattern. 
Moreover, the weighting parameters are typically implemented through user-defined inputs, which allows the user to tune this aspect of the baseline model for each input variable individually.
The correlation model is defined by a "two cycle" use of a connection weight matrix to compute expected values of the erratic components of the input variables for current and future time trials, followed by updating the connection weight matrix to reflect learning from the current time trial. Specifically, for a current trial, the correlation model receives input values, computes expected output values for the current time trial (i.e., imputed estimates), computes expected output values for future time trials (i.e., forecast estimates), and updates the learned parameters expressed in the connection weight matrix. In this two cycle process, monitoring and forecasting using current data and a historically determined connection weight matrix represents the first cycle, and updating the connection weight matrix to reflect learning from the current data represents the second cycle. This two cycle process is repeated for each time trial to implement monitoring, forecasting and learning concurrently as the correlation model operates over a number of time trials.
In the correlation model, the monitoring function involves receiving erratic components for the current time trial and computing imputed estimates for those same erratic components for the current time trial based on the actual data received for the current time trial and learned parameters that reflect observed covariance relationships among the erratic components based on previous time trials. This aspect of the correlation model recognizes and takes advantage of the fact that one input variable may vary instantaneously as a function of other variables. For example, consider an instance in which a network response time is directly affected by other monitored variables, including the quantity of network traffic, the number of users accessing the application, the availability of buffer space, and the cache memory level. Therefore, the expected value of the application response time necessarily varies instantaneously with changes in these other variables. The correlation model properly models this relationship by using real-time data (i.e., measurements for the current time trial) and the learned parameters to compute the imputed estimates for the current time trial. This type of modeling accuracy is not possible with systems that rely on historical data alone to compute the predictive indicators of system performance, and cannot be implemented without the use of a multivariate correlation model.
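The "two cycle" process described above can be illustrated with a minimal Python sketch. It assumes a plain linear model in which the imputed estimate for each variable is a weighted sum of the other variables' erratic components, and it substitutes a simple least-mean-squares (LMS) step for the learning update. The LMS rule, the `rate` parameter, and the function names are stand-ins chosen for illustration and are not the learning method of the specification; forecast estimates are omitted for brevity.

```python
def impute(weights, erratic, i):
    """Imputed estimate for variable i from the erratic components of the
    other variables, using the current connection weights."""
    return sum(weights[i][j] * erratic[j]
               for j in range(len(erratic)) if j != i)

def two_cycle_trial(weights, erratic, rate=0.05):
    """Process one time trial: estimate first, then learn."""
    n = len(erratic)
    # Cycle 1: monitor with the weights learned from previous time trials.
    imputed = [impute(weights, erratic, i) for i in range(n)]
    # Cycle 2: update the weights in place to reflect the current trial
    # (hypothetical LMS step, not the method of the specification).
    for i in range(n):
        error = erratic[i] - imputed[i]
        for j in range(n):
            if j != i:
                weights[i][j] += rate * error * erratic[j]
    return imputed
```

Calling `two_cycle_trial` once per time trial keeps monitoring and learning concurrent: each trial's estimates use only what was learned before that trial, and the weights then absorb the new data.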
The alarm service computes imputed estimate alerts and forecast estimate alerts for each input individually. Typically, these alerts are based on threshold values based on user-defined parameters, which allows the user to tune this aspect of the alarm service for each input variable individually, and for imputed estimate alerts and forecast estimate alerts separately. In addition, the alarm service includes an alarm generator that accumulates and weights the various alerts generated for the various input variables and computes an alarm score based on the overall alert condition. This avoids false alarms from isolated outliers or data errors. These and other advantages of the invention will be readily appreciated by those skilled in the art.
Generally described, the present invention may be implemented as a monitoring system deployed on a host computer system, which may be local or remote, or it may be expressed in computer-executable instructions stored on a computer storage medium. The monitoring system continually receives measurements defining signatures for a plurality of input variables reflecting the behavior of a monitored system, such as a computer network. Each signature includes a time series of measurements including historical measurements for past time trials and a current measurement for a current time trial. The monitoring system also computes a time-based baseline mean and variance for a selected input variable based on the historical measurements for the selected input variable. The monitoring system then computes an erratic component for the selected input variable by comparing the measurement for the selected input value for the current time trial to the time-based baseline mean for the selected input variable. Typically, the monitoring system then computes an imputed estimate, a forecast estimate, or both. In particular, the monitoring system may compute an imputed estimate for the selected input variable based on erratic components computed for other input variables for the current time trial and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables. The monitoring system then determines an alert status for the imputed estimate based on the imputed estimate and the erratic component for the input variable for the current time trial.
Alternatively or additionally, the monitoring system may compute a forecast estimate for the selected input variable based on erratic components computed for other input variables and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables. The monitoring system then determines an alert status for the forecast estimate.
The monitoring system also updates the time-based baseline mean and variance for the selected input variable based on the measurement received for the selected input value for the current time trial. In general, the process described above may be implemented for a single input variable or it may be repeated for multiple input variables. Further, the forecast estimate process is usually implemented for multiple future forecasts, for example to produce short, medium or long range operational forecasts. Of course, the process is also repeated for multiple current time trials as new data arrives, and the monitoring system runs in an operational mode to provide the alarm service while it refines and adapts the baseline model and correlation models.
To provide the alarm service, the monitoring system typically computes a confidence value for the imputed estimate, a threshold value for the imputed estimate based on the confidence value, and an imputed estimate alert value reflecting the difference between the imputed estimate and the erratic component for the selected input variable. The monitoring system may then determine an alert status for the imputed estimate by comparing the alert value to the threshold value for the imputed estimate. Specifically, the confidence value for the imputed estimate may be based on a standard error associated with the imputed estimate, and the threshold value for the imputed estimate may be based on the standard error and a user-defined configuration parameter. Further, the monitoring system may also compute a threshold value for the forecast estimate and determine an alert status for the forecast estimate by comparing the forecast estimate to the threshold value. For this computation, the threshold value may be based on the time-based baseline variance for the selected input variable and a user-defined configuration parameter.
In another aspect of the invention, the monitoring system computes the time-based baseline mean and variance for the selected input variable by decomposing the signature for the input variable into components and computing a mean and variance for each component. The monitoring system then combines the means for the components to obtain the time-based baseline mean, and also combines the variances for the components to obtain the time-based baseline variance. More specifically, the monitoring system typically decomposes the signature for the input variable into components by defining a repeating cycle for the historical measurements.
The monitoring system then divides the cycle into a number of contiguous time periods or "time slices" such that each cycle includes a similar set of time periods, each having corresponding time indices within each cycle.
Typically the monitoring system then computes a global trend component for the selected input variable reflecting measurements received for the selected input variable for temporally contiguous time indices. Also typically, the monitoring system computes a cyclical component for the selected input variable reflecting data accumulated across multiple cycles for each time index. This allows the monitoring system to update the time-based baseline mean by computing an updated mean for each component based on a weighted sum including the baseline mean for the component and the measurement received for the selected input variable for the current time trial, and summing the updated means for the components. Similarly, the monitoring system computes the updated time-based baseline variance by computing an updated variance for each component based on a weighted sum including the baseline variance for the component and the measurement received for the selected input variable for the current time trial, and summing the updated variances for the components.
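The weighted baseline update described above can be sketched in Python as follows. The sketch assumes an exponentially weighted form in which a user-defined weight between 0 and 1 sets the relevance of the newest value; the exact variance formula, the names, and the per-component `inputs` argument are hypothetical. In particular, this passage does not say how the current measurement is apportioned among the components, so the sketch takes the value presented to each component as given.

```python
def update_component(mean, variance, value, weight):
    """Weighted-sum update of one component's mean and variance.
    `weight` is the share given to the newest value (assumed 0..1)."""
    new_mean = (1.0 - weight) * mean + weight * value
    # Assumed form: blend the old variance with the squared deviation
    # of the new value from the previous mean.
    new_variance = (1.0 - weight) * variance + weight * (value - mean) ** 2
    return new_mean, new_variance

def update_baseline(components, inputs, weights):
    """Update every component, then sum the component means and variances
    to obtain the time-based baseline mean and variance."""
    updated = {
        name: update_component(mean, var, inputs[name], weights[name])
        for name, (mean, var) in components.items()
    }
    baseline_mean = sum(mean for mean, _ in updated.values())
    baseline_variance = sum(var for _, var in updated.values())
    return updated, baseline_mean, baseline_variance
```

A larger weight makes a component adapt faster to recent behavior, while a smaller weight makes it steadier; giving each component its own weight mirrors the per-variable tuning described in this summary.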
In another aspect, the monitoring system receives imputed estimate and forecast estimate alerts corresponding to multiple input measurements, weights the alerts, and computes an alert score based on the weighted alerts. The monitoring system then determines whether to activate an alarm condition based on the alert score.
In view of the foregoing, it will be appreciated that the present invention greatly improves upon preexisting methods and systems for modeling, estimating, predicting and detecting abnormal behavior in computer networks, servers and other computer-based systems that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns. The specific techniques and structures employed by the invention to improve over the drawbacks of prior monitoring systems to accomplish the advantages described above will become apparent from the following detailed description of the embodiments of the invention and the appended drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a functional block diagram illustrating the structure and operation of a monitoring system configured to implement the present invention.
FIG. 2 is a logic flow diagram illustrating a routine for provisioning the adaptive baseline engine component of the monitoring system.
FIG. 3 is a logic flow diagram illustrating a routine for running the adaptive baseline engine component of the monitoring system.
FIG. 4 is a logic flow diagram illustrating a routine for running the engine system component of the monitoring system.
FIG. 5 is a logic flow diagram illustrating a routine for running the adaptive correlation engine component of the monitoring system.
FIG. 6 is a logic flow diagram illustrating a routine for running a real-time alert detector component of the monitoring system.
FIG. 7 is a logic flow diagram illustrating a routine for running a forecast alert detector component of the monitoring system.
FIG. 8 is a logic flow diagram illustrating a routine for running an alarm generator component of the monitoring system.
FIG. 9A is a graph illustrating a typical signature of an input variable for the monitoring system.
FIG. 9B is a graph illustrating a global trend component of the input variable signature shown in FIG. 9A.
FIG. 9C is a graph illustrating a cyclical component of the input variable signature shown in FIG. 9A.
FIG. 9D is a graph illustrating a seasonal or scheduled component of the input variable signature shown in FIG. 9A.
FIG. 9E is a graph illustrating an erratic component of the input variable signature shown in FIG. 9A.
FIG. 10 is a graph illustrating time slicing and sampling of the input variable signature shown in FIG. 9A.
FIG. 11 is a series of graphs illustrating a process for capturing the cyclical component of the input variable signature shown in FIG. 9A.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The present invention may be embodied in a system intended to address the task of modeling, estimating, predicting and detecting abnormal behavior in systems that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns. As noted above, this system will be referred to as a "monitoring system" for descriptive convenience. However, it should be understood that it may perform additional functions, such as predicting abnormal system behavior, activating alerts and alarms based on detected or predicted system problems, implementing response actions to detected system problems, accumulating descriptive statistics, implementing user-defined system tuning, and so forth.
Although the embodiments of the invention described below are tailored to operate as a monitoring system for a complex computer network or server system, it should be understood that the monitoring system may be used for any of a wide range of systems, such as industrial processes, financial systems, electric power systems, aviation control systems, and any other type of system for which one or more input variables may be measured for monitoring and control purposes. In particular, the monitoring system is intended to provide an alarm service for systems, such as computer networks and server systems, exhibiting erratic motions that contain repeating characteristics, such as global trends, cyclical variations, and seasonal variations. The monitoring system learns these behaviors by capturing the signatures of input variables measured for the monitored system and adapting the signatures over time, improving their accuracy and automatically learning changes in behavior as they occur. Because the monitoring system captures the signatures based on an analysis of received data over a time frame, it bypasses the typical modeling stage of the commonly used methods of analysis. That is, the present modeling system captures and continually updates the monitored system's signatures, and uses these captured signatures to model the system rather than relying on deterministic curve fitting or other classical types of data analysis, such as time series analysis and statistical analysis.
It also should be understood that other elements may be deployed as part of the monitoring system, and that any of the components shown as part of the monitoring system may be deployed as part of an integrated system or as separate components in separate enclosures. For example, any of the components of the monitoring system may be deployed in a combined enclosure, or they may be deployed in separate enclosures, or they may be combined in any manner suitable to a particular application. In addition, each element may be located in a single physical location, or it may be distributed in a distributed computing environment. For example, the input variables may be provided from the monitored system to the monitoring system, and the alarm service output may be delivered from the monitoring system to the monitored system, with the internal computations of the monitoring system being implemented on a remote computer. In this manner, a single monitoring system computer may be used to support multiple monitoring applications. Alternatively, a stand-alone monitoring system may be deployed at the location of the monitored system. Of course, many other configurations may be used to implement the invention.
Turning now to the drawings, in which the same element numerals refer to similar components in the several figures, FIG. 1 is a functional block diagram illustrating the structure and operation of a monitoring system 5 configured to implement the present invention. Generally, the monitoring system 5 includes an engine system 10, an adaptive baseline engine (ABE) 20, an adaptive correlation engine (ACE) 30, and an alarm service 40, which includes a real-time alert detector 42, a forecast alert detector 44, and an alarm generator 46. Although these components are typically deployed as separate computer objects, they may be further subdivided or combined with each other or other objects in any manner suitable for a particular application. In addition, any of the components may be deployed locally or remotely, and may be combined with other functions and services.
The engine system 10 functions as a coordinator for the other system elements and manages the overall system activity during the running operation of the monitoring system 5. In particular, the engine system 10 receives a set of measurements defining input variables for each time trial in a continually recurring set of time trials, which is represented by the input variable Y(t). Although the input variables typically include a vector containing multiple measurements, each input variable may be handled in the same manner. Therefore, for descriptive convenience, the methodology of the monitoring system is described for the representative input variable Y(t), which is also referred to as the "subject input variable," to distinguish it from the other input variables. However, it should be understood that this same methodology applies to all of the input variables in a multi-variable vector represented by Y(t).
That is, each input of several input variables is treated as the "subject input variable" for its own processing, and all of these input variables are processed for each time trial, typically simultaneously, although they could be processed sequentially in some order. For this reason, the methodology is described below as applicable to a single "subject input variable" Y(t), and it is to be understood that multiple input variables are typically processed in a similar manner, either simultaneously or in some order. In addition, the engine system 10 may also receive additional input variables that are not processed as signatures, such as status indicators and configuration parameters. However, these variables may be ignored for the purpose of describing the inventive aspects of the present monitoring system.
The operation of the engine system 10, as described below, is typically repeated for each time trial in a continual time series of time trials. In general, the engine system 10 continually receives measurements for the representative input variable Y(t), to define a time-based signature of measured values for that variable. In this time-based series, the time units are referred to as the "time index," which typically begins with time index "t=1" and sequentially runs through "n" successive time trials for a repeating cycle.
The ABE 20 maintains a time-based baseline model for the representative input variable Y(t). More specifically, the ABE 20 defines a repeating cycle for the input variable Y(t) and computes a time-based mean and variance for each time period or "time slice" of the cycle. In other words, the ABE 20 defines a time-based baseline model for the input variable Y(t) that includes a time-based baseline mean and variance for each time index in the cycle. The cycle is typically defined by a user input parameter specifying a number of time slices "n" in the cycle. This parameter together with the inherent frequency of the input variable Y(t) defines a repeating cycle, which is typically the same for all of the input variables.
Further, the time-based baseline model for the input variable Y(t) is typically composed from a number of components, which the ABE 20 tracks individually. In particular, the signature for the input variable Y(t) may typically be decomposed into a global trend component G(t), a cyclical component C(t), a seasonal or scheduled component S(t), and an erratic component e(t), as shown in the decomposition equation 904 shown in FIG. 9A. Nevertheless, it should be understood that these particular components are merely illustrative, and that the ABE 20 may track additional components, different components, or only a subset of these components, as may be appropriate for a particular application. However, the illustrated set of components included in the decomposition equation 904 [i.e., Y(t) = G(t) + C(t) + S(t) + e(t)] has been found to be well suited to a monitoring system for a complex computer network or server system.
More specifically, FIGS. 9A-E illustrate the decomposition of the representative input variable Y(t) into an illustrative set of components including the global trend component G(t), the cyclical component C(t), the seasonal or scheduled component S(t), and the erratic component e(t). Specifically, FIG. 9A is a time graph 902 illustrating a typical signature for the representative input variable Y(t), which is shown as a continuous function that may be sampled to obtain a time series for the input variable Y(t). Further, although the signature 902 for the input variable Y(t) is shown as a continuous function, it should be understood that the signature for input variable Y(t) may be received as a time series of contiguous discrete points corresponding to the measurements received by the engine system 10 for the input variable Y(t) for each in a continual series of time periods or time slices. In this case, sampling is not required.
FIG. 9B is a graph illustrating the global trend component 906, which is referred to as G(t). The global trend component G(t) is typically captured by computing a temporally contiguous running mean and variance for the input variable Y(t). That is, referring to FIG. 9A, the global trend component G(t) may be captured by computing a running mean and variance "horizontally" for a series of temporally contiguous data points forming the signature 902 of the input variable Y(t). Horizontal computation refers generally to including in the computation a number of temporally contiguous data points across the graph 902 shown in FIG. 9A horizontally from left to right to obtain the running mean and variance for that variable 904, which is shown in FIG. 9B. This may also be thought of as the "DC offset" or the portion of the input variable Y(t) that has a non-zero average over time. A running variance is also computed for the time series defining input variable Y(t) in a similar horizontal manner. The terms "running mean and variance" generally refer to a weighted mean and variance that is continually updated as new data values are received for new time trials.
In other words, the values forming the time series defining the signature for the representative input variable Y(t) are typically weighted to give accentuated significance to more recent data. For example, a decreasing series of learning blocks may be used to weight the data points of the time series Y(t) when computing the running mean and variance. In particular, the ABE 20 may represent all of the historical data for the input variable Y(t) as a single mean and variance pair, or as several mean and variance pairs representing different historical blocks, which may then be weighted and factored in with the newly received input data value for each current time trial in the continuing time series. As will be obvious to someone skilled in the art, methods exist for generating weighting factors that allow the computation of running means and variances without the need to maintain any historical input data; i.e., only the current mean and variance and input measurement for the current time trial are needed to update the running mean and variance. This advantageously obviates the need to maintain a large historical database in the ABE 20. In general, the running mean and variance for the global trend component G(t) may be represented by the following symbols:
Global trend mean = μG(t)
Global trend variance = σG²(t)
In the simplest learning format, these values are continually updated for each new time trial as follows:
Global trend mean as of the previous time trial = μG(t-1)
Global trend variance as of the previous time trial = σG²(t-1)
New input variable measurement for the current time trial = Y(t)
Updated global trend mean μG(t) = wG*μG(t-1) + (1-wG)*Y(t)
Updated global trend variance σG²(t) = wG*σG²(t-1) + (1-wG)*[Y(t) - μG(t)]²
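The single-block update above can be sketched in a few lines of Python. This is a minimal illustration only, not the disclosed implementation: the function name `update_global_trend`, the starting values, and the sample series are assumptions introduced for the example.

```python
def update_global_trend(mean_prev, var_prev, y, w_g):
    """One step of the simplest running mean/variance update.

    w_g is the historical weighting parameter (0 <= w_g <= 1); larger
    values keep more history and therefore adapt more slowly.
    """
    mean = w_g * mean_prev + (1.0 - w_g) * y
    # The variance term uses the freshly updated mean, as in the equation.
    var = w_g * var_prev + (1.0 - w_g) * (y - mean) ** 2
    return mean, var


# Only the current (mean, variance) pair is carried between time trials;
# no history of past measurements is stored.
mean, var = 0.0, 1.0  # illustrative starting values (assumed)
for y in [10.0, 12.0, 11.0, 13.0]:
    mean, var = update_global_trend(mean, var, y, w_g=0.9)
```

Because each step needs only the previous pair and the new measurement, the memory cost per input variable is constant regardless of how long the system has been running.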
In these equations, the historical weighting parameter wG is typically a user-specified parameter, which may be adjusted to "tune" the baseline model to adapt at a desired rate to changes in the input variable Y(t) occurring over time. As noted above, the running mean and variance computation shown above may be considered as the simplest approach with a single learning block extending over all input data. That is, the historical data is represented by a single block having a running mean and variance. Of course, the learning function may be more complicated, for example by including multiple historical points or blocks, each having a separate learning weight. Nevertheless, the relatively simple learning function shown above may be suitable for many applications. An alternative decomposition and learning function for the global trend component is given later in the description. It should also be noted that the use of learning blocks virtually eliminates the need to maintain a large database of historical data in the model, because the historical data is represented by one or more blocks that are weighted and factored in with new data for each time trial when computing the running mean and variance.
FIG. 9C is a graph illustrating the cyclical component 908, which is referred to as C(t). The cyclical component C(t) is typically captured by dividing the temporally contiguous signature Y(t) into a number of repeating cycles that each include "n" time periods or time slices, as shown generally in FIG. 10. Thus, each cycle will have the same set of time indices that refer to the same recurring time periods. For example, the time cycle may represent a year with "n" equal to 8,760, producing one time slice for each hour of the year. Thus, the time index "t=1" refers to the time period ending at 1:00 am on January 1, the time index "t=2" refers to the time period ending at 2:00 am on January 1, and so forth through the time index "t=8,760" referring to the time period ending at 12:00 am on January 1 of the following year. Consecutive cycles may then be thought of as being arranged in a vertical stack, as shown in FIG. 11, such that the time slices align vertically for multiple cycles. A value of one week for the cycle length, divided into 672 time slices of 15 minutes, has been found to be suitable for monitoring a majority of computer server applications.
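As an illustration of the weekly cycle of 672 fifteen-minute slices, the following Python sketch maps a timestamp to its repeating time index. The function name `time_slice_index` and the choice of Monday 00:00 as the start of the cycle are assumptions made for the example; the description does not fix where the cycle begins.

```python
from datetime import datetime

SLICE_MINUTES = 15
SLICES_PER_WEEK = 7 * 24 * 60 // SLICE_MINUTES  # 672 slices per weekly cycle


def time_slice_index(ts: datetime) -> int:
    """Return the 1-based slice index of ts within its week.

    Assumes the cycle starts Monday at 00:00 (datetime.weekday() == 0).
    """
    minutes_into_week = ts.weekday() * 24 * 60 + ts.hour * 60 + ts.minute
    return minutes_into_week // SLICE_MINUTES + 1
```

Measurements taken one week apart receive the same index, which is what allows the slices of consecutive cycles to be stacked vertically.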
The cyclical component C(t) is typically captured by computing a running mean and variance for each time slice over multiple cycles. In other words, the cyclical component C(t) is typically computed by computing the running mean and variance of similar time slices over multiple cycles for the input variable Y(t). This computation, which is depicted graphically in FIG. 11, is referred to generally as the "time-slice mean and variance." Referring to FIGS. 9C and 11, the cyclical component C(t) shown in FIG. 9C may be captured by "vertically" summing the measurements from the same time slice over multiple cycles of the input variable Y(t), as shown in FIG. 11. This process is repeated for each time period in the cycle to compute the baseline signature of the component C(t).
Again, the time-slice values forming time series Y(t) are typically weighted to give accentuated significance to more recent data in the computation of the cyclical component. For example, a decreasing series of learning blocks may be used to weight the time-slice values from successive cycles when computing the running mean and variance for the cyclical component C(t). Moreover, the ABE 20 may represent the historical data for each time slice as a single mean and variance pair, or as several mean and variance pairs representing different historical blocks, which may then be weighted and factored in with the newly received input data value for each current time trial in the continuing time series. In general, the mean and variance for the cyclical component C(t) may be represented by the following symbols:
Cyclical mean = μc(t)
Cyclical variance = σc²(t)
In the simplest learning format in which "n" represents the number of time slices in a cycle, these values are continually updated for each new time trial as follows:
Cyclical mean as of the same time index in the previous cycle = μc(t-n)
Cyclical variance as of the same time index in the previous cycle = σc²(t-n)
New input variable measurement for the current time trial = Y(t)
Portion of the input variable measurement corresponding to the cyclical component for the current time trial, Yc(t) = Y(t) - μG(t).
Updated cyclical mean μc(t) = wc*μc(t-n) + (1-wc)*Yc(t)
Updated cyclical variance σc²(t) = wc*σc²(t-n) + (1-wc)*[Yc(t) - μc(t)]²
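The time-slice update can be sketched as follows, with one stored mean and variance per slice so that looking up slice k retrieves the values from the same time index in the previous cycle. The function name `update_cyclical`, the 0-based slice index, and the list-based storage are assumptions for illustration.

```python
def update_cyclical(slice_means, slice_vars, k, y, global_mean, w_c):
    """Update the running mean/variance stored for time slice k (0-based).

    Before the call, slice_means[k] and slice_vars[k] hold the values from
    the same slice in the previous cycle, i.e. mu_c(t-n) and sigma_c^2(t-n).
    """
    y_c = y - global_mean  # Yc(t) = Y(t) - mu_G(t)
    mean = w_c * slice_means[k] + (1.0 - w_c) * y_c
    var = w_c * slice_vars[k] + (1.0 - w_c) * (y_c - mean) ** 2
    slice_means[k], slice_vars[k] = mean, var
    return mean, var
```

Only the slice being visited is modified on each time trial; the other n-1 slices keep their values until their own time index recurs.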
Again, the historical weighting parameter "wc" is typically a user-specified parameter, which may be adjusted to "tune" the baseline model to adapt at a desired rate to changes in the input variable Y(t) occurring over time.
FIG. 9D is a graph illustrating the seasonal or scheduled component 910, which is referred to as S(t). The seasonal or scheduled component S(t) may be captured through a variety of analyses applied to the historical data for the input variable Y(t), or it may be established by a user input reflecting a known event, such as scheduled maintenance or another scheduled activity that will have a predictable effect on the observed system behavior. In general, the seasonal or scheduled component S(t) represents known or predictable system events, which should be removed from the erratic component e(t) to avoid masking abnormal system behavior with this component of the representative input variable Y(t). Thus, in addition to the global trend component G(t) and the cyclical component C(t) described above, the seasonal or scheduled component S(t) should also be removed from the input variable Y(t) to obtain the unpredictable erratic component e(t).
When the seasonal or scheduled component S(t) is a known deterministic parameter, it may be subtracted directly from Y(t) when computing the erratic component e(t). In addition, when the seasonal or scheduled component S(t) is captured from measured data or modeled as a probabilistic function, it may be treated as a probabilistic component of the input variable Y(t) represented by a running mean and variance just like the global trend component G(t) and the cyclical component C(t). Therefore, the derivation of the seasonal or scheduled component S(t) will not be further developed, except to note that it may be included as a probabilistic component of the input variable Y(t) along with the captured global trend component G(t) and the captured cyclical component C(t). The following parameters are defined for the seasonal or scheduled component S(t) for this purpose:
Seasonal or scheduled mean = μs(t)
Seasonal or scheduled variance = σs²(t)
Moreover, it will be appreciated that the input variable Y(t) may be decomposed into any number of additional components corresponding to observable or scheduled events, which can each be factored into the time-based baseline model maintained by the ABE 20 in a similar manner. For instance, it will be obvious to one skilled in the art that instead of adjusting the input measurement value on the cyclical component, Yc(t), to obtain a zero mean cyclical component C(t), an adjusted input measurement YG(t) could have been defined in a similar manner to Yc(t) and used for the global trend signature computation. In the latter case, the global trend could be viewed as a mean correction factor, and should normally be around zero (zero mean) except when unusual events happen and the global trends start shifting. Finally, any adjusted input measurement should take into account all the other components.
FIG. 9E is a graph illustrating the erratic component 912, which is referred to as e(t). In particular, the input variable Y(t) is decomposed into the global trend component G(t), the cyclical component C(t), the seasonal or scheduled component S(t), and the erratic component e(t) using the decomposition equation 904, which is shown adjacent to FIG. 9A. According to this equation, Y(t) and e(t) may be expressed as follows:
Y(t) = G(t) + C(t) + S(t) + e(t)
e(t) = Y(t) - G(t) - C(t) - S(t)
In the monitoring system 5, the erratic component e(t) is used as the basic process control variable for the predictive and monitoring operations because it is derived from the measured value Y(t) with the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t) removed. Therefore, the erratic component e(t) is selected for further analysis because it reflects the abnormal system behaviors without the masking effects of the other components, which can be relatively large, widely fluctuating, and constantly changing over time. Moreover, the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t) are separately modeled and tracked to detect any model changes that occur in these components over time. In order to use the erratic component e(t) as the basic process control variable for the predictive and monitoring operations, it is helpful to combine the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t) into a single construct, which is referred to as the time-based baseline mean and variance. From these statistics, other meaningful values, such as the standard deviation and standard error, may be computed for the baseline in the usual ways. The time-based baseline mean and variance [μ(t) and σ2(t)] can be computed from the component running means and variances, as shown below:
Time-based baseline mean μ(t) = μG(t)+ μc(t) + μs(t)
Time-based baseline variance σ²(t) = σG²(t) + σc²(t) + σs²(t)
From the preceding development, it should be appreciated that the ABE 20 maintains a time-based baseline mean and variance [μ(t) and σ2(t)] for each input variable, as represented by the input variable Y(t), for each time index in the time cycle. Further, the ABE 20 updates the time-based baseline mean and variance by updating and summing the components as shown above, for each time trial. In this manner, the baseline model maintained by the ABE 20 automatically tracks changes in the baseline components over time. Accordingly, the ABE 20 is operative to return the time-based baseline mean and variance [μ(t) and σ2(t)] for each input variable for any given time index (t). In addition, the ABE 20 is operative to update the baseline model using the input data received for any given time trial.
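Combining the component statistics into the time-based baseline amounts to two sums, as in this minimal Python sketch. The function name `combine_baseline` and the tuple layout are assumptions for illustration. Adding the variances directly, as the equations above do, corresponds to treating the components as uncorrelated.

```python
import math


def combine_baseline(components):
    """Sum component means and variances for one time index.

    components: iterable of (mean, variance) pairs, e.g. for G, C and S.
    Returns (baseline_mean, baseline_variance, baseline_std_dev).
    """
    components = list(components)
    mu = sum(m for m, _ in components)
    var = sum(v for _, v in components)
    return mu, var, math.sqrt(var)
```

The standard deviation is returned alongside the variance because it is the quantity later compared against forecast estimates.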
Returning to FIG. 1 , the engine system 10 interacts with the ABE 20 by invoking the ABE for a particular current or future time trial, as desired. To do so, the engine system 10 only needs to specify the desired time index, and the ABE 20 returns the time-based baseline mean and variance [μ(t) and σ2(t)] for that time index. In particular, for each current time trial (t), the engine system 10 typically invokes the ABE 20 to obtain the time-based baseline mean and variance [μ(t) and σ2(t)] for that time trial. The monitoring system 5 then uses the time-based baseline mean and variance for the current time trial in a monitoring process, which is also referred to as an imputing process for the current time trial. Also for each time trial, the engine system 10 typically invokes the ABE 20 for a number of future time trials, represented by the time index (t+T), to obtain the time-based baseline means and variances [μ(t+T) and σ2(t+T)] for those future time periods. The monitoring system 5 then uses these time-based baseline means and variances for the future time trials in a forecasting process, which is also referred to as a predicting process.
Referring to a current time trial (t) for illustrative purposes, the engine system 10 receives the real-time input variable Y(t) for the current time trial (t) and invokes the ABE 20 by sending the time index for the current time trial to the ABE. The ABE 20 then computes the time-based baseline mean and variance [μ(t) and σ2(t)] from its components as shown below:
Time-based baseline mean μ(t) = μG(t) + μc(t) + μs(t)
Time-based baseline variance σ²(t) = σG²(t) + σc²(t) + σs²(t)
The ABE 20 then returns the time-based baseline mean and variance [μ(t) and σ²(t)] for the current time trial (t) to the engine system 10, which computes the erratic component e(t) for the input variable Y(t) for the current time trial (t) as shown below:
e(t) = Y(t) - μ(t)
It should be noted that the time-based baseline mean and variance are returned to the engine system prior to updating the baseline with the input measurement from the current time trial. This ensures that the current erratic component remains independent of the learned history and that the baseline is unbiased when the estimation and alert detection are performed as described below.
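The ordering constraint (read the baseline first, learn from the measurement afterward) can be made concrete with a small Python sketch. The class `TinyBaseline`, which tracks a single global component only, and the function `process_trial` are hypothetical names introduced for illustration.

```python
class TinyBaseline:
    """Hypothetical single-component baseline (global trend only)."""

    def __init__(self, mean=0.0, var=1.0, w=0.9):
        self.mean, self.var, self.w = mean, var, w

    def get(self):
        return self.mean, self.var

    def update(self, y):
        self.mean = self.w * self.mean + (1.0 - self.w) * y
        self.var = self.w * self.var + (1.0 - self.w) * (y - self.mean) ** 2


def process_trial(baseline, y):
    """Return e(t) computed from the baseline as it stood before y was learned."""
    mu, _ = baseline.get()  # 1. read the baseline first
    e = y - mu              # 2. erratic component against the prior baseline
    baseline.update(y)      # 3. only then fold y into the model
    return e
```

If the update ran before the read, the new measurement would already have pulled the mean toward itself, shrinking e(t) and making abnormal values harder to detect.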
The engine system 10 then invokes the ACE 30 by sending the erratic component e(t) to the ACE. The ACE 30 is a multivariate correlation engine described in the following commonly-owned patents and patent applications, which are hereby incorporated by reference: U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163. In general, the ACE computes an imputed estimate for the erratic component based on the values received for the other input variables for the current time trial and learned parameters representing observed relationships between the other variables and the subject input variable Y(t). That is, the monitoring or imputing step is based on real-time data for the current time trial for all of the input variables other than the subject input variable Y(t), and the covariance parameters represented by the learned parameters in the ACE, which represent observed relationships between the other variables and the subject input variable Y(t).
The ACE 30 then returns the imputed estimate and the standard error for the imputed estimate to the engine system 10. These parameters are referred to as shown below:
Imputed estimate = eI(t)
Standard error for imputed estimate = ηI(t)
The engine system 10 then invokes the real-time alert detector 42 by sending it the erratic component e(t), the imputed estimate eI(t), and the standard error ηI(t) for the imputed estimate. The real-time alert detector 42 then computes a threshold value for the imputed estimate based on a confidence value, in this instance the standard error ηI(t). However, it will be appreciated that other confidence values may be used, such as those based on the variance, standard deviation, or other statistics associated with the baseline model, the imputed estimate, or other parameters.
Typically, the threshold value may be computed as a weighting factor multiplied by the selected confidence value, where the weighting factor is a user-defined parameter that allows the user to tune this aspect of the monitoring system. Further, an alert status for the imputed estimate is typically determined by comparing the magnitude of the difference between the measured erratic component e(t) and the imputed estimate eI(t) for the erratic component to the threshold value, as shown below:
|e(t) - eI(t)| < k1*ηI(t)
In this equation, a value of six (6) has been found to be suitable for the user-defined weighting parameter k1. The real-time alert detector 42 informs the alarm generator 46 of any imputed estimate alerts, and the alarm generator tracks and scores the alerts and generates one or more alarms. Of course, other threshold tests may be employed, as appropriate for the particular monitored system. However, the threshold test shown above has been found to be suitable for monitoring a typical complex computer network or server system.
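The real-time threshold test can be sketched as a single comparison. The sketch assumes that the inequality in the description states the in-tolerance condition, so that an alert corresponds to its negation; the function name `imputed_estimate_alert` is hypothetical.

```python
def imputed_estimate_alert(e, e_imputed, std_error, k1=6.0):
    """Return True when the measured erratic component is out of tolerance.

    Alert condition (assumed): |e - e_imputed| >= k1 * std_error, i.e. the
    negation of the in-tolerance inequality. k1 defaults to the value of
    six mentioned in the description.
    """
    return abs(e - e_imputed) >= k1 * std_error
```

Raising k1 widens the tolerance band and reduces the number of alerts; lowering it makes the detector more sensitive.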
The preceding description applies to the operation of the monitoring system 5 in the monitoring or imputing phase for the current time trial, which involves the computation of the imputed estimate for the current time trial, and the determination of imputed estimate alerts based on a comparison of the imputed estimate and the corresponding erratic component e(t), which is derived from the received measurement for the subject input variable Y(t) for the current time trial.
However, the imputing phase is only the first half of the operation. For each time trial, the monitoring system 5 also updates the ABE 20 and the ACE 30 for the data received during the current time trial, and then computes forecast estimates and associated forecast alerts for one or more future time trials. These future time trials are referred to by the time index (t+T), and it should be understood that the monitoring system 5 typically performs the forecasting process described below for each of several future time trials (e.g., every time trial in the cycle, every time trial for multiple cycles into the future, or another desired set of future time trials). Because the forecasting process is identical for each future time trial except for the time index, the forecasting process is described only for the representative future time index (t+T). In sum, it should be appreciated that the entire monitoring process is repeated for each of several input variables, for the current time trial and for each of many future time trials, as the system runs over time.
To update the baseline model, the engine system 10 invokes the ABE 20 and supplies it with the real-time input value Y(t) for the current time trial. The ABE 20 then updates the time-based mean and variance as described previously in accordance with the equations shown below for the global trend component G(t), cyclical component C(t), and seasonal or scheduled component S(t). Additional components may be added as appropriate for a particular application.
Updated global trend mean μG(t) = wG*μG(t-1) + (1-wG)*Y(t)
Updated global trend variance σG²(t) = wG*σG²(t-1) + (1-wG)*[Y(t) - μG(t)]²
Adjusted input measurement for cyclical component Yc(t) = Y(t) - μG(t-1).
Updated cyclical mean μc(t) = wc*μc(t-n) + (1-wc)*Yc(t)
Updated cyclical variance σc²(t) = wc*σc²(t-n) + (1-wc)*[Yc(t) - μc(t)]²
Adjusted input measurement for seasonal component Ys(t) = Y(t) - μG(t-1) - μc(t-n).
Updated seasonal or scheduled mean μs(t) = ws*μs(t-m) + (1-ws)*Ys(t)
Updated seasonal or scheduled variance σs²(t) = ws*σs²(t-m) + (1-ws)*[Ys(t) - μs(t)]²
Updated time-based baseline mean μ(t) = μG(t) + μc(t) + μs(t)
Updated time-based baseline variance σ²(t) = σG²(t) + σc²(t) + σs²(t)
Where n corresponds to the number of time slices per cycle and m corresponds to the number of time slices since the last similar seasonal or scheduled event. In addition, the ACE 30 automatically updates its learned parameters in the manner described in U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
Still referring to FIG. 1 , the engine system 10 interacts with the ABE 20 to perform forecasting operations by invoking the ABE for a particular representative future time trial (t+T). To do so, the engine system 10 only needs to specify the future time index (t+T), and the ABE 20 returns the expected time-based baseline mean and variance [μ(t+T) and σ2(t+T)] for that time index. In particular, for each time trial the engine system 10 typically invokes the ABE 20 to obtain the time-based baseline mean and variance [μ(t+T) and σ2(t+T)]. The monitoring system then uses the time-based baseline mean and variance for the selected future time trial in the forecast process, which is also referred to as a predicting process. Referring to the representative future time trial (t+T) for illustrative purposes, the engine system 10 invokes the ABE 20 by sending the time index (t+T) to the ABE. The ABE 20 then computes the time-based baseline mean and variance [μ(t+T) and σ2(t+T)] from its components as shown below:
Time-based baseline mean μ(t+T) = μG(t+T)+ μc(t+T) + μs(t+T)
Time-based baseline variance σ²(t+T) = σG²(t+T) + σc²(t+T) + σs²(t+T)
The engine system 10 then invokes the ACE 30 for the selected future time trial. The ACE 30 then returns the forecast estimate and the standard error for the forecast estimate to the engine system 10. These parameters are referred to as shown below:
Forecast estimate = eF(t+T)
Standard error for forecast estimate = ηF(t+T)
The engine system 10 then invokes the forecast alert detector 44 by sending it the forecast estimate eF(t+T), the standard error for forecast estimate ηF(t+T), and other values that may be used as or to compute a confidence value, such as the time-based baseline standard deviation σ(t+T) (which is the square root of the time-based baseline variance σ²(t+T) returned by the ABE 20). The forecast alert detector 44 then computes a threshold value for the forecast estimate based on a confidence value, in this particular example the time-based baseline standard deviation σ(t+T). In addition, the threshold value may be a weighting factor multiplied by the confidence value, where the weighting factor is a user-defined parameter that allows the user to tune this aspect of the monitoring system. Further, an alert status for the forecast estimate is typically determined by comparing the magnitude of the forecast estimate to the threshold value, as shown below:
|eF(t+T)| < k2*σ(t+T)
In this equation, a value of two (2) has been found to be suitable for the user-defined weighting parameter k2. The forecast alert detector 44 then informs the alarm generator 46 of any forecast estimate alerts, and the alarm generator tracks and scores the alerts and generates one or more alarms. It should also be appreciated that the forecast alert detector 44 could apply other suitable threshold tests instead of or in addition to the test described above. In particular, a forecast estimate threshold test may be based on the standard error for forecast estimate ηF(t+T), the forecast estimate eF(t+T) and the expected baseline mean μ(t+T).
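Because the forecast test is repeated for each future time index, it can be sketched as a filter over a list of forecasts. As with the real-time test, the sketch assumes that the inequality states the in-tolerance condition and that an alert is its negation; the function name `forecast_alerts` and the tuple layout are hypothetical.

```python
import math


def forecast_alerts(forecasts, k2=2.0):
    """Return the horizons T whose forecast estimate is out of tolerance.

    forecasts: iterable of (T, e_forecast, baseline_variance) tuples, where
    baseline_variance is the time-based baseline variance for index t+T.
    Alert condition (assumed): |e_forecast| >= k2 * sqrt(baseline_variance).
    """
    return [
        T
        for T, e_f, var in forecasts
        if abs(e_f) >= k2 * math.sqrt(var)
    ]
```

For example, `forecast_alerts([(1, 1.0, 1.0), (2, 5.0, 4.0)])` flags only horizon 2, since 5.0 exceeds twice the standard deviation of 2.0 while 1.0 does not exceed twice the standard deviation of 1.0.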
It should also be understood that other types of updating algorithms may be employed for the baseline model maintained by the ABE 20. For example, the following updating algorithm has been found to be suitable for monitoring systems applied to typical computer servers and networks. For this derivation, let (t) denote the time index for the current time trial and (t-1) represent the time index for the previous time trial. In addition, let (T) denote a cycle duration and (n) represent the number of time slices in a cycle. Also let the time index (k) correspond to the current time point "modulo n" as it appears in the successive cycles; that is, if t modulo n equals k, then the index for time trial t-n also corresponds to time index k.
At time (t), the monitoring system 5 receives the input value Y(t) for the current time trial, and invokes the ABE 20 for the current time trial, which returns the time-based baseline mean and variance as of the previous time trial [μ(t-1) and σ²(t-1)]. To develop the updating algorithm, let k denote the current time index within the cycle. The baseline mean and variances for the current time trial are then given as:
μ(t) = μG(t-1) + μc(t-n)
σ²(t) = σG²(t-1) + σc²(t-n)
The erratic component e(t) is then computed as follows:
e(t) = Y(t) - μG(t-1) - μc(t-n)
The global trend component G(t) is then updated as follows:
G(t) = Y(t) - μc(t-n)
μG(t) = [ wG(t) * G(t) + μG(t-1) ] / (1 + wG(t))
d = G(t) - μG(t)
σG²(t) = wG(t) * d * d + σG²(t-1) / (1 + wG(t))
The cyclical component C(t) is then updated as follows:
C(t) = Y(t) - μG(t)
μc(k) = [ wc(t) * C(t) + μc(t-n) ] / (1 + wc(t))
d = C(t) - μc(t-n)
σc²(k) = wc(t) * d * d + σc²(t-n) / (1 + wc(t))
Where wG and wc are weighting factors that depend on learning blocks, adjusting for missing values, etc. Note that the mean and variance values at time point t-n correspond to the values of the mean and variance at time slice k before the corresponding values are updated. In addition, the time-based baseline means may be initialized in the following manner: Let μG(t) = 0; μc(k) = 0; σG²(t) = 1; k = 0,...,N-1.
For the first cycle, compute only the baseline for the global trend using a learning block of length N, and set μG(k) = Y(t), which is equivalent to setting wG(t) = 1 / (1 + k). At the end of the first cycle, for k = 0,...,N-1 , compute the following:
μc(k) = μG(k) - μG(t)
σc²(k) = σG²(t) / 2
σG²(t) = σG²(t) / 2
Then, for cycle 2, compute the following:
wc(k) = 1 / (1 + t mod NC); where NC is the learning block in terms of cycles
wG(t) as below.
And thereafter, compute the following:
wG(t) = 1 / ( N* + 1 mod N*); where N* may or may not equal N,
wc(t) = 1 / (NC + [t/N*] mod NC).
Those skilled in the art will appreciate that other updating formulas using learning blocks and other suitable methodologies may be implemented.
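One way to relate the alternative mean update to the simpler learning format is to note that dividing by (1 + w) makes the result a weighted average of the previous mean and the new value, with weight w / (1 + w) on the new value. The following Python sketch checks this algebraic identity numerically; the function names are hypothetical, and only the mean update is covered.

```python
def blended_mean(mean_prev, x, w):
    """Mean update in the alternative form: [w * x + mean_prev] / (1 + w)."""
    return (w * x + mean_prev) / (1.0 + w)


def convex_mean(mean_prev, x, w):
    """Same update rewritten as a weighted average of old mean and new value."""
    alpha = w / (1.0 + w)  # weight placed on the new value
    return (1.0 - alpha) * mean_prev + alpha * x
```

In the notation of the simpler learning format, this corresponds to a historical weighting parameter of 1 / (1 + w), so a smaller w retains more history and adapts more slowly.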
FIG. 2 is a logic flow diagram illustrating a routine 200 for setting up or provisioning the adaptive baseline engine (ABE) 20 component of the monitoring system 5 in advance of running the system. The following description of routine 200 will also refer to FIGS. 9A-E, 10 and 11 for clarity. In step 202, the ABE 20 receives historical data for a representative input variable Y(t), as shown in FIG. 9A. The ABE 20 then processes the historical data to establish the time-based baseline mean and variance [μ(t) and σ2(t)] for that variable. However, the use of historical data is optional, and the ABE 20 may alternatively begin the process without historical data. In this case, the ABE 20 uses a temporally contiguous running mean (i.e., the mean and variance for the cyclical component [μc(t) and σc 2(t)] computed horizontally for the first cycle). Thereafter, the cyclical component C(t) is computed as a time-slice mean and variance, as shown in FIG. 11.
Step 202 is followed by step 204, in which the ABE 20 defines a baseline cycle, such as a week, month or year. For example, the cycle is depicted as a calendar week in FIG. 11. Step 204 is followed by step 206, in which the ABE 20 receives a user input parameter "n" that defines the number of time slices in the cycle, as shown in FIG. 10. The ABE 20 then samples the input data to obtain a time series of "n" data points for each cycle, which is also shown in FIG. 10. Alternatively, if the input data is already expressed as discrete data points, the user input parameter "n" sets the cycle length by defining the number of data points included in each cycle. In either case, step 206 is followed by step 208, in which the ABE 20 effectively divides the historical data into a number of repeating cycles, each having "n" time slices that are numbered consecutively to establish a time index (i.e., time slices 1 through n) that repeats for each cycle. Step 208 is followed by step 210, in which the ABE 20 computes the global trend component G(t) of the input variable Y(t) by computing a temporally contiguous running mean and variance for the input variable Y(t), as depicted in FIG. 9B. That is, the global trend component G(t) is captured by computing a weighted mean and variance [μG(t), σG 2(t)] horizontally along the graph of Y(t) shown in FIG. 9A to obtain the global trend component G(t), which is shown in FIG. 9B.
Step 210 is followed by step 212, in which the ABE 20 computes the cyclical component C(t) of the input variable Y(t) by computing the time slice running mean and variance as shown in FIG. 11. That is, the cyclical component C(t) is captured by computing a weighted mean and variance [μc(t), σc²(t)] vertically for the same time slice over multiple cycles, for each time increment in the cycle, as shown in FIG. 11. The resulting cyclical component C(t) is shown in FIG. 9C. The ABE 20 may then capture other components of the input variable Y(t), such as a mean and variance [μs(t), σs²(t)] for the seasonal or scheduled component S(t) shown in FIG. 9D and other components that may be appropriate for a particular application. In each case, the objective of the provisioning routine 200 is to decompose the input variable Y(t) into a number of components representing relatively predictable behaviors so that the erratic component e(t) may be isolated for further processing, as shown in FIG. 9E. In addition, the decomposition process allows the relatively predictable components of the time-based baseline for the input variable Y(t) to be captured and modeled individually. In this particular example, the baseline includes the global trend component G(t), the cyclical component C(t), and the seasonal or scheduled component S(t). Step 212 is followed by step 214, in which the ABE 20 is ready for the run routine 300 shown in FIG. 3.
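The distinction between the "horizontal" computation of step 210 and the "vertical" computation of step 212 can be sketched in a few lines of Python. This is a minimal illustration rather than the ABE 20 itself: the function names `running_stats` and `time_slice_stats`, the `window` parameter, and the use of equal weights (the description calls for weighted statistics) are assumptions made for brevity.

```python
from statistics import mean, pvariance


def running_stats(history, window):
    """Horizontal pass: mean and variance over temporally contiguous samples.

    `window` is a hypothetical parameter; equal weights are an assumption.
    """
    stats = []
    for t in range(len(history)):
        chunk = history[max(0, t - window + 1): t + 1]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats


def time_slice_stats(history, n):
    """Vertical pass: mean and variance of the same time slice across cycles.

    Sample t belongs to time slice t % n, so history[k::n] collects
    slice k from every cycle.
    """
    return [(mean(history[k::n]), pvariance(history[k::n])) for k in range(n)]


# Two cycles of three time slices each (made-up values).
history = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
slice_stats = time_slice_stats(history, n=3)
trend_stats = running_stats(history, window=3)
```

In this toy series the time-slice means rise within each cycle while the running mean drifts upward from one cycle to the next, which is the kind of separation between cyclical and global behavior that the decomposition is meant to capture.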
FIG. 3 is a logic flow diagram illustrating a routine 300 for running the ABE 20. In step 302, the ABE 20 receives a time index (t) for the current time trial from the engine system 10. Step 302 is followed by step 304, in which the ABE 20 computes the time-based baseline mean for the current time trial (t), typically by summing the component means as shown below:
Time-based baseline mean μ(t) = μG(t)+ μc(t) + μs(t)
Step 304 is followed by step 306, in which the ABE 20 computes the time-based baseline variance for the current time trial (t), typically by summing the component variances as shown below:
Time-based baseline variance σ²(t) = σG²(t) + σc²(t) + σs²(t)
Step 306 is followed by step 308, in which the ABE 20 returns the time-based baseline mean and variance [μ(t), σ²(t)] for the current time trial to the engine system 10.
Step 308 is followed by step 310, in which the ABE 20 receives the input value Y(t) for the current time trial from the engine system 10. Step 310 is followed by step 312, in which the ABE 20 updates the baseline mean μ(t), typically by updating the component means as shown below:
Updated global trend mean μG(t) = wG * μG(t-1) + (1 - wG) * Y(t)
Updated cyclical mean μc(t) = wc * μc(t-n) + (1 - wc) * [Y(t) - μG(t)]
Updated seasonal or scheduled mean = μs(t)
Updated time-based baseline mean μ(t) = μG(t) + μc(t) + μs(t)
Step 312 is followed by step 314, in which the ABE 20 updates the baseline variance σ²(t), typically by updating the component variances as shown below:
Updated global trend variance σG²(t) = wG * σG²(t-1) + (1 - wG) * [Y(t) - μG(t)]²
Updated cyclical variance σc²(t) = wc * σc²(t-n) + (1 - wc) * [Y(t) - μc(t-n) - μG(t)]²
Updated seasonal or scheduled variance = σs²(t)
Updated time-based baseline variance σ²(t) = σG²(t) + σc²(t) + σs²(t)
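The update equations of steps 312 and 314 can be sketched as a small Python class. This is an illustrative sketch under stated assumptions, not the patented implementation: the class name `Baseline`, the default weights of 0.9, the zero initial state, and the omission of the seasonal or scheduled component (treated as zero here) are all hypothetical choices. One cyclical mean and variance is stored per time slice, so the value held in slot t mod n before the update plays the role of μc(t-n) and σc²(t-n).

```python
class Baseline:
    """Exponentially weighted global-trend and cyclical baseline (sketch)."""

    def __init__(self, n, w_g=0.9, w_c=0.9):
        self.n = n                 # time slices per cycle
        self.w_g = w_g             # weight wG (assumed default)
        self.w_c = w_c             # weight wc (assumed default)
        self.mu_g = 0.0            # global trend mean
        self.var_g = 0.0           # global trend variance
        self.mu_c = [0.0] * n      # cyclical mean, one per time slice
        self.var_c = [0.0] * n     # cyclical variance, one per time slice

    def mean(self, t):
        # Baseline mean = global mean + cyclical mean (seasonal term omitted).
        return self.mu_g + self.mu_c[t % self.n]

    def variance(self, t):
        # Baseline variance = global variance + cyclical variance.
        return self.var_g + self.var_c[t % self.n]

    def update(self, t, y):
        k = t % self.n
        # Global trend: the variance uses the already-updated mean.
        self.mu_g = self.w_g * self.mu_g + (1 - self.w_g) * y
        self.var_g = self.w_g * self.var_g + (1 - self.w_g) * (y - self.mu_g) ** 2
        # Cyclical component: the variance uses the mean from the previous cycle.
        prev_mu_c = self.mu_c[k]
        self.mu_c[k] = self.w_c * prev_mu_c + (1 - self.w_c) * (y - self.mu_g)
        self.var_c[k] = (self.w_c * self.var_c[k]
                         + (1 - self.w_c) * (y - prev_mu_c - self.mu_g) ** 2)
        return self.mean(t), self.variance(t)


b = Baseline(n=2, w_g=0.5, w_c=0.5)
updated_mean, updated_variance = b.update(0, 10.0)
```

With both weights set to 0.5 and a first measurement of 10.0, the global mean becomes 5.0 and the cyclical mean for slice 0 becomes 2.5, so the updated baseline mean is 7.5.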
After the baseline model has been updated as shown above, step 314 is followed by step 316, in which the ABE 20 receives a time index (t+T) for a future time trial from the engine system 10. Step 316 is followed by step 318, in which the ABE 20 computes the time-based baseline mean for the future time trial (t+T), typically by summing the component means as shown below:
Time-based baseline mean μ(t+T) = μG(t) + μc(t+T) + μs(t+T)
Step 318 is followed by step 320, in which the ABE 20 computes the time-based baseline variance for the future time trial (t+T), typically by summing the component variances as shown below:
Time-based baseline variance σ²(t+T) = σG²(t) + σc²(t+T) + σs²(t+T)
Step 320 is followed by step 322, in which the ABE 20 returns the baseline mean and variance for the future time trial [μ(t+T), σ²(t+T)] to the engine system 10. Step 322 is followed by step 324, in which the ABE 20 determines whether the engine system 10 has specified another future time trial. If the engine system 10 has specified another future time trial, the "YES" branch is followed to step 316, and the ABE 20 computes and returns the baseline mean and variance [μ(t+T), σ²(t+T)] for the additional future time trial. If the engine system 10 has not specified another future time trial, the "NO" branch is followed to step 302, in which the ABE 20 waits to receive another time index from the engine system 10 for the next current time trial. It should be understood that the routine 300 described above is typically performed simultaneously or sequentially for each of several input variables, and for each in a continual series of current time trials.
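The loop over future time trials in steps 316 through 324 can be sketched as follows. The function name `forecast_baseline` and its argument layout are hypothetical, and the seasonal or scheduled terms are omitted for brevity. As in the equations for the future time trial, the global trend terms are held at their current values, while the cyclical terms are read from the time slice that the future trial (t+T) falls into.

```python
def forecast_baseline(mu_g, var_g, mu_c, var_c, t, horizons):
    """Return {T: (baseline mean, baseline variance)} for each horizon T.

    mu_c and var_c hold one cyclical mean and variance per time slice.
    """
    n = len(mu_c)
    result = {}
    for T in horizons:
        k = (t + T) % n  # time slice of the future time trial
        result[T] = (mu_g + mu_c[k], var_g + var_c[k])
    return result


# Made-up state: a two-slice cycle, current time index 0.
result = forecast_baseline(
    mu_g=10.0, var_g=2.0,
    mu_c=[1.0, -1.0], var_c=[0.5, 0.25],
    t=0, horizons=[1, 2],
)
```

Because the cyclical terms are indexed modulo the cycle length, a horizon equal to a whole cycle returns to the same time slice as the current trial.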
FIG. 4 is a logic flow diagram illustrating a routine 400 for running the engine system 10 of the monitoring system 5. In step 402, the engine system 10 receives a measurement for the representative input variable Y(t) for the current time trial (t). Step 402 is followed by step 404, in which the engine system 10 invokes the ABE 20 by sending it the time index for the current time trial (t). Step 404 is followed by step 406, in which the engine system 10 receives the time-based baseline mean and variance [μ(t), σ²(t)] for the current time trial (t) from the ABE 20. Step 406 is followed by step 408, in which the engine system 10 computes the erratic component e(t) for the current time trial as shown below:
e(t) = Y(t) - μ(t)
Step 408 is followed by step 410, in which the engine system 10 invokes the ACE 30 by sending it the erratic component e(t) for the current time trial. Step 410 is followed by step 412, in which the engine system 10 receives the imputed estimate eI(t) and the standard error for the imputed estimate ηI(t) from the ACE 30. Step 412 is followed by step 414, in which the engine system 10 invokes the ABE 20 by sending it the time index for a future time trial (t+T). Step 414 is followed by step 416, in which the engine system 10 receives the time-based baseline mean and variance [μ(t+T), σ²(t+T)] for the future time trial (t+T) from the ABE 20. Step 416 is followed by step 418, in which the engine system 10 invokes the alarm service 40 by sending it one or more of the following parameters: the erratic component e(t), the baseline mean μ(t), the baseline variance σ²(t), the imputed estimate eI(t), the standard error for the imputed estimate ηI(t), the forecast estimate eF(t+T), and the standard error for the forecast estimate ηF(t+T). In particular, for an illustrative embodiment, the engine system 10 may send the erratic component e(t), the imputed estimate eI(t), and the standard error for the imputed estimate ηI(t) to the real-time alert detector 42; and it may send the forecast estimate eF(t+T) and baseline variance σ²(t+T) and/or the standard error for the forecast estimate ηF(t+T) to the forecast alert detector 44.
Step 418 is followed by step 420, in which the engine system 10 waits for the next time trial, at which time it loops back to step 402 and repeats routine 400 for the next time trial. It should also be understood that for each time trial, routine 400 is repeated for each of several desired input variables, as represented by the input variable Y(t), and that steps 414 through 418 are typically repeated for each of several future time trials, as desired.
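The order of operations in routine 400 for a single input variable and a single time trial can be sketched as below. The function `run_time_trial` and the two callables it takes are hypothetical stand-ins for the interfaces to the ABE 20 and the ACE 30, which the description does not specify at this level of detail; the sketch only shows how the erratic component is formed and which values are gathered for the alarm service.

```python
def run_time_trial(t, y, baseline, impute, horizons):
    """One pass of the engine loop for one input variable (sketch).

    baseline(t) -> (mean, variance)      stand-in for the ABE
    impute(e)   -> (estimate, std_err)   stand-in for the ACE
    """
    mu, var = baseline(t)               # steps 404 and 406: current baseline
    e = y - mu                          # step 408: erratic component
    e_imputed, std_err = impute(e)      # imputed estimate and its standard error
    future = {T: baseline(t + T) for T in horizons}  # baselines for future trials
    return {
        "e": e, "mu": mu, "var": var,
        "e_imputed": e_imputed, "std_err": std_err,
        "future": future,
    }


# Made-up stand-ins: a baseline that alternates between 100 and 101,
# and an imputer that always answers (0.0, 1.0).
trial = run_time_trial(
    t=0, y=103.0,
    baseline=lambda t: (100.0 + (t % 2), 4.0),
    impute=lambda e: (0.0, 1.0),
    horizons=[1],
)
```

The returned dictionary corresponds to the parameters that the engine system may forward to the real-time and forecast alert detectors.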
FIG. 5 is a logic flow diagram illustrating a routine 500 for running the ACE 30. In step 502, the ACE 30 receives the erratic components e(t) for all of the applicable input variables, as represented by the input variable Y(t), for the current time trial (t). Step 502 is followed by step 504, in which the ACE 30 computes an imputed estimate eI(t) for each input variable. In particular, the imputed estimate eI(t) for the subject input variable Y(t) is based on the data received for the current time trial for all of the other input variables and the learned parameters in the ACE 30, which represent observed relationships between the erratic component e(t) for the subject input variable and the erratic components for the other input variables. This allows the imputed estimate eI(t) to reflect the data received for the current time trial and covariance relationships based on historical time trials represented by the learned parameters in the ACE 30.
Step 504 is followed by step 506, in which the ACE 30 invokes the real-time alert detector 42 and sends it the erratic component e(t) and the imputed estimate eI(t). Step 506 is followed by step 508, in which the ACE 30 updates its learned parameters using the data received for the input variables for the current time trial. This updating process is described in the following commonly-owned patents and patent applications: U.S. Patent No. 5,835,902; U.S. Patent No. 6,216,119; U.S. Patent No. 6,289,330; and co-pending U.S. Patent Application Serial No. 09/811,163.
Step 508 is followed by step 510, in which the ACE 30 computes the forecast estimate eF(t+T). Note that the learning is performed by the ACE 30 after the imputing step 504 and before the forecasting step 510. Step 512 is followed by step 514, in which the ACE 30 returns the forecast estimate eF(t+T) and the standard error for the forecast estimate ηF(t+T) to the engine system 10, which in turn invokes the forecast alert detector 44 by sending it one or more of these parameters and/or the baseline variance σ²(t+T). Following step 514, the ACE 30 waits for the next time trial. Again, it should also be understood that for each time trial, routine 500 is repeated for each of several desired input variables, as represented by the input variable Y(t), and that steps 510 through 512 are typically repeated for each of several future time trials, as desired.
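The ordering described above (impute first, then learn) can be illustrated with a deliberately simple stand-in. The sketch below is not the ACE 30 and does not reproduce the learning method of the patents cited above: it substitutes an exponentially weighted regression of one erratic component on a single other erratic component, and the class name `PairwiseImputer`, the weight `w`, and the single-predictor restriction are all assumptions. Erratic components are treated as zero-mean, which is why raw products are used as covariance estimates.

```python
class PairwiseImputer:
    """Stand-in imputer: regress e_y on e_x with exponentially weighted moments."""

    def __init__(self, w=0.9):
        self.w = w          # weight on past trials (assumed value)
        self.cov_xy = 0.0   # running estimate of E[e_x * e_y]
        self.var_x = 0.0    # running estimate of E[e_x ** 2]

    def impute(self, e_x):
        # Uses only parameters learned from earlier time trials.
        if self.var_x == 0.0:
            return 0.0
        return (self.cov_xy / self.var_x) * e_x

    def learn(self, e_x, e_y):
        self.cov_xy = self.w * self.cov_xy + (1 - self.w) * e_x * e_y
        self.var_x = self.w * self.var_x + (1 - self.w) * e_x * e_x

    def step(self, e_x, e_y):
        estimate = self.impute(e_x)  # imputing happens before learning
        self.learn(e_x, e_y)         # learning uses the current trial
        return estimate


imputer = PairwiseImputer(w=0.5)
first = imputer.step(1.0, 2.0)   # nothing learned yet, so the estimate is 0.0
second = imputer.impute(3.0)     # learned slope is 2.0, so the estimate is 6.0
```

Because the estimate for a trial is computed before that trial's data is learned, a measurement that departs from the learned relationship shows up as a gap between the erratic component and its imputed estimate.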
FIG. 6 is a logic flow diagram illustrating a routine 600 for running the real-time alert detector 42. In step 602, the real-time alert detector 42 receives the erratic component e(t) for the subject input variable Y(t) from the engine system 10. Step 602 is followed by step 604, in which the real-time alert detector 42 receives the imputed estimate eI(t) for the subject input variable Y(t) from the engine system 10. Step 604 is followed by step 606, in which the real-time alert detector 42 performs a threshold alert test, typically by performing the following operation:
[Equation image imgf000036_0001: threshold alert test for the imputed estimate]
As noted previously, in this equation a value of six (6) has been found to be suitable for the user-defined weighting parameter k1. Step 606 is followed by step 608, in which the real-time alert detector 42 determines whether an alert status is indicated based on the preceding or a similar threshold test. If an alert status is indicated, the "YES" branch is followed to step 610, in which the real-time alert detector 42 informs the alarm generator 46 of the imputed estimate alert. Step 610 is followed by step 612, in which the real-time alert detector 42 waits for the next time trial, at which time it loops to step 602 and repeats routine 600 for the next time trial. If an alert status is not indicated, the "NO" branch is followed from step 608 to step 612, in which the real-time alert detector 42 waits for the next time trial without informing the alarm generator 46 of an imputed estimate alert. It should be understood that for each time trial, routine 600 is typically repeated for each of several desired input variables, as represented by the input variable Y(t).
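The equation for the real-time threshold test is not reproduced in this text, so the form used in the following sketch is an assumption inferred from claims 5 and 6: the alert value is the absolute difference between the erratic component and its imputed estimate, and the threshold is the standard error of the imputed estimate scaled by the user-defined parameter, with the value of six taken from the paragraph above. The function name `imputed_alert` and the direction of the comparison are hypothetical.

```python
def imputed_alert(e, e_imputed, std_err, k1=6.0):
    """Return True when the imputed-estimate alert is indicated (assumed form).

    An alert is raised when the erratic component differs from its
    imputed estimate by more than k1 standard errors.
    """
    return abs(e - e_imputed) > k1 * std_err


large_gap = imputed_alert(e=10.0, e_imputed=1.0, std_err=1.0)   # gap of 9 vs threshold 6
small_gap = imputed_alert(e=3.0, e_imputed=1.0, std_err=1.0)    # gap of 2 vs threshold 6
```

Scaling the threshold by the standard error means the same absolute gap is less likely to raise an alert for a variable whose imputed estimate is itself uncertain.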
FIG. 7 is a logic flow diagram illustrating a routine 700 for running the forecast alert detector 44. In step 702, the forecast alert detector 44 receives the baseline variance σ²(t+T) for the subject input variable Y(t) for the future time trial (t+T) from the engine system 10. Step 702 is followed by step 704, in which the forecast alert detector 44 receives the forecast estimate eF(t+T) for the future time trial (t+T). Step 704 is followed by step 706, in which the forecast alert detector 44 performs a threshold alert test, typically by performing the following operation:
|eF(t+T)| < k2 σ(t+T)
As noted previously, in this equation a value of two (2) has been found to be suitable for the user-defined weighting parameter k2. Step 706 is followed by step 708, in which the forecast alert detector 44 determines whether an alert status is indicated based on the preceding or a similar threshold test. If an alert status is indicated, the "YES" branch is followed to step 710, in which the forecast alert detector 44 informs the alarm generator 46 of the forecast estimate alert. Step 710 is followed by step 712, in which the forecast alert detector 44 waits for the next time trial, at which time it loops to step 702 and repeats routine 700 for the next time trial. If an alert status is not indicated, the "NO" branch is followed from step 708 to step 712, in which the forecast alert detector 44 waits for the next time trial without informing the alarm generator 46 of a forecast estimate alert. It should also be understood that for each time trial, routine 700 is typically repeated for each of several future time trials, and for each of several desired input variables, as desired.
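The forecast threshold test of step 706 can be sketched as follows. The description gives the inequality but does not state which side of it indicates an alert; this sketch assumes that the inequality describes normal behavior, so an alert is indicated when it fails. The function name `forecast_alert` is hypothetical, and the standard deviation is taken as the square root of the baseline variance received in step 702.

```python
import math


def forecast_alert(e_forecast, baseline_var, k2=2.0):
    """Return True when the forecast alert is indicated (assumed direction).

    Normal behavior is taken to be |eF(t+T)| < k2 * sigma(t+T).
    """
    sigma = math.sqrt(baseline_var)
    return abs(e_forecast) >= k2 * sigma


outside = forecast_alert(e_forecast=5.0, baseline_var=4.0)   # 5 vs threshold 4
inside = forecast_alert(e_forecast=-3.0, baseline_var=4.0)   # 3 vs threshold 4
```

Tying the threshold to the baseline variance for the future time slice lets the same forecast deviation be tolerated at a time of day that is normally volatile and flagged at one that is normally quiet.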
FIG. 8 is a logic flow diagram illustrating a routine 800 for running the alarm generator 46. In step 802, the alarm generator 46 receives real-time alerts from the real-time alert detector 42. Step 802 is followed by step 804, in which the alarm generator 46 receives forecast alerts from the forecast alert detector 44. Step 804 is followed by step 806, in which the alarm generator 46 weights the alerts and computes an alarm score. Many different weighting and scoring methodologies will become apparent to those skilled in the art. In particular, it is desirable to make the specific weights and alert combinations user-defined parameters so that the user may tune this aspect of the monitoring function based on experience with the system. Step 806 is followed by step 808, in which the alarm generator 46 determines whether an alarm status is indicated based on the preceding alert scoring process. If an alarm status is indicated, the "YES" branch is followed to step 810, in which the alarm generator 46 activates an alarm condition, and may take additional actions, such as restarting a software application, rebooting a server, activating a back-up system, rerouting network traffic, dropping nonessential or interruptible activities, transmitting e-mail alarms, and so forth. Step 810 is followed by step 812, in which the alarm generator 46 waits for the next time trial, at which time it loops to step 802 and repeats routine 800 for the next time trial. If an alarm status is not indicated, the "NO" branch is followed from step 808 to step 812, in which the alarm generator 46 waits for the next time trial without activating an alarm condition. It should also be understood that for each time trial, routine 800 is typically repeated for each of several future time trials, and for each of several desired input variables, as desired.
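One of the many possible weighting and scoring schemes for steps 806 and 808 can be sketched as a weighted sum of active alerts compared against a threshold. The scheme, the function names `alarm_score` and `alarm_status`, the alert labels, the weights, and the threshold are all hypothetical; the description leaves these choices to user-defined parameters.

```python
def alarm_score(alerts, weights, default_weight=1.0):
    """Sum the weights of all active alerts.

    alerts:  {alert label: bool}   True when the alert is active
    weights: {alert label: float}  user-defined weight per alert
    """
    return sum(weights.get(label, default_weight)
               for label, active in alerts.items() if active)


def alarm_status(alerts, weights, threshold):
    """Return True when the score reaches the user-defined threshold."""
    return alarm_score(alerts, weights) >= threshold


# Made-up alerts for two input variables.
alerts = {"cpu:imputed": True, "cpu:forecast": False, "mem:imputed": True}
weights = {"cpu:imputed": 2.0, "cpu:forecast": 1.0, "mem:imputed": 0.5}
score = alarm_score(alerts, weights)
```

Exposing the weights and the threshold as parameters is what allows a user to tune how many, and which, alerts must coincide before an alarm condition is activated.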
In view of the foregoing, it will be appreciated that the present invention greatly improves upon preexisting methods and systems for modeling, estimating, predicting and detecting abnormal behavior in computer networks that exhibit unpredictable abnormal events superimposed on top of rapidly fluctuating and continuously changing normal operational patterns. It should be understood that the foregoing relates only to the exemplary embodiments of the present invention, and that numerous changes may be made therein without departing from the spirit and scope of the invention as defined by the following claims.

Claims

The invention claimed is:
1. A method for analyzing and predicting the behavior of a system, comprising the steps of: continually receiving measurements defining signatures for a plurality of input variables reflecting the behavior of the system, each signature comprising a time series of measurements including historical measurements for past time trials and a current measurement for a current time trial; computing a time-based baseline mean and variance for a selected input variable based on the historical measurements for the selected input variable; computing an erratic component for the selected input variable by comparing the measurement for the selected input value for the current time trial to the time-based baseline mean for the selected input variable; computing an imputed estimate for the selected input variable based on erratic components computed for other input variables for the current time trial and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables; and determining an alert status for the imputed estimate based on the imputed estimate and the erratic component for the input variable for the current time trial.
2. The method of claim 1, further comprising the step of updating the time-based baseline mean and variance for the selected input variable based on the measurement received for the selected input value for the current time trial.
3. The method of claim 2, comprising the step of repeating the steps of claims 1 and 2 for multiple input variables.
4. The method of claim 3, further comprising the step of continually repeating the steps of claims 1, 2, and 3 for multiple current time trials.
5. The method of claim 1, wherein the step of determining the alert status for the imputed estimate comprises the steps of: computing a confidence value associated with the imputed estimate; computing a threshold value for the imputed estimate based on the confidence value; computing an imputed estimate alert value reflecting a difference between the imputed estimate and the erratic component for the selected input variable to the threshold value; and determining an alert status for the imputed estimate by comparing the alert value to the threshold value.
6. The method of claim 5, wherein: the confidence value comprises a standard error associated with the imputed estimate; and the threshold value is based on the standard error and a user-defined configuration parameter.
7. The method of claim 2, wherein the step of computing the time-based baseline mean and variance for the selected input variable comprises the steps of: decomposing the signature for the input variable into components; computing a mean and variance for each component; combining the means for the components to obtain the time-based baseline mean; and combining the variances for the components to obtain the time-based baseline variance.
8. The method of claim 7, wherein the step of decomposing the signature for the input variable into components comprises the steps of: defining a repeating cycle for the historical measurements; dividing the cycle into a plurality of contiguous time periods wherein each cycle comprises a similar set of time periods, each time period having a corresponding time index; computing a global trend component for the selected input variable reflecting measurements received for the selected input variable for temporally contiguous time indices; and computing a cyclical component for the selected input variable reflecting data accumulated across multiple cycles for each time index.
9. The method of claim 2, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated baseline mean based on a weighted sum comprising the baseline mean for the selected input variable and the measurement for the current time trial for the selected input variable; and computing an updated baseline variance based on a weighted sum comprising the baseline variance for the selected input variable and the measurement for the current time trial for the selected input variable.
10. The method of claim 7, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated time-based baseline mean by: computing an updated mean for each component based on a weighted sum comprising the baseline mean for the component and the measurement received for the selected input variable for the current time trial, and summing the updated means for the components; and computing an updated time-based baseline variance by: computing an updated variance for each component based on a weighted sum comprising the baseline variance for the component and the measurement received for the selected input variable for the current time trial, and summing the updated variances for the components.
11. The method of claim 4, further comprising the steps of: receiving imputed estimate alerts corresponding to multiple input measurements; weighting the alerts; computing an alert score based on the weighted alerts; and determining whether to activate an alarm condition based on the alert score.
12. A computer storage medium storing computer-executable instructions for performing the method of claim 1.
13. An apparatus configured to perform the method of claim 1.
14. A method for analyzing and predicting the behavior of a system, comprising the steps of: continually receiving measurements defining signatures for a plurality of input variables reflecting the behavior of the system, each signature comprising a time series of measurements including historical measurements for past time trials and a current measurement for a current time trial; computing a time-based baseline mean and variance for a selected input variable based on the historical measurements for the selected input variable; computing an erratic component for the selected input variable by comparing the measurement for the selected input value for the current time trial to the baseline mean for the selected input variable; computing a forecast estimate for the selected input variable based on erratic components computed for other input variables and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables; and determining an alert status for the forecast estimate.
15. The method of claim 14, further comprising the step of updating the time-based baseline mean and variance for the selected input variable based on the measurement received for the selected input value for the current time trial.
16. The method of claim 15, comprising the step of repeating the steps of claims 14 and 15 for multiple input variables.
17. The method of claim 16, further comprising the step of continually repeating the steps of claims 14, 15, and 16 for multiple future forecasts.
18. The method of claim 17, further comprising the step of continually repeating the steps of claims 14, 15, 16, and 17 for multiple current time trials.
19. The method of claim 15, wherein the step of determining the alert status for the forecast estimate comprises the steps of: computing a threshold value for the forecast estimate; and determining an alert status for the forecast estimate by comparing the forecast estimate to the threshold value.
20. The method of claim 19, wherein the threshold value is based on the time-based baseline variance for the selected input variable and a user-defined configuration parameter.
21. The method of claim 15, wherein the step of computing the time-based baseline mean and variance for the selected input variable comprises the steps of: decomposing the signature for the input variable into components; computing a mean and variance for each component; combining the means for the components to obtain the time-based baseline mean; and combining the variances for the components to obtain the time-based baseline variance.
22. The method of claim 21, wherein the step of decomposing the signature for the input variable into components comprises the steps of: defining a repeating cycle for the historical measurements; dividing the cycle into a plurality of contiguous time periods wherein each cycle comprises a similar set of time periods, each time period having a corresponding time index; computing a global trend component for the selected input variable reflecting measurements received for the selected input variable for temporally contiguous time indices; and computing a cyclical component for the selected input variable reflecting data accumulated across multiple cycles for each time index.
23. The method of claim 15, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated baseline mean based on a weighted sum comprising the baseline mean for the selected input variable and the measurement for the current time trial for the selected input variable; and computing an updated baseline variance based on a weighted sum comprising the baseline variance for the selected input variable and the measurement for the current time trial for the selected input variable.
24. The method of claim 21, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated time-based baseline mean by: computing an updated mean for each component based on a weighted sum comprising the baseline mean for the component and the measurement received for the selected input variable for the current time trial, and summing the updated means for the components; and computing an updated time-based baseline variance by: computing an updated variance for each component based on a weighted sum comprising the baseline variance for the component and the measurement received for the selected input variable for the current time trial, and summing the updated variances for the components.
25. The method of claim 18, further comprising the steps of: receiving imputed estimate alerts corresponding to multiple input measurements; weighting the alerts; computing an alert score based on the weighted alerts; and determining whether to activate an alarm condition based on the alert score.
26. A computer storage medium storing computer-executable instructions for performing the method of claim 18.
27. An apparatus configured to perform the method of claim 18.
28. A method for analyzing and predicting the behavior of a system, comprising the steps of: continually receiving measurements defining signatures for a plurality of input variables reflecting the behavior of the system, each signature comprising a time series of measurements including historical measurements for past time trials and a current measurement for a current time trial; computing a time-based baseline mean and variance for a selected input variable based on the historical measurements for the selected input variable; computing an erratic component for the selected input variable by comparing the measurement for the selected input value for the current time trial to the baseline mean for the selected input variable; computing an imputed estimate for the selected input variable based on erratic components computed for other input variables for the current time trial and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables; determining an alert status for the imputed estimate based on the imputed estimate and the erratic component for the input variable for the current time trial; computing a forecast estimate for the selected input variable based on erratic components computed for other input variables and learned parameters reflecting observed relationships between the erratic component for the selected input variable and the erratic components for the other input variables; and determining an alert status for the forecast estimate.
29. The method of claim 28, further comprising the step of updating the time-based baseline mean and variance for the selected input variable based on the measurement received for the selected input value for the current time trial.
30. The method of claim 29, comprising the step of repeating the steps of claims 28 and 29 for multiple input variables.
31. The method of claim 30, further comprising the step of continually repeating the steps of claims 28, 29, and 30 for multiple future forecasts.
32. The method of claim 31, further comprising the step of continually repeating the steps of claims 28, 29, 30, and 31 for multiple current time trials.
33. The method of claim 32, wherein the step of determining the alert status for the imputed estimate comprises the steps of: computing a confidence value for the imputed estimate; computing a threshold value for the imputed estimate based on the confidence value; computing an imputed estimate alert value reflecting a difference between the imputed estimate and the erratic component for the selected input variable to the threshold value for the imputed estimate; and determining an alert status for the imputed estimate by comparing the alert value to the threshold value for the imputed estimate.
34. The method of claim 33, wherein: the confidence value for the imputed estimate comprises a standard error associated with the imputed estimate; and the threshold value for the imputed estimate is based on the standard error and a user-defined configuration parameter.
35. The method of claim 34, wherein the step of determining the alert status for the forecast estimate comprises the steps of: computing a threshold value for the forecast estimate; and determining an alert status for the forecast estimate by comparing the forecast estimate to the threshold value.
36. The method of claim 35, wherein the threshold value is based on the time-based baseline variance for the selected input variable and a user-defined configuration parameter.
37. The method of claim 36, wherein the step of computing the time-based baseline mean and variance for the selected input variable comprises the steps of: decomposing the signature for the input variable into components; computing a mean and variance for each component; combining the means for the components to obtain the time-based baseline mean; and combining the variances for the components to obtain the time-based baseline variance.
38. The method of claim 37, wherein the step of decomposing the signature for the input variable into components comprises the steps of: defining a repeating cycle for the historical measurements; dividing the cycle into a plurality of contiguous time periods wherein each cycle comprises a similar set of time periods, each time period having a corresponding time index; computing a global trend component for the selected input variable reflecting measurements received for the selected input variable for temporally contiguous time indices; and computing a cyclical component for the selected input variable reflecting data accumulated across multiple cycles for each time index.
39. The method of claim 38, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated baseline mean based on a weighted sum comprising the baseline mean for the selected input variable and the measurement for the current time trial for the selected input variable; and computing an updated baseline variance based on a weighted sum comprising the baseline variance for the selected input variable and the measurement for the current time trial for the selected input variable.
40. The method of claim 39, wherein the step of updating the time-based baseline mean and variance for the selected input variable comprises the steps of: computing an updated time-based baseline mean by: computing an updated mean for each component based on a weighted sum comprising the baseline mean for the component and the measurement received for the selected input variable for the current time trial, and summing the updated means for the components; and computing an updated time-based baseline variance by: computing an updated variance for each component based on a weighted sum comprising the baseline variance for the component and the measurement received for the selected input variable for the current time trial, and summing the updated variances for the components.
41. The method of claim 40, further comprising the steps of: receiving imputed estimate and forecast estimate alerts corresponding to multiple input measurements; weighting the alerts; computing an alert score based on the weighted alerts; and determining whether to activate an alarm condition based on the alert score.
42. A computer storage medium storing computer-executable instructions for performing the method of claim 41.
43. An apparatus configured to perform the method of claim 41.
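
The updating and alerting steps recited in claims 38 through 41 can be illustrated with a minimal sketch. It is not the implementation disclosed in the specification. The class and function names, the learning weights, the use of the squared deviation in the variance update, and the choice to have the cyclical component track the residual left after the global trend are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ComponentBaseline:
    """Running mean and variance for one signature component (hypothetical helper)."""
    mean: float = 0.0
    variance: float = 0.0

    def update(self, value: float, weight: float) -> None:
        # Weighted sum of the stored baseline and the new value (claim 39).
        # Using the squared deviation for the variance term is an assumption.
        deviation = value - self.mean
        self.mean = (1.0 - weight) * self.mean + weight * value
        self.variance = (1.0 - weight) * self.variance + weight * deviation ** 2


class TimeBasedBaseline:
    """Global trend plus one cyclical component per time index (claims 38 and 40)."""

    def __init__(self, periods_per_cycle: int,
                 trend_weight: float = 0.1, cycle_weight: float = 0.2) -> None:
        self.periods_per_cycle = periods_per_cycle
        self.trend_weight = trend_weight    # assumed learning weight
        self.cycle_weight = cycle_weight    # assumed learning weight
        self.trend = ComponentBaseline()
        self.cycle: List[ComponentBaseline] = [
            ComponentBaseline() for _ in range(periods_per_cycle)
        ]

    def update(self, time_trial: int, measurement: float) -> Tuple[float, float]:
        # The time index identifies the position of this trial within the cycle.
        index = time_trial % self.periods_per_cycle
        # The global trend is updated on every temporally contiguous trial.
        self.trend.update(measurement, self.trend_weight)
        # The cyclical component accumulates data across cycles for this index.
        # Assumption: it tracks the residual after removing the trend.
        slot = self.cycle[index]
        slot.update(measurement - self.trend.mean, self.cycle_weight)
        # The time-based baseline is the sum over the components.
        return self.trend.mean + slot.mean, self.trend.variance + slot.variance


def alert_score(alerts: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Weighted sum of active alerts (claim 41); the weights are illustrative."""
    return sum(weights[name] for name, active in alerts.items() if active)


def alarm_active(alerts: Dict[str, bool], weights: Dict[str, float],
                 threshold: float) -> bool:
    """Activate the alarm condition when the score reaches an assumed threshold."""
    return alert_score(alerts, weights) >= threshold


if __name__ == "__main__":
    baseline = TimeBasedBaseline(periods_per_cycle=24)
    for trial, value in enumerate([10.0, 12.0, 11.0, 15.0]):
        print(trial, baseline.update(trial, value))
```

In this sketch a larger weight makes a component adapt faster to recent measurements, while a smaller weight preserves more of the historical baseline. Comparing the alert score to a single threshold is only one possible way of deciding whether to activate the alarm condition.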
PCT/US2002/040837 2001-12-19 2002-12-19 Method and system for analyzing and predicting the behavior of systems WO2003054704A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP02795970A EP1468361A1 (en) 2001-12-19 2002-12-19 Method and system for analyzing and predicting the behavior of systems
AU2002360691A AU2002360691A1 (en) 2001-12-19 2002-12-19 Method and system for analyzing and predicting the behavior of systems
CA2471013A CA2471013C (en) 2001-12-19 2002-12-19 Method and system for analyzing and predicting the behavior of systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34231201P 2001-12-19 2001-12-19
US60/342,312 2001-12-19

Publications (1)

Publication Number Publication Date
WO2003054704A1 (en) 2003-07-03

Family

ID=23341268

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/040837 WO2003054704A1 (en) 2001-12-19 2002-12-19 Method and system for analyzing and predicting the behavior of systems

Country Status (5)

Country Link
US (1) US7280988B2 (en)
EP (1) EP1468361A1 (en)
AU (1) AU2002360691A1 (en)
CA (1) CA2471013C (en)
WO (1) WO2003054704A1 (en)

Families Citing this family (108)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7065566B2 (en) * 2001-03-30 2006-06-20 Tonic Software, Inc. System and method for business systems transactions and infrastructure management
WO2002101643A2 (en) 2001-06-08 2002-12-19 Netuitive, Inc. Automated analyzers for estimation systems
US20030208592A1 (en) * 2002-05-01 2003-11-06 Taylor William Scott System and method for proactive maintenance through monitoring the performance of a physical interface
US7539608B1 (en) * 2002-05-10 2009-05-26 Oracle International Corporation Techniques for determining effects on system performance of a memory management parameter
JP2004047885A (en) * 2002-07-15 2004-02-12 Matsushita Electric Ind Co Ltd Monitoring system and monitoring method of semiconductor manufacturing apparatus
US7281041B2 (en) * 2002-10-31 2007-10-09 Hewlett-Packard Development Company, L.P. Method and apparatus for providing a baselining and auto-thresholding framework
EP1652088B1 (en) * 2003-07-25 2017-09-13 Philips Intellectual Property & Standards GmbH Method and device for monitoring a system
US7873724B2 (en) * 2003-12-05 2011-01-18 Microsoft Corporation Systems and methods for guiding allocation of computational resources in automated perceptual systems
US20050216241A1 (en) * 2004-03-29 2005-09-29 Gadi Entin Method and apparatus for gathering statistical measures
WO2005116887A1 (en) * 2004-05-25 2005-12-08 Arion Human Capital Limited Data analysis and flow control system
US7729269B1 (en) * 2004-06-09 2010-06-01 Sprint Communications Company L.P. Method for identifying and estimating mean traffic for high traffic origin-destination node pairs in a network
US7756043B1 (en) 2004-06-09 2010-07-13 Sprint Communications Company L.P. Method for identifying high traffic origin-destination node pairs in a packet based network
US20060020924A1 (en) * 2004-06-15 2006-01-26 K5 Systems Inc. System and method for monitoring performance of groupings of network infrastructure and applications using statistical analysis
US20070260350A1 (en) * 2004-08-20 2007-11-08 Maxim Zagrebnov Method for Improving Efficiency of a Manufacturing Process Such as a Semiconductor Fab Process
US20060092851A1 (en) * 2004-10-29 2006-05-04 Jeffrey Forrest Edlund Method and apparatus for communicating predicted future network requirements of a data center to a number of adaptive network interfaces
US7693982B2 (en) * 2004-11-12 2010-04-06 Hewlett-Packard Development Company, L.P. Automated diagnosis and forecasting of service level objective states
US20060150188A1 (en) * 2004-12-21 2006-07-06 Manuel Roman Method and apparatus for supporting soft real-time behavior
US8001527B1 (en) 2004-12-21 2011-08-16 Zenprise, Inc. Automated root cause analysis of problems associated with software application deployments
US7743286B2 (en) * 2005-05-17 2010-06-22 International Business Machines Corporation Method, system and program product for analyzing demographical factors of a computer system to address error conditions
US7908357B2 (en) * 2005-09-21 2011-03-15 Battelle Memorial Institute Methods and systems for detecting abnormal digital traffic
US20070174074A1 (en) * 2006-01-24 2007-07-26 International Business Machine Corporation Method, system, and program product for detecting behavior change in transactional data
US7523014B2 (en) * 2006-02-06 2009-04-21 Sun Microsystems, Inc. High-sensitivity detection of an anomaly in a quantized signal
US7533070B2 (en) * 2006-05-30 2009-05-12 Honeywell International Inc. Automatic fault classification for model-based process monitoring
US7987106B1 (en) * 2006-06-05 2011-07-26 Turgut Aykin System and methods for forecasting time series with multiple seasonal patterns
US7546198B2 (en) * 2006-08-03 2009-06-09 Spectral Dynamics, Inc. Dynamic noise-reduction baselining for real-time spectral analysis of internal combustion engine knock
US20080052145A1 (en) * 2006-08-10 2008-02-28 V2 Green, Inc. Power Aggregation System for Distributed Electric Resources
US20080056144A1 (en) * 2006-09-06 2008-03-06 Cypheredge Technologies System and method for analyzing and tracking communications network operations
US7917240B2 (en) * 2006-09-29 2011-03-29 Fisher-Rosemount Systems, Inc. Univariate method for monitoring and analysis of multivariate data
US7676706B2 (en) * 2006-11-03 2010-03-09 Computer Associates Think, Inc. Baselining backend component response time to determine application performance
US7673191B2 (en) * 2006-11-03 2010-03-02 Computer Associates Think, Inc. Baselining backend component error rate to determine application performance
US7840377B2 (en) * 2006-12-12 2010-11-23 International Business Machines Corporation Detecting trends in real time analytics
US8271981B2 (en) * 2006-12-12 2012-09-18 International Business Machines Corporation Detecting an extraordinary behavior
US20080183444A1 (en) * 2007-01-26 2008-07-31 Grichnik Anthony J Modeling and monitoring method and system
US20100131082A1 (en) * 2007-05-23 2010-05-27 Chandler Larry S Inversion Loci Generator and Criteria Evaluator for Rendering Errors in Variable Data Processing
US7930146B2 (en) * 2007-05-23 2011-04-19 Chandler Larry S Errors-in-variables data processing including essential weighting of mapped path-oriented deviations with normal component discrimination
US8032867B2 (en) * 2007-06-05 2011-10-04 Computer Associates Think, Inc. Programmatic root cause analysis for application performance management
US20090077156A1 (en) * 2007-09-14 2009-03-19 Srinivas Raghav Kashyap Efficient constraint monitoring using adaptive thresholds
US20090198559A1 (en) * 2008-02-06 2009-08-06 Disney Enterprises, Inc. Multi-resolutional forecasting system
US20090248722A1 (en) * 2008-03-27 2009-10-01 International Business Machines Corporation Clustering analytic functions
US9363143B2 (en) * 2008-03-27 2016-06-07 International Business Machines Corporation Selective computation using analytic functions
US20100257533A1 (en) 2009-04-01 2010-10-07 Soluto Ltd Computer applications scheduler
US10255419B1 (en) 2009-06-03 2019-04-09 James F. Kragh Identity validation and verification system and associated methods
US8984282B1 (en) 2009-06-03 2015-03-17 James F. Kragh Identity validation and verification system and associated methods
US9805213B1 (en) 2009-06-03 2017-10-31 James F. Kragh Identity validation and verification system and associated methods
US9280684B1 (en) 2009-06-03 2016-03-08 James F. Kragh Identity validation and verification system and associated methods
JPWO2011046228A1 (en) * 2009-10-15 2013-03-07 日本電気株式会社 System operation management apparatus, system operation management method, and program storage medium
US20110098973A1 (en) * 2009-10-23 2011-04-28 Computer Associates Think, Inc. Automatic Baselining Of Metrics For Application Performance Management
US8644813B1 (en) 2009-12-02 2014-02-04 Sprint Communications Company L.P. Customer initiated mobile diagnostics service
EP2360590A3 (en) 2009-12-10 2011-10-26 Prelert Ltd. Apparatus and method for analysing a computer infrastructure
US20110161048A1 (en) * 2009-12-31 2011-06-30 Bmc Software, Inc. Method to Optimize Prediction of Threshold Violations Using Baselines
US8478569B2 (en) * 2010-03-26 2013-07-02 Bmc Software, Inc. Auto adjustment of baseline on configuration change
US8457928B2 (en) * 2010-03-26 2013-06-04 Bmc Software, Inc. Automatic determination of dynamic threshold for accurate detection of abnormalities
WO2011156080A1 (en) * 2010-06-09 2011-12-15 Siemens Corporation Systems and methods for learning of normal sensor signatures, condition monitoring and diagnosis
EP2583225A4 (en) * 2010-06-21 2014-03-05 Hewlett Packard Development Co System for testing and certifying a virtual appliance on a customer computer system
WO2012029500A1 (en) * 2010-09-01 2012-03-08 日本電気株式会社 Operations management device, operations management method, and program
US8560544B2 (en) 2010-09-15 2013-10-15 International Business Machines Corporation Clustering of analytic functions
US8620851B2 (en) 2010-11-23 2013-12-31 Novell, Inc. System and method for determining fuzzy cause and effect relationships in an intelligent workload management system
US20150235312A1 (en) 2014-02-14 2015-08-20 Stephen Dodson Method and Apparatus for Detecting Rogue Trading Activity
CN103339613B (en) * 2011-01-24 2016-01-06 日本电气株式会社 Operation management device, operation management method and program
CA2831900A1 (en) * 2011-04-04 2012-10-11 Numerex Corp. Systems and method for monitoring and managing the communications of remote devices
US20120272103A1 (en) * 2011-04-21 2012-10-25 Microsoft Corporation Software operability service
CN102270271B (en) * 2011-05-03 2014-03-19 北京中瑞泰科技有限公司 Equipment failure early warning and optimizing method and system based on similarity curve
US8880442B2 (en) * 2011-06-03 2014-11-04 Beet, Llc Method for generating a machine heartbeat
US9928130B2 (en) * 2011-06-03 2018-03-27 Beet, Llc Method for generating a machine heartbeat
US20130166337A1 (en) * 2011-12-26 2013-06-27 John MacGregor Analyzing visual representation of data
US8588764B1 (en) * 2012-01-26 2013-11-19 Sprint Communications Company L.P. Wireless network edge guardian
US10353738B2 (en) * 2012-03-21 2019-07-16 International Business Machines Corporation Resource allocation based on social networking trends in a networked computing environment
EP2645257A3 (en) 2012-03-29 2014-06-18 Prelert Ltd. System and method for visualisation of behaviour within computer infrastructure
US10162693B1 (en) 2012-10-18 2018-12-25 Sprint Communications Company L.P. Evaluation of mobile device state and performance metrics for diagnosis and troubleshooting of performance issues
US9450839B2 (en) * 2012-11-09 2016-09-20 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Efficient network bandwidth utilization in a distributed processing system
US9386463B1 (en) 2012-11-19 2016-07-05 Sprint Communications Company L.P. Application risk analysis
KR101980834B1 (en) * 2012-11-28 2019-05-21 한국전자통신연구원 Method and apparatus for managing applications based on contexts
GB2517147A (en) 2013-08-12 2015-02-18 Ibm Performance metrics of a computer system
GB2519941B (en) 2013-09-13 2021-08-25 Elasticsearch Bv Method and apparatus for detecting irregularities on device
US10114148B2 (en) 2013-10-02 2018-10-30 Nec Corporation Heterogeneous log analysis
EP2887236A1 (en) * 2013-12-23 2015-06-24 D square N.V. System and method for similarity search in process data
CN103927695B (en) * 2014-04-22 2017-11-24 国家电网公司 Ultrashort-term wind power prediction method based on self study complex data source
US9400731B1 (en) * 2014-04-23 2016-07-26 Amazon Technologies, Inc. Forecasting server behavior
US10048670B2 (en) * 2014-05-08 2018-08-14 Beet, Llc Automation operating and management system
US11017330B2 (en) * 2014-05-20 2021-05-25 Elasticsearch B.V. Method and system for analysing data
CN104317681B (en) * 2014-09-02 2017-09-08 上海交通大学 For the behavioral abnormal automatic detection method and detecting system of computer system
US10417076B2 (en) * 2014-12-01 2019-09-17 Uptake Technologies, Inc. Asset health score
US9866578B2 (en) * 2014-12-03 2018-01-09 AlphaSix Corp. System and method for network intrusion detection anomaly risk scoring
US9646264B2 (en) * 2015-02-25 2017-05-09 International Business Machines Corporation Relevance-weighted forecasting based on time-series decomposition
US10254751B2 (en) 2015-06-05 2019-04-09 Uptake Technologies, Inc. Local analytics at an asset
US10579750B2 (en) 2015-06-05 2020-03-03 Uptake Technologies, Inc. Dynamic execution of predictive models
US10176279B2 (en) 2015-06-05 2019-01-08 Uptake Technologies, Inc. Dynamic execution of predictive models and workflows
US10878385B2 (en) 2015-06-19 2020-12-29 Uptake Technologies, Inc. Computer system and method for distributing execution of a predictive model
JP6680866B2 (en) * 2015-08-24 2020-04-15 ビート インク Method and system for generating mechanical beats
US9959158B2 (en) * 2015-10-13 2018-05-01 Honeywell International Inc. Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics
US10101049B2 (en) 2015-11-12 2018-10-16 Oracle International Corporation Determining parameters of air-cooling mechanisms
US10102033B2 (en) * 2016-05-26 2018-10-16 International Business Machines Corporation Method and system for performance ticket reduction
US10565046B2 (en) * 2016-09-01 2020-02-18 Intel Corporation Fault detection using data distribution characteristics
CN106407077A (en) * 2016-09-21 2017-02-15 广州华多网络科技有限公司 A real-time alarm method and system
US10771369B2 (en) * 2017-03-20 2020-09-08 International Business Machines Corporation Analyzing performance and capacity of a complex storage environment for predicting expected incident of resource exhaustion on a data path of interest by analyzing maximum values of resource usage over time
US11783046B2 (en) 2017-04-26 2023-10-10 Elasticsearch B.V. Anomaly and causation detection in computing environments
US11621969B2 (en) 2017-04-26 2023-04-04 Elasticsearch B.V. Clustering and outlier detection in anomaly and causation detection for computing environments
US11014780B2 (en) 2017-07-06 2021-05-25 Otis Elevator Company Elevator sensor calibration
US10829344B2 (en) 2017-07-06 2020-11-10 Otis Elevator Company Elevator sensor system calibration
US11023280B2 (en) * 2017-09-15 2021-06-01 Splunk Inc. Processing data streams received from instrumented software using incremental finite window double exponential smoothing
US10922204B2 (en) * 2018-06-13 2021-02-16 Ca, Inc. Efficient behavioral analysis of time series data
US11132248B2 (en) * 2018-11-29 2021-09-28 Nec Corporation Automated information technology system failure recommendation and mitigation
US11150165B2 (en) * 2019-03-01 2021-10-19 Dell Products, L.P. System and method for configuration drift detection and remediation
CN112580903A (en) * 2019-09-27 2021-03-30 华晨宝马汽车有限公司 Method and apparatus for evaluating quality stability of engine and storage medium
US20220180227A1 (en) * 2020-12-08 2022-06-09 Intuit Inc. Forecasting based on bernoulli uncertainty characterization
US11392473B2 (en) * 2020-12-10 2022-07-19 International Business Machines Corporation Automated extension of program data storage
US11294929B1 (en) * 2021-06-09 2022-04-05 Aeec Smart water data analytics
CN113870304B (en) * 2021-12-07 2022-06-07 江西中业智能科技有限公司 Abnormal behavior detection and tracking method and device, readable storage medium and equipment

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE68922567T2 (en) 1988-10-06 1995-08-17 Toshiba Kawasaki Kk Neural network system.
US5239594A (en) 1991-02-12 1993-08-24 Mitsubishi Denki Kabushiki Kaisha Self-organizing pattern classification neural network system
US5355434A (en) 1991-08-19 1994-10-11 Toyoda Koki Kabushiki Kaisha Method and apparatus for performing learning in a neural network
JPH05342191A (en) * 1992-06-08 1993-12-24 Mitsubishi Electric Corp System for predicting and analyzing economic time sequential data
US5461699A (en) * 1993-10-25 1995-10-24 International Business Machines Corporation Forecasting using a neural network and a statistical forecast
KR0142483B1 (en) * 1994-02-28 1998-08-17 고지마 게이지 Non-linear time sequential data predicting device
EP0770967A3 (en) * 1995-10-26 1998-12-30 Koninklijke Philips Electronics N.V. Decision support system for the management of an agile supply chain
US5727128A (en) 1996-05-08 1998-03-10 Fisher-Rosemount Systems, Inc. System and method for automatically determining a set of variables for use in creating a process model
JP4300281B2 (en) 1996-11-20 2009-07-22 ネテュイティブ インコーポレイテッド Multi-kernel neural network simultaneous learning, monitoring and prediction system
DE59712546D1 (en) 1997-07-31 2006-04-06 Sulzer Markets & Technology Ag Method for monitoring systems with mechanical components
US6327677B1 (en) 1998-04-27 2001-12-04 Proactive Networks Method and apparatus for monitoring a network environment
US6205431B1 (en) * 1998-10-29 2001-03-20 Smart Software, Inc. System and method for forecasting intermittent demand
US6792399B1 (en) 1999-09-08 2004-09-14 C4Cast.Com, Inc. Combination forecasting using clusterization
US6801945B2 (en) 2000-02-04 2004-10-05 Yahoo ! Inc. Systems and methods for predicting traffic on internet sites
US6879988B2 (en) * 2000-03-09 2005-04-12 Pkware System and method for manipulating and managing computer archive files
US6928398B1 (en) * 2000-11-09 2005-08-09 Spss, Inc. System and method for building a time series model
US20030036890A1 (en) 2001-04-30 2003-02-20 Billet Bradford E. Predictive method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5748508A (en) 1992-12-23 1998-05-05 Baleanu; Michael-Alin Method and device for signal analysis, process identification and monitoring of a technical process
US5835902A (en) 1994-11-02 1998-11-10 Jannarone; Robert J. Concurrent learning and performance information processing system
US6289330B1 (en) 1994-11-02 2001-09-11 Netuitive, Inc. Concurrent learning and performance information processing system
US6216119B1 (en) 1997-11-19 2001-04-10 Netuitive, Inc. Multi-kernel neural network concurrent learning, monitoring, and forecasting system
US6182022B1 (en) 1998-01-26 2001-01-30 Hewlett-Packard Company Automated adaptive baselining and thresholding method and system
US20020049687A1 (en) 2000-10-23 2002-04-25 David Helsper Enhanced computer performance forecasting system

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2400469A (en) * 2003-04-07 2004-10-13 Sun Microsystems Inc Generating and managing a knowledge base in a computer
US7624174B2 (en) 2003-05-22 2009-11-24 Microsoft Corporation Self-learning method and system for detecting abnormalities
WO2005045678A2 (en) 2003-10-27 2005-05-19 Netuitive, Inc. Computer performance estimation system configured to take expected events into consideration
WO2005045678A3 (en) * 2003-10-27 2006-03-09 Netuitive Inc Computer performance estimation system configured to take expected events into consideration
US7395187B2 (en) 2006-02-06 2008-07-01 International Business Machines Corporation System and method for recording behavior history for abnormality detection
US7711520B2 (en) 2006-02-06 2010-05-04 International Business Machines Corporation System and method for recording behavior history for abnormality detection
US9705905B2 (en) 2008-12-02 2017-07-11 Microsoft Technology Licensing, Llc Sandboxed execution of plug-ins
US20100138639A1 (en) * 2008-12-02 2010-06-03 Microsoft Corporation Sandboxed execution of plug-ins
US8745361B2 (en) * 2008-12-02 2014-06-03 Microsoft Corporation Sandboxed execution of plug-ins
US10542022B2 (en) 2008-12-02 2020-01-21 Microsoft Technology Licensing, Llc Sandboxed execution of plug-ins
WO2011128922A1 (en) * 2010-04-15 2011-10-20 Neptuny S.R.L. Automated upgrading method for capacity of it system resources
WO2012020329A1 (en) * 2010-04-15 2012-02-16 Caplan Software Development S.R.L. Automated upgrading method for capacity of it system resources
WO2014090821A2 (en) * 2012-12-13 2014-06-19 Telefonica, S.A. A method and a system for the self-adjustment optimization of a computing device
WO2014090821A3 (en) * 2012-12-13 2014-09-12 Telefonica, S.A. A method and a system for the self-adjustment optimization of a computing device
WO2014164139A1 (en) * 2013-03-13 2014-10-09 Masimo Corporation Systems and methods for monitoring a patient health network
US20140266790A1 (en) * 2013-03-13 2014-09-18 Masimo Corporation Systems and methods for monitoring a patient health network
US9965946B2 (en) 2013-03-13 2018-05-08 Masimo Corporation Systems and methods for monitoring a patient health network
US10672260B2 (en) 2013-03-13 2020-06-02 Masimo Corporation Systems and methods for monitoring a patient health network
US11645905B2 (en) 2013-03-13 2023-05-09 Masimo Corporation Systems and methods for monitoring a patient health network
US10628838B2 (en) 2013-04-24 2020-04-21 International Business Machines Corporation System and method for modeling and forecasting cyclical demand systems with dynamic controls and dynamic incentives
EP2911060A4 (en) * 2013-05-21 2016-06-08 Huawei Tech Co Ltd Method and device for determining resource leakage and for predicting resource usage state
US9846601B2 (en) 2013-05-21 2017-12-19 Huawei Technologies Co., Ltd. Method and apparatuses for determining a leak of resource and predicting usage of resource
EP3128425A1 (en) * 2015-08-07 2017-02-08 Tata Consultancy Services Limited System and method for smart alerts

Also Published As

Publication number Publication date
CA2471013A1 (en) 2003-07-03
US20030139905A1 (en) 2003-07-24
AU2002360691A1 (en) 2003-07-09
US7280988B2 (en) 2007-10-09
CA2471013C (en) 2011-07-26
EP1468361A1 (en) 2004-10-20

Similar Documents

Publication Publication Date Title
US7280988B2 (en) Method and system for analyzing and predicting the performance of computer network using time series measurements
US8140454B2 (en) Systems and/or methods for prediction and/or root cause analysis of events based on business activity monitoring related data
CN104350471B (en) Method and system for detecting anomalies in real-time in processing environment
US7610214B1 (en) Robust forecasting techniques with reduced sensitivity to anomalous data
US8370194B2 (en) Robust forecasting techniques with reduced sensitivity to anomalous data
US7778715B2 (en) Methods and systems for a prediction model
US7636051B2 (en) Status monitor apparatus
US7409316B1 (en) Method for performance monitoring and modeling
US7082381B1 (en) Method for performance monitoring and modeling
US8516104B1 (en) Method and apparatus for detecting anomalies in aggregated traffic volume data
CN113518011B (en) Abnormality detection method and apparatus, electronic device, and computer-readable storage medium
US20100063773A1 (en) Nonparametric method for determination of anomalous event states in complex systems exhibiting non-stationarity
US20170169143A1 (en) System for maintenance recommendation based on performance degradation modeling and monitoring
US11966319B2 (en) Identifying anomalies in a data center using composite metrics and/or machine learning
US20050216793A1 (en) Method and apparatus for detecting abnormal behavior of enterprise software applications
US20140108324A1 (en) Data analytic engine towards the self-management of complex physical systems
US20040088406A1 (en) Method and apparatus for determining time varying thresholds for monitored metrics
US20050097207A1 (en) System and method of predicting future behavior of a battery of end-to-end probes to anticipate and prevent computer network performance degradation
CN109981328A (en) A kind of fault early warning method and device
CN111045894A (en) Database anomaly detection method and device, computer equipment and storage medium
EP2350933A1 (en) Performance analysis of applications
JP2004531815A (en) Diagnostic system and method for predictive condition monitoring
US7197428B1 (en) Method for performance monitoring and modeling
Downey A novel changepoint detection algorithm
Fu et al. SPC methods for nonstationary correlated count data with application to network surveillance

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 2471013

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 2002795970

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 2002795970

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP