CROSS REFERENCE TO RELATED APPLICATIONS

[0001]
This application is a continuation of U.S. patent application Ser. No. 09/802,482, filed Mar. 9, 2001, which claims the benefit of priority under 35 U.S.C. §119(e) to U.S. Provisional Application No. 60/188,102 filed Mar. 9, 2000.
BACKGROUND OF THE INVENTION

[0002]
1. Field of the Invention

[0003]
The present invention relates generally to equipment and process monitoring, and more particularly to monitoring systems instrumented with sensors that measure correlated phenomena. The present invention further relates to modeling instrumented, real-time processes using the aggregate sensor information to ascertain information about the state of the process.

[0004]
2. Description of the Related Art

[0005]
Conventional methods are known for monitoring equipment or processes—generically “systems”—using sensors to measure operational parameters of the system. The data values from sensors can be observed directly to understand how the system is functioning. Alternatively, for unattended operation, it is known to compare sensor data values against stored or predetermined thresholds in an automated fashion, and generate an exception condition or alarm requiring human intervention only when a sensor datum value exceeds a corresponding threshold.

[0006]
A number of problems exist with monitoring systems using thresholds. One problem is the difficulty of selecting a threshold for a dynamic parameter that avoids a burdensome number of false alarms, yet catches real alarms and provides sufficient warning to take corrective action when a system parameter—as measured by a sensor—moves outside of acceptable operation. Another problem is posed by sensor failure, which may result in spurious parameter values. It may not be clear from a sensor data value that the sensor has failed. Such a failure can entirely undermine monitoring of the subject system.

[0007]
In systems with a plurality of sensors measuring correlated phenomena in the system, it is known to use certain methods to consider all sensors in aggregate to overcome some of these problems. By observing the behavior of all the sensor data values in aggregate, it can be possible to dramatically improve monitoring without suffering unduly from false and missed alarms. Also, knowledge of how all the correlated parameters behave in unison can help determine that a sensor has failed, when isolated monitoring of data from that sensor in and of itself would not indicate the sensor failure.

[0008]
Known methods for viewing aggregate sensor data typically employ a modeling function that embodies prior knowledge of the system. One such technique known as “first-principles” modeling requires a well-defined mathematical description of the dynamics of the system, which is used as a reference against which current aggregate sensor data can be compared to view nascent problems or sensor failures. However, this technique is particularly vulnerable to even the slightest structural change in the observed system. The mathematical model of the system is often very costly to obtain, and in many cases, may not be reasonably possible at all.

[0009]
Another class of techniques involves empirically modeling the system as a “black box” without discerning any specific mechanics within the system. System modeling using such techniques can be easier and more resilient in the face of structural system changes. Modeling in these techniques typically involves providing some historic sensor data corresponding to desired or normal system operation, which is then used to “train” the model.

[0010]
One particular technique is described in U.S. Pat. No. 5,987,399, the teachings of which are incorporated herein by reference. As taught therein, sensor data is gathered from a plurality of sensors measuring correlated parameters of a system in a desired operating state. This historical data is used to derive an empirical model comprising certain acceptable system states. Real-time sensor data from the system is provided to a modeling engine embodying the empirical model, which computes a measure of the similarity of the real-time state to all prior known acceptable states in the model. From that measure of similarity, an estimate is generated for expected sensor data values. The real-time sensor data and the estimated expected sensor data are compared, and if there is a discrepancy, corrective action can be taken.

[0011]
The bounded area ratio test (BART) as taught in U.S. Pat. No. 5,987,399, is a well known state of the art similarity operator, wherein an angle is used to gauge the similarity of two values. The similarity operator is insensitive to variations across the training set range of the particular signal or sensor. BART uses the sensor range of values from low to high across all snapshots in the training set to form the hypotenuse of a triangle—preferably a right triangle—which is its base. BART, therefore, forms a straight line with minimum and maximum expected values disposed at either end. During system monitoring, BART periodically maps two points representative of an expected and a parameter value onto the base. These two points are placed, according to their values, within the range of values in the training set. A comparison angle is formed at the apex, opposite the base, by drawing a line to the apex from each of the points and the angle is the basis by which two values are compared for similarity. Furthermore, BART typically locates the apex point at a point above the median or mean of the range, and at a height that provides a right angle at the apex (for easy computation).

[0012]
BART does not exhibit equal sensitivity to similarity values across the base range. Differences between values in the middle of the range, i.e., around 45°, are amplified, and differences at the ends of the range, i.e., at 0° or 90°, are diminished. Consequently, prior models, such as those employing a BART operator or other operators, might not optimally model all nonlinear systems. In certain value ranges for certain sensors, these prior models may be inaccurate. Apart from selecting new or additional training data, both of which require additional time, as well as computer capacity, without providing any guarantee of improving the model, no effective way has been found in the prior art to adjust the empirical model to improve modeling fidelity.

[0013]
Thus, there is a need for system monitoring mathematical operators for accurately measuring similarities between a monitored system and expected system states, flexibly modeling and improving model sensitivity such that component failures can be accurately predicted and so that acceptably functioning components are not prematurely replaced.
SUMMARY OF THE INVENTION

[0014]
It is an object of the present invention to provide for equipment and process monitoring using empirical modeling with a class of improved operators for determining measures of similarities between modeled or known states of a system and a current or selected state of the system.

[0015]
The present invention provides for monitoring equipment, processes or other closed systems instrumented with sensors and periodically, aperiodically or randomly recording a system snapshot therefrom. Thus, a monitored system, e.g., equipment, a process or any closed system, is empirically modeled using improved operators for determining system state similarity to known acceptable states. The improved operators provide for modeling with heightened or adjusted sensitivity to system state similarity for particular ranges of sensor values. The invention thus provides for greater possible fidelity of the model to the underlying monitored system.

[0016]
The similarity between a system data snapshot and a selected known state vector is measured based on similarity values between corresponding parameter values from the data snapshot and the selected known state vector. Each similarity value is effectively computed according to a ratio of angles formed by the difference of the corresponding data values and by the range of corresponding values across all the known state vectors. Importantly, the ratio of angles is affected by the location within this range of the data value from the snapshot and the data value from the selected known state vector. The similarity engine can be flexibly honed to focus as through a lens on certain parts of the range with altered sensitivity, expanding or contracting those parts.

[0017]
The similarity operator class of this invention can be used in a multivariate state estimation technique (MSET) type process monitoring technique as taught in U.S. Pat. No. 5,764,509, and can also be used for a variety of complex signal decomposition applications. In these applications, a complex signal can be decomposed into components (e.g., a frequency domain or wavelets), which are input to this MSET similarity engine. The similarity operator can be embodied both as general purpose computer software for a mainframe computer or a microprocessor or as code for an embedded processor. The result of the similarity operation can be used for generating estimated or expected states, or for identifying which one of a finite set of patterns stored in memory most closely matches the input pattern.

[0018]
By allowing selection of a curve instead of the base of a triangle in combination with angle selection, the present invention adds the advantage of providing a lens function for “lensing” certain parts of the range for greater or lesser sensitivity to differences that, ultimately, are reflected in the similarity for the two values. Where ease of computation is not an issue, the present invention provides improved lensing flexibility that allows freeform location of the apex point at different locations above the base.

[0019]
The advantage afforded by lensing is that focus can be directed to different regions of interest in a particular range for a given sensor, when performing a similarity determination between a current state vector and a prior known expected state vector. Using this similarity determination, an estimated state vector can be computed for a real-time system that is being monitored and modeled using MSET or the like. The model performance can be honed for improved model estimates using the improved class of similarity operators of the present invention.

[0020]
The similarity operation of the present invention is rendered particularly nonlinear and adaptive. The present invention can be used in system state classification, system state alarm notification, system virtual parameter generation, system component end of life determination and other techniques where an empirical model is useful. The present invention overcomes the above restrictions of the prior art methods by providing more flexibility to adapt and improve modeling fidelity.

[0021]
The present invention also includes a similarity engine in an information processor embodiment. Preprocessed known state vectors characteristic of a desired operating condition, i.e., historic data, of a monitored system are stored in memory. A data acquisition unit acquires system parameter data, such as real-time sensor data, representative of the current state of the monitored system. The information processor is coupled to the memory and to the data acquisition system, and operates to process one system state frame or snapshot at a time from the data acquisition unit against the known state vector snapshots in the memory. A measure of similarity is computed between system state snapshots from the data acquisition unit and each known state vector in the memory. An expected state vector is computed from the snapshot for the monitored system.

[0022]
The information processor may be further disposed to compare the state snapshots with the expected state vectors sequentially, to determine if they are the same or different. This determination can be used for an alarm or event trigger.

[0023]
Briefly summarized, in a machine for monitoring an instrumented process or for analyzing one or more signals, an empirical modeling module for modeling nonlinearly and linearly correlated signal inputs using a nonlinear angular similarity function with variable sensitivity across the range of a signal input is described. Different angle-based similarity functions can be chosen for different inputs to improve sensitivity particular to the behavior of that input. Sections of interest within a range of a signal input can be lensed for particular sensitivity.
BRIEF DESCRIPTION OF THE DRAWINGS

[0024]
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as the preferred mode of use, further objectives and advantages thereof, is best understood by reference to the following detailed description of the embodiments in conjunction with the accompanying drawings, wherein:

[0025]
FIG. 1 is a functional block diagram of an example of an empirical modeling apparatus for monitoring an instrumented system;

[0026]
FIGS. 2 and 3 are diagrams showing an example of a prior art similarity operator;

[0027]
FIG. 4 is a diagram generally showing an example of a similarity operator according to the invention;

[0028]
FIG. 5 illustrates distillation of sensor data to create a training data set representative of the similarity domain;

[0029]
FIG. 6 shows the steps of a method of distilling sensor data to a training set for use with the present invention;

[0030]
FIG. 7A is a diagram showing an example of a polynomial embodiment of a similarity operator according to the invention;

[0031]
FIG. 7B is a diagram showing an example of an elliptical embodiment of a similarity operator according to the invention;

[0032]
FIG. 7C is a diagram showing an example of a trigonometric embodiment of a similarity operator according to the invention;

[0033]
FIG. 8A is a diagram showing an example of the lensing effect of the similarity operator of the present invention;

[0034]
FIG. 8B is a diagram showing an example of an alternative approach to the use of the lensing effect of the similarity operator of the present invention;

[0035]
FIGS. 9A-9D through 12A-12D illustrate alternate embodiments showing extension of range and lensing functions in similarity operators in accordance with the invention;

[0036]
FIGS. 13A-13B are flow diagrams showing preferred methods of generating a generalized lensing Similarity Operator; and

[0037]
FIG. 14 is yet another embodiment of the similarity operator of the present invention showing discontinuous lensing effects.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0038]
As depicted in the example of FIG. 1, the inventive system 100 in a preferred embodiment comprises a data acquisition module 102, an information processor 104, a memory 106 and an output module 108, which can be coupled to other software, to a display, to an alarm system, or any other system that can utilize the results, as may be known in the art. The processor 104 generally may include a Similarity Engine 110, an Estimated State Generator 112 and a Deviation Detection Engine 114.

[0039]
Memory 106 stores a plurality of selected time-correlated snapshots of sensor values characterizing normal, optimal, desirable or acceptable operation of a monitored process or machine. This plurality of snapshots, distilled according to a selected “training” method as described below, comprises an empirical model of the process or machine being monitored. In operation, the inventive monitoring system 100 samples current snapshots of sensor data via acquisition module 102. For a given set of time-correlated sensor data from the monitored process or machine running in real-time, the estimates for the sensors can be generated by the Estimated State Generator 112 according to:

$\begin{array}{cc}{\overrightarrow{y}}_{\mathrm{estimated}}=\stackrel{\_}{D}\cdot \overrightarrow{W}& \left(1\right)\end{array}$

[0040]
where D is a matrix comprised of the plurality of snapshots in memory 106 and W is a contribution weighting vector determined by Similarity Engine 110 and Estimated State Generator 112 using a similarity operator such as the inventive class of similarity operators of the present invention. The multiplication operation is the standard matrix/vector multiplication operator. W has as many elements as there are snapshots in D, and is determined by:
$\begin{array}{cc}\overrightarrow{W}=\frac{\hat{\overrightarrow{W}}}{\left(\sum _{j=1}^{N}\hat{W}\left(j\right)\right)}& \left(2\right)\\ \hat{\overrightarrow{W}}={\left({\stackrel{\_}{D}}^{T}\otimes \stackrel{\_}{D}\right)}^{-1}\cdot \left({\stackrel{\_}{D}}^{T}\otimes {\overrightarrow{y}}_{in}\right)& \left(3\right)\end{array}$

[0041]
where the T superscript denotes transpose of the matrix, and Y_{in} is the current snapshot of actual, real-time sensor data. The improved similarity operator of the present invention is symbolized in the equation above as ⊗. Y_{in} holds the real-time or actual sensor values from the underlying system, and therefore it is a vector snapshot.

[0042]
The similarity operation typically returns a scalar value between 0 and 1 for each comparison of one vector or matrix row to another vector. It represents a numeric quantification of the overall similarity of two system states represented by two snapshots of the same sensors. A similarity value closer to 1 indicates sameness, whereas a similarity value closer to 0 typically indicates difference.
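By way of illustration only, equations 1 through 3 can be sketched in Python as follows. The sketch assumes that D is stored with one row per sensor and one column per snapshot, and it assumes a simple stand-in similarity operator (a straight-line similarity domain with the apex above its midpoint, averaged over sensors). The function names `elemental_similarity`, `snapshot_similarity`, `solve` and `estimate` are hypothetical and do not appear in the specification.

```python
import math

def elemental_similarity(x0, x1, lo, hi, apex_height=0.5):
    """Stand-in operator (assumption): S = 1 - theta/Omega on a straight base."""
    span = hi - lo  # assumes hi > lo for every sensor
    u0, u1 = (x0 - lo) / span, (x1 - lo) / span

    def ray_angle(u):
        # Angle of the ray from the apex (0.5, apex_height) to base point u.
        return math.atan2(u - 0.5, apex_height)

    omega = ray_angle(1.0) - ray_angle(0.0)
    return 1.0 - abs(ray_angle(u0) - ray_angle(u1)) / omega

def snapshot_similarity(a, b, lows, highs):
    """Average of the elemental similarities of two snapshots."""
    sims = [elemental_similarity(p, q, lo, hi)
            for p, q, lo, hi in zip(a, b, lows, highs)]
    return sum(sims) / len(sims)

def solve(G, rhs):
    """Solve G w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(G, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def estimate(D, y_in):
    """Return the estimated snapshot for the current snapshot y_in."""
    n_sensors, n_snap = len(D), len(D[0])
    cols = [[D[s][k] for s in range(n_sensors)] for k in range(n_snap)]
    lows = [min(row) for row in D]
    highs = [max(row) for row in D]
    # D^T (x) D: similarity of every stored snapshot to every other one.
    G = [[snapshot_similarity(ci, cj, lows, highs) for cj in cols] for ci in cols]
    # D^T (x) y_in: similarity of every stored snapshot to the input.
    a = [snapshot_similarity(ci, y_in, lows, highs) for ci in cols]
    w_hat = solve(G, a)                    # equation 3
    total = sum(w_hat)
    w = [v / total for v in w_hat]         # equation 2
    # Equation 1: ordinary matrix/vector product D . W
    return [sum(D[s][k] * w[k] for k in range(n_snap)) for s in range(n_sensors)]
```

Under these assumptions, an input snapshot that coincides with a stored snapshot yields a weight vector selecting that snapshot, so the estimate reproduces the input. The sketch does not guard against a singular similarity matrix or a sensor whose training range has zero width.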

[0043]
Deviation detection engine 114 receives both the actual current snapshot of sensor values and the set of sensor value estimates from the estimated state generator 112, and compares the two. A variety of tests can be used, including the sequential probability ratio test (SPRT), or a CUSUM test, both of which are known in the art. Preferably, the set of actual sensor values and the set of estimated sensor values are differenced to provide residual values, one for each sensor.

[0044]
Applying the SPRT to a sequence of such residual values for a given sensor provides an advantageously early indication of any difference between the actual sensor values and what is expected under normal operation.
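As a non-limiting sketch, the residual computation and an SPRT applied to a residual sequence might be written in Python as shown below. The sketch assumes Gaussian residuals with known variance and tests a mean of zero (H0) against a hypothetical mean shift (H1) using the standard Wald decision limits. The parameter names `shift`, `variance`, `alpha` and `beta`, and the choice to reset the statistic after each decision, are assumptions made for illustration.

```python
import math

def residuals(actual, estimated):
    """Difference actual and estimated sensor values, one residual per sensor."""
    return [a - e for a, e in zip(actual, estimated)]

def sprt_gaussian(residual_sequence, shift, variance, alpha=0.01, beta=0.01):
    """Wald SPRT on one sensor's residuals.

    H0: residual mean is 0 (normal operation).
    H1: residual mean is `shift` (deviation), same known variance.
    Returns a list of (sample index, decision) pairs.
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this limit
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this limit
    llr = 0.0                                # running log likelihood ratio
    decisions = []
    for i, r in enumerate(residual_sequence):
        # Log likelihood ratio increment for a Gaussian mean shift.
        llr += (shift / variance) * (r - shift / 2.0)
        if llr >= upper:
            decisions.append((i, "H1"))
            llr = 0.0                        # restart the test after a decision
        elif llr <= lower:
            decisions.append((i, "H0"))
            llr = 0.0
    return decisions
```

With these illustrative settings, a decision is reached only after enough residuals have accumulated evidence in one direction, which is what allows a small but persistent deviation to be flagged earlier than a fixed threshold on individual residuals would allow.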

[0045]
FIG. 2 graphically illustrates the prior art BART similarity operation wherein a right triangle 120 is formed having a monotonically linear base 122 bounded by the range for a given sensor in training data, the range minimum and maximum forming vertices 124, 126 at opposite ends of the base 122. The triangle 120 was formed preferably as a right triangle with the right angle located at height (h) above the median of the range data along the base 122. In this prior art method the height (h) was required to be chosen so that the apex angle is a right angle. Then, in performing a similarity operation on two values of the sensor, each value was plotted along the base between minimum 124 and maximum 126 according to its value, and lines 128 and 129 were drawn from the apex to each plotted point X_{0} and X_{1}, forming an angle therebetween. The similarity of the two values was then computed as a function of the comparison of the formed angle θ to the right angle Ω of the apex.

[0046]
As can be seen from FIG. 3, which shows each of two different comparisons 130, 132, equally spaced pairs of values are compared in each instance for similarity by mapping the value pairs in the range for the sensor along the base 134. One of each of the pairs represents a sensor value from a training set vector and the other of the pair represents a sensor value from an input data vector. Each pair of values identifies a segment that, in combination with the apex, identifies a smaller triangle within the original right triangle. The angle in each of the smaller triangles 136, 138, that shares the apex and is a fraction of the right angle, provides a measure of similarity for the respective pair of values when scaled against the full ninety degrees (90°) of the right angle. This angle is zero degrees (0°) for an identical pair and 90° for a completely dissimilar pair at the extrema of the range stored in the training set.
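The uneven sensitivity of this prior art construction can be illustrated with a short Python sketch. It assumes the apex sits above the midpoint of the base at a height of half the range (which gives a right angle at the apex) and assumes the similarity is taken as one minus the fraction of the right angle subtended by the pair. The function name `bart_similarity` is hypothetical.

```python
import math

def bart_similarity(x0, x1, lo, hi):
    """Right-triangle similarity sketch: apex above the midpoint of [lo, hi]."""
    mid = (lo + hi) / 2.0
    height = (hi - lo) / 2.0  # this height makes the apex angle 90 degrees
    # Signed angle of each ray from the apex, measured from the vertical.
    a0 = math.atan2(x0 - mid, height)
    a1 = math.atan2(x1 - mid, height)
    theta = abs(a0 - a1)
    return 1.0 - theta / (math.pi / 2.0)

# Two equally spaced pairs on a 0..10 range: one mid-range, one at the low end.
mid_pair = bart_similarity(4.5, 5.5, 0.0, 10.0)
end_pair = bart_similarity(0.0, 1.0, 0.0, 10.0)
```

Under these assumptions the mid-range pair subtends a larger angle than the equally spaced pair at the end of the range, so `mid_pair` is smaller than `end_pair` even though both pairs differ by the same amount.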

[0047]
The inventors have found that the restrictions of the prior art analysis method, i.e. a right triangle based model with its apex at the right angle and disposed immediately above the median value on the base (hypotenuse) for the particular parameter, may be ignored to provide a more useful, flexible and all encompassing analysis tool. Further, the inventors have determined that the analysis model need not be triangular at all but merely defined by two partial rays of an angle extending to endpoints identified by either a system parameter minimum or maximum and connected therebetween by a curve that may be linear or nonlinear. The curve may be selected, for example, to highlight one region of operation while deemphasizing another or others as set forth herebelow.

[0048]
The most general form of the similarity operation of the invention is shown in FIG. 4. A range of data for a given parameter sensor across a training set is mapped to an arc length forming the curve 140 and being identified as a Similarity Domain. An apex location 142 may be chosen above the similarity domain curve 140, and an angle Ω is defined by connecting the apex with straight line segments 144 and 146 to the ends of the similarity domain 140. Alternately, an angle may be selected and an apex location 142 derived accordingly.

[0049]
According to one embodiment of the invention, the similarity domain (being the curve length) for a given sensor or parameter in a monitored system can be mapped by equating one end of the curve to the lowest value observed across the reference library or training set for that sensor, and equating the other end to the highest value observed across the training set for that sensor. The length between these extrema is scaled linearly (or in some other appropriate fashion, e.g., logarithmically where appropriate). According to another embodiment of the invention, expected lower and upper limits for a sensor can be chosen based on know-how in the application domain, e.g., industrial, medical, etc. According to yet another embodiment, the similarity domain can be mapped using the extrema of the original data set from which the reference library or training set is distilled. This can be advantageous if the training method does not necessarily include the highest and lowest sensor readings.
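A minimal Python sketch of this mapping is given below for illustration. It returns the fraction of the similarity domain length at which a sensor value falls, using either linear or logarithmic scaling, with the limits taken either from training data or supplied by the user. The function names and the `scale` parameter are hypothetical, and the logarithmic branch assumes strictly positive values.

```python
import math

def limits_from_data(values):
    """Lower and upper limits taken from observed data for one sensor."""
    return min(values), max(values)

def domain_position(value, lo, hi, scale="linear"):
    """Fraction of the similarity domain length at which `value` falls.

    0.0 corresponds to the lower limit and 1.0 to the upper limit.
    Values outside [lo, hi] are not clamped in this sketch.
    """
    if scale == "log":
        # Logarithmic scaling (assumes value, lo and hi are all positive).
        value, lo, hi = math.log(value), math.log(lo), math.log(hi)
    return (value - lo) / (hi - lo)
```

For example, `domain_position(5.0, 0.0, 10.0)` places the value halfway along the domain, while `domain_position(10.0, 1.0, 100.0, scale="log")` does the same for a sensor spanning two decades.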

[0050]
The similarity of value pairs (“elemental similarity”) is found by mapping that pair of values X_{0} and X_{1} onto the Similarity Domain for that sensor. Connecting these two points from the similarity domain curve with lines 147 and 148 to the apex 142 defines a second angle θ. The similarity of the pair of values is then defined as equal to:
$\begin{array}{cc}S=1-\frac{\theta}{\Omega}& \left(4\right)\end{array}$
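For illustration, the elemental similarity of equation 4 can be sketched in Python for an arbitrary similarity domain curve. The sketch assumes the curve is supplied as a polyline of (x, y) points, that sensor values are mapped linearly onto its arc length, and that the apex is a freely chosen point. The function names `point_at_fraction`, `angle_at` and `elemental_similarity` are hypothetical.

```python
import math

def point_at_fraction(curve, frac):
    """Point lying at `frac` (0..1) of the total arc length of a polyline."""
    segments = list(zip(curve, curve[1:]))
    lengths = [math.dist(p, q) for p, q in segments]
    target = frac * sum(lengths)
    for (p, q), length in zip(segments, lengths):
        if target <= length:
            t = target / length
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        target -= length
    return curve[-1]  # guards against rounding at frac == 1.0

def angle_at(apex, p, q):
    """Unsigned angle at `apex` between the rays to points p and q."""
    ax, ay = p[0] - apex[0], p[1] - apex[1]
    bx, by = q[0] - apex[0], q[1] - apex[1]
    return math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)

def elemental_similarity(x0, x1, lo, hi, curve, apex):
    """Equation 4: S = 1 - theta / Omega."""
    f0 = (x0 - lo) / (hi - lo)
    f1 = (x1 - lo) / (hi - lo)
    theta = angle_at(apex, point_at_fraction(curve, f0),
                     point_at_fraction(curve, f1))
    omega = angle_at(apex, curve[0], curve[-1])
    return 1.0 - theta / omega

# Example domains: a straight base and a sampled parabolic arc (both assumed).
straight = [(0.0, 0.0), (1.0, 0.0)]
parabola = [(i / 50.0, 0.4 * (i / 50.0 - 0.5) ** 2) for i in range(51)]
apex = (0.5, 0.5)
```

Changing the curve or moving the apex changes the angle subtended by the same pair of values, which is the mechanism by which different parts of the range can be given greater or lesser sensitivity.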

[0051]
Thus, the similarity value S is closer to one for value pairs that are more similar, and S is closer to zero for value pairs that are less similar. The elemental similarities are calculated for each corresponding pair of sensor values (elements) of the two snapshots being compared. Then, the elemental similarities are combined in some statistical fashion to generate a single similarity scalar value for the vector-to-vector comparison. Preferably, this overall similarity, S_{snapshot}, of two snapshots is equal to the average of the N elemental similarity values S_{c}, where N is the element count:
$\begin{array}{cc}{S}_{\mathrm{snapshot}}=\frac{\sum _{c=1}^{N}{S}_{c}}{N}& \left(5\right)\end{array}$

[0052]
It can be understood that the general result of the similarity operation of the present invention applied to two matrices (or a matrix D and a vector Y_{in}, as per equation 3 above) is a matrix (or vector) wherein the element of the i^{th} row and j^{th} column is determined from the i^{th} row of the first operand and the j^{th} column of the second operand. The resulting element (i,j) is a measure of the sameness of these two vectors. In the present invention, the i^{th} row of the first operand generally has elements corresponding to sensor values for a given temporally related state of the process or machine, and the same is true for the j^{th} column of the second operand. Effectively, the resulting array of similarity measurements represents the similarity of each state vector in one operand to each state vector in the other operand.

[0053]
By way of example, two vectors (the i^{th} row and j^{th} column) are compared for similarity according to equation 4 above on an element-by-element basis. Only corresponding elements are compared, e.g., element (i,m) with element (m,j) but not element (i,m) with element (n,j). For each such comparison, the similarity is given by equation 4, with reference to a similarity operator construct as in FIG. 4. Hence, if the values are identical, the similarity is equal to one, and if the values are grossly unequal, the similarity approaches zero. When all the elemental similarities are computed, the overall similarity of the two vectors is equal to the average of the elemental similarities. A different statistical combination of the elemental similarities can also be used in place of averaging, e.g., median.
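The row-by-column application of the operator can be sketched in Python as follows, for illustration only. The sketch assumes the first operand holds one state per row and the second holds one state per column, and it assumes a stand-in elemental similarity built on a straight-line domain with the apex above its midpoint. The names `make_elemental` and `similarity_operator` and the `combine` parameter are hypothetical.

```python
import math
import statistics

def make_elemental(lows, highs, apex_height=0.5):
    """Build a per-sensor elemental similarity S = 1 - theta/Omega (assumed form)."""
    def ray_angle(u):
        # Angle of the ray from the apex (0.5, apex_height) to base position u.
        return math.atan2(u - 0.5, apex_height)

    omega = ray_angle(1.0) - ray_angle(0.0)

    def elemental(m, a, b):
        span = highs[m] - lows[m]
        ua, ub = (a - lows[m]) / span, (b - lows[m]) / span
        return 1.0 - abs(ray_angle(ua) - ray_angle(ub)) / omega

    return elemental

def similarity_operator(A, B, elemental, combine=statistics.mean):
    """Element (i, j) compares row i of A with column j of B.

    Only corresponding elements are compared: A[i][m] with B[m][j].
    `combine` merges the elemental similarities (mean by default).
    """
    n_inner, n_cols = len(B), len(B[0])
    result = []
    for row_a in A:
        out_row = []
        for j in range(n_cols):
            sims = [elemental(m, row_a[m], B[m][j]) for m in range(n_inner)]
            out_row.append(combine(sims))
        result.append(out_row)
    return result
```

Passing `combine=statistics.median` substitutes the median for the average without changing the rest of the sketch.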

[0054]
The matrix D of reference snapshots stored in memory 106 characterizing acceptable operation of the monitored process or machine is composed using a method of training, that is, a method of distilling a larger set of data gathered from the sensors on the process or machine while it is running in known acceptable states. FIG. 5 graphically depicts such a method for distilling the collected sensor data to create a representative training data set (D matrix) for defining a Similarity Domain. In this simple example only five sensor signals 152, 154, 156, 158 and 160 are shown for the process or machine to be monitored. Although described herein generically as comparing system vectors, “system” is used for example only and not intended as a limitation. System is intended to include any system living or dead whether a machine, a process being carried out in a system or any other monitorable closed system.

[0055]
Continuing this example, the sample number or a time stamp of the collected sensor data is on the abscissa axis 162, where the data is digitally sampled and the sensor data is temporally correlated at each sample. The ordinate axis 164 represents the relative magnitude of each sensor reading over the samples or “snapshots.” In this example, each snapshot represents a vector of five elements, one reading for each sensor in that snapshot. Of all the sensor data collected (in all of the snapshots), according to this training method example, only those five-element snapshots are included in the representative training set that contain either a global minimum or a global maximum value for any given sensor. Therefore, the global maximum 166 for sensor signal 152 justifies inclusion of the five sensor values at the intersections of line 168 with each sensor signal 152, 154, 156, 158, 160, including global maximum 166, in the representative training set, as a vector of five elements. Similarly, the global minimum 170 for sensor signal 152 justifies inclusion of the five sensor values at the intersections of line 172 with each sensor signal 152, 154, 156, 158, 160. So, collections of such snapshots represent states that the system has taken on and that are expected to reoccur. The pre-collected sensor data is filtered to produce a “training” subset that reflects all states that the system takes on while operating “normally” or “acceptably” or “preferably.” This training set forms a matrix, having as many rows as there are sensors of interest, and as many columns (snapshots) as necessary to capture all the acceptable states without redundancy.

[0056]
Turning to FIG. 6, the training method of FIG. 5 is shown in a flowchart. Data so collected in step 180 from N sensors at L observations or snapshots or from temporally related sets of sensor parameter data, form an array X of N rows and L columns. In step 182, an element number counter (i) is initialized to zero, and an observation or snapshot counter (t) is initialized to one. Two arrays, “max” and “min,” for containing maximum and minimum values respectively across the collected data for each sensor, are initialized to be vectors each of N elements which are set equal to the first column of X. Two additional arrays, Tmax and Tmin, for holding the observation number of the maximum and minimum value seen in the collected data for each sensor, are initialized to be vectors each of N elements, all zero.

[0057]
In step 184, if the value of sensor number i at snapshot number t in X is greater than the maximum yet seen for that sensor in the collected data, max(i) is updated to equal the sensor value and Tmax(i) stores the number t of the observation in step 186. If not, a similar test is done for the minimum for that sensor in steps 188 and 190. The observation counter is incremented in step 192. In step 194, if all the observations have been reviewed for a given sensor (i.e., t=L), then t is reset to zero and i is incremented (in preparation for finding the maximum and minimum for the next sensor) in step 196. If the limits have been found for the last sensor (i.e., i=N), step 198, then redundancies are removed (i.e., eliminate multiple occurrences of snapshots that have been selected for two or more parameters) and an array D is created from the resulting subset of snapshot vectors from X.

[0058]
So, in step 200, counters i and j are initialized to one. In step 202, arrays Tmax and Tmin are concatenated to form a single vector Ttmp having 2N elements. These array elements are sorted into ascending (or descending) order in step 204 to form array T. In step 206, holder tmp is set to the first value in T (an observation number that contains a sensor minimum or maximum). The first column of D is set equal to the column of X corresponding to the observation number that is the first element of T. In the loop starting with decision step 208, the ith element of T is compared to the value of tmp that contains the previous element of T. If the two adjacent values of T are equal, indicating that the corresponding observation vector is a minimum or maximum for more than one sensor, then it has already been included in D and need not be included again. Counter i is incremented in step 210. If the two adjacent values are not equal, D is updated to include the column from X that corresponds to the observation number of T(i) in step 212, and tmp is updated with the value at T(i). The counter (j) is then incremented in step 214. In step 216, if all the elements of T have been checked, then the distillation into training set D has finished in step 218 and D is stored in memory 106.
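The overall effect of this training method can be sketched compactly in Python, for illustration only. The sketch assumes the collected data X is stored with one row per sensor and one column per observation, and it replaces the explicit counters and sorted-vector comparison of the flowchart with a set, which removes redundant observation numbers in the same way. The function name `min_max_training_set` is hypothetical.

```python
def min_max_training_set(X):
    """Distill X (rows = sensors, columns = observations) into a training set D.

    Returns (D, order), where `order` lists the selected observation numbers
    (zero-based) in ascending order and D keeps only those columns of X.
    """
    n_obs = len(X[0])
    selected = set()
    for row in X:
        # First observation holding this sensor's global maximum and minimum.
        selected.add(max(range(n_obs), key=lambda t: row[t]))
        selected.add(min(range(n_obs), key=lambda t: row[t]))
    # The set already dropped observations chosen for more than one sensor.
    order = sorted(selected)
    D = [[row[t] for t in order] for row in X]
    return D, order
```

For a two-sensor data set such as `[[1, 5, 3, 0], [2, 2, 9, 1]]`, the last observation holds the minimum of both sensors and is therefore included only once.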

[0059]
The training set as selected according to the above method may additionally be augmented using a number of techniques. For example, once the snapshots selected according to the above MinMax method are determined, the remaining original set of data may be selected from and added to the training set at regular time stamp intervals. Yet another way of adding more snapshots to the MinMax training set involves randomly selecting a remaining number of snapshots from the original set of data.
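Both augmentation approaches can be sketched in Python as shown below, for illustration only. The sketch operates on observation numbers rather than on the data itself, and the function names, the `step` and `count` parameters and the optional `seed` are assumptions introduced for the example.

```python
import random

def augment_regular(selected, n_obs, step):
    """Add every `step`-th observation number to an existing selection."""
    chosen = set(selected)
    chosen.update(range(0, n_obs, step))
    return sorted(chosen)

def augment_random(selected, n_obs, count, seed=None):
    """Add up to `count` randomly chosen observation numbers not yet selected."""
    chosen = set(selected)
    remaining = [t for t in range(n_obs) if t not in chosen]
    rng = random.Random(seed)  # a seed makes the selection repeatable
    chosen.update(rng.sample(remaining, min(count, len(remaining))))
    return sorted(chosen)
```

The returned observation numbers can then be used to pick the corresponding columns of the original data when forming the augmented training set.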

[0060]
Once the D matrix has been determined, in a training and implementation phase, the preferred similarity engine 110 is turned on with the underlying system being monitored, and through time, actual snapshots of real sensor values are input to the Similarity Engine 110 from Data Acquisition Unit 102. The output of the results from Similarity Engine 110 can be similarity values, expected values, or the “residual” values (being the difference between the actual and expected values).

[0061]
One of these output types is selected and passed to the deviation detection engine 114 of FIG. 1, which then determines through a series of such snapshots, whether a statistically significant change has occurred as set forth hereinbelow. In other words, the statistical significance engine effectively determines if those real values represent a significant change from the “acceptable” states stored in the D matrix. Thus, a vector (Y) is generated in Estimated State Generator 112 of expected sensor values from contributions by each of the snapshots in D, which contributions are determined by a weight vector W. W has as many elements as there are snapshots in D and W is determined according to equations 2 and 3 above.

[0062]
The deviation detection engine 114 can implement a comparison of the residuals to selected thresholds to determine when an alert should be output of a deviation in the monitored process or machine from recognized states stored in the reference library. Alternatively, a statistical test, preferably the sequential probability ratio test (SPRT), can be used to determine when a deviation has occurred. The basic approach of the SPRT technique is to analyze successive observations of a sampled parameter. A sequence of sampled differences between the generated expected value and the actual value for a monitored sensor signal should be distributed according to some kind of distribution function around a mean of zero. Typically, this will be a Gaussian distribution, but it may be a different distribution, as for example a binomial distribution for a parameter that takes on only two discrete values (this can be common in telecommunications and networking machines and processes). Then, with each observation, a test statistic is calculated and compared to one or more decision limits or thresholds. The SPRT test statistic generally is the likelihood ratio l_{n}, which is the ratio of the probability that a hypothesis H_{1} is true to the probability that a hypothesis H_{0} is true:
$\begin{array}{cc}{l}_{n}=\frac{L\left({y}_{1},{y}_{2},\dots ,{y}_{n}\mid {H}_{1}\right)}{L\left({y}_{1},{y}_{2},\dots ,{y}_{n}\mid {H}_{0}\right)}& \left(6\right)\end{array}$

[0063]
where y_{n} are the individual observations and H_{n} are the probability distributions for those hypotheses. This general SPRT test ratio can be compared to a decision threshold to reach a decision with any observation. For example, if the outcome is greater than 0.80, then decide H_{1} is the case; if less than 0.20, then decide H_{0} is the case; and if in between, then make no decision.

[0064]
The SPRT test can be applied to various statistical measures of the respective distributions. Thus, for a Gaussian distribution, a first SPRT test can be applied to the mean and a second SPRT test can be applied to the variance. For example, there can be a positive mean test and a negative mean test for data such as residuals that should distribute around zero. The positive mean test involves the ratio of the likelihood that a sequence of values belongs to a distribution H_{0} around zero, versus belonging to a distribution H_{1} around a positive value, typically one standard deviation above zero. The negative mean test is similar, except H_{1} is around zero minus one standard deviation. Furthermore, the variance SPRT test can be to test whether the sequence of values belongs to a first distribution H_{0} having a known variance, or a second distribution H_{2} having a variance equal to a multiple of the known variance.

[0065]
For residuals derived for sensor signals from the monitored process or machine behaving as expected, the mean is zero, and the variance can be determined. Then in runtime monitoring mode, for the mean SPRT test, the likelihood that H_{0} is true (mean is zero and variance is σ^{2}) is given by:
$\begin{array}{cc}L\left({y}_{1},{y}_{2},\dots ,{y}_{n}\mid {H}_{0}\right)=\frac{1}{{\left(2\pi \sigma \right)}^{n/2}}{e}^{\left[-\frac{1}{2{\sigma}^{2}}\sum _{k=1}^{n}{y}_{k}^{2}\right]}& \left(7\right)\end{array}$

[0066]
and similarly, for H_{1}, where the mean is M (typically one standard deviation below or above zero, using the variance determined for the residuals from normal operation) and the variance is again σ^{2} (variance is assumed the same):
$\begin{array}{cc}L\left({y}_{1},{y}_{2},\dots ,{y}_{n}\mid {H}_{1}\right)=\frac{1}{{\left(2\pi \sigma \right)}^{n/2}}{e}^{\left[-\frac{1}{2{\sigma}^{2}}\left(\sum _{k=1}^{n}{y}_{k}^{2}-2\sum _{k=1}^{n}{y}_{k}M+\sum _{k=1}^{n}{M}^{2}\right)\right]}& \left(8\right)\end{array}$

[0067]
The ratio l_{n} from equations 7 and 8 then becomes:
$\begin{array}{cc}{l}_{n}={e}^{\left[-\frac{1}{2{\sigma}^{2}}\sum _{k=1}^{n}M\left(M-2{y}_{k}\right)\right]}& \left(9\right)\end{array}$

[0068]
A SPRT statistic can be defined for the mean test to be the exponent in equation 9:
$\begin{array}{cc}{\mathrm{SPRT}}_{\mathrm{mean}}=-\frac{1}{2{\sigma}^{2}}\sum _{k=1}^{n}M\left(M-2{y}_{k}\right)& \left(10\right)\end{array}$

[0069]
The SPRT test is advantageous because a user-selectable false alarm probability α and a missed alarm probability β can provide thresholds against which SPRT_{mean} can be tested to produce a decision:

[0070]
1. If SPRT_{mean} ≤ ln(β/(1−α)), then accept hypothesis H_{0} as true;

[0071]
2. If SPRT_{mean} ≥ ln((1−β)/α), then accept hypothesis H_{1} as true; and

[0072]
3. If ln(β/(1−α)) < SPRT_{mean} < ln((1−β)/α), then make no decision and continue sampling.
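The mean test and its three decision rules can be sketched as a running sum. This is an illustrative sketch under stated assumptions: the function name `sprt_mean`, the default values of α and β, and the choice to stop at the first decision (rather than reset and continue) are not taken from the description. Each observation adds −M(M − 2y_k)/(2σ²) to the statistic, which is the per-sample term of equation 10.

```python
import math

def sprt_mean(residuals, sigma2, M, alpha=0.01, beta=0.01):
    """Run the mean SPRT over a residual sequence.

    Returns (decision, n, statistic), where decision is "H0", "H1" or "continue".
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this value
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this value
    stat = 0.0
    for n, y in enumerate(residuals, start=1):
        # Per-sample term of SPRT_mean from equation 10.
        stat += -M * (M - 2.0 * y) / (2.0 * sigma2)
        if stat <= lower:
            return "H0", n, stat
        if stat >= upper:
            return "H1", n, stat
    return "continue", len(residuals), stat

# Positive mean test with M set one standard deviation above zero (sigma = 1).
print(sprt_mean([0.0] * 20, sigma2=1.0, M=1.0))
print(sprt_mean([2.0] * 20, sigma2=1.0, M=1.0))
```

For the negative mean test the same function can be called with M set to minus one standard deviation.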

[0073]
For the variance SPRT test, the problem is to decide between two hypotheses: H_{2}, where the residual forms a Gaussian probability density function with a mean of zero and a variance of Vσ^{2}; and H_{0}, where the residual forms a Gaussian probability density function with a mean of zero and a variance of σ^{2}. The likelihood that H_{2} is true is given by:
$\begin{array}{cc}L\left({y}_{1},{y}_{2},\dots ,{y}_{n}\mid {H}_{2}\right)=\frac{1}{{\left(2\pi {V}^{1/2}\sigma \right)}^{n/2}}{e}^{\left[-\frac{1}{2V{\sigma}^{2}}\sum _{k=1}^{n}{y}_{k}^{2}\right]}& \left(11\right)\end{array}$

[0074]
The ratio l_{n} is then provided for the variance SPRT test as the ratio of equation 11 over equation 7, to provide:
$\begin{array}{cc}{l}_{n}={V}^{-1/2}{e}^{\left[-\frac{1}{2{\sigma}^{2}}\sum _{k=1}^{n}{y}_{k}^{2}\left(\frac{1-V}{V}\right)\right]}& \left(12\right)\end{array}$

[0075]
and the SPRT statistic for the variance test is then:
$\begin{array}{cc}{\mathrm{SPRT}}_{\mathrm{variance}}=\frac{1}{2{\sigma}^{2}}\left(\frac{V-1}{V}\right)\sum _{k=1}^{n}{y}_{k}^{2}-\frac{\mathrm{ln}V}{2}& \left(13\right)\end{array}$

[0076]
Thereafter, decision tests analogous to tests (1) through (3) above can be applied:

[0077]
1. If SPRT_{variance} ≤ ln(β/(1−α)), then accept hypothesis H_{0} as true;

[0078]
2. If SPRT_{variance} ≥ ln((1−β)/α), then accept hypothesis H_{2} as true; and

[0079]
3. If ln(β/(1−α)) < SPRT_{variance} < ln((1−β)/α), then make no decision and continue sampling.
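A comparable sketch for the variance test follows. The function name `sprt_variance` and the default α and β are hypothetical. One further assumption should be noted: the sketch subtracts (ln V)/2 once per observation, as follows from the ratio of the two Gaussian densities for a single sample, so the accumulated statistic should be checked against equation 13 before any real use.

```python
import math

def sprt_variance(residuals, sigma2, V, alpha=0.01, beta=0.01):
    """Run the variance SPRT (H0: variance sigma2, H2: variance V * sigma2).

    Returns (decision, n), where decision is "H0", "H2" or "continue".
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    gain = (V - 1.0) / (2.0 * sigma2 * V)  # coefficient on y_k squared
    penalty = 0.5 * math.log(V)            # assumed to apply once per observation
    stat = 0.0
    for n, y in enumerate(residuals, start=1):
        stat += gain * y * y - penalty
        if stat <= lower:
            return "H0", n
        if stat >= upper:
            return "H2", n
    return "continue", len(residuals)

# Known variance 1, alternative variance 4 (illustrative values only).
print(sprt_variance([0.0] * 20, sigma2=1.0, V=4.0))
print(sprt_variance([3.0] * 20, sigma2=1.0, V=4.0))
```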

[0080]
Each snapshot of residuals (one residual “signal” per sensor) that is passed to the SPRT test module can have SPRT test decisions for positive mean, negative mean, and variance for each parameter in the snapshot. In an empirical model-based monitoring system according to the present invention, any such SPRT test on any such parameter that results in a hypothesis other than H_{0} being accepted as true is effectively an alert on that parameter. Of course, it lies within the scope of the invention for logic to be inserted between the SPRT tests and the output alerts, such that a combination of a non-H_{0} result is required for both the mean and variance SPRT tests in order for the alert to be generated for the parameter, or some other such rule.
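The alert logic described above can be illustrated as follows. The dictionary keys, the decision labels and the `require_both` flag are hypothetical names chosen for this sketch; the default rule alerts on any non-H_{0} decision, and the optional rule requires both a mean and a variance deviation.

```python
def sensor_alert(decisions, require_both=False):
    """Map the SPRT decisions for one sensor to an alert flag.

    `decisions` holds one label per test: "H0", "H1", "H2" or "continue".
    """
    mean_alarm = (decisions["positive_mean"] == "H1"
                  or decisions["negative_mean"] == "H1")
    variance_alarm = decisions["variance"] == "H2"
    if require_both:
        # Stricter rule: alert only when both the mean and variance tests deviate.
        return mean_alarm and variance_alarm
    # Default rule: any accepted hypothesis other than H0 is an alert.
    return mean_alarm or variance_alarm

decisions = {"positive_mean": "H1", "negative_mean": "H0", "variance": "H0"}
print(sensor_alert(decisions))
print(sensor_alert(decisions, require_both=True))
```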

[0081]
The output of the deviation detection engine 114 will represent a decision for each sensor signal input, as to whether the estimate is different or the same. These decisions, in turn, can be used to diagnose the state of the process or equipment being monitored. The occurrence of some difference decisions in conjunction with other sameness decisions can be used as an indicator of likely future machine health or process states. The SPRT decisions can be used to index into a diagnostic lookup database, automatically diagnosing the condition of the process or equipment being monitored.

[0082]
Generally, any statistical hypothesis test as known by those skilled in the statistical arts can be substituted for the above-described application of SPRT. In addition, decisioning methods known in the art such as fuzzy logic sets and neural networks can be used to render a decision with regard to the sameness or difference of the estimates and the actual values.

[0083]
In contrast to the restrictions imposed on the above-described BART technique, the location of the apex and the shape and length of the curve forming the similarity domain of the preferred embodiment can be selected to adjust sensitivity to similarity of two values differently for different parts of the Similarity Domain. In so doing, regions of interest for particular sensors can be lensed to enhance sensitivity to similarity, a flexibility not available in prior techniques. Mathematical methods for computing the angles Ω and θ are known in the art, and can include numerical techniques for approximating the angles.

[0084]
FIGS. 7A-7C show examples of particular forms of the similarity operator of the invention in which lensing is applied to the Similarity Domain. The example of FIG. 7A shows a Similarity Domain defined by a polynomial curve 220, in this example a function based on a polynomial including terms of a fourth power, a third power, and a square. FIG. 7B shows yet another example of a particular form of the similarity operator of the invention in which the Similarity Domain is defined by an elliptical arc 222. In this example the elliptical arc 222 forms a convex similarity domain from the perspective of the apex and line segments forming angle Ω. It is also within the scope of the invention to use the concave elliptical arc. An example of a trigonometric Similarity Domain is shown in FIG. 7C, wherein the Similarity Domain curve 224 is defined by a function of the sum of a sine and a cosine and wherein the amplitude of the sine is twice that of the cosine.

[0085]
FIG. 8A shows an example wherein the lensing effect of the similarity operator according to the present invention is enhanced for visible understanding. Although the Similarity Domain distances between the value pairs at arcs 230, 232 are of equal arc length, the pairs are mapped to different areas of the similarity domain 234. Thus, these arcs 230, 232 represent two separate pairs of values being compared for similarity with quite different results. Even though the scalar difference between the values in the two pairs is equal, one pair at arc 230 falls toward a part of the range in the training set (a part of the similarity domain 234) that yields a very narrow angle 236, whereas the other pair at arc 232 falls in a part of the similarity domain 234 that yields a much wider angle 238. The pair at arc 232 with the wider angle 238 will thus have a similarity value lower than the pair at arc 230 with the narrower angle 236, even though both pairs are separated by arcs 230, 232 having the same scalar distance.

[0086]
Turning to FIG. 8B, an alternative approach to the similarity operator of the present invention is shown. Here, values are mapped to similarity domain 234 from the straight baseline 802, which provides the linear scale from an expected overall minimum 804 to an expected overall maximum 806 for the sensor, on which to lay out the sensor value differences 230 and 232 (which are equal differences, but at different parts of the expected range). Mapping sensor value differences 230 and 232 to the similarity domain 234 provides angles 810 and 812. The angles 810 and 812 can be seen to be different, even though the length of the sensor value difference (either 230 or 232) is equal, hence providing the advantageous lensing effect. An angle 810 or 812 is compared to the overall angle Ω to provide a measure of similarity as per the equations above for two sensor values that have a difference of 230 or 232 respectively.

[0087]
This alternative approach is further understood with reference to FIGS. 9A-9D through 12A-12D, which show examples of four additional alternate embodiments with lensing functions being defined according to sinusoidal and polynomial functions for use with the similarity operators. In particular, FIG. 9A shows a cosine function 240 as the lensing function extending the range for Ω beyond 90° and showing equal length sensor value differences 903, 905, 907, and 909 positioned over the cosine lensing function range. Each length 903, 905, 907 and 909 represents a same sensor value difference, but located in a different part of the expected range for the sensors being compared. Each forms a different angle θ with respect to lines drawn to the vertex 244, such as lines 913 and 915. This angle is then compared to the angle Ω shown therein to provide a measure of similarity. The angle Ω is generally defined by the edges of the mapped range, from a minimum expected range value to a maximum expected range value, and in this case was 90°. It can also be seen that the inventive similarity operation can accommodate data points outside the edges of the expected minimums and maximums. FIG. 9B shows the corresponding similarity values generated by smoothly moving the equal length sensor value difference (same as 903, etc., with a length of 0.2) across the entire range. FIG. 9C provides a three-dimensional surface 242 illustrating a range of similarity values for the cosine lensing function 240 for a vertex 244 located at varying heights above the similarity domain, to demonstrate the effect on the similarity curve of FIG. 9B of the vertex height. Generally, an increase in the height of the vertex 244 above the similarity domain 240 flattens out the lensing effect of the curve and drives similarity values higher. FIG. 9B illustrates a slice in surface 242 at a vertex height of 3. FIG. 9D illustrates how changing the expected range angle Ω (in this example from 90° through 180°) results in changing similarity values.
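The lensing effect can be illustrated numerically with a short sketch. It is not a reproduction of FIGS. 9A-9D: it assumes similarity is taken as 1 − θ/Ω, places a hypothetical vertex at (π/2, 3) above a cosine curve on the interval from 0 to π, and uses the hypothetical helper names `angle_at` and `lensed_similarity`. Two sensor value differences of equal length 0.2 are mapped from the baseline onto the curve at different parts of the range.

```python
import math

def angle_at(vertex, p, q):
    """Angle in radians at `vertex` between the rays to points p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    d = abs(a1 - a2)
    return min(d, 2.0 * math.pi - d)

def lensed_similarity(x1, x2, curve, x_min, x_max, vertex):
    """Similarity of two sensor values mapped from the baseline onto `curve`."""
    # Omega: angle subtended at the vertex by the edges of the expected range.
    omega = angle_at(vertex, (x_min, curve(x_min)), (x_max, curve(x_max)))
    # Theta: angle subtended at the vertex by the two mapped values.
    theta = angle_at(vertex, (x1, curve(x1)), (x2, curve(x2)))
    return 1.0 - theta / omega  # assumed form of the similarity measure

vertex = (math.pi / 2.0, 3.0)  # hypothetical vertex above the curve
near = lensed_similarity(0.0, 0.2, math.cos, 0.0, math.pi, vertex)
far = lensed_similarity(2.8, 3.0, math.cos, 0.0, math.pi, vertex)
print(near, far)  # equal differences, different similarity values
```

The pair that maps to the portion of the curve farther from the vertex subtends the smaller angle and therefore scores as more similar, even though both pairs differ by the same amount on the baseline.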

[0088]
FIG. 10A is an example wherein x^{3} is applied as a lensing function to form curve 250 with vertex 252 selected thereabove. FIG. 10B shows the effect of the lensing function curve 250 on similarity values, which corresponds to vertex height −1.2 on surface 254 of FIG. 10C. Thus, the similarity values are plotted in FIG. 10B for the x^{3} lensing function, illustrating a segment at approximately −1.2 as showing a similarity value of 1. This is further illustrated in the three-dimensional surface plot of FIG. 10C, which corresponds to the knee of the x^{3} lensing function and generates a similarity value of 1 for points mapped from the apex to points on the polynomial curve that generate θ=0. The surface 254 of FIG. 10C illustrates the effect of vertex 252 height on similarity values. FIG. 10D illustrates the incremental effect of increasing Ω above 90° to 180°.

[0089]
FIGS. 11A and 12A illustrate analogous curves 260, 270 formed using polynomial lensing functions of x^{2} and x^{4}, respectively. FIGS. 11B-11C and 12B-12C illustrate the similarity value and the effect of a variation in vertex height corresponding to FIGS. 10B-10C. FIGS. 11D and 12D correspondingly illustrate variations in the Ω range above 90° to 180°.

[0090]
Essentially, the similarity values are magnified, or lensed, when a pair of values falls along the similarity domain at a point where it is more orthogonal to the angle rays extending from the apex. The similarity values are diminished where the pair of values falls along the similarity domain at a point where it is more parallel to the rays from the apex. As can be seen, the lensing effect is further increased inversely with apex height, and distance of a portion of the similarity domain curve from the apex or vertex. According to the invention, different similarity curves can be empirically tested to determine which works best for a given sensor. The curve shapes can be numerical approximations (such as a lookup table of values) rather than equations for the curves. Thus, a similarity domain curve can be qualitatively generated by selecting various subranges of the expected range for a sensor to be more or less lensed. This can be done with a smooth curve, using a spline technique to join curve segments together to provide the necessary lensing. Alternatively, turning to FIG. 14, the invention may also be accomplished with a discontinuous similarity domain line 405, such that discontinuities 407 and 408 at the edges of a section 410 provide for a discrete jump in the distance from the vertex 415, and thus a discrete change in the angle, since a given arc length along domain line 405 will generate a smaller angle at a greater distance from the vertex 415.

[0091]
FIG. 13A is a flow diagram of a first preferred embodiment 300 for generating a lensing operator according to the present invention. First, in step 302, sensor data is collected as described hereinabove. Then, in step 304, minimum and maximum vectors are identified for each parameter, for example as is done in FIG. 6. Concurrently, in step 306, a lensing function may be selected. Then, in step 308, using the min/max values provided in step 304, a Similarity Domain surface is generated based on the lensing function selected in step 306. Typically, the lensing surface is generated by identifying an origin with respect to the min and max values and then generating curves to define the surface based on the origin and min/max values, each of the curves being generated with reference to a selected apex height. Then, any well known smoothing function may be applied to the curves to generate the surface. In step 310 the surface is stored for subsequent system monitoring, which begins in step 312. For system monitoring, in step 314, an apex height is selected interactively. So, finally, in step 316 the Similarity Operator is generated from the apex height, and throughout monitoring, different apex heights may be selected to vary the lensing and to vary the view provided to an operator monitoring system operation.

[0092]
FIG. 13B shows an alternate embodiment 320 wherein, instead of varying apex height, viewing angle is varied. All steps except step 322 are identical to those of FIG. 13A and so are labeled identically. Thus, in step 322 the operator is allowed to select different viewing angles, and in step 316 the view of system operation is provided based on that selected viewing angle. In both embodiments, snapshots are taken of the monitored system and compared against training set vectors using the selected lensing Similarity Operator to provide enhanced system modeling and to facilitate better understanding of the system's current operating state.

[0093]
Thus, the advantage afforded by lensing is that focus can be directed to different regions of interest in a particular range for a given sensor when performing a similarity determination between a current state vector and a prior known expected state vector. Using this similarity determination, an estimated state vector can be computed for a real-time system that is being monitored and modeled using MSET or the like. The model performance can be honed for improved model estimates using the improved class of similarity operators of the present invention.

[0094]
Further, the similarity operation of the present invention is rendered particularly nonlinear and adaptive. The present invention can be used in system state classification, system state alarm notification, system virtual parameter generation, system component end of life determination and other techniques where an empirical model is useful. The present invention overcomes the above restrictions of the prior art methods by providing more flexibility to tweak and improve modeling fidelity.

[0095]
It should be appreciated that a wide range of changes and modifications may be made to the embodiments of the invention as described herein. Thus, it is intended that the foregoing detailed description be regarded as illustrative rather than limiting and that the following claims, including all equivalents, are intended to define the scope of the invention.