WO2017216647A1 - Health monitoring and prognostics of a system - Google Patents

Health monitoring and prognostics of a system Download PDF

Info

Publication number
WO2017216647A1
WO2017216647A1 PCT/IB2017/051621
Authority
WO
WIPO (PCT)
Prior art keywords
instance
time series
test
train
reconstruction error
Prior art date
Application number
PCT/IB2017/051621
Other languages
French (fr)
Inventor
Pankaj Malhotra
Vishnu TV
Anusha RAMAKRISHNAN
Gaurangi ANAND
Lovekesh Vig
Puneet Agarwal
Gautam Shroff
Original Assignee
Tata Consultancy Services Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Limited filed Critical Tata Consultancy Services Limited
Publication of WO2017216647A1 publication Critical patent/WO2017216647A1/en

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00Testing or monitoring of control systems or parts thereof
    • G05B23/02Electric testing or monitoring
    • G05B23/0205Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • G06F2218/16Classification; Matching by matching signal segments
    • G06F2218/20Classification; Matching by matching signal segments by applying autoregressive analysis

Definitions

  • the embodiments herein generally relate to health monitoring and prognostics of a system, and, more particularly, sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system.
  • Time-based maintenance of complex machines or systems leads to high maintenance costs and high downtime if a machine breaks down before the scheduled maintenance date.
  • Prediction models have been used to learn models of normal behavior and then prediction errors are used to measure the health of a machine at any given time. These models assume that time series data is predictable which may not hold true for real-world applications with manual controls and unmonitored environmental conditions or loads leading to inherently unpredictable time series data.
  • Most models are also unable to capture complex non-linear dependencies between sensors and long term temporal correlations.
  • Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
  • Systems and methods of the present disclosure provide a health index that summarizes the health of a monitored system at any point in time.
  • the health index may be used for detecting anomalous behavior in a system, health monitoring, and prognostics for condition based maintenance.
  • the method of the present disclosure comprises learning an unsupervised model of normal or healthy behavior using multivariate time series data.
  • the model for healthy behavior is then applied to unseen multivariate time series data to predict the health of the monitored system.
  • the system of the present disclosure does not rely on domain knowledge to estimate the health of the monitored machine.
  • a sequence to sequence mapper is employed to capture long term temporal correlations as well as complex non-linear dependencies between multiple sensors.
  • the model does not assume that the time series data is predictable.
  • the method works well for predictable as well as unpredictable time series data.
  • RUL Remaining Useful Life
  • a processor implemented method comprising: receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstructing, by the one or more hardware processors, from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series from the test set.
  • a system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
  • a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
  • the one or more hardware processors are further configured to learn a sequence to sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system; identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and training the sequence to sequence mapper based model to reconstruct the time series data in the healthy train set and generate the learnt sequence to sequence mapper based model.
  • the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
  • the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
  • the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
  • the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of: (i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curve(s) of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and (iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
  • RUL Remaining Useful Life
  • FIG.1 illustrates an exemplary block diagram of a system for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure
  • FIG.2 illustrates an exemplary flow diagram of a method for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure
  • FIG.3 illustrates Long Short Term Memory based Encoder Decoder (LSTM-ED) inference steps for input {z1, z2, z3} to predict {z'1, z'2, z'3}, as known in the art;
  • LSTM- ED Long Short Term Memory based Encoder Decoder
  • FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure
  • FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance;
  • FIG.6 illustrates a graphical illustration of reconstruction error versus fraction of total life passed, obtained from an LSTM-ED model, in accordance with an embodiment of the present disclosure
  • FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED, LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure
  • FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models for Turbofan engine dataset, in accordance with an embodiment of the present disclosure
  • FIG.8B illustrates standard deviation, maximum-minimum and absolute error of the RULs considered for estimating the final RUL w.r.t HI at last cycle for Turbofan engine dataset, in accordance with an embodiment of the present disclosure
  • FIG.9A and FIG.9E illustrate reconstruction errors pertaining to material 1 and material 2 respectively and FIG.9B through FIG.9D and FIG.9F through FIG.9H illustrate histograms of prediction errors, pertaining to material 1 and material 2 respectively, w.r.t cycles passed, for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure;
  • FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively, for milling machine dataset, in accordance with an embodiment of the present disclosure
  • FIG.11 illustrates pointwise reconstruction errors for last 30 days before maintenance for pulverizer mill dataset, in accordance with an embodiment of the present disclosure.
  • FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal and anomalous sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively, in accordance with an embodiment of the present disclosure.
  • ECG electrocardiogram
  • health degradation curve may not necessarily follow a fixed shape
  • time to reach same level of degradation by machines of same specifications is often different
  • each instance has a slightly different initial health or wear
  • v) sensor data till end-of-life is not easily available because in practice, periodic maintenance is performed.
  • HI health index
  • mathematical models of the underlying physical system, fault propagation models and conventional reliability models have also been used for RUL estimation.
  • the present disclosure provides an unsupervised technique to obtain health index (HI) for a monitored system using multi-sensor time series data, which does not make any assumption on the shape of the degradation curve.
  • a sequence to sequence mapper based model such as Long Short Term Memory based Encoder-Decoder (LSTM-ED) is used to learn a model of normal behavior of a monitored system, which is trained to reconstruct multivariate time series data corresponding to normal behavior. Reconstruction error at a point in a time series data is then used to compute HI at that point.
  • LSTM-ED based HI learnt in an unsupervised manner is able to capture degradation in a monitored system; the HI decreases as the system degrades.
  • LSTM-ED based HI can be used to learn a model for RUL estimation instead of relying on domain knowledge, or exponential/linear degradation assumption, while achieving comparable performance.
  • time series data used in the context of the present disclosure refers to either univariate or multivariate time series data pertaining to one or more sensors respectively.
  • FIGS. 1 through 12 where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and method.
  • FIG.1 illustrates an exemplary block diagram of a system 100 for health monitoring and prognostics of a monitored system 200 in accordance with an embodiment of the present disclosure.
  • the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104.
  • the one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory.
  • the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
  • the I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
  • the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
  • the memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM)
  • non-volatile memory such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • ROM read only memory
  • erasable programmable ROM erasable programmable ROM
  • FIG.2 illustrates an exemplary flow diagram of a method 300 for health monitoring and prognostics of the monitored system 200 in accordance with an embodiment of the present disclosure.
  • the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104.
  • step 302 a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of the monitored system 200 is received.
  • one or more time series data, from the test set received at step 302 and pertaining to the at least one test instance is reconstructed by a sequence to sequence mapper based model.
  • a sequence to sequence mapper based model can be a Long Short Term Memory based Encoder-Decoder (LSTM-ED).
  • LSTM units in an LSTM-ED model are recurrent units that use the current input time series data z_t, hidden state activations a_{t-1}, and memory cell activations c_{t-1} to compute hidden state activations a_t at time t.
  • An LSTM unit uses a combination of a memory cell c and three types of gates: input gate i, forget gate f, and output gate o to decide if the input needs to be remembered (using the input gate), when the previous memory needs to be retained (using the forget gate), and when the memory content needs to be output (using the output gate).
  • the values for the input gate i, forget gate f, output gate o, hidden state a, and cell activation c for n LSTM units at time t are computed using the current input z_t, the previous hidden state a_{t-1}, and the memory cell value c_{t-1}, as given by Equations 1-4 herein below.
  • i_t = σ(W_1 z_t + W_2 a_{t-1} + b_i) ... (1)
  • the operations σ and tanh are applied elementwise.
  • z_t ∈ R^p and the gate and state vectors i_t, f_t, o_t, g_t, a_t, c_t ∈ R^n, where p refers to the number of input units (or number of sensors) and n is the number of hidden LSTM units.
  • the encoder and decoder are jointly trained to reconstruct the time series data in reverse order, i.e. the target time series data is [z_t, z_{t-1}, ..., z_1].
  • the learnt sequence to sequence mapper based model is obtained by receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system. A healthy train set of time series data from the train set of time series data that pertains to the healthy behavior of the at least one train instance is identified. The sequence to sequence mapper based model is then trained to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
  • the value z_t at time instance t and the hidden state of the encoder at time t-1 are used to obtain the hidden state of the encoder at time t.
  • during training, the decoder uses z_t as input to obtain its state and then predict z'_{t-1} corresponding to the target z_{t-1}.
  • during inference, the predicted value z'_t is input to the decoder to obtain its state and predict z'_{t-1}.
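  • A minimal illustrative sketch of such an encoder-decoder is given below (in PyTorch, which is not mandated by the present disclosure; the layer sizes, names and the toy training step are assumptions for illustration). The encoder summarizes a subsequence into its final state; the decoder is unrolled for the same number of steps, feeding each prediction back as its next input, and the model is trained to minimize the squared reconstruction error on healthy subsequences. Using the true z_t as decoder input during training (teacher forcing), as described above, is an equally valid choice.

      import torch
      import torch.nn as nn

      class LSTMEncoderDecoder(nn.Module):
          # Reconstructs a window of shape (batch, l, p) of sensor readings from its encoded summary.
          def __init__(self, n_sensors, n_hidden):
              super().__init__()
              self.encoder = nn.LSTM(n_sensors, n_hidden, batch_first=True)
              self.decoder_cell = nn.LSTMCell(n_sensors, n_hidden)
              self.out = nn.Linear(n_hidden, n_sensors)

          def forward(self, z):
              _, (h, c) = self.encoder(z)                  # final encoder state summarizes the window
              h, c = h.squeeze(0), c.squeeze(0)
              recon = []
              for _ in range(z.size(1)):                   # decode in reverse order: z'_l first, z'_1 last
                  pred = self.out(h)                       # predict current target from decoder state
                  recon.append(pred)
                  h, c = self.decoder_cell(pred, (h, c))   # feed prediction back as next decoder input
              recon = torch.stack(recon, dim=1)
              return torch.flip(recon, dims=[1])           # flip back to original time order

      # toy training step on a batch of healthy subsequences (16 windows, length 20, 5 sensors)
      model = LSTMEncoderDecoder(n_sensors=5, n_hidden=30)
      optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
      z = torch.randn(16, 20, 5)
      loss = ((model(z) - z) ** 2).mean()                  # squared reconstruction error
      loss.backward()
      optimizer.step()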
  • a reconstruction error is generated at each time instance of the one or more time series data from the test set.
  • the reconstruction error e_t for a point z_t is given by e_t = ||z_t - z'_t||, the norm of the difference between the actual value z_t and its reconstruction z'_t.
  • s_N is the set of normal training subsequences of length l each.
  • the first few operational cycles can be assumed to correspond to healthy state for any instance.
  • test set and the train set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
  • the dimensionality reduction technique is Principal Component Analysis (PCA).
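  • As an illustration of reusing identical transformation parameters (a sketch only; the array shapes and the choice of three components are assumptions), the reducer below is fitted on the train set and the very same fitted transform is then applied to the test set:

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      train_sensors = rng.normal(size=(1000, 12))         # train readings: points x raw sensors
      test_sensors = rng.normal(size=(200, 12))           # unseen test readings, same sensors

      pca = PCA(n_components=3)                           # p derived sensors; 3 is illustrative
      train_reduced = pca.fit_transform(train_sensors)    # transformation learnt on the train set only
      test_reduced = pca.transform(test_sensors)          # identical transformation parameters reused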
  • the reconstruction error generated in step 306 may be used for various applications.
  • health state of the at least one test instance may be estimated.
  • a degree of anomaly is computed by obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
  • a healthy time series data is divided into four sets of time series data: s_N, v_N1, v_N2, and t_N, and the anomalous time series data into two sets v_A and t_A.
  • the set of sequences s_N is used to learn the LSTM encoder-decoder reconstruction model.
  • the set v_N1 is used for early stopping based regularization while training the encoder-decoder model.
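  • A minimal sketch of this scoring step is shown below (assuming, purely for illustration, that errors are z-normalized with statistics from held-out healthy data and that the threshold is taken from a high quantile of healthy scores; the disclosure itself only requires a pre-defined threshold):

      import numpy as np

      rng = np.random.default_rng(1)
      healthy_errors = np.abs(rng.normal(0.0, 0.1, size=500))   # e_t on held-out healthy data (v_N1)
      test_errors = np.abs(rng.normal(0.05, 0.1, size=100))     # e_t on the test set

      mu, sigma = healthy_errors.mean(), healthy_errors.std()
      score = lambda e: (e - mu) / sigma                         # normalized error -> anomaly score

      tau = np.quantile(score(healthy_errors), 0.99)             # pre-defined threshold
      anomalous = score(test_errors) > tau                       # classify each point / subsequence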
  • the reconstruction error generated in step 306 may be used to generate health behavior trend for the at least one test instance. If the reconstruction error shows an increasing trend, the health of the monitored system 200 may be deemed to be deteriorating.
  • the reconstruction error may be used to estimate Remaining Useful Life (RUL) of the at least one test instance.
  • health index (HI) is obtained by training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance and comparing with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
  • an HI curve is the complete sequence of HI values over all time instances. For instance, if the length of the time series is 10, then each of the 10 time instances would be associated with an HI value. The 10 HI values form an HI curve.
  • let H^(u) = [h_1^(u), h_2^(u), ..., h_L^(u)] represent the HI curve for instance u, where each point h_t^(u) is an HI value.
  • a linear regression model with parameters θ ∈ R^p and θ_0 ∈ R computes the HI h_t^(u) = θ^T z_t^(u) + θ_0 from the derived sensor readings z_t^(u) ∈ R^p at time t for instance u.
  • the parameters θ and θ_0 are estimated using Ordinary Least Squares methods.
  • for the target HI curves, the starting and ending fractions of cycles are assigned HI values of 1 and 0, respectively.
  • Another possible assumption is: assume target HI values of 1 and 0 for data corresponding to healthy condition and failure conditions, respectively. Unlike the exponential HI curve which uses the entire time series of sensor readings, the sensor readings corresponding to only these points are used to learn the regression model.
  • the estimates θ and θ_0 based on the target HI curves for train instances are used to obtain the final HI curves for all the train instances and a new test instance for which RUL is to be estimated.
  • the HI curves thus obtained are used to estimate the RUL for the test instance based on similarity of train and test HI curves.
  • reconstruction error curves pertaining to the at least one test instance are generated and compared with reconstruction error curves pertaining to the at least one train instance to estimate the RUL.
  • each point in the original time series data for a train instance is predicted as many times as the number of subsequences it is part of (l times for each point, except for points z_t with t < l or t > L-l, which are predicted fewer times).
  • An average of all the predictions for a point is taken to be final prediction for that point.
  • the difference in actual and predicted values for a point is used as an un-normalized HI for that point.
  • the target HI values thus obtained for all train instances are used to obtain the estimates θ and θ_0. Apart from the normalized reconstruction error, the normalized squared reconstruction error is also considered to obtain the target HI values, such that large reconstruction errors imply a much smaller HI value.
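  • The following sketch illustrates turning pointwise reconstruction errors into target HI values and fitting the linear regression by ordinary least squares (the min-max normalization and the "1 - error" mapping are illustrative assumptions consistent with the LR-ED1/LR-ED2 variants discussed later, not a prescribed formula):

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(2)
      Z = rng.normal(size=(500, 3))                  # derived (e.g. PCA) sensor readings, train points
      recon_err = np.abs(rng.normal(size=500))       # pointwise reconstruction errors from the model

      e = (recon_err - recon_err.min()) / (recon_err.max() - recon_err.min())
      target_hi = 1.0 - e                            # LR-ED1 style target; use 1.0 - e**2 for LR-ED2 style

      lr = LinearRegression().fit(Z, target_hi)      # OLS estimates of theta and theta_0
      hi_curve = lr.predict(Z)                       # an HI value for every time instance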
  • the HI curve of the at least one test instance is generated and compared with the HI curves in a repository of the HI curves of the at least one train instance to estimate the RUL.
  • FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure.
  • the HI curve (online HI curve) for a test instance u* is compared to the HI curves (offline HI curves) of all the train instances u ∈ U.
  • the test instance and train instance may take different number of cycles to reach the same degradation level (HI value).
  • FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance.
  • the time-lag which corresponds to minimum Euclidean distance between the HI curves of the train and test instance is shown.
  • the number of remaining cycles for the train instance after the last cycle of the test instance gives the RUL estimate for the test instance.
  • Let u* be a test instance and u be a train instance.
  • the following scenarios for curve matching based RUL estimation are given due consideration:
  • the initial health of an instance varies depending on various factors such as the inherent inconsistencies in the manufacturing process.
  • the initial health is assumed to be close to 1.
  • the HI values for an instance are divided by the average of its first few HI values (e.g. first 5% cycles).
  • a time-lag t is allowed such that the HI values of the test instance u* may be close to the HI values of the train instance starting at time-lag t, i.e. close to H^(u)(t, L^(u*)), for t ≤ τ (refer Equations 11-13). This takes care of instance-specific variances in the degree of initial wear and degradation evolution.
  • the HI curve H^(u*) may have high similarity with H^(u)(t, L^(u*)) for multiple values of time-lag t, wherein L^(u*) refers to the length of the test instance.
  • multiple RUL estimates are considered for u* based on the total life of u, rather than considering only the RUL estimate corresponding to the time-lag t with minimum Euclidean distance between the curves H^(u*) and H^(u)(t, L^(u*)).
  • the multiple RUL estimates corresponding to each time-lag are assigned weights proportional to the similarity of the curves to get the final RUL estimate (refer Equation 13).
  • Non-monotonic HI: due to inherent noise in sensor readings, the HI curves obtained using LR are non-monotonic. To reduce the noise in the estimates of HI, moving average smoothing, as known in the art, is used.
  • the parameter α decides the number of RUL estimates (one per admissible time-lag t) to be considered to get the final RUL estimate.
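  • A compact sketch of this curve-matching step is given below (the exponential similarity, the normalization by the first few HI values, and the parameter names max_lag, alpha and lam are assumptions standing in for Equations 11-13, which are not reproduced here):

      import numpy as np

      def normalize_initial_health(hi, frac=0.05):
          # divide by the average of the first few HI values so the initial health is close to 1
          k = max(1, int(frac * len(hi)))
          return hi / hi[:k].mean()

      def weighted_rul_estimate(test_hi, train_hi, max_lag, alpha, lam=0.5):
          # match the test HI curve against one train HI curve over admissible time-lags
          test_hi, train_hi = normalize_initial_health(test_hi), normalize_initial_health(train_hi)
          L_test, L_train = len(test_hi), len(train_hi)
          ruls, weights = [], []
          for t in range(min(max_lag, L_train - L_test) + 1):
              d = np.sum((test_hi - train_hi[t:t + L_test]) ** 2) / L_test   # Euclidean distance
              s = np.exp(-d / lam)                                           # distance -> similarity
              if s >= alpha:                                                 # keep sufficiently similar lags
                  ruls.append(L_train - t - L_test)    # remaining cycles of the train instance
                  weights.append(s)
          return None if not ruls else float(np.average(ruls, weights=weights))

  In practice such estimates are aggregated over all train instances in the repository and capped at the maximum predicted RUL R_max mentioned below.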
  • the method of the present disclosure is evaluated on two publicly available datasets: C-MAPSS Turbofan Engine Dataset and Milling Machine Dataset, and a real world dataset from a pulverizer mill.
  • C-MAPSS Turbofan Engine Dataset and Milling Machine Dataset a real world dataset from a pulverizer mill.
  • RUL estimation performance metrics are used to measure the efficacy of the method (refer paragraph 060).
  • the pulverizer mill undergoes repair on a time-based schedule (around one year), and therefore ground truth in terms of actual RUL is not available.
  • a comparison is drawn between health index (HI) and the cost of maintenance of the mills.
  • LR-Lin and LR-Exp models assume linear and exponential forms for the target HI curves, respectively.
  • LR-ED1 and LR-ED2 use the normalized reconstruction error and the normalized squared reconstruction error as the target HI (refer paragraph 052), respectively.
  • S Timeliness Score
  • A Accuracy
  • MAE Mean Absolute Error
  • MSE Mean Squared Error
  • MAPE1 and MAPE2 Mean Absolute Percentage Error
  • Δ^(u*) = R̂^(u*) - R^(u*) denotes the error between the estimated RUL (R̂^(u*)) and the actual RUL (R^(u*)).
  • the score S used to measure the performance of a model penalizes late predictions more heavily than early predictions. The lower the value of S, the better is the performance.
  • a prediction is considered a false positive (FP) if Δ^(u*) < -τ_1, and a false negative (FN) if Δ^(u*) > τ_2.
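  • The exact scoring function is not reproduced above; the sketch below uses the asymmetric exponential score commonly reported for turbofan RUL benchmarks (the functional form and the defaults τ_1 = 13, τ_2 = 10 are assumptions from that literature, not taken from the present disclosure):

      import numpy as np

      def timeliness_score(rul_pred, rul_true, tau1=13.0, tau2=10.0):
          # late predictions (delta >= 0) are penalized more heavily than early ones (delta < 0)
          delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
          return float(np.sum(np.where(delta < 0,
                                       np.exp(-delta / tau1) - 1.0,
                                       np.exp(delta / tau2) - 1.0)))

      def fp_fn_counts(rul_pred, rul_true, tau1=13.0, tau2=10.0):
          # FP: estimate too early (delta < -tau1); FN: estimate too late (delta > tau2)
          delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
          return int((delta < -tau1).sum()), int((delta > tau2).sum())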
  • Model learning and parameter selection: 80 engines are randomly selected for training the LSTM-ED model and estimating the parameters θ and θ_0 of the LR model (refer Equation 8). The remaining 20 training instances are used as a validation set for selecting the parameters. The trajectories for these 20 engines are randomly truncated at five different locations such that five different cases are obtained from each instance. Minimum truncation is 20% of the total life and maximum truncation is 96%. For training LSTM-ED, only the first subsequence of length l for each of the selected 80 engines is used.
  • the parameters, namely the number of principal components p, the number of LSTM units in the hidden layers of the encoder and decoder n, the window/subsequence length l, the maximum allowed time-lag τ, the similarity threshold α (refer Equation 13), the maximum predicted RUL R_max, and the parameter of Equation 11, are estimated using grid search to minimize S on the validation set.
  • initially, the average reconstruction error is small. As the number of cycles passed increases, the reconstruction error increases. This suggests that the reconstruction error can be used as an indicator of the health of a machine.
  • FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED (without using linear regression), LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure.
  • FIG.7A and Table 1 suggest that RUL estimates given by HI from LSTM-ED are fairly accurate.
  • FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models in accordance with an embodiment of the present disclosure. For all the models, it is observed that as the actual RUL increases, the error in predicted values increases. Let R̂^(u*) denote the set of all the RUL estimates considered for a test instance (see Equation 13).
  • FIG.8B illustrates the standard deviation, max-min and absolute error of the elements in this set of RUL estimates, w.r.t. the HI at the last cycle.
  • Milling machine dataset: This data set presents milling tool wear measurements from a lab experiment. Flank wear is measured for 16 cases, with each case having a varying number of runs of varying durations. The wear is measured after runs but not necessarily after every run. The data contains readings for 10 variables (3 operating condition variables, 6 dependent sensors, 1 variable measuring time elapsed until completion of that run). A snapshot sequence of 9000 points during a run for the 6 dependent sensors is provided. It is assumed that each run represents one cycle in the life of the tool. Two operating regimes corresponding to the two types of material being milled are considered, and a different model for each material type is learnt. There are a total of 167 runs across cases, with 109 runs and 58 runs for material types 1 and 2, respectively. Case number 6 of material 2 has only one run, and hence is not considered for the experiments.
  • Model learning and parameter selection: Since the number of cases is small, a leave-one-out method is used for model learning and parameter selection.
  • the first run of each case is considered as normal with sequence length of 9000.
  • An average of the reconstruction error for a run is used to get the target HI for that run/cycle.
  • mean and standard deviation are computed for each sensor and considered for further evaluation.
  • the gap between two consecutive runs is reduced, via linear interpolation, to 1 second (if it is more); as a result HI curves for each case will have a cycle of one second.
  • the tool wear is also interpolated in the same manner and the data for each case is truncated until the point when the tool wear crosses a value of 0.45 for the first time.
  • the target HI from LSTM-ED for the LR model is also interpolated appropriately for learning the LR model.
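  • A small sketch of this interpolation and truncation step is shown below (the variable names and toy values are purely illustrative):

      import numpy as np

      run_times = np.array([0.0, 4.0, 9.0, 15.0])      # elapsed time at the end of each run (toy values)
      run_hi    = np.array([0.95, 0.80, 0.55, 0.30])   # target HI from LSTM-ED for each run
      run_wear  = np.array([0.10, 0.25, 0.40, 0.50])   # measured flank wear for each run

      grid = np.arange(run_times[0], run_times[-1] + 1.0, 1.0)   # one-second cycles
      hi_interp   = np.interp(grid, run_times, run_hi)
      wear_interp = np.interp(grid, run_times, run_wear)

      cut = int(np.argmax(wear_interp >= 0.45))        # first cycle where the wear crosses 0.45
      hi_curve = hi_interp[:cut + 1]                   # truncate the case at that point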
  • FIG.9A and FIG.9E illustrate reconstruction errors from LSTM-ED w.r.t. the fraction of life passed for an exemplary milling machine dataset, pertaining to material 1 and material 2 respectively and
  • FIG.9B through FIG.9D and FIG.9F through 9H show the histograms of prediction errors pertaining to material 1 and material 2 respectively, in accordance with an embodiment of the present disclosure while
  • FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material- 1 and material-2 respectively for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure.
  • the reconstruction error increases with amount of life passed, and hence is an appropriate indicator of health.
  • FIG.9B through FIG.9D, FIG.9F through FIG.9H, FIG.10A and FIG.10B show results based on almost every cycle of the data after interpolation.
  • the performance metrics on the original data points in the data set are summarized in Table 2.
  • PCA1 and LR-ED1 are the best models for material-1 and material-2, respectively. It is observed that the best models of the present disclosure perform well, as depicted in the histograms in FIG.9A through FIG.9H.
  • FIG.9B through 9D and FIG.9F through 9H show the error distributions for different models for the two materials. As can be noted, most of the RUL prediction errors (around 70%) lie in the ranges [-4, 6] and [-3, 1] for material types 1 and 2, respectively. Also, FIG.10A and FIG.10B show predicted and actual RULs for different models for the two materials.
  • the mill is assumed to be healthy for the first 10% of the days of a year between any two consecutive time-based maintenances M_i and M_{i+1}, and the corresponding subsequences are used for learning the LSTM-ED models.
  • This data is divided into training and validation sets.
  • a different LSTM-ED model is learnt after each maintenance.
  • the architecture with minimum average reconstruction error over a validation set is chosen as the best model.
  • the LSTM-ED based reconstruction error for each day is z-normalized using the mean and standard deviation of the reconstruction errors over the sequences in the validation set.
  • FIG.11 illustrates pointwise reconstruction errors for the last 30 days before maintenance for the pulverizer mill dataset, in accordance with an embodiment of the present disclosure. From the results in Table 3 and FIG.11, it is observed that the average reconstruction error E on the last day before M_i is the least, and so is the cost C(M_i) incurred during M_i.
  • N, N_n and N_a represent the number of original sequences, normal subsequences and anomalous subsequences, respectively.
  • the first three datasets are taken from (Chen et al., 2015) whereas the engine dataset is a proprietary one encountered in a real life project.
  • the engine dataset contains data for two different applications: Engine-P where the time series data is quasi-predictable, Engine-NP where the time series data is unpredictable.
  • Engine-P where the time series data is quasi-predictable
  • Engine-NP where the time series data is unpredictable.
  • architectures where both the encoder and decoder have single hidden layer with n LSTM units each are considered.
  • Mini-batch stochastic optimization based on Adam Optimizer (Kingma & Ba, 2014) is used for training the LSTM Encoder-Decoder.
  • FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal (N) and anomalous (A) sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively.
  • Each of the figures represents original sequence, reconstructed sequence and anomaly score as particularly referenced in FIG.12A1 and FIG.12A2 for ease of reference.
  • Power demand dataset contains one univariate time series data with 35,040 readings for power demand recorded over a period of one year. The demand is normally high during the weekdays and low over the weekend. Within a day, the demand is high during working hours and low otherwise (refer FIG.12A1).
  • the original time series was down sampled by 3.
  • the normal and anomalous sequences in FIG.12B1 and FIG.12B2 belong to the TEK17 and TEK14 time series, respectively.
  • Engine dataset contains readings for 12 sensors such as coolant temperature, torque, accelerator (control variable), etc.
  • Engine-P has a discrete external control with two states: 'high' and 'low' .
  • the resulting time series are predictable except at the time-instances when the control variable changes.
  • the external control for Engine-NP can assume any value within a certain range and changes very frequently, and hence the resulting time series are unpredictable.
  • the multivariate time series is reduced to univariate by considering only the first principal component after applying principal component analysis (Jolliffe, 2002). The first component captures 72% of the variance for Engine-P and 61% for Engine-NP.
  • ECG dataset contains quasi-periodic time series (duration of a cycle varies from one instance to another).
  • a subset of the first channel from qtdb/sell02 dataset where the time series contains one anomaly corresponding to a pre-ventricular contraction (refer FIG.12E2) is used for the experimental evaluation.
  • Non-overlapping subsequences with l = 26 were considered after downsampling the original signal by 8 (each subsequence corresponds to approximately 800 ms). Since only one anomaly is present in the dataset, the sets v_N2 and v_A are not created.
  • the best model is chosen based on the minimum reconstruction error on the set v_N1.
  • the best LSTM-AD model gives P, R, F0.05 and TPR/FPR (ratio of True Positive Rate to False Positive Rate) of 0.03, 0.07, 0.03 and 1.9, respectively (for a two hidden layer architecture with 30 LSTM units in each layer and prediction length of 1), owing to the fact that the time series is not predictable and hence a good prediction model could not be learnt, whereas the method of the present disclosure gives P, R, F0.1 score and TPR/FPR of 0.96, 0.18, 0.93 and 7.6, respectively.
  • the hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof.
  • the device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field- programmable gate array (FPGA), or a combination of hardware and software means, e.g.
  • ASIC application-specific integrated circuit
  • FPGA field- programmable gate array
  • the means can include both hardware means and software means.
  • the method embodiments described herein could be implemented in hardware and software.
  • the device may also include software means.
  • the embodiments of the present disclosure may be implemented on different hardware devices, e.g. using a plurality of CPUs.
  • the embodiments herein can comprise hardware and software elements.
  • the embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
  • the functions performed by various modules comprising the system of the present disclosure and described herein may be implemented in other modules or combinations of other modules.
  • a computer- usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the various modules described herein may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer readable medium or other storage device.
  • Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system via a health index (HI). The sequence to sequence mapper learns to reconstruct normal time series behavior, and thereafter uses reconstruction error to estimate the HI. The HI is used for generating health behavior trend, detection of anomalous behavior, and remaining useful life (RUL) pertaining to a monitored system. The present disclosure does not rely on domain knowledge, as in the prior art, when estimating the health index. The HI of the monitored system can be determined irrespective of the predictability of the time series data generated from the monitored system. Likewise, the present disclosure is relevant to time series data of varying nature: predictable, unpredictable, periodic, aperiodic, and quasi-periodic time series; short time series and long time series; and univariate and multivariate time series.

Description

HEALTH MONITORING AND PROGNOSTICS OF A SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority from Indian Patent Application No. 201621020886, filed on June 17th, 2016, the entirety of which is hereby incorporated by reference.
TECHNICAL FIELD
[002] The embodiments herein generally relate to health monitoring and prognostics of a system, and, more particularly, sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system.
BACKGROUND
[003] Time-based maintenance of complex machines or systems leads to high maintenance costs and high downtime if a machine breaks down before the scheduled maintenance date. Most models for automated health monitoring based on time series multi-sensor data received from devices such as engines, vehicles, aircraft, and the like, rely on domain knowledge. Domain knowledge based systems may not be able to capture the complex behavior of machines entirely, may require handcrafted rules, and lack generalization capability to work across domains. Prediction models have been used to learn models of normal behavior, and prediction errors are then used to measure the health of a machine at any given time. These models assume that time series data is predictable, which may not hold true for real-world applications with manual controls and unmonitored environmental conditions or loads leading to inherently unpredictable time series data. Most models are also unable to capture complex non-linear dependencies between sensors and long term temporal correlations.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[005] Systems and methods of the present disclosure provide a health index that summarizes the health of a monitored system at any point in time. The health index may be used for detecting anomalous behavior in a system, health monitoring, and prognostics for condition based maintenance. The method of the present disclosure comprises learning an unsupervised model of normal or healthy behavior using multivariate time series data. The model for healthy behavior is then applied to unseen multivariate time series data to predict the health of the monitored system. The system of the present disclosure does not rely on domain knowledge to estimate the health of the monitored machine. A sequence to sequence mapper is employed to capture long term temporal correlations as well as complex non-linear dependencies between multiple sensors. The model does not assume that the time series data is predictable. The method works well for predictable as well as unpredictable time series data. For estimating Remaining Useful Life (RUL), the method of the present disclosure is able to achieve a performance which is comparable to models which rely on domain knowledge.
[006] In an aspect, there is provided a processor implemented method comprising: receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstructing, by the one or more hardware processors, from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series from the test set.
[007] In another aspect, there is provided a system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
[008] In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
[009] In an embodiment of the present disclosure, the one or more hardware processors are further configured to learn a sequence to sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system; identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and training the sequence to sequence mapper based model to reconstruct the time series data in the healthy train set and generate the learnt sequence to sequence mapper based model.
[010] In an embodiment of the present disclosure, the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
[011] In an embodiment of the present disclosure, the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
[012] In an embodiment of the present disclosure, the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
[013] In an embodiment of the present disclosure, the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of: (i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curve(s) of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and (iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL. [014] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the present disclosure, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[015] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[016] FIG.1 illustrates an exemplary block diagram of a system for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure;
[017] FIG.2 illustrates an exemplary flow diagram of a method for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure;
[018] FIG.3 illustrates Long Short Term Memory based Encoder Decoder (LSTM-ED) inference steps for input {z1, z2, z3} to predict {z'1, z'2, z'3}, as known in the art;
[019] FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure;
[020] FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance;
[021] FIG.6 illustrates a graphical illustration of reconstruction error versus fraction of total life passed, obtained from an LSTM-ED model, in accordance with an embodiment of the present disclosure;
[022] FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED, LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure;
[023] FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models for Turbofan engine dataset, in accordance with an embodiment of the present disclosure;
[024] FIG.8B illustrates standard deviation, maximum-minimum and absolute error of the RULs considered for estimating the final RUL w.r.t HI at last cycle for Turbofan engine dataset, in accordance with an embodiment of the present disclosure;
[025] FIG.9A and FIG.9E illustrate reconstruction errors pertaining to material 1 and material 2 respectively and FIG.9B through FIG.9D and FIG.9F through FIG.9H illustrate histograms of prediction errors, pertaining to material 1 and material 2 respectively, w.r.t cycles passed, for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure;
[026] FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively, for milling machine dataset, in accordance with an embodiment of the present disclosure;
[027] FIG.11 illustrates pointwise reconstruction errors for last 30 days before maintenance for pulverizer mill dataset, in accordance with an embodiment of the present disclosure; and
[028] FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal and anomalous sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively, in accordance with an embodiment of the present disclosure.
[029] It should be appreciated by those skilled in the art that any block diagram herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
DETAILED DESCRIPTION
[030] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[031] The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. [032] It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the preferred, systems and methods are now described.
[033] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[034] Before setting forth the detailed explanation, it is noted that all of the discussion below, regardless of the particular implementation being described, is exemplary in nature, rather than limiting.
[035] Industrial internet has given rise to availability of sensor data from numerous machines belonging to various domains such as agriculture, energy, manufacturing, and the like. Sensor readings can indicate health of machines. This has led to increased business desire to perform maintenance of machines based on their condition rather than following the current industry practice of time-based maintenance. It is also noted that condition based maintenance can lead to significant financial savings. Condition based maintenance can be achieved by building models for prediction of remaining useful life (RUL) of machines, based on their sensor readings. Traditional approach for RUL prediction is based on an assumption that health degradation curves (drawn w.r.t. time) follow a specific shape such as exponential or linear. However, such assumptions do not hold well with real world data sets. Some important challenges in solving prognostics related problems are: i) health degradation curve may not necessarily follow a fixed shape, ii) time to reach same level of degradation by machines of same specifications is often different, iii) each instance has a slightly different initial health or wear, iv) sensor readings, if available, are noisy, v) sensor data till end-of-life is not easily available because in practice, periodic maintenance is performed. Apart from health index (HI) based approach, mathematical models of the underlying physical system, fault propagation models and conventional reliability models have also been used for RUL estimation. The present disclosure provides an unsupervised technique to obtain health index (HI) for a monitored system using multi-sensor time series data, which does not make any assumption on the shape of the degradation curve. A sequence to sequence mapper based model such as Long Short Term Memory based Encoder-Decoder (LSTM-ED) is used to learn a model of normal behavior of a monitored system, which is trained to reconstruct multivariate time series data corresponding to normal behavior. Reconstruction error at a point in a time series data is then used to compute HI at that point. The present disclosure shows that LSTM-ED based HI learnt in an unsupervised manner is able to capture degradation in a monitored system; the HI decreases as the system degrades. Also, LSTM-ED based HI can be used to learn a model for RUL estimation instead of relying on domain knowledge, or exponential/linear degradation assumption, while achieving comparable performance.
[036] The expression "time series data" used in the context of the present disclosure refers to either univariate or multivariate time series data pertaining to one or more sensors respectively.
[037] In the present disclosure, the expressions "normal" and "healthy" may be used interchangeably.
[038] In the present disclosure, the expressions "sequence" and "time series" may be used interchangeably.
[039] Referring now to the drawings, and more particularly to FIGS. 1 through 12, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and method.
[040] FIG.1 illustrates an exemplary block diagram of a system 100 for health monitoring and prognostics of a monitored system 200 in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[041] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[042] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
[043] FIG.2 illustrates an exemplary flow diagram of a method 300 for health monitoring and prognostics of the monitored system 200 in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104.
[044] In an embodiment of the present disclosure, at step 302, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of the monitored system 200 is received.
[045] In an embodiment of the present disclosure, at step 304, one or more time series data, from the test set received at step 302 and pertaining to the at least one test instance is reconstructed by a sequence to sequence mapper based model. In accordance with the present disclosure, an instance of the sequence to sequence mapper based model can be a Long Short Term Memory based Encoder-Decoder (LSTM-ED).
[046] Explanation of the systems and methods of the present disclosure is provided further based on LSTM-ED; however, any sequence to sequence mapper based model may be used in the context of the present disclosure. In an embodiment, LSTM units in an LSTM-ED model are recurrent units that use the current input time series data z_t, hidden state activations a_{t-1}, and memory cell activations c_{t-1} to compute the hidden state activations a_t at time t. An LSTM unit uses a combination of a memory cell c and three types of gates: input gate i, forget gate f, and output gate o to decide if the input needs to be remembered (using the input gate), when the previous memory needs to be retained (using the forget gate), and when the memory content needs to be output (using the output gate). The values for the input gate i, forget gate f, output gate o, hidden state a, and cell activation c for n LSTM units at time t are computed using the current input z_t, the previous hidden state a_{t-1}, and memory cell value c_{t-1} as given by Equations 1-6 herein below.
i_t = σ(W_1 z_t + W_2 a_{t-1} + b_i) --> (1)
f_t = σ(W_3 z_t + W_4 a_{t-1} + b_f) --> (2)
o_t = σ(W_5 z_t + W_6 a_{t-1} + b_o) --> (3)
g_t = tanh(W_7 z_t + W_8 a_{t-1} + b_g) --> (4)
wherein W_i (i = 1, 2, ..., 8) are weight matrices, b_i, b_f, b_o, b_g refer to bias vectors, and g_t refers to an intermediate variable used to obtain c_t.
Here, σ(z) = 1/(1 + e^{-z}) and tanh(z) = 2σ(2z) - 1. The operations σ and tanh are applied elementwise.
Also, z_t ∈ R^p, and all the other parameters i_t, f_t, o_t, g_t, a_t, c_t ∈ R^n, where p refers to the number of input units (or number of sensors), n is the number of hidden LSTM units, and
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t --> (5)
a_t = o_t ⊙ tanh(c_t) --> (6)
wherein ⊙ denotes elementwise multiplication.
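By way of illustration only, a minimal Python/NumPy sketch of a single LSTM step implementing Equations (1)-(6) is given below. The weight shapes, the dictionary layout of the parameters and the random toy values are assumptions made for this example and are not part of the disclosure.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, a_prev, c_prev, W, b):
    # One LSTM step per Equations (1)-(6).
    # z_t: input at time t, shape (p,); a_prev, c_prev: previous hidden/cell states, shape (n,)
    # W: weight matrices W1..W8 (odd-numbered act on z_t, even-numbered on a_prev); b: bias vectors
    i_t = sigmoid(W["W1"] @ z_t + W["W2"] @ a_prev + b["i"])   # input gate, Eq. (1)
    f_t = sigmoid(W["W3"] @ z_t + W["W4"] @ a_prev + b["f"])   # forget gate, Eq. (2)
    o_t = sigmoid(W["W5"] @ z_t + W["W6"] @ a_prev + b["o"])   # output gate, Eq. (3)
    g_t = np.tanh(W["W7"] @ z_t + W["W8"] @ a_prev + b["g"])   # candidate memory, Eq. (4)
    c_t = f_t * c_prev + i_t * g_t                             # cell update, Eq. (5)
    a_t = o_t * np.tanh(c_t)                                   # hidden state, Eq. (6)
    return a_t, c_t

# Toy usage with p = 2 sensors and n = 4 LSTM units (arbitrary values).
rng = np.random.default_rng(0)
p, n = 2, 4
W = {f"W{k}": 0.1 * rng.standard_normal((n, p if k % 2 == 1 else n)) for k in range(1, 9)}
b = {k: np.zeros(n) for k in ("i", "f", "o", "g")}
a, c = np.zeros(n), np.zeros(n)
a, c = lstm_step(rng.standard_normal(p), a, c, W, b)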
[047] Windows of length l are considered to obtain L - l + 1 subsequences for a train instance with L cycles, wherein L is the length of the sequence of the train instance. LSTM-ED is trained to reconstruct the normal (healthy) subsequences of length l from all the training instances. The LSTM encoder learns a fixed length vector representation of the input time series data and the LSTM decoder uses this representation to reconstruct the time series data using the current hidden state and the value predicted at the previous time-step. Given a time series data Z = z_1 z_2 ... z_l, a_t^(E) is the hidden state of the encoder at time t for each t ∈ {1, 2, ..., l}, where a_t^(E) ∈ R^n and n is the number of LSTM units in the hidden layer of the encoder. The encoder and decoder are jointly trained to reconstruct the time series data in reverse order, i.e., the target time series data is [z_l z_{l-1} ... z_1].
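The windowing just described can be sketched as follows; the array layout (cycles along the first axis, sensors along the second) and the toy sizes are assumptions used only for illustration.

import numpy as np

def sliding_subsequences(series, l):
    # Return the L - l + 1 overlapping windows of length l from a (L, p) time series.
    L = series.shape[0]
    if L < l:
        raise ValueError("series shorter than window length l")
    return np.stack([series[j:j + l] for j in range(L - l + 1)])

# Example: a train instance with L = 100 cycles of p = 3 sensors, window length l = 20.
Z = np.random.randn(100, 3)
windows = sliding_subsequences(Z, 20)   # shape (81, 20, 3)
targets = windows[:, ::-1, :]           # reversed windows: target order z_l, ..., z_1
print(windows.shape, targets.shape)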
[048] In an embodiment the learnt sequence to sequence mapper based model is obtained by receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system. A healthy train set of time series data from the train set of time series data that pertains to the healthy behavior of the at least one train instance is identified. The sequence to sequence mapper based model is then trained to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model. FIG.3 illustrates Long Short Term Memory based Encoder-Decoder (LSTM-ED) inference steps for input {z_1, z_2, z_3} with l = 3 to predict {z'_1, z'_2, z'_3}, as known in the art. The value z_t at time instance t and the hidden state a_{t-1}^(E) of the encoder at time t-1 are used to obtain the hidden state a_t^(E) of the encoder at time t. The hidden state a_l^(E) of the encoder at the end of the input sequence is used as the initial state of the decoder such that a_l^(D) = a_l^(E). A linear layer with weight matrix w of size n x p and bias vector b ∈ R^p on top of the decoder is used to compute z'_t = w^T a_t^(D) + b. During training, the decoder uses z_t as input to obtain the state a_{t-1}^(D) and then predict z'_{t-1} corresponding to the target z_{t-1}. During inference, the predicted value z'_t is input to the decoder to obtain a_{t-1}^(D) and predict z'_{t-1}. In an embodiment of the present disclosure, at step 306, a reconstruction error is generated at each time instance of the one or more time series data from the test set. The reconstruction error e_t for a point z_t is given by:
e_t = ||z_t - z'_t|| --> (7)
wherein ||.|| refers to the L2 norm.
One of the goals of the present disclosure is to minimize the cost function E = Σ_{Z ∈ s_N} Σ_{t=1}^{l} (e_t)², where s_N is the set of normal training subsequences of length l each. In accordance with the present disclosure, for training, only the subsequences which correspond to perfect health of an instance are considered. In an embodiment, the first few operational cycles can be assumed to correspond to a healthy state for any instance.
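The following PyTorch sketch shows one possible way to train such an encoder-decoder to reconstruct healthy windows in reverse order while minimizing the summed squared reconstruction error E. It is a simplified, assumed implementation (single-layer LSTMCell encoder and decoder, toy data, arbitrary hyper-parameters) and not the exact architecture of the disclosure.

import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    # Sketch of an LSTM encoder-decoder that reconstructs a window in reverse order.
    def __init__(self, n_sensors, n_hidden):
        super().__init__()
        self.encoder = nn.LSTMCell(n_sensors, n_hidden)
        self.decoder = nn.LSTMCell(n_sensors, n_hidden)
        self.linear = nn.Linear(n_hidden, n_sensors)   # plays the role of w and b on top of the decoder

    def forward(self, z, teacher_forcing=True):
        # z: (batch, l, p); returns the reconstruction re-ordered to the original time order.
        batch, l, p = z.shape
        a = z.new_zeros(batch, self.encoder.hidden_size)
        c = z.new_zeros(batch, self.encoder.hidden_size)
        for t in range(l):                    # encode z_1 ... z_l
            a, c = self.encoder(z[:, t], (a, c))
        out = []
        z_hat = self.linear(a)                # first prediction corresponds to z_l
        out.append(z_hat)
        for t in range(l - 1, 0, -1):         # predict z_{l-1}, ..., z_1
            inp = z[:, t] if teacher_forcing else z_hat
            a, c = self.decoder(inp, (a, c))
            z_hat = self.linear(a)
            out.append(z_hat)
        return torch.stack(out[::-1], dim=1)

# Toy training loop on healthy windows (shapes and hyper-parameters are assumed).
model = LSTMEncoderDecoder(n_sensors=3, n_hidden=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
healthy = torch.randn(64, 20, 3)              # 64 healthy subsequences of length l = 20
for epoch in range(5):
    opt.zero_grad()
    recon = model(healthy)
    loss = ((recon - healthy) ** 2).sum()     # E: sum of squared pointwise errors
    loss.backward()
    opt.step()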
[049] In an embodiment the test set and the train set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters. In accordance with one embodiment, the dimensionality reduction technique is Principal Component Analysis (PCA).
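As a sketch of the identical-transformation requirement, the snippet below fits PCA on the train readings only and reuses the same fitted components for the test readings. The use of scikit-learn, 21 sensors and 3 components here is an assumption for illustration.

import numpy as np
from sklearn.decomposition import PCA

train_readings = np.random.randn(500, 21)   # e.g., readings from 21 dependent sensors (toy data)
test_readings = np.random.randn(200, 21)

pca = PCA(n_components=3)
train_reduced = pca.fit_transform(train_readings)   # transformation parameters learnt on train data only
test_reduced = pca.transform(test_readings)         # identical parameters applied to the test set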
[050] The reconstruction error generated in step 306 may be used for various applications. In an embodiment, the health state of the at least one test instance may be estimated. For estimating the health state, a degree of anomaly is computed by obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold τ thereof to classify anomalous or normal subsequences in the test set. In an embodiment, a healthy time series data is divided into four sets of time series data: s_N, v_N1, v_N2, and t_N, and the anomalous time series data into two sets v_A and t_A. The set of sequences s_N is used to learn the LSTM encoder-decoder reconstruction model. The set v_N1 is used for early stopping based regularization while training the encoder-decoder model. The reconstruction error vector for z_t is given by e_t = |z_t - z'_t|. The error vectors for the points in the sequences in set v_N1 are used to estimate parameters μ (mean vector) and Σ (covariance matrix) of a Normal distribution N(μ, Σ) using Maximum Likelihood Estimation. Then, for any point z_t, the anomaly score a_t = (e_t - μ)^T Σ^{-1} (e_t - μ). In a supervised setting, if a_t > τ, a point in a sequence can be predicted to be "anomalous", otherwise "normal" or "healthy". When anomalous sequences are available during training, a threshold τ over the likelihood values can be learnt to maximize F_β = (1 + β²) × P × R / (β²P + R), where β > 0, P is precision, R is recall, "anomalous" is the positive class and "normal" is the negative class. If a subsequence contains an anomalous pattern, the actual label for the entire subsequence is considered to be "anomalous". This is helpful in many real-world applications where the exact position of the anomaly is not known. For example, for an engine dataset (refer paragraph 077), the only information available is that the machine was repaired on a particular date. The last few operational runs prior to repair are assumed to be anomalous and the first few operational runs after the repair are assumed to be normal. It is assumed that β < 1 since the fraction of actual anomalous points in a sequence labeled as anomalous may not be high, and hence lower recall is expected. The parameters τ and n are chosen with maximum F_β score on the validation sequences in v_N2 and v_A.
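A NumPy sketch of the anomaly scoring described above is given below: the error vectors from v_N1 are used to estimate μ and Σ, and the score a_t = (e_t - μ)^T Σ^{-1} (e_t - μ) is compared against a threshold τ. The toy data, the pseudo-inverse used for numerical stability and the example value of τ are assumptions.

import numpy as np

def fit_error_distribution(errors_vn1):
    # Maximum Likelihood estimates of the mean and covariance of error vectors from set v_N1.
    mu = errors_vn1.mean(axis=0)
    cov = np.cov(errors_vn1, rowvar=False, bias=True)
    return mu, np.linalg.pinv(cov)

def anomaly_scores(errors, mu, cov_inv):
    # a_t = (e_t - mu)^T Sigma^{-1} (e_t - mu) for each error vector e_t.
    d = errors - mu
    return np.einsum("ti,ij,tj->t", d, cov_inv, d)

# Toy usage; in practice tau would be tuned on v_N2 and v_A to maximize the F_beta score.
errors_vn1 = np.abs(np.random.randn(300, 3))     # |z_t - z'_t| on healthy validation points
mu, cov_inv = fit_error_distribution(errors_vn1)
test_errors = np.abs(np.random.randn(50, 3))
scores = anomaly_scores(test_errors, mu, cov_inv)
tau = 5.0                                        # assumed threshold for illustration
labels = np.where(scores > tau, "anomalous", "normal")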
[051] In another embodiment, the reconstruction error generated in step 306 may be used to generate health behavior trend for the at least one test instance. If the reconstruction error shows an increasing trend, the health of the monitored system 200 may be deemed to be deteriorating.
[052] In yet another embodiment, the reconstruction error may be used to estimate Remaining Useful Life (RUL) of the at least one test instance. In an embodiment, a health index (HI) is obtained by training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance, and comparing with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL. In accordance with the present disclosure, the HI curve is the complete sequence of HI values over all time instances. For instance, if the length of the time series is 10, then each of the 10 time instances would be associated with an HI value; the 10 HI values form an HI curve. Let H^(u) = [h_1^(u) h_2^(u) ... h_{L^(u)}^(u)] represent the HI curve for instance u, where each point h_t^(u) ∈ R and L^(u) is the total number of cycles. It is assumed that 0 ≤ h_t^(u) ≤ 1, such that when u is in perfect health h_t^(u) = 1, and when u performs below an acceptable level (e.g. the instance is about to fail), h_t^(u) = 0. The method of the present disclosure constructs a mapping f_θ : z_t^(u) → h_t^(u) such that a Linear Regression (LR) model may be expressed as:
h_t^(u) = θ^T z_t^(u) + θ_0 --> (8)
where θ ∈ R^p, θ_0 ∈ R, which computes HI from the derived sensor readings z_t^(u) ∈ R^p at time t for instance u. Given the target HI curves for the training instances, the parameters θ and θ_0 are estimated using Ordinary Least Squares methods.
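A minimal NumPy sketch of the Ordinary Least Squares fit of Equation (8) is shown below; the synthetic data and the helper names are assumptions used only for illustration.

import numpy as np

def fit_linear_hi_model(Z_train, h_target):
    # Ordinary Least Squares fit of h_t = theta^T z_t + theta_0 (Equation 8).
    # Z_train: (N, p) derived sensor readings; h_target: (N,) target HI values.
    X = np.hstack([Z_train, np.ones((Z_train.shape[0], 1))])   # extra column for theta_0
    coef, *_ = np.linalg.lstsq(X, h_target, rcond=None)
    return coef[:-1], coef[-1]                                 # theta, theta_0

def predict_hi(Z, theta, theta_0):
    return Z @ theta + theta_0

# Toy usage with assumed data (p = 3 derived readings).
Z_train = np.random.randn(1000, 3)
h_target = np.random.rand(1000)
theta, theta_0 = fit_linear_hi_model(Z_train, h_target)
hi_curve = predict_hi(np.random.randn(120, 3), theta, theta_0)   # HI curve of one instance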
[053] The parameters θ and θ_0 of the abovementioned Linear Regression (LR) model are usually estimated by assuming a mathematical form for the target H^(u), with an exponential function being the most common and successfully employed form for the target HI curve, which assumes that the HI at time t for instance u follows an exponential degradation form of the type h_t^(u) = 1 - exp(·), as given by Equation (9), for t ∈ {β'·L^(u), ..., (1 - β')·L^(u)}, 0 < β' < 1.
The starting and ending β' fraction of cycles are assigned HI values of 1 and 0, respectively. Another possible assumption is to assume target HI values of 1 and 0 for the data corresponding to healthy conditions and failure conditions, respectively. Unlike the exponential HI curve which uses the entire time series of sensor readings, the sensor readings corresponding to only these points are used to learn the regression model. The estimates θ and θ_0 based on the target HI curves for the train instances are used to obtain the final HI curves for all the train instances and for a new test instance for which the RUL is to be estimated. The HI curves thus obtained are used to estimate the RUL for the test instance based on the similarity of the train and test HI curves. In another embodiment, reconstruction error curves pertaining to the at least one test instance are generated and compared with reconstruction error curves pertaining to the at least one train instance to estimate the RUL. A point z_t in a time series data Z is part of multiple overlapping subsequences, and is therefore predicted by multiple subsequences Z(j, l) corresponding to j = t - l + 1, t - l + 2, ..., t. Hence each point in the original time series data for a train instance is predicted as many times as the number of subsequences it is part of (l times for each point, except for points z_t with t < l or t > L - l which are predicted a fewer number of times). An average of all the predictions for a point is taken to be the final prediction for that point. The difference in the actual and predicted values for a point is used as an un-normalized HI for that point. The error e_t^(u) is normalized to obtain the target HI as:
h_t^(u) = (e_M^(u) - e_t^(u)) / (e_M^(u) - e_m^(u)) --> (10)
where e_M^(u) and e_m^(u) are the maximum and minimum values of the reconstruction error for instance u over t = 1, 2, ..., L^(u), respectively. The target HI values thus obtained for all train instances are used to obtain the estimates θ and θ_0. Apart from the error e_t^(u), the squared error (e_t^(u))² is also considered to obtain the target HI values such that large reconstruction errors imply a much smaller HI value.
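The normalization of the reconstruction errors of a train instance into target HI values, and its squared-error variant, can be sketched as follows; the example error values are assumed.

import numpy as np

def target_hi_from_errors(errors, squared=False):
    # Map per-point reconstruction errors of one train instance to target HI values in [0, 1];
    # a large reconstruction error maps to a low HI. squared=True uses the squared-error variant.
    e = np.square(errors) if squared else np.asarray(errors, dtype=float)
    e_max, e_min = e.max(), e.min()
    return (e_max - e) / (e_max - e_min)

# Example: pointwise errors over the cycles of one train instance (assumed values).
errors = np.array([0.10, 0.12, 0.20, 0.35, 0.60, 0.90])
print(target_hi_from_errors(errors))            # starts near 1, ends at 0
print(target_hi_from_errors(errors, squared=True))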
[054] In yet another embodiment, the HI curve of the at least one test instance is generated and compared with the HI curves in a repository of the HI curves of the at least one train instance to estimate the RUL. FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure. The HI curve (online HI curve) for a test instance u* is compared to the HI curves (offline HI curves) of all the train instances u ∈ U. The test instance and a train instance may take different numbers of cycles to reach the same degradation level (HI value). FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein the HI curve for a test instance is matched with the HI curve for a train instance. The time-lag which corresponds to the minimum Euclidean distance between the HI curves of the train and test instance is shown. For a given time-lag, the number of remaining cycles for the train instance after the last cycle of the test instance gives the RUL estimate for the test instance. Let u* be a test instance and u be a train instance. In accordance with an embodiment of the present disclosure, the following scenarios for curve matching based RUL estimation are given due consideration:
[055] Varying initial health across instances: The initial health of an instance varies depending on various factors such as the inherent inconsistencies in the manufacturing process. The initial health is assumed to be close to 1. In order to ensure this, the HI values for an instance are divided by the average of its first few HI values (e.g. first 5% of cycles). Also, while comparing the HI curves H^(u*) and H^(u), a time-lag t is allowed such that the HI values of u* may be close to the HI values of H^(u)(t, L^(u*)), with t ≤ τ' (refer Equations 11-13). This takes care of instance specific variances in the degree of initial wear and degradation evolution.
[056] Multiple time lags with high similarity: The HI curve H^(u*) may have high similarity with H^(u)(t, L^(u*)) for multiple values of the time-lag t, wherein L^(u*) refers to the length of the test instance. Multiple RUL estimates are considered for u* based on the total life of u, rather than considering only the RUL estimate corresponding to the time-lag t with the minimum Euclidean distance between the curves H^(u*) and H^(u)(t, L^(u*)). The multiple RUL estimates corresponding to each time-lag are assigned weights proportional to the similarity of the curves to get the final RUL estimate (refer Equation 13).
[057] Non-monotonic HI: Due to inherent noise in sensor readings, the HI curves obtained using LR are non-monotonic. To reduce the noise in the estimates of HI, moving average smoothing, as known in the art, is used.
[058] Maximum value of RUL estimate: When an instance is in very good health or has been operational for only a few cycles, estimating RUL is difficult. The maximum RUL estimate for any test instance is limited to R_max. Also, the maximum RUL estimate for the instance u* based on HI curve comparison with instance u is limited by L^(u) - L^(u*). This implies that the maximum RUL estimate for any test instance u* will be such that the total length R̂^(u*) + L^(u*) ≤ L_max, where L_max is the maximum length for any training instance available. When very few cycles of a test instance are available, it becomes difficult to predict RUL beyond a certain point.
[059] The similarity between the HI curves of test instance u* and train instance u with time-lag t is defined as:
s(u*, u, t) = exp(-d²(u*, u, t) / λ) --> (11)
where
d²(u*, u, t) = Σ_{i=1}^{L^(u*)} (h_i^(u*) - h_{t+i}^(u))² --> (12)
is the squared Euclidean distance between H^(u*)(1, L^(u*)) and H^(u)(t, L^(u*)), and λ > 0, t ∈ {1, 2, ..., τ'}, t + L^(u*) ≤ L^(u). Here, λ controls the notion of similarity: a small value of λ would imply a large difference in s even when d is not large. The RUL estimate for u* based on the HI curve for u and for time-lag t is given by R̂^(u*)(u, t) = L^(u) - t - L^(u*). The estimate R̂^(u*) for R^(u*) is given by:
R̂^(u*) = Σ s(u*, u, t) · R̂^(u*)(u, t) / Σ s(u*, u, t) --> (13)
where the summation is over only those combinations of u and t which satisfy s(u*, u, t) ≥ α · s_max, where s_max = max_{u,t} {s(u*, u, t)}, 0 < α < 1. It is to be noted that the parameter α decides the number of RUL estimates R̂^(u*)(u, t) to be considered to get the final RUL estimate R̂^(u*). The number of RUL estimates R̂^(u*)(u, t) considered for computing R̂^(u*) can be used as a measure of confidence in the prediction, which is useful in practical applications (for instance, refer paragraph 066). During the initial stages of an instance's usage, when it is in good health and a fault has still not appeared, estimating RUL is tough, as it is difficult to know beforehand how exactly the fault would evolve over time once it appears.
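A NumPy sketch of the HI curve matching of Equations (11)-(13) is given below. It uses 0-based time-lags, reuses the turbofan hyper-parameter values reported later merely as defaults, and runs on synthetic HI curves; these choices are assumptions for illustration and not a definitive implementation.

import numpy as np

def rul_estimate(test_hi, train_his, lam=0.0005, tau_prime=40, alpha=0.87, r_max=None):
    # Similarity-weighted RUL estimate by HI curve matching (sketch of Equations (11)-(13)).
    L_star = len(test_hi)
    sims, ruls = [], []
    for train_hi in train_his:
        L = len(train_hi)
        if L <= L_star:
            continue                                   # train curve too short to match
        max_lag = min(tau_prime, L - L_star)
        for t in range(max_lag + 1):                   # 0-based lag; the filing uses t in {1, ..., tau'}
            d2 = np.sum((test_hi - train_hi[t:t + L_star]) ** 2)    # squared Euclidean distance
            sims.append(np.exp(-d2 / lam))                          # Eq. (11)
            ruls.append(L - t - L_star)                             # remaining cycles of the train instance
    sims, ruls = np.array(sims), np.array(ruls)
    keep = sims >= alpha * sims.max()                  # keep estimates with similarity >= alpha * s_max
    est = np.sum(sims[keep] * ruls[keep]) / np.sum(sims[keep])      # Eq. (13)
    if r_max is not None:
        est = min(est, r_max)
    return est, int(keep.sum())                        # estimate and number of contributing matches

# Toy usage with synthetic, linearly degrading HI curves (assumed data).
train_his = [np.linspace(1.0, 0.0, n) for n in (150, 180, 200)]
test_hi = np.linspace(1.0, 0.6, 60)                    # partially observed test instance
print(rul_estimate(test_hi, train_his))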
EXPERIMENTAL EVALUATION FOR ESTIMATING RUL
[060] The method of the present disclosure is evaluated on two publicly available datasets: the C-MAPSS Turbofan Engine Dataset and the Milling Machine Dataset, and a real world dataset from a pulverizer mill. For the first two datasets, the ground truth in terms of the RUL is known, and RUL estimation performance metrics are used to measure the efficacy of the method (refer paragraph 060). The pulverizer mill undergoes repair on a time basis (around one year), and therefore ground truth in terms of the actual RUL is not available. A comparison is drawn between the health index (HI) and the cost of maintenance of the mills. For the first two datasets, different target HI curves for learning the LR model (refer paragraph 052) are used: the LR-Lin and LR-Exp models assume linear and exponential forms for the target HI curves, respectively. LR-ED1 and LR-ED2 use the normalized reconstruction error and the normalized squared reconstruction error as the target HI (refer paragraph 052), respectively. The target HI values for LR-Exp are obtained using Equation 9 with β' = 5% as suggested in the art.
[061] Performance metrics considered: The performance is measured in terms of Timeliness Score (S), Accuracy (A), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE1 and MAPE2) as mentioned in Equations 14-19, respectively. For test instance u*, the error Δ^(u*) = R̂^(u*) - R^(u*) between the estimated RUL (R̂^(u*)) and the actual RUL (R^(u*)). The score S used to measure the performance of a model is given by:
S = Σ_{u*=1}^{N} (exp(γ·|Δ^(u*)|) - 1) --> (14)
where N is the number of test instances, γ = 1/τ_1 if Δ^(u*) < 0, and γ = 1/τ_2 otherwise. With τ_1 > τ_2 as used herein, late predictions are penalized more compared to early predictions. The lower the value of S, the better is the performance. Accuracy (A), per Equation (15), is the percentage of predictions that are neither false positives nor false negatives; MAE and MSE, per Equations (16) and (17), are the mean absolute error and the mean squared error of Δ^(u*) over the test instances; and MAPE1 and MAPE2, per Equations (18) and (19), are mean absolute percentage error variants of the RUL estimation error. A prediction is considered a false positive (FP) if Δ^(u*) < -τ_1, and a false negative (FN) if Δ^(u*) > τ_2.
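For illustration, the sketch below computes the timeliness score S together with accuracy, MAE and MSE in a manner consistent with the description above; the exact forms of Equations (15)-(19), in particular the MAPE variants, are not reproduced here, and the toy predictions are assumed.

import numpy as np

def rul_metrics(rul_pred, rul_true, tau1=13, tau2=10):
    # Timeliness score S, accuracy A, MAE and MSE for a set of RUL estimates (sketch).
    delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    gamma = np.where(delta < 0, 1.0 / tau1, 1.0 / tau2)        # late predictions penalized more
    S = np.sum(np.exp(gamma * np.abs(delta)) - 1.0)
    A = 100.0 * np.mean((delta >= -tau1) & (delta <= tau2))    # neither FP nor FN
    MAE = np.mean(np.abs(delta))
    MSE = np.mean(delta ** 2)
    return S, A, MAE, MSE

# Toy usage with assumed predictions and ground truth.
print(rul_metrics([80, 100, 30], [75, 120, 25]))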
[062] C-MAPSS Turbofan Engine Dataset: The first dataset from the simulated turbofan engine data (NASA Ames Prognostics Data Repository) contains readings for 24 sensors (3 operation setting sensors, 21 dependent sensors) for 100 engines till a failure threshold is achieved, i.e., till end-of-life, in train_FD001.txt. Similar data is provided for 100 test engines in test_FD001.txt, where the time series data for the engines are pruned some time prior to failure. The task is to predict the RUL for these 100 engines. The actual RUL values are provided in RUL_FD001.txt. There are a total of 20631 cycles for the training engines, and 13096 cycles for the test engines. Each engine has a different degree of initial wear. In the context of the present disclosure, the experiment uses τ_1 = 13 and τ_2 = 10 as is known in the art.
[063] Model learning and parameter selection: 80 engines are randomly selected for training the LSTM-ED model and estimating the parameters θ and θ_0 of the LR model (refer Equation 8). The remaining 20 training instances are used as a validation set for selecting the parameters. The trajectories for these 20 engines are randomly truncated at five different locations such that five different cases are obtained from each instance. The minimum truncation is 20% of the total life and the maximum truncation is 96%. For training LSTM-ED, only the first subsequence of length l for each of the selected 80 engines is used. The parameters, namely the number of principal components p, the number of LSTM units in the hidden layers of the encoder and decoder n, the window/subsequence length l, the maximum allowed time-lag τ', the similarity threshold α (refer Equation 13), the maximum predicted RUL R_max, and the parameter λ (refer Equation 11), are estimated using grid search to minimize S on the validation set. The parameters obtained for the best model (LR-ED2) are p = 3, n = 30, l = 20, τ' = 40, α = 0.87, R_max = 125, and λ = 0.0005.
[064] Results and Observations: FIG.6 shows the average pointwise reconstruction error (refer Equation 7) given by the LSTM-ED model, which uses the pointwise reconstruction error as an un-normalized measure of health (the higher the reconstruction error, the poorer the health), for all the 100 test engines w.r.t. the percentage of life passed (for HI sequences with p = 3 as used for models LR-ED1 and LR-ED2). During the initial stages of an engine's life, the average reconstruction error is small. As the number of cycles passed increases, the reconstruction error increases. This suggests that the reconstruction error can be used as an indicator of the health of a machine. FIG.7A through FIG.7D illustrate histograms of prediction errors for the Turbofan Engine dataset from the LSTM-ED (without using linear regression), LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure. FIG.7A and Table 1 suggest that the RUL estimates given by the HI from LSTM-ED are fairly accurate.
Table 1 (bold text is indicative of best results)
(Table 1 is provided as image imgf000019_0001 in the original document.)
On the other hand, the 1-sigma bars in FIG.6 also suggest that the reconstruction error at a given point in time (percentage of total life passed) varies significantly from engine to engine.
[065] Performance comparison: It was also seen that LR-ED2 performs significantly better compared to the other three models. LR-ED2 is better than the LR-Exp model which uses domain knowledge in the form of the exponential degradation assumption. A comparison is drawn with RULCLIPPER (RC), which has the best performance in terms of timeliness S, accuracy A, MAE, and MSE, as known in the art, on the turbofan dataset considered. It may be noted that unlike RC, the method of the present disclosure learns the parameters of the model on a validation set rather than the test set. RC relies on the exponential assumption to estimate an HI polygon and uses the intersection of areas of the polygons of train and test engines as a measure of similarity to estimate RUL (similar to Equation 13). The results show that LR-ED2 gives performance comparable to RC without relying on the domain-knowledge based exponential assumption.
[066] The single worst predicted test instance for LR-Exp, LR-ED1 and LR-ED2 contributes 23%, 17%, and 23%, respectively, to the timeliness score S. For LR-Exp and LR-ED2 it is nearly 1/4th of the total score, which suggests that for the other 99 test engines the timeliness score S is very good. [067] HI at last cycle and RUL estimation error: FIG.8A illustrates the actual RUL as compared with the RUL estimates given by the LR-Exp, LR-ED1 and LR-ED2 models in accordance with an embodiment of the present disclosure. For all the models, it is observed that as the actual RUL increases, the error in the predicted values increases. Let R_all^(u*) denote the set of all the RUL estimates R̂^(u*)(u, t) considered for computing R̂^(u*) (see Equation 13). FIG.8B illustrates the standard deviation, the max-min difference, and the absolute error of the elements in R_all^(u*) w.r.t. the HI value at the last cycle for the Turbofan engine dataset, in accordance with an embodiment of the present disclosure. It suggests that when an instance is close to failure, i.e., the HI at the last cycle is low, the RUL estimate is very accurate with a low standard deviation of the elements in R_all^(u*). On the other hand, when an instance is in good health, i.e., when the HI at the last cycle is close to 1, the error in the RUL estimate is high and the elements in R_all^(u*) have a high standard deviation.
[068] Milling machine dataset: This data set presents milling tool wear measurements from a lab experiment. Flank wear is measured for 16 cases, with each case having a varying number of runs of varying durations. The wear is measured after runs but not necessarily after every run. The data contains readings for 10 variables (3 operating condition variables, 6 dependent sensors, 1 variable measuring time elapsed until completion of that run). A snapshot sequence of 9000 points during a run for the 6 dependent sensors is provided. It is assumed that each run represents one cycle in the life of the tool. Two operating regimes corresponding to the two types of material being milled are considered, and a different model for each material type is learnt. There are a total of 167 runs across cases, with 109 runs and 58 runs for material types 1 and 2, respectively. Case number 6 of material 2 has only one run, and hence is not considered for the experiments.
[069] Model learning and parameter selection: Since the number of cases is small, a leave-one-out method is used for model learning and parameter selection. For training the LSTM-ED model, the first run of each case is considered as normal with a sequence length of 9000. An average of the reconstruction error for a run is used to get the target HI for that run/cycle. Of the 9000 values obtained for each run, the mean and standard deviation are computed for each sensor and considered for further evaluation. The gap between two consecutive runs is reduced, via linear interpolation, to 1 second (if it is more); as a result, the HI curves for each case have a cycle of one second. The tool wear is also interpolated in the same manner and the data for each case is truncated at the point when the tool wear crosses a value of 0.45 for the first time. The target HI from LSTM-ED for the LR model is also interpolated appropriately for learning the LR model.
[070] The parameters obtained for the best models (based on minimum MAPE1) are p = 1, λ = 0.025, α = 0.98, and τ' = 15 for model PCA1 for material-1, and p = 2, λ = 0.005, α = 0.87, τ' = 13, and n = 45 for LR-ED1 for material-2. The best results are obtained without setting any limit R_max. For both cases, l = 90 (after down sampling by 100) such that the time series data for the first run is used for learning the LSTM-ED model.
[071] Results and observations: FIG.9A and FIG.9E illustrate the reconstruction errors from LSTM-ED w.r.t. the fraction of life passed for an exemplary milling machine dataset, pertaining to material 1 and material 2 respectively. FIG.9B through FIG.9D and FIG.9F through FIG.9H show the histograms of prediction errors pertaining to material 1 and material 2 respectively, in accordance with an embodiment of the present disclosure, while FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure. As shown in FIG.9A and FIG.9E, the reconstruction error increases with the amount of life passed, and hence is an appropriate indicator of health. FIG.9B through FIG.9D, FIG.9F through FIG.9H, FIG.10A and FIG.10B show results based on almost every cycle of the data after interpolation. The performance metrics on the original data points in the data set are summarized in Table 2.
Table 2(bold text is indicative of best results)
(Table 2 is provided as image imgf000021_0001 in the original document.)
It is observed that the first PCA component (PCA1, p = 1) gives better results than the LR-Lin and LR-Exp models with p > 2, and hence results for PCA1 are presented in Table 2. It is to be noted that for p = 1, all the four models LR-Lin, LR-Exp, LR-ED1, and LR-ED2 will give the same results since all models will predict a different linearly scaled value of the first PCA component. PCA1 and LR-ED1 are the best models for material-1 and material-2, respectively. It is observed that the best models of the present disclosure perform well as depicted in the histograms in FIG.9A through FIG.9H. For the last few cycles, when the actual RUL is low, an error of even 1 in RUL estimation leads to a MAPE1 of 100%. FIG.9B through FIG.9D and FIG.9F through FIG.9H show the error distributions for different models for the two materials. As can be noted, most of the RUL prediction errors (around 70%) lie in the ranges [-4, 6] and [-3, 1] for material types 1 and 2, respectively. Also, FIG.10A and FIG.10B show the predicted and actual RULs for different models for the two materials.
[072] Pulverizer Mill Dataset: This dataset consists of readings for 6 sensors (such as bearing vibration, feeder speed, etc.) for over three years of operation of a pulverizer mill. The data corresponds to sensor readings taken every half hour between four consecutive scheduled maintenances M0, M1, M2, and M3, such that the operational period between any two maintenances is roughly one year. Each day's multivariate time series data with length l = 48 is considered to be one subsequence. Apart from these scheduled maintenances, maintenances are done in between whenever the mill develops any unexpected fault affecting its normal operation. The costs incurred for any of the scheduled maintenances and unexpected maintenances are available.
[073] The mill is assumed to be healthy for the first 10% of the days of a year between any two consecutive time-based maintenances M_i and M_{i+1}, and the corresponding subsequences are used for learning the LSTM-ED models. This data is divided into training and validation sets. A different LSTM-ED model is learnt after each maintenance. The architecture with the minimum average reconstruction error over a validation set is chosen as the best model. The best models learnt using data after M0, M1 and M2 are obtained for n = 40, 20, and 100, respectively. The LSTM-ED based reconstruction error for each day is z-normalized using the mean and standard deviation of the reconstruction errors over the sequences in the validation set.
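The per-day z-normalization of the reconstruction error can be sketched as below; the arrays are assumed placeholders, and the threshold value 1.5 corresponds to t_E in Table 3.

import numpy as np

# z-normalize each day's average reconstruction error using the mean and standard
# deviation of the reconstruction errors over the validation sequences.
val_errors = np.abs(np.random.randn(60))        # per-day errors on the validation set (toy data)
daily_errors = np.abs(np.random.randn(365))     # per-day errors during one operational year (toy data)

mu, sigma = val_errors.mean(), val_errors.std()
E = (daily_errors - mu) / sigma                 # normalized per-day health indicator
high_error_days = np.where(E > 1.5)[0]          # days exceeding the threshold t_E = 1.5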
[074] FIG.11 illustrates the pointwise reconstruction errors for the last 30 days before maintenance for the pulverizer mill dataset, in accordance with an embodiment of the present disclosure. From the results in Table 3 and FIG.11, it is observed that the average reconstruction error E on the last day before M1 is the least, and so is the cost C(M1) incurred during M1.
Table 3
Maint. ID   t_E    P(E > t_E)   t_C   P(C > t_C | E > t_E)   E (last day)   C(M_i)
M1          1.50   0.25         7     0.61                   2.4            92
M2          1.50   0.57         7     0.84                   8.0            279
M3          1.50   0.43         7     0.75                   16.2           209
For M2 and M3, E as well as the corresponding C(M_i) are higher compared to those of M1. Further, it is observed that for the days when the average value of reconstruction error E > t_E, a large fraction (>0.61) of them have a high ongoing maintenance cost C > t_C. The significant correlation between reconstruction error and cost incurred suggests that the LSTM-ED based reconstruction error is able to capture the health of the mill.
EXPERIMENTAL EVALUATION FOR ANOMALY DETECTION
[075] Four real-world datasets are considered: power demand, space shuttle valve, ECG, and engine as seen in Table 4 herein below, wherein N, Nn and Na represent number of original sequences, normal subsequences and anomalous subsequences, respectively.
Table 4 - Nature of datasets
(Table 4 is provided as image imgf000023_0001 in the original document.)
The first three datasets are taken from (Chen et al., 2015) whereas the engine dataset is a proprietary one encountered in a real life project. The engine dataset contains data for two different applications: Engine-P where the time series data is quasi-predictable, Engine-NP where the time series data is unpredictable. For the experimental evaluation, architectures where both the encoder and decoder have single hidden layer with n LSTM units each are considered. Mini-batch stochastic optimization based on Adam Optimizer (Kingma & Ba, 2014) is used for training the LSTM Encoder-Decoder.
[076] Datasets: Table 5 herein below shows the performance of the method of the present disclosure on all the datasets.
Table 5
(Table 5 is provided as image imgf000024_0001 in the original document.)
FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal (N) and anomalous (A) sequences respectively pertaining to the power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets. Each of the figures represents the original sequence, the reconstructed sequence and the anomaly score, as particularly referenced in FIG.12A1 and FIG.12A2 for ease of reference. The power demand dataset contains one univariate time series data with 35,040 readings for power demand recorded over a period of one year. The demand is normally high during the weekdays and low over the weekend. Within a day, the demand is high during working hours and low otherwise (refer FIG.12A1). A week when any of the first 5 days has low power demand (similar to the demand over the weekend) is considered anomalous (refer FIG.12A2, where the first day has low power demand). The original time series was down sampled by 8 to obtain non-overlapping sequences with l = 84 such that each window corresponds to one week.
[077] The space shuttle dataset contains periodic sequences with 1000 points per cycle, and 15 such cycles. l = 1500 was chosen deliberately such that a subsequence covers more than one cycle (1.5 cycles per subsequence), and sliding windows with a step size of 500 were considered. The original time series was down sampled by 3. The normal and anomalous sequences in FIG.12B1 and FIG.12B2 belong to the TEK17 and TEK14 time series, respectively.
[078] The engine dataset contains readings for 12 sensors such as coolant temperature, torque, accelerator (control variable), etc. Two different applications of the engine were considered: Engine-P and Engine-NP. Engine-P has a discrete external control with two states: 'high' and 'low'. The resulting time series are predictable except at the time-instances when the control variable changes. On the other hand, the external control for Engine-NP can assume any value within a certain range and changes very frequently, and hence the resulting time series are unpredictable. l = 60 for Engine-P and l = 100 for Engine-NP are randomly chosen. The multivariate time series is reduced to univariate by considering only the first principal component after applying principal component analysis (Jolliffe, 2002). The first component captures 72% of the variance for Engine-P and 61% for Engine-NP.
[079] The ECG dataset contains quasi-periodic time series (the duration of a cycle varies from one instance to another). A subset of the first channel from the qtdb/sel102 dataset, where the time series contains one anomaly corresponding to a pre-ventricular contraction (refer FIG.12E2), is used for the experimental evaluation. Non-overlapping subsequences with l = 26 were considered after down sampling the original signal by 8 (each subsequence corresponds to approximately 800 ms). Since only one anomaly is present in the dataset, sets v_N2 and v_A are not created. The best model is chosen based on the minimum reconstruction error on set v_N1. The threshold is chosen as τ = μ_a + σ_a, where μ_a and σ_a are the mean and standard deviation of the anomaly scores of the points from v_N1.
[080] Observations: The key observations from the experimental evaluation are as follows: 1) The positive likelihood ratio is significantly higher than 1.0 for all the datasets (refer Table 5). High positive likelihood ratio values suggest that the method of the present disclosure gives significantly higher anomaly scores for points in anomalous sequences compared to anomaly scores for points in normal sequences. 2) For periodic time series, the evaluation was performed with varying window lengths: window length same as the length of one cycle (power demand dataset) and window length greater than the length of one cycle (space shuttle dataset). A quasi-periodic time series (ECG) was also considered. The method of the present disclosure is able to detect anomalies in all these scenarios. 3) A time series prediction based anomaly detection model LSTM-AD (Malhotra et al., 2015) gives better results for the predictable datasets: Space Shuttle, Power and Engine-P (corresponding to the Engine dataset in (Malhotra et al., 2015)) with F0.1 scores of 0.84, 0.90 and 0.89, respectively. On the other hand, the method of the present disclosure gives better results for Engine-NP where the sequences are not predictable. The best LSTM-AD model gives P, R, F0.05 and TPR/FPR (ratio of True Positive Rate to False Positive Rate) of 0.03, 0.07, 0.03 and 1.9, respectively (for a two hidden layer architecture with 30 LSTM units in each layer and a prediction length of 1), owing to the fact that the time series is not predictable and hence a good prediction model could not be learnt, whereas the method of the present disclosure gives P, R, F0.1 score and TPR/FPR of 0.96, 0.18, 0.93 and 7.6, respectively.
[081] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments of the present disclosure. The scope of the subject matter embodiments defined here may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language.
[082] It is, however to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field- programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments of the present disclosure may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[083] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules comprising the system of the present disclosure and described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer- usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The various modules described herein may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer readable medium or other storage device. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
[084] Further, although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[085] The preceding description has been presented with reference to various embodiments. Persons having ordinary skill in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.

Claims

WE CLAIM:
1. A processor implemented method comprising:
receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system (302);
reconstructing from the test set, one or more time series data pertaining to the at least one test instance, wherein the one or more hardware processors is a learnt sequence to sequence mapper based model (304); and
generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series data from the test set (306).
2. The method of claim 1, wherein the learnt sequence to sequence mapper based model is obtained by:
receiving, by the one or more hardware processors, a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system;
identifying, by the one or more hardware processors, a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and
training, by the one or more hardware processors, the sequence to sequence mapper based model to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
3. The method of claim 2, wherein the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
4. The method of claim 1 further comprising estimating a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a predefined threshold thereof to classify anomalous or normal subsequences in the test set.
5. The method of claim 1 further comprising generating health behavior trend for the at least one test instance based on the reconstruction error.
6. The method of claim 1 further comprising estimating Remaining Useful Life (RUL) of the at least one test instance based on one of:
(i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL;
(ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and
(iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
7. A system (100) comprising:
one or more data storage devices (102) operatively coupled to one or more hardware processors (104) and configured to store instructions configured for execution by the one or more hardware processors to:
receive a test set of time series data pertaining to one or more sensors cooperating with at least one test instance of a monitored system;
reconstruct from the test set, one or more time series data pertaining to the at least one test instance, wherein the one or more hardware processors is a learnt sequence to sequence mapper based model; and
generate, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series data from the test set.
8. The system of claim 7, wherein the one or more hardware processors are further configured to learn a sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system;
identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and
training the sequence to sequence mapper based model to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
9. The system of claim 7, wherein the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
10. The system of claim 7, wherein the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify normal or anomalous subsequences in the test set.
11. The system of claim 7, wherein the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
12. The system of claim 7, wherein the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of:
(i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and
(iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
PCT/IB2017/051621 2016-06-17 2017-03-21 Health monitoring and prognostics of a system WO2017216647A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201621020886 2016-06-17
IN201621020886 2016-06-17

Publications (1)

Publication Number Publication Date
WO2017216647A1 true WO2017216647A1 (en) 2017-12-21

Family

ID=60663969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2017/051621 WO2017216647A1 (en) 2016-06-17 2017-03-21 Health monitoring and prognostics of a system

Country Status (1)

Country Link
WO (1) WO2017216647A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109253826A (en) * 2018-08-01 2019-01-22 西安交通大学 A kind of calorimeter method for predicting residual useful life based on the fusion of more degeneration sample datas
EP3594859A1 (en) * 2018-07-09 2020-01-15 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
JP2020009411A (en) * 2018-07-09 2020-01-16 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Sparse neural network-based abnormality detection in multidimensional time series
US10699040B2 (en) * 2017-08-07 2020-06-30 The Boeing Company System and method for remaining useful life determination
US20220236728A1 (en) * 2021-01-22 2022-07-28 Tata Consultancy Services Limited System and method for performance and health monitoring to optimize operation of a pulverizer mill

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020046004A1 (en) * 2000-08-15 2002-04-18 Joseph Cusumano General method for tracking the evolution of hidden damage or other unwanted changes in machinery components and predicting remaining useful life
US20030055610A1 (en) * 2000-02-17 2003-03-20 Webber Christopher J St C Signal processing technique
US20070239407A1 (en) * 2006-01-12 2007-10-11 Goldfine Neil J Remaining life prediction for individual components from sparse data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030055610A1 (en) * 2000-02-17 2003-03-20 Webber Christopher J St C Signal processing technique
US20020046004A1 (en) * 2000-08-15 2002-04-18 Joseph Cusumano General method for tracking the evolution of hidden damage or other unwanted changes in machinery components and predicting remaining useful life
US20070239407A1 (en) * 2006-01-12 2007-10-11 Goldfine Neil J Remaining life prediction for individual components from sparse data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699040B2 (en) * 2017-08-07 2020-06-30 The Boeing Company System and method for remaining useful life determination
EP3594859A1 (en) * 2018-07-09 2020-01-15 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
JP2020009411A (en) * 2018-07-09 2020-01-16 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Sparse neural network-based abnormality detection in multidimensional time series
AU2019201789B2 (en) * 2018-07-09 2020-06-25 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
CN109253826A (en) * 2018-08-01 2019-01-22 西安交通大学 A kind of calorimeter method for predicting residual useful life based on the fusion of more degeneration sample datas
US20220236728A1 (en) * 2021-01-22 2022-07-28 Tata Consultancy Services Limited System and method for performance and health monitoring to optimize operation of a pulverizer mill

Similar Documents

Publication Publication Date Title
AU2018203321B2 (en) Anomaly detection system and method
WO2017216647A1 (en) Health monitoring and prognostics of a system
Malhotra et al. Multi-sensor prognostics using an unsupervised health index based on LSTM encoder-decoder
Cai et al. Remaining useful life re-prediction methodology based on Wiener process: Subsea Christmas tree system as a case study
Salazar et al. Data-based models for the prediction of dam behaviour: a review and some methodological considerations
CA3037326C (en) Sparse neural network based anomaly detection in multi-dimensional time series
Cartella et al. Hidden Semi‐Markov Models for Predictive Maintenance
CN112154418A (en) Anomaly detection
Fang et al. Scalable prognostic models for large-scale condition monitoring applications
de Pater et al. Developing health indicators and RUL prognostics for systems with few failure instances and varying operating conditions using a LSTM autoencoder
Ding et al. On-line error detection and mitigation for time-series data of cyber-physical systems using deep learning based methods
Huang et al. Reliable machine prognostic health management in the presence of missing data
Villez Qualitative path estimation: A fast and reliable algorithm for qualitative trend analysis
Basak et al. Spatio-temporal AI inference engine for estimating hard disk reliability
He et al. Reliability analysis of systems with discrete event data using association rules
Jin A sequential process monitoring approach using hidden Markov model for unobservable process drift
US11320813B2 (en) Industrial asset temporal anomaly detection with fault variable ranking
Guo et al. Nonparametric, real-time detection of process deteriorations in manufacturing with parsimonious smoothing
US7840391B2 (en) Model-diversity technique for improved proactive fault monitoring
Sperl et al. Two-step anomaly detection for time series data
Bisi et al. Prediction of software inter-failure times using artificial neural network and particle swarm optimisation models
Xingzhi et al. Failure threshold setting for Wiener-process-based remaining useful life estimation
Zhao et al. A deep learning-based remaining useful life prediction approach for engineering systems
Chammas et al. Prognosis based on handling drifts in dynamical environments: Application to a wind turbine benchmark
Smets et al. Discovering novelty in spatio/temporal data using one-class support vector machines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17812826

Country of ref document: EP

Kind code of ref document: A1