WO2017216647A1 - Health monitoring and prognostics of a system - Google Patents

Health monitoring and prognostics of a system Download PDF

Info

Publication number
WO2017216647A1
WO2017216647A1 PCT/IB2017/051621
Authority
WO
WIPO (PCT)
Prior art keywords
instance
time series
test
train
reconstruction error
Prior art date
Application number
PCT/IB2017/051621
Other languages
French (fr)
Inventor
Pankaj Malhotra
Vishnu TV
Anusha RAMAKRISHNAN
Gaurangi ANAND
Lovekesh Vig
Puneet Agarwal
Gautam Shroff
Original Assignee
Tata Consultancy Services Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Limited filed Critical Tata Consultancy Services Limited
Publication of WO2017216647A1 publication Critical patent/WO2017216647A1/en

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00Testing or monitoring of control systems or parts thereof
    • G05B23/02Electric testing or monitoring
    • G05B23/0205Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • G06F2218/16Classification; Matching by matching signal segments
    • G06F2218/20Classification; Matching by matching signal segments by applying autoregressive analysis

Definitions

  • the embodiments herein generally relate to health monitoring and prognostics of a system, and, more particularly, sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system.
  • Time-based maintenance of complex machines or systems leads to high maintenance costs and high downtime if a machine breaks down before the scheduled maintenance date.
  • Prediction models have been used to learn models of normal behavior and then prediction errors are used to measure the health of a machine at any given time. These models assume that time series data is predictable which may not hold true for real-world applications with manual controls and unmonitored environmental conditions or loads leading to inherently unpredictable time series data.
  • Most models are also unable to capture complex non-linear dependencies between sensors and long term temporal correlations.
  • Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
  • Systems and methods of the present disclosure provide a health index that summarizes the health of a monitored system at any point in time.
  • the health index may be used for detecting anomalous behavior in a system, health monitoring, and prognostics for condition based maintenance.
  • the method of the present disclosure comprises learning an unsupervised model of normal or healthy behavior using multivariate time series data.
  • the model for healthy behavior is then applied to unseen multivariate time series data to predict the health of the monitored system.
  • the system of the present disclosure does not rely on domain knowledge to estimate the health of the monitored machine.
  • a sequence to sequence mapper is employed to capture long term temporal correlations as well as complex non-linear dependencies between multiple sensors.
  • the model does not assume that the time series data is predictable.
  • the method works well for predictable as well as unpredictable time series data.
  • RUL Remaining Useful Life
  • a processor implemented method comprising: receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstructing, by the one or more hardware processors, from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series from the test set.
  • a system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
  • a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
  • the one or more hardware processors are further configured to learn a sequence to sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system; identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and training the sequence to sequence mapper based model to reconstruct the time series data in the healthy train set and generate the learnt sequence to sequence mapper based model.
  • the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
  • the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
  • the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
  • the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of: (i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curve(s) of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and (iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
  • RUL Remaining Useful Life
  • FIG.1 illustrates an exemplary block diagram of a system for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure
  • FIG.2 illustrates an exemplary flow diagram of a method for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure
  • FIG.3 illustrates Long Short Term Memory based Encoder Decoder (LSTM-ED) inference steps for input {z1, z2, z3} to predict {z'1, z'2, z'3}, as known in the art;
  • LSTM- ED Long Short Term Memory based Encoder Decoder
  • FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure
  • FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance;
  • FIG.6 illustrates a graphical illustration of reconstruction error versus fraction of total life passed, obtained from an LSTM-ED model, in accordance with an embodiment of the present disclosure
  • FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED, LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure
  • FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models for Turbofan engine dataset, in accordance with an embodiment of the present disclosure
  • FIG.8B illustrates standard deviation, maximum-minimum and absolute error of the RULs considered for estimating the final RUL w.r.t HI at last cycle for Turbofan engine dataset, in accordance with an embodiment of the present disclosure
  • FIG.9A and FIG.9E illustrate reconstruction errors pertaining to material 1 and material 2 respectively and FIG.9B through FIG.9D and FIG.9F through FIG.9H illustrate histograms of prediction errors, pertaining to material 1 and material 2 respectively, w.r.t cycles passed, for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure;
  • FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively, for milling machine dataset, in accordance with an embodiment of the present disclosure
  • FIG.11 illustrates pointwise reconstruction errors for last 30 days before maintenance for pulverizer mill dataset, in accordance with an embodiment of the present disclosure.
  • FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal and anomalous sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively, in accordance with an embodiment of the present disclosure.
  • ECG electrocardiogram
  • health degradation curve may not necessarily follow a fixed shape
  • time to reach same level of degradation by machines of same specifications is often different
  • each instance has a slightly different initial health or wear
  • v) sensor data till end-of-life is not easily available because in practice, periodic maintenance is performed.
  • HI health index
  • mathematical models of the underlying physical system, fault propagation models and conventional reliability models have also been used for RUL estimation.
  • the present disclosure provides an unsupervised technique to obtain health index (HI) for a monitored system using multi-sensor time series data, which does not make any assumption on the shape of the degradation curve.
  • a sequence to sequence mapper based model such as Long Short Term Memory based Encoder-Decoder (LSTM-ED) is used to learn a model of normal behavior of a monitored system, which is trained to reconstruct multivariate time series data corresponding to normal behavior. Reconstruction error at a point in a time series data is then used to compute HI at that point.
  • LSTM-ED based HI learnt in an unsupervised manner is able to capture degradation in a monitored system; the HI decreases as the system degrades.
  • LSTM-ED based HI can be used to learn a model for RUL estimation instead of relying on domain knowledge, or exponential/linear degradation assumption, while achieving comparable performance.
  • time series data used in the context of the present disclosure refers to either univariate or multivariate time series data pertaining to one or more sensors respectively.
  • FIGS. 1 through 12 where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and method.
  • FIG.1 illustrates an exemplary block diagram of a system 100 for health monitoring and prognostics of a monitored system 200 in accordance with an embodiment of the present disclosure.
  • the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104.
  • the one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory.
  • the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
  • the I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
  • the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
  • the memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • volatile memory such as static random access memory (SRAM) and dynamic random access memory (DRAM)
  • non-volatile memory such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • ROM read only memory
  • erasable programmable ROM erasable programmable ROM
  • FIG.2 illustrates an exemplary flow diagram of a method 300 for health monitoring and prognostics of the monitored system 200 in accordance with an embodiment of the present disclosure.
  • the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104.
  • step 302 a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of the monitored system 200 is received.
  • one or more time series data, from the test set received at step 302 and pertaining to the at least one test instance is reconstructed by a sequence to sequence mapper based model.
  • a sequence to sequence mapper based model can be a Long Short Term Memory based Encoder-Decoder (LSTM-ED).
  • LSTM units in an LSTM-ED model are recurrent units that use the current input time series data z_t, hidden state activations a_{t-1}, and memory cell activations c_{t-1} to compute hidden state activations a_t at time t.
  • An LSTM unit uses a combination of a memory cell c and three types of gates: input gate i, forget gate f, and output gate o to decide if the input needs to be remembered (using the input gate), when the previous memory needs to be retained (using the forget gate), and when the memory content needs to be output (using the output gate).
  • the values for the input gate i, forget gate f, output gate o, hidden state a, and cell activation c for n LSTM units at time t are computed using the current input z_t, the previous hidden state a_{t-1}, and the memory cell value c_{t-1}, as given by Equations 1-4 herein below.
  • i_t = σ(W_1 z_t + W_2 a_{t-1} + b_i) ... (1)
  • the operations σ and tanh are applied elementwise.
  • z_t ∈ R^p and the gate and state vectors i_t, f_t, o_t, g_t, a_t, c_t ∈ R^n, where p refers to the number of input units (or number of sensors) and n is the number of hidden LSTM units.
  • the encoder and decoder are jointly trained to reconstruct the time series data in reverse order, i.e. the target time series data is [z_t, z_{t-1}, ..., z_1].
  • the learnt sequence to sequence mapper based model is obtained by receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system. A healthy train set of time series data from the train set of time series data that pertains to the healthy behavior of the at least one train instance is identified. The sequence to sequence mapper based model is then trained to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
  • the value z_t at time instance t and the hidden state of the encoder at time t-1 are used to obtain the hidden state of the encoder at time t.
  • during training, the decoder uses z_t as input to obtain its state and then predict z'_{t-1} corresponding to the target z_{t-1}.
  • during inference, the predicted value z'_t is input to the decoder to obtain its state and predict z'_{t-1}.
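  • A minimal illustrative sketch of such an encoder-decoder is given below (in PyTorch, which is not mandated by the present disclosure; the layer sizes, names and the toy training step are assumptions for illustration). The encoder summarizes a subsequence into its final state; the decoder is unrolled for the same number of steps, feeding each prediction back as its next input, and the model is trained to minimize the squared reconstruction error on healthy subsequences. Using the true z_t as decoder input during training (teacher forcing), as described above, is an equally valid choice.

      import torch
      import torch.nn as nn

      class LSTMEncoderDecoder(nn.Module):
          # Reconstructs a window of shape (batch, l, p) of sensor readings from its encoded summary.
          def __init__(self, n_sensors, n_hidden):
              super().__init__()
              self.encoder = nn.LSTM(n_sensors, n_hidden, batch_first=True)
              self.decoder_cell = nn.LSTMCell(n_sensors, n_hidden)
              self.out = nn.Linear(n_hidden, n_sensors)

          def forward(self, z):
              _, (h, c) = self.encoder(z)                  # final encoder state summarizes the window
              h, c = h.squeeze(0), c.squeeze(0)
              recon = []
              for _ in range(z.size(1)):                   # decode in reverse order: z'_l first, z'_1 last
                  pred = self.out(h)                       # predict current target from decoder state
                  recon.append(pred)
                  h, c = self.decoder_cell(pred, (h, c))   # feed prediction back as next decoder input
              recon = torch.stack(recon, dim=1)
              return torch.flip(recon, dims=[1])           # flip back to original time order

      # toy training step on a batch of healthy subsequences (16 windows, length 20, 5 sensors)
      model = LSTMEncoderDecoder(n_sensors=5, n_hidden=30)
      optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
      z = torch.randn(16, 20, 5)
      loss = ((model(z) - z) ** 2).mean()                  # squared reconstruction error
      loss.backward()
      optimizer.step()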
  • a reconstruction error is generated at each time instance of the one or more time series data from the test set.
  • the reconstruction error e_t for a point z_t is given by e_t = ||z_t - z'_t||, the norm of the difference between the actual value z_t and its reconstruction z'_t.
  • s_N is the set of normal training subsequences of length l each.
  • the first few operational cycles can be assumed to correspond to healthy state for any instance.
  • test set and the train set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
  • the dimensionality reduction technique is Principal Component Analysis (PCA).
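  • As an illustration of reusing identical transformation parameters (a sketch only; the array shapes and the choice of three components are assumptions), the reducer below is fitted on the train set and the very same fitted transform is then applied to the test set:

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      train_sensors = rng.normal(size=(1000, 12))         # train readings: points x raw sensors
      test_sensors = rng.normal(size=(200, 12))           # unseen test readings, same sensors

      pca = PCA(n_components=3)                           # p derived sensors; 3 is illustrative
      train_reduced = pca.fit_transform(train_sensors)    # transformation learnt on the train set only
      test_reduced = pca.transform(test_sensors)          # identical transformation parameters reused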
  • the reconstruction error generated in step 306 may be used for various applications.
  • health state of the at least one test instance may be estimated.
  • a degree of anomaly is computed by obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
  • a healthy time series data is divided into four sets of time series data: s_N, v_N1, v_N2, and t_N, and the anomalous time series data into two sets v_A and t_A.
  • the set of sequences s_N is used to learn the LSTM encoder-decoder reconstruction model.
  • the set v_N1 is used for early stopping based regularization while training the encoder-decoder model.
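  • A minimal sketch of this scoring step is shown below (assuming, purely for illustration, that errors are z-normalized with statistics from held-out healthy data and that the threshold is taken from a high quantile of healthy scores; the disclosure itself only requires a pre-defined threshold):

      import numpy as np

      rng = np.random.default_rng(1)
      healthy_errors = np.abs(rng.normal(0.0, 0.1, size=500))   # e_t on held-out healthy data (v_N1)
      test_errors = np.abs(rng.normal(0.05, 0.1, size=100))     # e_t on the test set

      mu, sigma = healthy_errors.mean(), healthy_errors.std()
      score = lambda e: (e - mu) / sigma                         # normalized error -> anomaly score

      tau = np.quantile(score(healthy_errors), 0.99)             # pre-defined threshold
      anomalous = score(test_errors) > tau                       # classify each point / subsequence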
  • the reconstruction error generated in step 306 may be used to generate health behavior trend for the at least one test instance. If the reconstruction error shows an increasing trend, the health of the monitored system 200 may be deemed to be deteriorating.
  • the reconstruction error may be used to estimate Remaining Useful Life (RUL) of the at least one test instance.
  • health index (HI) is obtained by training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance and comparing with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
  • an HI curve is the complete sequence of HI values over all time instances. For instance, if the length of the time series is 10, then each of the 10 time instances would be associated with an HI value. The 10 HI values form an HI curve.
  • let H^(u) = [h_1^(u), h_2^(u), ..., h_L^(u)] represent the HI curve for instance u, where each point h_t^(u) is an HI value.
  • a linear regression model with parameters θ ∈ R^p and θ_0 ∈ R computes the HI h_t^(u) = θ^T z_t^(u) + θ_0 from the derived sensor readings z_t^(u) ∈ R^p at time t for instance u.
  • the parameters θ and θ_0 are estimated using Ordinary Least Squares methods.
  • for the target HI curves, the starting and ending fractions of cycles are assigned HI values of 1 and 0, respectively.
  • Another possible assumption is: assume target HI values of 1 and 0 for data corresponding to healthy condition and failure conditions, respectively. Unlike the exponential HI curve which uses the entire time series of sensor readings, the sensor readings corresponding to only these points are used to learn the regression model.
  • the estimates θ and θ_0 based on the target HI curves for train instances are used to obtain the final HI curves for all the train instances and a new test instance for which RUL is to be estimated.
  • the HI curves thus obtained are used to estimate the RUL for the test instance based on similarity of train and test HI curves.
  • reconstruction error curves pertaining to the at least one test instance are generated and compared with reconstruction error curves pertaining to the at least one train instance to estimate the RUL.
  • each point in the original time series data for a train instance is predicted as many times as the number of subsequences it is part of (l times for each point, except for points z_t with t < l or t > L-l, which are predicted fewer times).
  • An average of all the predictions for a point is taken to be final prediction for that point.
  • the difference in actual and predicted values for a point is used as an un-normalized HI for that point.
  • the target HI values thus obtained for all train instances are used to obtain the estimates θ and θ_0. Apart from the normalized reconstruction error, the normalized squared reconstruction error is also considered to obtain the target HI values, such that large reconstruction errors imply a much smaller HI value.
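  • The following sketch illustrates turning pointwise reconstruction errors into target HI values and fitting the linear regression by ordinary least squares (the min-max normalization and the "1 - error" mapping are illustrative assumptions consistent with the LR-ED1/LR-ED2 variants discussed later, not a prescribed formula):

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(2)
      Z = rng.normal(size=(500, 3))                  # derived (e.g. PCA) sensor readings, train points
      recon_err = np.abs(rng.normal(size=500))       # pointwise reconstruction errors from the model

      e = (recon_err - recon_err.min()) / (recon_err.max() - recon_err.min())
      target_hi = 1.0 - e                            # LR-ED1 style target; use 1.0 - e**2 for LR-ED2 style

      lr = LinearRegression().fit(Z, target_hi)      # OLS estimates of theta and theta_0
      hi_curve = lr.predict(Z)                       # an HI value for every time instance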
  • the HI curve of the at least one test instance is generated and compared with the HI curves in a repository of the HI curves of the at least one train instance to estimate the RUL.
  • FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure.
  • the HI curve (online HI curve) for a test instance u* is compared to the HI curves (offline HI curves) of all the train instances u ∈ U.
  • the test instance and train instance may take different number of cycles to reach the same degradation level (HI value).
  • FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance.
  • the time-lag which corresponds to minimum Euclidean distance between the HI curves of the train and test instance is shown.
  • the number of remaining cycles for the train instance after the last cycle of the test instance gives the RUL estimate for the test instance.
  • Let u* be a test instance and u be a train instance.
  • the following scenarios for curve matching based RUL estimation are given due consideration:
  • the initial health of an instance varies depending on various factors such as the inherent inconsistencies in the manufacturing process.
  • the initial health is assumed to be close to 1.
  • the HI values for an instance are divided by the average of its first few HI values (e.g. first 5% cycles).
  • a time-lag t is allowed such that the HI values of the test instance u* may be close to the HI values of the train instance starting at time-lag t, i.e. close to H^(u)(t, L^(u*)), for t ≤ τ (refer Equations 11-13). This takes care of instance-specific variances in the degree of initial wear and degradation evolution.
  • the HI curve H^(u*) may have high similarity with H^(u)(t, L^(u*)) for multiple values of time-lag t, wherein L^(u*) refers to the length of the test instance.
  • multiple RUL estimates are considered for u* based on the total life of u, rather than considering only the RUL estimate corresponding to the time-lag t with minimum Euclidean distance between the curves H^(u*) and H^(u)(t, L^(u*)).
  • the multiple RUL estimates corresponding to each time-lag are assigned weights proportional to the similarity of the curves to get the final RUL estimate (refer Equation 13).
  • Non-monotonic HI: due to inherent noise in sensor readings, the HI curves obtained using LR are non-monotonic. To reduce the noise in the estimates of HI, moving average smoothing, as known in the art, is used.
  • the parameter α decides the number of RUL estimates (one per admissible time-lag t) to be considered to get the final RUL estimate.
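  • A compact sketch of this curve-matching step is given below (the exponential similarity, the normalization by the first few HI values, and the parameter names max_lag, alpha and lam are assumptions standing in for Equations 11-13, which are not reproduced here):

      import numpy as np

      def normalize_initial_health(hi, frac=0.05):
          # divide by the average of the first few HI values so the initial health is close to 1
          k = max(1, int(frac * len(hi)))
          return hi / hi[:k].mean()

      def weighted_rul_estimate(test_hi, train_hi, max_lag, alpha, lam=0.5):
          # match the test HI curve against one train HI curve over admissible time-lags
          test_hi, train_hi = normalize_initial_health(test_hi), normalize_initial_health(train_hi)
          L_test, L_train = len(test_hi), len(train_hi)
          ruls, weights = [], []
          for t in range(min(max_lag, L_train - L_test) + 1):
              d = np.sum((test_hi - train_hi[t:t + L_test]) ** 2) / L_test   # Euclidean distance
              s = np.exp(-d / lam)                                           # distance -> similarity
              if s >= alpha:                                                 # keep sufficiently similar lags
                  ruls.append(L_train - t - L_test)    # remaining cycles of the train instance
                  weights.append(s)
          return None if not ruls else float(np.average(ruls, weights=weights))

  In practice such estimates are aggregated over all train instances in the repository and capped at the maximum predicted RUL R_max mentioned below.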
  • the method of the present disclosure is evaluated on two publicly available datasets: C-MAPSS Turbofan Engine Dataset and Milling Machine Dataset, and a real world dataset from a pulverizer mill.
  • C-MAPSS Turbofan Engine Dataset and Milling Machine Dataset a real world dataset from a pulverizer mill.
  • RUL estimation performance metrics are used to measure the efficacy of the method (refer paragraph 060).
  • the pulverizer mill undergoes repair on a time-based schedule (around one year), and therefore ground truth in terms of actual RUL is not available.
  • a comparison is drawn between health index (HI) and the cost of maintenance of the mills.
  • LR-Lin and LR-Exp models assume linear and exponential forms for the target HI curves, respectively.
  • LR-ED1 and LR-ED2 use the normalized reconstruction error and the normalized squared reconstruction error as the target HI (refer paragraph 052), respectively.
  • S Timeliness Score
  • A Accuracy
  • MAE Mean Absolute Error
  • MSE Mean Squared Error
  • MAPE1 and MAPE2 Mean Absolute Percentage Error
  • Δ^(u*) = R̂^(u*) - R^(u*) denotes the error between the estimated RUL (R̂^(u*)) and the actual RUL (R^(u*)).
  • the score S used to measure the performance of a model penalizes late predictions more heavily than early predictions. The lower the value of S, the better is the performance.
  • a prediction is considered a false positive (FP) if Δ^(u*) < -τ_1, and a false negative (FN) if Δ^(u*) > τ_2.
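  • The exact scoring function is not reproduced above; the sketch below uses the asymmetric exponential score commonly reported for turbofan RUL benchmarks (the functional form and the defaults τ_1 = 13, τ_2 = 10 are assumptions from that literature, not taken from the present disclosure):

      import numpy as np

      def timeliness_score(rul_pred, rul_true, tau1=13.0, tau2=10.0):
          # late predictions (delta >= 0) are penalized more heavily than early ones (delta < 0)
          delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
          return float(np.sum(np.where(delta < 0,
                                       np.exp(-delta / tau1) - 1.0,
                                       np.exp(delta / tau2) - 1.0)))

      def fp_fn_counts(rul_pred, rul_true, tau1=13.0, tau2=10.0):
          # FP: estimate too early (delta < -tau1); FN: estimate too late (delta > tau2)
          delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
          return int((delta < -tau1).sum()), int((delta > tau2).sum())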
  • Model learning and parameter selection: 80 engines are randomly selected for training the LSTM-ED model and estimating the parameters θ and θ_0 of the LR model (refer Equation 8). The remaining 20 training instances are used as a validation set for selecting the parameters. The trajectories for these 20 engines are randomly truncated at five different locations such that five different cases are obtained from each instance. Minimum truncation is 20% of the total life and maximum truncation is 96%. For training LSTM-ED, only the first subsequence of length l for each of the selected 80 engines is used.
  • the parameters, namely the number of principal components p, the number of LSTM units in the hidden layers of the encoder and decoder n, the window/subsequence length l, the maximum allowed time-lag τ, the similarity threshold α (refer Equation 13), the maximum predicted RUL R_max, and the parameter of Equation 11, are estimated using grid search to minimize S on the validation set.
  • initially, the average reconstruction error is small. As the number of cycles passed increases, the reconstruction error increases. This suggests that the reconstruction error can be used as an indicator of the health of a machine.
  • FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED (without using linear regression), LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure.
  • FIG.7A and Table 1 suggest that RUL estimates given by HI from LSTM-ED are fairly accurate.
  • FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models in accordance with an embodiment of the present disclosure. For all the models, it is observed that as the actual RUL increases, the error in predicted values increases. Let R̂^(u*) denote the set of all the RUL estimates considered for a test instance (see Equation 13).
  • FIG.8B illustrates the standard deviation, max-min and absolute error of the elements in this set of RUL estimates, w.r.t. the HI at the last cycle.
  • Milling machine dataset: This data set presents milling tool wear measurements from a lab experiment. Flank wear is measured for 16 cases, with each case having a varying number of runs of varying durations. The wear is measured after runs but not necessarily after every run. The data contains readings for 10 variables (3 operating condition variables, 6 dependent sensors, 1 variable measuring time elapsed until completion of that run). A snapshot sequence of 9000 points during a run for the 6 dependent sensors is provided. It is assumed that each run represents one cycle in the life of the tool. Two operating regimes corresponding to the two types of material being milled are considered, and a different model for each material type is learnt. There are a total of 167 runs across cases, with 109 runs and 58 runs for material types 1 and 2, respectively. Case number 6 of material 2 has only one run, and hence is not considered for the experiments.
  • Model learning and parameter selection: Since the number of cases is small, a leave-one-out method is used for model learning and parameter selection.
  • the first run of each case is considered as normal with sequence length of 9000.
  • An average of the reconstruction error for a run is used to get the target HI for that run/cycle.
  • mean and standard deviation are computed for each sensor and considered for further evaluation.
  • the gap between two consecutive runs is reduced, via linear interpolation, to 1 second (if it is more); as a result HI curves for each case will have a cycle of one second.
  • the tool wear is also interpolated in the same manner and the data for each case is truncated until the point when the tool wear crosses a value of 0.45 for the first time.
  • the target HI from LSTM-ED for the LR model is also interpolated appropriately for learning the LR model.
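  • A small sketch of this interpolation and truncation step is shown below (the variable names and toy values are purely illustrative):

      import numpy as np

      run_times = np.array([0.0, 4.0, 9.0, 15.0])      # elapsed time at the end of each run (toy values)
      run_hi    = np.array([0.95, 0.80, 0.55, 0.30])   # target HI from LSTM-ED for each run
      run_wear  = np.array([0.10, 0.25, 0.40, 0.50])   # measured flank wear for each run

      grid = np.arange(run_times[0], run_times[-1] + 1.0, 1.0)   # one-second cycles
      hi_interp   = np.interp(grid, run_times, run_hi)
      wear_interp = np.interp(grid, run_times, run_wear)

      cut = int(np.argmax(wear_interp >= 0.45))        # first cycle where the wear crosses 0.45
      hi_curve = hi_interp[:cut + 1]                   # truncate the case at that point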
  • FIG.9A and FIG.9E illustrate reconstruction errors from LSTM-ED w.r.t. the fraction of life passed for an exemplary milling machine dataset, pertaining to material 1 and material 2 respectively and
  • FIG.9B through FIG.9D and FIG.9F through 9H show the histograms of prediction errors pertaining to material 1 and material 2 respectively, in accordance with an embodiment of the present disclosure while
  • FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material- 1 and material-2 respectively for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure.
  • the reconstruction error increases with amount of life passed, and hence is an appropriate indicator of health.
  • FIG.9B through FIG.9D, FIG.9F through FIG.9H, FIG.10A and FIG.10B show results based on almost every cycle of the data after interpolation.
  • the performance metrics on the original data points in the data set are summarized in Table 2.
  • PCA1 and LR-ED1 are the best models for material-1 and material-2, respectively. It is observed that the best models of the present disclosure perform well, as depicted in the histograms in FIG.9A through FIG.9H.
  • FIG.9B through 9D and FIG.9F through 9H show the error distributions for different models for the two materials. As can be noted, most of the RUL prediction errors (around 70%) lie in the ranges [-4, 6] and [-3, 1] for material types 1 and 2, respectively. Also, FIG.10A and FIG.10B show predicted and actual RULs for different models for the two materials.
  • the mill is assumed to be healthy for the first 10% of the days of a year between any two consecutive time-based maintenances M_i and M_{i+1}, and the corresponding subsequences are used for learning the LSTM-ED models.
  • This data is divided into training and validation sets.
  • a different LSTM-ED model is learnt after each maintenance.
  • the architecture with minimum average reconstruction error over a validation set is chosen as the best model.
  • the LSTM-ED based reconstruction error for each day is z-normalized using the mean and standard deviation of the reconstruction errors over the sequences in the validation set.
  • FIG.11 illustrates pointwise reconstruction errors for the last 30 days before maintenance for the pulverizer mill dataset, in accordance with an embodiment of the present disclosure. From the results in Table 3 and FIG.11, it is observed that the average reconstruction error E on the last day before M_i is the least, and so is the cost C(M_i) incurred during M_i.
  • N, N_n and N_a represent the number of original sequences, normal subsequences and anomalous subsequences, respectively.
  • the first three datasets are taken from (Chen et al., 2015) whereas the engine dataset is a proprietary one encountered in a real life project.
  • the engine dataset contains data for two different applications: Engine-P where the time series data is quasi-predictable, Engine-NP where the time series data is unpredictable.
  • Engine-P where the time series data is quasi-predictable
  • Engine-NP where the time series data is unpredictable.
  • architectures where both the encoder and decoder have single hidden layer with n LSTM units each are considered.
  • Mini-batch stochastic optimization based on Adam Optimizer (Kingma & Ba, 2014) is used for training the LSTM Encoder-Decoder.
  • FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal (N) and anomalous (A) sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively.
  • Each of the figures represents original sequence, reconstructed sequence and anomaly score as particularly referenced in FIG.12A1 and FIG.12A2 for ease of reference.
  • Power demand dataset contains one univariate time series data with 35,040 readings for power demand recorded over a period of one year. The demand is normally high during the weekdays and low over the weekend. Within a day, the demand is high during working hours and low otherwise (refer FIG.12A1).
  • the original time series was down sampled by 3.
  • the normal and anomalous sequences in FIG.12B1 and FIG.12B2 belong to the TEK17 and TEK14 time series, respectively.
  • Engine dataset contains readings for 12 sensors such as coolant temperature, torque, accelerator (control variable), etc.
  • Engine-P has a discrete external control with two states: 'high' and 'low' .
  • the resulting time series are predictable except at the time-instances when the control variable changes.
  • the external control for Engine-NP can assume any value within a certain range and changes very frequently, and hence the resulting time series are unpredictable.
  • the multivariate time series is reduced to univariate by considering only the first principal component after applying principal component analysis (Jolliffe, 2002). The first component captures 72% of the variance for Engine-P and 61% for Engine-NP.
  • ECG dataset contains quasi-periodic time series (duration of a cycle varies from one instance to another).
  • a subset of the first channel from qtdb/sell02 dataset where the time series contains one anomaly corresponding to a pre-ventricular contraction (refer FIG.12E2) is used for the experimental evaluation.
  • Non-overlapping subsequences with l = 26 were considered after downsampling the original signal by 8 (each subsequence corresponds to approximately 800 ms). Since only one anomaly is present in the dataset, the sets v_N2 and v_A are not created.
  • the best model is chosen based on the minimum reconstruction error on the set v_N1.
  • the best LSTM-AD model gives P, R, F0.05 and TPR/FPR (ratio of True Positive Rate to False Positive Rate) of 0.03, 0.07, 0.03 and 1.9, respectively (for a two hidden layer architecture with 30 LSTM units in each layer and prediction length of 1), owing to the fact that the time series is not predictable and hence a good prediction model could not be learnt, whereas the method of the present disclosure gives P, R, F0.1 score and TPR/FPR of 0.96, 0.18, 0.93 and 7.6, respectively.
  • the hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof.
  • the device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field- programmable gate array (FPGA), or a combination of hardware and software means, e.g.
  • ASIC application-specific integrated circuit
  • FPGA field- programmable gate array
  • the means can include both hardware means and software means.
  • the method embodiments described herein could be implemented in hardware and software.
  • the device may also include software means.
  • the embodiments of the present disclosure may be implemented on different hardware devices, e.g. using a plurality of CPUs.
  • the embodiments herein can comprise hardware and software elements.
  • the embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
  • the functions performed by various modules comprising the system of the present disclosure and described herein may be implemented in other modules or combinations of other modules.
  • a computer- usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the various modules described herein may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer readable medium or other storage device.
  • Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system via a health index (HI). The sequence to sequence mapper learns to reconstruct normal time series behavior, and thereafter uses reconstruction error to estimate the HI. The HI is used for generating health behavior trend, detection of anomalous behavior, and remaining useful life (RUL) pertaining to a monitored system. The present disclosure does not rely on domain knowledge, as in the prior art, when estimating the health index. The HI of the monitored system can be determined irrespective of the predictability of the time series data generated from the monitored system. Likewise, the present disclosure is relevant to time series data of varying nature: predictable, unpredictable, periodic, aperiodic, and quasi-periodic time series; short time series and long time series; and univariate and multivariate time series.

Description

HEALTH MONITORING AND PROGNOSTICS OF A SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority from Indian Patent Application No. 201621020886, filed on June 17th, 2016, the entirety of which is hereby incorporated by reference.
TECHNICAL FIELD
[002] The embodiments herein generally relate to health monitoring and prognostics of a system, and, more particularly, sequence to sequence mapper based systems and methods for health monitoring and prognostics of a system.
BACKGROUND
[003] Time-based maintenance of complex machines or systems leads to high maintenance costs and high downtime if a machine breaks down before the scheduled maintenance date. Most models for automated health monitoring based on time series multi-sensor data received from devices such as engines, vehicles, aircraft, and the like, rely on domain knowledge. Domain knowledge based systems may not be able to capture the complex behavior of machines entirely, may require handcrafted rules, and lack generalization capability to work across domains. Prediction models have been used to learn models of normal behavior, and prediction errors are then used to measure the health of a machine at any given time. These models assume that time series data is predictable, which may not hold true for real-world applications with manual controls and unmonitored environmental conditions or loads leading to inherently unpredictable time series data. Most models are also unable to capture complex non-linear dependencies between sensors and long term temporal correlations.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[005] Systems and methods of the present disclosure provide a health index that summarizes the health of a monitored system at any point in time. The health index may be used for detecting anomalous behavior in a system, health monitoring, and prognostics for condition based maintenance. The method of the present disclosure comprises learning an unsupervised model of normal or healthy behavior using multivariate time series data. The model for healthy behavior is then applied to unseen multivariate time series data to predict the health of the monitored system. The system of the present disclosure does not rely on domain knowledge to estimate the health of the monitored machine. A sequence to sequence mapper is employed to capture long term temporal correlations as well as complex non-linear dependencies between multiple sensors. The model does not assume that the time series data is predictable. The method works well for predictable as well as unpredictable time series data. For estimating Remaining Useful Life (RUL), the method of the present disclosure is able to achieve a performance which is comparable to models which rely on domain knowledge.
[006] In an aspect, there is provided a processor implemented method comprising: receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstructing, by the one or more hardware processors, from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series from the test set.
[007] In another aspect, there is provided a system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
[008] In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system; reconstruct from the test set, one or more time series data pertaining to the at least one test instance via a learnt sequence to sequence mapper based model; and generate a reconstruction error at each time instance of the one or more time series from the test set.
[009] In an embodiment of the present disclosure, the one or more hardware processors are further configured to learn a sequence to sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system; identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and training the sequence to sequence mapper based model to reconstruct the time series data in the healthy train set and generate the learnt sequence to sequence mapper based model.
[010] In an embodiment of the present disclosure, the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
[011] In an embodiment of the present disclosure, the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify anomalous or normal subsequences in the test set.
[012] In an embodiment of the present disclosure, the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
[013] In an embodiment of the present disclosure, the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of: (i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curve(s) of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and (iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL. [014] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the present disclosure, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[015] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[016] FIG.1 illustrates an exemplary block diagram of a system for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure;
[017] FIG.2 illustrates an exemplary flow diagram of a method for health monitoring and prognostics of a monitored system, in accordance with an embodiment of the present disclosure;
[018] FIG.3 illustrates Long Short Term Memory based Encoder Decoder (LSTM-ED) inference steps for input {z1, z2, z3} to predict {z'1, z'2, z'3}, as known in the art;
[019] FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure;
[020] FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein HI curve for a test instance is matched with HI curve for a train instance;
[021] FIG.6 illustrates a graphical illustration of reconstruction error versus fraction of total life passed, obtained from an LSTM-ED model, in accordance with an embodiment of the present disclosure;
[022] FIG.7A through FIG.7D illustrate histograms of prediction errors for Turbofan Engine dataset from LSTM-ED, LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure;
[023] FIG.8A illustrates actual RUL as compared with RUL estimates given by LR-Exp, LR-ED1 and LR-ED2 models for Turbofan engine dataset, in accordance with an embodiment of the present disclosure;
[024] FIG.8B illustrates standard deviation, maximum-minimum and absolute error of the RULs considered for estimating the final RUL w.r.t HI at last cycle for Turbofan engine dataset, in accordance with an embodiment of the present disclosure;
[025] FIG.9A and FIG.9E illustrate reconstruction errors pertaining to material 1 and material 2 respectively and FIG.9B through FIG.9D and FIG.9F through FIG.9H illustrate histograms of prediction errors, pertaining to material 1 and material 2 respectively, w.r.t cycles passed, for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure;
[026] FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively, for milling machine dataset, in accordance with an embodiment of the present disclosure;
[027] FIG.11 illustrates pointwise reconstruction errors for last 30 days before maintenance for pulverizer mill dataset, in accordance with an embodiment of the present disclosure; and
[028] FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal and anomalous sequences respectively pertaining to power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets respectively, in accordance with an embodiment of the present disclosure.
[029] It should be appreciated by those skilled in the art that any block diagram herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
DETAILED DESCRIPTION
[030] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[031] The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. [032] It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the preferred, systems and methods are now described.
[033] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[034] Before setting forth the detailed explanation, it is noted that all of the discussion below, regardless of the particular implementation being described, is exemplary in nature, rather than limiting.
[035] Industrial internet has given rise to availability of sensor data from numerous machines belonging to various domains such as agriculture, energy, manufacturing, and the like. Sensor readings can indicate health of machines. This has led to increased business desire to perform maintenance of machines based on their condition rather than following the current industry practice of time-based maintenance. It is also noted that condition based maintenance can lead to significant financial savings. Condition based maintenance can be achieved by building models for prediction of remaining useful life (RUL) of machines, based on their sensor readings. Traditional approach for RUL prediction is based on an assumption that health degradation curves (drawn w.r.t. time) follow a specific shape such as exponential or linear. However, such assumptions do not hold well with real world data sets. Some important challenges in solving prognostics related problems are: i) health degradation curve may not necessarily follow a fixed shape, ii) time to reach same level of degradation by machines of same specifications is often different, iii) each instance has a slightly different initial health or wear, iv) sensor readings, if available, are noisy, v) sensor data till end-of-life is not easily available because in practice, periodic maintenance is performed. Apart from health index (HI) based approach, mathematical models of the underlying physical system, fault propagation models and conventional reliability models have also been used for RUL estimation. The present disclosure provides an unsupervised technique to obtain health index (HI) for a monitored system using multi-sensor time series data, which does not make any assumption on the shape of the degradation curve. A sequence to sequence mapper based model such as Long Short Term Memory based Encoder-Decoder (LSTM-ED) is used to learn a model of normal behavior of a monitored system, which is trained to reconstruct multivariate time series data corresponding to normal behavior. Reconstruction error at a point in a time series data is then used to compute HI at that point. The present disclosure shows that LSTM-ED based HI learnt in an unsupervised manner is able to capture degradation in a monitored system; the HI decreases as the system degrades. Also, LSTM-ED based HI can be used to learn a model for RUL estimation instead of relying on domain knowledge, or exponential/linear degradation assumption, while achieving comparable performance.
[036] The expression "time series data" used in the context of the present disclosure refers to either univariate or multivariate time series data pertaining to one or more sensors respectively.
[037] In the present disclosure, the expressions "normal" and "healthy" may be used interchangeably.
[038] In the present disclosure, the expressions "sequence" and "time series" may be used interchangeably.
[039] Referring now to the drawings, and more particularly to FIGS. 1 through 12, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and method.
[040] FIG.1 illustrates an exemplary block diagram of a system 100 for health monitoring and prognostics of a monitored system 200 in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[041] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[042] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
[043] FIG.2 illustrates an exemplary flow diagram of a method 300 for health monitoring and prognostics of the monitored system 200 in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104.
[044] In an embodiment of the present disclosure, at step 302, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of the monitored system 200 is received.
[045] In an embodiment of the present disclosure, at step 304, one or more time series data, from the test set received at step 302 and pertaining to the at least one test instance is reconstructed by a sequence to sequence mapper based model. In accordance with the present disclosure, an instance of the sequence to sequence mapper based model can be a Long Short Term Memory based Encoder-Decoder (LSTM-ED).
[046] Explanation of the systems and methods of the present disclosure is provided further based on LSTM-ED; however, any sequence to sequence mapper based model may be used in the context of the present disclosure. In an embodiment, LSTM units in an LSTM-ED model are recurrent units that use the current input time series data z_t, hidden state activations a_{t-1}, and memory cell activations c_{t-1} to compute the hidden state activations a_t at time t. An LSTM unit uses a combination of a memory cell c and three types of gates: input gate i, forget gate f, and output gate o to decide if the input needs to be remembered (using the input gate), when the previous memory needs to be retained (using the forget gate), and when the memory content needs to be output (using the output gate). The values for the input gate i, forget gate f, output gate o, hidden state a, and cell activation c for n LSTM units at time t are computed using the current input z_t, the previous hidden state a_{t-1}, and memory cell value c_{t-1} as given by Equations 1-6 herein below.
i_t = σ(W_1 z_t + W_2 a_{t-1} + b_i) --> (1)
f_t = σ(W_3 z_t + W_4 a_{t-1} + b_f) --> (2)
o_t = σ(W_5 z_t + W_6 a_{t-1} + b_o) --> (3)
g_t = tanh(W_7 z_t + W_8 a_{t-1} + b_g) --> (4)
wherein W_i (i = 1, 2, ..., 8) are weight matrices, b_i, b_f, b_o, b_g refer to bias vectors, and g_t refers to an intermediate variable used to obtain c_t.
Here, σ(z) = 1/(1 + e^{-z}) and tanh(z) = 2σ(2z) - 1. The operations σ and tanh are applied elementwise.
Also, z_t ∈ R^p, and all the other parameters i_t, f_t, o_t, g_t, a_t, c_t ∈ R^n, where p refers to the number of input units (or number of sensors), n is the number of hidden LSTM units, and
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t --> (5)
a_t = o_t ⊙ tanh(c_t) --> (6)
wherein ⊙ denotes elementwise multiplication.
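By way of illustration only, a minimal Python/NumPy sketch of a single LSTM step implementing Equations (1)-(6) is given below. The weight shapes, the dictionary layout of the parameters and the random toy values are assumptions made for this example and are not part of the disclosure.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, a_prev, c_prev, W, b):
    # One LSTM step per Equations (1)-(6).
    # z_t: input at time t, shape (p,); a_prev, c_prev: previous hidden/cell states, shape (n,)
    # W: weight matrices W1..W8 (odd-numbered act on z_t, even-numbered on a_prev); b: bias vectors
    i_t = sigmoid(W["W1"] @ z_t + W["W2"] @ a_prev + b["i"])   # input gate, Eq. (1)
    f_t = sigmoid(W["W3"] @ z_t + W["W4"] @ a_prev + b["f"])   # forget gate, Eq. (2)
    o_t = sigmoid(W["W5"] @ z_t + W["W6"] @ a_prev + b["o"])   # output gate, Eq. (3)
    g_t = np.tanh(W["W7"] @ z_t + W["W8"] @ a_prev + b["g"])   # candidate memory, Eq. (4)
    c_t = f_t * c_prev + i_t * g_t                             # cell update, Eq. (5)
    a_t = o_t * np.tanh(c_t)                                   # hidden state, Eq. (6)
    return a_t, c_t

# Toy usage with p = 2 sensors and n = 4 LSTM units (arbitrary values).
rng = np.random.default_rng(0)
p, n = 2, 4
W = {f"W{k}": 0.1 * rng.standard_normal((n, p if k % 2 == 1 else n)) for k in range(1, 9)}
b = {k: np.zeros(n) for k in ("i", "f", "o", "g")}
a, c = np.zeros(n), np.zeros(n)
a, c = lstm_step(rng.standard_normal(p), a, c, W, b)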
[047] Windows of length l are considered to obtain L - l + 1 subsequences for a train instance with L cycles, wherein L is the length of the sequence of the train instance. LSTM-ED is trained to reconstruct the normal (healthy) subsequences of length l from all the training instances. The LSTM encoder learns a fixed length vector representation of the input time series data and the LSTM decoder uses this representation to reconstruct the time series data using the current hidden state and the value predicted at the previous time-step. Given a time series data Z = z_1 z_2 ... z_l, a_t^(E) is the hidden state of the encoder at time t for each t ∈ {1, 2, ..., l}, where a_t^(E) ∈ R^n and n is the number of LSTM units in the hidden layer of the encoder. The encoder and decoder are jointly trained to reconstruct the time series data in reverse order, i.e., the target time series data is [z_l z_{l-1} ... z_1].
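The windowing just described can be sketched as follows; the array layout (cycles along the first axis, sensors along the second) and the toy sizes are assumptions used only for illustration.

import numpy as np

def sliding_subsequences(series, l):
    # Return the L - l + 1 overlapping windows of length l from a (L, p) time series.
    L = series.shape[0]
    if L < l:
        raise ValueError("series shorter than window length l")
    return np.stack([series[j:j + l] for j in range(L - l + 1)])

# Example: a train instance with L = 100 cycles of p = 3 sensors, window length l = 20.
Z = np.random.randn(100, 3)
windows = sliding_subsequences(Z, 20)   # shape (81, 20, 3)
targets = windows[:, ::-1, :]           # reversed windows: target order z_l, ..., z_1
print(windows.shape, targets.shape)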
[048] In an embodiment the learnt sequence to sequence mapper based model is obtained by receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system. A healthy train set of time series data from the train set of time series data that pertains to the healthy behavior of the at least one train instance is identified. The sequence to sequence mapper based model is then trained to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model. FIG.3 illustrates Long Short Term Memory based Encoder-Decoder (LSTM-ED) inference steps for input {z_1, z_2, z_3} with l = 3 to predict {z'_1, z'_2, z'_3}, as known in the art. The value z_t at time instance t and the hidden state a_{t-1}^(E) of the encoder at time t-1 are used to obtain the hidden state a_t^(E) of the encoder at time t. The hidden state a_l^(E) of the encoder at the end of the input sequence is used as the initial state of the decoder such that a_l^(D) = a_l^(E). A linear layer with weight matrix w of size n x p and bias vector b ∈ R^p on top of the decoder is used to compute z'_t = w^T a_t^(D) + b. During training, the decoder uses z_t as input to obtain the state a_{t-1}^(D) and then predict z'_{t-1} corresponding to the target z_{t-1}. During inference, the predicted value z'_t is input to the decoder to obtain a_{t-1}^(D) and predict z'_{t-1}. In an embodiment of the present disclosure, at step 306, a reconstruction error is generated at each time instance of the one or more time series data from the test set. The reconstruction error e_t for a point z_t is given by:
e_t = ||z_t - z'_t|| --> (7)
wherein ||.|| refers to the L2 norm.
One of the goals of the present disclosure is to minimize the cost function E = Σ_{Z ∈ s_N} Σ_{t=1}^{l} (e_t)², where s_N is the set of normal training subsequences of length l each. In accordance with the present disclosure, for training, only the subsequences which correspond to perfect health of an instance are considered. In an embodiment, the first few operational cycles can be assumed to correspond to a healthy state for any instance.
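The following PyTorch sketch shows one possible way to train such an encoder-decoder to reconstruct healthy windows in reverse order while minimizing the summed squared reconstruction error E. It is a simplified, assumed implementation (single-layer LSTMCell encoder and decoder, toy data, arbitrary hyper-parameters) and not the exact architecture of the disclosure.

import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    # Sketch of an LSTM encoder-decoder that reconstructs a window in reverse order.
    def __init__(self, n_sensors, n_hidden):
        super().__init__()
        self.encoder = nn.LSTMCell(n_sensors, n_hidden)
        self.decoder = nn.LSTMCell(n_sensors, n_hidden)
        self.linear = nn.Linear(n_hidden, n_sensors)   # plays the role of w and b on top of the decoder

    def forward(self, z, teacher_forcing=True):
        # z: (batch, l, p); returns the reconstruction re-ordered to the original time order.
        batch, l, p = z.shape
        a = z.new_zeros(batch, self.encoder.hidden_size)
        c = z.new_zeros(batch, self.encoder.hidden_size)
        for t in range(l):                    # encode z_1 ... z_l
            a, c = self.encoder(z[:, t], (a, c))
        out = []
        z_hat = self.linear(a)                # first prediction corresponds to z_l
        out.append(z_hat)
        for t in range(l - 1, 0, -1):         # predict z_{l-1}, ..., z_1
            inp = z[:, t] if teacher_forcing else z_hat
            a, c = self.decoder(inp, (a, c))
            z_hat = self.linear(a)
            out.append(z_hat)
        return torch.stack(out[::-1], dim=1)

# Toy training loop on healthy windows (shapes and hyper-parameters are assumed).
model = LSTMEncoderDecoder(n_sensors=3, n_hidden=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
healthy = torch.randn(64, 20, 3)              # 64 healthy subsequences of length l = 20
for epoch in range(5):
    opt.zero_grad()
    recon = model(healthy)
    loss = ((recon - healthy) ** 2).sum()     # E: sum of squared pointwise errors
    loss.backward()
    opt.step()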
[049] In an embodiment the test set and the train set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters. In accordance with one embodiment, the dimensionality reduction technique is Principal Component Analysis (PCA).
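As a sketch of the identical-transformation requirement, the snippet below fits PCA on the train readings only and reuses the same fitted components for the test readings. The use of scikit-learn, 21 sensors and 3 components here is an assumption for illustration.

import numpy as np
from sklearn.decomposition import PCA

train_readings = np.random.randn(500, 21)   # e.g., readings from 21 dependent sensors (toy data)
test_readings = np.random.randn(200, 21)

pca = PCA(n_components=3)
train_reduced = pca.fit_transform(train_readings)   # transformation parameters learnt on train data only
test_reduced = pca.transform(test_readings)         # identical parameters applied to the test set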
[050] The reconstruction error generated in step 306 may be used for various applications. In an embodiment, the health state of the at least one test instance may be estimated. For estimating the health state, a degree of anomaly is computed by obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold τ thereof to classify anomalous or normal subsequences in the test set. In an embodiment, a healthy time series data is divided into four sets of time series data: s_N, v_N1, v_N2, and t_N, and the anomalous time series data into two sets v_A and t_A. The set of sequences s_N is used to learn the LSTM encoder-decoder reconstruction model. The set v_N1 is used for early stopping based regularization while training the encoder-decoder model. The reconstruction error vector for z_t is given by e_t = |z_t - z'_t|. The error vectors for the points in the sequences in set v_N1 are used to estimate parameters μ (mean vector) and Σ (covariance matrix) of a Normal distribution N(μ, Σ) using Maximum Likelihood Estimation. Then, for any point z_t, the anomaly score a_t = (e_t - μ)^T Σ^{-1} (e_t - μ). In a supervised setting, if a_t > τ, a point in a sequence can be predicted to be "anomalous", otherwise "normal" or "healthy". When anomalous sequences are available during training, a threshold τ over the likelihood values can be learnt to maximize F_β = (1 + β²) × P × R / (β²P + R), where β > 0, P is precision, R is recall, "anomalous" is the positive class and "normal" is the negative class. If a subsequence contains an anomalous pattern, the actual label for the entire subsequence is considered to be "anomalous". This is helpful in many real-world applications where the exact position of the anomaly is not known. For example, for an engine dataset (refer paragraph 077), the only information available is that the machine was repaired on a particular date. The last few operational runs prior to repair are assumed to be anomalous and the first few operational runs after the repair are assumed to be normal. It is assumed that β < 1 since the fraction of actual anomalous points in a sequence labeled as anomalous may not be high, and hence lower recall is expected. The parameters τ and n are chosen with maximum F_β score on the validation sequences in v_N2 and v_A.
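A NumPy sketch of the anomaly scoring described above is given below: the error vectors from v_N1 are used to estimate μ and Σ, and the score a_t = (e_t - μ)^T Σ^{-1} (e_t - μ) is compared against a threshold τ. The toy data, the pseudo-inverse used for numerical stability and the example value of τ are assumptions.

import numpy as np

def fit_error_distribution(errors_vn1):
    # Maximum Likelihood estimates of the mean and covariance of error vectors from set v_N1.
    mu = errors_vn1.mean(axis=0)
    cov = np.cov(errors_vn1, rowvar=False, bias=True)
    return mu, np.linalg.pinv(cov)

def anomaly_scores(errors, mu, cov_inv):
    # a_t = (e_t - mu)^T Sigma^{-1} (e_t - mu) for each error vector e_t.
    d = errors - mu
    return np.einsum("ti,ij,tj->t", d, cov_inv, d)

# Toy usage; in practice tau would be tuned on v_N2 and v_A to maximize the F_beta score.
errors_vn1 = np.abs(np.random.randn(300, 3))     # |z_t - z'_t| on healthy validation points
mu, cov_inv = fit_error_distribution(errors_vn1)
test_errors = np.abs(np.random.randn(50, 3))
scores = anomaly_scores(test_errors, mu, cov_inv)
tau = 5.0                                        # assumed threshold for illustration
labels = np.where(scores > tau, "anomalous", "normal")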
[051] In another embodiment, the reconstruction error generated in step 306 may be used to generate health behavior trend for the at least one test instance. If the reconstruction error shows an increasing trend, the health of the monitored system 200 may be deemed to be deteriorating.
[052] In yet another embodiment, the reconstruction error may be used to estimate Remaining Useful Life (RUL) of the at least one test instance. In an embodiment, a health index (HI) is obtained by training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance, and comparing with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL. In accordance with the present disclosure, the HI curve is the complete sequence of HI values over all time instances. For instance, if the length of the time series is 10, then each of the 10 time instances would be associated with an HI value; the 10 HI values form an HI curve. Let H^(u) = [h_1^(u) h_2^(u) ... h_{L^(u)}^(u)] represent the HI curve for instance u, where each point h_t^(u) ∈ R and L^(u) is the total number of cycles. It is assumed that 0 ≤ h_t^(u) ≤ 1, such that when u is in perfect health h_t^(u) = 1, and when u performs below an acceptable level (e.g. the instance is about to fail), h_t^(u) = 0. The method of the present disclosure constructs a mapping f_θ : z_t^(u) → h_t^(u) such that a Linear Regression (LR) model may be expressed as:
h_t^(u) = θ^T z_t^(u) + θ_0 --> (8)
where θ ∈ R^p, θ_0 ∈ R, which computes HI from the derived sensor readings z_t^(u) ∈ R^p at time t for instance u. Given the target HI curves for the training instances, the parameters θ and θ_0 are estimated using Ordinary Least Squares methods.
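A minimal NumPy sketch of the Ordinary Least Squares fit of Equation (8) is shown below; the synthetic data and the helper names are assumptions used only for illustration.

import numpy as np

def fit_linear_hi_model(Z_train, h_target):
    # Ordinary Least Squares fit of h_t = theta^T z_t + theta_0 (Equation 8).
    # Z_train: (N, p) derived sensor readings; h_target: (N,) target HI values.
    X = np.hstack([Z_train, np.ones((Z_train.shape[0], 1))])   # extra column for theta_0
    coef, *_ = np.linalg.lstsq(X, h_target, rcond=None)
    return coef[:-1], coef[-1]                                 # theta, theta_0

def predict_hi(Z, theta, theta_0):
    return Z @ theta + theta_0

# Toy usage with assumed data (p = 3 derived readings).
Z_train = np.random.randn(1000, 3)
h_target = np.random.rand(1000)
theta, theta_0 = fit_linear_hi_model(Z_train, h_target)
hi_curve = predict_hi(np.random.randn(120, 3), theta, theta_0)   # HI curve of one instance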
[053] The parameters θ and θ_0 of the abovementioned Linear Regression (LR) model are usually estimated by assuming a mathematical form for the target H^(u), with an exponential function being the most common and successfully employed form for the target HI curve, which assumes that the HI at time t for instance u follows an exponential degradation form of the type h_t^(u) = 1 - exp(·), as given by Equation (9), for t ∈ {β'·L^(u), ..., (1 - β')·L^(u)}, 0 < β' < 1.
The starting and ending β' fraction of cycles are assigned HI values of 1 and 0, respectively. Another possible assumption is to assume target HI values of 1 and 0 for the data corresponding to healthy conditions and failure conditions, respectively. Unlike the exponential HI curve which uses the entire time series of sensor readings, the sensor readings corresponding to only these points are used to learn the regression model. The estimates θ and θ_0 based on the target HI curves for the train instances are used to obtain the final HI curves for all the train instances and for a new test instance for which the RUL is to be estimated. The HI curves thus obtained are used to estimate the RUL for the test instance based on the similarity of the train and test HI curves. In another embodiment, reconstruction error curves pertaining to the at least one test instance are generated and compared with reconstruction error curves pertaining to the at least one train instance to estimate the RUL. A point z_t in a time series data Z is part of multiple overlapping subsequences, and is therefore predicted by multiple subsequences Z(j, l) corresponding to j = t - l + 1, t - l + 2, ..., t. Hence each point in the original time series data for a train instance is predicted as many times as the number of subsequences it is part of (l times for each point, except for points z_t with t < l or t > L - l which are predicted a fewer number of times). An average of all the predictions for a point is taken to be the final prediction for that point. The difference in the actual and predicted values for a point is used as an un-normalized HI for that point. The error e_t^(u) is normalized to obtain the target HI as:
h_t^(u) = (e_M^(u) - e_t^(u)) / (e_M^(u) - e_m^(u)) --> (10)
where e_M^(u) and e_m^(u) are the maximum and minimum values of the reconstruction error for instance u over t = 1, 2, ..., L^(u), respectively. The target HI values thus obtained for all train instances are used to obtain the estimates θ and θ_0. Apart from the error e_t^(u), the squared error (e_t^(u))² is also considered to obtain the target HI values such that large reconstruction errors imply a much smaller HI value.
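The normalization of the reconstruction errors of a train instance into target HI values, and its squared-error variant, can be sketched as follows; the example error values are assumed.

import numpy as np

def target_hi_from_errors(errors, squared=False):
    # Map per-point reconstruction errors of one train instance to target HI values in [0, 1];
    # a large reconstruction error maps to a low HI. squared=True uses the squared-error variant.
    e = np.square(errors) if squared else np.asarray(errors, dtype=float)
    e_max, e_min = e.max(), e.min()
    return (e_max - e) / (e_max - e_min)

# Example: pointwise errors over the cycles of one train instance (assumed values).
errors = np.array([0.10, 0.12, 0.20, 0.35, 0.60, 0.90])
print(target_hi_from_errors(errors))            # starts near 1, ends at 0
print(target_hi_from_errors(errors, squared=True))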
[054] In yet another embodiment, the HI curve of the at least one test instance is generated and compared with the HI curves in a repository of the HI curves of the at least one train instance to estimate the RUL. FIG.4 illustrates an exemplary flow diagram for estimating Remaining Useful Life (RUL) using unsupervised HI based on LSTM-ED, in accordance with an embodiment of the present disclosure. The HI curve (online HI curve) for a test instance u* is compared to the HI curves (offline HI curves) of all the train instances u ∈ U. The test instance and a train instance may take different numbers of cycles to reach the same degradation level (HI value). FIG.5 illustrates an exemplary RUL estimation, in accordance with the present disclosure, using HI curve matching taken from a Turbofan engine dataset, wherein the HI curve for a test instance is matched with the HI curve for a train instance. The time-lag which corresponds to the minimum Euclidean distance between the HI curves of the train and test instance is shown. For a given time-lag, the number of remaining cycles for the train instance after the last cycle of the test instance gives the RUL estimate for the test instance. Let u* be a test instance and u be a train instance. In accordance with an embodiment of the present disclosure, the following scenarios for curve matching based RUL estimation are given due consideration:
[055] Varying initial health across instances: The initial health of an instance varies depending on various factors such as the inherent inconsistencies in the manufacturing process. The initial health is assumed to be close to 1. In order to ensure this, the HI values for an instance are divided by the average of its first few HI values (e.g. first 5% of cycles). Also, while comparing the HI curves H^(u*) and H^(u), a time-lag t is allowed such that the HI values of u* may be close to the HI values of H^(u)(t, L^(u*)), with t ≤ τ' (refer Equations 11-13). This takes care of instance specific variances in the degree of initial wear and degradation evolution.
[056] Multiple time lags with high similarity: The HI curve H^(u*) may have high similarity with H^(u)(t, L^(u*)) for multiple values of the time-lag t, wherein L^(u*) refers to the length of the test instance. Multiple RUL estimates are considered for u* based on the total life of u, rather than considering only the RUL estimate corresponding to the time-lag t with the minimum Euclidean distance between the curves H^(u*) and H^(u)(t, L^(u*)). The multiple RUL estimates corresponding to each time-lag are assigned weights proportional to the similarity of the curves to get the final RUL estimate (refer Equation 13).
[057] Non-monotonic HI: Due to inherent noise in sensor readings, the HI curves obtained using LR are non-monotonic. To reduce the noise in the estimates of HI, moving average smoothing, as known in the art, is used.
[058] Maximum value of RUL estimate: When an instance is in very good health or has been operational for only a few cycles, estimating RUL is difficult. The maximum RUL estimate for any test instance is limited to R_max. Also, the maximum RUL estimate for the instance u* based on HI curve comparison with instance u is limited by L^(u) - L^(u*). This implies that the maximum RUL estimate for any test instance u* will be such that the total length R̂^(u*) + L^(u*) ≤ L_max, where L_max is the maximum length for any training instance available. When very few cycles of a test instance are available, it becomes difficult to predict RUL beyond a certain point.
[059] The similarity between the HI curves of test instance u* and train instance u with time-lag t is defined as:
s(u*, u, t) = exp(-d²(u*, u, t) / λ) --> (11)
where
d²(u*, u, t) = Σ_{i=1}^{L^(u*)} (h_i^(u*) - h_{t+i}^(u))² --> (12)
is the squared Euclidean distance between H^(u*)(1, L^(u*)) and H^(u)(t, L^(u*)), and λ > 0, t ∈ {1, 2, ..., τ'}, t + L^(u*) ≤ L^(u). Here, λ controls the notion of similarity: a small value of λ would imply a large difference in s even when d is not large. The RUL estimate for u* based on the HI curve for u and for time-lag t is given by R̂^(u*)(u, t) = L^(u) - t - L^(u*). The estimate R̂^(u*) for R^(u*) is given by:
R̂^(u*) = Σ s(u*, u, t) · R̂^(u*)(u, t) / Σ s(u*, u, t) --> (13)
where the summation is over only those combinations of u and t which satisfy s(u*, u, t) ≥ α · s_max, where s_max = max_{u,t} {s(u*, u, t)}, 0 < α < 1. It is to be noted that the parameter α decides the number of RUL estimates R̂^(u*)(u, t) to be considered to get the final RUL estimate R̂^(u*). The number of RUL estimates R̂^(u*)(u, t) considered for computing R̂^(u*) can be used as a measure of confidence in the prediction, which is useful in practical applications (for instance, refer paragraph 066). During the initial stages of an instance's usage, when it is in good health and a fault has still not appeared, estimating RUL is tough, as it is difficult to know beforehand how exactly the fault would evolve over time once it appears.
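A NumPy sketch of the HI curve matching of Equations (11)-(13) is given below. It uses 0-based time-lags, reuses the turbofan hyper-parameter values reported later merely as defaults, and runs on synthetic HI curves; these choices are assumptions for illustration and not a definitive implementation.

import numpy as np

def rul_estimate(test_hi, train_his, lam=0.0005, tau_prime=40, alpha=0.87, r_max=None):
    # Similarity-weighted RUL estimate by HI curve matching (sketch of Equations (11)-(13)).
    L_star = len(test_hi)
    sims, ruls = [], []
    for train_hi in train_his:
        L = len(train_hi)
        if L <= L_star:
            continue                                   # train curve too short to match
        max_lag = min(tau_prime, L - L_star)
        for t in range(max_lag + 1):                   # 0-based lag; the filing uses t in {1, ..., tau'}
            d2 = np.sum((test_hi - train_hi[t:t + L_star]) ** 2)    # squared Euclidean distance
            sims.append(np.exp(-d2 / lam))                          # Eq. (11)
            ruls.append(L - t - L_star)                             # remaining cycles of the train instance
    sims, ruls = np.array(sims), np.array(ruls)
    keep = sims >= alpha * sims.max()                  # keep estimates with similarity >= alpha * s_max
    est = np.sum(sims[keep] * ruls[keep]) / np.sum(sims[keep])      # Eq. (13)
    if r_max is not None:
        est = min(est, r_max)
    return est, int(keep.sum())                        # estimate and number of contributing matches

# Toy usage with synthetic, linearly degrading HI curves (assumed data).
train_his = [np.linspace(1.0, 0.0, n) for n in (150, 180, 200)]
test_hi = np.linspace(1.0, 0.6, 60)                    # partially observed test instance
print(rul_estimate(test_hi, train_his))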
EXPERIMENTAL EVALUATION FOR ESTIMATING RUL
[060] The method of the present disclosure is evaluated on two publicly available datasets: the C-MAPSS Turbofan Engine Dataset and the Milling Machine Dataset, and a real world dataset from a pulverizer mill. For the first two datasets, the ground truth in terms of the RUL is known, and RUL estimation performance metrics are used to measure the efficacy of the method (refer paragraph 060). The pulverizer mill undergoes repair on a time basis (around one year), and therefore ground truth in terms of the actual RUL is not available. A comparison is drawn between the health index (HI) and the cost of maintenance of the mills. For the first two datasets, different target HI curves for learning the LR model (refer paragraph 052) are used: the LR-Lin and LR-Exp models assume linear and exponential forms for the target HI curves, respectively. LR-ED1 and LR-ED2 use the normalized reconstruction error and the normalized squared reconstruction error as the target HI (refer paragraph 052), respectively. The target HI values for LR-Exp are obtained using Equation 9 with β' = 5% as suggested in the art.
[061] Performance metrics considered: The performance is measured in terms of Timeliness Score (S), Accuracy (A), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE1 and MAPE2) as mentioned in Equations 14-19, respectively. For test instance u*, the error Δ^(u*) = R̂^(u*) - R^(u*) between the estimated RUL (R̂^(u*)) and the actual RUL (R^(u*)). The score S used to measure the performance of a model is given by:
S = Σ_{u*=1}^{N} (exp(γ·|Δ^(u*)|) - 1) --> (14)
where N is the number of test instances, γ = 1/τ_1 if Δ^(u*) < 0, and γ = 1/τ_2 otherwise. With τ_1 > τ_2 as used herein, late predictions are penalized more compared to early predictions. The lower the value of S, the better is the performance. Accuracy (A), per Equation (15), is the percentage of predictions that are neither false positives nor false negatives; MAE and MSE, per Equations (16) and (17), are the mean absolute error and the mean squared error of Δ^(u*) over the test instances; and MAPE1 and MAPE2, per Equations (18) and (19), are mean absolute percentage error variants of the RUL estimation error. A prediction is considered a false positive (FP) if Δ^(u*) < -τ_1, and a false negative (FN) if Δ^(u*) > τ_2.
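For illustration, the sketch below computes the timeliness score S together with accuracy, MAE and MSE in a manner consistent with the description above; the exact forms of Equations (15)-(19), in particular the MAPE variants, are not reproduced here, and the toy predictions are assumed.

import numpy as np

def rul_metrics(rul_pred, rul_true, tau1=13, tau2=10):
    # Timeliness score S, accuracy A, MAE and MSE for a set of RUL estimates (sketch).
    delta = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    gamma = np.where(delta < 0, 1.0 / tau1, 1.0 / tau2)        # late predictions penalized more
    S = np.sum(np.exp(gamma * np.abs(delta)) - 1.0)
    A = 100.0 * np.mean((delta >= -tau1) & (delta <= tau2))    # neither FP nor FN
    MAE = np.mean(np.abs(delta))
    MSE = np.mean(delta ** 2)
    return S, A, MAE, MSE

# Toy usage with assumed predictions and ground truth.
print(rul_metrics([80, 100, 30], [75, 120, 25]))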
[062] C-MAPSS Turbofan Engine Dataset: The first dataset from the simulated turbofan engine data (NASA Ames Prognostics Data Repository) contains readings for 24 sensors (3 operation setting sensors, 21 dependent sensors) for 100 engines till a failure threshold is achieved, i.e., till end-of-life, in train_FD001.txt. Similar data is provided for 100 test engines in test_FD001.txt, where the time series data for the engines are pruned some time prior to failure. The task is to predict the RUL for these 100 engines. The actual RUL values are provided in RUL_FD001.txt. There are a total of 20631 cycles for the training engines, and 13096 cycles for the test engines. Each engine has a different degree of initial wear. In the context of the present disclosure, the experiment uses τ_1 = 13 and τ_2 = 10 as is known in the art.
[063] Model learning and parameter selection: 80 engines are randomly selected for training the LSTM-ED model and estimating the parameters θ and θ_0 of the LR model (refer Equation 8). The remaining 20 training instances are used as a validation set for selecting the parameters. The trajectories for these 20 engines are randomly truncated at five different locations such that five different cases are obtained from each instance. The minimum truncation is 20% of the total life and the maximum truncation is 96%. For training LSTM-ED, only the first subsequence of length l for each of the selected 80 engines is used. The parameters, namely the number of principal components p, the number of LSTM units in the hidden layers of the encoder and decoder n, the window/subsequence length l, the maximum allowed time-lag τ', the similarity threshold α (refer Equation 13), the maximum predicted RUL R_max, and the parameter λ (refer Equation 11), are estimated using grid search to minimize S on the validation set. The parameters obtained for the best model (LR-ED2) are p = 3, n = 30, l = 20, τ' = 40, α = 0.87, R_max = 125, and λ = 0.0005.
[064] Results and Observations: FIG.6 shows the average pointwise reconstruction error (refer Equation 7) given by the LSTM-ED model, which uses the pointwise reconstruction error as an un-normalized measure of health (the higher the reconstruction error, the poorer the health), for all the 100 test engines w.r.t. the percentage of life passed (for HI sequences with p = 3 as used for models LR-ED1 and LR-ED2). During the initial stages of an engine's life, the average reconstruction error is small. As the number of cycles passed increases, the reconstruction error increases. This suggests that the reconstruction error can be used as an indicator of the health of a machine. FIG.7A through FIG.7D illustrate histograms of prediction errors for the Turbofan Engine dataset from the LSTM-ED (without using linear regression), LR-Exp, LR-ED1 and LR-ED2 models respectively, in accordance with an embodiment of the present disclosure. FIG.7A and Table 1 suggest that the RUL estimates given by the HI from LSTM-ED are fairly accurate.
Table 1 (bold text is indicative of best results)
(Table 1 is provided as image imgf000019_0001 in the original document.)
On the other hand, the 1-sigma bars in FIG.6 also suggest that the reconstruction error at a given point in time (percentage of total life passed) varies significantly from engine to engine.
[065] Performance comparison: It was also seen that LR-ED2 performs significantly better compared to the other three models. LR-ED2 is better than the LR-Exp model which uses domain knowledge in the form of the exponential degradation assumption. A comparison is drawn with RULCLIPPER (RC), which has the best performance in terms of timeliness S, accuracy A, MAE, and MSE, as known in the art, on the turbofan dataset considered. It may be noted that unlike RC, the method of the present disclosure learns the parameters of the model on a validation set rather than the test set. RC relies on the exponential assumption to estimate an HI polygon and uses the intersection of areas of the polygons of train and test engines as a measure of similarity to estimate RUL (similar to Equation 13). The results show that LR-ED2 gives performance comparable to RC without relying on the domain-knowledge based exponential assumption.
[066] The single worst predicted test instance for LR-Exp, LR-ED1 and LR-ED2 contributes 23%, 17%, and 23%, respectively, to the timeliness score S. For LR-Exp and LR-ED2 it is nearly 1/4th of the total score, which suggests that for the other 99 test engines the timeliness score S is very good. [067] HI at last cycle and RUL estimation error: FIG.8A illustrates the actual RUL as compared with the RUL estimates given by the LR-Exp, LR-ED1 and LR-ED2 models in accordance with an embodiment of the present disclosure. For all the models, it is observed that as the actual RUL increases, the error in the predicted values increases. Let R_all^(u*) denote the set of all the RUL estimates R̂^(u*)(u, t) considered for computing R̂^(u*) (see Equation 13). FIG.8B illustrates the standard deviation, the max-min difference, and the absolute error of the elements in R_all^(u*) w.r.t. the HI value at the last cycle for the Turbofan engine dataset, in accordance with an embodiment of the present disclosure. It suggests that when an instance is close to failure, i.e., the HI at the last cycle is low, the RUL estimate is very accurate with a low standard deviation of the elements in R_all^(u*). On the other hand, when an instance is in good health, i.e., when the HI at the last cycle is close to 1, the error in the RUL estimate is high and the elements in R_all^(u*) have a high standard deviation.
[068] Milling machine dataset: This data set presents milling tool wear measurements from a lab experiment. Flank wear is measured for 16 cases, with each case having a varying number of runs of varying durations. The wear is measured after runs but not necessarily after every run. The data contains readings for 10 variables (3 operating condition variables, 6 dependent sensors, 1 variable measuring time elapsed until completion of that run). A snapshot sequence of 9000 points during a run for the 6 dependent sensors is provided. It is assumed that each run represents one cycle in the life of the tool. Two operating regimes corresponding to the two types of material being milled are considered, and a different model for each material type is learnt. There are a total of 167 runs across cases, with 109 runs and 58 runs for material types 1 and 2, respectively. Case number 6 of material 2 has only one run, and hence is not considered for the experiments.
[069] Model learning and parameter selection: Since the number of cases is small, a leave-one-out method is used for model learning and parameter selection. For training the LSTM-ED model, the first run of each case is considered as normal with a sequence length of 9000. An average of the reconstruction error for a run is used to get the target HI for that run/cycle. Of the 9000 values obtained for each run, the mean and standard deviation are computed for each sensor and considered for further evaluation. The gap between two consecutive runs is reduced, via linear interpolation, to 1 second (if it is more); as a result, the HI curves for each case have a cycle of one second. The tool wear is also interpolated in the same manner and the data for each case is truncated at the point when the tool wear crosses a value of 0.45 for the first time. The target HI from LSTM-ED for the LR model is also interpolated appropriately for learning the LR model.
[070] The parameters obtained for the best models (based on minimum MAPE1) are p = 1, λ = 0.025, α = 0.98, and τ' = 15 for model PCA1 for material-1, and p = 2, λ = 0.005, α = 0.87, τ' = 13, and n = 45 for LR-ED1 for material-2. The best results are obtained without setting any limit R_max. For both cases, l = 90 (after down sampling by 100) such that the time series data for the first run is used for learning the LSTM-ED model.
[071] Results and observations: FIG.9A and FIG.9E illustrate the reconstruction errors from LSTM-ED w.r.t. the fraction of life passed for an exemplary milling machine dataset, pertaining to material 1 and material 2 respectively. FIG.9B through FIG.9D and FIG.9F through FIG.9H show the histograms of prediction errors pertaining to material 1 and material 2 respectively, in accordance with an embodiment of the present disclosure, while FIG.10A and FIG.10B illustrate RUL predictions at each cycle after interpolation for material-1 and material-2 respectively for an exemplary milling machine dataset, in accordance with an embodiment of the present disclosure. As shown in FIG.9A and FIG.9E, the reconstruction error increases with the amount of life passed, and hence is an appropriate indicator of health. FIG.9B through FIG.9D, FIG.9F through FIG.9H, FIG.10A and FIG.10B show results based on almost every cycle of the data after interpolation. The performance metrics on the original data points in the data set are summarized in Table 2.
Table 2(bold text is indicative of best results)
(Table 2 is provided as image imgf000021_0001 in the original document.)
It is observed that the first PCA component (PCA1, p = 1) gives better results than the LR-Lin and LR-Exp models with p > 2, and hence results for PCA1 are presented in Table 2. It is to be noted that for p = 1, all the four models LR-Lin, LR-Exp, LR-ED1, and LR-ED2 will give the same results since all models will predict a different linearly scaled value of the first PCA component. PCA1 and LR-ED1 are the best models for material-1 and material-2, respectively. It is observed that the best models of the present disclosure perform well as depicted in the histograms in FIG.9A through FIG.9H. For the last few cycles, when the actual RUL is low, an error of even 1 in RUL estimation leads to a MAPE1 of 100%. FIG.9B through FIG.9D and FIG.9F through FIG.9H show the error distributions for different models for the two materials. As can be noted, most of the RUL prediction errors (around 70%) lie in the ranges [-4, 6] and [-3, 1] for material types 1 and 2, respectively. Also, FIG.10A and FIG.10B show the predicted and actual RULs for different models for the two materials.
[072] Pulverizer Mill Dataset: This dataset consists of readings for 6 sensors (such as bearing vibration, feeder speed, etc.) for over three years of operation of a pulverizer mill. The data corresponds to sensor readings taken every half hour between four consecutive scheduled maintenances M0, M1, M2, and M3, such that the operational period between any two maintenances is roughly one year. Each day's multivariate time series data with length l = 48 is considered to be one subsequence. Apart from these scheduled maintenances, maintenances are done in between whenever the mill develops any unexpected fault affecting its normal operation. The costs incurred for any of the scheduled maintenances and unexpected maintenances are available.
[073] The mill is assumed to be healthy for the first 10% of the days of a year between any two consecutive time-based maintenances M_i and M_{i+1}, and the corresponding subsequences are used for learning the LSTM-ED models. This data is divided into training and validation sets. A different LSTM-ED model is learnt after each maintenance. The architecture with the minimum average reconstruction error over a validation set is chosen as the best model. The best models learnt using data after M0, M1 and M2 are obtained for n = 40, 20, and 100, respectively. The LSTM-ED based reconstruction error for each day is z-normalized using the mean and standard deviation of the reconstruction errors over the sequences in the validation set.
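The per-day z-normalization of the reconstruction error can be sketched as below; the arrays are assumed placeholders, and the threshold value 1.5 corresponds to t_E in Table 3.

import numpy as np

# z-normalize each day's average reconstruction error using the mean and standard
# deviation of the reconstruction errors over the validation sequences.
val_errors = np.abs(np.random.randn(60))        # per-day errors on the validation set (toy data)
daily_errors = np.abs(np.random.randn(365))     # per-day errors during one operational year (toy data)

mu, sigma = val_errors.mean(), val_errors.std()
E = (daily_errors - mu) / sigma                 # normalized per-day health indicator
high_error_days = np.where(E > 1.5)[0]          # days exceeding the threshold t_E = 1.5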
[074] FIG.11 illustrates the pointwise reconstruction errors for the last 30 days before maintenance for the pulverizer mill dataset, in accordance with an embodiment of the present disclosure. From the results in Table 3 and FIG.11, it is observed that the average reconstruction error E on the last day before M1 is the least, and so is the cost C(M1) incurred during M1.
Table 3
Maint. ID   t_E    P(E > t_E)   t_C   P(C > t_C | E > t_E)   E (last day)   C(M_i)
M1          1.50   0.25         7     0.61                   2.4            92
M2          1.50   0.57         7     0.84                   8.0            279
M3          1.50   0.43         7     0.75                   16.2           209
For M2 and M3, E as well as the corresponding C(M_i) are higher compared to those of M1. Further, it is observed that for the days when the average value of reconstruction error E > t_E, a large fraction (>0.61) of them have a high ongoing maintenance cost C > t_C. The significant correlation between reconstruction error and cost incurred suggests that the LSTM-ED based reconstruction error is able to capture the health of the mill.
EXPERIMENTAL EVALUATION FOR ANOMALY DETECTION
[075] Four real-world datasets are considered: power demand, space shuttle valve, ECG, and engine as seen in Table 4 herein below, wherein N, Nn and Na represent number of original sequences, normal subsequences and anomalous subsequences, respectively.
Table 4 - Nature of datasets
(Table 4 is provided as image imgf000023_0001 in the original document.)
The first three datasets are taken from (Chen et al., 2015) whereas the engine dataset is a proprietary one encountered in a real life project. The engine dataset contains data for two different applications: Engine-P where the time series data is quasi-predictable, Engine-NP where the time series data is unpredictable. For the experimental evaluation, architectures where both the encoder and decoder have single hidden layer with n LSTM units each are considered. Mini-batch stochastic optimization based on Adam Optimizer (Kingma & Ba, 2014) is used for training the LSTM Encoder-Decoder.
[076] Datasets: Table 5 herein below shows the performance of the method of the present disclosure on all the datasets.
Table 5
(Table 5 is provided as image imgf000024_0001 in the original document.)
FIG.12A1 through FIG.12E1 and FIG.12A2 through FIG.12E2 illustrate normal (N) and anomalous (A) sequences respectively pertaining to the power demand, space shuttle valve, electrocardiogram (ECG) and engine datasets. Each of the figures represents the original sequence, the reconstructed sequence and the anomaly score, as particularly referenced in FIG.12A1 and FIG.12A2 for ease of reference. The power demand dataset contains one univariate time series data with 35,040 readings for power demand recorded over a period of one year. The demand is normally high during the weekdays and low over the weekend. Within a day, the demand is high during working hours and low otherwise (refer FIG.12A1). A week when any of the first 5 days has low power demand (similar to the demand over the weekend) is considered anomalous (refer FIG.12A2, where the first day has low power demand). The original time series was down sampled by 8 to obtain non-overlapping sequences with l = 84 such that each window corresponds to one week.
[077] The space shuttle dataset contains periodic sequences with 1000 points per cycle, and 15 such cycles. l = 1500 was chosen deliberately such that a subsequence covers more than one cycle (1.5 cycles per subsequence), and sliding windows with a step size of 500 were considered. The original time series was down sampled by 3. The normal and anomalous sequences in FIG.12B1 and FIG.12B2 belong to the TEK17 and TEK14 time series, respectively.
[078] The engine dataset contains readings for 12 sensors such as coolant temperature, torque, accelerator (control variable), etc. Two different applications of the engine were considered: Engine-P and Engine-NP. Engine-P has a discrete external control with two states: 'high' and 'low'. The resulting time series are predictable except at the time-instances when the control variable changes. On the other hand, the external control for Engine-NP can assume any value within a certain range and changes very frequently, and hence the resulting time series are unpredictable. l = 60 for Engine-P and l = 100 for Engine-NP are randomly chosen. The multivariate time series is reduced to univariate by considering only the first principal component after applying principal component analysis (Jolliffe, 2002). The first component captures 72% of the variance for Engine-P and 61% for Engine-NP.
[079] The ECG dataset contains quasi-periodic time series (the duration of a cycle varies from one instance to another). A subset of the first channel from the qtdb/sel102 dataset, where the time series contains one anomaly corresponding to a pre-ventricular contraction (refer FIG.12E2), is used for the experimental evaluation. Non-overlapping subsequences with l = 26 were considered after down sampling the original signal by 8 (each subsequence corresponds to approximately 800 ms). Since only one anomaly is present in the dataset, sets v_N2 and v_A are not created. The best model is chosen based on the minimum reconstruction error on set v_N1. The threshold is chosen as τ = μ_a + σ_a, where μ_a and σ_a are the mean and standard deviation of the anomaly scores of the points from v_N1.
[080] Observations: The key observations from the experimental evaluation are as follows: 1) The positive likelihood ratio is significantly higher than 1.0 for all the datasets (refer Table 5). High positive likelihood ratio values suggest that the method of the present disclosure gives significantly higher anomaly scores for points in anomalous sequences compared to anomaly scores for points in normal sequences. 2) For periodic time series, the evaluation was performed with varying window lengths: window length same as the length of one cycle (power demand dataset) and window length greater than the length of one cycle (space shuttle dataset). A quasi-periodic time series (ECG) was also considered. The method of the present disclosure is able to detect anomalies in all these scenarios. 3) A time series prediction based anomaly detection model LSTM-AD (Malhotra et al., 2015) gives better results for the predictable datasets: Space Shuttle, Power and Engine-P (corresponding to the Engine dataset in (Malhotra et al., 2015)) with F0.1 scores of 0.84, 0.90 and 0.89, respectively. On the other hand, the method of the present disclosure gives better results for Engine-NP where the sequences are not predictable. The best LSTM-AD model gives P, R, F0.05 and TPR/FPR (ratio of True Positive Rate to False Positive Rate) of 0.03, 0.07, 0.03 and 1.9, respectively (for a two hidden layer architecture with 30 LSTM units in each layer and a prediction length of 1), owing to the fact that the time series is not predictable and hence a good prediction model could not be learnt, whereas the method of the present disclosure gives P, R, F0.1 score and TPR/FPR of 0.96, 0.18, 0.93 and 7.6, respectively.
[081] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments of the present disclosure. The scope of the subject matter embodiments defined here may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language.
[082] It is, however to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field- programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments of the present disclosure may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[083] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules comprising the system of the present disclosure and described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer- usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The various modules described herein may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer readable medium or other storage device. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
[084] Further, although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[085] The preceding description has been presented with reference to various embodiments. Persons having ordinary skill in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.

Claims

WE CLAIM:
1. A processor implemented method comprising:
receiving, by one or more hardware processors, a test set of time series data pertaining to one or more sensors co-operating with at least one test instance of a monitored system (302);
reconstructing from the test set, one or more time series data pertaining to the at least one test instance, wherein the one or more hardware processors is a learnt sequence to sequence mapper based model (304); and
generating, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series data from the test set (306).
2. The method of claim 1, wherein the learnt sequence to sequence mapper based model is obtained by:
receiving, by the one or more hardware processors, a train set of time series data pertaining to one or more sensors co-operating with at least one train instance of the monitored system;
identifying, by the one or more hardware processors, a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and
training, by the one or more hardware processors, the sequence to sequence mapper based model to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
3. The method of claim 2, wherein the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
4. The method of claim 1 further comprising estimating a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a predefined threshold thereof to classify anomalous or normal subsequences in the test set.
5. The method of claim 1 further comprising generating health behavior trend for the at least one test instance based on the reconstruction error.
6. The method of claim 1 further comprising estimating Remaining Useful Life (RUL) of the at least one test instance based on one of:
(i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL;
(ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and
(iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
7. A system (100) comprising:
one or more data storage devices (102) operatively coupled to one or more hardware processors (104) and configured to store instructions configured for execution by the one or more hardware processors to:
receive a test set of time series data pertaining to one or more sensors cooperating with at least one test instance of a monitored system;
reconstruct from the test set, one or more time series data pertaining to the at least one test instance, wherein the one or more hardware processors is a learnt sequence to sequence mapper based model; and
generate, by the one or more hardware processors, a reconstruction error at each time instance of the one or more time series data from the test set.
8. The system of claim 7, wherein the one or more hardware processors are further configured to learn a sequence mapper based model by: receiving a train set of time series data pertaining to one or more sensors cooperating with at least one train instance of the monitored system;
identifying a healthy train set of time series data from the train set of time series data, the healthy train set pertaining to the healthy behavior of the at least one train instance; and
training the sequence to sequence mapper based model to reconstruct the healthy train set and generate the learnt sequence to sequence mapper based model.
9. The system of claim 7, wherein the train set and the test set are optimized sets of time series data obtained by performing a dimensionality reduction technique using identical transformation parameters.
10. The system of claim 7, wherein the one or more hardware processors are further configured to estimate a health state of the at least one test instance based on the degree of anomaly computed by: obtaining a normalized reconstruction error based on the generated reconstruction error; transforming the normalized reconstruction error to an anomaly score for each time instance of the one or more time series data from the test set; and comparing the anomaly score with a pre-defined threshold thereof to classify normal or anomalous subsequences in the test set.
11. The system of claim 7, wherein the one or more hardware processors are further configured to generate health behavior trend for the at least one test instance based on the reconstruction error.
12. The system of claim 7, wherein the one or more hardware processors are further configured to estimate Remaining Useful Life (RUL) of the at least one test instance based on one of:
(i) obtaining health index (HI) based on the reconstruction error; training a regression model to generate an HI curve with an HI value associated with each time instance of the at least one test instance; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL; (ii) generating reconstruction error curves pertaining to the at least one test instance and comparing with reconstruction error curves in a repository of reconstruction error curves of the at least one train instance to estimate the RUL; and
(iii) generating the HI curve of the at least one test instance based on the reconstruction error curves thereof; and comparing the HI curve of the at least one test instance with HI curves in a repository of HI curves of the at least one train instance to estimate the RUL.
PCT/IB2017/051621 2016-06-17 2017-03-21 Health monitoring and prognostics of a system WO2017216647A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201621020886 2016-06-17
IN201621020886 2016-06-17

Publications (1)

Publication Number Publication Date
WO2017216647A1 true WO2017216647A1 (en) 2017-12-21

Family

ID=60663969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2017/051621 WO2017216647A1 (en) 2016-06-17 2017-03-21 Health monitoring and prognostics of a system

Country Status (1)

Country Link
WO (1) WO2017216647A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109253826A (en) * 2018-08-01 2019-01-22 西安交通大学 A kind of calorimeter method for predicting residual useful life based on the fusion of more degeneration sample datas
EP3594859A1 (en) * 2018-07-09 2020-01-15 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
JP2020009411A (en) * 2018-07-09 2020-01-16 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Sparse neural network-based abnormality detection in multidimensional time series
US10699040B2 (en) * 2017-08-07 2020-06-30 The Boeing Company System and method for remaining useful life determination
US20220236728A1 (en) * 2021-01-22 2022-07-28 Tata Consultancy Services Limited System and method for performance and health monitoring to optimize operation of a pulverizer mill

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020046004A1 (en) * 2000-08-15 2002-04-18 Joseph Cusumano General method for tracking the evolution of hidden damage or other unwanted changes in machinery components and predicting remaining useful life
US20030055610A1 (en) * 2000-02-17 2003-03-20 Webber Christopher J St C Signal processing technique
US20070239407A1 (en) * 2006-01-12 2007-10-11 Goldfine Neil J Remaining life prediction for individual components from sparse data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030055610A1 (en) * 2000-02-17 2003-03-20 Webber Christopher J St C Signal processing technique
US20020046004A1 (en) * 2000-08-15 2002-04-18 Joseph Cusumano General method for tracking the evolution of hidden damage or other unwanted changes in machinery components and predicting remaining useful life
US20070239407A1 (en) * 2006-01-12 2007-10-11 Goldfine Neil J Remaining life prediction for individual components from sparse data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699040B2 (en) * 2017-08-07 2020-06-30 The Boeing Company System and method for remaining useful life determination
EP3594859A1 (en) * 2018-07-09 2020-01-15 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
JP2020009411A (en) * 2018-07-09 2020-01-16 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Sparse neural network-based abnormality detection in multidimensional time series
AU2019201789B2 (en) * 2018-07-09 2020-06-25 Tata Consultancy Services Limited Failed and censored instances based remaining useful life (rul) estimation of entities
CN109253826A (en) * 2018-08-01 2019-01-22 西安交通大学 A kind of calorimeter method for predicting residual useful life based on the fusion of more degeneration sample datas
US20220236728A1 (en) * 2021-01-22 2022-07-28 Tata Consultancy Services Limited System and method for performance and health monitoring to optimize operation of a pulverizer mill

Similar Documents

Publication Publication Date Title
AU2018203321B2 (en) Anomaly detection system and method
WO2017216647A1 (en) Health monitoring and prognostics of a system
Malhotra et al. Multi-sensor prognostics using an unsupervised health index based on LSTM encoder-decoder
Cai et al. Remaining useful life re-prediction methodology based on Wiener process: Subsea Christmas tree system as a case study
Salazar et al. Data-based models for the prediction of dam behaviour: a review and some methodological considerations
CA3037326C (en) Sparse neural network based anomaly detection in multi-dimensional time series
Cartella et al. Hidden Semi‐Markov Models for Predictive Maintenance
CN112154418A (en) Anomaly detection
Fang et al. Scalable prognostic models for large-scale condition monitoring applications
de Pater et al. Developing health indicators and RUL prognostics for systems with few failure instances and varying operating conditions using a LSTM autoencoder
Ding et al. On-line error detection and mitigation for time-series data of cyber-physical systems using deep learning based methods
Huang et al. Reliable machine prognostic health management in the presence of missing data
Villez Qualitative path estimation: A fast and reliable algorithm for qualitative trend analysis
Basak et al. Spatio-temporal AI inference engine for estimating hard disk reliability
He et al. Reliability analysis of systems with discrete event data using association rules
Jin A sequential process monitoring approach using hidden Markov model for unobservable process drift
US11320813B2 (en) Industrial asset temporal anomaly detection with fault variable ranking
Guo et al. Nonparametric, real-time detection of process deteriorations in manufacturing with parsimonious smoothing
US7840391B2 (en) Model-diversity technique for improved proactive fault monitoring
Sperl et al. Two-step anomaly detection for time series data
Bisi et al. Prediction of software inter-failure times using artificial neural network and particle swarm optimisation models
Xingzhi et al. Failure threshold setting for Wiener-process-based remaining useful life estimation
Zhao et al. A deep learning-based remaining useful life prediction approach for engineering systems
Chammas et al. Prognosis based on handling drifts in dynamical environments: Application to a wind turbine benchmark
Smets et al. Discovering novelty in spatio/temporal data using one-class support vector machines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17812826

Country of ref document: EP

Kind code of ref document: A1