WO2018141410A1 - Method, electronic module and computer program product for detecting time delayed relationship between a first and at least a second sensor data stream of measurements

Info

Publication number
WO2018141410A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
stream
lag
layer
data structures
Application number
PCT/EP2017/052521
Other languages
French (fr)
Inventor
Jonathan Christoph BOIDOL
Andreas Hapfelmeier
Original Assignee
Siemens Aktiengesellschaft
Application filed by Siemens Aktiengesellschaft
Priority to PCT/EP2017/052521
Publication of WO2018141410A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/18 - Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Definitions

  • Fig.4 shows the results of a lag determination of an
  • Fig.5 shows a flowchart of an embodiment of the inventive method
  • Fig.6 shows an embodiment of the inventive electrical
  • Data streams appear in diverse environments, e.g., data streams of sensor data measurements detected and forwarded by sensors in, e.g., automation plants, energy grids, motion tracking or in the analysis of network traffic.
  • In energy generating devices like turbines, many different sensors monitor parameters like temperature, pressure and flow velocity, which exhibit relationships and show a similar chronological sequence with a certain delay in time.
  • the expression lag is used as a synonym for a delay.
  • the determination of lagged dependencies between such sensor data streams can help to gain a deep understanding of a system and thereby to explore the reason for a specific shape of sensor data streams. The exact lag at which the dependency of two or more streams is strongest can be used for prediction, e.g. for scenarios in a technical device.
  • the problem of lagged dependency is the analysis of two or more evolving sequences of data for dependence and for the lag at which the dependence is strongest. Since the streams evolve, this also becomes a continuous monitoring task.
  • a method needs a general measure for dependency in time series, has to work efficiently in linear time and at least sub-linear in space, and has to provide accurate results over a wide range of time delays.
  • the presented method finds the shifted dependency and the corresponding lag l on a pair or on all pairs of a large number of data streams. In the following example a pair of streams X, Y is considered for simplicity, with obvious generalization to pairs among three or more streams.
  • a lag l is the relative shift of two time series, e.g. of data streams, at which the behavior of one is most predictable from the other and vice versa.
  • the difficulty in lag detection is the need to keep a large amount of historic data which can be shifted relative to each other. Also, the computational cost of calculating the dependency for every shift is high.
  • the inventive method finds the cross-dependency for a pair of time series which are the data streams of measurements collected over the time t.
  • mutual information I is used as a measure of dependency between the two time series. Estimating mutual information from sample data is a difficult problem.
  • the task of cross-dependency monitoring is then to calculate D for all possible lags L and report the optimum lag with a maximum dependency of a pair or group of data streams. Since there can be periodic patterns in the data, such as daily or seasonal dependencies, it is more adequate and practical to report the earliest local maximum above a specified
  • Probing means the selection of adjacent data
  • Fig. 1 visualizes the idea. Instead of naively calculating the dependency for every possible lag, as shown in Fig. 1A, we take a subset of every 2^i-th lag up to a maximum lag m. The distance, or in other words the time interval t1, t2, t3, t4, ... between the actually calculated points increases
  • the probing, i.e. the selected time interval between data structures, greatly reduces computation time to O(log(m)) compared to O(m) in the naive solution.
  • Fig. 2 shows an example of a data stream of measurements 11 with a sequence of n layers L1.0, ..., L1.n comprising eight data points or data structures each, e.g. layer L1.2 comprises data x21, x22, x23, ..., x28.
  • Adjacent data structures in one layer are spaced by a specified time interval t2 which is fixed or constant in that layer L1.2.
  • From layer to layer the time interval between adjacent data structures is doubled, so that the time intervals of the sequence of layers form a geometric series.
  • the data structures used in one layer are further apart from each other and therefore the layer itself spans twice the length in time of the preceding layer.
  • the data structures of data stream 12 are formed by compressing at least two data structures of the previous data stream 11, which comprises e.g. the
  • the error introduced by the smoothing in the sequence is small for streams with low frequencies. Given the original data and the smoothed data in the stream X and the Haar wavelet coefficients of X, the faithfulness of the
  • Fig. 3 shows an example for concrete determination of the dependency value at geometrically spaced lags for each first and second stream, which shall be compared.
  • Fig.4 shows the determined dependency value over lag.
  • the solid line is determined by a naive solution evaluating all possible lags. It is compared to the dependency values determined by the inventive method determined at
  • the single steps of the method are shown in Fig.5 as a flowchart.
  • the first method step S1 is to capture the streams of sensor data continuously over time.
  • in a subsequent step S3, dependency values for the second data stream being delayed by a specified lag with respect to the first data stream are determined by applying a dependency algorithm on the data structures of each pair of layers of the first and the second data stream of the same level, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by a lag which increases
  • in step S4, a dependency value for each layer of the sequence of layers is provided and a lag of a maximum dependency value is evaluated.
  • in step S5, the data of the first and second sensor of a similar device can be estimated by applying the lag to a captured stream of only second sensor data or first sensor data of the similar device.
  • Fig.6 shows an electronic module 100 for detecting a time delayed relationship between a first and at least a second sensor 201, 202, 203 of a technical device 200.
  • the technical device can be e.g. a turbine of a power generation plant or one or several field devices in an automation network or sensors capturing traffic flow or any other measured
  • the electronic module 100 comprises an input interface 101, a first processing unit 102, a second processing unit 103 as well as an output interface 104, which all are interconnected by e.g. a data bus.
  • the input interface 101 is adapted for capturing the streams of sensor data from the sensors 201, 202, 203 continuously over time.
  • With a first processing unit 102 the layers of data structures are created, and for all layers but the first, data structures are composed by
  • In the second processing unit 103 the dependency values are determined as described in step S3 of Fig. 5.
  • the output interface 104 is adapted to provide a dependency value for each layer of the sequence of layers, to evaluate a lag of a maximum dependency value and thereby to detect the time delayed relationship between the first and one second data stream.
  • the output interface 104 is connected to an output device 300, which is adapted to estimate data of the first and second sensor of a similar device by applying e.g. the lag to a captured stream of only second sensor data or first sensor data of the similar device.
  • the output device 300 can also be applied to generate signals depending on the maximum dependency value and use it as input to the device 200 for control measures of device 200.
  • a processor can be an integrated circuit, a digital signal processor, a virtual processor, which can also include a memory unit e.g. as random access memory or any other form of memory.
  • the claimed computer program product can be e.g. a computer program means such as a storage card, a USB stick, a CD-ROM, a DVD or a retrievable file, which can be provided by servers in a network or delivered or provided by other means.
  • the invention is not limited to the described examples.
  • the invention also comprises all combinations of any of the described or depicted features.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

A method, an electronic module and a computer program product for detecting a time delayed relationship between a first and at least one second data stream of measurements, which are recorded on different sensors of a technical device, in particular in an energy generation device, wherein the method comprises the following steps:
- Capturing (S1) the first and second streams of sensor data continuously over time,
- Creating (S2) incrementally over the captured sensor data for each stream a sequence of more than one layer of data structures,
- Determining (S3) for each layer a dependency value for the second data stream being delayed by a specified lag with respect to the first data stream by applying a dependency algorithm on the data structures of the first and the second data stream of the same level of layer, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by a lag, wherein the lag increases geometrically in consecutive layers, and
- Evaluating (S4) a lag of a maximum dependency value.
Lagged dependency detection allows identifying interesting relationships of a system which creates streams. Monitoring of the dependent sensor streams could, e.g., reveal changes in the system.

Description

Method, electronic module and computer program product for detecting time delayed relationship between a first and at least a second sensor data stream of measurements
The application is related to a method, an electronic module and a computer program for detecting a time delayed, i.e. lagged, relationship between a first and at least one second data stream of measurements, which are recorded on different sensors of a technical device, in particular an energy generation device.
Sensors detecting technical parameters like voltage, current, pressure and the like are usually deployed in a large-scale wireless sensor network (WSN). The sensor networks may be used for remote monitoring of technical devices and industrial plants. This progress has spurred the need for processes and applications that work on high-dimensional streaming data. Streaming data analysis is concerned with applications where the records are processed in unbounded streams of information. The nature and volume of this type of data make traditional batch learning exceedingly difficult and lend themselves naturally to algorithms that work in one pass over the data, i.e. in an online fashion. To achieve this transition from batch to online algorithms, window-based and incremental algorithms are popular, often favoring heuristics over exact results.
In many real-world applications, streams of data from different sensors of the same monitored system exhibit relationships, which we can try to detect and use to better understand the system. This gets vastly more complicated when the relationships between data streams exhibit a delay, i.e. changes in one stream affect another stream some time in the future. Consider e.g. relationships in weather data, where temperature changes have delayed effects on humidity or precipitation. In an energy generation device, e.g. a turbine, changes in vibration have an effect on the measured temperature or the measured flow velocity of a fluid, which is delayed by the same or different lags relative to the occurrence of the vibration changes.
The relationships may only appear for a short time period or they may be stable for months and years. The relationships might appear with short or possibly large delays. A straightforward approach for finding delayed dependencies would be to save all data streams and calculate a measure of dependency for every possible delay between the streams.
The article "Time delay estimation via minimum entropy" by Benesty, Jacob, Yiteng Huang, and Jingdong Chen, published in IEEE Signal Processing Letters 14.3 (2007): 157-160, describes finding the optimal lag corresponding to a minimum entropy. This approach has been tried in a specific application, e.g. for finding a time delay in signals from several microphones with different distances to the signal source. This, however, makes use of a specific model of speech signals, e.g. a Laplace distribution. For more general application, a model-free measure of dependence is required to find the lag that optimizes the relationship.
In the state of the art, several approaches are known for detecting dependencies between pairs or groups of data streams. The best known indicator for pair-wise correlation is Pearson's correlation coefficient, essentially the normalized covariance between two random variables. Direct computation of Pearson's correlation coefficient, however, is prohibitively expensive and, more problematically, it is only a suitable indicator for linear or linearly transformed relationships. For Pearson's correlation coefficient, the Braid algorithm has been developed, which allows a fast approximation of the cross-correlation, i.e. the correlation as a function of lag. It is described in "Braid: Stream mining through group lag correlations" by Sakurai, Yasushi, Spiros Papadimitriou, and Christos Faloutsos, published in Proceedings of the 2005 ACM SIGMOD international conference on management of data, 2005.
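The limitation to linear relationships mentioned above can be illustrated with a small numerical sketch. The helper name `pearson` and the sample values are illustrative assumptions and are not taken from the application; the point is only that a deterministic but non-linear relationship can yield a coefficient of zero.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data (assumption): symmetric inputs around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]      # linear relationship
quadratic = [x * x for x in xs]       # deterministic, but non-linear

r_linear = pearson(xs, linear)        # close to 1: dependence is detected
r_quadratic = pearson(xs, quadratic)  # close to 0: dependence is missed
```

Although `quadratic` is fully determined by `xs`, the coefficient vanishes, which is why a more general dependency measure is of interest.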
Since data streams are unbounded and represent unlimited data, Braid would require infinite memory to process them. Also, the correlation coefficient is far from capable of detecting more complex relationships between data streams, e.g. non-linear functions.
Therefore an object of the present invention is to find the delay or lag between data streams that maximizes the dependence between them, to find the corresponding strength of this dependence and to monitor it continuously. The solution should allow determining the lag that optimizes the dependence quickly and with limited memory.
Lagged dependency detection allows identifying interesting relationships of a system which creates streams. Monitoring of the dependent sensor streams could, e.g., reveal changes in the system. In a predictive scenario the determined delay could be used to detect changes in the system before the effects appear in all streams, since the delay relation is known.
In this application we understand the term correlation to indicate a broad class of relationships, including but not limited to linear relationships. We explicitly refer to linear relationships as such, e.g. as linear correlation.
According to a first aspect the present invention refers to a method for detecting a time delayed relationship between a first and at least one second data stream of measurements, which are recorded on different sensors of a technical device, in particular in an energy generation device, wherein the method comprises the following steps:
- Capturing the first and second streams of sensor data continuously over time,
- Creating incrementally over the captured sensor data for each stream a sequence of more than one layer of data structures,
- Determining for each layer a dependency value for the second data stream being delayed by a specified lag with respect to the first data stream by applying a dependency algorithm on the data structures of the first and the second data stream of a same level of layer, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by a lag, wherein the lag increases geometrically in consecutive layers, and
- Evaluating a lag of a maximum dependency value.
This limits the number of determination steps which have to be performed in a processing unit, e.g. a microprocessor, even for long lags. In turn, online processing of continuously captured new sensor data becomes possible.
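How much the number of determination steps shrinks can be sketched as follows. The function name `geometric_lags`, the doubling factor and the maximum lag of 1024 are assumptions chosen for illustration; the text above only requires that the lag increases geometrically from layer to layer.

```python
def geometric_lags(max_lag, base_interval=1):
    """Lags probed when the lag doubles from one layer to the next."""
    lags = []
    lag = base_interval
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2          # assumed growth factor of two
    return lags

max_lag = 1024
probed = geometric_lags(max_lag)
naive_count = max_lag        # one evaluation for every lag 1..max_lag
probed_count = len(probed)   # grows only logarithmically with max_lag
```

For a maximum lag of 1024 base intervals, the sketch evaluates 11 lags instead of 1024.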
A data stream is a sequence of sensor measurement data. The sequence may be a sequence of digits or other values or digital or analog signals. Usually, a plurality of data streams is processed in parallel. In particular, at least two streams are compared and all streams may be compared pairwise. The data stream may be captured directly from the sensor or may be read from memory storage. Each layer of data structures can be seen as a window encoding a number of data structures, which is moved over the data streams simultaneously for each pair of sensors. Within a layer a dependency value is determined for data structures of the first and second data stream that are shifted relative to each other by a lag, especially a lag of one or several time intervals, by which the single data structures are spaced in time.
In a preferred embodiment each layer contains the same number of data structures with subsequent layers containing data structures captured further backwards in time.
This limits the storage capacity required for historic data.
In a specific preferred embodiment, for each layer, adjacent data structures have a temporal distance of a specified fixed time interval, and the same specified time interval is applied in each layer of the same level of each stream, wherein the specified time interval corresponds to the lag considered for a level of layers, and a dependency value is determined for each layer, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by one time interval.
This means that, if the time interval between the data structures in the first layer has a value l, the time interval between two adjacent data structures in the second layer increases by a factor of two and is therefore 2 · l. In the k-th layer the time interval between data structures is 2^(k-1) · l. Accordingly, determining a dependency value between a pair of layers of the same level shifted by one time interval means that the dependency is determined for a lag equal to the time interval of that level. Determining the dependency value in the same manner for all levels of layers provides a geometrically spaced sequence of dependency values. That way, errors will be small at small lags, where accuracy is required most. Even for large lags the relative error is likely still small.
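The spacing and shifting described above can be sketched in a few lines. The function `layer_pairs` and the toy streams are hypothetical; as a simplification, the sketch subsamples raw values, whereas the embodiments described further on fill higher layers with compressed data structures.

```python
def layer_pairs(first, second, level, size):
    """Pair samples of layer `level` (1-based) of two equally long streams.

    Samples inside the layer are spaced by interval = 2**(level - 1) base
    steps, and the second stream is shifted by one such interval, so the
    pairs probe the lag `interval`.
    """
    interval = 2 ** (level - 1)
    newest = len(first) - 1
    pairs = []
    for j in range(size):
        t_second = newest - j * interval   # walk backwards in time
        t_first = t_second - interval      # second stream lags by one interval
        if t_first < 0:
            break                          # not enough history yet
        pairs.append((first[t_first], second[t_second]))
    pairs.reverse()                        # oldest pair first
    return interval, pairs

# Toy streams (assumption): the second stream repeats the first 4 steps later.
first = list(range(100))
second = [value - 4 for value in first]
lag, pairs = layer_pairs(first, second, level=3, size=8)
```

At level 3 the probed lag is 4 base steps, which matches the delay built into the toy data, so every pair contains two identical values. A dependency algorithm applied to these pairs would report a strong dependence for this layer.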
In a further preferred embodiment a data structure of a subsequent layer of a sequence of layers is formed by compression of all data structures of a previous layer in the respective time interval of the considered layer.
This reduces the amount of stored data required for determining a dependency at a large lag. Generating a data structure of a subsequent layer by compressing or smoothing over the data structures lying in the time interval of the subsequent layer takes all these contributing data structures into account and provides far more information than just selecting one of the actual data structures in the previous layers to be used in the subsequent layer.
In a preferred embodiment of the present invention, the dependency algorithm is an entropy based algorithm, especially a mutual information based algorithm.
Mutual information, especially when estimated with a Kraskov estimator, produces good results as a measure of dependency and is applicable to a variety of data from different fields. It is a model-independent measure and is not limited to certain types of relationships in the data or to data drawn from a known distribution. Therefore a model-independent analysis of the data streams can be provided.
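As a rough illustration of mutual information as a dependency measure, the following sketch uses a simple equal-width histogram (plug-in) estimate. This is deliberately not the Kraskov estimator named above, which relies on nearest-neighbour distances; the function name `binned_mutual_information`, the bin count and the sample data are assumptions made for this example only.

```python
import math
from collections import Counter

def binned_mutual_information(xs, ys, bins=4):
    """Plug-in estimate of mutual information (in nats) from equal-width bins."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0      # avoid zero width for constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log( p(i, j) / (p(i) * p(j)) )
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi

# Illustrative data (assumption): 64 samples covering a 16 x 4 grid once.
xs = [i % 16 for i in range(64)]
independent = [i // 16 for i in range(64)]   # unrelated to xs by construction
dependent = [x * x for x in xs]              # non-linear function of xs

mi_independent = binned_mutual_information(xs, independent)
mi_dependent = binned_mutual_information(xs, dependent)
```

The estimate is zero for the independent pair and clearly positive for the quadratic relationship, without assuming any particular functional form. A binned estimate is sensitive to the choice of bins, which is one reason why nearest-neighbour estimators are often preferred in practice.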
In a further embodiment of the present invention, two of the following three settings are configured in a configuration phase: the number of layers, the time interval of the first layer, and the maximum lag.
Any pair of these settings sufficiently specifies the runtime of the method for providing a dependency value for each layer of the sequence of layers, since, with the time interval doubling from layer to layer, the third setting follows from the other two.
In a preferred embodiment every layer of data structures is incrementally updated by newly captured sensor data and compressed data structures, respectively.
This ensures the provision of dependency values of the continuously captured sensor data.
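One way to picture the incremental update together with the compression is the following sketch. The class name `LayeredStream`, the choice of a pairwise mean as the compression step and the fixed layer size are assumptions for illustration; the description only requires that higher layers are formed by compressing data structures of the previous layer.

```python
from collections import deque

class LayeredStream:
    """Fixed-size layers; layer k holds means over 2**k raw samples."""

    def __init__(self, num_layers, size):
        self.layers = [deque(maxlen=size) for _ in range(num_layers)]
        self.pending = [None] * num_layers   # unpaired value waiting per layer

    def push(self, value, level=0):
        self.layers[level].append(value)     # oldest entry drops out when full
        if level + 1 == len(self.layers):
            return                           # top layer: nothing to propagate
        if self.pending[level] is None:
            self.pending[level] = value      # wait for the partner sample
        else:
            mean = (self.pending[level] + value) / 2
            self.pending[level] = None
            self.push(mean, level + 1)       # propagate compressed value upwards

stream = LayeredStream(num_layers=3, size=4)
for sample in range(1, 9):                   # samples 1, 2, ..., 8
    stream.push(sample)
```

After eight samples the lowest layer holds the four most recent raw values, while the higher layers hold progressively smoothed values reaching further back in time. Memory use stays bounded by the number of layers times the layer size, regardless of the stream length.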
In a further embodiment dependency values between determined dependency values are estimated by interpolation, especially using the cubic spline method.
An interpolation of a function which provides a dependency value for continuous lag values allows approximating the maximum of the dependency function and therefore provides a better estimate for the actual maximum dependency between the data streams occurring at the resulting lag.
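The interpolation step can be sketched with a natural cubic spline through the dependency values at the probed lags, followed by a simple grid search for its maximum. The function names, the natural boundary condition, the grid search and the sample dependency values are assumptions for illustration; the text only names the cubic spline method.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[1..n-1]; m[0] = m[n] = 0.
    diag = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (h[i - 1] + h[i])
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(2, n):                    # forward elimination
        w = h[i - 1] / diag[i - 1]
        diag[i] -= w * h[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    for i in range(n - 1, 0, -1):            # back substitution
        m[i] = (rhs[i] - h[i] * m[i + 1]) / diag[i]

    def evaluate(x):
        i = 0                                # find the segment containing x
        while i < n - 1 and x > xs[i + 1]:
            i += 1
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

def peak_lag(lags, deps, steps=1000):
    """Lag on a regular grid at which the interpolated dependency is largest."""
    spline = natural_cubic_spline(lags, deps)
    lo, hi = lags[0], lags[-1]
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=spline)

# Illustrative values (assumption): dependency measured at doubling lags.
lags = [1, 2, 4, 8, 16]
deps = [0.10, 0.30, 0.80, 0.70, 0.20]
spline = natural_cubic_spline(lags, deps)
best_lag = peak_lag(lags, deps, steps=1500)
```

The returned `best_lag` need not coincide with one of the probed lags, which is the benefit of interpolating between the geometrically spaced dependency values.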
In a preferred embodiment a resolution of the dependency value for a dedicated lag is increased by applying the dependency algorithm on the data structures of the lower layer with a time interval smaller than the dedicated lag and data structures of the lower layers being shifted to each other by the dedicated lag.
In this case dependency values are determined within time intervals of the lower layer, e.g. layer i, instead of a time interval of the formerly evaluated layer i+k. In this case the refined resolution is 2^i · l0 instead of 2^(i+k) · l0.
High resolution here means that the number of dependency values determined is higher than the number of dependency values determined for the same lag in time by a layer of higher level.
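The arithmetic behind this refinement can be made concrete with a short sketch. The function name `refined_probe` and the example numbers are assumptions; levels are counted from 0 here so that layer i has the spacing 2^i · l0, with l0 denoting the base time interval.

```python
def refined_probe(fine_level, level_gap, base_interval=1):
    """Spacing and shift for re-evaluating a coarse layer's lag on a finer layer.

    Levels are counted from 0 here, so layer i has spacing 2**i * base_interval.
    """
    fine_spacing = 2 ** fine_level * base_interval
    coarse_lag = 2 ** (fine_level + level_gap) * base_interval
    shift_in_fine_steps = coarse_lag // fine_spacing   # equals 2**level_gap
    return fine_spacing, coarse_lag, shift_in_fine_steps

# Example (assumption): refine the lag of layer i + k = 5 using layer i = 2.
fine_spacing, coarse_lag, shift = refined_probe(fine_level=2, level_gap=3)
```

With a base interval of 1, the lag of 32 probed by the coarse layer is re-evaluated on data spaced by 4, i.e. by shifting the finer layer by 8 of its own intervals.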
In a further embodiment a method further comprises a step of estimating data of the first or second sensor of a similar device by applying the lag to a captured stream of only second sensor data or first sensor data of the similar device.
According to another aspect the invention refers to an electronic module (100) for detecting a time delayed relationship between a first and at least a second sensor data stream of measurements, which are recorded on a technical device (200), in particular in an energy generation device, comprising:
- An input interface (101), which is adapted for capturing the streams of sensor data from sensors (201, 202, 203) continuously over time,
- A first processing unit (102), which is adapted for creating (S2) incrementally over the captured sensor data for each stream a sequence of more than one layer of data structures,
- A second processing unit (103), which is adapted for determining (S3) for each layer a dependency value for the second data stream being delayed by a specified lag with respect to the first data stream by applying a dependency algorithm on the data structures of the first and the second data stream of the same level of layer, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by a lag, wherein the lag increases geometrically in consecutive layers, and
- An output interface (104), which is adapted for evaluating a lag of a maximum dependency value.
In another aspect the invention refers to a computer program product, tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method according to one of the preceding claims, if the program is executed on the digital processing apparatus.
The provided method and the respective electronic module detect lags in data streams using a model-free, entropy-based measure of dependence. These features make the method applicable to all kinds of streaming data without specific assumptions on the data. The probing technique, i.e. the described creation of a sequence of layers of data structures, drastically reduces computation time and thereby allows the constant monitoring of an environment that produces many data streams in parallel. In fact, computing the dependency values in layers with the described time interval between data structures reduces the computation time from a factor linear in the size of the maximum lag to be detected, as in a naive solution, to a logarithmic factor. The data compression allows performing the probing in a memory-efficient way, keeping only data in memory that is actually needed for the current calculations. The method also makes use of specially designed data structures to speed up frequent evaluation. Typical data structures for this task do not support the frequent insertion and deletion of data that happens constantly in the streaming data setting.
An embodiment of the invention is described hereafter. In the drawing
Fig. 1A shows dependency values determined at equidistant lags;
Fig. 1B shows dependency values determined at lags according to an embodiment of the inventive method;
Fig. 2 shows a sequence of layers of data structures of one stream of sensor data as created in an embodiment of the invention;
Fig. 3 shows, by way of example, three pairs of layers of data structures of two different sensor data streams used for determining dependency values in an embodiment of the invention;
Fig. 4 shows the results of a lag determination for an exemplary data stream of sunspot activity determined by an embodiment of the inventive method compared to a naive approach;
Fig. 5 shows a flowchart of an embodiment of the inventive method; and
Fig. 6 shows an embodiment of the inventive electronic module as a block diagram.
In the following description same functional objects are labeled with the same reference sign.
Data streams appear in diverse environments, e.g. as streams of sensor data measurements detected and forwarded by sensors in automation plants, energy grids, motion tracking or the analysis of network traffic. Especially in energy generating devices like turbines, many different sensors monitor parameters like temperature, pressure and flow velocity, which exhibit relationships and show a similar chronological sequence with a certain delay in time. In this description the expression lag is used as a synonym for a delay.
The determination of lagged dependencies between such sensor data streams can help to gain a deep understanding of a system and thereby to explore the reason for a specific shape of the sensor data streams. The exact lag at which the dependency of two or more streams is strongest can be used for prediction, e.g. of scenarios in a technical device.
The problem of lagged dependency is the analysis of two or more evolving sequences of data for dependence and for the lag at which the dependence is strongest. Since the streams evolve, this also becomes a continuing monitoring task. To solve this problem, a method needs a general measure for dependency in time series, has to work efficiently in linear time and at least sub-linear in space, and has to provide accurate results over a wide range of time delays. The presented method finds the shifted dependency and the corresponding lag l for a pair or for all pairs of a large number of data streams. In the following example a pair of streams X, Y is considered for simplicity, with an obvious generalization to pairs among three and more streams.
In general, a lag l is the relative shift of two time series, e.g. of data streams, at which the behavior of one is most predictable from the other and vice versa. The difficulty in lag detection is the need to keep a large amount of historic data which can be shifted relative to each other. Also, the computational cost of calculating the dependency for every shift is high. Given a time series $X = (x_1, \ldots, x_n)$ with the newest element $x_n$ at time t, and a second time series Y, we can naively calculate a measure of dependency for all lags l as a function of the shifted series, $f((x_l, \ldots, x_n); (y_1, \ldots, y_{n-l}))$. In signal processing this is known as cross-dependency where f is normalized covariance.
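The naive scan over all lags can be sketched as follows. This is only an illustration under assumptions: the function names `normalized_covariance` and `naive_lag_scan` are hypothetical, the streams are plain Python lists of equal length, and the shifted series are aligned by dropping the first `lag` samples of X and the last `lag` samples of Y.

```python
from math import sqrt


def normalized_covariance(xs, ys):
    """Normalized covariance (Pearson correlation) of two equally long lists.

    Returns 0.0 if one of the lists is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def naive_lag_scan(xs, ys, max_lag, f=normalized_covariance):
    """Evaluate f for every lag 0..max_lag (max_lag + 1 evaluations of f).

    For each lag the later part of xs is paired with the earlier part of ys.
    """
    n = len(xs)
    return {lag: f(xs[lag:], ys[:n - lag]) for lag in range(max_lag + 1)}
```

The number of evaluations of `f` grows linearly with `max_lag`, and every evaluation touches almost the whole history, which is the cost the layered approach is meant to avoid.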
The inventive method finds the cross-dependency for a pair of time series which are the data streams of measurements collected over the time t. As a measure of dependency between the two time series, mutual information I is used. Estimating mutual information from sample data is a difficult problem.
However, it functions as a model-independent measure and is not limited to certain types of relationship in the data or to data drawn from a known distribution. Mutual information is defined as
$$I(X;Y) = \int_Y \int_X f(x,y) \, \log \frac{f(x,y)}{f(x)\, f(y)} \, dx \, dy \qquad (1)$$
where $f(x,y)$, $f(x)$, $f(y)$ are the joint and marginal probability density functions, or it can be composed from entropy and joint entropy as
$$I(X;Y) = H(X) - H(X,Y) + H(Y) \qquad (2)$$
We define the cross-dependency between X and Y as
$$D(X, Y, l) = I\big((x_l, \ldots, x_n); (y_1, \ldots, y_{n-l})\big) \qquad (3)$$
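A minimal sketch of equations (2) and (3) is given here, assuming a simple plug-in estimator on equal-width bins. The text does not specify an estimator, so the binning, the default of four bins and the names `entropy`, `discretize`, `mutual_information` and `cross_dependency` are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def discretize(values, bins=4):
    """Map numeric values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=4):
    """Equation (2): I(X;Y) = H(X) - H(X,Y) + H(Y) on binned data."""
    bx, by = discretize(xs, bins), discretize(ys, bins)
    return entropy(bx) - entropy(list(zip(bx, by))) + entropy(by)


def cross_dependency(xs, ys, lag, bins=4):
    """Equation (3): mutual information of the series shifted by `lag`."""
    n = len(xs)
    return mutual_information(xs[lag:], ys[:n - lag], bins)
```

Plug-in estimates of this kind are biased for short series, which is one reason why estimating mutual information from samples is regarded as difficult.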
The task of cross-dependency monitoring is then to calculate D for all possible lags l and to report the optimum lag with a maximum dependency for a pair or group of data streams. Since there can be periodic patterns in the data, such as daily or seasonal dependencies, it is more adequate and practical to report the earliest local maximum above a specified threshold.
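Reporting the earliest local maximum above a threshold could look like the following sketch. The function name `earliest_local_maximum` is hypothetical, and treating the first and last lag as having no neighbour on the missing side is an assumption that the text does not address.

```python
def earliest_local_maximum(dependency_by_lag, threshold):
    """Return the smallest lag whose value exceeds `threshold` and is not
    smaller than the values at the neighbouring lags, or None if none exists.
    """
    lags = sorted(dependency_by_lag)
    for i, lag in enumerate(lags):
        value = dependency_by_lag[lag]
        if value <= threshold:
            continue
        # Missing neighbours at the borders never block a maximum.
        left = dependency_by_lag[lags[i - 1]] if i > 0 else float("-inf")
        right = dependency_by_lag[lags[i + 1]] if i + 1 < len(lags) else float("-inf")
        if value >= left and value >= right:
            return lag
    return None
```

With a periodic signal the global maximum may sit at a multiple of the period, whereas this rule returns the first peak that is strong enough.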
One main idea of the presented method is to reconstruct the cross-dependency function from a geometrically spaced probing. Probing means the selection of adjacent data structures which are evaluated and taken into account by the described function of mutual information. Fig. 1 visualizes the idea. Instead of naively calculating the dependency for every possible lag, as shown in Fig. 1A, we evaluate only the subset of lags $2^i$ up to a maximum lag m. The distance, or in other words the time interval t1, t2, t3, t4, ..., between the actually calculated points increases geometrically with the lag.
That way, errors will be small at small lags, where accuracy matters most. Even for a large lag the relative error is likely still small. We can fill in the dependency values between the probed points, i.e. lags, with any interpolation method. Our method of choice is cubic splines, to get a smooth and efficient interpolation. The probing, i.e. the selected time interval between data structures, greatly reduces computation time to O(log(m)) compared to O(m) in the naive solution.
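The probing and the filling-in between probed lags can be illustrated as follows. The names `geometric_lags` and `interpolate_linear` are hypothetical, and the interpolation shown is deliberately a piecewise-linear stand-in, not the cubic spline named in the text, so that the sketch stays within the Python standard library.

```python
def geometric_lags(max_lag):
    """Lags 1, 2, 4, ... up to max_lag: about log2(max_lag) probes."""
    lags = []
    lag = 1
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2
    return lags


def interpolate_linear(probed, lag):
    """Estimate the dependency at `lag` from probed {lag: value} pairs.

    Piecewise-linear stand-in for a cubic spline; values outside the probed
    range are clamped to the nearest probed value.
    """
    lags = sorted(probed)
    if lag <= lags[0]:
        return probed[lags[0]]
    if lag >= lags[-1]:
        return probed[lags[-1]]
    for lo, hi in zip(lags, lags[1:]):
        if lo <= lag <= hi:
            w = (lag - lo) / (hi - lo)
            return (1 - w) * probed[lo] + w * probed[hi]
```

For a maximum lag of 1000, `geometric_lags` yields ten probes where the naive scan would evaluate a thousand lags.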
With respect to the accuracy of this method, the Nyquist-Shannon sampling theorem states that a signal can be perfectly reconstructed from a uniform sampling if the sampling rate is at least $2 f_g$ Hertz and the signal contains no frequency higher than $f_g$. Even for non-uniform sampling, we can reconstruct the full cross-dependency from the sampling if the average sampling rate is at least $2 f_g$. For the proposed geometric sampling, this means that the cross-dependency D can be perfectly determined up to a lag l if $l < 2/f_g$ and the highest frequency in the signal D is at most $f_g$. Intuitively, we capture D accurately if the dependency does not change too fast from one distance to the other.
To get the dependency at all sampled points, we would still need to save historic data up to the largest lag value we want to compute. Instead, we only save a compressed, smoothed version of the sequence in increasingly compressed layers. Every layer stores the same number of data points over a window whose endpoint lies increasingly further back in time, spaced so that the number of points stays constant. The word data point or point is used here as a synonym for data structure.
Fig. 2 shows an example of a data stream of measurements 11, from which a sequence of n layers L1.0, ..., L1.n is created, each comprising eight data points or data structures; e.g. layer L1.2 comprises data $x_{21}, x_{22}, x_{23}, \ldots, x_{28}$. Adjacent data structures in one layer are spaced by a specified time interval t2 which is fixed or constant in that layer L1.2. From layer to layer the time interval between adjacent data structures is doubled, so that the time intervals of the sequence of layers form a geometric series. As the number of data points is the same in each layer, the data structures used in one layer are further apart from each other, and therefore the layer itself extends over twice the length in time of the preceding layer.
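The layer structure of Fig. 2 can be sketched on a stored list of samples as follows. The function name `build_layers`, the alignment of the averaged pairs to the newest sample and the use of plain averaging as the compression are assumptions for illustration; the window of eight points follows the example in the text.

```python
def build_layers(samples, num_layers, window=8):
    """Return layers[h]: the newest `window` points of layer h.

    Each point of layer h is the average of 2**h consecutive samples, so the
    time interval between adjacent points doubles from layer to layer.
    """
    layers = []
    current = list(samples)
    for _ in range(num_layers):
        layers.append(current[-window:])
        # Drop the oldest point if needed so that pairs end at the newest one.
        if len(current) % 2:
            current = current[1:]
        current = [(current[i] + current[i + 1]) / 2
                   for i in range(0, len(current), 2)]
    return layers
```

With 32 samples and three layers, every layer holds eight points, while layer 2 spans all 32 samples and layer 0 only the last eight.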
To get the dependency at all sample points, we would still need to save historic data up to the largest lag value we want to compute. Instead, we only save a compressed, smoothed version of the sequence 12, 13, 14 in increasingly compressed layers. In Fig. 2 the data structures of data stream 12 are formed by compressing at least two data structures of the previous data stream 11, which comprises e.g. the uncompressed data structures sampled at a sensor. This holds for all succeeding streams of data structures 12, 13, 14 and further.
In the example depicted in Fig. 2, we refer to the increasingly compressed layers as L1.h, where L1.0 stores the last eight of the original, uncompressed data points of a stream. Every higher layer h averages $2^h$ consecutive data points. Figure 2 shows the smoothing for the single streams 12, 13, 14. To calculate a lag of $l = 2^h$ we use the smoothed stream of layer L1.h and calculate the lag $l_h = 1$, corresponding to $l = 2^h$. Accordingly, we use $\lceil \log_2(m) \rceil$ layers. This also has the desirable effect that large lags are calculated from larger slices of the data. While the effect of a small lag might only be detectable in the most recent data, the effect of a dependency with a large delay could also be detectable over a large time frame. Maintaining the layers requires a constant number of operations every $2^h$ steps, or
$$\sum_{h=0}^{\log(m)} \frac{1}{2^h} \le 2$$
in total, so we can update them in amortized time O(1).
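The incremental maintenance of the layers and its amortized cost can be sketched as follows. The class name `LayeredStream`, the `pending` buffer and the `operations` counter are hypothetical; the sketch assumes that a layer passes the average of two consecutive points to the next layer, so that layer h is touched only every 2^h samples.

```python
from collections import deque


class LayeredStream:
    """Keeps the newest `window` points of each layer of one stream."""

    def __init__(self, num_layers, window=8):
        self.layers = [deque(maxlen=window) for _ in range(num_layers)]
        self.pending = [None] * num_layers  # unpaired point per layer
        self.operations = 0                 # number of layer updates so far

    def push(self, value):
        """Insert one new sample and propagate averages to higher layers."""
        h = 0
        while h < len(self.layers):
            self.layers[h].append(value)    # the oldest point drops out
            self.operations += 1
            if self.pending[h] is None:
                self.pending[h] = value     # wait for the partner point
                return
            value = (self.pending[h] + value) / 2
            self.pending[h] = None
            h += 1
```

After 32 samples with four layers the counter shows 32 + 16 + 8 + 4 = 60 updates, which stays below two updates per sample as the sum above suggests.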
The error introduced by the smoothing in the sequence is small for streams with low frequencies. Given the original data $X_t$ and the smoothed $\tilde{X}_t$ in the stream X and the Haar wavelet coefficients $w_j$ of X, the faithfulness of the smoothing depends on the highest frequencies.
[Equation shown as image imgf000016_0001 in the original publication.]
In most real datasets, most of those coefficients are small and only a few contribute significantly to the dynamic of the data, so we can expect the error introduced by the smoothing to be small.
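The link between pairwise averaging and Haar detail coefficients can be illustrated for a single smoothing step. This does not reproduce the equation of the publication, which is not available in the text; it only shows, under the assumption of unnormalized Haar coefficients and with the hypothetical names `haar_step` and `smoothing_error`, that the squared error of one averaging step is twice the sum of the squared detail coefficients.

```python
def haar_step(xs):
    """One unnormalized Haar step on an even-length list.

    Returns the pairwise averages and the pairwise detail coefficients.
    """
    pairs = list(zip(xs[0::2], xs[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def smoothing_error(xs):
    """Sum of squared differences between xs and its pairwise-averaged form."""
    averages, _ = haar_step(xs)
    smoothed = [m for m in averages for _ in range(2)]
    return sum((x - s) ** 2 for x, s in zip(xs, smoothed))
```

If the finest detail coefficients are small, as the text expects for most real datasets, the averaged layer stays close to the original data.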
Fig. 3 shows an example of the concrete determination of the dependency value at geometrically spaced lags for a first and a second stream which are to be compared. We apply the dependency algorithm on the data structures of each pair of layers L1, L2 for each level of layer Li.1, Li.2, Li.3. In each layer the adjacent data structures have a temporal distance of a specified fixed time interval, see time interval t2 in layer Li.2, which is the considered lag l2. Accordingly, the data structures of the two layers are shifted by one time interval t2, and therefore determining the dependency value over all data structures in a pair of layers provides the dependency value for a lag l2, which is the same as the time interval t2 between adjacent data structures in the respective layer L1.2.
The method can be refined in the resolution between two adjacent determined lags by g lags per layer, if every layer is extended by 2g steps. On the basic layer, we calculate $2g$ dependency values up to the specified lag $l = 2g$. On every subsequent higher layer h, we calculate g lags from $2^h(g+1)$ to $2^{h+1}g$. At the expense of $2g \log(m)$ additional memory, we achieve a finer sampling and consequently more accurate results.
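The set of lags covered by the refinement with g lags per layer can be enumerated as in the following sketch. The function name `refined_lags` is hypothetical, and it is assumed that the base layer covers lags 1 to 2g and that layer h contributes lags in steps of its own time interval 2^h.

```python
def refined_lags(g, max_lag):
    """Lags evaluated with g lags per layer, limited to max_lag."""
    lags = list(range(1, 2 * g + 1))          # base layer: 2g lags
    h = 1
    while 2 ** h * (g + 1) <= max_lag:
        step = 2 ** h                          # time interval of layer h
        # g lags from 2**h * (g + 1) to 2**(h + 1) * g, spaced by `step`
        lags.extend(range(step * (g + 1), step * 2 * g + 1, step))
        h += 1
    return [lag for lag in lags if lag <= max_lag]
```

For g = 1 this reduces to the plain geometric probing, while g = 2 adds one intermediate lag between each pair of neighbouring powers of two.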
Another improvement is possible, if we realize that not all the layers are actually necessary to calculate all lags.
Furthermore, the resolution of the dependency values for a dedicated lag is increased by applying the dependency algorithm on the data structures of a lower layer with a time interval or lag smaller than the dedicated lag, the data structures of the lower layers of the first and second stream being shifted relative to each other by the dedicated lag.
Fig. 4 shows the determined dependency value over the lag. The solid line is determined by a naive solution evaluating all possible lags. It is compared to the dependency values determined by the inventive method at geometrically spaced lags, see the dashed line, and to the dependency values determined by the described high resolution approach, see the dotted line.
The single steps of the method are shown in Fig. 5 as a flowchart. The first method step S1 is to capture the streams of sensor data continuously over time. In a subsequent step S2, a sequence of more than one layer of data structures is created incrementally over the captured sensor data for each stream. In step S3, dependency values for the second data stream being delayed by a specified lag with respect to the first data stream are determined by applying the dependency algorithm on the data structures of each pair of layers of the first and the second data stream of the same level, wherein the first stream's data structures are correlated with the second stream's data structures temporally shifted by a lag which increases geometrically in consecutive layers. In step S4 a dependency value for each layer of the sequence of layers is provided and a lag of a maximum dependency value is evaluated. In a further step S5 the data of the first or second sensor of a similar device can be estimated by applying the lag to a captured stream of only second sensor data or first sensor data of the similar device.
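Steps S2 to S4 can be sketched end to end on two stored lists that stand in for the captured streams of step S1. All names (`entropy`, `mutual_information`, `compress`, `detect_lag`) are hypothetical. The sketch assumes pairwise averaging as compression, a plug-in mutual information that treats every distinct value as its own symbol (continuous sensor data would need a binning step first), and streams with at least three samples.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) - H(X,Y) + H(Y), each distinct value being a symbol."""
    return entropy(xs) - entropy(list(zip(xs, ys))) + entropy(ys)


def compress(points):
    """S2 helper: average non-overlapping pairs to form the next layer."""
    return [(points[i] + points[i + 1]) / 2
            for i in range(0, len(points) - 1, 2)]


def detect_lag(first, second, num_layers, dependency=mutual_information):
    """Return (best_lag, {lag: dependency value}) for lags 1, 2, 4, ..."""
    deps = {}
    layer_x, layer_y = list(first), list(second)
    for h in range(num_layers):
        if len(layer_x) < 3:
            break
        # S3: shift the second stream by one time interval of this layer,
        # which corresponds to a lag of 2**h original samples.
        deps[2 ** h] = dependency(layer_x[:-1], layer_y[1:])
        # S2: create the next, more compressed layer of both streams.
        layer_x, layer_y = compress(layer_x), compress(layer_y)
    best_lag = max(deps, key=deps.get)  # S4: lag of the maximum value
    return best_lag, deps
```

Because values from layers with different numbers of points are compared directly, the plug-in estimate is only a rough guide in this sketch; a bias-aware estimator would be a natural replacement for the `dependency` argument.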
Fig. 6 shows an electronic module 100 for detecting a time delayed relationship between a first and at least a second sensor 201, 202, 203 of a technical device 200. The technical device can be e.g. a turbine of a power generation plant, one or several field devices in an automation network, or sensors capturing traffic flow or any other measured parameters.
The electronic module 100 comprises an input interface 101, a first processing unit 102, a second processing unit 103 as well as an output interface 104, which are all interconnected by e.g. a data bus. The input interface 101 is adapted for capturing the streams of sensor data from the sensors 201, 202, 203 continuously over time. With the first processing unit 102 the layers of data structures are created, and for all layers but the first the data structures are composed by compressing more than one data structure of the previous layer. In the second processing unit 103 the dependency values are determined as described in step S3 of Fig. 5. The output interface 104 is adapted to provide a dependency value for each layer of the sequence of layers respectively and to evaluate a lag of a maximum dependency value, thereby detecting the time delayed relationship between the first and one second data stream. The output interface 104 is connected to an output device 300, which is adapted to estimate data of the first or second sensor of a similar device, e.g. by applying the lag to a captured stream of only second sensor data or first sensor data of the similar device.
The output device 300 can also be applied to generate signals depending on the maximum dependency value and to use them as input to the device 200 for control measures of the device 200. In the above description, a processor can be an integrated circuit, a digital signal processor or a virtual processor, which can also include a memory unit, e.g. a random access memory or any other form of memory. The claimed computer program product can be e.g. a computer program means such as a storage card, a USB stick, a CD-ROM, a DVD, or also a retrievable file which can be provided by servers in a network or which is delivered or provided by other means.
The invention is not limited to the described examples. The invention also comprises all combinations of any of the described or depicted features.

Claims

1. Method for detecting a time delayed relationship between a first and at least one second data stream of measurements, which are recorded on different sensors of a technical device, in particular in an energy generation device, wherein the method comprises the following steps:
- Capturing (S1) the first and second streams of sensor data continuously over time,
- Creating (S2) incrementally over the captured sensor data for each stream a sequence of more than one layer of data structures,
- Determining (S3) for each layer a dependency value for the second data stream being delayed by a specified lag with respect to the first data stream by applying a dependency algorithm on the data structures of the first and a second data stream of the same level of layer, wherein a first stream's data structures are correlated with a second stream's data structures shifted temporal by a lag, wherein the lag increases geometrically in consecutive layers, and
- Evaluating (S4) a lag of a maximum dependency value.
2. Method according to any of the preceding claims, wherein each layer contains the same number of data structures with subsequent layers containing data structures captured further backwards in time.
3. Method according to any of the preceding claims, wherein for each layer, adjacent data structures have a temporal distance of a specified fixed time interval, and the same specified time interval is applied in each layer of the same level of each stream, wherein the specified time interval corresponds to the lag considered for the level of layers, and
determining for each layer a dependency value, wherein a first stream's data structures are correlated with a second stream's data structures temporal shifted by one time interval.
4. Method according to any of the preceding claims, wherein a data structure of a subsequent layer of the sequence of layers is formed by compression of all data structures of the previous layer in the respective time interval of the considered layer.
5. Method according to any of the preceding claims, wherein the dependency algorithm is an entropy based algorithm, especially using a Mutual Information based algorithm.
6. Method according to any of the preceding claims, wherein two of: the number of layers, the time interval of the first layer or the maximum lag is configured in a configuration phase.
7. Method according to any of the preceding claims, wherein every layer of data structures is incrementally updated by new captured sensor data and compressed data structures respectively.
8. Method according to any of the preceding claims, wherein dependency values between determined dependency values are estimated by interpolation, especially using a cubic spline function.
9. Method according to any of the preceding claims, wherein a resolution of the dependency values for a dedicated lag is increased by applying the dependency algorithm on the data structures of a lower layer with a time interval smaller than the dedicated lag and the data structures of the lower layers of the first and second stream being shifted to each other by the dedicated lag.
10. Method according to any of the preceding claims, wherein the method further comprises:
Estimating data of the first or second sensor of a similar device by applying the lag to a captured stream of only second sensor data or first sensor data of the similar device.
11. Electronic module (100) for detecting a time delayed relationship between a first and at least a second sensor data stream of measurements, which are recorded on a technical device (200), in particular in an energy generation device, comprising:
- An input interface (101) which is adapted for capturing the streams of sensor data from sensors (201, 202, 203) continuously over time,
- A first processing unit (102) which is adapted for creating (S2) incrementally over the captured sensor data for each stream a sequence of more than one layer of data structures,
- A second processing unit (103), which is adapted for determining (S3) for each layer a dependency value for the second data stream being delayed by a specified lag with respect to the first data stream by applying a dependency algorithm on the data structures of the first and a second data stream of the same level of layer, wherein a first stream's data structures are correlated with a second stream's data structures shifted temporal by a lag, wherein the lag increases geometrically in consecutive layers, and
- An output interface (104), which is adapted for evaluating a lag of a maximum dependency value.
12. Electronic module according to claim 11, further comprises:
- An output device (300), which is adapted to estimate data of the first or second sensor of a similar device by applying the lag to a captured stream to only second sensor data or first sensor data of the similar device.
13. A computer program product, tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method according to one of the preceding claims, if the program is executed on the digital processing apparatus.
PCT/EP2017/052521 2017-02-06 2017-02-06 Method, electronic module and computer program product for detecting time delayed relationship between a first and at least a second sensor data stream of measurements WO2018141410A1 (en)

Publications (1)

Publication Number Publication Date
WO2018141410A1 true WO2018141410A1 (en) 2018-08-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17705813

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17705813

Country of ref document: EP

Kind code of ref document: A1