WO2021087896A1

WO2021087896A1 - Data-driven graph of things for data center monitoring copyright notice

Info

Publication number: WO2021087896A1
Application number: PCT/CN2019/116370
Authority: WO
Inventors: Zhan Li; Hao Zhang; Zhixing Ren; Jialong WANG
Original assignee: Alibaba Group Holding Limited
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2021-05-14
Also published as: CN114365505A

Abstract

Disclosed herein are techniques for building and utilizing a sensor graph of things (GoT) to correlate sensors based on past events. In one embodiment, a method is disclosed comprising retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.

Description

DATA-DRIVEN GRAPH OF THINGS FOR DATA CENTER MONITORING

COPYRIGHT NOTICE

This application includes material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

The disclosed embodiments relate to the management of sensors and, more particularly, to techniques for managing the sensor data of an Internet data center (IDC).

Modern IDCs generally employ one or more monitoring systems for monitoring the output from various sensors. The data generated by these sensors comprises a significant amount of raw time series data. In current systems, human technicians or automation systems need to understand the information behind this data, especially when a real-time condition requires intervention. For example, increasing readings from a temperature sensor could be caused by the failure of the fans of a computer room air handler (CRAH), or by a sudden jump in the load placed on a rack of servers.

Current systems fail to adequately surface useful information out of significant amounts of raw data generated by such sensors. Indeed, many such systems rely on human interpretation of events and trial-and-error techniques to address aberrant sensor readings.

BRIEF SUMMARY

The disclosed embodiments solve these and other problems in IDCs. The disclosed embodiments achieve safer and more efficient operations of such IDCs by employing a novel graph of things (GoT) that can model sensors and events.

In one embodiment, a method is disclosed comprising retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.

In another embodiment, a non-transitory computer-readable storage medium is disclosed for tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of: retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.

In another embodiment, an apparatus is disclosed comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations of: retrieving raw sensor data collected by a plurality of sensors, identifying a plurality of events based on the raw sensor data, building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events, and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments as illustrated in the accompanying drawings, in which reference characters refer to the same parts throughout the various views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating principles of the disclosure.

FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.

FIG. 2 is a flow diagram illustrating a method for generating a GoT according to some embodiments of the disclosure.

FIGS. 3A through 3F illustrate event graphs and methods of building the same according to some embodiments of the disclosure.

FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure.

FIG. 5A is a flow diagram illustrating a method for determining if a spike or dip event has occurred according to some embodiments of the disclosure.

FIG. 5B is a flow diagram illustrating a method for determining if a mean shift event has occurred according to some embodiments of the disclosure.

FIG. 5C is a flow diagram illustrating a method for determining if a positive or negative trend change event has occurred according to some embodiments of the disclosure.

FIG. 5D is a flow diagram illustrating a method for determining if a variance change event has occurred according to some embodiments of the disclosure.

FIG. 6 is a flow diagram illustrating a method for analyzing a GoT according to some embodiments of the disclosure.

FIG. 7 is a hardware diagram illustrating a device for generating or analyzing a GoT according to some embodiments of the disclosure.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.

In the illustrated embodiment, a system (100) includes a plurality of sensors (102a, 102b, 102c, 102n; collectively, 102). Examples of sensors include temperature sensors, voltage sensors, current sensors, humidity sensors, air flow sensors, moisture or water sensors, smoke sensors, door sensors, video surveillance devices, power consumption/switching sensors, and other sensors measuring real-world attributes of a data center or other environment employing the system (100). Alternatively, or in conjunction with the foregoing, the sensors may comprise software-based sensing components such as server load monitoring sensors, log file monitoring sensors, intrusion detection systems, load spike detection software, and various other software used to monitor services provided by, for example, a data center.

The sensors (102) generate time series data. That is, each sensor generates a data value at a given time and continues to generate such values over a period of time. Examples of time series data include the readings of a continuous temperature monitor (i.e., a periodic temperature reported by a temperature sensor) as well as the number of inbound connections to a server at any given time (e.g., as reported by the server). In general, any data that can be represented as a function of time can be considered time series data generated by the sensors (102), and the disclosed embodiments are not limited to a particular sensor or to a particular type or format of time series data.

In the disclosed embodiments, the sensors (102) are described primarily in the context of an IDC; however, the sensors (102) can be installed in any type of environment. Additionally, no limit is placed on the number, type, or placement of the sensors (102).

As illustrated, the sensors (102) are each connected to a network (118). In one embodiment, each of the sensors (102) is physically coupled to the network (118), and the network (118) may comprise a controller area network (CAN) bus, an Ethernet network, or another type of data communications medium. In other embodiments, the sensors (102) include a wireless transceiver and can communicate over a wireless medium in lieu of a physical bus. Examples of such communication networks include Wireless Fidelity (Wi-Fi), cellular, or satellite networks. In some embodiments, the sensors (102) can be directly connected to other devices in the system (e.g., the monitoring system (116) or the pre-processor (104)) and may not communicate over the network (118). In some embodiments, the system (100) may employ multiple networks (and directly connected devices). For instance, servers employing load monitoring can communicate over an Ethernet network while temperature sensors communicate over a Wi-Fi network. The disclosed embodiments are not limited to any one type (or combination) of network technologies.

In the illustrated embodiment, the data generated by the sensors (102) is referred to as raw data. The sensors transmit this data to the pre-processor (104), which forms the initial stage of an event extraction phase. This phase is designed to filter the raw data into actionable events and is described in more detail with respect to the various flow diagrams. As illustrated, the event extraction phase includes a pre-processor (104), a time series data analysis processor (106), and an event detection processor (108). In some embodiments, each processor (104, 106, 108) comprises a dedicated hardware processing element. In other embodiments, the event extraction phase can be implemented on one or more hardware devices, and the processors (104, 106, 108) can be implemented as software running on such devices.

In the illustrated embodiment, the pre-processor (104) receives the raw data from the sensors (102) over the network (118). As described, this raw data includes a data value and a time value. In the illustrated embodiment, the pre-processor (104) cleans and smooths the received data. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 204.

After cleaning and smoothing the sensor data, the pre-processor (104) transmits the data to the time series data analysis processor (106). The time series data analysis processor (106) decomposes the cleaned and smoothed time series data into trend, seasonal, and remainder components. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 206.

The time series data analysis processor (106) then transmits the decomposed components to the event detection processor (108). The event detection processor (108) processes the individual components of a time series data point and identifies actionable events represented by such components. The output of the event detection processor (108) comprises a vector including a value for each type of event, the time of the sensor data, and an identity of the sensor. The event detection processor (108) transmits this data to event storage (110). Details of the operation of the event detection processor (108) are provided in the description of FIG. 2 and, in particular, in the description of step 208 as well as FIGS. 5A through 5D.

In the illustrated embodiment, the event storage (110) comprises a storage device (physical or logical) that stores the events detected during the event extraction phase. In one embodiment, the event storage (110) comprises a relational database management system (RDBMS) or other type of database. In some embodiments, the event storage (110) can comprise a key-value data store or other less intensive data store (e.g., an object store). In some embodiments, the event storage (110) is configured to store events in temporal order based on the time identified by the event detection processor (108). In this manner, the event storage (110) operates similarly to a queue.

A separate graph phase is also illustrated. During the graph phase, a graph is built and periodically updated based on the events stored in the event storage (110). In the illustrated embodiment, a graph builder (112) accesses the event storage (110) to retrieve events. In some embodiments, the graph builder (112) actively queries the event storage (110). In other embodiments, the graph builder (112) subscribes to the event storage (110) and periodically receives events as they are added to the event storage (110).

The graph builder (112) additionally accesses a graph storage (114). The graph storage (114) can comprise a graph database or may comprise an RDBMS. In general, the graph storage (114) stores a set of nodes, a set of edges, and weights for the edges. In the illustrated embodiment, the nodes comprise sensor identifiers, time points, and event types. The graph builder (112) updates the data stored in the graph storage (114) based on events received in the event storage (110). In this manner, the graph builder (112) is responsible for updating a graph of events captured by the sensors (102) and processed during the event extraction phase. Details of the operation of the graph builder (112) and graph storage (114) are provided in the description of FIG. 2 and, in particular, in the description of step 212 as well as FIGS. 3A through 3F and 4.

As illustrated, the system (100) includes a monitoring system (116). In the illustrated embodiment, the monitoring system (116) receives processed event data from the event detection processor (108). Alternatively, or in conjunction with the foregoing, the monitoring system (116) may also receive raw data from the sensors (102). The monitoring system (116) is additionally communicatively coupled to the graph storage (114). In the illustrated embodiment, the monitoring system (116) may comprise a hardware device (or multiple devices) that is responsible for analyzing sensor data and providing actionable intelligence to operators of the device. In other embodiments, the monitoring system (116) may also automatically take action in response to events (e.g., alerting a fire department upon detecting a rise in temperature).

To provide this insight or take such action, the monitoring system (116) queries the graph storage (114) to identify correlated sensors in response to an event and to perform a root cause analysis. Details of the operation of the monitoring system (116) are provided in the description of FIG. 6.
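The correlated-sensor query can be pictured with a small sketch. It assumes, hypothetically, that the sensor graph has been loaded from the graph storage (114) into an in-memory map from each sensor identifier to its neighbors' edge weights; the names `correlated_sensors` and `min_weight` are illustrative and not part of the disclosure. Ranking neighbors by edge weight is one plausible way to order candidates for root cause analysis.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of the sensor graph:
# sensor id -> {neighbor sensor id: edge weight}.
SensorGraph = Dict[str, Dict[str, int]]


def correlated_sensors(
    graph: SensorGraph, sensor: str, min_weight: int = 1
) -> List[Tuple[str, int]]:
    """Return sensors connected to `sensor`, strongest correlation first.

    Neighbors whose edge weight is below `min_weight` (e.g., zero-weight
    edges) are ignored. Unknown sensors yield an empty list.
    """
    neighbors = graph.get(sensor, {})
    hits = [(other, w) for other, w in neighbors.items() if w >= min_weight]
    # Highest weight first; ties broken by sensor id for stable output.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Sensor graph of FIG. 4 (a zero-weight edge is kept to show it is filtered).
fig4 = {
    "sensor1": {"sensor3": 2, "sensor2": 0},
    "sensor2": {"sensor4": 1, "sensor1": 0},
    "sensor3": {"sensor1": 2},
    "sensor4": {"sensor2": 1},
}
print(correlated_sensors(fig4, "sensor1"))  # [('sensor3', 2)]
```

A new event on Sensor 1 would thus point an operator first at Sensor 3, the only sensor that has previously experienced events at the same time points.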

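FIG. 2 is a flow diagram illustrating a method (200) for generating a GoT according to some embodiments of the disclosure.

In step 202, the method (200) receives sensor data. As described above, the method (200) may receive raw data generated by one or more sensors over a network or other communications medium.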

In step 204, the method (200) cleans and smooths the data.

In some embodiments, the method (200) performs various cleaning operations on the received data, such as filtering anomalies, removing aberrant sensors, smoothing short-term fluctuations in sensor data, and other operations. In some embodiments, cleaning may also include interpolating missing values and/or normalizing values. In some embodiments, the method (200) can utilize moving average smoothing to smooth and clean data. Examples of such moving average approaches include weighted moving average (WMA), exponentially weighted moving average (EWMA), and autoregressive integrated moving average (ARIMA) smoothing, among other techniques. Other techniques may be used in addition to (or in lieu of) these techniques, and the disclosed embodiments are not intended to be limited to a specific or single technique for cleaning and smoothing the data from sensors. Indeed, multiple techniques may be used simultaneously, and the techniques may be unique to each sensor based on the type of time series data generated and observations regarding the fluctuations of the data generated by the sensors.
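As a minimal sketch of two of the cleaning steps named above, the following fills missing readings by linear interpolation and then applies an EWMA. The smoothing factor `alpha`, the edge-gap policy (copy the nearest observed value), and the function names are assumptions for illustration; the disclosure fixes none of them.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed
    neighbors; gaps at either end copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))  # leading gap
        elif right is None:
            filled.append(float(values[left]))  # trailing gap
        else:
            frac = (i - left) / (right - left)
            lo, hi = float(values[left]), float(values[right])
            filled.append(lo + frac * (hi - lo))
    return filled


def ewma(values: List[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average:
    s[0] = x[0], s[i] = alpha * x[i] + (1 - alpha) * s[i - 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


readings = [20.0, None, 22.0, 21.5, None, 30.0]
print(ewma(interpolate_missing(readings), alpha=0.5))
```

A larger `alpha` tracks the raw readings more closely; a smaller one suppresses more short-term fluctuation, which is the per-sensor tuning trade-off the paragraph above alludes to.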

In step 206, the method (200) decomposes the cleaned and smoothed data into trend, seasonal, and remainder components.

In the illustrated embodiment, time series data received in step 202 can be affected by three components: trend, seasonal, and remainder components. In brief, the seasonal component of a time series refers to patterns that repeat over a fixed interval. The trend component refers to an overall change in data over a longer period of time. Finally, the remainder refers to the residual data remaining in the time series when the trend and seasonal data are removed.

In one embodiment, a time series can be represented by a function Y(t), wherein t represents the time elapsed. In this embodiment, Y(t) can be represented as a sequence of values $Y(t) = [y(1), y(2), \ldots, y(n)]$. In one embodiment, Y(t) can thus be decomposed into an additive collection of trend (T), seasonal (S), and remainder (R) functions:

$$Y(t) = T(t) + S(t) + R(t)$$

wherein t again represents the time elapsed. The choice of an additive or a multiplicative model (e.g., $Y(t) = T(t) \cdot S(t) \cdot R(t)$) may be made based on the underlying sensor(s).

In one embodiment, the method (200) receives the cleaned and smoothed time series data and first detects the trend component. In one embodiment, the method (200) may use a centered moving average algorithm to calculate the trend T(t) of the time series Y(t), although other algorithms (e.g., a Fourier transform) may be used. The method (200) may then remove the trend T(t) from the time series Y(t). The method (200) may then analyze the underlying de-trended time series (Y(t) - T(t)) to determine the period of the resulting data. After identifying the period, the method (200) extracts the seasonal component S(t), which leaves the remainder or noise component R(t).
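The decomposition steps above can be sketched as a classical additive decomposition. This sketch rests on stated assumptions: it takes the period as a known input rather than detecting it, uses a centered moving average (a 2 x period average for even periods) for T(t), and estimates S(t) by averaging the de-trended values at each phase of the cycle. Points where the centered window does not fit have no trend or remainder estimate (`None`). The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def centered_moving_average(y: List[float], period: int) -> List[Optional[float]]:
    """Trend estimate T(t). Odd periods use a plain centered average of
    `period` points; even periods use a 2 x period average (half weight on
    the two end points). Positions without a full window are None."""
    half = period // 2
    trend: List[Optional[float]] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[t] = total / period
    return trend


def decompose(
    y: List[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition Y(t) = T(t) + S(t) + R(t)."""
    if period < 2 or len(y) < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")
    trend = centered_moving_average(y, period)
    # Average the de-trended values Y(t) - T(t) at each phase of the cycle.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    phase_means = [sum(b) / len(b) if b else 0.0 for b in buckets]
    # Centre the seasonal pattern so it sums to zero over one period.
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(len(y))]
    remainder = [
        None if tr is None else value - tr - s
        for value, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Linear trend plus a period-4 seasonal pattern, no noise.
pattern = [2.0, 0.0, -2.0, 0.0]
series = [t + pattern[t % 4] for t in range(12)]
T, S, R = decompose(series, period=4)
print(T[2:10], S[:4], R[2:10])
```

On this noise-free input the sketch recovers the linear trend and the seasonal pattern exactly, leaving a zero remainder wherever a trend estimate exists.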

In step 208, the method (200) extracts events from the decomposed data.

In one embodiment, the method (200) stores a list of pre-defined events. In general, an event comprises one or more conditions that utilize the underlying components (trend, seasonal, remainder) as inputs. That is, if one or more of the decomposed components satisfy a condition, an event is presumed to have occurred. The disclosed embodiments describe four examples of events: spikes/dips, mean shifts, trend changes, and variance shifts. However, the disclosed embodiments are not intended to be limited to only these events, and other events may be constructed based on the underlying data or needs of the system implementing the method (200).

Spikes/dips refer to rapid increases or decreases in the value of the remainder component of the time series. As described above, a trend refers to a gradual change in a time series while a season refers to a periodic (and regular) fluctuation. Thus, spikes or dips in a time series are generally attributable to the remainder component of the time series. The detection of spikes/dips is described more fully in the description of FIG. 5A.

A mean shift event refers to a change in the moving average of the time series data overall. In this embodiment, a sliding window is used to calculate the average value of the time series at a given time t and this value is used to determine if a mean shift has occurred. The detection of mean shifts is described more fully in the description of FIG. 5B.

A trend change event refers to a time when the trend component of a time series reverses, either from positive (upward) to negative (downward) or from negative (downward) to positive (upward). The detection of trend changes is described more fully in the description of FIG. 5C.
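One simple way to flag such reversals, offered only as a sketch and not as the method of FIG. 5C, is to track the sign of the first difference of the trend component T(t). The output convention (+1 for a reversal to upward, -1 for a reversal to downward, 0 otherwise), the flatness tolerance `eps`, and the function name are assumptions.

```python
from typing import List


def trend_change_events(trend: List[float], eps: float = 0.0) -> List[int]:
    """Flag reversals in the direction of the trend component T(t).

    Returns one value per time point: +1 where a downward trend turns upward,
    -1 where an upward trend turns downward, 0 otherwise. Steps whose absolute
    change is <= eps count as flat and neither trigger nor reset a reversal.
    """
    def direction(delta: float) -> int:
        if delta > eps:
            return 1
        if delta < -eps:
            return -1
        return 0

    events = [0] * len(trend)
    last_direction = 0  # most recent non-flat direction seen so far
    for t in range(1, len(trend)):
        d = direction(trend[t] - trend[t - 1])
        if d == 0:
            continue
        if last_direction != 0 and d != last_direction:
            events[t] = d
        last_direction = d
    return events


print(trend_change_events([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 2.0]))
# [0, 0, 0, 0, -1, 0, 1]
```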

A variance shift event refers to a change in the variance of the underlying time series data. The detection of a variance shift is described more fully in the description of FIG. 5D.

In general, other features can be extracted from the time series, including but not limited to fluctuations in mean, variance, skewness, kurtosis, median, mode, quantiles, etc. Information measures such as entropy, or auto-regressive coefficients, can also be included. The disclosed embodiments are not intended to be limited solely to the specific examples provided in FIGS. 5A through 5D.

In the illustrated embodiment, the method (200) generates an n-item tuple representing the detected events. Thus, using the four examples above, the method (200) generates a four-item tuple indicating whether each event is present in the time series. After executing step 208, the method (200) represents the events in a time series as $y_j(t_i) = [e_1(t_i), e_2(t_i), e_3(t_i), e_4(t_i)]_j$, where $e_1$, $e_2$, $e_3$, and $e_4$ represent the presence of a spike/dip, mean shift, trend change, or variance change event (respectively) and hold the results of the processing described in FIGS. 5A through 5D, j represents a sensor, $t_i$ represents a given time, and $y_j$ represents a time series for sensor j.

In the above manner, an entire time series can thus be defined as $Y_j(t) = [y_j(t_i),\ j = 1, \ldots, k,\ i = 1, \ldots, m] = [[e_1(t_i), e_2(t_i), e_3(t_i), e_4(t_i)]_j,\ j = 1, \ldots, k,\ i = 1, \ldots, m]$, where k represents the number of sensors and $t_1, \ldots, t_m$ represent the observations from time $t_1$ to time $t_m$.

In step 210, the method (200) stores the extracted events.

As described above, the output of step 208 comprises a packet comprising a numeric value indicating the presence or absence of each type of event, a time component, and a sensor identifier (referred to as an event data structure). These components are stored within a database. In some embodiments, the event data structure can be stored in an RDBMS or other type of storage medium that allows for future access. As described above, in some embodiments, the event data structures can be stored in a queue-type data structure. In other embodiments, the event data structures can be stored in a database that supports the pub/sub architectural pattern (e.g., Redis) such that the stream of events can be subscribed to by a downstream processing device.
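A minimal sketch of the event data structure and a time-ordered queue follows. The field names, the slot order (e1 spike/dip, e2 mean shift, e3 trend change, e4 variance shift, following step 208), and the `EventQueue` class are hypothetical; a deployed system would more likely use an RDBMS or a pub/sub store as described above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Tuple

# Hypothetical slot order for (e1, e2, e3, e4), following step 208.
EVENT_TYPES = ("spike_dip", "mean_shift", "trend_change", "variance_shift")


@dataclass(frozen=True)
class EventRecord:
    sensor_id: str
    time: int                          # time point t_i, e.g., epoch seconds
    values: Tuple[int, int, int, int]  # (e1, e2, e3, e4); 0 means absent

    def active_events(self) -> List[str]:
        """Names of the event types present in this record."""
        return [name for name, v in zip(EVENT_TYPES, self.values) if v != 0]


class EventQueue:
    """FIFO of event records kept in temporal order for a graph builder."""

    def __init__(self) -> None:
        self._items: Deque[EventRecord] = deque()

    def extend(self, records: Iterable[EventRecord]) -> None:
        # Sort each incoming batch by time; assumes batches arrive in order.
        for record in sorted(records, key=lambda r: (r.time, r.sensor_id)):
            self._items.append(record)

    def pop(self) -> EventRecord:
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


queue = EventQueue()
queue.extend([
    EventRecord("sensor3", 2, (0, 1, 0, 0)),
    EventRecord("sensor1", 1, (1, 0, 0, 0)),
])
first = queue.pop()
print(first.sensor_id, first.active_events())  # sensor1 ['spike_dip']
```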

In step 212, the method (200) builds or updates a graph based on the extracted events.

In the illustrated embodiment, the method (200) may first build an event graph representing the events stored in the database. In some embodiments, this graph represents each event type, sensor, and time as a node in the graph. In the graph, a given sensor node is connected to event nodes and time nodes, whereas event nodes and time nodes are only connected to sensor nodes (i.e., they are not connected to other time nodes or event nodes). In some embodiments, the method (200) stores this graph in a graph database. In other embodiments, this graph may be converted into relational form and stored in an RDBMS. An example of building a graph from a set of events is provided in the description of FIGS. 3A through 3F, incorporated herein by reference. In some embodiments, the event graph can comprise a separately stored graph. In other embodiments, the event graph can comprise a temporary graph (e.g., stored only in memory). In this embodiment, the event graph is used to build the sensor graph (of things) described below and in the description of FIG. 4.

From the event graph, the method (200) further synthesizes a sensor graph (also referred to as a sensor graph of things or simply GoT). In this graph, the nodes only comprise sensors, and the edges between the nodes are weighted based on the connectivity of the event graph. In the illustrated embodiment, the sensor graph is built by analyzing the event graph to identify correlated sensors. In one embodiment, sensors are correlated if they experience an event at the same time point. Thus, as one example, if two sensors experience a spike or dip at the same time, the two sensors are correlated. In one embodiment, the weights of the edges of the sensor graph represent the number of time points at which two sensors experience an event simultaneously.

In the illustrated embodiment, the graphs in FIGS. 3A through 3F can be generated using a set of events generated using historical sensor data. Alternatively, or in conjunction with the foregoing, the graphs can be generated using real-time events extracted from a real-time stream of raw sensor data.

In FIG. 3A, a first event is analyzed: a spike or dip detected by Sensor 1 at Time 1. The event type (Spikes and Dips), sensor ID (1), and time point (1) are used as nodes in the graph (300a), and the sensor node is connected to the event type node and to the time point node via undirected edges. As illustrated in FIGS. 3A through 3F, all edges are undirected, and this point will not be repeated for the sake of brevity.

In FIG. 3B, also at Time 1, a second event is processed wherein Sensor 3 detects a spike or dip. In response, a node for Sensor 3 is added to the graph (300b). The Time 1 node is re-used, and the Sensor 3 node is connected to the Time 1 node via an edge. Additionally, the Sensor 3 node is connected to the existing Spikes and Dips node via an edge. In some embodiments, a new, duplicate Spikes and Dips node may be added; however, as illustrated, event nodes (like time nodes) may be re-used.

In FIG. 3C, a third event is received at Time 2. This event comprises a mean shift type event detected by Sensor 2. In response, a new component is added to the graph (300c) that comprises a new Sensor 2 node connected to a new Time 2 node and a new Mean Shift node. Thus, in FIG. 3C, the graph (300c) consists of two separate components.

In FIG. 3D, a fourth event is received occurring at Time 2. Since a Time 2 node has already been created, that node is re-used. However, a Sensor 4 node is added, as well as a new Trend Change node corresponding to the trend change event detected. As will be discussed in more detail, the graph (300d) depicted in FIG. 3D begins to illustrate correlations between sensors.

In FIG. 3E, a fifth event is received at Time 3. As in FIG. 3C, a new time node (here, Time 3) is added to the graph (300e). Further, the Sensor 1 node is connected to the new Time 3 node. However, since the Sensor 1 node is already connected to the Spikes and Dips node, this connection is re-used. As described above, in some embodiments, a new duplicate Spikes and Dips node can be added and connected to the Sensor 1 node. Alternatively, in some embodiments, the weight of the edge between the Sensor 1 node and the Spikes and Dips node can be increased by one.

In FIG. 3F, the sixth (and final) event is received at Time 3. Similar to the fifth event, the event in FIG. 3F represents a spike or dip, here detected by Sensor 3. In this scenario, all of the relevant nodes exist (Sensor 3, Time 3, Spikes and Dips); thus, no new nodes are created in the graph (300f). However, as discussed above, in some embodiments the event type node may be duplicated again. Thus, in FIG. 3F, a new edge between the Sensor 3 node and the Time 3 node is added, and the edge between the Sensor 3 node and the Spikes and Dips node is re-used (or, in some embodiments, increased in weight).
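The FIG. 3A through 3F walkthrough can be replayed with a short sketch. It uses the weight-increment variant described for FIGS. 3E and 3F (a re-used edge gains weight instead of an event node being duplicated); the `EventGraph` class and the tagged-tuple node encoding are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set, Tuple, Union

# Nodes are tagged tuples, e.g. ("sensor", 1), ("time", 1),
# ("event", "Spikes and Dips").
Node = Tuple[str, Union[int, str]]


class EventGraph:
    """Undirected event graph: sensor nodes link to time and event-type nodes."""

    def __init__(self) -> None:
        self.nodes: Set[Node] = set()
        # Undirected edge (unordered node pair) -> weight.
        self.edges: DefaultDict[FrozenSet[Node], int] = defaultdict(int)

    def add_event(self, sensor: int, time: int, event_type: str) -> None:
        s, t, e = ("sensor", sensor), ("time", time), ("event", event_type)
        self.nodes.update((s, t, e))        # existing nodes are re-used
        self.edges[frozenset((s, t))] += 1  # sensor-time edge
        self.edges[frozenset((s, e))] += 1  # sensor-event edge; re-use adds weight

    def neighbors(self, node: Node) -> Set[Node]:
        return {
            other
            for edge in self.edges
            if node in edge
            for other in edge
            if other != node
        }


# The six events of FIGS. 3A through 3F, in order.
graph = EventGraph()
for sensor, time, event_type in [
    (1, 1, "Spikes and Dips"),
    (3, 1, "Spikes and Dips"),
    (2, 2, "Mean Shift"),
    (4, 2, "Trend Change"),
    (1, 3, "Spikes and Dips"),
    (3, 3, "Spikes and Dips"),
]:
    graph.add_event(sensor, time, event_type)

print(len(graph.nodes), len(graph.edges))  # 10 10
print(sorted(graph.neighbors(("time", 2))))  # [('sensor', 2), ('sensor', 4)]
```

The resulting structure matches graph (300f): four sensor nodes, three time nodes, and three event-type nodes, with event and time nodes linked only to sensor nodes.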

As will be described below, the graph depicted in FIG. 3F can then be used to generate a sensor graph.

FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure. In the illustrated embodiment, the sensor graph (400) is generated based on the event graph (300f) illustrated in FIG. 3F.

As illustrated in FIG. 4, the sensor graph (400) differs from the event graph (300f) in that all nodes of the sensor graph (400) are sensor nodes. Further, edges between sensor nodes are undirected and only include a numerical weight. Thus, data regarding event types and times are removed when generating the sensor graph (400).

In one embodiment, each sensor node in the event graph (300f) is included in the sensor graph (400). Further, each node may be connected to every other node with a weight of zero. Alternatively, each node in the sensor graph (400) may initially be unconnected and only connected to other nodes as described herein.

In the illustrated embodiment, for a given sensor, a graph builder (as described in FIG. 1) analyzes the event graph (300f) to determine the number of paths of length two between the given sensor and another sensor that pass through a time node. To accomplish this, the graph builder can select a sensor node and identify the connected time nodes. The graph builder can then, for each time node, identify any other sensor nodes connected to that time node. For each such sensor, the number of shared time nodes is counted, and the count is used as the weight of the edge between the given sensor and that sensor. Thus, in graph (300f), Sensor 1 is connected to Sensor 3 via two time nodes (Time 1 and Time 3), so an edge is added to the sensor graph (400) between Sensor 1 and Sensor 3 and its weight is set to two. Further, Sensors 2 and 4 are connected to each other by one time node (Time 2). Thus, an edge is added to the sensor graph (400) between Sensors 2 and 4 and its weight is set to one. As illustrated, all other edges are set to a zero weight or are omitted entirely.
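The weight computation above reduces to counting, for each pair of sensors, the time nodes they share. The sketch below assumes the sensor-time edges of the event graph are available as `(sensor, time)` pairs; pairs with no shared time node are simply omitted, corresponding to the "omitted entirely" option. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def build_sensor_graph(
    sensor_time_edges: Iterable[Tuple[int, int]]
) -> Dict[Tuple[int, int], int]:
    """Weight of each sensor pair = number of time nodes both sensors touch,
    i.e., the number of length-2 paths between them through a time node."""
    sensors_at: DefaultDict[int, Set[int]] = defaultdict(set)
    for sensor, time in sensor_time_edges:
        sensors_at[time].add(sensor)
    weights: DefaultDict[Tuple[int, int], int] = defaultdict(int)
    for sensors in sensors_at.values():
        # Each unordered pair sharing this time node gains one unit of weight.
        for a, b in combinations(sorted(sensors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Sensor-time edges of the event graph (300f) in FIG. 3F.
edges = [(1, 1), (3, 1), (2, 2), (4, 2), (1, 3), (3, 3)]
print(build_sensor_graph(edges))  # {(1, 3): 2, (2, 4): 1}
```

Grouping by time node first avoids walking every sensor pair explicitly; only sensors that actually co-occur at some time point ever produce an edge.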

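FIG. 5A is a flow diagram illustrating a method (500a) for determining if a spike or dip event has occurred according to some embodiments of the disclosure.

In step 502a, the method (500a) receives a remainder component R(t), which comprises the remainder portion of a given time series calculated as described above.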

In step 504a, the method (500a) compares the remainder component to a spike threshold (Th_spike) and a dip threshold (Th_dip). In one embodiment, these thresholds comprise static values above and below a center point. For example, Th_spike can be set to a positive value (e.g., +5) while Th_dip may be set to a negative value (e.g., -5). The specific values of these thresholds are not intended to be limiting. In one embodiment, Th_spike must be greater than zero and Th_dip must be less than zero.

In step 506a, the method (500a) determines that the value of R(t) exceeds the preconfigured spike threshold Th_spike and flags the time (t) as a spike. In one embodiment, flagging the time as a spike comprises setting an event output (e.g., e₁) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.

In step 510a, the method (500a) determines that the value of R(t) is below the preconfigured dip threshold Th_dip and flags the time (t) as a dip. In one embodiment, flagging the time as a dip comprises setting an event output (e.g., e₁) value to negative one (-1). This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.

In step 508a, if the value of R(t) is not greater than Th_spike and is not less than Th_dip, the method (500a) bypasses the spikes/dips event detection and sets the event output (e.g., e₁) value to zero (0). Thus, in the illustrated embodiment, the method (500a) utilizes the following rule to generate the output value for the event indicated as e₁:

$$e_1(t) = \begin{cases} 1, & R(t) > Th_{spike} \\ -1, & R(t) < Th_{dip} \\ 0, & \text{otherwise} \end{cases}$$

In some embodiments, the method (500a) may utilize output values other than -1, 0, and 1. For example, the method (500a) may set the output value to represent the distance of the value of R (t) from the respective threshold.
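The three-way rule of steps 506a through 510a can be written as a short function. This sketch uses the -1/0/1 output convention; the function name and the threshold check (mirroring the constraint that Th_spike > 0 and Th_dip < 0) are illustrative.

```python
from typing import List


def spike_dip_events(remainder: List[float], th_spike: float, th_dip: float) -> List[int]:
    """Apply the e1 rule to each remainder value R(t):
    1 if R(t) > th_spike, -1 if R(t) < th_dip, else 0."""
    if not (th_spike > 0 > th_dip):
        raise ValueError("expected th_spike > 0 and th_dip < 0")
    events: List[int] = []
    for r in remainder:
        if r > th_spike:
            events.append(1)   # step 506a: spike
        elif r < th_dip:
            events.append(-1)  # step 510a: dip
        else:
            events.append(0)   # step 508a: no spike/dip event
    return events


print(spike_dip_events([0.5, 6.0, -7.0, 5.0], th_spike=5.0, th_dip=-5.0))
# [0, 1, -1, 0]
```

Note that a value exactly equal to a threshold (5.0 above) is not flagged, matching the strict "exceeds" and "is below" wording of steps 506a and 510a.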

In one embodiment, the two thresholds (Th _spike and Th _dip) are determined using the empirical quantiles of R (t), such that an event occurs only with a small probability α (which can be tuned and may be set to 1%). This reflects the a priori knowledge that interesting events are rare: in most scenarios, a time series is "normal" and does not contain much information. In some embodiments, the two thresholds (Th _spike and Th _dip) are also bounded by a minimum absolute threshold value Th _abs > 0, which can represent a priori knowledge from human experts. In this embodiment, the thresholds are set as follows:

$$Th_{spike} = \max(Th_{spike}, Th_{abs}), \qquad Th_{dip} = \min(Th_{dip}, -Th_{abs})$$
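The spike/dip rule and the Th _abs floor described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation. It uses NumPy and splits the probability α evenly between the upper and lower tails. The function names and the default Th _abs value are hypothetical.

```python
import numpy as np


def spike_dip_thresholds(remainder, alpha=0.01, th_abs=1.0):
    """Derive (Th_spike, Th_dip) from empirical quantiles of R(t).

    Assumption: alpha is split evenly between the two tails. The default
    th_abs is a hypothetical placeholder for the expert-chosen floor.
    """
    r = np.asarray(remainder, dtype=float)
    th_spike = np.quantile(r, 1.0 - alpha / 2)
    th_dip = np.quantile(r, alpha / 2)
    # Th_spike = max(Th_spike, Th_abs), Th_dip = min(Th_dip, -Th_abs)
    return max(th_spike, th_abs), min(th_dip, -th_abs)


def spike_dip_events(remainder, th_spike, th_dip):
    """Map each R(t) to e1(t): 1 for a spike, -1 for a dip, 0 otherwise."""
    r = np.asarray(remainder, dtype=float)
    return np.where(r > th_spike, 1, np.where(r < th_dip, -1, 0))
```

For example, `spike_dip_events([0.1, -0.2, 6.0, 0.0, -7.5], 5.0, -5.0)` yields the array `[0, 0, 1, 0, -1]`, using the +5/–5 example thresholds of step 504a.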

In step 502b, the method (500b) defines a sliding window for a time series data set. In some embodiments, the sliding window comprises a fixed duration in which the remaining steps analyze the time series data. The specific duration of this window is not limiting and may be set according to the underlying data or needs of the system.

In one embodiment, the method (500b) computes the mean of the trend component T (t) over two adjacent sliding windows and takes the difference between them. In this embodiment, the left window consists of less recent data:

[T (t − L − R + 1), T (t − L − R + 2), ..., T (t − R)],

and the right window contains the most recent data:

[T (t − R + 1), T (t − R + 2), ..., T (t)],

where L and R are the sizes of the left and right time windows, respectively.

In this embodiment, μ _L (t) and μ _R (t) are calculated as the mean of T (t) over the left and right time windows, respectively. The difference Δμ (t) = μ _R (t) − μ _L (t) is used as the windowed average of the sensor time series, denoted a (t).
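A minimal Python sketch of the two-window statistic a (t) and the resulting e ₂ output might look as follows. It assumes NumPy, 0-based indexing, and enough history for both windows. The function names are hypothetical. The two thresholds correspond to Th _increase and Th _decrease from step 506b.

```python
import numpy as np


def mean_shift_statistic(trend, t, left, right):
    """a(t) = mu_R(t) - mu_L(t) over two adjacent windows of T(t).

    Left window:  T(t-L-R+1), ..., T(t-R)   (older samples)
    Right window: T(t-R+1), ..., T(t)       (newest samples)
    `t` is a 0-based index into `trend`.
    """
    T = np.asarray(trend, dtype=float)
    if t < left + right - 1 or t >= len(T):
        raise ValueError("t is out of range or lacks history for both windows")
    mu_left = T[t - left - right + 1 : t - right + 1].mean()
    mu_right = T[t - right + 1 : t + 1].mean()
    return mu_right - mu_left


def mean_shift_event(a_t, th_increase, th_decrease):
    """e2(t): 1 for a mean increase, -1 for a mean decrease, 0 otherwise."""
    if a_t > th_increase:
        return 1
    if a_t < th_decrease:
        return -1
    return 0
```

Take a trend that steps from 0 to 5 halfway through eight samples, with L = R = 4. Then `mean_shift_statistic([0, 0, 0, 0, 5, 5, 5, 5], t=7, left=4, right=4)` returns 5.0, and `mean_shift_event` maps that to 1 for any Th _increase below 5.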

In step 504b, the method (500b) averages the values of the time series data points (i.e., values of Y (t)) that fall within the window.

In step 506b, the method (500b) compares the computed average for a given window (a (t)) to a mean increase threshold (Th _increase) and a mean decrease threshold (Th _decrease). In one embodiment, these thresholds comprise static values above and below a center point. For example, Th _increase can be set to a positive value (e.g., +5) while Th _decrease may be set to a negative value (e.g., –5). The specific values of these thresholds are not intended to be limiting. However, Th _increase should be greater than Th _decrease. Otherwise, a value satisfying Th _decrease > a (t) > Th _increase would be flagged as both a mean increase and a mean decrease.

In step 508b, the method (500b) determines that the value of a (t) exceeds the preconfigured mean increase threshold Th _increase and flags the time (t) as a mean increase. In one embodiment, flagging the time as a mean increase comprises setting an event output (e.g., e ₂) value to one (1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.

In step 512b, the method (500b) determines that the value of a (t) is below the preconfigured mean decrease threshold Th _decrease and flags the time (t) as a decrease in the mean for a sliding window. In one embodiment, flagging the time as a mean decrease comprises setting an event output (e.g., e ₂) value to negative one (–1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.

In step 510b, if the value of a (t) is neither greater than Th _increase nor less than Th _decrease, the method (500b) bypasses mean increase/decrease event detection and sets the event output (e.g., e ₂) value to zero (0). Thus, in the illustrated embodiment, the method (500b) utilizes the following rule to generate the output value for the event indicated as e ₂:

$$e_2(t) = \begin{cases} 1, & a(t) > Th_{increase} \\ -1, & a(t) < Th_{decrease} \\ 0, & \text{otherwise} \end{cases}$$

In some embodiments, the method (500b) may utilize output values other than -1, 0, and 1. For example, the method (500b) may set the output value to represent the distance of the value of a (t) from the respective threshold.

In step 502c, the method (500c) receives a trend component T (t) which comprises the trend portion of a given time series calculated as described above.

In step 504c, the method (500c) computes the difference over time between the current trend value (T (t) ) and the previous trend value (T (t-1) ) and then calculates the exponential moving average of ΔT (t) =T (t) -T (t-1) , which is denoted as D (t) .

In step 506c, the method (500c) compares the smoothed trend difference D (t) to a positive trend threshold (Th _positive) and a negative trend threshold (Th _negative). In one embodiment, these thresholds comprise static values above and below a center point. For example, Th _positive can be set to a positive value (e.g., +5) while Th _negative may be set to a negative value (e.g., –5). The specific values of these thresholds are not intended to be limiting. However, Th _positive should be greater than Th _negative. Otherwise, a value satisfying Th _negative > D (t) > Th _positive would be flagged as both a positive and a negative trend change.
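The smoothing of ΔT (t) and the three-way threshold comparison can be sketched as below. NumPy is assumed. The disclosure does not specify the smoothing factor, so `beta` is a hypothetical parameter. Seeding the moving average with the first difference is also an assumption.

```python
import numpy as np


def trend_slope_ema(trend, beta=0.3):
    """Return D(t), the exponential moving average of dT(t) = T(t) - T(t-1).

    beta is a hypothetical smoothing factor. The EMA is seeded with the
    first difference; the result has len(trend) - 1 entries.
    """
    diffs = np.diff(np.asarray(trend, dtype=float))
    d = np.empty_like(diffs)
    for i, delta in enumerate(diffs):
        # D(t) = beta * dT(t) + (1 - beta) * D(t-1)
        d[i] = delta if i == 0 else beta * delta + (1.0 - beta) * d[i - 1]
    return d


def trend_events(d, th_positive, th_negative):
    """e3(t): 1 for a positive trend change, -1 for negative, 0 otherwise."""
    d = np.asarray(d, dtype=float)
    return np.where(d > th_positive, 1, np.where(d < th_negative, -1, 0))
```

A larger `beta` reacts faster to a change in slope, but it also passes more noise through to D (t).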

In step 508c, the method (500c) determines that the value of D (t) exceeds the preconfigured positive trend threshold Th _positive and flags the time (t) as a positive trend change. In one embodiment, flagging the time as a positive trend change comprises setting an event output (e.g., e ₃) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.

In step 512c, the method (500c) determines that the value of D (t) is below the preconfigured negative trend threshold Th _negative and flags the time (t) as a negative trend change. In one embodiment, flagging the time as a negative trend change comprises setting an event output (e.g., e ₃) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.

In step 510c, if the value of D (t) is neither greater than Th _positive nor less than Th _negative, the method (500c) bypasses positive/negative trend event detection and sets the event output (e.g., e ₃) value to zero (0). Thus, in the illustrated embodiment, the method (500c) utilizes the following rule to generate the output value for the event indicated as e ₃:

$$e_3(t) = \begin{cases} 1, & D(t) > Th_{positive} \\ -1, & D(t) < Th_{negative} \\ 0, & \text{otherwise} \end{cases}$$

In some embodiments, the method (500c) may utilize output values other than -1, 0, and 1. For example, the method (500c) may set the output value to represent the distance of the value of D (t) from the respective threshold.

In step 502d, the method (500d) computes the standard deviations σ _L (t) and σ _R (t) of the remainder component R (t) of the time series data over the left and right sliding windows, respectively. The left and right sliding windows may be defined as described previously for method (500b), and that description is not repeated here.

In step 504d, the method (500d) computes the difference between the right and left standard deviations, Δσ (t) = σ _R (t) − σ _L (t).
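A sketch of Δσ (t) follows, using the same left/right window layout as the mean-shift statistic. It assumes NumPy and the population standard deviation (`ddof=0`). The disclosure does not state whether a sample or population estimate is intended, and the function name is hypothetical.

```python
import numpy as np


def variance_shift_statistic(remainder, t, left, right):
    """Delta sigma(t) = sigma_R(t) - sigma_L(t) over adjacent windows of R(t).

    Windows: left = R(t-L-R+1..t-R), right = R(t-R+1..t), with 0-based t.
    Uses the population standard deviation (ddof=0), an assumption.
    """
    r = np.asarray(remainder, dtype=float)
    if t < left + right - 1 or t >= len(r):
        raise ValueError("t is out of range or lacks history for both windows")
    sigma_left = r[t - left - right + 1 : t - right + 1].std()
    sigma_right = r[t - right + 1 : t + 1].std()
    return sigma_right - sigma_left
```

The resulting Δσ (t) would then be compared against Th _positive and Th _negative, as in steps 506d through 512d.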

In step 506d, the method (500d) compares the standard deviation difference Δσ (t) to a positive variance threshold (Th _positive) and a negative variance threshold (Th _negative). In one embodiment, these thresholds comprise static values above and below a center point. For example, Th _positive can be set to a positive value (e.g., +5) while Th _negative may be set to a negative value (e.g., –5). The specific values of these thresholds are not intended to be limiting. However, Th _positive should be greater than Th _negative. Otherwise, a value satisfying Th _negative > Δσ (t) > Th _positive would be flagged as both a positive and a negative variance shift.

In step 508d, the method (500d) determines that the value of Δσ (t) exceeds the preconfigured positive variance threshold Th _positive and flags the time (t) as a positive variance shift. In one embodiment, flagging the time as a positive variance shift comprises setting an event output (e.g., e ₄) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.

In step 512d, the method (500d) determines that the value of Δσ (t) is below the preconfigured negative variance threshold Th _negative and flags the time (t) as a negative variance shift. In one embodiment, flagging the time as a negative variance shift comprises setting an event output (e.g., e ₄) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.

In step 510d, if the value of Δσ (t) is neither greater than Th _positive nor less than Th _negative, the method (500d) bypasses positive/negative variance event detection and sets the event output (e.g., e ₄) value to zero (0). Thus, in the illustrated embodiment, the method (500d) utilizes the following rule to generate the output value for the event indicated as e ₄:

$$e_4(t) = \begin{cases} 1, & \Delta\sigma(t) > Th_{positive} \\ -1, & \Delta\sigma(t) < Th_{negative} \\ 0, & \text{otherwise} \end{cases}$$

In some embodiments, the method (500d) may utilize output values other than -1, 0, and 1. For example, the method (500d) may set the output value to represent the distance of the value of Δσ (t) from the respective threshold.
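Methods 500a through 500d each mention a graded output variant. One possible reading is a signed distance past whichever threshold was crossed. The sketch below is an assumption about that variant, not a definition from the disclosure. It applies to any of the threshold pairs above (Th _spike/Th _dip, Th _increase/Th _decrease, or Th _positive/Th _negative), and the function name is hypothetical.

```python
def graded_event(value, upper, lower):
    """Signed distance beyond the crossed threshold; 0.0 inside the band.

    Positive results correspond to the +1 outputs (spike, increase,
    positive shift) and negative results to the -1 outputs.
    """
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    if value > upper:
        return value - upper
    if value < lower:
        return value - lower
    return 0.0
```

Under this reading, the sign still encodes the event direction. The magnitude indicates how far the statistic went past the threshold.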

In step 602, the method (600) monitors sensor data. In some embodiments, the method (600) monitors raw sensor data output accessed over a communications bus as described in the description of FIG. 1.

In step 604, the method (600) determines if an event occurred.

In one embodiment, the method (600) determines whether an event occurs by utilizing the processes described in FIGS. 2 and 5A through 5D. In some embodiments, these events may simultaneously be added to an event store (as described in FIG. 2) and processed by the method (600). In the illustrated embodiment, an event detected in step 604 includes a sensor identifier (i) representing the sensor associated with the event. In some embodiments, steps 602 and 604 are optional. In such embodiments, the method (600) receives pre-processed events directly from the event extraction phase depicted in FIG. 1.

If the method (600) does not detect an event (e.g., no spike/dip, mean shift, trend change, or variance shift is detected), the method (600) continues to monitor the system for events. Alternatively, if the method (600) detects an event, the method (600) proceeds to step 606.

In step 606, the method (600) retrieves a sensor identifier (i) from the event detected in step 604. In some embodiments, the sensor identifier can comprise a globally unique identifier, Internet Protocol (IP) address, or other uniquely identifying information.

In the illustrated embodiment, the method (600) uses the sensor identifier to query the graph storage that stores the sensor graph. That is, the method (600) uses the sensor identifier as a query key to retrieve data regarding the identified sensor from the graph storage. In one embodiment, the method (600) requests a list of first-level connections of the node associated with the sensor identifier from the sensor graph.

In step 608, the method (600) determines whether the query above returns an empty set, that is, whether the node associated with the sensor identifier has no first-level connections. As illustrated in FIG. 4, a first-level connection is another node in the graph at a distance of one from the node associated with the sensor identifier (i.e., a directly connected node).

In step 610, the method (600) flags the event for further analysis if the sensor does not have any first-level connections. In this scenario, the method (600) cannot identify any sensors correlated with the sensor identifier extracted in step 606. Thus, while an event occurred (as identified in step 604) , it cannot be correlated using the sensor graph and requires further intervention by, for example, an operator of a monitoring system.

In step 612, if the method (600) determines that one or more first-level connections exist, the method (600) retrieves the first-level connections from the sensor graph. In some embodiments, steps 610 and 612 may be combined. In the illustrated embodiment, the first-level connections comprise a set of nodes associated with other sensors. Each of these nodes may include one or more events previously recorded for the respective sensor (and times of these events). In some embodiments, further analysis or detail can be included for each event.

In step 614, the method (600) analyzes the identified sensor nodes to determine if the same type of event has been recorded for any of the sensors in the first-level connections list. In some embodiments, the method (600) limits the analysis in step 614 to concurrent events. In other embodiments, the method (600) may limit the analysis to events occurring within a predefined time from the current event detected in step 604. Alternatively, or in conjunction with the foregoing, the method (600) may analyze the shape of the trend of the time series data for each sensor to determine if the trend is similar to the trend for the sensor identified by the sensor identifier.

In step 616, the method (600) defines the set of correlated sensors as those sensors having the same or a similar event or trend identified in step 614. In the illustrated embodiment, the method (600) stores identifiers associated with the correlated sensors and then flags the correlated sensors for further analysis (described previously in connection with step 610).
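Steps 606 through 616 can be sketched against a simple in-memory stand-in for the graph storage. The adjacency-dictionary layout, the per-sensor event history, the time tolerance, and the function name are all hypothetical, because the disclosure does not specify a storage schema. The sketch implements the step 614 variant that matches events of the same type within a predefined time of the new event.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for the graph storage and the event store.
Graph = Dict[str, Dict[str, float]]           # sensor -> {neighbor: edge weight}
History = Dict[str, List[Tuple[str, float]]]  # sensor -> [(event_type, time)]


def first_level_correlated(graph: Graph, history: History, sensor: str,
                           event_type: str, event_time: float,
                           tolerance: float = 60.0) -> Optional[List[str]]:
    """Return first-level neighbors that recorded the same event type nearby.

    Returns None when the sensor has no first-level connections (step 608),
    signalling that the event should be flagged for further analysis
    (step 610). An empty list means neighbors exist but none correlate.
    """
    neighbors = graph.get(sensor, {})
    if not neighbors:
        return None
    correlated: List[str] = []
    for neighbor in neighbors:
        # Step 614 variant: same event type within +/- tolerance of event_time.
        if any(etype == event_type and abs(etime - event_time) <= tolerance
               for etype, etime in history.get(neighbor, [])):
            correlated.append(neighbor)
    return correlated
```

Consider `graph = {"temp-1": {"fan-3": 4.0, "load-7": 1.0}}` and `history = {"fan-3": [("spike", 100.0)], "load-7": [("spike", 1000.0)]}`. A spike on `temp-1` at time 120 returns `["fan-3"]`. A sensor absent from the graph returns `None` and would be flagged for operator review, as in step 610.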

In steps 618, 620, and 622, the method (600) proceeds to analyze the second-level connections of the sensor identified in step 606. As used herein, a second-level connection refers to a node in the sensor graph having a distance of two from the sensor identified in step 606.

Steps 618, 620, and 622 are similar to steps 608, 612, and 614, and the details of these steps are not repeated herein.

In general, steps 608, 612, and 614 (as well as 618, 620, and 622) can be performed for connections at any level from the sensor identified in step 606. That is, in addition to analyzing first- and second-level connections (as illustrated), the method (600) can analyze nodes at distances of three or more. In some embodiments, the number of levels to analyze may be preconfigured. In other embodiments, the method (600) may continue to search each level until at least one correlated sensor node is found.

In the illustrated embodiment, the method (600) halts after identifying at least one correlated sensor in a given level. In other embodiments, the method (600) may continue to analyze additional levels even when finding a correlated sensor at a given level. In this embodiment and other embodiments, the method (600) may rank the correlated sensors based on their distance from the sensor identified in step 606. For example, a correlated first-level sensor (step 614) may be ranked higher than a correlated second-level sensor (step 622) .
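The level-by-level search and distance-based ranking described above amount to a breadth-first traversal of the sensor graph. The sketch below assumes a hypothetical adjacency dictionary and per-sensor event history as plain Python stand-ins for the graph storage. `max_level`, `tolerance`, and `stop_at_first_hit` are illustrative parameters. Setting `stop_at_first_hit=True` mirrors the illustrated embodiment, which halts at the first level containing a correlated sensor.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the graph storage and the event store.
Graph = Dict[str, Dict[str, float]]           # sensor -> {neighbor: edge weight}
History = Dict[str, List[Tuple[str, float]]]  # sensor -> [(event_type, time)]


def ranked_correlated_sensors(graph: Graph, history: History, sensor: str,
                              event_type: str, event_time: float,
                              max_level: int = 3, tolerance: float = 60.0,
                              stop_at_first_hit: bool = True) -> List[Tuple[str, int]]:
    """Breadth-first search for correlated sensors, ranked by graph distance.

    Returns (sensor_id, level) pairs; level 1 is a first-level connection.
    """
    def has_matching_event(node: str) -> bool:
        # Same event type within +/- tolerance of the new event.
        return any(etype == event_type and abs(etime - event_time) <= tolerance
                   for etype, etime in history.get(node, []))

    visited = {sensor}
    frontier = [sensor]
    ranked: List[Tuple[str, int]] = []
    for level in range(1, max_level + 1):
        # Expand exactly one hop to collect the nodes at this distance.
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, {}):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # no nodes remain at this or any greater distance
        hits = [(node, level) for node in next_frontier if has_matching_event(node)]
        ranked.extend(hits)
        if hits and stop_at_first_hit:
            break
        frontier = next_frontier
    return ranked
```

Each node is visited at most once, so every correlated sensor is reported at its shortest distance from the originating sensor. As a result, first-level hits always rank ahead of second-level hits.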

Device (700) may include many more or fewer components than those shown in FIG. 7. However, the components shown are sufficient to disclose an illustrative embodiment for implementing the present disclosure. Device (700) may represent, for example, devices discussed above in relation to FIG. 1.

As shown in FIG. 7, device (700) includes a processing unit (CPU) (702) in communication with a mass memory (730) via a bus (724) . Device (700) also includes one or more network interfaces (750) , an audio interface (752) , a display (754) , a keypad (756) , an illuminator (758) , an input/output interface (760) , and a camera (s) or other optical, thermal or electromagnetic sensors (762) . Device (700) can include one camera/sensor (762) , or a plurality of cameras/sensors (762) , as understood by those of skill in the art.

Device (700) may optionally communicate with a base station (not shown) , or directly with another computing device. Network interface (750) includes circuitry for coupling device (700) to one or more networks and is constructed for use with one or more communication protocols and technologies. Network interface (750) is sometimes known as a transceiver, transceiving device, or network interface card (NIC) .

Audio interface (752) is arranged to produce and receive audio signals such as the sound of a human voice. For example, audio interface (752) may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and generate an audio acknowledgement for some action. Display (754) may be a liquid crystal display (LCD) , gas plasma, light emitting diode (LED) , or any other type of display used with a computing device. Display (754) may also include a touch sensitive screen arranged to receive input from an object such as a stylus or a digit from a human hand.

Keypad (756) may comprise any input device arranged to receive input from a user. For example, keypad (756) may include a push button numeric dial, or a keyboard. Keypad (756) may also include command buttons that are associated with selecting and sending images. Illuminator (758) may provide a status indication and provide light. Illuminator (758) may remain active for specific periods of time or in response to events. For example, when illuminator (758) is active, it may backlight the buttons on keypad (756) and stay on while the device is powered. Also, illuminator (758) may backlight these buttons in various patterns when particular actions are performed, such as dialing another device. Illuminator (758) may also cause light sources positioned within a transparent or translucent case of the device to illuminate in response to actions.

Device (700) also comprises input/output interface (760) for communicating with external devices. Input/output interface (760) can utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.

Mass memory (730) includes a RAM (732), a ROM (724), and other storage means. Mass memory (730) illustrates another example of computer storage media for storage of information such as computer-readable instructions, data structures, program modules or other data. Mass memory (730) stores a basic input/output system ( “BIOS” ) (740) for controlling low-level operation of device (700). The mass memory may also store an operating system for controlling the operation of device (700). It will be appreciated that this component may include a general purpose operating system such as a version of UNIX, or LINUX™, or a specialized client communication operating system such as Windows Client™, or the

operating system. The operating system may include, or interface with a Java virtual machine module that enables control of hardware components and operating system operations via Java application programs.

RAM (732) stores one or more applications (734) for execution. In some embodiments, these applications comprise software configured to execute one or more of the operations described in connection with the foregoing figures. In some embodiments, the device (700) further includes persistent storage (e.g., a hard disk, solid state drive, etc.) for storing the applications (734) prior to executing them in RAM (732).

For the purposes of this disclosure a module is a software, hardware, or firmware (or combinations thereof) system, process or functionality, or component thereof, that performs or facilitates the processes, features, and functions described herein (with or without human interaction or augmentation) . A module can include sub-modules. Software components of a module may be stored on a computer-readable medium for execution by a processor. Modules may be integral to one or more servers, or be loaded and executed by one or more servers. One or more modules may be grouped into an engine or an application.

Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many manners and as such are not to be limited by the preceding exemplary embodiments and examples. In other words, functional elements being performed by single or multiple components, in various combinations of hardware and software or firmware, and individual functions, may be distributed among software applications at either the client level or server level or both. In this regard, any number of the features of the different embodiments described herein may be combined into single or multiple embodiments, and alternate embodiments having fewer than, or more than, all of the features described herein are possible.

Functionality may also be, in whole or in part, distributed among multiple components, in manners now known or to become known. Thus, myriad software/hardware/firmware combinations are possible in achieving the functions, features, interfaces and preferences described herein. Moreover, the scope of the present disclosure covers conventionally known manners for carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.

Furthermore, the embodiments of methods presented and described as flowcharts in this disclosure are provided by way of example in order to provide a more complete understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as being part of a larger operation are performed independently.

While various embodiments have been described for purposes of this disclosure, such embodiments should not be deemed to limit the teaching of this disclosure to those embodiments. Various changes and modifications may be made to the elements and operations described above to obtain a result that remains within the scope of the systems and processes described in this disclosure.

Claims

A method comprising:

retrieving raw sensor data collected by a plurality of sensors;

identifying a plurality of events based on the raw sensor data; and

building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.
The method of claim 1, further comprising querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
The method of claim 1, the identifying a plurality of events comprising:

decomposing the raw sensor data into trend, seasonal, and remainder components; and

identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
The method of claim 3, each event in the plurality of events comprising an event selected from the group consisting of a spike or dip event, mean shift event, variance shift, or trend change event.
The method of claim 4, the identifying a plurality of events comprising identifying a spike or dip event by comparing the remainder component to a spike threshold and dip threshold.
The method of claim 4, the identifying a plurality of events comprising identifying a mean shift event by:

averaging the raw sensor data over a sliding time window; and

comparing the average to a mean increase and mean decrease threshold.
The method of claim 4, the identifying a plurality of events comprising identifying a trend change event by comparing the trend component to a positive trend or negative trend threshold.
The method of claim 1, the building a sensor graph comprising:

building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and

building the sensor graph based on the event graph.
The method of claim 1, the querying the sensor graph comprising:

detecting a second event for a second sensor;

identifying a first-level connection sensor in the sensor graph for the second sensor; and

using the first-level connection sensor as the correlated sensor.
A non-transitory computer readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of:

retrieving raw sensor data collected by a plurality of sensors;

identifying a plurality of events based on the raw sensor data; and

building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.

The non-transitory computer readable storage medium of claim 10, the instructions further defining the step of querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
The non-transitory computer readable storage medium of claim 10, the identifying a plurality of events comprising:

decomposing the raw sensor data into trend, seasonal, and remainder components; and

identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
The non-transitory computer readable storage medium of claim 11, each event in the plurality of events comprising an event selected from the group consisting of a spike or dip event, mean shift event, variance shift, or trend change event.
The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a spike or dip event by comparing the remainder component to a spike threshold and dip threshold.
The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a mean shift event by:

averaging the raw sensor data over a sliding time window; and

comparing the average to a mean increase and mean decrease threshold.
The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a trend change event by comparing the trend component to a positive trend or negative trend threshold.
The non-transitory computer readable storage medium of claim 10, the building a sensor graph comprising:

building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and

building the sensor graph based on the event graph.
The non-transitory computer readable storage medium of claim 10, the querying the sensor graph comprising:

detecting a second event for a second sensor;

identifying a first-level connection sensor in the sensor graph for the second sensor; and

using the first-level connection sensor as the correlated sensor.
An apparatus comprising:

a processor; and

a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations of:

retrieving raw sensor data collected by a plurality of sensors, identifying a plurality of events based on the raw sensor data, and

building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.
The apparatus of claim 19, the operations further including querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
The apparatus of claim 19, the identifying a plurality of events comprising:

decomposing the raw sensor data into trend, seasonal, and remainder components; and

identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
The apparatus of claim 19, the building a sensor graph comprising:

building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and

building the sensor graph based on the event graph.
The apparatus of claim 19, the querying the sensor graph comprising:

detecting a second event for a second sensor;

identifying a first-level connection sensor in the sensor graph for the second sensor; and

using the first-level connection sensor as the correlated sensor.