WO2021087896A1 - Data-driven graph of things for data center monitoring copyright notice - Google Patents

Publication number
WO2021087896A1
Authority
WO
WIPO (PCT)
Prior art keywords
sensor
event
graph
events
identifying
Prior art date
Application number
PCT/CN2019/116370
Other languages
French (fr)
Inventor
Zhan Li
Hao Zhang
Zhixing Ren
Jialong WANG
Original Assignee
Alibaba Group Holding Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to PCT/CN2019/116370
Priority to CN201980099754.8A (CN114365505A)
Publication of WO2021087896A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q9/00 Arrangements in telecontrol or telemetry systems for selectively calling a substation from a main station, in which substation desired apparatus is selected for applying a control signal thereto or for obtaining measured values therefrom
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Definitions

  • the disclosed embodiments relate to the management of sensors and, more particularly, to techniques for managing the sensor data of an Internet data center (IDC) .
  • Modern IDCs generally employ one or more monitoring systems for monitoring the output from various sensors.
  • the data generated by these sensors comprises a significant amount of raw time series data.
  • human technicians or automation systems need to understand the information behind this data, especially when there is a real-time condition that requires intervention. For example, increasing readings of a temperature sensor could be caused by the failure of fans of a computer room air handler (CRAH) , or by the sudden jump of the load placed on a rack of servers.
  • the disclosed embodiments solve these and other problems in IDCs.
  • the disclosed embodiments achieve safer and more efficient operations of such IDCs by employing a novel graph of things (GoT) that can model sensors and events.
  • a method comprising retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  • a non-transitory computer readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of: retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  • an apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations of: retrieving raw sensor data collected by a plurality of sensors, identifying a plurality of events based on the raw sensor data, building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events, and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  • FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.
  • FIG. 2 is a flow diagram illustrating a method for generating a GoT according to some embodiments of the disclosure.
  • FIGS. 3A through 3F illustrate event graphs and methods of building the same according to some embodiments of the disclosure.
  • FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure.
  • FIG. 5A is a flow diagram illustrating a method for determining if a spike or dip event has occurred according to some embodiments of the disclosure.
  • FIG. 5B is a flow diagram illustrating a method for determining if a mean shift event has occurred according to some embodiments of the disclosure.
  • FIG. 5C is a flow diagram illustrating a method for determining if a positive or negative trend change event has occurred according to some embodiments of the disclosure.
  • FIG. 5D is a flow diagram illustrating a method for determining if a variance change event has occurred according to some embodiments of the disclosure.
  • FIG. 6 is a flow diagram illustrating a method for analyzing a GoT according to some embodiments of the disclosure.
  • FIG. 7 is a hardware diagram illustrating a device for generating or analyzing a GoT according to some embodiments of the disclosure.
  • FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.
  • a system (100) includes a plurality of sensors (102a, 102b, 102c, 102n; collectively, 102) .
  • examples of the sensors (102) include devices such as temperature sensors, voltage sensors, current sensors, humidity sensors, air flow sensors, moisture or water sensors, smoke sensors, door sensors, video surveillance devices, power consumption/switching sensors, and other sensors measuring real-world attributes of a data center or other environment employing the system (100).
  • the sensors may comprise software-based sensing components such as server load monitoring sensors, log file monitoring sensors, intrusion detection systems, load spike detection software, and various other software used to monitor services provided by, for example, a data center.
  • the sensors (102) generate time series data. That is, each sensor generates a data value at a given time and generates such data over a period.
  • examples of time series data include a continuous temperature monitor (i.e., a periodic temperature reported by a temperature sensor) as well as the number of inbound connections to a server at any given time (e.g., as reported by a server).
  • any data that can be represented as a function of time can be considered time series data generated by the sensors (102) and the disclosed embodiments are not limited to a particular sensor or type or format of time series data.
  • the sensors (102) are described primarily in the context of an IDC, however the sensors (102) can be installed in any type of environment. Additionally, no limit is placed on the number, type, or placement of the sensors (102) .
  • the sensors (102) are each connected to a network (118) .
  • each of the sensors (102) is physically coupled to the network (118) and the network (118) may comprise a controller area network (CAN) bus, Ethernet network, or other type of data communications medium.
  • the sensors (102) include a wireless transceiver and can communicate over a wireless medium in lieu of a physical bus. Examples of such communication networks include Wireless Fidelity (Wi-Fi) , cellular, or satellite networks.
  • the sensors (102) can be directly connected to other devices in the system (e.g., the monitoring system 116 or the pre-processor 104) and may not communicate over the network (118) .
  • the system (100) may employ multiple networks (and directly connected devices) .
  • servers employing load monitoring can communicate over an Ethernet network while temperature sensors may communicate over a Wi-Fi network.
  • the disclosed embodiments are not limited to any one type (or combination) of network technologies used.
  • the data generated by the sensors (102) is referred to as raw data.
  • the sensors transmit this data to the pre-processor (104) which forms the initial stage of an event extraction phase.
  • This phase is designed to filter the raw data into actionable events and is described in more detail with respect to the various flow diagrams.
  • the event extraction phase includes a pre-processor (104) , time-series data analysis processor (106) and event detection processor (108) .
  • each processor (104, 106, 108) comprises a dedicated hardware processing element.
  • the event extraction phase can be implemented on one or more hardware devices and the processors (104, 106, 108) can be implemented as software running on such devices.
  • the pre-processor (104) receives the raw data from the sensors (102) over the network (118) . As described, this raw data includes a data value and a time value. In the illustrated embodiment, the pre-processor (104) cleans and smooths the received data. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 204.
  • the pre-processor (104) transmits the data to the time series data analysis processor (106) .
  • the time series data analysis processor (106) decomposes the cleaned and smoothed time series data into trend, seasonal, and remainder components. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 206.
  • the time series data analysis processor (106) then transmits the decomposed components to the event detection processor (108) .
  • the event detection processor (108) processes the individual components of a time series data point and identifies actionable events represented by such components.
  • the output of event detection processor (108) comprises a vector including a value for each type of event, the time of the sensor data, and an identity of the sensor.
  • the event detection processor (108) transmits this data to event storage (110). Details of the operation of the event detection processor (108) are provided in the description of FIG. 2 and, in particular, in the description of step 208 as well as FIGS. 5A through 5D.
  • the event storage (110) comprises a storage device (physical or logical) that stores the events detected during the event extraction phase.
  • the event storage (110) comprises a relational database management system (RDBMS) or other type of database.
  • the event storage (110) can comprise a key-value data store or other less intensive data store (e.g., an object store) .
  • the event storage (110) is configured to store events in temporal order based on the time identified by the event detection processor (108) . In this manner, the event storage (110) operates similar to a queue.
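The event vector and the queue-like, time-ordered storage described above can be sketched as follows. This is a minimal in-memory illustration only: the names `Event` and `EventStore`, the field names, and the use of a heap are assumptions made for this example and do not come from the disclosure, which contemplates an RDBMS, key-value store, or object store.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event vector: one flag per event type, a time, and a sensor identity."""
    time: float        # time of the underlying sensor reading
    sensor_id: str     # identity of the reporting sensor
    flags: tuple       # e.g. (spike_dip, mean_shift, trend_change, variance_shift)


class EventStore:
    """Hypothetical in-memory stand-in for the event storage (110)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # insertion counter, used to break ties between equal times

    def put(self, event):
        # Events may arrive out of order; the heap keeps them ordered by time.
        heapq.heappush(self._heap, (event.time, self._seq, event))
        self._seq += 1

    def pop_earliest(self):
        # Queue-like access: the oldest stored event is returned first.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A downstream consumer such as a graph builder could drain this store with repeated `pop_earliest()` calls; a production system would more likely rely on the ordering guarantees of the chosen database.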
  • a separate graph phase is illustrated.
  • a graph is built and periodically updated based on the events stored in the event storage (110) .
  • a graph builder (112) accesses the event storage (110) to retrieve events.
  • the graph builder (112) actively queries the event storage (110) .
  • the graph builder (112) subscribes to the event storage (110) and periodically receives events as they are added to the event storage (110).
  • the graph builder (112) additionally accesses a graph storage (114) .
  • the graph storage (114) can comprise a graph database or may comprise an RDBMS.
  • the graph storage (114) stores a set of nodes, a set of edges, and weights for the edges.
  • the nodes comprise sensor identifiers, time points, and event types.
  • the graph builder (112) updates the data stored in the graph storage (114) based on events received in the event storage (110) . In this manner, the graph builder (112) is responsible for updating a graph of events captured by the sensors (102) and processed during the event extraction phase. Details of the operation of the graph builder (112) and graph storage (114) are provided in the description of FIG. 2 and, in particular, in the description of step 212 as well as FIGS. 3A through 3F and 4.
  • the system (100) includes a monitoring system (116) .
  • the monitoring system (116) receives processed event data from the event detection processor (108) .
  • the monitoring system (116) may also receive raw data from the sensors (102) .
  • the monitoring system (116) is additionally communicatively coupled to the graph storage (114) .
  • the monitoring system (116) may comprise a hardware device (or multiple devices) that is responsible for analyzing sensor data and providing actionable intelligence to operators of the device.
  • the monitoring system (116) may also automatically take action in response to events (e.g., alerting a fire department upon detecting a rise in temperature) .
  • the monitoring system (116) queries the graph storage (114) to identify correlated sensors in response to an event and perform a root cause analysis. Details of the operation of the monitoring system (116) are provided in the description of FIG. 6.
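The correlated-sensor query performed by the monitoring system can be sketched as a lookup over weighted, undirected edges. The function name `correlated_sensors`, the dictionary layout, the `min_weight` parameter, and the sample sensor names are all assumptions for illustration; the disclosure does not specify a query interface.

```python
def correlated_sensors(weights, sensor, min_weight=1):
    """Return (neighbor, weight) pairs for `sensor`, strongest correlation first.

    `weights` maps an undirected sensor pair (a, b) to the number of
    correlated events observed for that pair (hypothetical layout).
    """
    hits = []
    for (a, b), weight in weights.items():
        if weight >= min_weight and sensor in (a, b):
            neighbor = b if a == sensor else a
            hits.append((neighbor, weight))
    # Sort by descending weight so the most strongly correlated sensor leads.
    return sorted(hits, key=lambda hit: -hit[1])


# Hypothetical sensor graph: a temperature sensor, a CRAH fan, and a rack load sensor.
sample_weights = {
    ("temp_1", "crah_fan_1"): 5,
    ("temp_1", "rack_load_2"): 2,
    ("crah_fan_1", "rack_load_2"): 1,
}
print(correlated_sensors(sample_weights, "temp_1"))
```

Ranking neighbors by weight gives an operator a starting order for root cause analysis, though the weights reflect co-occurrence only and do not by themselves establish causation.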
  • FIG. 2 is a flow diagram illustrating a method for generating a GoT according to some embodiments of the disclosure.
  • the method (200) receives sensor data.
  • the method (200) may receive raw data generated by one or more sensors over a network or other communications medium.
  • in step 204, the method (200) cleans and smooths the data.
  • the method (200) performs various cleaning operations on the received data such as filtering anomalies, removing aberrant sensors, smoothing short-term fluctuations in sensor data, and other operations. In some embodiments, cleaning may also include interpolating missing values and/or normalizing values.
  • the method (200) can utilize moving average smoothing to smooth and clean data. Examples of such moving average approaches include weighted moving average (WMA), exponentially weighted moving average (EWMA), and autoregressive integrated moving average (ARIMA) smoothing, among other techniques.
  • other techniques may be used in addition to (or in lieu of) these techniques, and the disclosed embodiments are not intended to be limited to a specific or single technique for cleaning and smoothing the data from sensors. Indeed, multiple techniques may be used simultaneously, and the techniques may be unique to each sensor based on the type of time series data generated and observations regarding the fluctuations of the data generated by the sensors.
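As one illustration of the smoothing options listed above, the following is a minimal sketch of exponentially weighted moving average (EWMA) smoothing. The function name `ewma` and the default smoothing factor `alpha=0.3` are assumptions chosen for the example; the disclosure does not prescribe parameter values.

```python
def ewma(values, alpha=0.3):
    """Smooth a sequence with an exponentially weighted moving average.

    Each output is alpha * current value + (1 - alpha) * previous output,
    so larger alpha values track the raw data more closely.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for value in values:
        # The first sample seeds the average; later samples are blended in.
        previous = value if previous is None else alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


# A single aberrant reading (30.0) is damped rather than passed through.
print(ewma([20.0, 20.0, 30.0, 20.0, 20.0], alpha=0.5))
```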
  • in step 206, the method (200) decomposes the cleaned and smoothed data into trend, seasonal, and remainder data.
  • time series data received in step 202 can be affected by three components: trend, seasonal, and remainder components.
  • the seasonal component of a time series refers to patterns that repeat over a fixed interval.
  • the trend component refers to an overall change in data over a longer period of time.
  • remainder refers to the residual data remaining in the time series when the trend and seasonal data are removed.
  • a time series can be represented by a function Y(t), wherein t represents the time elapsed.
  • the choice of an additive or multiplicative model may be made based on the underlying sensor (s) .
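The model equations themselves do not survive in this text. For reference, the conventional additive and multiplicative decompositions are written below using the trend T(t), seasonal S(t), and remainder R(t) components named in this description; that these exact forms match the original is an assumption.

```latex
% Additive model: components sum to the observed series.
Y(t) = T(t) + S(t) + R(t)

% Multiplicative model: components scale one another.
Y(t) = T(t) \times S(t) \times R(t)
```

The de-trending step described in this section, which forms Y(t) - T(t), corresponds to the additive form.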
  • the method (200) receives the raw time series data and first detects the trend component.
  • the method (200) may use a centered moving average algorithm to calculate the trend T(t) of the time series Y(t); however, other algorithms (e.g., Fourier transform) may be used.
  • the method (200) may then remove the trend T(t) from the time series Y(t).
  • the method (200) may then analyze the underlying de-trended time series (Y(t) - T(t)) to determine the period of the resulting data. After identifying the period, the method (200) extracts the seasonal component S(t), which leaves the remainder or noise component R(t).
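The decomposition steps above can be sketched as follows, assuming an additive model. Two simplifications are assumptions of this example rather than part of the disclosure: the period is passed in as a known parameter instead of being detected from the de-trended series, and the moving-average window is truncated at the edges of the series. The function names are hypothetical.

```python
def centered_moving_average(y, window):
    """Estimate the trend T(t) with a centered moving average (odd window)."""
    half = window // 2
    trend = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        trend.append(sum(y[lo:hi]) / (hi - lo))  # truncated window at the edges
    return trend


def decompose(y, period, window=None):
    """Split y into (trend, seasonal, remainder) under an additive model."""
    if window is None:
        window = period if period % 2 else period + 1  # force an odd window
    trend = centered_moving_average(y, window)
    detrended = [value - t for value, t in zip(y, trend)]
    # Seasonal component: mean de-trended value at each phase of the period.
    phase_means = []
    for phase in range(period):
        samples = detrended[phase::period]
        phase_means.append(sum(samples) / len(samples))
    seasonal = [phase_means[i % period] for i in range(len(y))]
    # Whatever is left after removing trend and seasonality is the remainder.
    remainder = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, remainder
```

By construction, the three returned components sum back to the input series (up to floating-point rounding), which is a convenient sanity check when experimenting with window sizes.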
  • in step 208, the method (200) extracts events from the decomposed data.
  • the method (200) stores a list of pre-defined events.
  • an event comprises one or more conditions that utilize the underlying components (trend, seasonal, remainder) as inputs. That is, if one or more of the decomposed components satisfy a condition, an event is presumed to have occurred.
  • the disclosed embodiments describe four examples of events: spikes/dips, mean shifts, trend changes, and variance shifts. However, the disclosed embodiments are not intended to be limited to only these events and other events may be constructed based on the underlying data or needs of the system implementing the method (200).
  • Spikes/dips refer to rapid increases or decreases in the value of the remainder component of the time series. As described above, a trend refers to a gradual change in a time series while a season refers to a periodic (and regular) fluctuation. Thus, spikes or dips in a time series are generally attributable to the remainder component of the time series. The detection of spikes/dips is described more fully in the description of FIG. 5A.
  • a mean shift event refers to a change in the moving average of the time series data overall.
  • a sliding window is used to calculate the average value of the time series at a given time t and this value is used to determine if a mean shift has occurred. The detection of mean shifts is described more fully in the description of FIG. 5B.
  • a trend change event refers to a time when the trend component of a time series reverses, either from positive (upward) to negative (downward) or from negative (downward) to positive (upward) .
  • the detection of trend change is described more fully in the description of FIG. 5C.
  • a variance shift event refers to a change in the variance of the underlying time series data. The detection of a variance shift is described more fully in the description of FIG. 5D.
  • the method (200) generates an n-item tuple representing the detected events.
  • the method (200) will generate a three-item tuple representing whether the event is present in the time series.
  • in step 210, the method (200) stores the extracted events.
  • the output of step 208 comprises a packet comprising a numeric value indicating the presence/absence of each type of event, a time component, and a sensor identifier (referred to as an event data structure).
  • the event data structure can be stored in an RDBMS or other type of storage medium that allows for future access.
  • the event data structures can be stored in a queue type data structure.
  • the event data structures can be stored in a database that supports the pub/sub architectural pattern (e.g., REDIS) such that the stream of events can be subscribed to by a downstream processing device.
  • in step 212, the method (200) builds or updates a graph based on the extracted events.
  • the method (200) may first build an event graph representing the events stored in the database.
  • this graph represents each event type, sensor, and time as a node in the graph.
  • a given sensor node is connected to either event nodes or time nodes, whereas event nodes and time nodes are only connected to sensor nodes (i.e., are not connected to other time nodes or event nodes) .
  • the method (200) stores this graph in a graph database.
  • this graph may be converted into relational form and stored in an RDBMS. An example of building a graph from a set of events is provided in the description of FIGS. 3A through 3F, incorporated herein by reference.
  • the event graph can comprise a separately stored graph.
  • the event graph can comprise a temporary graph (e.g., stored only in memory) .
  • the event graph is used to build the sensor graph (of things) described below and in the description of FIG. 4.
  • the method (200) further synthesizes a sensor graph (also referred to as a sensor graph of things or simply GoT) .
  • the nodes only comprise sensors and the edges between the nodes are weighted based on the connectivity of the event graph.
  • the sensor graph is built by analyzing the event graph to identify correlated sensors.
  • sensors are correlated if they experience an event at the same time point.
  • the weights of the edges of the sensor graph represent the number of time points in which two sensors experience an event simultaneously.
  • FIGS. 3A through 3F illustrate event graphs and methods of building the same according to some embodiments of the disclosure.
  • the graphs in FIGS. 3A through 3F can be generated using a set of events generated using historical sensor data.
  • the graphs can be generated using real-time events extracted from a real-time stream of raw sensor data.
  • a first event is analyzed, comprising a spike or dip occurring at Time 1 for Sensor 1. The event type (Spikes and Dips), sensor ID (1), and time point (1) are used as nodes in the graph (300a), and the sensor node is connected to the event type node and the time point node via undirected edges. As illustrated in FIGS. 3A through 3F, all edges are undirected, and this point will not be repeated for the sake of brevity.
  • in FIG. 3B, also at Time 1, a second event is processed wherein Sensor 3 detects a spike or dip.
  • a node for Sensor 3 is added to the graph (300b) .
  • the Time 1 node is re-used and the Sensor 3 node is connected to the Time 1 node via an edge.
  • the Sensor 3 node is connected to the existing Spikes and Dips node via an edge.
  • a new, duplicate Spikes and Dips node may be added; however, as illustrated, event nodes (like time nodes) may be re-used.
  • a third event is received at Time 2.
  • This event comprises a mean shift type event detected by Sensor 2.
  • a new component is added to the graph (300c) that comprises a new Sensor 2 node connected to a new Time 2 node and a new Mean Shift node.
  • the graph (300c) consists of two separate components.
  • in FIG. 3D, a fourth event is received occurring at Time 2. Since a Time 2 node has already been created, that node is re-used. However, a Sensor 4 node is added as well as a new Trend Change node corresponding to the trend change event detected. As will be discussed in more detail, the graph (300d) depicted in FIG. 3D begins to illustrate correlations between sensors.
  • a fifth event is received at Time 3.
  • a new Time 3 node is added to the graph (300e) .
  • the Sensor 1 node is connected to the new Time 3 node.
  • this connection is re-used.
  • a new duplicate Spikes and Dips node can be added and connected to the Sensor 1 node.
  • a weight of the edge between the Sensor 1 node and the Spikes and Dips node can be increased by one.
  • the sixth (and final) event is received at Time 3. Similar to the fifth event, the event in FIG. 3F represents a spike or dip detected by Sensor 3. In this scenario, all nodes exist (Sensor 3, Time 3, Spikes and Dips), thus no new nodes are created in the graph (300f). However, as discussed above, the event type node may be duplicated again. Thus, in FIG. 3F a new edge between the Sensor 3 node and the Time 3 node is added and the edge between the Sensor 3 node and the Spikes and Dips node is re-used (or in some embodiments, increased in weight).
  • the graph depicted in FIG. 3F can then be used to generate a sensor graph.
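The incremental construction walked through for FIGS. 3A through 3F can be sketched with an adjacency-set representation. The node encoding as `(kind, value)` tuples, the function name `build_event_graph`, and the event type labels are assumptions of this example; the sketch follows the variant in which time nodes and event type nodes are re-used rather than duplicated, and it does not track edge weights.

```python
from collections import defaultdict


def build_event_graph(events):
    """Build an undirected event graph from (sensor_id, time_point, event_type) tuples.

    Sensor nodes connect only to time nodes and event type nodes; time and
    event type nodes are created once and re-used by later events.
    """
    graph = defaultdict(set)
    for sensor_id, time_point, event_type in events:
        sensor = ("sensor", sensor_id)
        time_node = ("time", time_point)
        type_node = ("event", event_type)
        graph[sensor].update((time_node, type_node))
        graph[time_node].add(sensor)   # undirected: store both directions
        graph[type_node].add(sensor)
    return graph


# The six events described for FIGS. 3A through 3F, in order.
FIG3_EVENTS = [
    (1, 1, "spike_dip"),
    (3, 1, "spike_dip"),
    (2, 2, "mean_shift"),
    (4, 2, "trend_change"),
    (1, 3, "spike_dip"),
    (3, 3, "spike_dip"),
]
event_graph = build_event_graph(FIG3_EVENTS)
print(sorted(event_graph[("time", 1)]))
```

Because sets ignore repeated additions, the fifth and sixth events simply re-use the existing edge to the event type node, matching the re-use behavior described above.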
  • FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure.
  • the sensor graph (400) is generated based on the event graph (300f) illustrated in FIG. 3F.
  • the sensor graph (400) differs from the event graph (300f) in that all nodes of the sensor graph (400) comprise sensor nodes. Further, edges between sensor nodes are undirected and only include a numerical weight. Thus, data regarding event types and times are removed when generating the sensor graph (400).
  • each sensor node in the event graph (300f) is included in the sensor graph (400) . Further, each node may be connected to every other node with a weight of zero. Alternatively, each node in the sensor graph (400) may initially be unconnected and only connected to other nodes as described herein.
  • a graph builder analyzes the event graph (300f) to determine the number of paths between the given sensor and another sensor that have a distance of two and pass through a time node. To accomplish this, the graph builder can select a sensor node and identify the connected time nodes. The graph builder can then, for each time node, identify any sensor nodes connected to the time node. The resulting sensor nodes are then counted and the count is used as the weight between a given sensor and another sensor.
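The weight computation described above amounts to counting, for each pair of sensors, the time points at which both reported an event. The following sketch works directly from event tuples rather than from a stored event graph, which is an equivalent shortcut assumed here for brevity; the function name `build_sensor_graph` and the pair-keyed dictionary are likewise assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_sensor_graph(events):
    """Return {(sensor_a, sensor_b): weight} from (sensor_id, time_point, event_type) tuples.

    The weight equals the number of length-two paths between the two sensors
    that pass through a time node, i.e. the number of shared time points.
    """
    sensors_at = defaultdict(set)
    for sensor_id, time_point, _event_type in events:
        sensors_at[time_point].add(sensor_id)
    weights = defaultdict(int)
    for sensors in sensors_at.values():
        # Every pair of sensors sharing this time point gains one unit of weight.
        for a, b in combinations(sorted(sensors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# The six events described for FIGS. 3A through 3F.
fig3_events = [
    (1, 1, "spike_dip"), (3, 1, "spike_dip"),
    (2, 2, "mean_shift"), (4, 2, "trend_change"),
    (1, 3, "spike_dip"), (3, 3, "spike_dip"),
]
print(build_sensor_graph(fig3_events))  # {(1, 3): 2, (2, 4): 1}
```

In this sketch, sensor pairs that never share a time point are simply absent from the result, which corresponds to the alternative in which nodes start out unconnected rather than connected with a weight of zero.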
  • FIG. 5A is a flow diagram illustrating a method for determining if a spike or dip event has occurred according to some embodiments of the disclosure.
  • in step 502a, the method (500a) receives a remainder component R(t), which comprises the remainder portion of a given time series calculated as described above.
  • the method (500a) compares the remainder component to a spike threshold (Th_spike) and a dip threshold (Th_dip).
  • these thresholds comprise static values above and below a center point.
  • Th_spike can be set to a positive value (e.g., +5) while Th_dip may be set to a negative value (e.g., –5).
  • the specific values of these thresholds are not intended to be limiting.
  • Th_spike must be greater than zero and Th_dip must be less than zero.
  • in step 506a, the method (500a) determines that the value of R(t) exceeds the preconfigured spike threshold Th_spike and flags the time (t) as a spike.
  • flagging the time as a spike comprises setting an event output (e.g., e_1) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 510a, the method (500a) determines that the value of R(t) is below the preconfigured dip threshold Th_dip and flags the time (t) as a dip.
  • flagging the time as a dip comprises setting an event output (e.g., e_1) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 508a, if the value of R(t) is not greater than Th_spike and is not less than Th_dip, the method (500a) bypasses the spikes/dips event detection and sets the event output (e.g., e_1) value to zero (0).
  • the method (500a) utilizes the following rule to generate the output value for the event indicated as e_1:
  • $e_1 = \begin{cases} 1, & R(t) > Th_{spike} \\ -1, & R(t) < Th_{dip} \\ 0, & \text{otherwise} \end{cases}$
  • the method (500a) may utilize output values other than -1, 0, and 1.
  • the method (500a) may set the output value to represent the distance of the value of R(t) from the respective threshold.
  • the two thresholds are determined using the empirical quantiles of R(t) such that an event only happens with a small probability (which can be tuned and may be set to 1%). This amounts to respecting the a priori knowledge that an interesting event is rare. In most scenarios, a time series is “normal” and does not contain much information.
  • the two thresholds (Th_spike and Th_dip) have a minimum absolute value (Th_abs) greater than zero, which can represent a human expert's a priori knowledge.
  • Th_spike and Th_dip are set as follows:
  • $Th_{spike} = \max(Th_{spike}, Th_{abs})$
  • $Th_{dip} = \min(Th_{dip}, -Th_{abs})$
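The spike/dip rule and the quantile-based thresholds with a minimum absolute value can be sketched as follows. Several details are assumptions of this example and are not stated in the disclosure: the event probability `p` is split evenly between the upper and lower tails, the quantile is taken by simple index lookup on the sorted remainder, and the names `empirical_quantile`, `spike_dip_thresholds`, and `spike_dip_flag` are hypothetical.

```python
def empirical_quantile(values, q):
    """Return the value at quantile q (0..1) of the sorted sample (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[index]


def spike_dip_thresholds(remainder, p=0.01, th_abs=1.0):
    """Derive (th_spike, th_dip) so events are rare, clamped away from zero by th_abs."""
    # Assumption: half of the event probability is assigned to each tail.
    th_spike = max(empirical_quantile(remainder, 1 - p / 2), th_abs)
    th_dip = min(empirical_quantile(remainder, p / 2), -th_abs)
    return th_spike, th_dip


def spike_dip_flag(r, th_spike, th_dip):
    """Return e_1: 1 for a spike, -1 for a dip, 0 otherwise."""
    if r > th_spike:
        return 1
    if r < th_dip:
        return -1
    return 0


# With the example static thresholds of +5 and -5 from the text:
print([spike_dip_flag(r, 5, -5) for r in (6.0, 0.5, -7.2)])  # [1, 0, -1]
```

The clamp matters for very quiet sensors: if the remainder is almost always zero, the quantiles alone would produce thresholds near zero and flag ordinary noise as events.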
  • FIG. 5B is a flow diagram illustrating a method for determining if a mean shift event has occurred according to some embodiments of the disclosure.
  • the method (500b) defines a sliding window for a time series data set.
  • the sliding window comprises a fixed duration in which the remaining steps analyze the time series data.
  • the specific duration of this window is not limiting and may be set according to the underlying data or needs of the system.
  • the method (500b) computes the mean of the trend component T (t) from two sliding windows, and takes the difference between them.
  • the left window consists of less recent data:
  • L and R are the sizes of the left and right time windows, respectively.
  • the method (500b) averages the values of the time series data points (i.e., values of Y(t)) appearing within the window.
  • the average comprises the average value of Y(t) during the aforementioned window.
  • the method (500b) compares the computed average for a given window (a(t)) to a mean increase threshold (Th_increase) and a mean decrease threshold (Th_decrease).
  • these thresholds comprise static values above and below a center point.
  • Th_increase can be set to a positive value (e.g., +5) while Th_decrease may be set to a negative value (e.g., –5).
  • the specific values of these thresholds are not intended to be limiting. However, the value of Th_increase should be higher than the value of Th_decrease to avoid the scenario where Th_decrease > a(t) > Th_increase.
  • in step 508b, the method (500b) determines that the value of a(t) exceeds the preconfigured mean increase threshold Th_increase and flags the time (t) as a mean increase.
  • flagging the time as a mean increase comprises setting an event output (e.g., e_2) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 512b, the method (500b) determines that the value of a(t) is below the preconfigured mean decrease threshold Th_decrease and flags the time (t) as a decrease in the mean for a sliding window.
  • flagging the time as a mean decrease comprises setting an event output (e.g., e_2) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 510b, if the value of a(t) is not greater than Th_increase and is not less than Th_decrease, the method (500b) bypasses the mean increase/decrease event detection and sets the event output (e.g., e_2) value to zero (0).
  • the method (500b) utilizes the following rule to generate the output value for the event indicated as e_2: e_2 = 1 if a(t) > Th_increase; e_2 = –1 if a(t) < Th_decrease; and e_2 = 0 otherwise.
  • the method (500b) may utilize output values other than -1, 0, and 1.
  • the method (500b) may set the output value to represent the distance of the value of a(t) from the respective threshold.
  • FIG. 5C is a flow diagram illustrating a method for determining if a positive or negative trend change event has occurred according to some embodiments of the disclosure.
  • in step 502c, the method (500c) receives a trend component T(t), which comprises the trend portion of a given time series calculated as described above.
  • the method (500c) compares the trend component to a positive trend threshold (Th_positive) and a negative trend threshold (Th_negative).
  • these thresholds comprise static values above and below a center point.
  • Th_positive can be set to a positive value (e.g., +5) while Th_negative may be set to a negative value (e.g., –5).
  • the specific values of these thresholds are not intended to be limiting. However, the value of Th_positive should be higher than the value of Th_negative to avoid the scenario where Th_negative > D(t) > Th_positive.
  • in step 508c, the method (500c) determines that the value of D(t) exceeds the preconfigured positive trend threshold Th_positive and flags the time (t) as a positive trend event.
  • flagging the time as a positive trend event comprises setting an event output (e.g., e_3) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 512c, the method (500c) determines that the value of D(t) is below the preconfigured negative trend threshold Th_negative and flags the time (t) as a negative trend event.
  • flagging the time as a negative trend event comprises setting an event output (e.g., e_3) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.
  • in step 510c, if the value of D(t) is not greater than Th_positive and is not less than Th_negative, the method (500c) bypasses the positive/negative trend event detection and sets the event output (e.g., e_3) value to zero (0).
  • the method (500c) utilizes the following rule to generate the output value for the event indicated as e_3: e_3 = 1 if D(t) > Th_positive; e_3 = –1 if D(t) < Th_negative; and e_3 = 0 otherwise.
  • the method (500c) may utilize output values other than -1, 0, and 1.
  • the method (500c) may set the output value to represent the distance of the value of D(t) from the respective threshold.
  • FIG. 5D is a flow diagram illustrating a method for determining if a variance change event has occurred according to some embodiments of the disclosure.
  • in step 502d, the method (500d) computes standard deviations σ_L(t) and σ_R(t) of the remainder component R(t) over the left and right sliding windows, respectively, of the time-series data.
  • the left and right sliding windows may be computed as described previously, the disclosure of which is not repeated herein.
  • in step 504d, the method (500d) computes the difference over time between the right and left standard deviations (the standard deviation difference).
  • the method (500d) compares the standard deviation difference to a positive variance threshold (Th_positive) and a negative variance threshold (Th_negative).
  • these thresholds comprise static values above and below a center point.
  • Th_positive can be set to a positive value (e.g., +5) while Th_negative may be set to a negative value (e.g., –5).
  • the specific values of these thresholds are not intended to be limiting. However, the value of Th_positive should be higher than the value of Th_negative to avoid the scenario where the standard deviation difference is simultaneously below Th_negative and above Th_positive.
  • in step 508d, the method (500d) determines that the standard deviation difference exceeds the preconfigured positive variance threshold Th_positive and flags the time (t) as a positive variance event.
  • flagging the time as a positive variance event comprises setting an event output (e.g., e_4) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.
  • in step 512d, the method (500d) determines that the standard deviation difference is below the preconfigured negative variance threshold Th_negative and flags the time (t) as a negative variance event.
  • flagging the time as a negative variance event comprises setting an event output (e.g., e_4) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.
  • in step 510d, if the standard deviation difference is not greater than Th_positive and is not less than Th_negative, the method (500d) bypasses the positive/negative variance event detection and sets the event output (e.g., e_4) value to zero (0).
  • the method (500d) utilizes the following rule to generate the output value for the event indicated as e_4: e_4 = 1 if the standard deviation difference is greater than Th_positive; e_4 = –1 if it is less than Th_negative; and e_4 = 0 otherwise.
  • the method (500d) may utilize output values other than -1, 0, and 1.
  • the method (500d) may set the output value to represent the distance of the standard deviation difference from the respective threshold.
  • FIG. 6 is a flow diagram illustrating a method for analyzing a GoT according to some embodiments of the disclosure.
  • in step 602, the method (600) monitors sensor data.
  • the method (600) monitors raw sensor data output accessed over a communications bus as described in the description of FIG. 1.
  • in step 604, the method (600) determines if an event occurred.
  • the method (600) determines if an event occurs by utilizing the processes described in FIGS. 2 and 5A through 5D. In some embodiments, these events may simultaneously be added to an event store (as described in FIG. 2) and processed by the method (600).
  • an event detected in step 604 includes a sensor identifier (i) representing the sensor associated with the event. In some embodiments, steps 602 and 604 may be optional.
  • in these embodiments, the method (600) receives pre-processed events directly from the event extraction phase depicted in FIG. 1.
  • if the method (600) does not detect an event (e.g., a spike/dip, mean shift, trend change, or variance shift is not detected), the method (600) continues to monitor the system for events. Alternatively, if the method (600) detects an event, the method (600) proceeds to step 608.
  • in step 606, the method (600) retrieves a sensor identifier (i) from the event detected in step 604.
  • the sensor identifier can comprise a globally unique identifier, Internet Protocol (IP) address, or other uniquely identifying information.
  • the method (600) uses the sensor identifier to query the graph storage that stores the sensor graph. That is, the method (600) uses the sensor identifier as a query key to retrieve data regarding the identified sensor from the graph storage. In one embodiment, the method (600) requests a list of first-level connections of the node associated with the sensor identifier from the sensor graph.
  • in step 608, the method (600) determines if the query above returns an empty set; that is, if the node associated with the sensor identifier does not have any first-level connections.
  • a first-level connection refers to another node in the graph connected to the node associated with the sensor identifier at a distance of one.
  • in step 610, the method (600) flags the event for further analysis if the sensor does not have any first-level connections.
  • in this scenario, the method (600) cannot identify any sensors correlated with the sensor identifier extracted in step 606.
  • although an event occurred (as identified in step 604), it cannot be correlated using the sensor graph and requires further intervention by, for example, an operator of a monitoring system.
  • in step 612, if the method (600) determines that one or more first-level connections exist, the method (600) retrieves the first-level connections from the sensor graph.
  • steps 610 and 612 may be combined.
  • the first-level connections comprise a set of nodes associated with other sensors. Each of these nodes may include one or more events previously recorded for the respective sensor (and times of these events) . In some embodiments, further analysis or detail can be included for each event.
  • the method (600) analyzes the identified sensor nodes to determine if the same type of event has been recorded for any of the sensors in the first-level connections list. In some embodiments, the method (600) limits the analysis in step 614 to concurrent events. In other embodiments, the method (600) may limit the analysis to events occurring within a predefined time from the current event detected in step 604. Alternatively, or in conjunction with the foregoing, the method (600) may analyze the shape of the trend of the time series data for each sensor to determine if the trend is similar to the trend for the sensor identified by the sensor identifier.
  • in step 616, the method (600) sets a set of correlated sensors to be equal to those sensors having a same or similar event or trend identified in step 614.
  • the method (600) stores identifiers associated with the correlated sensors and then flags the correlated sensors for further analysis (described previously in connection with step 610).
  • in steps 618, 620, and 622, the method (600) proceeds to analyze the second-level connections of the sensor identified in step 606.
  • a second-level connection refers to a node in the sensor graph having a distance of two from the sensor identified in step 606.
  • Steps 618, 620, and 622 are similar to steps 608, 612, and 614 and the details of these steps are not repeated herein.
  • steps 608, 612, and 614 can be performed for any level connection of the sensor identified in step 606. That is, in addition to analyzing first- and second-level connections (as illustrated), the method (600) can analyze nodes that have distances of three or more. In some embodiments, the number of levels to analyze may be preconfigured by the method (600). In other embodiments, the method (600) may continue to search each level until at least one correlated sensor node is found.
  • the method (600) halts after identifying at least one correlated sensor in a given level. In other embodiments, the method (600) may continue to analyze additional levels even when finding a correlated sensor at a given level. In this embodiment and other embodiments, the method (600) may rank the correlated sensors based on their distance from the sensor identified in step 606. For example, a correlated first-level sensor (step 614) may be ranked higher than a correlated second-level sensor (step 622) .
  • FIG. 7 is a hardware diagram illustrating a device for generating or analyzing a GoT according to some embodiments of the disclosure.
  • Device (700) may include many more or fewer components than those shown in FIG. 7. However, the components shown are sufficient to disclose an illustrative embodiment for implementing the present disclosure. Device (700) may represent, for example, devices discussed above in relation to FIG. 1.
  • device (700) includes a processing unit (CPU) (702) in communication with a mass memory (730) via a bus (724) .
  • Device (700) also includes one or more network interfaces (750), an audio interface (752), a display (754), a keypad (756), an illuminator (758), an input/output interface (760), and one or more cameras or other optical, thermal, or electromagnetic sensors (762).
  • Device (700) can include one camera/sensor (762) , or a plurality of cameras/sensors (762) , as understood by those of skill in the art.
  • Network interface (750) includes circuitry for coupling device (700) to one or more networks and is constructed for use with one or more communication protocols and technologies.
  • Network interface (750) is sometimes known as a transceiver, transceiving device, or network interface card (NIC) .
  • Audio interface (752) is arranged to produce and receive audio signals such as the sound of a human voice.
  • audio interface (752) may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and generate an audio acknowledgement for some action.
  • Display (754) may be a liquid crystal display (LCD), gas plasma, light emitting diode (LED), or any other type of display used with a computing device.
  • Display (754) may also include a touch sensitive screen arranged to receive input from an object such as a stylus or a digit from a human hand.
  • Keypad (756) may comprise any input device arranged to receive input from a user.
  • keypad (756) may include a push button numeric dial, or a keyboard.
  • Keypad (756) may also include command buttons that are associated with selecting and sending images.
  • Illuminator (758) may provide a status indication and provide light.
  • Illuminator (758) may remain active for specific periods of time or in response to events. For example, when illuminator (758) is active, it may backlight the buttons on keypad (756) and stay on while the device is powered. Also, illuminator (758) may backlight these buttons in various patterns when particular actions are performed, such as dialing another device.
  • Illuminator (758) may also cause light sources positioned within a transparent or translucent case of the device to illuminate in response to actions.
  • Device (700) also comprises input/output interface (760) for communicating with external devices.
  • Input/output interface (760) can utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.
  • Mass memory (730) includes a RAM (732), a ROM (724), and other storage means. Mass memory (730) illustrates another example of computer storage media for storage of information such as computer-readable instructions, data structures, program modules, or other data. Mass memory (730) stores a basic input/output system (“BIOS”) (740) for controlling low-level operation of device (700). The mass memory may also store an operating system for controlling the operation of device (700). It will be appreciated that this component may include a general purpose operating system such as a version of UNIX or LINUX™, or a specialized client communication operating system such as Windows Client™, or the operating system. The operating system may include, or interface with, a Java virtual machine module that enables control of hardware components and operating system operations via Java application programs.
  • RAM (732) stores and executes one or more applications (734) .
  • these applications comprise software configured to execute one or more of the operations described in connection with the foregoing figures.
  • the device (700) further includes persistent storage (e.g., hard disk, solid state drive, etc.) for storing the applications (734) prior to executing in RAM (732).
  • a module is a software, hardware, or firmware (or combinations thereof) system, process or functionality, or component thereof, that performs or facilitates the processes, features, and functions described herein (with or without human interaction or augmentation) .
  • a module can include sub-modules.
  • Software components of a module may be stored on a computer-readable medium for execution by a processor. Modules may be integral to one or more servers, or be loaded and executed by one or more servers. One or more modules may be grouped into an engine or an application.
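
As a rough illustration of the spike/dip detection of FIG. 5A described above, the following Python sketch derives the two thresholds from empirical quantiles of the remainder component, applies the minimum absolute threshold, and emits the ternary event value. The function names, the nearest-rank quantile definition, and the even split of the tail probability between spikes and dips are assumptions made for this example; the disclosure does not prescribe them.

```python
import math


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile of a non-empty list, for 0 <= q <= 1."""
    ordered = sorted(values)
    # Nearest-rank rule (an assumption of this sketch): take the value at
    # position ceil(q * n), clamped so that q = 0 returns the minimum.
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def spike_dip_thresholds(remainder, tail_prob=0.01, th_abs=0.0):
    """Return (th_spike, th_dip) for a history of remainder values R(t).

    tail_prob is the small, tunable event probability (1% by default);
    it is split evenly between spikes and dips, which is an assumption.
    th_abs is the minimum absolute threshold TH_abs.
    """
    th_spike = empirical_quantile(remainder, 1.0 - tail_prob / 2)
    th_dip = empirical_quantile(remainder, tail_prob / 2)
    # Enforce the floor: Th_spike = max(Th_spike, TH_abs),
    # Th_dip = min(Th_dip, -TH_abs).
    return max(th_spike, th_abs), min(th_dip, -th_abs)


def spike_dip_event(r_t, th_spike, th_dip):
    """Ternary output e_1: 1 for a spike, -1 for a dip, 0 otherwise."""
    if r_t > th_spike:
        return 1
    if r_t < th_dip:
        return -1
    return 0
```

With a quiet history, the floor TH_abs dominates the quantile estimates, so small fluctuations are not reported as events even if they fall in the tails of the empirical distribution.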

Abstract

Disclosed herein are techniques for building and utilizing a sensor graph of things (GoT) to correlate sensors based on past events. In one embodiment, a method is disclosed comprising retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.

Description

DATA-DRIVEN GRAPH OF THINGS FOR DATA CENTER MONITORING
COPYRIGHT NOTICE
This application includes material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND
The disclosed embodiments relate to the management of sensors and, more particularly, to techniques for managing the sensor data of an Internet data center (IDC) .
Modern IDCs generally employ one or more monitoring systems for monitoring the output from various sensors. The data generated by these sensors comprises a significant amount of raw time series data. In current systems, human technicians or automation systems need to understand the information behind this data, especially when there is a real-time condition that requires intervention. For example, increasing readings of a temperature sensor could be caused by the failure of the fans of a computer room air handler (CRAH) or by a sudden jump in the load placed on a rack of servers.
Current systems fail to adequately surface useful information out of significant amounts of raw data generated by such sensors. Indeed, many such systems rely on human interpretation of events and trial-and-error techniques to address aberrant sensor readings.
BRIEF SUMMARY
The disclosed embodiments solve these and other problems in IDCs. The disclosed embodiments achieve safer and more efficient operations of such IDCs by employing a novel graph of things (GoT) that can model sensors and events.
In one embodiment, a method is disclosed comprising retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
In another embodiment, a non-transitory computer readable storage medium is disclosed for tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of: retrieving raw sensor data collected by a plurality of sensors; identifying a plurality of events based on the raw sensor data; building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events; and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
In another embodiment, an apparatus is disclosed comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations of: retrieving raw sensor data collected by a plurality of sensors, identifying a plurality of events based on the raw sensor data, building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events, and querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments as illustrated in the accompanying drawings, in which reference characters refer to the same parts throughout the various views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating principles of the disclosure.
FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.
FIG. 2 is a flow diagram illustrating a method for generating a GoT according to some embodiments of the disclosure.
FIGS. 3A through 3F illustrate event graphs and methods of building the same according to some embodiments of the disclosure.
FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure.
FIG. 5A is a flow diagram illustrating a method for determining if a spike or dip event has occurred according to some embodiments of the disclosure.
FIG. 5B is a flow diagram illustrating a method for determining if a mean shift event has occurred according to some embodiments of the disclosure.
FIG. 5C is a flow diagram illustrating a method for determining if a positive or negative trend change event has occurred according to some embodiments of the disclosure.
FIG. 5D is a flow diagram illustrating a method for determining if a variance change event has occurred according to some embodiments of the disclosure.
FIG. 6 is a flow diagram illustrating a method for analyzing a GoT according to some embodiments of the disclosure.
FIG. 7 is a hardware diagram illustrating a device for generating or analyzing a GoT according to some embodiments of the disclosure.
DETAILED DESCRIPTION
FIG. 1 is a block diagram of a system for generating and analyzing a GoT according to some embodiments of the disclosure.
In the illustrated embodiment, a system (100) includes a plurality of sensors (102a, 102b, 102c, 102n; collectively, 102). Examples of sensors include devices such as temperature sensors, voltage sensors, current sensors, humidity sensors, air flow sensors, moisture or water sensors, smoke sensors, door sensors, video surveillance devices, power consumption/switching sensors, and other sensors measuring real-world attributes of a data center or other environment employing the system (100). Alternatively, or in conjunction with the foregoing, the sensors may comprise software-based sensing components such as server load monitoring sensors, log file monitoring sensors, intrusion detection systems, load spike detection software, and various other software used to monitor services provided by, for example, a data center.
The sensors (102) generate time series data. That is, each sensor generates a data value at a given time and generates such data over a period. Examples of time series data include a continuous temperature monitor (i.e., a periodic temperature reported by a temperature sensor) as well as the number of inbound connections to a server at any given time (e.g., as reported by a server). In general, any data that can be represented as a function of time can be considered time series data generated by the sensors (102), and the disclosed embodiments are not limited to a particular sensor or to a particular type or format of time series data.
In the disclosed embodiments, the sensors (102) are described primarily in the context of an IDC; however, the sensors (102) can be installed in any type of environment. Additionally, no limit is placed on the number, type, or placement of the sensors (102).
As illustrated, the sensors (102) are each connected to a network (118) . In one embodiment, each of the sensors (102) is physically coupled to the network (118) and the network (118) may comprise a controller area network (CAN) bus, Ethernet network, or other type of data communications medium. In other embodiments, the sensors (102) include a wireless transceiver and can communicate over a wireless medium in lieu of a physical bus. Examples of such communication networks include Wireless Fidelity (Wi-Fi) , cellular, or satellite networks. In some embodiments, the sensors (102) can be directly connected to other devices in the system (e.g., the monitoring system 116 or the pre-processor 104) and may not communicate over the network (118) . In some embodiments, the system (100) may employ multiple networks (and directly connected devices) . For instance, servers employing load monitoring can communicate over an Ethernet network while temperature sensors may communicate over a Wi-Fi network. The disclosed embodiments are not limited to any one type (or combination) of network technologies used.
In the illustrated embodiment, the data generated by the sensors (102) is referred to as raw data. The sensors transmit this data to the pre-processor (104) which forms the initial stage of an event extraction phase. This phase is designed to filter the raw data into actionable events and is described in more detail with respect to the various flow diagrams. As illustrated, the event extraction phase includes a pre-processor (104) , time-series data analysis processor (106) and event detection processor (108) . In some embodiments, each processor (104, 106, 108) comprises a dedicated hardware processing element. In other embodiments, the event extraction phase can be implemented on one or more hardware devices and the processors (104, 106, 108) can be implemented as software running on such devices.
In the illustrated embodiment, the pre-processor (104) receives the raw data from the sensors (102) over the network (118). As described, this raw data includes a data value and a time value. In the illustrated embodiment, the pre-processor (104) cleans and smooths the received data. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 204.
After cleaning and smoothing the sensor data, the pre-processor (104) transmits the data to the time series data analysis processor (106) . The time series data analysis processor (106) decomposes the cleaned and smoothed time series data into trend, seasonal, and remainder components. Details of this operation are provided in the description of FIG. 2 and, in particular, in the description of step 206.
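
To make the three components concrete, the sketch below performs a classical additive decomposition in plain Python. This technique is swapped in purely for illustration: the decomposition actually used is described in connection with step 206, which may differ. The function name `decompose`, the `period` parameter, and the moving average truncated at the series edges are all assumptions of this example.

```python
def decompose(y, period):
    """Additive decomposition: y[t] = trend[t] + seasonal[t] + remainder[t].

    y is a list of numeric samples with len(y) >= period; period is the
    assumed seasonal length in samples (a hypothetical parameter).
    """
    n = len(y)
    half = period // 2

    # Trend: moving average over a window centered on t, truncated at the edges.
    trend = []
    for t in range(n):
        window = y[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))

    # Seasonal: average of the detrended values at each phase of the period.
    detrended = [y[t] - trend[t] for t in range(n)]
    phase_mean = []
    for p in range(period):
        phase_values = detrended[p::period]
        phase_mean.append(sum(phase_values) / len(phase_values))
    seasonal = [phase_mean[t % period] for t in range(n)]

    # Remainder: whatever trend and seasonality do not explain.
    remainder = [y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, remainder
```

In this sketch the remainder series plays the role of R(t) and the trend series the role of T(t) in the event detection methods.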
The time series data analysis processor (106) then transmits the decomposed components to the event detection processor (108). The event detection processor (108) processes the individual components of the time series data and identifies actionable events represented by such components. The output of the event detection processor (108) comprises a vector including a value for each type of event, the time of the sensor data, and an identity of the sensor. The event detection processor (108) transmits this data to event storage (110). Details of the operation of the event detection processor (108) are provided in the description of FIG. 2 and, in particular, in the description of step 208 as well as FIGS. 5A through 5D.
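
One way to picture the output vector is as a small record holding one ternary value per event type together with the sensor identity and the time. The sketch below is illustrative only: the class name `EventVector`, its field names, and the helper functions are hypothetical, and the ternary rule is the -1/0/1 convention described for the event detection methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventVector:
    """Hypothetical record for one detection result of one sensor at one time."""
    sensor_id: str
    time: float
    spike_dip: int        # e_1: 1 = spike, -1 = dip, 0 = none
    mean_shift: int       # e_2: 1 = mean increase, -1 = mean decrease
    trend_change: int     # e_3: 1 = positive trend, -1 = negative trend
    variance_change: int  # e_4: 1 = variance increase, -1 = variance decrease

    def has_event(self):
        """True if at least one of the four event values is non-zero."""
        return any((self.spike_dip, self.mean_shift,
                    self.trend_change, self.variance_change))


def ternary(value, th_high, th_low):
    """Map a statistic to 1, -1, or 0 by comparing it with two thresholds."""
    if value > th_high:
        return 1
    if value < th_low:
        return -1
    return 0


def build_event(sensor_id, time, checks):
    """checks: four (value, th_high, th_low) tuples, ordered e_1 through e_4."""
    e1, e2, e3, e4 = (ternary(v, hi, lo) for v, hi, lo in checks)
    return EventVector(sensor_id, time, e1, e2, e3, e4)
```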
In the illustrated embodiment, the event storage (110) comprises a storage device (physical or logical) that stores the events detected during the event extraction phase. In one embodiment, the event storage (110) comprises a relational database management system (RDBMS) or other type of database. In some embodiments, the event storage (110) can comprise a key-value data store or other less intensive data store (e.g., an object store). In some embodiments, the event storage (110) is configured to store events in temporal order based on the time identified by the event detection processor (108). In this manner, the event storage (110) operates similarly to a queue.
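
The queue-like, time-ordered behavior can be sketched with an in-memory priority queue. This is only a stand-in for the databases named above; the class name `EventStore` and its methods are hypothetical.

```python
import heapq
import itertools


class EventStore:
    """In-memory stand-in for an event store that releases events oldest first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that events with equal times keep insertion order
        # and the payloads themselves are never compared.
        self._counter = itertools.count()

    def add(self, time, sensor_id, events):
        """Store one detection result, keyed by the time of the sensor data."""
        heapq.heappush(self._heap, (time, next(self._counter), sensor_id, events))

    def pop_oldest(self):
        """Remove and return the earliest stored event as (time, sensor_id, events)."""
        time, _, sensor_id, events = heapq.heappop(self._heap)
        return time, sensor_id, events

    def __len__(self):
        return len(self._heap)
```

Events may arrive out of order from different sensors; a consumer such as a graph builder would still read them in temporal order by calling `pop_oldest` repeatedly.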
In the illustrated embodiment, a separate graph phase is illustrated. During the graph phase, a graph is built and periodically updated based on the events stored in the event storage (110). In the illustrated embodiment, a graph builder (112) accesses the event storage (110) to retrieve events. In some embodiments, the graph builder (112) actively queries the event storage (110). In other embodiments, the graph builder (112) subscribes to the event storage (110) and periodically receives events as they are added to the event storage (110).
The graph builder (112) additionally accesses a graph storage (114) . The graph storage (114) can comprise a graph database or may comprise an RDBMS. In general, the graph storage (114) stores a set of nodes, a set of edges, and weights for the edges. In the illustrated embodiment, the nodes comprise sensor identifiers, time points, and event types. The graph builder (112) updates the data stored in the graph storage (114) based on events received in the event storage (110) . In this manner, the graph builder (112) is responsible for updating a graph of events captured by the sensors (102) and processed during the event extraction phase. Details of the operation of the graph builder (112) and graph storage (114) are provided in the description of FIG. 2 and, in particular, in the description of step 212 as well as FIGS. 3A through 3F and 4.
As illustrated, the system (100) includes a monitoring system (116) . In the illustrated embodiment, the monitoring system (116) receives processed event data from the event detection processor (108) . Alternatively, or in conjunction with the foregoing, the monitoring system (116) may also receive raw data from the sensors (102) . The monitoring system (116) is additionally communicatively coupled to the graph storage (114) . In the illustrated embodiment, the monitoring system (116) may comprise a hardware device (or multiple devices) that is responsible for analyzing sensor data and providing actionable intelligence to operators of the device. In other embodiments, the monitoring system (116) may also automatically take action in response to events (e.g., alerting a fire department upon detecting a rise in temperature) .
To provide this insight or take such action, the monitoring system (116) queries the graph storage (114) to identify correlated sensors in response to an event and perform a root cause analysis. Details of the operation of the monitoring system (116) are provided in the description of FIG. 6.
FIG. 2 is a flow diagram illustrating a method for generating a GoT according to some embodiments of the disclosure.
In step 202, the method (200) receives sensor data. As described above, the method (200) may receive raw data generated by one or more sensors over a network or other communications medium.
In step 204, the method (200) cleans and smooths the data.
In some embodiments, the method (200) performs various cleaning operations on the received data such as filtering anomalies, removing aberrant sensors, smoothing short-term fluctuations in sensor data, and other operations. In some embodiments, cleaning may also include interpolating missing values and/or normalizing values. In some embodiments, the method (200) can utilize moving average smoothing to smooth and clean data. Examples of such moving average approaches include weighted moving average (WMA), exponentially weighted moving average (EWMA), and autoregressive integrated moving average (ARIMA) smoothing, among other techniques. Other techniques may be used in addition to (or in lieu of) these techniques, and the disclosed embodiments are not intended to be limited to a specific or single technique for cleaning and smoothing the data from sensors. Indeed, multiple techniques may be used simultaneously and the techniques may be unique to each sensor based on the type of time series data generated and observations regarding the fluctuations of the data generated by the sensors.
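As a non-limiting illustration, the EWMA smoothing mentioned above can be sketched in Python as follows. The function name ewma_smooth, the smoothing factor alpha, and the carry-forward handling of missing samples are assumptions made for this sketch and are not taken from the disclosed embodiments.

```python
from typing import List, Optional


def ewma_smooth(values: List[Optional[float]], alpha: float = 0.3) -> List[float]:
    """Smooth a series with an exponentially weighted moving average.

    Assumption: a missing sample (None) is replaced by the previous smoothed
    value, and missing samples before the first real reading are dropped.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    previous: Optional[float] = None
    for value in values:
        if value is None:
            if previous is None:
                continue  # nothing to carry forward yet
            current = previous
        elif previous is None:
            current = float(value)  # seed the average with the first reading
        else:
            current = alpha * value + (1.0 - alpha) * previous
        smoothed.append(current)
        previous = current
    return smoothed
```

A smaller alpha suppresses short-term fluctuations more strongly at the cost of a slower response to genuine changes, so the value may be chosen per sensor.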
In step 206, the method (200) decomposes the cleaned and smoothed data into trend, season, and remainder data.
In the illustrated embodiment, time series data received in step 202 can be affected by three components: trend, seasonal, and remainder components. In brief, the seasonal component of a time series refers to patterns that repeat over a fixed interval.  The trend component refers to an overall change in data over a longer period of time. Finally, the remainder refers to the residual data remaining in the time series when the trend and seasonal data are removed.
In one embodiment, a time series can be represented by a function Y (t) , wherein t represents the time elapsed. In this embodiment, Y (t) can be represented as a series of vectors Y (t) = [y (1) , y (2) , ...y (n) ] . In one embodiment, the value of Y (t) can thus be decomposed into an additive collection of trend (T) , seasonal (S) , and remainder (R) functions: Y (t) =T (t) +S (t) +R (t) , wherein t again represents the time elapsed. The choice of an additive or multiplicative model may be made based on the underlying sensor (s) .
In one embodiment, the method (200) receives the raw time series data and first detects the trend component. In one embodiment, the method (200) may use a centered moving average algorithm to calculate the trend T (t) of the time series Y (t) , however other algorithms (e.g., Fourier transform) may be used. The method (200) may then remove the trend T (t) from the time series Y (t) . The method (200) may then analyze the underlying de-trended time series (Y (t) -T (t) ) to determine the period of the resulting data. After identifying the period, the method (200) extracts the seasonal component S (t) which leaves the remainder or noise component R (t) .
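The additive decomposition described above can be sketched as follows. This is a minimal sketch that assumes the period is already known (the method described above determines it from the de-trended data) and that estimates the seasonal component by averaging the de-trended values at each phase of the period; the function names centered_moving_average and decompose are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def centered_moving_average(y: Sequence[float], period: int) -> List[Optional[float]]:
    """Estimate the trend T(t); positions too close to either end stay None."""
    half = period // 2
    trend: List[Optional[float]] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points so weights sum to `period`.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose(
    y: Sequence[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split Y(t) into T(t), S(t), and R(t) so that Y(t) = T(t) + S(t) + R(t)."""
    trend = centered_moving_average(y, period)
    detrended = [None if tr is None else v - tr for v, tr in zip(y, trend)]

    # Average the de-trended values observed at each phase of the period.
    phase_means: List[float] = []
    for phase in range(period):
        samples = [
            d for i, d in enumerate(detrended) if d is not None and i % period == phase
        ]
        phase_means.append(sum(samples) / len(samples) if samples else 0.0)

    # Center the seasonal pattern so that it averages to zero over one period.
    offset = sum(phase_means) / period
    seasonal = [phase_means[i % period] - offset for i in range(len(y))]

    remainder = [
        None if tr is None else v - tr - s for v, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, remainder
```

The trend and remainder are undefined (None) for the first and last period // 2 samples, because the centered window does not fit there.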
In step 208, the method (200) extracts events from the decomposed data.
In one embodiment, the method (200) stores a list of pre-defined events. In general, an event comprises one or more conditions that utilize the underlying components (trend, seasonal, remainder) as inputs. That is, if one or more of the decomposed components satisfy a condition, an event is presumed to have occurred. The disclosed embodiments describe four examples of events: spikes/dips, mean shifts, trend changes and variance shift. However, the disclosed embodiments are not intended to be limited to only these events and other events may be constructed based on the underlying data or needs of the system implementing the method (200) .
Spikes/dips refer to rapid increases or decreases in the value of the remainder component of the time series. As described above, a trend refers to a gradual change in a time series while a season refers to a periodic (and regular) fluctuation. Thus, spikes or dips in a time series are generally attributable to the remainder component of the time series. The detection of spikes/dips is described more fully in the description of FIG. 5A.
A mean shift event refers to a change in the moving average of the time series data overall. In this embodiment, a sliding window is used to calculate the average value of the time series at a given time t and this value is used to determine if a mean shift has occurred. The detection of mean shifts is described more fully in the description of FIG. 5B.
A trend change event refers to a time when the trend component of a time series reverses, either from positive (upward) to negative (downward) or from negative (downward) to positive (upward) . The detection of trend change is described more fully in the description of FIG. 5C.
A variance shift event refers to a change in the variance of the underlying time series data. The detection of a variance shift is described more fully in the description of FIG. 5D.
In general, other features can be extracted from the time series, including but not limited to fluctuations in mean, variance, skewness, kurtosis, median, mode, quantile, etc. One can also include information measure such as entropy, or auto-regressive coefficients. The disclosed embodiments are not intended to be limited solely to the specific examples provided in FIGS. 5A through 5D.
In the illustrated embodiment, the method (200) generates an n-item tuple representing the detected events. Thus, using the four examples above, the method (200) will generate a four-item tuple representing whether each event is present in the time series. Thus, after executing step 208, the method (200) represents the events in a time series as y_j(t_i) = [e_1(t_i), e_2(t_i), e_3(t_i), e_4(t_i)]_j, where e_1, e_2, e_3, and e_4 represent the presence of a spikes/dips, mean shift, trend change, or variance shift event (respectively),
Figure PCTCN2019116370-appb-000001
representing the results of the processing (as described in FIGS. 5A through 5D), j represents a sensor, t_i represents a given time, and y_j represents a time series for a sensor j.
In the above manner, an entire time series can thus be defined as Y_j(t) = [y_j(t_i), j=1...k, i=1...m] = [[e_1(t_i), e_2(t_i), e_3(t_i), e_4(t_i)]_j, j=1...k, i=1...m], where k represents the number of sensors and t_1...t_m represents the observation times from t_1 to t_m.
In step 210, the method (200) stores the extracted events.
As described above, the output of step 208 comprises a packet comprising a numeric value indicating the presence/absence of each type of event, a time component, and a sensor identifier (referred to as an event data structure). These components are stored within a database. In some embodiments, the event data structure can be stored in an RDBMS or other type of storage medium that allows for future access. As described above, in some embodiments, the event data structures can be stored in a queue type data structure. In other embodiments, the event data structures can be stored in a database that supports the pub/sub architectural pattern (e.g., REDIS) such that the stream of events can be subscribed to by a downstream processing device.
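For illustration only, the event data structure and a temporally ordered, queue-like store can be sketched as follows. The names EventRecord and EventStore, the integer time stamps, and the in-memory list are assumptions of this sketch; a deployed system would use one of the storage options described above.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True, order=True)
class EventRecord:
    """One event data structure: time, sensor identifier, and event values."""

    time: int
    sensor_id: str
    # (e1 spikes/dips, e2 mean shift, e3 trend change, e4 variance shift)
    events: Tuple[int, int, int, int]


class EventStore:
    """Keeps records sorted by time so consumers read them in temporal order."""

    def __init__(self) -> None:
        self._records: List[EventRecord] = []

    def add(self, record: EventRecord) -> None:
        # insort keeps the list ordered by (time, sensor_id, events).
        bisect.insort(self._records, record)

    def since(self, time: int) -> List[EventRecord]:
        """Return all records at or after the given time, oldest first."""
        return [record for record in self._records if record.time >= time]
```

A graph builder that polls the store could call since() with the time of the last record it processed.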
In step 212, the method (200) builds or updates a graph based on the extracted events.
In the illustrated embodiment, the method (200) may first build an event graph representing the events stored in the database. In some embodiments, this graph represents each event type, sensor, and time as a node in the graph. In the graph, a given sensor node is connected to either event nodes or time nodes, whereas event nodes and time nodes are only connected to sensor nodes (i.e., are not connected to other time nodes or event nodes). In some embodiments, the method (200) stores this graph in a graph database. In other embodiments, this graph may be converted into relational form and stored in an RDBMS. An example of building a graph from a set of events is provided in the description of FIGS. 3A through 3F, incorporated herein by reference. In some embodiments, the event graph can comprise a separately stored graph. In other embodiments, the event graph can comprise a temporary graph (e.g., stored only in memory). In this embodiment, the event graph is used to build the sensor graph (of things) described below and in the description of FIG. 4.
From the event graph, the method (200) further synthesizes a sensor graph (also referred to as a sensor graph of things or simply GoT). In this graph, the nodes only comprise sensors and the edges between the nodes are weighted based on the connectivity of the event graph. In the illustrated embodiment, the sensor graph is built by analyzing the event graph to identify correlated sensors. In one embodiment, sensors are correlated if they experience an event at the same time point. Thus, as one example, if two sensors experience a spike or dip at the same time, the two sensors are correlated. In one embodiment, the weight of an edge of the sensor graph represents the number of time points at which the two connected sensors experience an event simultaneously.
FIGS. 3A through 3F illustrate event graphs and methods of building the same according to some embodiments of the disclosure.
In the illustrated embodiment, the graphs in FIGS. 3A through 3F can be generated using a set of events generated using historical sensor data. Alternatively, or in conjunction with the foregoing, the graphs can be generated using real-time events extracted from a real-time stream of raw sensor data.
In FIG. 3A, a first event is analyzed, comprising a spike or dip occurring at Time 1 for Sensor 1. The event type (Spikes and Dips), sensor ID (1), and time point (1) are used as nodes in the graph (300a), and the sensor node is connected to the event type node and the time point node via undirected edges. As illustrated in FIGS. 3A through 3F, all edges are undirected and this point will not be repeated for the sake of brevity.
In FIG. 3B, also at Time 1, a second event is processed wherein Sensor 3 detects a spike or dip. In response, a node for Sensor 3 is added to the graph (300b) . The Time 1 node is re-used and the Sensor 3 node is connected to the Time 1 node via an edge. Additionally, the Sensor 3 node is connected to the existing Spikes and Dips node via an edge. In some embodiments, a new, duplicate Spikes and Dips node may be added; however, as illustrated, event nodes (like time nodes) may be re-used.
In FIG. 3C, a third event is received at Time 2. This event comprises a mean shift type event detected by Sensor 2. In response, a new component is added to the graph (300c) that comprises a new Sensor 2 node connected to a new Time 2 node and a new Mean Shift node. Thus, in FIG. 3C, the graph (300c) consists of two separate components.
In FIG. 3D, a fourth event is received occurring at Time 2. Since a Time 2 node has already been created, that node is re-used. However, a Sensor 4 node is added as well as a new Trend Change node corresponding to the trend change event detected. As will be discussed in more detail, the graph (300d) depicted in FIG. 3D begins to illustrate correlations between sensors.
In FIG. 3E, a fifth event is received at Time 3. As in FIG. 3D, a new Time 3 node is added to the graph (300e). Further, the Sensor 1 node is connected to the new Time 3 node. However, since the Sensor 1 node is already connected to the Spikes and Dips node, this connection is re-used. As described above, in some embodiments, a new duplicate Spikes and Dips node can be added and connected to the Sensor 1 node. Alternatively, in some embodiments, a weight of the edge between the Sensor 1 node and the Spikes and Dips node can be increased by one.
In FIG. 3F, the sixth (and final) event is received at Time 3. Similar to the fifth event, the event in FIG. 3F represents a spike or dip detected by Sensor 3. In this scenario, all nodes exist (Sensor 3, Time 3, Spikes and Dips), thus no new nodes are created in the graph (300f). However, as discussed above, the event type node may be duplicated again. Thus, in FIG. 3F a new edge between the Sensor 3 node and the Time 3 node is added and the edge between the Sensor 3 node and the Spikes and Dips node is re-used (or in some embodiments, increased in weight).
As will be described below the graph depicted in FIG. 3F can then be used to generate a sensor graph.
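The construction walked through in FIGS. 3A through 3F can be sketched as follows. This sketch assumes the embodiment in which event and time nodes are re-used and a repeated sensor-to-event connection increases the edge weight by one; the tuple-based node labels, the EVENTS list, and the function name build_event_graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Node = Tuple[str, Hashable]  # e.g. ("sensor", 1), ("time", 3), ("event", "Mean Shift")

# The six events of FIGS. 3A through 3F as (sensor, event type, time point).
EVENTS = [
    (1, "Spikes and Dips", 1),
    (3, "Spikes and Dips", 1),
    (2, "Mean Shift", 2),
    (4, "Trend Change", 2),
    (1, "Spikes and Dips", 3),
    (3, "Spikes and Dips", 3),
]


def build_event_graph(
    events: Iterable[Tuple[Hashable, str, Hashable]]
) -> Dict[Node, Dict[Node, int]]:
    """Build an undirected, weighted adjacency map from (sensor, event, time) triples."""
    graph: Dict[Node, Dict[Node, int]] = defaultdict(lambda: defaultdict(int))
    for sensor_id, event_type, time_point in events:
        sensor: Node = ("sensor", sensor_id)
        # A sensor node connects only to event and time nodes, never to other sensors.
        for other in (("event", event_type), ("time", time_point)):
            graph[sensor][other] += 1  # existing nodes and edges are re-used
            graph[other][sensor] += 1  # undirected: mirror the edge
    return graph


event_graph = build_event_graph(EVENTS)
```

With these six events the graph contains four sensor nodes, three time nodes, and three event type nodes, and the edge between Sensor 1 and Spikes and Dips carries a weight of two.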
FIG. 4 illustrates a sensor graph of things according to some embodiments of the disclosure. In the illustrated embodiment, the sensor graph (400) is generated based on the event graph (300f) illustrated in FIG. 3F.
As illustrated in FIG. 4, the sensor graph (400) differs from the event graph (300f) in that all nodes of the sensor graph (400) comprise sensor nodes. Further, edges between sensor nodes are undirected and only include a numerical weight. Thus, data regarding event types and times are removed when generating the sensor graph (400).
In one embodiment, each sensor node in the event graph (300f) is included in the sensor graph (400) . Further, each node may be connected to every other node with a weight of zero. Alternatively, each node in the sensor graph (400) may initially be unconnected and only connected to other nodes as described herein.
In the illustrated embodiment, for a given sensor, a graph builder (as described in FIG. 1) analyzes the event graph (300f) to determine the number of paths between the given sensor and another sensor that have a distance of two and pass through a time node. To accomplish this, the graph builder can select a sensor node and identify the connected time nodes. The graph builder can then, for each time node, identify any sensor nodes connected to the time node. The resulting sensor nodes are then counted and the count is used as the weight between the given sensor and the other sensor. Thus, in graph (300f), Sensor 1 is connected to Sensor 3 via two time nodes (and vice versa), thus an edge is added to the sensor graph (400) between Sensor 1 and Sensor 3 and the weight is set to two. Further, Sensors 2 and 4 are connected to each other by one time node (Time 2). Thus, an edge is added to the sensor graph (400) between Sensors 2 and 4 and the weight is set to one. As illustrated, all other edges are set to a zero weight or are omitted entirely.
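The weight computation described above can be sketched as follows. Counting distance-two paths through time nodes is equivalent to counting, for each pair of sensors, the time points they share, which is what this sketch does directly from the event list. The EVENTS list mirrors FIGS. 3A through 3F, and the function name build_sensor_graph and the omission of zero-weight edges are assumptions of the sketch.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Tuple

# The six events of FIGS. 3A through 3F as (sensor, event type, time point).
EVENTS = [
    (1, "Spikes and Dips", 1),
    (3, "Spikes and Dips", 1),
    (2, "Mean Shift", 2),
    (4, "Trend Change", 2),
    (1, "Spikes and Dips", 3),
    (3, "Spikes and Dips", 3),
]


def build_sensor_graph(
    events: Iterable[Tuple[Hashable, str, Hashable]]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Return {(sensor_a, sensor_b): weight}; zero-weight edges are omitted."""
    sensors_at_time = defaultdict(set)
    for sensor_id, _event_type, time_point in events:
        sensors_at_time[time_point].add(sensor_id)

    weights: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    for sensors in sensors_at_time.values():
        # Each shared time node contributes one path of distance two.
        for a, b in combinations(sorted(sensors), 2):
            weights[(a, b)] += 1
    return dict(weights)


print(build_sensor_graph(EVENTS))  # {(1, 3): 2, (2, 4): 1}
```

The printed weights match the sensor graph (400): two for Sensors 1 and 3, and one for Sensors 2 and 4.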
FIG. 5A is a flow diagram illustrating a method for determining if a spike or dip event has occurred according to some embodiments of the disclosure.
In step 502a, the method (500a) receives a remainder component R (t) which comprises the remainder portion of a given time series calculated as described above.
In step 504a, the method (500a) compares the remainder component to a spike threshold (Th spike) and a dip threshold (Th dip) . In one embodiment, these thresholds comprise static values above and below a center point. For example, Th spike can be set to a positive value (e.g., +5) while Th dip may be set to a negative value (e.g., –5) . The specific values of these thresholds are not intended to be limiting. In one embodiment, Th spike must be greater than zero and Th dip must be less than zero.
In step 506a, the method (500a) determines that the value of R (t) exceeds the preconfigured spike threshold Th spike and flags the time (t) as a spike. In one embodiment, flagging the time as a spike comprises setting an event output (e.g., e 1) value to one (1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.
In step 510a, the method (500a) determines that the value of R (t) is below the preconfigured dip threshold Th dip and flags the time (t) as a dip. In one embodiment, flagging the time as a dip comprises setting an event output (e.g., e 1) value to negative one (–1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5B through 5D, as described more fully in the description of step 208 of FIG. 2.
In step 508a, if the value of R (t) is not greater than Th spike and is not less than Th dip, the method (500a) bypasses the spikes/dips event detection and sets the event  output (e.g., e 1) value to zero (0) . Thus, in the illustrated embodiment, the method (500a) utilizes the following rule to generate the output value for event indicated as e 1:
Figure PCTCN2019116370-appb-000002
In some embodiments, the method (500a) may utilize output values other than -1, 0, and 1. For example, the method (500a) may set the output value to represent the distance of the value of R (t) from the respective threshold.
In one embodiment, the two thresholds (Th spike and Th dip) are determined using the empirical quantiles of R (t) such that an event only happens with a small probability α (which can be tuned and may be set to 1%). This amounts to respecting the a priori knowledge that an interesting event is rare. In most scenarios, a time series is “normal” and does not contain much information. In some embodiments, the two thresholds (Th spike and Th dip) have a minimum absolute value (TH abs) > 0, which can represent a human expert's a priori knowledge. In this embodiment, the thresholds are set as follows:
TH spike=max (TH spike, TH abs) , TH dip=min (TH dip, -TH abs)
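The spike/dip rule and the quantile-based thresholds can be sketched as follows. The sketch assumes that the probability α is split evenly between the two tails and uses a nearest-rank empirical quantile; neither choice is specified above, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def empirical_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank empirical quantile of an already sorted sequence."""
    rank = math.ceil(q * len(sorted_values))
    index = min(len(sorted_values) - 1, max(0, rank - 1))
    return sorted_values[index]


def spike_dip_thresholds(
    remainder: Sequence[float], alpha: float = 0.01, th_abs: float = 0.0
) -> Tuple[float, float]:
    """Return (Th_spike, Th_dip) from the quantiles of R(t), floored by TH_abs."""
    ordered = sorted(remainder)
    # Assumption: half of alpha is assigned to each tail.
    th_spike = empirical_quantile(ordered, 1.0 - alpha / 2.0)
    th_dip = empirical_quantile(ordered, alpha / 2.0)
    return max(th_spike, th_abs), min(th_dip, -th_abs)


def spike_dip_event(r_t: float, th_spike: float, th_dip: float) -> int:
    """e1 = 1 for a spike, -1 for a dip, and 0 otherwise."""
    if r_t > th_spike:
        return 1
    if r_t < th_dip:
        return -1
    return 0
```

Because of the TH abs floor, a very quiet sensor whose quantiles are close to zero does not raise events on negligible fluctuations.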
FIG. 5B is a flow diagram illustrating a method for determining if a mean shift event has occurred according to some embodiments of the disclosure.
In step 502b, the method (500b) defines a sliding window for a time series data set. In some embodiments, the sliding window comprises a fixed duration in which the remaining steps analyze the time series data. The specific duration of this window is not limiting and may be set according to the underlying data or needs of the system.
In one embodiment, the method (500b) computes the mean of the trend component T (t) from two sliding windows, and takes the difference between them. In this embodiment, the left window consists of less recent data:
[T (t -L -R + 1) , T (t -L -R + 2) , ..., T (t -R) ] ,
and the right window contains most recent data:
[T (t -R + 1) , T (t -R + 2) , ..., T (t) ] ,
where L and R are the size of left and right time window respectively.
In this embodiment, values of μ L (t) and μ R (t) are calculated by computing the mean of T (t) in the left and right time windows, respectively, and the difference Δμ (t) = μ R (t) - μ L (t) is used as the average sensor time series over the window (denoted as a (t) ).
In step 504b, the method (500b) averages the values of the time series data points (i.e., values of Y (t) ) appearing within the window. In some embodiments, the average comprises the average value of Y (t) during the aforementioned window.
In step 506b, the method (500b) compares the computed average for a given window (a (t) ) to a mean increase threshold (Th increase) and a mean decrease threshold (Th decrease) . In one embodiment, these thresholds comprise static values above and below a center point. For example, Th increase can be set to a positive value (e.g., +5) while Th decrease may be set to a negative value (e.g., –5) . The specific values of these thresholds are not intended to be limiting. However, the value of Th increase should be higher than the value of Th decrease to avoid the scenario where Th decrease>a (t) >Th increase.
In step 508b, the method (500b) determines that the value of a (t) exceeds the preconfigured mean increase threshold Th increase and flags the time (t) as a mean increase. In one embodiment, flagging the time as a mean increase comprises setting an event output (e.g., e 2) value to one (1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.
In step 512b, the method (500b) determines that the value of a (t) is below the preconfigured mean decrease threshold Th decrease and flags the time (t) as a decrease in the mean for a sliding window. In one embodiment, flagging the time as a mean decrease comprises setting an event output (e.g., e 2) value to negative one (–1) . This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5C, and 5D, as described more fully in the description of step 208 of FIG. 2.
In step 510b, if the value of a (t) is not greater than Th increase and is not less than Th decrease, the method (500b) bypasses the mean increase/decrease event detection and sets the event output (e.g., e 2) value to zero (0) . Thus, in the illustrated embodiment, the method (500b) utilizes the following rule to generate the output value for event indicated as e 2:
Figure PCTCN2019116370-appb-000003
In some embodiments, the method (500b) may utilize output values other than -1, 0, and 1. For example, the method (500b) may set the output value to represent the distance of the value of a (t) from the respective threshold.
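The two-window embodiment of the mean shift detection can be sketched as follows. The sketch uses zero-based list indices for t and takes the window sizes L and R and both thresholds as parameters; the function name mean_shift_event is hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_shift_event(
    trend: Sequence[float],
    t: int,
    left: int,
    right: int,
    th_increase: float,
    th_decrease: float,
) -> int:
    """Return e2 at index t: 1 (mean increase), -1 (mean decrease), or 0."""
    start = t - left - right + 1
    if left <= 0 or right <= 0 or start < 0 or t >= len(trend):
        raise ValueError("windows do not fit inside the series")
    # Left window: T(t-L-R+1) ... T(t-R); right window: T(t-R+1) ... T(t).
    mu_left = mean(trend[start : t - right + 1])
    mu_right = mean(trend[t - right + 1 : t + 1])
    a_t = mu_right - mu_left
    if a_t > th_increase:
        return 1
    if a_t < th_decrease:
        return -1
    return 0
```

For example, a trend that steps from 0 to 10 midway through the two windows yields a difference of 10, which exceeds an increase threshold of +5.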
FIG. 5C is a flow diagram illustrating a method for determining if a positive or negative trend change event has occurred according to some embodiments of the disclosure.
In step 502c, the method (500c) receives a trend component T (t) which comprises the trend portion of a given time series calculated as described above.
In step 504c, the method (500c) computes the difference over time between the current trend value (T (t) ) and the previous trend value (T (t-1) ) and then calculates the exponential moving average of ΔT (t) =T (t) -T (t-1) , which is denoted as D (t) .
In step 506c, the method (500c) compares the trend component to a positive trend threshold (Th positive) and a negative trend threshold (Th negative) . In one embodiment, these thresholds comprise static values above and below a center point. For example, Th positive can be set to a positive value (e.g., +5) while Th negative may be set to a negative value (e.g., –5) . The specific values of these thresholds are not intended to be limiting. However, the value of Th positive should be higher than the value of Th negative to avoid the scenario where Th negative>D (t) >Th positive.
In step 508c, the method (500c) determines that the value of D (t) exceeds the preconfigured positive trend threshold Th positive and flags the time (t) as a positive trend event. In one embodiment, flagging the time as a positive trend event comprises setting an event output (e.g., e 3) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.
In step 512c, the method (500c) determines that the value of D (t) is below the preconfigured negative trend threshold Th negative and flags the time (t) as a negative trend event. In one embodiment, flagging the time as a negative trend event comprises setting an event output (e.g., e 3) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A, 5B, and 5D, as described more fully in the description of step 208 of FIG. 2.
In step 510c, if the value of D (t) is not greater than Th positive and is not less than Th negative, the method (500c) bypasses the positive/negative trend event detection and sets the event output (e.g., e 3) value to zero (0) . Thus, in the illustrated embodiment, the method (500c) utilizes the following rule to generate the output value for event indicated as e 3:
Figure PCTCN2019116370-appb-000004
In some embodiments, the method (500c) may utilize output values other than -1, 0, and 1. For example, the method (500c) may set the output value to represent the distance of the value of D (t) from the respective threshold.
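The trend change detection of FIG. 5C can be sketched as follows. The smoothing factor beta of the exponential moving average and the choice to seed D (t) with the first difference are assumptions of this sketch, as is the function name trend_change_events.

```python
from typing import List, Optional, Sequence


def trend_change_events(
    trend: Sequence[float],
    th_positive: float,
    th_negative: float,
    beta: float = 0.5,
) -> List[int]:
    """Return e3 for every sample of T(t): 1, -1, or 0."""
    if not trend:
        return []
    events = [0]  # the first sample has no previous value to difference against
    d: Optional[float] = None
    for previous, current in zip(trend, trend[1:]):
        delta = current - previous  # delta T(t) = T(t) - T(t-1)
        # D(t): exponential moving average of the differences.
        d = delta if d is None else beta * delta + (1.0 - beta) * d
        if d > th_positive:
            events.append(1)
        elif d < th_negative:
            events.append(-1)
        else:
            events.append(0)
    return events
```

Because D (t) is a moving average, an event is raised only after the change in slope persists for several samples, which filters out isolated jumps in the trend.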
FIG. 5D is a flow diagram illustrating a method for determining if a variance change event has occurred according to some embodiments of the disclosure.
In step 502d, the method (500d) computes standard deviations σ L (t) and σ R (t) of the remainder component R (t) over the left and right sliding windows, respectively, of the time series data. The left and right sliding windows may be computed as described previously, the disclosure of which is not repeated herein.
In step 504d, the method (500d) computes the difference between the right and left standard deviations, which is denoted as Δσ (t) .
In step 506d, the method (500d) compares the standard deviation difference to a positive variance threshold (Th positive) and a negative variance threshold (Th negative) . In one embodiment, these thresholds comprise static values above and below a center point. For example, Th positive can be set to a positive value (e.g., +5) while Th negative may be set to a negative value (e.g., –5) . The specific values of these thresholds are not intended to be limiting. However, the value of Th positive should be higher than the value of Th negative to avoid the scenario where Th negative>Δσ (t) >Th positive.
In step 508d, the method (500d) determines that the value of Δσ (t) exceeds the preconfigured positive variance threshold Th positive and flags the time (t) as a positive variance event. In one embodiment, flagging the time as a positive variance event comprises setting an event output (e.g., e 4) value to one (1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.
In step 512d, the method (500d) determines that the value of Δσ (t) is below the preconfigured negative variance threshold Th negative and flags the time (t) as a negative variance event. In one embodiment, flagging the time as a negative variance event comprises setting an event output (e.g., e 4) value to negative one (–1). This value is then passed along to and combined with the other outputs generated in FIGS. 5A through 5C, as described more fully in the description of step 208 of FIG. 2.
In step 510d, if the value of Δσ (t) is not greater than Th positive and is not less than Th negative, the method (500d) bypasses the positive/negative variance event detection and sets the event output (e.g., e 4) value to zero (0) . Thus, in the illustrated embodiment, the method (500d) utilizes the following rule to generate the output value for event indicated as e 4:
Figure PCTCN2019116370-appb-000005
In some embodiments, the method (500d) may utilize output values other than -1, 0, and 1. For example, the method (500d) may set the output value to represent the distance of the value of Δσ (t) from the respective threshold.
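The variance shift detection of FIG. 5D can be sketched as follows, re-using the left and right window layout of FIG. 5B. The use of the population standard deviation (statistics.pstdev) and the function name variance_shift_event are assumptions of this sketch.

```python
from statistics import pstdev
from typing import Sequence


def variance_shift_event(
    remainder: Sequence[float],
    t: int,
    left: int,
    right: int,
    th_positive: float,
    th_negative: float,
) -> int:
    """Return e4 at index t: 1 (variance increase), -1 (decrease), or 0."""
    start = t - left - right + 1
    if left <= 0 or right <= 0 or start < 0 or t >= len(remainder):
        raise ValueError("windows do not fit inside the series")
    sigma_left = pstdev(remainder[start : t - right + 1])
    sigma_right = pstdev(remainder[t - right + 1 : t + 1])
    delta_sigma = sigma_right - sigma_left  # right (recent) minus left (older)
    if delta_sigma > th_positive:
        return 1
    if delta_sigma < th_negative:
        return -1
    return 0
```

In this sketch a positive output indicates that the most recent window is noisier than the older one.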
FIG. 6 is a flow diagram illustrating a method for analyzing a GoT according to some embodiments of the disclosure.
In step 602, the method (600) monitors sensor data. In some embodiments, the method (600) monitors raw sensor data output accessed over a communications bus as described in the description of FIG. 1.
In step 604, the method (600) determines if an event occurred.
In one embodiment, the method (600) determines if an event occurs by utilizing the processes described in FIGS. 2 and 5A through 5D. In some embodiments, these events may simultaneously be added to an event store (as described in FIG. 2) and processed by the method (600). In the illustrated embodiment, an event detected in step 604 includes a sensor identifier (i) representing the sensor associated with the event. In some embodiments, steps 602 and 604 may be optional. In this embodiment, the method (600) receives pre-processed events directly from the event extraction phase depicted in FIG. 1.
If the method (600) does not detect an event (e.g., spikes/dips, mean shift, trend change, or variance shift is not detected), the method (600) continues to monitor the system for events. Alternatively, if the method (600) detects an event, the method (600) proceeds to step 606.
In step 606, the method (600) retrieves a sensor identifier (i) from the event detected in step 604. In some embodiments, the sensor identifier can comprise a globally unique identifier, Internet Protocol (IP) address, or other uniquely identifying information.
In the illustrated embodiment, the method (600) uses the sensor identifier to query the graph storage that stores the sensor graph. That is, the method (600) uses the sensor identifier as a query key to retrieve data regarding the identified sensor from the graph storage. In one embodiment, the method (600) requests a list of first-level connections of the node associated with the sensor identifier from the sensor graph.
In step 608, the method (600) determines if the query above returns an empty set; that is, if the node associated with the sensor identifier does not have any first-level connections. As illustrated in FIG. 4, a first-level connection refers to another node in the graph connected to the node associated with the sensor identifier having a distance of one.
In step 610, the method (600) flags the event for further analysis if the sensor does not have any first-level connections. In this scenario, the method (600) cannot identify any sensors correlated with the sensor identifier extracted in step 606. Thus, while an event occurred (as identified in step 604) , it cannot be correlated using the sensor graph and requires further intervention by, for example, an operator of a monitoring system.
In step 612, if the method (600) determines that one or more first-level connections exist, the method (600) retrieves the first-level connections from the sensor graph. In some embodiments,  steps  610 and 612 may be combined. In the illustrated embodiment, the first-level connections comprise a set of nodes associated with other sensors. Each of these nodes may include one or more events previously recorded for the respective sensor (and times of these events) . In some embodiments, further analysis or detail can be included for each event.
In step 614, the method (600) analyzes the identified sensor nodes to determine if the same type of event has been recorded for any of the sensors in the first-level connections list. In some embodiments, the method (600) limits the analysis in step 614 to concurrent events. In other embodiments, the method (600) may limit the analysis to events occurring within a predefined time from the current event detected in step 604. Alternatively, or in conjunction with the foregoing, the method (600) may analyze the shape of the trend of the time series data for each sensor to determine if the trend is similar to the trend for the sensor identified by the sensor identifier.
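By way of non-limiting illustration, the trend-shape comparison might be sketched as follows. The disclosure does not specify a similarity measure; the Pearson correlation coefficient and the 0.9 threshold used here are assumptions chosen for this sketch, as are the function names.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either series is constant."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def trends_similar(a: Sequence[float], b: Sequence[float], min_corr: float = 0.9) -> bool:
    """Treat two trend components as similar when their correlation meets the threshold."""
    return pearson(a, b) >= min_corr
```

A correlation-based measure ignores differences in scale and offset between sensors, which may or may not be desirable depending on the sensors being compared.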
In step 616, the method (600) defines the set of correlated sensors as those sensors having the same or a similar event or trend identified in step 614. In the illustrated embodiment, the method (600) stores identifiers associated with the correlated sensors and then flags the correlated sensors for further analysis (described previously in connection with step 610).
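By way of non-limiting illustration, steps 614 and 616 might be sketched as follows for the embodiment that limits the analysis to events within a predefined time of the current event. The event representation (a type string and a timestamp in seconds), the 60-second default, and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (event_type, timestamp in seconds); a hypothetical layout


def correlated_sensors(
    current: Event,
    neighbor_events: Dict[str, List[Event]],
    window_s: float = 60.0,
) -> List[str]:
    """Return neighbors that recorded the same event type within window_s of the current event."""
    cur_type, cur_time = current
    result = []
    for sensor_id, events in neighbor_events.items():
        # Step 614: same type of event, close enough in time to the current event.
        if any(t == cur_type and abs(ts - cur_time) <= window_s for t, ts in events):
            result.append(sensor_id)
    # Step 616: the matching sensors form the correlated set.
    return sorted(result)


neighbor_events = {
    "s2": [("spike", 100.0)],
    "s3": [("spike", 500.0)],
    "s5": [("mean_shift", 110.0)],
}
```

Setting `window_s` to zero restricts the match to events with identical timestamps, which approximates the concurrent-events embodiment.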
In steps 618, 620, and 622, the method (600) proceeds to analyze the second-level connections of the sensor identified in step 606. As used herein, a second-level connection refers to a node in the sensor graph having a distance of two from the sensor identified in step 606. Steps 618, 620, and 622 are similar to steps 608, 612, and 614, and the details of these steps are not repeated herein.
In general, steps 608, 612, and 614 (as well as 618, 620, and 622) can be performed for any level connection of the sensor identified in step 606. That is, in addition to analyzing first- and second-level connections (as illustrated), the method (600) can analyze nodes that have distances of three or more. In some embodiments, the number of levels to analyze may be preconfigured by the method (600). In other embodiments, the method (600) may continue to search each level until at least one correlated sensor node is found.
In the illustrated embodiment, the method (600) halts after identifying at least one correlated sensor in a given level. In other embodiments, the method (600) may continue to analyze additional levels even when finding a correlated sensor at a given level. In this embodiment and other embodiments, the method (600) may rank the correlated sensors based on their distance from the sensor identified in step 606. For example, a correlated first-level sensor (step 614) may be ranked higher than a correlated second-level sensor (step 622) .
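By way of non-limiting illustration, the level-by-level search and distance-based ranking might be sketched as a breadth-first traversal. The adjacency dictionary, the `is_correlated` callback (standing in for the analysis of steps 614 and 622), and the parameter names are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Set, Tuple


def search_by_level(
    graph: Dict[str, Set[str]],
    source: str,
    is_correlated: Callable[[str], bool],
    max_level: int = 2,
    stop_at_first_hit: bool = True,
) -> List[Tuple[str, int]]:
    """Return (sensor_id, distance) pairs for correlated sensors, nearest level first."""
    visited = {source}
    frontier = [source]
    ranked: List[Tuple[str, int]] = []
    for level in range(1, max_level + 1):
        # Collect every unvisited node exactly one hop beyond the current frontier.
        next_frontier = []
        for node in frontier:
            for neighbor in sorted(graph.get(node, ())):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # no connections exist at this level
        hits = [n for n in next_frontier if is_correlated(n)]
        ranked.extend((n, level) for n in hits)
        if hits and stop_at_first_hit:
            break  # halt after the first level that yields a correlated sensor
        frontier = next_frontier
    return ranked


graph = {"s1": {"s2", "s3"}, "s2": {"s1", "s4"}, "s3": {"s1"}, "s4": {"s2"}}
```

Because levels are visited in increasing distance, the returned list is already ordered so that first-level matches precede second-level matches. Passing `stop_at_first_hit=False` corresponds to the embodiments that continue to analyze additional levels.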
FIG. 7 is a hardware diagram illustrating a device for generating or analyzing a GoT according to some embodiments of the disclosure.
Device (700) may include many more or fewer components than those shown in FIG. 7. However, the components shown are sufficient to disclose an illustrative embodiment for implementing the present disclosure. Device (700) may represent, for example, devices discussed above in relation to FIG. 1.
As shown in FIG. 7, device (700) includes a processing unit (CPU) (702) in communication with a mass memory (730) via a bus (724). Device (700) also includes one or more network interfaces (750), an audio interface (752), a display (754), a keypad (756), an illuminator (758), an input/output interface (760), and one or more cameras or other optical, thermal, or electromagnetic sensors (762). Device (700) can include one camera/sensor (762), or a plurality of cameras/sensors (762), as understood by those of skill in the art.
Device (700) may optionally communicate with a base station (not shown), or directly with another computing device. Network interface (750) includes circuitry for coupling device (700) to one or more networks and is constructed for use with one or more communication protocols and technologies. Network interface (750) is sometimes known as a transceiver, transceiving device, or network interface card (NIC).
Audio interface (752) is arranged to produce and receive audio signals such as the sound of a human voice. For example, audio interface (752) may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and generate an audio acknowledgement for some action. Display (754) may be a liquid crystal display (LCD), gas plasma, light emitting diode (LED), or any other type of display used with a computing device. Display (754) may also include a touch sensitive screen arranged to receive input from an object such as a stylus or a digit from a human hand.
Keypad (756) may comprise any input device arranged to receive input from a user. For example, keypad (756) may include a push button numeric dial, or a keyboard. Keypad (756) may also include command buttons that are associated with selecting and sending images. Illuminator (758) may provide a status indication and provide light. Illuminator (758) may remain active for specific periods of time or in response to events. For example, when illuminator (758) is active, it may backlight the buttons on keypad (756) and stay on while the device is powered. Also, illuminator (758) may backlight these buttons in various patterns when particular actions are performed, such as dialing another device. Illuminator (758) may also cause light sources positioned within a transparent or translucent case of the device to illuminate in response to actions.
Device (700) also comprises input/output interface (760) for communicating with external devices. Input/output interface (760) can utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.
Mass memory (730) includes a RAM (732), a ROM (724), and other storage means. Mass memory (730) illustrates another example of computer storage media for storage of information such as computer-readable instructions, data structures, program modules, or other data. Mass memory (730) stores a basic input/output system (“BIOS”) (740) for controlling low-level operation of device (700). The mass memory may also store an operating system for controlling the operation of device (700). It will be appreciated that this component may include a general purpose operating system such as a version of UNIX, or LINUX™, or a specialized client communication operating system such as Windows Client™, or the [Figure PCTCN2019116370-appb-000006] operating system. The operating system may include, or interface with, a Java virtual machine module that enables control of hardware components and operating system operations via Java application programs.
RAM (732) stores and executes one or more applications (734). In some embodiments, these applications comprise software configured to execute one or more of the operations described in connection with the foregoing figures. In some embodiments, the device (700) further includes persistent storage (e.g., hard disk, solid state drive, etc.) for storing the applications (734) prior to executing in RAM (732).
For the purposes of this disclosure, a module is a software, hardware, or firmware (or combinations thereof) system, process or functionality, or component thereof, that performs or facilitates the processes, features, and functions described herein (with or without human interaction or augmentation). A module can include sub-modules. Software components of a module may be stored on a computer-readable medium for execution by a processor. Modules may be integral to one or more servers, or be loaded and executed by one or more servers. One or more modules may be grouped into an engine or an application.
Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many manners and as such are not to be limited by the preceding exemplary embodiments and examples. In other words, functional elements being performed by single or multiple components, in various combinations of hardware and software or firmware, and individual functions, may be distributed among software applications at either the client level or server level or both. In this regard, any number of the features of the different embodiments described herein may be combined into single or multiple embodiments, and alternate embodiments having fewer than, or more than, all of the features described herein are possible.
Functionality may also be, in whole or in part, distributed among multiple components, in manners now known or to become known. Thus, myriad software/hardware/firmware combinations are possible in achieving the functions, features, interfaces and preferences described herein. Moreover, the scope of the present disclosure covers conventionally known manners for carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.
Furthermore, the embodiments of methods presented and described as flowcharts in this disclosure are provided by way of example in order to provide a more complete understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as being part of a larger operation are performed independently.
While various embodiments have been described for purposes of this disclosure, such embodiments should not be deemed to limit the teaching of this disclosure to those embodiments. Various changes and modifications may be made to the elements and operations described above to obtain a result that remains within the scope of the systems and processes described in this disclosure.

Claims (23)

  1. A method comprising:
    retrieving raw sensor data collected by a plurality of sensors;
    identifying a plurality of events based on the raw sensor data; and
    building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.
  2. The method of claim 1, further comprising querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  3. The method of claim 1, the identifying a plurality of events comprising:
    decomposing the raw sensor data into trend, seasonal, and remainder components; and
    identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
  4. The method of claim 3, each event in the plurality of events comprising an event selected from the group consisting of a spike or dip event, mean shift event, variance shift, or trend change event.
  5. The method of claim 4, the identifying a plurality of events comprising identifying a spike or dip event by comparing the remainder component to a spike threshold and dip threshold.
  6. The method of claim 4, the identifying a plurality of events comprising identifying a mean shift event by:
    averaging the raw sensor data over a sliding time window; and
    comparing the average to a mean increase and mean decrease threshold.
  7. The method of claim 4, the identifying a plurality of events comprising identifying a trend change event by comparing the trend component to a positive trend or negative trend threshold.
  8. The method of claim 1, the building a sensor graph comprising:
    building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and
    building the sensor graph based on the event graph.
  9. The method of claim 1, the querying the sensor graph comprising:
    detecting a second event for a second sensor;
    identifying a first-level connection sensor in the sensor graph for the second sensor; and
    using the first-level connection sensor as the correlated sensor.
  10. A non-transitory computer readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of:
    retrieving raw sensor data collected by a plurality of sensors;
    identifying a plurality of events based on the raw sensor data; and
    building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.
  11. The non-transitory computer readable storage medium of claim 10, the instructions further defining the step of querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  12. The non-transitory computer readable storage medium of claim 10, the identifying a plurality of events comprising:
    decomposing the raw sensor data into trend, seasonal, and remainder components; and
    identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
  13. The non-transitory computer readable storage medium of claim 11, each event in the plurality of events comprising an event selected from the group consisting of a spike or dip event, mean shift event, variance shift, or trend change event.
  14. The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a spike or dip event by comparing the remainder component to a spike threshold and dip threshold.
  15. The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a mean shift event by:
    averaging the raw sensor data over a sliding time window; and
    comparing the average to a mean increase and mean decrease threshold.
  16. The non-transitory computer readable storage medium of claim 12, the identifying a plurality of events comprising identifying a trend change event by comparing the trend component to a positive trend or negative trend threshold.
  17. The non-transitory computer readable storage medium of claim 10, the building a sensor graph comprising:
    building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and
    building the sensor graph based on the event graph.
  18. The non-transitory computer readable storage medium of claim 10, the querying the sensor graph comprising:
    detecting a second event for a second sensor;
    identifying a first-level connection sensor in the sensor graph for the second sensor; and
    using the first-level connection sensor as the correlated sensor.
  19. An apparatus comprising:
    a processor; and
    a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations of:
    retrieving raw sensor data collected by a plurality of sensors, identifying a plurality of events based on the raw sensor data, and
    building a sensor graph based on the plurality of events, the sensor graph having nodes representing the plurality of sensors and a set of edges connecting the nodes, each edge in the set of edges associated with a weight calculated based on a number of correlated events detected in the plurality of events.
  20. The apparatus of claim 19, the operations further including querying the sensor graph in response to a new event associated with a sensor in the plurality of sensors, the querying comprising identifying at least one correlated sensor connected to the sensor in the sensor graph.
  21. The apparatus of claim 19, the identifying a plurality of events comprising:
    decomposing the raw sensor data into trend, seasonal, and remainder components; and
    identifying the plurality of events based on one or more of the trend, seasonal, and remainder components.
  22. The apparatus of claim 19, the building a sensor graph comprising:
    building an event graph based on the plurality of events, the event graph storing event types, times, and sensor identifiers as nodes; and
    building the sensor graph based on the event graph.
  23. The apparatus of claim 19, the querying the sensor graph comprising:
    detecting a second event for a second sensor;
    identifying a first-level connection sensor in the sensor graph for the second sensor; and
    using the first-level connection sensor as the correlated sensor.
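
By way of non-limiting illustration only, and without limiting the claims, the building of a sensor graph whose edge weights reflect a number of correlated events might be sketched as follows. The claims do not define when two events are correlated; this sketch assumes two events are correlated when they have the same type, come from different sensors, and occur within a fixed time window. The event layout, the 60-second default, and the function name are likewise assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Event = Tuple[str, str, float]  # (sensor_id, event_type, timestamp in seconds); hypothetical


def build_sensor_graph(events: List[Event], window_s: float = 60.0) -> Dict[FrozenSet[str], int]:
    """Map each undirected sensor pair to the count of correlated events between them."""
    weights: Dict[FrozenSet[str], int] = defaultdict(int)
    for (s_a, t_a, ts_a), (s_b, t_b, ts_b) in combinations(events, 2):
        # Assumed correlation rule: same type, different sensors, close in time.
        if s_a != s_b and t_a == t_b and abs(ts_a - ts_b) <= window_s:
            weights[frozenset((s_a, s_b))] += 1
    return dict(weights)


events = [
    ("s1", "spike", 0.0),
    ("s2", "spike", 10.0),
    ("s1", "spike", 300.0),
    ("s2", "spike", 310.0),
    ("s3", "dip", 5.0),
]
```

The pairwise comparison is quadratic in the number of events; a production system would likely bucket events by time before comparing them.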
PCT/CN2019/116370 2019-11-07 2019-11-07 Data-driven graph of things for data center monitoring copyright notice WO2021087896A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/116370 WO2021087896A1 (en) 2019-11-07 2019-11-07 Data-driven graph of things for data center monitoring copyright notice
CN201980099754.8A CN114365505A (en) 2019-11-07 2019-11-07 Data-driven object graph for data center monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/116370 WO2021087896A1 (en) 2019-11-07 2019-11-07 Data-driven graph of things for data center monitoring copyright notice

Publications (1)

Publication Number Publication Date
WO2021087896A1 true WO2021087896A1 (en) 2021-05-14

Family

ID=75848677

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116370 WO2021087896A1 (en) 2019-11-07 2019-11-07 Data-driven graph of things for data center monitoring copyright notice

Country Status (2)

Country Link
CN (1) CN114365505A (en)
WO (1) WO2021087896A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350508A (en) * 2023-10-31 2024-01-05 深圳市黑云精密工业有限公司 Production work order distribution system based on real-time acquisition data of production line collector

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090138590A1 (en) * 2007-11-26 2009-05-28 Eun Young Lee Apparatus and method for detecting anomalous traffic
CN106254130A (en) * 2016-08-25 2016-12-21 华青融天(北京)技术股份有限公司 A kind of data processing method and device
WO2019160433A1 (en) * 2018-02-15 2019-08-22 Siemens Aktiengesellschaft Method and device for processing transient events in a distribution network

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101808289B (en) * 2010-04-07 2012-09-05 上海交通大学 Method for acquiring data of wireless sensor network based on mobile sink node
EP2725552A1 (en) * 2012-10-29 2014-04-30 ATS Group (IP Holdings) Limited System and method for selecting sensors in surveillance applications
US9600550B2 (en) * 2013-03-15 2017-03-21 Uda, Llc Optimization for real-time, parallel execution of models for extracting high-value information from data streams
CN103561420B (en) * 2013-11-07 2016-06-08 东南大学 Method for detecting abnormality based on data snapshot figure
EP2919124A1 (en) * 2014-03-12 2015-09-16 Haltian Oy Relevance determination of sensor event
US9632846B2 (en) * 2015-04-02 2017-04-25 Microsoft Technology Licensing, Llc Complex event processor for historic/live/replayed data
CN106560824A (en) * 2015-09-30 2017-04-12 中兴通讯股份有限公司 Event detection method, device and system
US10410129B2 (en) * 2015-12-21 2019-09-10 Intel Corporation User pattern recognition and prediction system for wearables
CN105764162B (en) * 2016-05-10 2019-05-17 江苏大学 A kind of wireless sensor network accident detection method based on more Attribute Associations
CN109791585B (en) * 2016-09-19 2023-10-10 西门子股份公司 Detecting network attacks affecting computing devices computer-implemented method and system of
US10447526B2 (en) * 2016-11-02 2019-10-15 Servicenow, Inc. Network event grouping
CN108173670B (en) * 2016-12-07 2020-06-02 华为技术有限公司 Method and device for detecting network
US10771486B2 (en) * 2017-09-25 2020-09-08 Splunk Inc. Systems and methods for detecting network security threat event patterns
CN107544450B (en) * 2017-10-11 2019-06-21 齐鲁工业大学 Process industry network model construction method and system based on data
CN108829794B (en) * 2018-06-04 2022-04-12 北京交通大学 Alarm analysis method based on interval graph
CN109361728B (en) * 2018-08-30 2021-01-29 中国科学院上海微系统与信息技术研究所 Hierarchical event reporting system and method based on multi-source sensing data relevance
CN109600378B (en) * 2018-12-14 2021-04-20 中国人民解放军战略支援部队信息工程大学 Heterogeneous sensor network abnormal event detection method without central node
CN110059238A (en) * 2019-04-19 2019-07-26 上海应用技术大学 Emergency event sensor network construction method

Also Published As

Publication number Publication date
CN114365505A (en) 2022-04-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19951640

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19951640

Country of ref document: EP

Kind code of ref document: A1