US10706080B2

US10706080B2 - Event clustering and event series characterization based on expected frequency

Info

Publication number: US10706080B2
Application number: US15/720,779
Authority: US
Inventors: Conrad M. Albrecht; Marcus Freitag; Theodore Van Kessel; Siyuan Lu; Hendrik F. Hamann
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2017-09-29
Filing date: 2017-09-29
Publication date: 2020-07-07
Also published as: US20190102449A1

Abstract

A method for clustering time stamps in time series data includes receiving a one-dimensional array of ordered timestamps and an expected frequency; determining a set of time intervals; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency; determining a second binary array of differences between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to one of a set of opening interval bounds, a set of closing interval bounds, or a set of isolated points; merging the set of opening interval bounds and the set of closing interval bounds into a set of cluster intervals τ; and outputting the set of cluster intervals and the set of isolated points.

Description

TECHNICAL FIELD

Embodiments of the present disclosure are directed to a one-dimensional, one-parameter clustering method with linear complexity on input that quantifies data availability or measures the reliability/stability of a device connecting network, such as IoT.

DISCUSSION OF THE RELATED ART

The concept of the Internet of Things (IoT) is intimately related to records of certain events, e.g. a network-attached device capturing weather information to be broadcast to other devices for processing. Given (a) a transmitter that sends out such information once every time interval ΔT, and (b) a receiver that keeps track of the timestamps when data was transmitted/recorded, a one-dimensional time series t={t_i}, i=1, . . . , N stores information on failures of the recording/sending/transmission/receiving.

If the one-dimensional data t is clustered such that consecutive events are not more than ΔT apart, periods in time where data might be missing can be inferred. Upon detection, corresponding actions, such as retransmission, data interpolation, etc., can be performed. Moreover, the characteristics of intervals of no data, e.g., relative frequency, duration, etc., can help diagnose the status of the communication network.

Since general purpose, multi-dimensional clustering methods such as Fisher's discriminant, k-means clustering, or more generally expectation-maximization (EM) do not exploit the special property of ordering in one dimension, a simpler approach can be used that avoids density estimation and does not need knowledge of the number of clusters/intervals.

SUMMARY

Exemplary embodiments of the present disclosure are directed to event clustering for application in IoT service quality characterization using a computer-implemented method with time complexity O(N). Embodiments exploit the natural ordering of a sequence of timestamps recorded by a receiver device to reduce computational complexity by a factor of O(log N). Given an expected frequency ΔT⁻¹, an O(N)-efficient clustering procedure of N events represented by an ordered series of timestamps t₁, t₂, . . . , t_N can identify time intervals of missing data, locate isolated events, and characterize a series of events by varying ΔT to determine the quality of service over an IoT network.

According to an embodiment of the disclosure, there is provided a computer-implemented method for clustering time stamps in time series data, including the steps of: receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t; determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT; determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array element is equal to −1; appending an ith timestamp to a set of closing interval bounds τ+ if a corresponding second binary array element is equal to 1; appending an ith timestamp to a set of isolated points x if a next corresponding first binary array value is equal to 1; merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ; and outputting the set of cluster intervals τ and the set of isolated points x.

According to a further embodiment of the disclosure, the electrically connected network is an Internet-of-things network.

According to a further embodiment of the disclosure, the electrically connected network is a network of satellites.

According to a further embodiment of the disclosure, the electrically connected network is a direct link to a database.

According to a further embodiment of the disclosure, the method includes determining a first service quality measure that measures a reliability of the electrically connected network from

C_{t}^{o}(f) = \frac{1}{Δt} \sum_{k} |τ_{k}| = \begin{cases} 0, & ΔT \leq 0 \\ 0 \dots 1, & 0 < ΔT < Δt \\ 1, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, and f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), wherein C_t^o(f) is a fraction of time with no operation failures in the electrically connected network.

According to a further embodiment of the disclosure, the method includes determining a second service quality measure that is a normed number of clusters from

C_{t}^{n}(f) = 2 \frac{|τ| - δ_{1|τ|}}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |τ| is a number of clusters, and |t| is a number of time stamps.

According to a further embodiment of the disclosure, the method includes determining a third service quality measure that is a measure of a number of isolated events from

C_{t}^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 1, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases},

wherein Δt=t_N−t₁ is a total time interval, f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |x| is a number of isolated points, and |t| is a number of time stamps.

According to a further embodiment of the disclosure, the method includes determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from
τ^±={τ_k^±: k<k′ ⇒ τ_k^±<τ_k′^±}, k=1, . . . , K≤N/2−1,
and determining cluster intervals as τ_k=[τ_{k−1}⁻, τ_k⁺], k=1, . . . , K.

According to a further embodiment of the disclosure, letting b represent the first binary array, b[0]=1 and b[N]=1.

According to an embodiment of the disclosure, there is provided a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an exemplary system for implementing a method of clustering a one-dimensional (1-D) time series of events, according to an embodiment of the disclosure.

FIG. 2 is a sample plot of IoT service quality measures from event time series obtained by varying ΔT, according to an embodiment of the disclosure.

FIG. 3 is a flowchart of a method of clustering a 1-D time series of events, according to an embodiment of the disclosure.

FIG. 4 is a schematic of an exemplary cloud computing node that implements an embodiment of the disclosure.

FIG. 5 shows an exemplary cloud computing environment according to embodiments of the disclosure.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Exemplary embodiments of the disclosure as described herein generally provide systems and methods for clustering and characterizing signal events on the IoT. While embodiments are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.

Embodiments of the disclosure provide a method to cluster timestamps based on an expected frequency of a service, e.g., an IoT domain, satellite images, etc., and output information that can be used to characterize the quality of a service, and thus rate reliability/stability of the service.

Further embodiments of the disclosure provide methods for randomly probing and clustering time series of point data satellite imagery stored in a database given the latitude and longitude coordinates, and to efficiently estimate the availability of geo-spatial data, given the constraint that it is practically intractable to scan the whole database.

One-Dimensional Clustering

Given a set of N ordered timestamps t={t_i}, i=1, . . . , N, i.e.
i≤j ⇒ t_i≤t_j (1)
and an expected time interval ΔT, time intervals τ_k are defined such that
t_i, t_{i+1} ∈ τ_k=[τ_k⁻, τ_k⁺] ⇒ δt_i=t_{i+1}−t_i≤ΔT. (2)
Note that ΔT might be an external parameter to an algorithm that provides a solution or it can be defined by t itself, e.g. through

〈 δ t 〉 = \frac{1}{N} \sum_{i} δ t_{i} .

The δt_i need to be computed in any case, and therefore ⟨δt⟩ is efficiently determined along these lines.

To fulfill EQ. (2), of course, at least N−1 time intervals need to be computed:
δt={δt_i=t_{i+1}−t_i}, i=1, . . . , N−1. (3)
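As a minimal sketch, the intervals of EQ. (3) and a data-derived ΔT could be computed in a single pass over t. The function names `gaps` and `mean_gap` are illustrative and not part of the disclosure; the normalization by N follows the formula for ⟨δt⟩ as printed above.

```python
from typing import List, Sequence


def gaps(t: Sequence[float]) -> List[float]:
    """Return the N-1 intervals dt_i = t_{i+1} - t_i of EQ. (3)."""
    return [t[i + 1] - t[i] for i in range(len(t) - 1)]


def mean_gap(t: Sequence[float]) -> float:
    """Average interval <dt>, normalized by N as in the formula above."""
    if not t:
        raise ValueError("t must contain at least one timestamp")
    return sum(gaps(t)) / len(t)
```

A value such as `mean_gap(t)` could then serve as the expected interval ΔT when no external value is supplied.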

Whenever a new time series point t_{N+1}>t_N is (randomly) added, there is no a priori way of determining from the existing t_{i≤N} whether ΔT is exceeded.

To classify the t_i as interval bounds τ_k^±, note that the binary sequence
b={b_i=int(δt_i>ΔT)}, i=1, . . . , N−1 (4)
switches from 1 to 0 for an opening interval bound τ⁻, and from 0 to 1 for a closing interval bound τ⁺, only. The function int( ) has values int(True)→1 and int(False)→0. Hence the quantity
B={B_i=b_i−b_{i−1}}, i=2, . . . , N−1 with B_i∈{−1,0,1} (5)
yields the desired association
B_i=±1 ⇒ t_i∈τ^±. (6)
Based on the property that the t_i are ordered, as expressed by EQ. (1), the binary (discrete) function b_i implies the alternating property:
∀i<j: 1=B_i=B_j ⇒ ∃ i<l<j: B_l=−1. (7)
Thus, linearly scanning through the t_i and their corresponding B_i results in two sets
τ^±={τ_k^±: k<k′ ⇒ τ_k^±<τ_k′^±}, k=1, . . . , K≤N/2−1 (8)
that can be interleaved to obtain the corresponding time intervals as a solution, EQ. (2).
Boundary Conditions

According to embodiments of the disclosure, there several options for interleaving the τ^±, which depend on boundary conditions. More specifically, assume that the sequence t₁, t₂, . . . starts with, e.g., intervals that are smaller than ΔT. In this case, τ₁ ⁻>τ₁ ⁺, and a τ₀ ⁻ one needs to manually added to construct intervals:
τ_k=[τ_k−1 ⁻,τ_k ⁺]. (9)
A corresponding issue might occur at the end of the time series {t_i}, depending on whether K⁺=|τ⁺| is equal or not equal to K⁻=|τ⁻|. Note that by virtue of EQ. (7) the difference |K⁺−K⁻| is at most 1. However, due to the fact that |B|=N−2≠|t|=N, boundary values τ^± need to be manually added.

According to embodiments, to prevent manually dealing with all (four) different boundary condition scenarios, the following timestamps can be added from the outset:
t₀=−∞ and t_{N+1}=+∞. (10)
Hence, δt₀=δt_N=+∞, and therefore
b₀=b_N=1 (11)
which yields N values B_i corresponding to the N timestamps t_i for classification such that
τ={τ_k=[τ_k⁻, τ_k⁺]}, k=1, . . . , K (12)
from
τ^±={τ_k^±: k<k′ ⇒ τ_k^±<τ_k′^±}, k=1, . . . , K (13)
with |τ|=K≤N/2.
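The effect of the sentinel timestamps of EQ. (10) can be sketched as follows. This is an illustrative Python fragment under the assumption of zero-based indexing, where `b[i]` flags the gap preceding `t[i]`; the function name `indicator_arrays` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def indicator_arrays(t: Sequence[float], dT: float) -> Tuple[List[int], List[int]]:
    """Return (b, B) for ordered timestamps t padded with -inf and +inf.

    b has N+1 entries: b[i] is 1 if the gap preceding t[i] exceeds dT,
    so b[0] and b[N] are 1 because the sentinel gaps are infinite.
    B has N entries: B[i] = b[i+1] - b[i] belongs to timestamp t[i].
    """
    padded = [-math.inf, *t, math.inf]
    b = [int(padded[i + 1] - padded[i] > dT) for i in range(len(padded) - 1)]
    B = [b[i + 1] - b[i] for i in range(len(t))]
    return b, B
```

With the padding in place, every timestamp has exactly one entry in B, so no boundary case needs separate handling.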
Isolated Points

According to embodiments, given the solution EQ. (12), due to the ordering of the t_i, open intervals
τ̄={τ̄_k=(τ_k⁺, τ_{k+1}⁻)}_{k=1, . . . , K−1}, (14)
can be formed that are associated with time intervals of failure. Note that [t₁, t_N]=τ∪τ̄. However, these intervals do not imply
∀i,k: t_i∉τ̄_k, (15)
i.e., informally, it is not true that no event occurs during the intervals τ̄, but it is true that
t_i∈τ̄_k ⇒ |t_i−t_{i±1}|>ΔT (16)
where t_i is referred to as an isolated event. These timestamps can be considered to be noise, while all t_i∈τ^± are border points.

According to embodiments, isolated events have b_i=1 and since they are not interval boundary points, they need to have B_i=0. This way b and B can be used to classify isolated events according to
B_i=0 ∧ b_i=1 ⇒ t_i∈x, (17)
where x denotes the set of isolated timestamps. Likewise, clustered timestamps can be defined as
B_i=0 ∧ b_i=0 ⇒ t_i∈x̄. (18)
Since b is binary and EQ. (5) holds for B, all t_i are uniquely classified, i.e. t=τ⁺∪τ⁻∪x∪x̄. According to embodiments, there is an association between x̄ and τ, and between x and τ̄, in the sense that all t_i∈x̄ are within an interval of τ and all t_i∈x are within an interval of τ̄.
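The four-way classification of EQS. (6), (17) and (18) can be illustrated with a short Python sketch. The function name `classify` and the dictionary keys are assumptions made for illustration; zero-based indexing with sentinel gaps is assumed, so that `b[i]` and `b[i + 1]` flag the gaps before and after `t[i]`.

```python
import math
from typing import Dict, List, Sequence


def classify(t: Sequence[float], dT: float) -> Dict[str, List[float]]:
    """Assign every timestamp to exactly one of four classes."""
    padded = [-math.inf, *t, math.inf]
    b = [int(padded[i + 1] - padded[i] > dT) for i in range(len(padded) - 1)]
    classes: Dict[str, List[float]] = {
        "opening": [], "closing": [], "isolated": [], "clustered": []}
    for i, ti in enumerate(t):
        B_i = b[i + 1] - b[i]
        if B_i == -1:            # large gap before, small gap after
            classes["opening"].append(ti)
        elif B_i == 1:           # small gap before, large gap after
            classes["closing"].append(ti)
        elif b[i + 1] == 1:      # large gaps on both sides
            classes["isolated"].append(ti)
        else:                    # small gaps on both sides
            classes["clustered"].append(ti)
    return classes
```

Because the four branches are mutually exclusive and exhaustive, the class sizes always sum to the number of timestamps.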
Implementation & Computational Complexity

FIG. 1 is a block diagram of an exemplary system that implements a method for clustering a 1-D time series of events, according to an embodiment of the disclosure. Referring now to the figure, an information service 11, such as an IoT device or a geo-spatial database, transmits a time series 12 of information packets through a network, such as, e.g., the Internet, which employs physical networks such as, e.g., a WiFi network, the Ethernet, etc., to an event cluster engine 13. The event cluster engine 13 includes a frequency detector 13.5 that can receive an event frequency ΔT that was manually input by a user or automatically determined from an average of the event intervals in the data. The event cluster engine 13 can store the time series data in physical storage 14, such as memory or a hard drive, etc. The time series data includes cluster intervals demarcated by respective start and end boundaries τ_n⁻ and τ_n⁺, τ_{n+1}⁻ and τ_{n+1}⁺, etc., and isolated points t_m, t_{m+1}, t_{m+2}, etc. A user can interact with the event cluster engine 13 via a user interface 15, such as a RESTful API service, to, e.g., generate output 18, such as a JSON text file. The event cluster engine 13 is also connected to a cluster measure engine 16 that can calculate various cluster measures, as described below. The cluster measures can be sent to a service monitor 17, such as, e.g., Ganglia (http://ganglia.info/) or Nagios (https://www.nagios.com/), that can determine the reliability or stability of the information service 11 based on the cluster measures, and generate suitable warning messages 19, such as email or Apache Kafka (https://kafka.apache.org/) messages, etc.

The following listing provides a pseudo-code implementation according to an embodiment of a method of clustering a 1-D time series of events, with reference to step numbers of the flowchart of FIG. 3.


algorithm cluster_events is
    input: one-dimensional array t of ordered timestamps to cluster,
           expected frequency dT
    output: set of cluster intervals tau, list of isolated timestamps x
    define ordered list tauMinus, tauPlus, x
    define set tau
    define array dt, b, B

    N = length of t    // Step 30
    for i from 1 to N−1 do dt[i] = t[i]−t[i−1]    // Step 31
    b[0] = 1    // Step 32
    for i from 1 to N−1 do
        if dt[i] > dT then b[i] = 1
        else b[i] = 0
    b[N] = 1
    for i from 0 to N−1 do B[i] = b[i+1]−b[i]    // Step 33
    for i from 0 to N−1 do    // Step 34
        if B[i] == −1 then append t[i] to tauMinus
        else if B[i] == 1 then append t[i] to tauPlus
        else if b[i+1] == 1 then append t[i] to x
    for i from 0 to (length of tauMinus)−1 do    // Step 35
        add interval [tauMinus[i], tauPlus[i]] to tau
    return tau, x    // Step 36

For example, a call of cluster_events(t, dT) on

t=[−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]

given

dT as −1, 0, 1, 10, 100,

which corresponds to ΔT, returns output equivalent to

[ ], [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]

[[202, 202]], [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 203]

[(1, 2.9), (10, 11), (202, 203)], [−20, −18, 100, 200]

[(−20, −18), (1, 11), (200, 203)], [100]

[(−20, 203)], [ ]

[(−20, 11), (200, 203)], [100]

respectively. In the above output listing, the bracketed quantities on the left are the cluster intervals τ, and the bracketed quantities on the right are the isolated points x. Square brackets with comma separated elements denote list data structures. In the first line, τ is empty, as indicated by the [ ], and thus each time stamp is classified as an isolated point.
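For illustration, the pseudo-code listing can be transcribed into Python roughly as follows. This is a sketch rather than a definitive implementation: it assumes zero-based indexing, represents each cluster interval as a `(start, end)` tuple, and builds the first binary array directly from adjacent differences instead of storing dt separately.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def cluster_events(t: Sequence[float], dT: float) -> Tuple[List[Interval], List[float]]:
    """Cluster ordered timestamps t whose adjacent gaps do not exceed dT.

    Returns (tau, x): closed cluster intervals and isolated timestamps.
    """
    n = len(t)
    # b[i] flags the gap before t[i]; b[0] and b[n] stand for the
    # infinite sentinel gaps at both ends of the series.
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    tau_minus: List[float] = []
    tau_plus: List[float] = []
    x: List[float] = []
    for i in range(n):
        B_i = b[i + 1] - b[i]
        if B_i == -1:
            tau_minus.append(t[i])   # opening interval bound
        elif B_i == 1:
            tau_plus.append(t[i])    # closing interval bound
        elif b[i + 1] == 1:
            x.append(t[i])           # isolated point
    # Opening and closing bounds alternate, so they pair up in order.
    return list(zip(tau_minus, tau_plus)), x
```

Applied to the example series t above, this sketch reproduces the listed outputs for dT = −1, 0, 1, 10 and 100, e.g. `cluster_events(t, 10)` yields the intervals (−20, −18), (1, 11), (200, 203) and the isolated point 100.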

According to an embodiment of the disclosure, the above pseudo-code listing can be implemented in any suitable computer language, such as Python or C/C++. An efficient implementation in C/C++ uses N−1 algebraic operations for δt, N−2 logical operations for b and again N algebraic operations for B which determines the interval boundary classification with a total of 3(N−1) operations.

A naive approach would compute two time intervals for each t_i, and perform two logical operations of those against ΔT, which would determine the classification, hence 4N computations.

As a further example according to an embodiment, a sample JSON load for a query to the user interface 15 of FIG. 1 in the case of a data availability service of geo-spatial satellite images in a database such as IBM PAIRS (https://pairs.res.ibm.com) may be:


{ "query": {
    "datalayer-IDs": [23, 15100, 3],
    "data-frequencies": [3800, 90000, 2800000],
    "start-UTC-time": [2016, 10, 1],
    "aoi": { "type": "Polygon",
             "coordinates": [[[-73.9, 40.5], [-73.8, 40.55],
                              [-73.85, 40.6], [-73.9, 40.5]]] },
    "explore": false,
    "end-UTC-time": [2016, 12, 31, 23, 59, 59]
} }

and a sample JSON text file response, output format 18 of FIG. 1, might look like, e.g.,

{ "layers-available": [{
    "time-intervals": [[1483001560, 1490001110],
                       [1653006000, 1680282350]],
    "avail-flag": 0,
    "datalayer-ID": 1234,
    "num-timestamps": 346,
    "timestamps": [893001560, 1093001560, 1260340121],
    "data-frequency": 31536000 }],
  "aoi": { "type": "Polygon",
           "coordinates": [[[-73.9, 40.5], [-73.8, 40.55],
                            [-73.85, 40.6], [-73.9, 40.5]]] }
}

In the above JSON listings:

“layers-available” is a list of existing layers with their temporal data coverage for an area of interest specified by “aoi”;

“time-intervals” is a collection of time intervals [start, end] in Unix epoch time, which is τ_k from the One-dimensional Clustering section above;

“avail-flag” indicates whether data is found for a specified time interval—0 indicates data is found;

“datalayer-ID” is a database datalayer ID associated with the coverage information;

“num-timestamps” is the number of distinct timestamps discovered on probing “aoi”;

“timestamps” is the list of isolated timestamps in Unix epoch time, as described in the Isolated Points section above;

“data-frequency” is the expected temporal interval ΔT in seconds in which a layer's data is available;

“aoi” refers to a spatial area of interest for which to check for data coverage;

“type” is the type of an aoi, which needs to be a simple polygon for this example;

“coordinates” are the longitude and latitude of the aoi polygon;

“datalayer-IDs” is a list of datalayer IDs to check data coverage for;

“data-frequencies” is a list of time periods ΔT of data in seconds on which data availability intervals are computed;

“start-UTC-time” is the data coverage start UTC time in the format of [year, month, day, hours, minutes, seconds] where hours range from 0 to 23;

“explore” triggers whether to randomly pick points in aoi for probing the time series; and

“end-UTC-time” is the data coverage end UTC time in the “start-UTC-time” format.
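As a hedged illustration of how clustering output might be arranged under the response keys described above, consider the following Python sketch. The helper name `layer_record` and its parameters are hypothetical; the “avail-flag” key is deliberately omitted because only its value 0 is specified in the text.

```python
import json
from typing import Dict, Sequence, Tuple


def layer_record(layer_id: int, dT: int, num_timestamps: int,
                 tau: Sequence[Tuple[int, int]],
                 x: Sequence[int]) -> Dict[str, object]:
    """Arrange cluster intervals tau and isolated points x for one layer."""
    return {
        "time-intervals": [[lo, hi] for lo, hi in tau],  # cluster intervals
        "datalayer-ID": layer_id,
        "num-timestamps": num_timestamps,
        "timestamps": list(x),                           # isolated points
        "data-frequency": dT,                            # expected interval in seconds
    }


if __name__ == "__main__":
    record = layer_record(1234, 31536000, 346,
                          [(1483001560, 1490001110)], [893001560])
    print(json.dumps({"layers-available": [record]}, indent=2))
```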

As another example, given geo-spatial areas, a method according to an embodiment can determine whether there is any data at all. A method according to an embodiment randomly probes a time series at various locations and merges the timestamps of all these probes together. Then a cluster_events method such as that detailed above is applied to the time series.
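The merging of probe results can be sketched in Python as below. The function name `merge_probes` is illustrative, each probe is assumed to be already sorted, and dropping duplicate timestamps is an assumption motivated by the “num-timestamps” field counting distinct timestamps.

```python
import heapq
from typing import Iterable, List, Sequence


def merge_probes(probes: Iterable[Sequence[float]]) -> List[float]:
    """Merge individually sorted probe series into one sorted, distinct series."""
    merged: List[float] = []
    for ts in heapq.merge(*probes):
        # heapq.merge yields values in ascending order, so duplicates are adjacent.
        if not merged or ts != merged[-1]:
            merged.append(ts)
    return merged
```

The merged series is ordered as required by EQ. (1) and can be passed directly to a clustering routine.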

Time Series Cluster Characterizations

According to embodiments, observe that for a given, fixed event series t with total time interval Δt=t_N−t₁, the quantity

C_{t}^{o}(f) = \frac{1}{Δt} \sum_{k} |τ_{k}| = \begin{cases} 0, & ΔT \leq 0 \\ 0 \dots 1, & 0 < ΔT < Δt \\ 1, & ΔT \geq Δt \end{cases} \quad (19)

where
f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt) (20)
computes the fraction of time with no operation failures. ΔT is fixed by the expected, logarithmic, and normalized event frequency f, i.e. f=0 represents the scale of frequency where all timestamps are equally spaced within the time series interval, f>0 corresponds to smaller scales, f<0 to larger ones.
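The conversion between ΔT and f in EQ. (20) can be sketched in Python. The base of the logarithm is not stated in the text; base 10 is assumed here because the discussion of FIG. 2 refers to orders of magnitude. The function and parameter names are illustrative.

```python
import math


def log_frequency(dT: float, n: int, span: float) -> float:
    """f = -log10(dT * |t| / span) per EQ. (20); requires dT > 0 and span > 0."""
    return -math.log10(dT * n / span)


def interval_from_log_frequency(f: float, n: int, span: float) -> float:
    """Invert EQ. (20): dT = (span / |t|) * 10**(-f)."""
    return span / n * 10.0 ** (-f)
```

Here `n` stands for |t| and `span` for Δt=t_N−t₁, so f=0 corresponds to ΔT equal to the mean spacing Δt/|t|.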

According to embodiments, scanning C_t^o by varying f provides a characteristic that quantifies the reliability of, e.g., an IoT service. It can be shown that C_t^o decreases monotonically as f increases. This is because for larger ΔT, i.e. smaller f, the clusters τ cover more of the whole time series. Due to EQ. (2), clusters never shrink in size for increasing ΔT; they either grow or merge into bigger clusters, letting the overall coverage increase.
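A minimal Python sketch of the coverage measure follows, assuming the cluster intervals are available as `(start, end)` pairs; the function name `coverage` is illustrative.

```python
from typing import Sequence, Tuple


def coverage(tau: Sequence[Tuple[float, float]], t: Sequence[float]) -> float:
    """C_t^o: summed cluster lengths divided by the total span t_N - t_1."""
    span = t[-1] - t[0]
    if span <= 0:
        raise ValueError("timestamps must span a positive time interval")
    return sum(hi - lo for lo, hi in tau) / span
```

For the example series above with dT=10, the three clusters have lengths 2, 10 and 3 out of a total span of 223, so the coverage is 15/223.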

According to embodiments, in a case where the time series is generated by a single, periodic data stream, a unit step function C_t^o(f)=Θ(−f) is obtained, i.e. Θ=1 for f<0 and Θ=0 for f>0. Nevertheless, similar information could be obtained by simply checking a histogram n(δt), cf. EQ. (3), that counts the number of intervals δt_i for some binning interval. In the case above, there would be a single peak in n(δt). Note that C_t^o contains information similar to

\frac{1}{Δ t} \int_{0}^{Δ T} n (δ t) d δ t .

However, a clustering output (τ, x) according to an embodiment provides information that n(δt) is blind to, because n(δt) does not account for the ordering of the δt_i. In particular,

C_{t}^{n}(f) = 2 \frac{|τ| - δ_{1|τ|}}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases} \quad (21)

provides a normed measure of the number of clusters, since the Kronecker delta δ_ij is 1 for i=j, 0 otherwise. It forces |τ|∈{0, 1} to result in C_t^n=0. While C_t^o just quantifies the total coverage of t by the clusters, C_t^n provides insight into whether the coverage is established by a number of patches or by a single/a few intervals with data frequency of at least ΔT⁻¹. This way conclusions may be drawn on, e.g., the reliability of an IoT service. Ideally, C_t^n≪1.
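The normed cluster count can be written as a short Python function; the name `normed_cluster_count` and its parameters are illustrative assumptions.

```python
def normed_cluster_count(num_clusters: int, num_timestamps: int) -> float:
    """C_t^n = 2 * (|tau| - delta_{1,|tau|}) / |t|."""
    if num_timestamps <= 0:
        raise ValueError("at least one timestamp is required")
    kronecker = 1 if num_clusters == 1 else 0  # delta_{1,|tau|}
    return 2 * (num_clusters - kronecker) / num_timestamps
```

Since each cluster contains at least two timestamps, |τ| is at most |t|/2 and the measure stays within [0, 1].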

Finally, according to embodiments, consider the number of isolated events

C_{t}^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 1, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}, \quad (22)

as an additional indicator of reliability, since isolated events are orthogonal to the information contained in τ. Isolated events can be classified as indicators of loose IoT service quality, and thus C_t^s should stay close to zero until it quickly increases to one for some f>0.
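The fraction of isolated events can be computed directly from the gaps, as in the following Python sketch. It assumes zero-based indexing with sentinel gaps at both ends; the function name `isolated_fraction` is illustrative.

```python
from typing import Sequence


def isolated_fraction(t: Sequence[float], dT: float) -> float:
    """C_t^s: share of timestamps whose gaps on both sides exceed dT."""
    n = len(t)
    if n == 0:
        raise ValueError("at least one timestamp is required")
    # b[i] flags the gap before t[i]; both end gaps count as infinite.
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    isolated = sum(1 for i in range(n) if b[i] == 1 and b[i + 1] == 1)
    return isolated / n
```

For the example series above, a negative dT marks every timestamp as isolated, while dT=100 leaves none.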

FIG. 2 illustrates these applications by plotting C_t^{o,n,s} for an event series t generated from 10⁴ uniformly random samples drawn from [0, 1] joined by 10³ equi-distant samples in [1, 10]. FIG. 2 is a sample plot of IoT service quality measures from event series t: C_t^o 21, C_t^n 22, and C_t^s 23 by varying ΔT. The series t comprises a burst of random events that covers approximately 10% of the total time range Δt. During the rest of the time the events are periodic at a rate of about 1/50Δt. Note that at f=0 there is little variance in C_t^{o,s}, indicating that there is no single dominant event frequency v₀=|t|/Δt. Moreover, there is a step in C_t^o at f≈−1 that covers 90% of its range, which refers to a dominant event frequency one order of magnitude lower than v₀. Since C_t^n≪1, it is concluded that this frequency is present along major time intervals within [t₁, t_N]. Also, C_t^s rapidly drops. Therefore, the existence of isolated events vanishes at time scales larger than ~ΔT/10v₀ such that there is a clean signal.

In contrast, C_t^n~1 for f≈1. Thus, for high-frequency events, increasing coverage of [t₁, t_N] is achieved by a number of isolated clusters, due to the random nature of the signal. Finally, for frequencies 3 orders of magnitude larger than v₀, C_t^s≈1, i.e. no more clustering of events occurs.
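A series of the kind described for FIG. 2 can be generated and scanned with the Python sketch below. It is an illustration under stated assumptions: the random seed, the function names, and the base-10 logarithm for f are not from the disclosure, and the coverage is computed by summing all gaps not exceeding ΔT, which equals the summed cluster lengths because each cluster is a maximal run of such gaps.

```python
import random
from typing import List


def synthetic_series(seed: int = 0) -> List[float]:
    """10**4 uniform samples in [0, 1] joined by 10**3 equidistant samples in [1, 10]."""
    rng = random.Random(seed)
    burst = [rng.random() for _ in range(10**4)]
    periodic = [1.0 + 9.0 * k / 999 for k in range(10**3)]
    return sorted(burst + periodic)


def coverage_at(t: List[float], f: float) -> float:
    """C_t^o(f): total length of gaps not exceeding dT, divided by the span."""
    span = t[-1] - t[0]
    dT = span / len(t) * 10.0 ** (-f)  # invert EQ. (20), base 10 assumed
    gaps = (t[i + 1] - t[i] for i in range(len(t) - 1))
    return sum(g for g in gaps if g <= dT) / span


if __name__ == "__main__":
    series = synthetic_series()
    for f in (-2.0, -1.0, -0.5, 0.0, 1.0, 3.0):
        print(f, round(coverage_at(series, f), 3))
```

Scanning a finer grid of f values in the same way would trace out a curve of the kind plotted for C_t^o.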

System Implementations

It is to be understood that embodiments of the present disclosure can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof. In one embodiment, the present disclosure can be implemented in software as an application program tangibly embodied on a computer readable program storage device. The application program can be uploaded to, and executed by, a machine comprising any suitable architecture. Furthermore, it is understood in advance that although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present disclosure are capable of being implemented in conjunction with any other type of computing environment now known or later developed. An automatic troubleshooting system according to an embodiment of the disclosure is also suitable for a cloud implementation.

Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.

Characteristics are as follows:

On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.

Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).

Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).

Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.

Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the provider and consumer of the utilized service.

Service Models are as follows:

Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.

Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).

Deployment Models are as follows:

Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.

Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.

Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.

Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.

Referring now to FIG. 4, a schematic of an example of a cloud computing node is shown. Cloud computing node 410 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the disclosure described herein. Regardless, cloud computing node 410 is capable of being implemented and/or performing any of the functionality set forth herein above.

In cloud computing node 410 there is a computer system/server 412, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 412 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.

Computer system/server 412 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 412 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

As shown in FIG. 4, computer system/server 412 in cloud computing node 410 is shown in the form of a general-purpose computing device. The components of computer system/server 412 may include, but are not limited to, one or more processors or processing units 416, a system memory 428, and a bus 418 that couples various system components including system memory 428 to processor 416.

Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer system/server 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 412, and it includes both volatile and non-volatile media, removable and non-removable media.

System memory 428 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. Computer system/server 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 434 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 418 by one or more data media interfaces. As will be further depicted and described below, memory 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.

Program/utility 440, having a set (at least one) of program modules 442, may be stored in memory 428 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 442 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.

Computer system/server 412 may also communicate with one or more external devices 414 such as a keyboard, a pointing device, a display 424, etc.; one or more devices that enable a user to interact with computer system/server 412; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 412 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 422. Still yet, computer system/server 412 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 420. As depicted, network adapter 420 communicates with the other components of computer system/server 412 via bus 418. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 412. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.

Referring now to FIG. 5, illustrative cloud computing environment 50 is depicted. As shown, cloud computing environment 50 comprises one or more cloud computing nodes 400 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate. Nodes 400 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 54A-N shown in FIG. 5 are intended to be illustrative only and that computing nodes 400 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).

While embodiments of the present disclosure have been described in detail with reference to exemplary embodiments, those skilled in the art will appreciate that various modifications and substitutions can be made thereto without departing from the spirit and scope of the disclosure as set forth in the appended claims.

Claims

What is claimed is:

1. A computer-implemented method of clustering time stamps in time series data, comprising the steps of:

receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t;

determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps;

determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT;

determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array;

appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array is equal to −1;

appending an ith timestamp to a set of closing interval bounds τ+ if a corresponding second binary array is equal to 1;

appending an ith timestamp to a set of isolated points x if a next corresponding first binary array value is equal to 1;

merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ;

outputting the set of cluster intervals τ and the set of isolated points x; and

using said set of cluster intervals and said set of isolated points to determine a reliability of said electrically connected network and generate messages indicative of the reliability of said electrically connected network.
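By way of illustration only, and not limitation, the steps recited above can be sketched in Python as follows. The function name `cluster_timestamps` and its variable names are hypothetical and do not appear in the disclosure. The sketch assumes that a first binary array value of 1 marks an interval strictly greater than dT, that the first binary array is padded with 1 at both ends (as in claim 9), and that the k-th opening bound is paired with the k-th closing bound when merging.

```python
def cluster_timestamps(t, dT):
    """Split ordered timestamps t into cluster intervals and isolated points.

    Illustrative sketch only; names and the strict '>' comparison are assumptions.
    """
    # Time interval between each pair of adjacent timestamps.
    gaps = [later - earlier for earlier, later in zip(t, t[1:])]

    # First binary array: 1 where a gap exceeds dT, padded with 1 at both ends,
    # so b[j] describes the gap before t[j] and b[j + 1] the gap after it.
    b = [1] + [1 if gap > dT else 0 for gap in gaps] + [1]

    # Second array: difference between adjacent elements of b (values -1, 0, 1).
    d = [b[j + 1] - b[j] for j in range(len(t))]

    opening, closing, isolated = [], [], []
    for j, stamp in enumerate(t):
        if d[j] == -1:        # large gap before, small gap after: cluster opens
            opening.append(stamp)
        elif d[j] == 1:       # small gap before, large gap after: cluster closes
            closing.append(stamp)
        elif b[j + 1] == 1:   # large gaps on both sides: isolated point
            isolated.append(stamp)

    # Merge the k-th opening bound with the k-th closing bound.
    clusters = list(zip(opening, closing))
    return clusters, isolated


clusters, isolated = cluster_timestamps([0, 1, 2, 10, 20, 21, 22, 23, 40], dT=2)
print(clusters)   # [(0, 2), (20, 23)]
print(isolated)   # [10, 40]
```

Because the padded array begins and ends with 1, the differences sum to zero, so every opening bound has a matching closing bound. The sketch makes a constant number of passes over the input, in keeping with the linear complexity stated in the technical field.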

2. The method of claim 1, wherein the electrically connected network is an Internet-of-things network.

3. The method of claim 1, wherein the electrically connected network is a network of satellites.

4. The method of claim 1, wherein the electrically connected network is a direct link to a database.

5. The method of claim 1, further comprising determining a first service quality measure that measures a reliability of the electrically connected network from

C_{t}^{o}(f) = \frac{1}{Δt} \sum_{k} |τ_{k}| = \begin{cases} 1, & ΔT \leq 0 \\ 0 \dots 1, & 0 < ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, and f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), wherein C_t^o(f) is a fraction of time with no operation failures in the electrically connected network.

6. The method of claim 1, further comprising determining a second service quality measure that is a normed number of clusters from

C_{t}^{n}(f) = 2 \frac{|τ| - δ_{1,|τ|}}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |τ| is a number of clusters, and |t| is a number of time stamps.

7. The method of claim 1, further comprising determining a third service quality measure that is a measure of a number of isolated events from

C_{t}^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases},

wherein Δt=t_N−t₁ is a total time interval, f=log[ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |x| is a number of isolated points, and |t| is a number of time stamps.
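By way of illustration only, the three service quality measures of claims 5 to 7 can be computed from a set of cluster intervals and a set of isolated points as sketched below. The function name `quality_measures`, the dictionary keys, and the use of the natural logarithm for f are assumptions made for this sketch; the claims do not specify a logarithm base. The sketch also assumes that |τ_k| denotes the length of the k-th cluster interval and that δ is the Kronecker delta.

```python
import math


def quality_measures(t, clusters, isolated, dT):
    """Evaluate C^o, C^n, C^s and f for one value of dT (illustrative sketch)."""
    n_stamps = len(t)                       # |t|
    if n_stamps < 2:
        raise ValueError("at least two timestamps are required")
    total = t[-1] - t[0]                    # Δt = t_N − t_1
    if total <= 0 or dT <= 0:
        raise ValueError("Δt and dT must be positive")

    n_clusters = len(clusters)              # |τ|
    kronecker = 1 if n_clusters == 1 else 0 # δ_{1,|τ|}

    # C^o: summed cluster lengths as a fraction of the total time interval.
    c_o = sum(close - open_ for open_, close in clusters) / total
    # C^n: normed number of clusters.
    c_n = 2 * (n_clusters - kronecker) / n_stamps
    # C^s: fraction of timestamps that are isolated points.
    c_s = len(isolated) / n_stamps
    # f = −log(ΔT·|t|/Δt); natural logarithm assumed.
    f = -math.log(dT * n_stamps / total)

    return {"C_o": c_o, "C_n": c_n, "C_s": c_s, "f": f}


t = [0, 1, 2, 10, 20, 21, 22, 23, 40]
measures = quality_measures(t, clusters=[(0, 2), (20, 23)], isolated=[10, 40], dT=2)
print(measures["C_o"])   # 0.125
```

In this example the two clusters cover 5 of the 40 time units, giving C^o = 0.125, while C^n = 4/9 and C^s = 2/9.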

8. The method of claim 1, further comprising determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from

τ^± = {τ_k^± : k<k′ ⇒ τ_k^± < τ_{k′}^±}, k=1, . . . , K ≤ N/2−1,

and determining cluster intervals as τ_k=[τ_{k−1}^−, τ_k^+], k=1, . . . , K.

9. The method of claim 1, wherein, letting b represent the first binary array, b[0]=1 and b[N]=1.

10. A non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data, the method comprising the steps of:

merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ; and

outputting the set of cluster intervals τ and the set of isolated points x; and using said set of cluster intervals and said set of isolated points to determine a reliability of said electrically connected network and generate messages indicative of the reliability of said electrically connected network.

11. The computer readable program storage device of claim 10, wherein the electrically connected network is an Internet-of-things network.

12. The computer readable program storage device of claim 10, wherein the electrically connected network is a network of satellites.

13. The computer readable program storage device of claim 10, wherein the electrically connected network is a direct link to a database.

14. The computer readable program storage device of claim 10, the method further comprising determining a first service quality measure that measures a reliability of the electrically connected network from

C_{t}^{o}(f) = \frac{1}{Δt} \sum_{k} |τ_{k}| = \begin{cases} 1, & ΔT \leq 0 \\ 0 \dots 1, & 0 < ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, and f=log

15. The computer readable program storage device of claim 10, the method further comprising determining a second service quality measure that is a normed number of clusters from

C_{t}^{n}(f) = 2 \frac{|τ| - δ_{1,|τ|}}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases}

wherein Δt=t_N−t₁ is a total time interval, f=log

16. The computer readable program storage device of claim 10, the method further comprising determining a third service quality measure that is a measure of a number of isolated events from

C_{t}^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 0, & ΔT < 0 \\ 0 \dots 1, & 0 \leq ΔT < Δt \\ 0, & ΔT \geq Δt \end{cases},

wherein Δt=t_N−t₁ is a total time interval, f=log

17. The computer readable program storage device of claim 10, the method further comprising determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from

18. The computer readable program storage device of claim 10, wherein, letting b represent the first binary array, b[0]=1 and b[N]=1.

19. The method of claim 4, further comprising using said set of cluster intervals to determine an availability of data on said database, when scanning said database is intractable.

20. The computer readable program storage device of claim 13, the method further comprising using said set of cluster intervals to determine an availability of data on said database, when scanning said database is intractable.