US10706080B2 - Event clustering and event series characterization based on expected frequency - Google Patents

Event clustering and event series characterization based on expected frequency

Info

Publication number
US10706080B2
Authority
US
United States
Prior art keywords
determining
electrically connected
time
intervals
connected network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US15/720,779
Other versions
US20190102449A1 (en)
Inventor
Conrad M. Albrecht
Marcus Freitag
Theodore Van Kessel
Siyuan Lu
Hendrik F. Hamann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US15/720,779
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION: assignment of assignors' interest (see document for details). Assignors: FREITAG, MARCUS; VAN KESSEL, THEODORE; ALBRECHT, CONRAD M.; HAMANN, HENDRIK F.; LU, SIYUAN
Publication of US20190102449A1
Application granted
Publication of US10706080B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/287 Visualization; Browsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12 Simultaneous equations, e.g. systems of linear equations
    • G06K9/6218
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2336 Pessimistic concurrency control approaches, e.g. locking or multiple versions without time stamps

Definitions

  • Embodiments of the present disclosure are directed to a one-dimensional, one-parameter clustering method with linear complexity on input that quantifies data availability or measures the reliability/stability of a device connecting network, such as IoT.
  • IoT: Internet of Things
  • a network attached device capturing weather information to be broadcast to other devices for processing.
  • transmitter that frequently sends out such information every time interval ΔT
  • receiver that keeps track of the timestamps when data was transmitted/recorded
  • the one-dimensional data t is clustered such that consecutive events are not more than ΔT apart, periods in time where data might be missing can be inferred.
  • corresponding actions such as retransmission, data interpolation, etc.
  • the characteristics of intervals of no data e.g., relative frequency, duration, etc., can help diagnose the status of the communication network.
  • Exemplary embodiments of the present disclosure are directed to event clustering for application in IoT service quality characterization using a computer-implemented method with time complexity O(N).
  • Embodiments exploit the natural ordering of a sequence of timestamps recorded by a receiver device to reduce computational complexity by a factor of O(log N).
  • an O(N)-efficient clustering procedure of N events represented by an ordered series of timestamps t_1, t_2, ..., t_N can identify time intervals of missing data, locate isolated events, and characterize a series of events by varying ΔT to determine the quality of service over an IoT network.
  • a computer-implemented method for clustering time stamps in time series data including the steps of: receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t; determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT; determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array is equal to −1; appending an ith timestamp to a set of closing interval bounds τ+ if
  • the electrically connected network is an Internet-of-things network.
  • the electrically connected network is a network of satellites.
  • the electrically connected network is a direct link to a database.
  • the method includes determining a first service quality measure that measures a reliability of the electrically connected network from
  • f ≡ log[ΔT^{−1}/(Δt/N)^{−1}] ∼ −log ΔT
  • C_t^o(f) is a fraction of time with no operation failures in the electrically connected network.
  • the method includes determining a second service quality measure that is a normed number of clusters from
  • the method includes determining a third service quality measure that is a measure of a number of isolated events from
  • a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data.
  • FIG. 1 is a block diagram of an exemplary system for implementing a method of clustering a one-dimensional (1-D) time series of events, according to an embodiment of the disclosure.
  • FIG. 2 is a sample plot of IoT service quality measures from event time series obtained by varying ⁇ T, according to an embodiment of the disclosure.
  • FIG. 3 is a flowchart of a method of clustering a 1-D time series of events, according to an embodiment of the disclosure.
  • FIG. 4 is a schematic of an exemplary cloud computing node that implements an embodiment of the disclosure.
  • FIG. 5 shows an exemplary cloud computing environment according to embodiments of the disclosure.
  • Exemplary embodiments of the disclosure as described herein generally provide systems and methods for clustering and characterizing signal events on the IoT. While embodiments are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
  • Embodiments of the disclosure provide a method to cluster timestamps based on an expected frequency of a service, e.g., an IoT domain, satellite images, etc., and output information that can be used to characterize the quality of a service, and thus rate reliability/stability of the service.
  • Further embodiments of the disclosure provide methods for randomly probing and clustering time series of point data satellite imagery stored in a database given the latitude and longitude coordinates, and to efficiently estimate the availability of geo-spatial data, given the constraint that it is practically intractable to scan the whole database.
  • ΔT might be an external parameter to an algorithm that provides a solution, or it can be defined by t itself, e.g. through
  • ⟨δt⟩ ≡ (1/N) Σ_i δt_i.
  • the δt_i need to be computed and therefore ⟨δt⟩ is efficiently determined along these lines.
  • is equal or not equal to K ⁇
  • N ⁇ 2 ⁇
  • N, boundary values ⁇ ⁇ need to be manually added.
  • K ⁇ N/2.
  • FIG. 1 is a block diagram of an exemplary system that implements a method for clustering a 1-D time series of events, according to an embodiment of the disclosure.
  • an information service 11, such as an IoT device or a geo-spatial database, transmits a time series 12 of information packets through a network, such as the Internet, which employs physical networks such as a WiFi network, Ethernet, etc., to an event cluster engine 13.
  • the event cluster engine 13 includes a frequency detector 13.5 that can receive an event frequency ΔT that was manually input by a user or automatically determined from an average of the event intervals in the data.
  • the event cluster engine 13 can store the time series data in physical storage 14, such as memory or a hard drive, etc.
  • the time series data includes cluster intervals demarcated by respective start and end boundaries τ_n^− and τ_n^+, τ_{n+1}^− and τ_{n+1}^+, etc., and isolated points t_m, t_{m+1}, t_{m+2}, etc.
  • a user can interact with the event cluster engine 13 via a user interface 15, such as a RESTful API service, to, e.g., generate output 18, such as a JSON text file.
  • the event cluster engine 13 is also connected to a cluster measure engine 16 , that can calculate various cluster measures, as described below.
  • the cluster measures can be sent to a service monitor 17 , such as, e.g., Ganglia (http://ganglia.info/) or Nagios (https://www.nagios.com/), that can determine the reliability or stability of the information service 11 based on the cluster measures, and generate suitable warning messages 19 , such as email or Apache Kafka (https://kafka.apache.org/) messages, etc.
  • the following listing provides a pseudo-code implementation according to an embodiment of a method of clustering a 1-D time series of events, with reference to step numbers of the flowchart of FIG. 3.
  • t = [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
  • bracketed quantities on the left are the cluster intervals τ
  • bracketed quantities on the right are the isolated points x.
  • Square brackets with comma separated elements denote list data structures.
  • τ is empty, as indicated by the [ ], and thus each time stamp is classified as an isolated point.
  • the above pseudo-code listing can be implemented in any suitable computer language, such as Python or C/C++.
  • An efficient implementation in C/C++ uses N−1 algebraic operations for δt, N−2 logical operations for b, and again N algebraic operations for B, which determines the interval boundary classification with a total of 3(N−1) operations.
  • a naive approach would compute two time intervals for each t i , and perform two logical operations of those against ⁇ T, which would determine the classification, hence 4N computations.
  • a sample JSON load for a query to the user interface 15 of FIG. 1 in the case of a data availability service of geo-spatial satellite images in a database such as IBM PAIRS (https://pairs.res.ibm.com) may be:
  • “layers available” is a list of existing layers with their temporal data coverage for an area of interest specified by “aoi”;
  • time intervals is a collection of time intervals [start, end] in Unix epoch time, which is τ_k from the One-dimensional Clustering section above;
  • “avail-flag” indicates whether data is found for a specified time interval—0 indicates data is found
  • “datalayer-ID” is a database datalayer ID associated with the coverage information
  • number-timestamps is the number of distinct timestamps discovered on probing “aoi”
  • timestamps is the list of isolated timestamps in Unix epoch time, as described in the Isolated Points section above;
  • data-frequency is the expected temporal interval ΔT in seconds in which a layer's data is available
  • “aoi” refers to a spatial area of interest for which to check for data coverage
  • “type” is the type of an aoi, which needs to be a simple polygon for this example;
  • Coordinates are the longitude and latitude of the aoi polygon
  • datalayer-IDs is a list of datalayer IDs to check data coverage for
  • data-frequencies is a list of time periods ΔT of data in seconds on which data availability intervals are computed
  • start-UTC-time is the data coverage start UTC time in the format of [year, month, day, hours, minutes, seconds] where hours range from 0 to 23;
  • end-UTC-time is the data coverage end UTC time in the “start-UTC-time” format.
  • a method according to an embodiment can determine whether there is any data at all.
  • a method according to an embodiment randomly probes a time series at various locations and merges the timestamps of all these probes together. Then a cluster_events method such as that detailed above is applied to the time series.
  • scanning C_t^o by varying f provides a characteristic that quantifies the reliability of, e.g., an IoT service. It can be shown that C_t^o decreases monotonically as f increases. This is because for larger ΔT (that is, smaller f), more clusters τ cover the whole time series. Due to EQ. (2), clusters never shrink in size for increasing ΔT; they either grow or merge into bigger clusters, letting the overall coverage increase.
  • similar information could be obtained by simply checking a histogram n(δt), cf. EQ. (3), that counts the number of intervals δt_i for some binning interval. In the case above, there would be a single peak in n(δt). Note that C_t^o contains information similar to
  • a clustering output (τ, x) provides information that n(δt) is blind to, because n(δt) does not account for the ordering of the δt_i.
  • ⁇ 0, 1 ⁇ to result in C t n 0.
  • While C_t^o just quantifies the total coverage of t by the clusters, C_t^n provides insight into whether the coverage is established by a number of patches or by a single interval or a few intervals with data frequency of at least ΔT^{−1}. This way conclusions may be drawn on, e.g., the reliability of an IoT service. Ideally, C t n ⁇ 1.
  • FIG. 2 illustrates these applications by plotting C_t^{o,n,s} for an event series t generated from 10^4 uniformly random samples drawn from [0, 1] joined by 10^3 equidistant samples in [1, 10].
  • FIG. 2 is a sample plot of IoT service quality measures from event series t: C_t^o 21, C_t^n 22, and C_t^s 23 by varying ΔT.
  • the series t comprises a burst of random events that covers approximately 10% of the total time range Δt. During the rest of the time the events are periodic at a rate of about 1/50 Δt.
  • an embodiment of the present disclosure can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof.
  • an embodiment of the present disclosure can be implemented in software as an application program tangibly embodied on a computer readable program storage device.
  • the application program can be uploaded to, and executed by, a machine comprising any suitable architecture.
  • Although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present disclosure are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
  • An automatic troubleshooting system according to an embodiment of the disclosure is also suitable for a cloud implementation.
  • Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
  • This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
  • On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
  • Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
  • Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
  • Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
  • Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure.
  • the applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
  • the consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
  • PaaS: Platform as a Service
  • the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
  • IaaS: Infrastructure as a Service
  • the consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
  • Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
  • Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
  • Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
  • a cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability.
  • An infrastructure comprising a network of interconnected nodes.
  • Cloud computing node 410 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the disclosure described herein. Regardless, cloud computing node 410 is capable of being implemented and/or performing any of the functionality set forth herein above.
  • In cloud computing node 410 there is a computer system/server 412, which is operational with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 412 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
  • Computer system/server 412 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
  • Computer system/server 412 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer system storage media including memory storage devices.
  • computer system/server 412 in cloud computing node 410 is shown in the form of a general-purpose computing device.
  • the components of computer system/server 412 may include, but are not limited to, one or more processors or processing units 416 , a system memory 428 , and a bus 418 that couples various system components including system memory 428 to processor 416 .
  • Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
  • Computer system/server 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 412 , and it includes both volatile and non-volatile media, removable and non-removable media.
  • System memory 428 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432 .
  • Computer system/server 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 434 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”).
  • a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”)
  • an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media
  • each can be connected to bus 418 by one or more data media interfaces.
  • memory 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
  • Program/utility 440 having a set (at least one) of program modules 442 , may be stored in memory 428 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
  • Program modules 442 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.
  • Computer system/server 412 may also communicate with one or more external devices 414 such as a keyboard, a pointing device, a display 424 , etc.; one or more devices that enable a user to interact with computer system/server 412 ; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 412 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 422 . Still yet, computer system/server 412 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 420 .
  • network adapter 420 communicates with the other components of computer system/server 412 via bus 418 .
  • It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 412. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
  • cloud computing environment 50 comprises one or more cloud computing nodes 400 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54 A, desktop computer 54 B, laptop computer 54 C, and/or automobile computer system 54 N may communicate.
  • Nodes 400 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
  • This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
  • The types of computing devices 54A-N shown in FIG. 5 are intended to be illustrative only; computing nodes 400 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
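
The coverage-style service quality measure described in the list above (the fraction of the total time range covered by clusters, scanned by varying ΔT) can be illustrated with a short Python sketch. This is only a sketch under stated assumptions: the function names `mean_interval` and `coverage` are hypothetical, the timestamps are assumed to be sorted with at least two distinct values, and the mean interval is taken over the N−1 adjacent gaps. The sketch relies on the observation that the total length of all clusters equals the sum of the adjacent gaps that do not exceed ΔT.

```python
def mean_interval(t):
    """Average spacing between adjacent timestamps (one possible default for dT).

    Assumption: the average is taken over the len(t) - 1 adjacent gaps.
    """
    return (t[-1] - t[0]) / (len(t) - 1)


def coverage(t, dT):
    """Fraction of the total time range covered by clusters.

    A gap between adjacent timestamps counts as covered when it is at most dT.
    Assumes t is sorted and contains at least two distinct values.
    """
    total = t[-1] - t[0]
    covered = sum(b - a for a, b in zip(t, t[1:]) if b - a <= dT)
    return covered / total


# Illustrative series; the dT values below are arbitrary choices for the scan.
t = [-20, -18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
print("mean interval:", mean_interval(t))
for dT in (0.5, 2, 10, 100):
    print(dT, round(coverage(t, dT), 3))
```

Because every gap that is covered at a given ΔT stays covered at any larger ΔT, the printed coverage values cannot decrease as ΔT grows, which matches the monotonic behaviour described above.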

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method for clustering time stamps in time series data includes receiving a one-dimensional array of ordered timestamps and an expected frequency; determining a set of time intervals; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency; determining a second binary array of differences between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to one of a set of opening interval bounds, a set of closing interval bounds, or a set of isolated points; merging the set of opening interval bounds and the set of closing interval bounds into a set of cluster intervals τ; and outputting the set of cluster intervals and the set of isolated points.

Description

TECHNICAL FIELD
Embodiments of the present disclosure are directed to a one-dimensional, one-parameter clustering method with linear complexity on input that quantifies data availability or measures the reliability/stability of a device connecting network, such as IoT.
DISCUSSION OF THE RELATED ART
The concept of the Internet of Things (IoT) is intimately related to records of certain events, e.g. a network-attached device capturing weather information to be broadcast to other devices for processing. Given (a) a transmitter that sends out such information every time interval ΔT, and (b) a receiver that keeps track of the timestamps when data was transmitted/recorded, a one-dimensional time series t = {t_i}, i = 1, ..., N stores information on failures of the recording/sending/transmission/receiving.
If the one-dimensional data t is clustered such that consecutive events are not more than ΔT apart, periods in time where data might be missing can be inferred. Upon detection, corresponding actions, such as retransmission, data interpolation, etc., can be performed. Moreover, the characteristics of intervals of no data, e.g., relative frequency, duration, etc., can help diagnose the status of the communication network.
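As a minimal Python sketch of this inference, assuming the timestamps arrive as a sorted list of numbers, the gaps longer than ΔT and a few of their characteristics could be collected as follows. The helper names `missing_data_gaps` and `gap_summary` and the sample data are hypothetical and are not part of the disclosure.

```python
def missing_data_gaps(t, dT):
    """Return (start, end) pairs of adjacent timestamps more than dT apart.

    Assumes t is already sorted in ascending order.
    """
    return [(a, b) for a, b in zip(t, t[1:]) if b - a > dT]


def gap_summary(t, dT):
    """Summarize the gaps: how many there are and how much time they span."""
    gaps = missing_data_gaps(t, dT)
    durations = [b - a for a, b in gaps]
    total = t[-1] - t[0] if len(t) > 1 else 0
    return {
        "count": len(gaps),
        "total_duration": sum(durations),
        "longest": max(durations, default=0),
        "fraction_of_range": sum(durations) / total if total else 0.0,
    }


timestamps = [0, 10, 20, 50, 60, 70]   # illustrative data with expected period 10
print(missing_data_gaps(timestamps, 10))
print(gap_summary(timestamps, 10))
```

Each reported gap marks a period in which a retransmission or interpolation step might be triggered; the summary values correspond to the relative frequency and duration characteristics mentioned above.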
Since general purpose, multi-dimensional clustering methods such as Fisher's discriminant, k-means clustering, or more generally expectation-maximization (EM) do not exploit the special property of ordering in one dimension, a simpler approach can be used that requires no prior knowledge of the number of clusters/intervals and avoids density estimation.
SUMMARY
Exemplary embodiments of the present disclosure are directed to event clustering for application in IoT service quality characterization using a computer-implemented method with time complexity O(N). Embodiments exploit the natural ordering of a sequence of timestamps recorded by a receiver device to reduce computational complexity by a factor of O(log N). Given an expected frequency ΔT^{−1}, an O(N)-efficient clustering procedure of N events represented by an ordered series of timestamps t_1, t_2, ..., t_N can identify time intervals of missing data, locate isolated events, and characterize a series of events by varying ΔT to determine the quality of service over an IoT network.
According to an embodiment of the disclosure, there is provided a computer-implemented method for clustering time stamps in time series data, including the steps of: receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t; determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT; determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array value is equal to −1; appending an ith timestamp to a set of closing interval bounds τ+ if a corresponding second binary array value is equal to 1; appending an ith timestamp to a set of isolated points x if a next corresponding first binary array value is equal to 1; merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ; and outputting the set of cluster intervals τ and the set of isolated points x.
According to a further embodiment of the disclosure, the electrically connected network is an Internet-of-things network.
According to a further embodiment of the disclosure, the electrically connected network is a network of satellites.
According to a further embodiment of the disclosure, the electrically connected network is a direct link to a database.
According to a further embodiment of the disclosure, the method includes determining a first service quality measure that measures a reliability of the electrically connected network from
$$C_t^o(f) = \frac{1}{\Delta t} \sum_k |\tau_k| = \begin{cases} 0, & \Delta T \le 0 \\ \in [0, 1], & 0 < \Delta T < \Delta t \\ 1, & \Delta T \ge \Delta t \end{cases}$$
wherein Δt=tN−t1 is a total time interval, and f=log [ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), wherein C_t^o(f) is a fraction of time with no operation failures in the electrically connected network.
According to a further embodiment of the disclosure, the method includes determining a second service quality measure that is a normed number of clusters from
$$C_t^n(f) = \frac{2\,(|\tau| - \delta_{1|\tau|})}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ \in [0, 1], & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein Δt=tN−t1 is a total time interval, f=log [ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |τ| is a number of clusters, and |t| is a number of time stamps.
According to a further embodiment of the disclosure, the method includes determining a third service quality measure that is a measure of a number of isolated events from
$$C_t^s(f) = \frac{|x|}{|t|} = \begin{cases} 1, & \Delta T < 0 \\ \in [0, 1], & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein Δt=tN−t1 is a total time interval, f=log [ΔT⁻¹/(Δt/|t|)⁻¹]=−log(ΔT|t|/Δt), |x| is a number of isolated points, and |t| is a number of time stamps.
According to a further embodiment of the disclosure, the method includes determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from
$$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\}, \quad k = 1, \ldots, K \le N/2 - 1,$$
and determining cluster intervals as τ_k=[τ_{k−1}^−, τ_k^+], k=1, . . . , K.
According to a further embodiment of the disclosure, letting b represent the first binary array, b[0]=1 and b[N]=1.
According to an embodiment of the disclosure, there is provided a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an exemplary system for implementing a method of clustering a one-dimensional (1-D) time series of events, according to an embodiment of the disclosure.
FIG. 2 is a sample plot of IoT service quality measures from event time series obtained by varying ΔT, according to an embodiment of the disclosure.
FIG. 3 is a flowchart of a method of clustering a 1-D time series of events, according to an embodiment of the disclosure.
FIG. 4 is a schematic of an exemplary cloud computing node that implements an embodiment of the disclosure.
FIG. 5 shows an exemplary cloud computing environment according to embodiments of the disclosure.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
Exemplary embodiments of the disclosure as described herein generally provide systems and methods for clustering and characterizing signal events on the IoT. While embodiments are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
Embodiments of the disclosure provide a method to cluster timestamps based on an expected frequency of a service, e.g., an IoT domain, satellite images, etc., and output information that can be used to characterize the quality of a service, and thus rate reliability/stability of the service.
Further embodiments of the disclosure provide methods for randomly probing and clustering time series of point data satellite imagery stored in a database given the latitude and longitude coordinates, and to efficiently estimate the availability of geo-spatial data, given the constraint that it is practically intractable to scan the whole database.
One-Dimensional Clustering
Given a set of N ordered timestamps t={ti}, i=1, . . . , N, i.e.
$$i \le j \Rightarrow t_i \le t_j \tag{1}$$
and an expected time interval ΔT, time intervals τk are defined such that
$$t_i, t_{i+1} \in \tau_k = [\tau_k^-, \tau_k^+] \Rightarrow \delta t_i = t_{i+1} - t_i \le \Delta T. \tag{2}$$
Note that ΔT might be an external parameter to an algorithm that provides a solution or it can be defined by t itself, e.g. through
$$\langle \delta t \rangle = \frac{1}{N} \sum_i \delta t_i.$$
The δti need to be computed in any case, and therefore ⟨δt⟩ is efficiently determined along these lines.
To fulfill EQ. (2), of course, at least N−1 time intervals need to be computed:
$$\delta t = \{\delta t_i = t_{i+1} - t_i\}, \quad i = 1, \ldots, N-1. \tag{3}$$
Whenever a new time series point t_{N+1}>t_N is (randomly) added, there is no a priori way of determining from the existing t_i, i≤N, whether ΔT is exceeded.
To classify the ti as interval bounds τ_k^±, note that the binary sequence
$$b = \{b_i = \mathrm{int}(\delta t_i > \Delta T)\}, \quad i = 1, \ldots, N-1 \tag{4}$$
switches from 1 to 0 for an opening interval bound τ−, and from 0 to 1 for a closing interval bound τ+, only. The function int( ) has values int(True)→1 and int(False)→0. Hence the quantity
$$B = \{B_i = b_i - b_{i-1}\}, \quad i = 2, \ldots, N-1, \quad \text{with } B_i \in \{-1, 0, 1\} \tag{5}$$
yields the desired association
$$B_i = \pm 1 \Rightarrow t_i \in \tau^{\pm}. \tag{6}$$
Because the ti are ordered, as expressed by EQ. (1), the binary (discrete) function bi implies the alternating property:
$$i < j:\ 1 = B_i = B_j \Rightarrow \exists\, l,\ i < l < j:\ B_l = -1. \tag{7}$$
Thus, linearly scanning through the ti and their corresponding Bi results in two sets
$$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\}, \quad k = 1, \ldots, K \le N/2 - 1 \tag{8}$$
that can be interleaved to obtain the corresponding time intervals as a solution of EQ. (2).
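The construction of δt, b, and B in EQs. (3)-(5) can be sketched in a few lines of Python. This is a minimal illustration only; the function name `gap_indicators` and the sample series are assumptions made here, and Python lists are 0-indexed whereas the equations are 1-indexed.

```python
def gap_indicators(t, dT):
    """Return (dt, b, B) for ordered timestamps t, following EQs. (3)-(5).

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    # EQ. (3): N-1 gaps between adjacent timestamps
    dt = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    # EQ. (4): 1 where the gap exceeds the expected interval dT, else 0
    b = [int(gap > dT) for gap in dt]
    # EQ. (5): differences of adjacent b values, each in {-1, 0, 1}
    B = [b[i] - b[i - 1] for i in range(1, len(b))]
    return dt, b, B


t = [0, 1, 2, 10, 11, 12, 30]
dt, b, B = gap_indicators(t, dT=2)
print(dt)  # [1, 1, 8, 1, 1, 18]
print(b)   # [0, 0, 1, 0, 0, 1]
print(B)   # [0, 1, -1, 0, 1]
```

In this sample, the +1 entries of B belong to the timestamps 2 and 12 (closing bounds) and the −1 entry belongs to 10 (an opening bound). The first and last timestamps have no B value, which is the boundary issue discussed next.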
Boundary Conditions
According to embodiments of the disclosure, there are several options for interleaving the τ±, which depend on boundary conditions. More specifically, assume that the sequence t1, t2, . . . starts with, e.g., intervals that are smaller than ΔT. In this case, τ_1^− > τ_1^+, and a τ_0^− needs to be manually added to construct intervals:
$$\tau_k = [\tau_{k-1}^-, \tau_k^+]. \tag{9}$$
A corresponding issue might occur at the end of the time series {ti}, depending on whether K^+=|τ+| is equal or not equal to K^−=|τ−|. Note that by virtue of EQ. (7) the difference |K^+−K^−| is at most 1. However, due to the fact that |B|=N−2≠|t|=N, boundary values τ± need to be manually added.
According to embodiments, to avoid manually dealing with all (four) different boundary condition scenarios, the following timestamps can be added from the outset:
$$t_0 = -\infty \quad \text{and} \quad t_{N+1} = +\infty. \tag{10}$$
Hence, δt_0=δt_N=+∞, and therefore
$$b_0 = b_N = 1 \tag{11}$$
which yields N values Bi that correspond to the N timestamps ti for classification such that
$$\tau = \{\tau_k = [\tau_k^-, \tau_k^+]\}, \quad k = 1, \ldots, K \tag{12}$$
from
$$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\}, \quad k = 1, \ldots, K \tag{13}$$
with |τ|=K≤N/2.
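The effect of the sentinel timestamps of EQ. (10) can be checked with a short Python sketch. It is an illustration under assumptions: the function name `cluster_intervals` and the four test series are invented here, and `math.inf` stands in for ±∞. The point is that the same loop handles all four start/end scenarios without special cases.

```python
import math


def cluster_intervals(t, dT):
    """Pair opening and closing bounds after padding with -inf/+inf (EQ. (10)).

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    padded = [-math.inf, *t, math.inf]
    # b[0] and b[len(t)] are the sentinel values of EQ. (11); both equal 1
    b = [int(padded[i + 1] - padded[i] > dT) for i in range(len(t) + 1)]
    # b[i] is the gap flag before t[i], b[i + 1] the gap flag after it
    opening = [t[i] for i in range(len(t)) if b[i + 1] - b[i] == -1]
    closing = [t[i] for i in range(len(t)) if b[i + 1] - b[i] == 1]
    return list(zip(opening, closing))


# The four boundary scenarios, all with dT = 2:
print(cluster_intervals([0, 1, 2, 10, 11], 2))  # starts and ends inside a cluster
print(cluster_intervals([-9, 0, 1, 2], 2))      # starts with an isolated point
print(cluster_intervals([0, 1, 2, 30], 2))      # ends with an isolated point
print(cluster_intervals([-9, 0, 1, 30], 2))     # isolated points at both ends
```

Because the padded b sequence always starts and ends with 1, every opening bound is followed by exactly one closing bound, so the two lists can be zipped directly.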
Isolated Points
According to embodiments, given the solution EQ. (12), due to the ordering of the ti, open intervals
$$\bar{\tau} = \{\bar{\tau}_k = (\tau_k^+, \tau_{k+1}^-)\}, \quad k = 1, \ldots, K-1, \tag{14}$$
can be formed that are associated with time intervals of failure. Note that [t1, tN]=τ∪τ̄. However, these intervals do not imply
$$\forall\, i, k:\ t_i \notin \bar{\tau}_k, \tag{15}$$
i.e., informally, it is not true that no event occurs during the intervals τ̄, but it is true that
$$t_i \in \bar{\tau}_k \Rightarrow |t_i - t_{i \pm 1}| > \Delta T \tag{16}$$
where ti is referred to as an isolated event. These timestamps can be considered to be noise, while all ti∈τ± are border points.
According to embodiments, isolated events have bi=1 and, since they are not interval boundary points, they need to have Bi=0. This way b and B can be used to classify isolated events according to
$$B_i = 0 \wedge b_i = 1 \Rightarrow t_i \in x, \tag{17}$$
where x denotes the set of isolated timestamps. Likewise, clustered timestamps can be defined as
$$B_i = 0 \wedge b_i = 0 \Rightarrow t_i \in \bar{x}. \tag{18}$$
Since b is binary and EQ. (5) holds for B, all ti are uniquely classified, i.e., t=τ+∪τ−∪x∪x̄. According to embodiments, there is an association x̄↔τ and x↔τ̄ in the sense that all ti∉x are within an interval of τ and all ti∈x are within an interval of τ̄.
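The unique four-way classification of EQs. (6), (17), and (18) can be illustrated with a small Python sketch. The function name `classify` and the label strings are assumptions chosen for readability; the sentinel padding follows EQ. (10).

```python
import math


def classify(t, dT):
    """Label each timestamp as opening, closing, isolated, or clustered.

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    padded = [-math.inf, *t, math.inf]
    b = [int(padded[i + 1] - padded[i] > dT) for i in range(len(t) + 1)]
    labels = []
    for i in range(len(t)):
        B_i = b[i + 1] - b[i]  # gap flag after t[i] minus gap flag before it
        if B_i == -1:
            labels.append("opening")    # member of tau-minus, EQ. (6)
        elif B_i == 1:
            labels.append("closing")    # member of tau-plus, EQ. (6)
        elif b[i] == 1:
            labels.append("isolated")   # B = 0 and b = 1, EQ. (17)
        else:
            labels.append("clustered")  # B = 0 and b = 0, EQ. (18)
    return labels


print(classify([0, 1, 2, 10, 11, 12, 30], 2))
# ['opening', 'clustered', 'closing', 'opening', 'clustered', 'closing', 'isolated']
```

Every timestamp receives exactly one label because the branches are mutually exclusive and exhaustive for binary b.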
Implementation & Computational Complexity
FIG. 1 is a block diagram of an exemplary system that implements a method for clustering a 1-D time series of events, according to an embodiment of the disclosure. Referring now to the figure, an information service 11, such as an IoT device or a geo-spatial database, transmits a time series 12 of information packets through a network, such as the Internet, which employs physical networks such as a WiFi network, Ethernet, etc., to an event cluster engine 13. The event cluster engine 13 includes a frequency detector 13.5 that can receive an event frequency ΔT that was manually input by a user or automatically determined from an average of the event intervals in the data. The event cluster engine 13 can store the time series data in physical storage 14, such as memory or a hard drive. The time series data includes cluster intervals demarcated by respective start and end boundaries τ_n^− and τ_n^+, τ_{n+1}^− and τ_{n+1}^+, etc., and isolated points tm, tm+1, tm+2, etc. A user can interact with the event cluster engine 13 via a user interface 15, such as a RESTful API service, to, e.g., generate output 18, such as a JSON text file. The event cluster engine 13 is also connected to a cluster measure engine 16 that can calculate various cluster measures, as described below. The cluster measures can be sent to a service monitor 17, such as, e.g., Ganglia (http://ganglia.info/) or Nagios (https://www.nagios.com/), that can determine the reliability or stability of the information service 11 based on the cluster measures, and generate suitable warning messages 19, such as email or Apache Kafka (https://kafka.apache.org/) messages, etc.
The following listing provides a pseudo-code implementation according to an embodiment of a method of clustering a 1-D time series of events, with reference to the step numbers of the flowchart of FIG. 3.
algorithm cluster_events is
input: one-dimensional array t of ordered timestamps to cluster,
  expected frequency dT
output: set of cluster intervals tau, list of isolated timestamps x
define ordered list tauMinus, tauPlus, x
define set tau
define array dt, b, B
N = length of t // Step 30
for i from 1 to N−1 do dt[i] = t[i]−t[i−1] // Step 31
b[0] = 1 // Step 32
for i from 1 to N−1 do
  if dt[i] > dT then b[i] = 1
  else b[i] = 0
b[N] = 1
for i from 0 to N−1 do B[i] = b[i+1]−b[i] // Step 33
for i from 0 to N−1 do // Step 34
  if B[i] == −1 then append t[i] to tauMinus
  else if B[i] == 1 then append t[i] to tauPlus
  else if b[i+1] == 1 then append t[i] to x
for i from 0 to (length of tauMinus)−1 do // Step 35
  add interval [tauMinus[i], tauPlus[i]] to tau
return tau, x // Step 36
For example, a call of cluster_events(t, dT) on
t=[−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
given
dT as −1, 0, 1, 10, 100,
which corresponds to ΔT, returns output equivalent to
[ ], [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
[(202, 202)], [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 203]
[(1, 2.9), (10, 11), (202, 203)], [−20, −18, 100, 200]
[(−20, −18), (1, 11), (200, 203)], [100]
[(−20, 203)], [ ]
[(−20, 11), (200, 203)], [100]
respectively. In the above output listing, the bracketed quantities on the left are the cluster intervals τ, and the bracketed quantities on the right are the isolated points x. Square brackets with comma separated elements denote list data structures. In the first line, τ is empty, as indicated by the [ ], and thus each time stamp is classified as an isolated point. The sixth line is the output obtained for any dT with 19≤dT<89.
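The pseudo-code listing can be transcribed into runnable Python as sketched below. This is one possible transcription, not a reference implementation: it folds the dt, b, and B arrays of Steps 31-33 into a single pass, uses 0-based indexing throughout, and returns intervals as tuples.

```python
def cluster_events(t, dT):
    """Cluster ordered timestamps t given the expected interval dT.

    Returns (tau, x): a list of (start, end) cluster intervals and a list
    of isolated timestamps. Illustrative transcription of the listing above.
    """
    n = len(t)                                                  # Step 30
    # Steps 31-32: b[i] flags the gap before t[i]; b[0] and b[n] are
    # the sentinel values that remove the boundary special cases.
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    tau_minus, tau_plus, x = [], [], []
    for i in range(n):                                          # Steps 33-34
        B = b[i + 1] - b[i]
        if B == -1:
            tau_minus.append(t[i])   # opening interval bound
        elif B == 1:
            tau_plus.append(t[i])    # closing interval bound
        elif b[i + 1] == 1:
            x.append(t[i])           # isolated point
    return list(zip(tau_minus, tau_plus)), x                    # Steps 35-36


t = [-20, -18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
for dT in (-1, 0, 1, 10, 100):
    print(dT, cluster_events(t, dT))
```

Running the loop reproduces the first five lines of the output listing above; for example, dT=10 yields the intervals (−20, −18), (1, 11), (200, 203) and the isolated point 100.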
According to an embodiment of the disclosure, the above pseudo-code listing can be implemented in any suitable computer language, such as Python or C/C++. An efficient implementation in C/C++ uses N−1 algebraic operations for δt, N−2 logical operations for b and again N algebraic operations for B which determines the interval boundary classification with a total of 3(N−1) operations.
A naive approach would compute two time intervals for each ti, and perform two logical operations of those against ΔT, which would determine the classification, hence 4N computations.
As a further example according to an embodiment, a sample JSON payload for a query to the user interface 15 of FIG. 1 in the case of a data availability service of geo-spatial satellite images in a database such as IBM PAIRS (https://pairs.res.ibm.com) may be:
  { "query": {
   "datalayer-IDs": [ 23, 15100, 3],
   "data-frequencies": [ 3800, 90000, 2800000],
   "start-UTC-time": [ 2016, 10, 1],
   "aoi": { "type": "Polygon",
    "coordinates": [[[-73.9, 40.5], [ -73.8, 40.55],
    [ -73.85, 40.6], [ -73.9, 40.5]]] },
   "explore": false,
   "end-UTC-time": [ 2016, 12, 31, 23, 59, 59 ]
  } }
and a sample JSON text file response, output format 18 of FIG. 1, might look like, e.g.,
  { "layers-available": [{
    "time-intervals": [[ 1483001560, 1490001110],
    [1653006000, 1680282350 ]],
   "avail-flag": 0,
   "datalayer-ID": 1234,
   "num-timestamps": 346,
   "timestamps": [ 893001560, 1093001560, 1260340121],
   "data-frequency": 31536000 }],
   "aoi": {"type": "Polygon",
    "coordinates": [[[ -73.9, 40.5], [ -73.8, 40.55],
    [ -73.85, 40.6], [-73.9, 40.5]]] }
  }

In the above JSON listings:
“layers-available” is a list of existing layers with their temporal data coverage for an area of interest specified by “aoi”;
“time-intervals” is a collection of time intervals [start, end] in Unix epoch time, which corresponds to τk from the One-Dimensional Clustering section above;
“avail-flag” indicates whether data is found for a specified time interval—0 indicates data is found;
“datalayer-ID” is a database datalayer ID associated with the coverage information;
“num-timestamps” is the number of distinct timestamps discovered on probing “aoi”;
“timestamps” is the list of isolated timestamps in Unix epoch time, as described in the Isolated Points section above;
“data-frequency” is the expected temporal interval ΔT in seconds in which a layer's data is available;
“aoi” refers to a spatial area of interest for which to check for data coverage;
“type” is the type of an aoi, which needs to be a simple polygon for this example;
“coordinates” are the longitude and latitude of the aoi polygon;
“datalayer-IDs” is a list of datalayer IDs to check data coverage for;
“data-frequencies” is a list of time periods ΔT of data in seconds on which data availability intervals are computed;
“start-UTC-time” is the data coverage start UTC time in the format of [year, month, day, hours, minutes, seconds] where hours range from 0 to 23;
“explore” triggers whether to randomly pick points in aoi for probing the time series; and
“end-UTC-time” is the data coverage end UTC time in the “start-UTC-time” format.
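As a rough illustration of how a client might consume such a response, the following Python sketch parses the "layers-available" part of the sample and converts a "start-UTC-time" style list into Unix epoch seconds. The helper names `utc_list_to_epoch` and `summarize_layer` are hypothetical and are not part of any described API; only the field names come from the listings above.

```python
import json
from datetime import datetime, timezone

# Part of the sample response from the listing above
RESPONSE = """
{ "layers-available": [{
    "time-intervals": [[1483001560, 1490001110], [1653006000, 1680282350]],
    "avail-flag": 0,
    "datalayer-ID": 1234,
    "num-timestamps": 346,
    "timestamps": [893001560, 1093001560, 1260340121],
    "data-frequency": 31536000 }]
}
"""


def utc_list_to_epoch(parts):
    """Convert [year, month, day, (hours, minutes, seconds)] to epoch seconds."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp())


def summarize_layer(layer):
    """Summarize cluster coverage and isolated timestamps of one layer."""
    covered = sum(end - start for start, end in layer["time-intervals"])
    return {
        "datalayer-ID": layer["datalayer-ID"],
        "covered_seconds": covered,                  # total length of the tau_k
        "num_isolated": len(layer["timestamps"]),    # size of x
        "data_found": layer["avail-flag"] == 0,      # 0 indicates data is found
    }


layers = json.loads(RESPONSE)["layers-available"]
for layer in layers:
    print(summarize_layer(layer))
print(utc_list_to_epoch([2016, 10, 1]))
```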
As another example, given geo-spatial areas, a method according to an embodiment can determine whether there is any data at all. A method according to an embodiment randomly probes a time series at various locations and merges the timestamps of all these probes together. Then a cluster_events method such as that detailed above is applied to the time series.
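The probe-and-merge step can be sketched as follows. Everything specific here is an assumption made for illustration: the `PROBES` dictionary stands in for database lookups at sampled locations, the seed and sample size are arbitrary, and dropping duplicate timestamps is a design choice suggested by "num-timestamps" counting distinct timestamps. Because each probe is already ordered, `heapq.merge` combines them in a single linear pass.

```python
import heapq
import random


def merge_probes(probes):
    """Merge several ordered timestamp lists into one ordered list of
    distinct timestamps. Hypothetical helper for illustration."""
    merged = []
    for ts in heapq.merge(*probes):
        if not merged or ts != merged[-1]:
            merged.append(ts)
    return merged


# Hypothetical probe results: (longitude, latitude) -> ordered timestamps
PROBES = {
    (-73.90, 40.50): [100, 200, 300],
    (-73.85, 40.55): [200, 400],
    (-73.80, 40.60): [150, 300, 500],
}

rng = random.Random(42)                      # fixed seed for reproducibility
picked = rng.sample(sorted(PROBES), k=2)     # randomly chosen probe locations
t = merge_probes([PROBES[p] for p in picked])
print(picked, t)
```

The merged list t is ordered, so it can be passed directly to a clustering routine such as the cluster_events procedure described above.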
Time Series Cluster Characterizations
According to embodiments, observe that for a given, fixed event series t with total time interval Δt=tN−t1, the quantity
$$C_t^o(f) = \frac{1}{\Delta t} \sum_k |\tau_k| = \begin{cases} 0, & \Delta T \le 0 \\ \in [0, 1], & 0 < \Delta T < \Delta t \\ 1, & \Delta T \ge \Delta t \end{cases} \tag{19}$$
where
$$f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right) \tag{20}$$
computes the fraction of time with no operation failures. ΔT is fixed by the expected, logarithmic, and normalized event frequency f, i.e., f=0 represents the scale of frequency where all timestamps are equally spaced within the time series interval, f>0 corresponds to smaller scales, and f<0 to larger ones.
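As a worked illustration of EQ. (20), consider the twelve-timestamp example series used with cluster_events above, which runs from −20 to 203. The base of the logarithm is not stated in the text; base 10 is assumed here because the discussion of FIG. 2 speaks of orders of magnitude.

```latex
\begin{align*}
\Delta t &= 203 - (-20) = 223, \qquad |t| = 12, \qquad \Delta t/|t| \approx 18.6 \\
f(\Delta T = 10)  &= -\log_{10}\!\left(\frac{10 \cdot 12}{223}\right) \approx 0.27 \\
f(\Delta T = 100) &= -\log_{10}\!\left(\frac{100 \cdot 12}{223}\right) \approx -0.73
\end{align*}
```

Under that assumption, f=0 corresponds to ΔT≈18.6, the spacing the timestamps would have if they were equally distributed over the series.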
According to embodiments, scanning C_t^o by varying f provides a characteristic that quantifies the reliability of, e.g., an IoT service. It can be shown that C_t^o decreases monotonically as f increases, i.e., as ΔT decreases. This is because, for larger ΔT, the clusters τ cover more of the time series: due to EQ. (2), clusters never shrink in size for increasing ΔT; they either grow or merge into bigger clusters, letting the overall coverage increase.
According to embodiments, in a case where the time series is generated by a single, periodic data stream, a unit step function C_t^o(f)=Θ(−f) is obtained, i.e., Θ=1 for f<0 and Θ=0 for f>0. Nevertheless, similar information could be obtained by simply checking a histogram n(δt), cf. EQ. (3), that counts the number of intervals δti for some binning interval. In the case above, there would be a single peak in n(δt). Note that C_t^o contains information similar to
$$\frac{1}{\Delta t} \int_0^{\Delta T} n(\delta t)\, d\delta t.$$
However, a clustering output (τ, x) according to an embodiment provides information that n(δt) is blind to, because n(δt) does not account for the ordering of the δti. In particular,
$$C_t^n(f) = \frac{2\,(|\tau| - \delta_{1|\tau|})}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ \in [0, 1], & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases} \tag{21}$$
provides a normed measure of the number of clusters, where the Kronecker delta δij is 1 for i=j and 0 otherwise. It forces |τ|∈{0, 1} to result in C_t^n=0. While C_t^o just quantifies the total coverage of t by the clusters, C_t^n provides insight into whether the coverage is established by a number of patches or by a single interval or a few intervals with data frequency of at least ΔT⁻¹. This way, conclusions may be drawn on, e.g., the reliability of an IoT service. Ideally, C_t^n≪1.
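The remark that the histogram n(δt) is blind to ordering can be made concrete with two short series that share the same gaps in a different order. The sketch below is illustrative only; the series and the helper name `normed_cluster_count` are assumptions, and EQ. (21) is read as 2(|τ|−δ_{1|τ|})/|t|, which is the reading consistent with the statement that |τ|∈{0, 1} gives zero.

```python
from collections import Counter


def normed_cluster_count(t, dT):
    """Evaluate C_t^n, read as 2 * (|tau| - delta_{1,|tau|}) / |t|.

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    n = len(t)
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    # every opening bound (B = -1) starts exactly one cluster
    num_clusters = sum(1 for i in range(n) if b[i + 1] - b[i] == -1)
    kronecker = 1 if num_clusters == 1 else 0
    return 2 * (num_clusters - kronecker) / n


series_a = [0, 1, 2, 12, 22]   # gaps 1, 1, 10, 10
series_b = [0, 1, 11, 12, 22]  # gaps 1, 10, 1, 10


def gaps(t):
    return [t[i + 1] - t[i] for i in range(len(t) - 1)]


print(Counter(gaps(series_a)) == Counter(gaps(series_b)))  # True
print(normed_cluster_count(series_a, 2))  # 0.0
print(normed_cluster_count(series_b, 2))  # 0.8
```

Both series have an identical gap histogram, yet the first forms a single cluster followed by isolated points while the second is split into two clusters, which only the clustering output reveals.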
Finally, according to embodiments, consider the number of isolated events
$$C_t^s(f) = \frac{|x|}{|t|} = \begin{cases} 1, & \Delta T < 0 \\ \in [0, 1], & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases} \tag{22}$$
as an additional indicator of reliability, since they are orthogonal to the information contained in τ. Isolated events can be classified as indicators of loose IoT service quality, and thus C_t^s should stay close to zero until it quickly increases to one for some f>0.
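The three measures can be evaluated together once the clustering output is available. The following Python sketch is a minimal illustration under stated assumptions: the function names are invented here, the logarithm in `log_frequency` is taken to base 10, and C_t^n is read as 2(|τ|−δ_{1|τ|})/|t|.

```python
import math


def cluster_events(t, dT):
    """Return (cluster intervals, isolated points) for ordered timestamps t."""
    n = len(t)
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    opening, closing, isolated = [], [], []
    for i in range(n):
        step = b[i + 1] - b[i]
        if step == -1:
            opening.append(t[i])
        elif step == 1:
            closing.append(t[i])
        elif b[i] == 1:
            isolated.append(t[i])
    return list(zip(opening, closing)), isolated


def quality_measures(t, dT):
    """Return (C_o, C_n, C_s): coverage, normed cluster count, isolated share."""
    tau, x = cluster_events(t, dT)
    total = t[-1] - t[0]                                   # Delta t = t_N - t_1
    c_o = sum(end - start for start, end in tau) / total
    c_n = 2 * (len(tau) - (1 if len(tau) == 1 else 0)) / len(t)
    c_s = len(x) / len(t)
    return c_o, c_n, c_s


def log_frequency(t, dT):
    """EQ. (20) with an assumed base-10 logarithm; requires dT > 0."""
    return -math.log10(dT * len(t) / (t[-1] - t[0]))


t = [-20, -18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
for dT in (1, 10, 100):
    print(round(log_frequency(t, dT), 2), quality_measures(t, dT))
```

For this series, dT=100 gives full coverage with a single cluster and no isolated points, while a negative dT leaves every timestamp isolated, which matches the limiting behavior discussed above.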
FIG. 2 illustrates these applications by plotting C_t^o, C_t^n, and C_t^s for an event series t generated from 10^4 uniformly random samples drawn from [0, 1] joined by 10^3 equidistant samples in [1, 10]. FIG. 2 is a sample plot of IoT service quality measures from event series t: C_t^o 21, C_t^n 22, and C_t^s 23, obtained by varying ΔT. The series t comprises a burst of random events that covers approximately 10% of the total time range Δt. During the rest of the time the events are periodic at a rate of about 1/50Δt. Note that at f=0 there is little variance in C_t^o and C_t^s, indicating that there is no single dominant event frequency v0=|t|/Δt. Moreover, there is a step in C_t^o at f≈−1 that covers 90% of its range, which refers to a dominant event frequency one order of magnitude lower than v0. Since C_t^n≪1, it is concluded that this frequency is present along major time intervals within [t1, tN]. Also, C_t^s rapidly drops. Therefore, the existence of isolated events vanishes at time scales larger than ˜ΔT/10v0 such that there is a clean signal.
In contrast, C_t^n˜1 for f≈1. Thus, for high-frequency events, increasing coverage of [t1, tN] is achieved by a number of isolated clusters, due to the random nature of the signal. Finally, for frequencies 3 orders of magnitude larger than v0, C_t^s≈1, i.e., no more clustering of events occurs.
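A series of the kind described for FIG. 2 can be generated and scanned with the sketch below. It is an approximation for illustration, not the procedure used to produce the figure: the random seed, the base-10 logarithm, the chosen f values, and the helper name `coverage_and_isolated` are all assumptions.

```python
import random


def coverage_and_isolated(t, dT):
    """Return (C_o, C_s) for ordered timestamps t in a single pass.

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    n = len(t)
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    covered, start, isolated = 0.0, None, 0
    for i in range(n):
        step = b[i + 1] - b[i]
        if step == -1:
            start = t[i]               # opening bound of a cluster
        elif step == 1:
            covered += t[i] - start    # closing bound: add the cluster length
        elif b[i] == 1:
            isolated += 1
    return covered / (t[-1] - t[0]), isolated / n


rng = random.Random(0)                                        # assumed seed
burst = [rng.random() for _ in range(10**4)]                  # random burst in [0, 1)
periodic = [1 + 9 * i / (10**3 - 1) for i in range(10**3)]    # equidistant in [1, 10]
t = sorted(burst + periodic)

mean_gap = (t[-1] - t[0]) / len(t)     # Delta t / |t|, i.e. the f = 0 scale
scan = {}
for f in (-2, -1, 0, 1, 2, 3):
    dT = mean_gap * 10 ** (-f)         # invert EQ. (20), assuming log base 10
    scan[f] = coverage_and_isolated(t, dT)
    print(f, scan[f])
```

In this sketch the coverage reaches 1 at f=−1, where ΔT just exceeds the spacing of the periodic samples, and nearly all events are isolated at f=3, in line with the qualitative description of FIG. 2.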
System Implementations
It is to be understood that embodiments of the present disclosure can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof. In one embodiment, the present disclosure can be implemented in software as an application program tangibly embodied on a computer readable program storage device. The application program can be uploaded to, and executed by, a machine comprising any suitable architecture. Furthermore, it is understood in advance that although this disclosure includes a detailed description of cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present disclosure are capable of being implemented in conjunction with any other type of computing environment now known or later developed. An automatic troubleshooting system according to an embodiment of the disclosure is also suitable for a cloud implementation.
Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
Characteristics are as follows:
On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the provider and consumer of the utilized service.
Service Models are as follows:
Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Deployment Models are as follows:
Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.
Referring now to FIG. 4, a schematic of an example of a cloud computing node is shown. Cloud computing node 410 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the disclosure described herein. Regardless, cloud computing node 410 is capable of being implemented and/or performing any of the functionality set forth herein above.
In cloud computing node 410 there is a computer system/server 412, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 412 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
Computer system/server 412 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 412 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in FIG. 4, computer system/server 412 in cloud computing node 410 is shown in the form of a general-purpose computing device. The components of computer system/server 412 may include, but are not limited to, one or more processors or processing units 416, a system memory 428, and a bus 418 that couples various system components including system memory 428 to processor 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer system/server 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 412, and it includes both volatile and non-volatile media, removable and non-removable media.
System memory 428 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. Computer system/server 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 434 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 418 by one or more data media interfaces. As will be further depicted and described below, memory 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
Program/utility 440, having a set (at least one) of program modules 442, may be stored in memory 428 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 442 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.
Computer system/server 412 may also communicate with one or more external devices 414 such as a keyboard, a pointing device, a display 424, etc.; one or more devices that enable a user to interact with computer system/server 412; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 412 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 422. Still yet, computer system/server 412 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 420. As depicted, network adapter 420 communicates with the other components of computer system/server 412 via bus 418. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 412. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
Referring now to FIG. 5, illustrative cloud computing environment 50 is depicted. As shown, cloud computing environment 50 comprises one or more cloud computing nodes 400 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate. Nodes 400 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 54A-N shown in FIG. 5 are intended to be illustrative only and that computing nodes 400 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
While embodiments of the present disclosure have been described in detail with reference to exemplary embodiments, those skilled in the art will appreciate that various modifications and substitutions can be made thereto without departing from the spirit and scope of the disclosure as set forth in the appended claims.

Claims (20)

What is claimed is:
1. A computer-implemented method of clustering time stamps in time series data, comprising the steps of:
receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t;
determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps;
determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT;
determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array;
appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array is equal to −1;
appending an ith timestamp to a set of closing interval bounds τ+ if a corresponding second binary array is equal to 1;
appending an ith timestamp to a set of isolated points x if a next corresponding first binary array value is equal to 1;
merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ;
outputting the set of cluster intervals τ and the set of isolated points x; and
using said set of cluster intervals and said set of isolated points to determine a reliability of said electrically connected network and generate messages indicative of the reliability of said electrically connected network.
2. The method of claim 1, wherein the electrically connected network is an Internet-of-things network.
3. The method of claim 1, wherein the electrically connected network is a network of satellites.
4. The method of claim 1, wherein the electrically connected network is a direct link to a database.
5. The method of claim 1, further comprising determining a first service quality measure that measures a reliability of the electrically connected network from
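$$C_t^{o}(f) = \frac{1}{\Delta t}\sum_k |\tau_k| = \begin{cases} 1, & \Delta T \le 0 \\ 0 \ldots 1, & 0 < \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein $\Delta t = t_N - t_1$ is a total time interval, and $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, wherein $C_t^{o}(f)$ is a fraction of time with no operation failures in the electrically connected network.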
6. The method of claim 1, further comprising determining a second service quality measure that is a normed number of clusters from
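$$C_t^{n}(f) = \frac{2|\tau| - \delta_{1|\tau|}}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ 0 \ldots 1, & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, $|\tau|$ is a number of clusters, and $|t|$ is a number of time stamps.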
7. The method of claim 1, further comprising determining a third service quality measure that is a measure of a number of isolated events from
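$$C_t^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ 0 \ldots 1, & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases},$$
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, $|x|$ is a number of isolated points, and $|t|$ is a number of time stamps.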
8. The method of claim 1, further comprising determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from

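$$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\}, \quad k = 1, \ldots, K \le N/2 - 1,$$
and determining cluster intervals as $\tau_k = [\tau_k^{-}, \tau_k^{+}]$, $k = 1, \ldots, K$.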
9. The method of claim 1, wherein, letting b represent the first binary array, b[0]=1 and b[N]=1.
10. A non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data, the method comprising the steps of:
receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t;
determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps;
determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT;
determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array;
appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array is equal to −1;
appending an ith timestamp to a set of closing interval bounds τ+ if a corresponding second binary array is equal to 1;
appending an ith timestamp to a set of isolated points x if a next corresponding first binary array value is equal to 1;
merging the set of opening interval bounds τ− and the set of closing interval bounds τ+ into a set of cluster intervals τ; and
outputting the set of cluster intervals τ and the set of isolated points x; and using said set of cluster intervals and said set of isolated points to determine a reliability of said electrically connected network and generate messages indicative of the reliability of said electrically connected network.
11. The computer readable program storage device of claim 10, wherein the electrically connected network is an Internet-of-things network.
12. The computer readable program storage device of claim 10, wherein the electrically connected network is a network of satellites.
13. The computer readable program storage device of claim 10, wherein the electrically connected network is a direct link to a database.
14. The computer readable program storage device of claim 10, the method further comprising determining a first service quality measure that measures a reliability of the electrically connected network from
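$$C_t^{o}(f) = \frac{1}{\Delta t}\sum_k |\tau_k| = \begin{cases} 1, & \Delta T \le 0 \\ 0 \ldots 1, & 0 < \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein $\Delta t = t_N - t_1$ is a total time interval, and $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, wherein $C_t^{o}(f)$ is a fraction of time with no operation failures in the electrically connected network.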
15. The computer readable program storage device of claim 10, the method further comprising determining a second service quality measure that is a normed number of clusters from
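$$C_t^{n}(f) = \frac{2|\tau| - \delta_{1|\tau|}}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ 0 \ldots 1, & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases}$$
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, $|\tau|$ is a number of clusters, and $|t|$ is a number of time stamps.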
16. The computer readable program storage device of claim 10, the method further comprising determining a third service quality measure that is a measure of a number of isolated events from
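$$C_t^{s}(f) = \frac{|x|}{|t|} = \begin{cases} 0, & \Delta T < 0 \\ 0 \ldots 1, & 0 \le \Delta T < \Delta t \\ 0, & \Delta T \ge \Delta t \end{cases},$$
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$, $|x|$ is a number of isolated points, and $|t|$ is a number of time stamps.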
17. The computer readable program storage device of claim 10, the method further comprising determining the set of opening interval bounds τ− and the set of closing interval bounds τ+ from

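$$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\}, \quad k = 1, \ldots, K \le N/2 - 1,$$
and determining cluster intervals as $\tau_k = [\tau_k^{-}, \tau_k^{+}]$, $k = 1, \ldots, K$.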
18. The computer readable program storage device of claim 10, wherein, letting b represent the first binary array, b[0]=1 and b[N]=1.
19. The method of claim 4, further comprising using said set of cluster intervals to determine an availability of data on said database, when scanning said database is intractable.
20. The computer readable program storage device of claim 13, the method further comprising using said set of cluster intervals to determine an availability of data on said database, when scanning said database is intractable.
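By way of non-limiting illustration only, the clustering steps recited in claim 1 (with the boundary convention of claim 9) and the three service quality measures of claims 5 to 7 may be sketched in Python as follows. This is a sketch under stated assumptions, not the implementation of the disclosure: the function names `cluster_timestamps` and `quality_measures` are hypothetical; an interval exactly equal to dT is assumed to count as "not greater than" dT, which the claims do not specify; and the measures are read here as the covered-time fraction (1/Δt)·Σ|τk|, the normed cluster count (2|τ| − δ1|τ|)/|t|, and the isolated-point fraction |x|/|t|, with f = −log(ΔT·|t|/Δt).

```python
import math


def cluster_timestamps(t, dT):
    """Split ordered timestamps t into cluster intervals and isolated points."""
    n = len(t)
    if n == 0:
        return [], []
    # First binary array, padded with a leading and trailing 1 (claim 9).
    # b[i] describes the gap before t[i]; b[i + 1] describes the gap after it.
    # Assumption: a gap exactly equal to dT is treated as "not greater".
    b = [1] + [1 if t[i + 1] - t[i] > dT else 0 for i in range(n - 1)] + [1]
    # Second array: difference of adjacent elements of b, one per timestamp.
    d = [b[i + 1] - b[i] for i in range(n)]

    opening, closing, isolated = [], [], []
    for i in range(n):
        if d[i] == -1:        # long gap before, short gap after
            opening.append(t[i])
        elif d[i] == 1:       # short gap before, long gap after
            closing.append(t[i])
        elif b[i + 1] == 1:   # long gaps on both sides
            isolated.append(t[i])
        # otherwise t[i] lies strictly inside a cluster
    # The padding guarantees equally many opening and closing bounds.
    return list(zip(opening, closing)), isolated


def quality_measures(t, dT):
    """Evaluate the three measures; requires dT > 0 and t[-1] > t[0]."""
    intervals, isolated = cluster_timestamps(t, dT)
    total = t[-1] - t[0]          # total time interval
    n = len(t)                    # number of timestamps
    k = len(intervals)            # number of clusters
    covered = sum(end - start for start, end in intervals)
    return {
        "C_o": covered / total,
        "C_n": (2 * k - (1 if k == 1 else 0)) / n,
        "C_s": len(isolated) / n,
        "f": -math.log(dT * n / total),
    }


if __name__ == "__main__":
    stamps = [0, 1, 2, 10, 20, 21, 22, 23]
    print(cluster_timestamps(stamps, 2))  # ([(0, 2), (20, 23)], [10])
    print(quality_measures(stamps, 2))
```

In this example the timestamps 0 to 2 and 20 to 23 form two clusters, and the timestamp 10, whose neighbors on both sides are more than dT away, is reported as an isolated point.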
US15/720,779 2017-09-29 2017-09-29 Event clustering and event series characterization based on expected frequency Expired - Fee Related US10706080B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/720,779 US10706080B2 (en) 2017-09-29 2017-09-29 Event clustering and event series characterization based on expected frequency

Publications (2)

Publication Number Publication Date
US20190102449A1 (en) 2019-04-04
US10706080B2 (en) 2020-07-07

Family

ID=65896688

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/720,779 Expired - Fee Related US10706080B2 (en) 2017-09-29 2017-09-29 Event clustering and event series characterization based on expected frequency

Country Status (1)

Country Link
US (1) US10706080B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10979137B2 (en) * 2019-08-01 2021-04-13 Planet Labs, Inc. Multi-pathway satellite communication systems and methods

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110085630A1 (en) * 2009-10-08 2011-04-14 Alexandre Gerber Tcp flow clock extraction
US20150169429A1 (en) * 2013-12-17 2015-06-18 International Business Machines Corporation Dynamic allocation of trace array timestamp data
US20170359398A1 (en) * 2016-06-13 2017-12-14 Microsoft Technology Licensing, Llc Efficient Sorting for a Stream Processing Engine

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Alexander Hinneburg, et al., "An Efficient Approach to Clustering in Large Multimedia Databases with Noise," American Association for Artificial Intelligence, pp. 58-65, 1998.
Martin Ester, et al., "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Published in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 1-6.
Mung Chiang, et al., "Fog and IoT: An Overview of Research Opportunities," IEEE Internet of Things Journal, vol. 3, No. 6, Dec. 2016, pp. 854-864.

Also Published As

Publication number Publication date
US20190102449A1 (en) 2019-04-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALBRECHT, CONRAD M.;FREITAG, MARCUS;VAN KESSEL, THEODORE;AND OTHERS;SIGNING DATES FROM 20170928 TO 20170929;REEL/FRAME:044071/0235

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20240707