US10706080B2 - Event clustering and event series characterization based on expected frequency - Google Patents
- Publication number
- US10706080B2 (application US15/720,779, also listed as US201715720779A)
- Authority
- US
- United States
- Prior art keywords
- determining
- electrically connected
- time
- intervals
- connected network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
- G06F16/287—Visualization; Browsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
-
- G06K9/6218—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2308—Concurrency control
- G06F16/2336—Pessimistic concurrency control approaches, e.g. locking or multiple versions without time stamps
Definitions
- Embodiments of the present disclosure are directed to a one-dimensional, one-parameter clustering method with linear complexity on input that quantifies data availability or measures the reliability/stability of a device connecting network, such as IoT.
- IoT Internet of Things
- a network attached device capturing weather information to be broadcast to other devices for processing.
- transmitter that frequently sends out such information every time interval ΔT
- receiver that keeps track of the timestamps when data was transmitted/recorded
- if the one-dimensional data t is clustered such that consecutive events are not more than ΔT apart, periods in time where data might be missing can be inferred.
- corresponding actions such as retransmission, data interpolation, etc.
- the characteristics of intervals of no data e.g., relative frequency, duration, etc., can help diagnose the status of the communication network.
- Exemplary embodiments of the present disclosure are directed to event clustering for application in IoT service quality characterization using a computer-implemented method with time complexity O(N).
- Embodiments exploit the natural ordering of a sequence of timestamps recorded by a receiver device to reduce computational complexity by a factor of O(log N).
- an O(N)-efficient clustering procedure of N events represented by an ordered series of timestamps t1, t2, . . . , tN can identify time intervals of missing data, locate isolated events, and characterize a series of events by varying ΔT to determine the quality of service over an IoT network.
- a computer-implemented method for clustering time stamps in time series data including the steps of: receiving, from an electrically connected network, a one-dimensional array t of ordered timestamps and an expected frequency dT, wherein N is a number of timestamps in the one-dimensional array t; determining a set of time intervals, wherein a time interval is determined between each pair of adjacent timestamps in the one-dimensional array t of ordered timestamps; determining a first binary array that indicates whether each time interval in the set of time intervals is greater than or less than the expected frequency dT; determining a second binary array, wherein each element of the second binary array is a difference between a corresponding pair of adjacent elements of the first binary array; appending an ith timestamp to a set of opening interval bounds τ− if a corresponding second binary array is equal to −1; appending an ith timestamp to a set of closing interval bounds τ+ if
- the electrically connected network is an Internet-of-things network.
- the electrically connected network is a network of satellites.
- the electrically connected network is a direct link to a database.
- the method includes determining a first service quality measure that measures a reliability of the electrically connected network from
- $f = \log\left[\Delta T^{-1}/(\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t|/\Delta t\right)$
- $C_t^o(f)$ is a fraction of time with no operation failures in the electrically connected network.
- the method includes determining a second service quality measure that is a normed number of clusters from
- the method includes determining a third service quality measure that is a measure of a number of isolated events from
- a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for clustering time stamps in time series data.
- FIG. 1 is a block diagram of an exemplary system for implementing a method of clustering a one-dimensional (1-D) time series of events, according to an embodiment of the disclosure.
- FIG. 2 is a sample plot of IoT service quality measures from event time series obtained by varying ΔT, according to an embodiment of the disclosure.
- FIG. 3 is a flowchart of a method of clustering a 1-D time series of events, according to an embodiment of the disclosure.
- FIG. 4 is a schematic of an exemplary cloud computing node that implements an embodiment of the disclosure.
- FIG. 5 shows an exemplary cloud computing environment according to embodiments of the disclosure.
- Exemplary embodiments of the disclosure as described herein generally provide systems and methods for clustering and characterizing signal events on the IoT. While embodiments are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
- Embodiments of the disclosure provide a method to cluster timestamps based on an expected frequency of a service, e.g., an IoT domain, satellite images, etc., and output information that can be used to characterize the quality of a service, and thus rate reliability/stability of the service.
- Further embodiments of the disclosure provide methods for randomly probing and clustering time series of point data satellite imagery stored in a database given the latitude and longitude coordinates, and to efficiently estimate the availability of geo-spatial data, given the constraint that it is practically intractable to scan the whole database.
- ΔT might be an external parameter to an algorithm that provides a solution or it can be defined by t itself, e.g. through
- $\langle\delta t\rangle = \frac{1}{N}\sum_i \delta t_i$.
- the $\delta t_i$ need to be computed and therefore $\langle\delta t\rangle$ is efficiently determined along these lines.
- is equal or not equal to K−
- since |B| = N−2 ≠ |t| = N, boundary values τ± need to be manually added.
- |τ| = K ≤ N/2.
- FIG. 1 is a block diagram of an exemplary system that implements a method for clustering a 1-D time series of events, according to an embodiment of the disclosure.
- an information service 11, such as an IoT device or a geo-spatial database, transmits a time series 12 of information packets through a network, such as, e.g., the Internet, which employs physical networks such as, e.g., a WiFi network, Ethernet, etc., to an event cluster engine 13.
- the event cluster engine 13 includes a frequency detector 13.5 that can receive an event frequency ΔT that was manually input by a user or automatically determined from an average of the event intervals in the data.
- the event cluster engine 13 can store the time series data in physical storage 14 , such as memory or a hard drive, etc.
- the time series data includes cluster intervals demarcated by respective start and end boundaries $\tau_n^-$ and $\tau_n^+$, $\tau_{n+1}^-$ and $\tau_{n+1}^+$, etc., and isolated points $t_m$, $t_{m+1}$, $t_{m+2}$, etc.
- a user can interact with the event cluster engine 13 via a user interface 15, such as a RESTful API service, to, e.g., generate output 18, such as a JSON text file.
- the event cluster engine 13 is also connected to a cluster measure engine 16 , that can calculate various cluster measures, as described below.
- the cluster measures can be sent to a service monitor 17 , such as, e.g., Ganglia (http://ganglia.info/) or Nagios (https://www.nagios.com/), that can determine the reliability or stability of the information service 11 based on the cluster measures, and generate suitable warning messages 19 , such as email or Apache Kafka (https://kafka.apache.org/) messages, etc.
- the following listing provides a pseudo-code implementation according to an embodiment of a method of clustering a 1-D time series of events, with reference to step numbers of the flowchart of FIG. 3.
- t = [−20, −18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
- bracketed quantities on the left are the cluster intervals τ
- bracketed quantities on the right are the isolated points x.
- Square brackets with comma separated elements denote list data structures.
- τ is empty, as indicated by the [ ], and thus each time stamp is classified as an isolated point.
- the above pseudo-code listing can be implemented in any suitable computer language, such as Python or C/C++.
- An efficient implementation in C/C++ uses N−1 algebraic operations for δt, N−2 logical operations for b and again N algebraic operations for B, which determines the interval boundary classification with a total of 3(N−1) operations.
- a naive approach would compute two time intervals for each $t_i$, and perform two logical operations of those against ΔT, which would determine the classification, hence 4N computations.
- a sample JSON load for a query to the user interface 15 of FIG. 1 in the case of a data availability service of geo-spatial satellite images in a database such as IBM PAIRS (https://pairs.res.ibm.com) may be:
- “layers-available” is a list of existing layers with their temporal data coverage for an area of interest specified by “aoi”;
- “time-intervals” is a collection of time intervals [start, end] in Unix epoch time, which is $\tau_k$ from the One-dimensional Clustering section above;
- “avail-flag” indicates whether data is found for a specified time interval; 0 indicates data is found
- “datalayer-ID” is a database datalayer ID associated with the coverage information
- “num-timestamps” is the number of distinct timestamps discovered on probing “aoi”
- “timestamps” is the list of isolated timestamps in Unix epoch time, as described in the Isolated Points section above;
- “data-frequency” is the expected temporal interval ΔT in seconds in which a layer's data is available
- “aoi” refers to a spatial area of interest for which to check for data coverage
- “type” is the type of an aoi, which needs to be a simple polygon for this example;
- “coordinates” are the longitude and latitude of the aoi polygon
- “datalayer-IDs” is a list of datalayer IDs to check data coverage for
- “data-frequencies” is a list of time periods ΔT of data in seconds on which data availability intervals are computed
- “start-UTC-time” is the data coverage start UTC time in the format of [year, month, day, hours, minutes, seconds] where hours range from 0 to 23;
- “end-UTC-time” is the data coverage end UTC time in the “start-UTC-time” format.
- a method according to an embodiment can determine whether there is any data at all.
- a method according to an embodiment randomly probes a time series at various locations and merges the timestamps of all these probes together. Then a cluster_events method such as that detailed above is applied to the time series.
- scanning $C_t^o$ by varying f provides a characteristic that quantifies the reliability of, e.g., an IoT service. It can be shown that $C_t^o$ decreases monotonically as f increases. This is because a smaller f corresponds to a larger ΔT, and for larger ΔT, more clusters τ cover the whole time series. Due to EQ. (2), clusters never shrink in size for increasing ΔT; they either grow or merge into bigger clusters, letting the overall coverage increase.
- similar information could be obtained by simply checking a histogram n(δt), cf. EQ. (3), that counts the number of intervals $\delta t_i$ for some binning interval. In the case above, there would be a single peak in n(δt). Note that $C_t^o$ contains information similar to
- a clustering output (τ, x) provides information that n(δt) is blind to, because n(δt) does not account for the ordering of the $\delta t_i$.
- |τ| ∈ {0, 1} to result in $C_t^n = 0$.
- While $C_t^o$ just quantifies the total coverage of t by the clusters, $C_t^n$ provides insight into whether the coverage is established by a number of patches or by a single or a few intervals with a data frequency of at least $\Delta T^{-1}$. This way conclusions may be drawn on, e.g., the reliability of an IoT service. Ideally, $C_t^n \ll 1$.
- FIG. 2 illustrates these applications by plotting $C_t^{o,n,s}$ for an event series t generated from $10^4$ uniformly random samples drawn from [0, 1] joined by $10^3$ equidistant samples in [1, 10].
- FIG. 2 is a sample plot of IoT service quality measures from event series t: $C_t^o$ 21, $C_t^n$ 22, and $C_t^s$ 23 by varying ΔT.
- the series t comprises a burst of random events that covers approximately 10% of the total time range Δt. During the rest of the time the events are periodic at a rate of about 1/50 Δt.
- an embodiment of the present disclosure can be implemented in various forms of hardware, software, firmware, special purpose processes, or a combination thereof.
- an embodiment of the present disclosure can be implemented in software as an application program tangibly embodied on a computer readable program storage device.
- the application program can be uploaded to, and executed by, a machine comprising any suitable architecture.
- Although this disclosure includes a detailed description of cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present disclosure are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
- An automatic troubleshooting system according to an embodiment of the disclosure is also suitable for a cloud implementation.
- Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
- This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
- On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
- Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
- Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
- Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
- SaaS Software as a Service: the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure.
- the applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
- the consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
- PaaS Platform as a Service
- the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
- IaaS Infrastructure as a Service
- the consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
- Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
- Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
- Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
- a cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability.
- An infrastructure comprising a network of interconnected nodes.
- Cloud computing node 410 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the disclosure described herein. Regardless, cloud computing node 410 is capable of being implemented and/or performing any of the functionality set forth herein above.
- In cloud computing node 410 there is a computer system/server 412, which is operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 412 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- Computer system/server 412 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system.
- program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- Computer system/server 412 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer system storage media including memory storage devices.
- computer system/server 412 in cloud computing node 410 is shown in the form of a general-purpose computing device.
- the components of computer system/server 412 may include, but are not limited to, one or more processors or processing units 416 , a system memory 428 , and a bus 418 that couples various system components including system memory 428 to processor 416 .
- Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
- Computer system/server 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 412 , and it includes both volatile and non-volatile media, removable and non-removable media.
- System memory 428 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432 .
- Computer system/server 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- storage system 434 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”).
- a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”)
- an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media
- each can be connected to bus 418 by one or more data media interfaces.
- memory 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
- Program/utility 440 having a set (at least one) of program modules 442 , may be stored in memory 428 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
- Program modules 442 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.
- Computer system/server 412 may also communicate with one or more external devices 414 such as a keyboard, a pointing device, a display 424 , etc.; one or more devices that enable a user to interact with computer system/server 412 ; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 412 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 422 . Still yet, computer system/server 412 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 420 .
- LAN local area network
- WAN wide area network
- network adapter 420 communicates with the other components of computer system/server 412 via bus 418 .
- It should be understood that, although not shown, other hardware and/or software components could be used in conjunction with computer system/server 412. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
- cloud computing environment 50 comprises one or more cloud computing nodes 400 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54 A, desktop computer 54 B, laptop computer 54 C, and/or automobile computer system 54 N may communicate.
- Nodes 400 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
- This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
- computing devices 54A-N shown in FIG. 5 are intended to be illustrative only, and computing nodes 400 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
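wherein $\Delta t = t_N - t_1$ is a total time interval, and $f = \log\left[\Delta T^{-1}/(\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t|/\Delta t\right)$, wherein $C_t^o(f)$ is a fraction of time with no operation failures in the electrically connected network.
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1}/(\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t|/\Delta t\right)$, $|\tau|$ is a number of clusters, and $|t|$ is a number of time stamps.
wherein $\Delta t = t_N - t_1$ is a total time interval, $f = \log\left[\Delta T^{-1}/(\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t|/\Delta t\right)$, $|x|$ is a number of isolated points, and $|t|$ is a number of time stamps.
$\tau^\pm = \{\tau_k^\pm : k < k' \Rightarrow \tau_k^\pm < \tau_{k'}^\pm\},\quad k = 1, \dots, K \le N/2 - 1,$
and determining cluster intervals as $\tau_k = [\tau_{k-1}^-, \tau_k^+]$, $k = 1, \dots, K$.
$i \le j \Rightarrow t_i \le t_j \quad (1)$
and an expected time interval ΔT, time intervals $\tau_k$ are defined such that
$t_i, t_{i+1} \in \tau_k = [\tau_k^-, \tau_k^+] \Rightarrow \delta t_i = t_{i+1} - t_i \le \Delta T. \quad (2)$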
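Note that ΔT might be an external parameter to an algorithm that provides a solution or it can be defined by t itself, e.g. through the mean $\langle\delta t\rangle = \frac{1}{N}\sum_i \delta t_i$ of the intervals
$\delta t = \{\delta t_i = t_{i+1} - t_i\},\quad i = 1, \dots, N-1. \quad (3)$
$b = \{b_i = \mathrm{int}(\delta t_i > \Delta T)\},\quad i = 1, \dots, N-1 \quad (4)$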
switches from 1 to 0 for an opening interval bound τ−, and from 0 to 1 for a closing interval bound τ+, only. The function int( ) has values int(True)→1 and int(False)→0. Hence the quantity
$B = \{B_i = b_i - b_{i-1}\},\quad i = 2, \dots, N-1,\quad \text{with } B_i \in \{-1, 0, 1\} \quad (5)$
yields the desired association
$B_i = \pm 1 \Rightarrow t_i \in \tau^\pm. \quad (6)$
Because the $t_i$ are ordered, as expressed by EQ. (1), the binary (discrete) function $b_i$ implies the alternating property:
$\forall i < j: 1 = B_i = B_j \Rightarrow \exists\, i < l < j: B_l = -1. \quad (7)$
Thus, linearly scanning through the $t_i$ and their corresponding $B_i$ results in two sets
$\tau^\pm = \{\tau_k^\pm : k < k' \Rightarrow \tau_k^\pm < \tau_{k'}^\pm\},\quad k = 1, \dots, K \le N/2 - 1 \quad (8)$
that can be interleaved to obtain the corresponding time intervals as a solution, EQ. (2).
Boundary Conditions
$\tau_k = [\tau_{k-1}^-, \tau_k^+]. \quad (9)$
A corresponding issue might occur at the end of the time series {ti}, depending on whether K+=|τ+| is equal or not equal to K−=|τ−|. Note that by virtue of EQ. (7) the difference |K+−K−| is at most 1. However, due to the fact that |B|=N−2≠|t|=N, boundary values τ± need to be manually added.
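$t_0 = -\infty \text{ and } t_{N+1} = +\infty. \quad (10)$
Hence, $\delta t_0 = \delta t_N = +\infty$, and therefore
$b_0 = b_N = 1 \quad (11)$
which yields N values $B_i$ corresponding to the N timestamps $t_i$ for classification, such that
$\tau = \{\tau_k = [\tau_k^-, \tau_k^+]\},\quad k = 1, \dots, K \quad (12)$
from
$\tau^\pm = \{\tau_k^\pm : k < k' \Rightarrow \tau_k^\pm < \tau_{k'}^\pm\},\quad k = 1, \dots, K \quad (13)$
with $|\tau| = K \le N/2$.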
Isolated Points
can be formed that are associated with time intervals of failure. Note, that [t1, tN]=τ∪
∀i,k:t i∉
i.e., informally, it is not true that no event occurs during the intervals f, but is true that
τi∈
where $t_i$ is referred to as an isolated event. These timestamps can be considered to be noise, while all $t_i \in \tau^\pm$ are border points.
$B_i = 0 \wedge b_i = 1 \Rightarrow t_i \in x, \quad (17)$
where x denotes the set of isolated timestamps. Likewise, clustered timestamps can be defined as
$B_i = 0 \wedge b_i = 0 \Rightarrow t_i \in$
Since b is binary and EQ. (5) holds for B, all $t_i$ are uniquely classified, i.e. $t = \tau^+ \cup \tau^- \cup x \cup$
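The unique classification described above can be illustrated with a minimal Python sketch. It assumes 0-based indexing as in the pseudo-code listing, so B[i] = b[i+1] − b[i], and it pads b with the boundary values of EQ. (11). The function name `classify` and the label strings are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def classify(t: Sequence[float], dT: float) -> List[str]:
    """Label each ordered timestamp as 'open', 'close', 'isolated' or 'clustered'."""
    n = len(t)
    # b[i] == 1 marks a gap larger than dT directly before t[i];
    # b[0] and b[n] are the boundary values b_0 = b_N = 1 of EQ. (11).
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    labels = []
    for i in range(n):
        B = b[i + 1] - b[i]
        if B == -1:
            labels.append("open")       # opening bound, member of tau-minus
        elif B == 1:
            labels.append("close")      # closing bound, member of tau-plus
        elif b[i + 1] == 1:
            labels.append("isolated")   # gaps on both sides, member of x
        else:
            labels.append("clustered")  # interior point of a cluster
    return labels


labels = classify([0, 1, 2, 10, 20, 21], dT=1.5)
print(labels)
# ['open', 'clustered', 'close', 'isolated', 'open', 'close']
```

Every timestamp receives exactly one label, which mirrors the statement that b being binary makes the classification unique.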
Implementation & Computational Complexity
| algorithm cluster_events is |
| input: one-dimensional array t of ordered timestamps to cluster, |
| expected frequency dT |
| output: set of cluster intervals tau, list of isolated timestamps x |
| define ordered list tauMinus, tauPlus, x |
| define set tau |
| define array dt, b, B |
| N = length of t | // Step 30 |
| for i from 1 to N-1 do dt[i] = t[i] - t[i-1] | // Step 31 |
| b[0] = 1 | // Step 32 |
| for i from 1 to N-1 do |
| if dt[i] > dT then b[i] = 1 |
| else b[i] = 0 |
| b[N] = 1 |
| for i from 0 to N-1 do B[i] = b[i+1] - b[i] | // Step 33 |
| for i from 0 to N-1 do | // Step 34 |
| if B[i] == -1 then append t[i] to tauMinus |
| else if B[i] == 1 then append t[i] to tauPlus |
| else if b[i+1] == 1 then append t[i] to x |
| for i from 0 to (length of tauMinus) - 1 do | // Step 35 |
| add interval [tauMinus[i], tauPlus[i]] to tau |
| return tau, x | // Step 36 |
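As a hedged illustration, the pseudo-code listing can be transcribed into Python roughly as follows. This is a sketch and not the patented implementation: it returns the cluster intervals as a list of (start, end) tuples instead of a set, and it is applied to the sample series t from the description with an arbitrarily chosen dT = 3.

```python
from typing import List, Sequence, Tuple


def cluster_events(
    t: Sequence[float], dT: float
) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Return (cluster intervals tau, isolated timestamps x) for ordered t."""
    n = len(t)
    if n == 0:
        return [], []
    # Steps 31-32: b[i] == 1 if the gap before t[i] exceeds dT;
    # b[0] and b[n] are the boundary values set to 1.
    b = [1] + [int(t[i] - t[i - 1] > dT) for i in range(1, n)] + [1]
    tau_minus, tau_plus, x = [], [], []
    for i in range(n):
        B = b[i + 1] - b[i]          # Step 33
        if B == -1:                  # Step 34: opening bound
            tau_minus.append(t[i])
        elif B == 1:                 # closing bound
            tau_plus.append(t[i])
        elif b[i + 1] == 1:          # gaps on both sides
            x.append(t[i])
    # Step 35: interleave opening and closing bounds into intervals.
    return list(zip(tau_minus, tau_plus)), x


t = [-20, -18, 1, 2, 2.9, 10, 11, 100, 200, 202, 202, 203]
tau, x = cluster_events(t, dT=3)
print(tau)  # [(-20, -18), (1, 2.9), (10, 11), (200, 203)]
print(x)    # [100]
```

The function makes a single pass over t after computing b, so the work grows linearly with N. Choosing a dT below every interval, for example a negative value, yields an empty tau with every timestamp reported as isolated.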
| { “query”: { |
| “datalayer-IDs”: [ 23, 15100, 3], |
| “data-frequencies”: [ 3800, 90000, 2800000], |
| “start-UTC-time”: [ 2016, 10, 1], |
| “aoi”: { “type”: “Polygon”, |
| “coordinates”: [[[−73.9, 40.5], [ −73.8, 40.55], |
| [ −73.85, 40.6], [ −73.9, 40.5]]] }, |
| “explore”: false, |
| “end-UTC-time”: [ 2016, 12, 31, 23, 59, 59 ] |
| } } |
| and a sample JSON text file response, output format 18 of FIG. 1, might |
| look like, e.g., |
| { "layers-available": [{ |
| "time-intervals": [[ 1483001560, 1490001110], |
| [1653006000, 1680282350 ]], |
| "avail-flag": 0, |
| "datalayer-ID": 1234, |
| "num-timestamps": 346, |
| "timestamps": [ 893001560, 1093001560, 1260340121], |
| "data-frequency": 31536000 }], |
| "aoi": {"type": "Polygon", |
| "coordinates": [[[-73.9, 40.5], [-73.8, 40.55], |
| [-73.85, 40.6], [-73.9, 40.5]]] } |
| } |
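As a rough illustration of how a client might consume the sample response above, the sketch below parses it and sums the duration of the reported intervals per data layer. It assumes that the values in "time-intervals" are Unix epoch seconds in UTC, which the listing does not state explicitly; the helper name `summarize_layers` and the output keys are hypothetical.

```python
import json
from datetime import datetime, timezone

# The sample response from the listing above, with plain ASCII quotes and minus signs.
SAMPLE_RESPONSE = """
{ "layers-available": [{
    "time-intervals": [[1483001560, 1490001110], [1653006000, 1680282350]],
    "avail-flag": 0,
    "datalayer-ID": 1234,
    "num-timestamps": 346,
    "timestamps": [893001560, 1093001560, 1260340121],
    "data-frequency": 31536000 }],
  "aoi": {"type": "Polygon",
          "coordinates": [[[-73.9, 40.5], [-73.8, 40.55],
                           [-73.85, 40.6], [-73.9, 40.5]]] }
}
"""


def to_utc(seconds: int) -> str:
    """Render epoch seconds as an ISO 8601 UTC string (assumed unit: seconds)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def summarize_layers(response: dict) -> list:
    """Hypothetical helper: per layer, list the intervals and their total length."""
    summary = []
    for layer in response["layers-available"]:
        intervals = layer["time-intervals"]
        summary.append({
            "datalayer-ID": layer["datalayer-ID"],
            "intervals-UTC": [(to_utc(start), to_utc(end)) for start, end in intervals],
            "covered-seconds": sum(end - start for start, end in intervals),
        })
    return summary


summary = summarize_layers(json.loads(SAMPLE_RESPONSE))
for entry in summary:
    print(entry)
```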
In the above JSON listings:
where
$f = \log\left[\Delta T^{-1} / (\Delta t/|t|)^{-1}\right] = -\log\left(\Delta T\,|t| / \Delta t\right)$ (20)
computes the fraction of time with no operation failures. ΔT is fixed by the expected, logarithmic, and normalized event frequency f: f=0 represents the scale of frequency at which all timestamps are equally spaced within the time series interval, f>0 corresponds to smaller scales, and f<0 to larger ones.
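The relation in EQ. (20) between f and ΔT can be sketched as a pair of conversion helpers. Two assumptions are made here that the text does not confirm: the logarithm is taken as the natural logarithm, and Δt is taken as the span between the first and last timestamp. The function names are illustrative only.

```python
import math
from typing import Sequence


def normalized_log_frequency(t: Sequence[float], dT: float) -> float:
    """EQ. (20): f = -log(dT * |t| / Dt), with Dt assumed to be t[-1] - t[0]."""
    span = t[-1] - t[0]
    return -math.log(dT * len(t) / span)


def expected_interval(t: Sequence[float], f: float) -> float:
    """Inverse of EQ. (20): dT = (Dt / |t|) * exp(-f)."""
    span = t[-1] - t[0]
    return span / len(t) * math.exp(-f)


t = [0, 10, 20, 30, 40]
print(expected_interval(t, 0.0))   # f = 0: dT equals Dt / |t|
print(expected_interval(t, 1.0))   # f > 0: a shorter expected interval
print(expected_interval(t, -1.0))  # f < 0: a longer expected interval
```

Sweeping f over a range and re-running the clustering for each resulting ΔT is one way to observe how cluster coverage changes with scale.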
provides a normed measure of the number of clusters, since the Kronecker delta $\delta_{ij}$ is 1 for i=j and 0 otherwise. It forces $|\tau| \in \{0, 1\}$ to result in $C_t^n = 0$. While $C_t^0$ just quantifies the total coverage of t by the clusters, $C_t^n$ provides insight into whether the coverage is established by a number of patches or by a single interval or a few intervals with a data frequency of at least $\Delta T^{-1}$. This way conclusions may be drawn on, e.g., the reliability of an IoT service. Ideally, $C_t^n \ll 1$.
as an additional indicator of reliability, since the isolated events are orthogonal to the information contained in τ. Isolated events can be classified as indicators of loose IoT service quality, and thus this measure should stay close to zero until it quickly increases to one for some f>0.
Claims (20)
$\tau^{\pm} = \{\tau_k^{\pm} : k < k' \Rightarrow \tau_k^{\pm} < \tau_{k'}^{\pm}\},\quad k = 1, \ldots, K \le N/2 - 1,$
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/720,779 US10706080B2 (en) | 2017-09-29 | 2017-09-29 | Event clustering and event series characterization based on expected frequency |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20190102449A1 US20190102449A1 (en) | 2019-04-04 |
| US10706080B2 true US10706080B2 (en) | 2020-07-07 |
Family
ID=65896688
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/720,779 Expired - Fee Related US10706080B2 (en) | 2017-09-29 | 2017-09-29 | Event clustering and event series characterization based on expected frequency |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US10706080B2 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10979137B2 (en) * | 2019-08-01 | 2021-04-13 | Planet Labs, Inc. | Multi-pathway satellite communication systems and methods |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110085630A1 (en) * | 2009-10-08 | 2011-04-14 | Alexandre Gerber | Tcp flow clock extraction |
| US20150169429A1 (en) * | 2013-12-17 | 2015-06-18 | International Business Machines Corporation | Dynamic allocation of trace array timestamp data |
| US20170359398A1 (en) * | 2016-06-13 | 2017-12-14 | Microsoft Technology Licensing, Llc | Efficient Sorting for a Stream Processing Engine |
Non-Patent Citations (3)
| Title |
|---|
| Alexander Hinneburg, et al., "An Efficient Approach to Clustering in Large Multimedia Databases with Noise," American Association for Artificial Intelligence, pp. 58-65, 1998. |
| Martin Ester, et al., "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Published in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 1-6. |
| Mung Chiang, et al., "Fog and IoT: An Overview of Research Opportunities," IEEE Internet of Things Journal, vol. 3, No. 6, Dec. 2016, pp. 854-864. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALBRECHT, CONRAD M.;FREITAG, MARCUS;VAN KESSEL, THEODORE;AND OTHERS;SIGNING DATES FROM 20170928 TO 20170929;REEL/FRAME:044071/0235 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20240707 |